Make the rexi:stream2 interface unacked message limit configurable #2360
FWIW, one of the 12 nodes in this cluster was down for maintenance during testing.
src/rexi/src/rexi.erl
 stream2(Msg) ->
-    stream2(Msg, 10, 300000).
+    Limit = config:get_integer("rexi", "stream_limit", 10),
+    stream2(Msg, Limit, 300000).
Minor nit, but it might be prudent to chain stream2/1 --> stream2/2 --> stream2/3 rather than stream2/1 --> stream2/3, in case the timeout parameter in stream2/2 is later converted from a hardcoded value to a config lookup like the one you added here. We also shouldn't be duplicating magic numbers.
I see what you mean and I agree on not repeating constants.
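A minimal sketch of the delegation order suggested above, assuming the defaults shown in the diff (limit 10, timeout 300000 ms). The module name, the macros, and `get_integer/3` are hypothetical: `get_integer/3` stands in for CouchDB's `config:get_integer/3` so the module compiles on its own, and this `stream2/3` only returns its arguments instead of streaming anything.

```erlang
-module(rexi_stream2_arity_sketch).
-export([stream2/1, stream2/2, stream2/3]).

%% Defaults mirror the literals in the diff; each is written exactly once.
-define(DEFAULT_STREAM_LIMIT, 10).
-define(DEFAULT_STREAM_TIMEOUT, 300000).

%% stream2/1 only resolves the limit, then delegates to stream2/2.
stream2(Msg) ->
    Limit = get_integer("rexi", "stream_limit", ?DEFAULT_STREAM_LIMIT),
    stream2(Msg, Limit).

%% stream2/2 owns the timeout default, so a future config lookup for the
%% timeout would change only this clause and stream2/1 would inherit it.
stream2(Msg, Limit) ->
    stream2(Msg, Limit, ?DEFAULT_STREAM_TIMEOUT).

%% Placeholder for the real implementation: returns its arguments so the
%% delegation chain can be checked in isolation.
stream2(Msg, Limit, Timeout) ->
    {Msg, Limit, Timeout}.

%% Hypothetical stand-in for CouchDB's config:get_integer/3 so the sketch
%% compiles outside CouchDB; it always returns the default.
get_integer(_Section, _Key, Default) when is_integer(Default) ->
    Default.
```

With this layout, `stream2(hello)` yields `{hello, 10, 300000}`, and making the timeout configurable later means editing a single clause.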
Updated the branch from 87020f7 to 2aa2a01.
I've recently been running additional benchmarks focused on replication. I was not able to demonstrate a replication performance gain from lowering the stream limit: replication times with stream limits of 5 and 10 were within error of each other for q=192 and 1M docs. I believe this indicates that rexi's streaming performance is not replication's dominant bottleneck.
Alright, @wenli200133 recently ran some partitioned query benchmarks for this; here is what she found:
For higher q values, there was no statistically significant variation in performance with the stream limit. Note the large instability at stream limits of 100 and 1000 for q=1. Again, this was for 1000 keys queried using startkey and limit.
@ksnavely, what does that mean for …
@wohali, for q=1 I don't believe we're going to see a performance change from, e.g., a 10 -> 5 change in the unacked message limit for …. Based on testing, I would not recommend increasing the limit: the q=1 case is precisely where Li saw instability at stream limits of 100 and 1000.
This is good data, especially because it's a surprising result! (Our hypothesis was that increasing the … Given that …
+1 to updating the default to 5, but I think we should keep the new setting configurable in case CouchDB users encounter unexpected behavior in production environments.
…2360)

Also lower the default stream_limit to 5 based on the results of performance testing.

Co-authored-by: Adam Kocoloski <kocolosk@apache.org>
Co-authored-by: Kyle Snavely <kjsnavely@gmail.com>
Overview
Experimentation with the rexi:stream2 interface's unacked message limit suggests possible performance gains for multiple database behaviors. The rexi:stream2 interface is used in a few different behaviors:

- _changes feed
- … streaming

Each area deserves performance benchmarking. This change introduces the stream limit configurability but leaves the default value unchanged at 10 unacked messages. Credit to Adam K. and others at Cloudant for recommending the change; I just benchmarked it.
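To make "unacked message limit" concrete, here is a simplified, hypothetical sketch of a bounded window between two Erlang processes. It is not rexi's code: the module, `run/3`, and the `{row, ...}`/`{ack, ...}` message shapes are assumptions chosen for illustration. The producer keeps sending until `Limit` messages are outstanding, then blocks until the consumer acknowledges one, giving up after `Timeout` milliseconds, which loosely mirrors the `Limit` and `Timeout` arguments of `stream2/3`.

```erlang
-module(unacked_window_sketch).
-export([run/3]).

%% run(Items, Limit, Timeout): stream Items from the calling process to a
%% consumer process, never letting more than Limit messages be unacked.
%% Returns {ItemsReceivedByConsumer, MaxUnackedObserved}.
run(Items, Limit, Timeout) when is_integer(Limit), Limit > 0 ->
    Producer = self(),
    Consumer = spawn_link(fun() -> consume(Producer, []) end),
    MaxSeen = produce(Consumer, Items, Limit, Timeout, 0, 0),
    Consumer ! {done, Producer},
    receive
        {received, Consumer, Received} ->
            flush_acks(Consumer),
            {Received, MaxSeen}
    after Timeout ->
        erlang:error(consumer_timeout)
    end.

%% Nothing left to send.
produce(_Consumer, [], _Limit, _Timeout, _Unacked, MaxSeen) ->
    MaxSeen;
%% Window full: block until the consumer acks one message, or time out.
produce(Consumer, Items, Limit, Timeout, Unacked, MaxSeen) when Unacked >= Limit ->
    receive
        {ack, Consumer} ->
            produce(Consumer, Items, Limit, Timeout, Unacked - 1, MaxSeen)
    after Timeout ->
        erlang:error(ack_timeout)
    end;
%% Room in the window: send the next message without waiting.
produce(Consumer, [Item | Rest], Limit, Timeout, Unacked, MaxSeen) ->
    Consumer ! {row, self(), Item},
    NewUnacked = Unacked + 1,
    produce(Consumer, Rest, Limit, Timeout, NewUnacked, max(MaxSeen, NewUnacked)).

%% Consumer: ack every row, then report what it received when told to stop.
consume(Producer, Acc) ->
    receive
        {row, Producer, Item} ->
            Producer ! {ack, self()},
            consume(Producer, [Item | Acc]);
        {done, Producer} ->
            Producer ! {received, self(), lists:reverse(Acc)}
    end.

%% Drop acks that were never waited on. They all precede {received, ...}
%% because Erlang preserves message order between a pair of processes.
flush_acks(Consumer) ->
    receive
        {ack, Consumer} -> flush_acks(Consumer)
    after 0 ->
        ok
    end.
```

For example, `unacked_window_sketch:run(lists:seq(1, 7), 3, 5000)` returns `{[1,2,3,4,5,6,7], 3}`: every item arrives, and at most 3 were ever in flight. A smaller limit makes the producer wait on acks more often; a larger one lets more messages pile up in the consumer's mailbox.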
Testing recommendations
The change itself is simple. Its performance impact needs more study but looks interesting. I benchmarked view queries, but more testing is recommended before tuning this parameter in production.
View query throughput
I benchmarked view queries with some infrastructure we have at Cloudant, against a 12-node cluster. Each node has 24 cores (2x E5-2690-V3 DodecaCore CPUs), 110 GiB of memory, and four ~1 TB Intel SSDSC2BA012T4 SSDs in a RAID10 array. For 3 minutes, 1000 keys are requested at random from a 10M-document database using the startkey and limit query parameters. The default stream limit is 10 unacked messages; I saw interesting results when I lowered that limit to just 5.
At q=192 we see a 57% increase in view row throughput. A sample of one iteration at rexi.stream_limit = 10 vs. 5 demonstrates an improvement in individual request latency, not just in parallelization. Latency units are microseconds.
Related Issues or Pull Requests
None to my knowledge.
Checklist
rel/overlay/etc/default.ini