CCMSG-796: fix poor performance for lots of update requests #490
Conversation
Signed-off-by: Lev Zemlyanov <lev@confluent.io>
LGTM, thanks @levzem
One final question: what is the purpose of the sleep time here? Would 1 ms work as well? (Follow-up comment on the issue.)
Signed-off-by: Lev Zemlyanov <lev@confluent.io>
```diff
  // wait for internal buffer to be less than max.buffered.records configuration
  long maxWaitTime = clock.milliseconds() + config.flushTimeoutMs();
  while (numRecords.get() >= config.maxBufferedRecords()) {
-   clock.sleep(TimeUnit.SECONDS.toMillis(1));
+   clock.sleep(WAIT_TIME);
```
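For reference, a minimal sketch of the back-pressure loop being discussed, assuming `WAIT_TIME` is the 10 ms constant this PR introduces and that `clock`, `numRecords`, and the config values behave as in the diff; the wrapper class, constructor, and timeout exception are illustrative, not the connector's actual code:

```java
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.common.utils.Time;
import org.apache.kafka.connect.errors.ConnectException;

// Illustrative back-pressure helper mirroring the diff above.
class BufferBackPressure {
    private static final long WAIT_TIME = 10; // ms, the reduced sleep from this PR

    private final Time clock;
    private final AtomicInteger numRecords;
    private final long flushTimeoutMs;
    private final int maxBufferedRecords;

    BufferBackPressure(Time clock, AtomicInteger numRecords,
                       long flushTimeoutMs, int maxBufferedRecords) {
        this.clock = clock;
        this.numRecords = numRecords;
        this.flushTimeoutMs = flushTimeoutMs;
        this.maxBufferedRecords = maxBufferedRecords;
    }

    // Wait for the internal buffer to drop below max.buffered.records,
    // giving up once flush.timeout.ms has elapsed instead of blocking forever.
    void waitForBufferSpace() {
        long maxWaitTime = clock.milliseconds() + flushTimeoutMs;
        while (numRecords.get() >= maxBufferedRecords) {
            if (clock.milliseconds() > maxWaitTime) {
                throw new ConnectException("Timed out waiting for buffer space");
            }
            // Sleep briefly instead of a full second so frequent updates
            // are not stalled longer than necessary.
            clock.sleep(WAIT_TIME);
        }
    }
}
```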
Is there a reason why this wait time was chosen? Would this value change if there were x times more requests?
I suppose perf tests will give more backing to what the value should be.
This could be overkill, but it made me think of setting a rate limiter on addToRequestToRecordMap. Suppose we hit the rate limit; we could then increase the rate limit as well as the wait time accordingly for better performance.
We actually don't want to rate limit; we want to run as fast as possible and only wait if we don't have enough buffer space.
I see 👍🏽
Signed-off-by: Lev Zemlyanov <lev@confluent.io>
Problem
data sets with frequent updates suffer slow performance because of long wait times
Solution
reduce the wait time to 10 ms to fix the performance bottleneck
remove the unique-key constraint by relying on BulkItemResponse::getItemId, which returns the position of the request within the bulk request and lets us map each response back to the SinkRecord it came from (sketched below)
Does this solution apply anywhere else?
If yes, where?
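As a rough illustration of the mapping described above, assuming the batch's records are kept in a list in the same order their requests were added to the bulk request; `recordsInBatch` and `reportError` are hypothetical names, while the Elasticsearch types come from the high-level REST client:

```java
import java.util.List;
import org.apache.kafka.connect.sink.SinkRecord;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;

// Illustrative sketch: recover the originating SinkRecord for each bulk
// item via its ordinal position in the bulk request, instead of requiring
// keys to be unique within a batch.
class BulkResponseMapper {

    // recordsInBatch must be in the same order the requests were added
    // to the bulk request; getItemId() is that ordinal position.
    void handleResponse(BulkResponse response, List<SinkRecord> recordsInBatch) {
        for (BulkItemResponse item : response.getItems()) {
            SinkRecord origin = recordsInBatch.get(item.getItemId());
            if (item.isFailed()) {
                reportError(origin, item.getFailureMessage());
            }
        }
    }

    // Hypothetical reporting hook standing in for the connector's reporter.
    private void reportError(SinkRecord record, String failureMessage) {
        System.err.printf("Record at offset %d failed: %s%n",
                record.kafkaOffset(), failureMessage);
    }
}
```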
Test Strategy
modified a test to upsert multiple records with the same key
existing reporter tests cover the rest of the test cases
PERF TEST RESULTS:
configs
v10.0.2, with the same key for every record:
a steady 6.7K records/s
this PR:
the first rate is a steady 32K records/s with unique keys (the dip is from a rebalance)
the second rate is a steady 6.7K records/s with identical records, matching v10.0.2, so this PR addresses the performance regression
Release Plan
backporting to 11.0.x, where this was introduced