
CCMSG-796: fix poor performance for lots of update requests #490

Merged
merged 2 commits into confluentinc:11.0.x from the wait branch on Feb 9, 2021

Conversation

@levzem (Contributor) commented Feb 1, 2021

Signed-off-by: Lev Zemlyanov <lev@confluent.io>

Problem

data sets with frequent update requests suffer from slow performance because of the long wait times used when the connector's internal buffer is full

Solution

reduce the wait time to 10 ms to remove the performance bottleneck

remove the unique-key constraint by relying on BulkItemResponse::getItemId, which returns the position of a request within the bulk request; that position can be used to map each response back to the SinkRecord it came from (see the sketch below)
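
As an illustration, here is a minimal sketch of that mapping approach. It is not the connector's actual code: the class, the handler methods, and recordsInRequestOrder are hypothetical names, and the list is assumed to hold records in the same order their requests were added to the bulk request.

import java.util.List;

import org.apache.kafka.connect.sink.SinkRecord;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;

public class BulkResponseMapper {

  // Walk the bulk response and resolve each item back to its originating record.
  public void reportResults(BulkResponse response, List<SinkRecord> recordsInRequestOrder) {
    for (BulkItemResponse item : response.getItems()) {
      // getItemId() is the zero-based position of this item in the bulk request,
      // so it indexes directly into the list of originating records.
      SinkRecord original = recordsInRequestOrder.get(item.getItemId());
      if (item.isFailed()) {
        handleFailure(original, item.getFailure());
      } else {
        handleSuccess(original);
      }
    }
  }

  private void handleFailure(SinkRecord record, BulkItemResponse.Failure failure) {
    // placeholder: e.g. forward to an errant record reporter
  }

  private void handleSuccess(SinkRecord record) {
    // placeholder: e.g. acknowledge the record
  }
}

Because the item id is just the request's position in the bulk request, duplicate keys no longer collide: two updates for the same key occupy different positions and map to different SinkRecords.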

Does this solution apply anywhere else?
  • yes
  • no
If yes, where?

Test Strategy

modified a test to upsert multiple records with the same key
existing reporter tests cover the rest of the test cases

PERF TEST RESULTS:

configs

{
  "task.max": 1,
  "linger.ms": 1000,
  "batch.size": 2000,
  "max.buffered.records": 20000,
  "max.in.flight.requests": 5
}

v10.0.2 with the same key for every single record
[screenshot: throughput graph, Screen Shot 2021-02-08 at 4 24 40 PM]

a steady 6.7K records/s

this PR:
[screenshot: throughput graph, Screen Shot 2021-02-08 at 4 02 20 PM]

the first rate is a steady 32K records/s with unique keys (the dip is from a rebalance)
the second rate is a steady 6.7K records/s with identical records, which matches v10.0.2; therefore this PR addresses the performance regression

Testing done:
  • Unit tests
  • Integration tests
  • System tests
  • Manual tests - performance testing

Release Plan

backporting to 11.0.x where this was introduced

Signed-off-by: Lev Zemlyanov <lev@confluent.io>
@dosvath (Contributor) left a comment:

LGTM, thanks @levzem

@dosvath (Contributor) left a comment:

One final question: what is the purpose of the sleep time here? Would 1 ms work as well? (follow-up comment on the issue)

Signed-off-by: Lev Zemlyanov <lev@confluent.io>
  // wait for internal buffer to be less than max.buffered.records configuration
  long maxWaitTime = clock.milliseconds() + config.flushTimeoutMs();
  while (numRecords.get() >= config.maxBufferedRecords()) {
-   clock.sleep(TimeUnit.SECONDS.toMillis(1));
+   clock.sleep(WAIT_TIME);


Is there a reason why this wait time was chosen? Would this value change if there were x times more requests? I suppose perf tests will give more backing to what the value should be.

Could be overkill, but this made me think of setting a rate limiter on addToRequestToRecordMap: if we hit the rate limit, we increase the rate limit as well as the wait time accordingly to improve performance.

@levzem (Contributor, Author) replied:

we actually don't want to rate limit; we want to run as fast as possible and only wait if we don't have enough buffer space (see the sketch below)
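
A minimal sketch of the back-pressure behavior being discussed (simplified and illustrative only, not the connector's exact code; the class, method, and variable names are hypothetical, and WAIT_TIME_MS stands in for the 10 ms constant introduced by this PR):

import java.util.concurrent.atomic.AtomicInteger;

public class BufferBackPressure {

  // Short sleep so the writer resumes almost immediately once space frees up.
  private static final long WAIT_TIME_MS = 10;

  // Block until the internal buffer drops below the configured maximum,
  // giving up once the flush timeout has elapsed.
  static void waitForBufferSpace(AtomicInteger numRecords,
                                 int maxBufferedRecords,
                                 long flushTimeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + flushTimeoutMs;
    while (numRecords.get() >= maxBufferedRecords) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException(
            "Timed out waiting for buffer space after " + flushTimeoutMs + " ms");
      }
      Thread.sleep(WAIT_TIME_MS);
    }
  }
}

With a 1-second sleep, a full buffer stalls the writer for up to a second even if space frees up right away; with a 10 ms sleep, each check costs only a few milliseconds of latency, which is where the throughput improvement comes from.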


I see 👍🏽

@Prathibha-m Prathibha-m self-requested a review February 5, 2021 00:29
@levzem levzem merged commit 7a2f9ed into confluentinc:11.0.x Feb 9, 2021
@levzem levzem deleted the wait branch February 9, 2021 00:29