CCMSG-796: fix poor performance for lots of update requests #490
Conversation
Signed-off-by: Lev Zemlyanov <lev@confluent.io>
LGTM, thanks @levzem
One final question: what is the purpose of the sleep time here? Would 1 ms work as well? (Follow-up comment on the issue.)
Signed-off-by: Lev Zemlyanov <lev@confluent.io>
```diff
  // wait for internal buffer to be less than max.buffered.records configuration
  long maxWaitTime = clock.milliseconds() + config.flushTimeoutMs();
  while (numRecords.get() >= config.maxBufferedRecords()) {
-   clock.sleep(TimeUnit.SECONDS.toMillis(1));
+   clock.sleep(WAIT_TIME);
```
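For reference, a minimal sketch of the back-pressure loop being discussed, assuming `WAIT_TIME` is the 10 ms constant this PR introduces and that `clock`, `numRecords`, and the config values behave as in the diff; the wrapper class, constructor, and timeout exception are illustrative, not the connector's actual code:

```java
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.common.utils.Time;
import org.apache.kafka.connect.errors.ConnectException;

// Illustrative back-pressure helper mirroring the diff above.
class BufferBackPressure {
    private static final long WAIT_TIME = 10; // ms, the reduced sleep from this PR

    private final Time clock;
    private final AtomicInteger numRecords;
    private final long flushTimeoutMs;
    private final int maxBufferedRecords;

    BufferBackPressure(Time clock, AtomicInteger numRecords,
                       long flushTimeoutMs, int maxBufferedRecords) {
        this.clock = clock;
        this.numRecords = numRecords;
        this.flushTimeoutMs = flushTimeoutMs;
        this.maxBufferedRecords = maxBufferedRecords;
    }

    // Wait for the internal buffer to drop below max.buffered.records,
    // giving up once flush.timeout.ms has elapsed instead of blocking forever.
    void waitForBufferSpace() {
        long maxWaitTime = clock.milliseconds() + flushTimeoutMs;
        while (numRecords.get() >= maxBufferedRecords) {
            if (clock.milliseconds() > maxWaitTime) {
                throw new ConnectException("Timed out waiting for buffer space");
            }
            // Sleep briefly instead of a full second so frequent updates
            // are not stalled longer than necessary.
            clock.sleep(WAIT_TIME);
        }
    }
}
```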
Is there a reason why this wait time was chosen? Would this value change if there were x times more requests?
I suppose perf tests will give more backing to what the value should be.
This could be overkill, but it made me think of setting a rate limiter on addToRequestToRecordMap. Suppose we hit the rate limit; we could then increase the rate limit as well as the wait time accordingly for better performance.
We actually don't want to rate limit; we want to run as fast as possible and only wait if we don't have enough buffer space.
I see 👍🏽
Signed-off-by: Lev Zemlyanov <lev@confluent.io>
Problem
data sets with frequent updates suffer slow performance because of long wait times
Solution
reduce the wait time to 10 ms to fix the performance bottleneck
remove the unique-key constraint by relying on BulkItemResponse::getItemId, which returns the position of the request within the bulk request and lets us map each response back to the SinkRecord it came from (sketched below)
Does this solution apply anywhere else?
If yes, where?
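As a rough illustration of the mapping described above, assuming the batch's records are kept in a list in the same order their requests were added to the bulk request; `recordsInBatch` and `reportError` are hypothetical names, while the Elasticsearch types come from the high-level REST client:

```java
import java.util.List;
import org.apache.kafka.connect.sink.SinkRecord;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;

// Illustrative sketch: recover the originating SinkRecord for each bulk
// item via its ordinal position in the bulk request, instead of requiring
// keys to be unique within a batch.
class BulkResponseMapper {

    // recordsInBatch must be in the same order the requests were added
    // to the bulk request; getItemId() is that ordinal position.
    void handleResponse(BulkResponse response, List<SinkRecord> recordsInBatch) {
        for (BulkItemResponse item : response.getItems()) {
            SinkRecord origin = recordsInBatch.get(item.getItemId());
            if (item.isFailed()) {
                reportError(origin, item.getFailureMessage());
            }
        }
    }

    // Hypothetical reporting hook standing in for the connector's reporter.
    private void reportError(SinkRecord record, String failureMessage) {
        System.err.printf("Record at offset %d failed: %s%n",
                record.kafkaOffset(), failureMessage);
    }
}
```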
Test Strategy
modified a test to upsert multiple records with the same key
existing reporter tests cover the rest of the test cases
PERF TEST RESULTS:
configs
v10.0.2, with the same key for every record:
a steady 6.7K records/s
this PR:
the first rate is a steady 32K records/s with unique keys (the dip is from a rebalance)
the second rate is a steady 6.7K records/s with identical records, matching v10.0.2, so this PR addresses the performance regression
Release Plan
backporting to 11.0.x, where this was introduced