KAFKA-4843: More efficient round-robin scheduler #2643
Conversation
@mjsax @dguy @guozhangwang have a look when you can. This led to most of the efficiency increases in streams consumption vs. simple consumer. Streams performance went from ~60MB/s -> 97MB/s (request size is 100 bytes).
Full benchmark run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/748/console
From 60 to 100? Wow! That's pretty impressive! Great work @enothereska!
LGTM.
    self.replication = 1

    @cluster(num_nodes=9)
-   @matrix(test=["produce", "consume", "count", "processstream", "processstreamwithsink", "processstreamwithstatestore", "processstreamwithcachedstatestore", "kstreamktablejoin", "kstreamkstreamjoin", "ktablektablejoin"], scale=[1, 2, 3])
+   @matrix(test=["produce", "consume", "count", "processstream", "processstreamwithsink", "processstreamwithstatestore", "processstreamwithcachedstatestore", "kstreamktablejoin", "kstreamkstreamjoin", "ktablektablejoin"], scale=[1, 3])
Why do you skip scale=2 ?
I've increased the length of each test (by adding more records). I skip scale=2 to shorten the run of this test a bit. Also scale=3 is good enough to show performance (in small scale).
    while (true) {
        int numProcessed = task.process();
        totalNumBuffered += numProcessed;
        if (numProcessed == 0)
nit: we should use Java style:

    if () {
        ...
    }
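For illustration, here is a minimal self-contained sketch of the braced form being asked for (Task is just a stand-in interface, and the break body is assumed, since the quoted diff is truncated above):

    // Illustrative sketch only, not the actual StreamThread code.
    interface Task {
        int process(); // number of records processed, 0 when this task's buffer is drained
    }

    class DrainWithBraces {
        static int drain(final Task task) {
            int totalNumBuffered = 0;
            while (true) {
                final int numProcessed = task.process();
                totalNumBuffered += numProcessed;
                if (numProcessed == 0) {
                    break; // assumed body: stop once the task's buffer is empty
                }
            }
            return totalNumBuffered;
        }
    }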
    // process this task's records to completion before
    // context switching to another task
    while (true) {
        int numProcessed = task.process();
nit: final
    // process this task's records to completion before
    // context switching to another task
    while (true) {
perhaps this could be:

    int numProcessed;
    while ((numProcessed = task.process()) != 0) {
        ...
    }

I just have an aversion to while(true) ... break.
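For contrast with the while(true)/break version, a self-contained sketch of the suggested shape might look like this (Task is again an illustrative stand-in, not the real StreamTask API):

    interface Task {
        int process(); // number of records processed, 0 when the buffer is drained
    }

    class DrainWithoutBreak {
        static int drain(final Task task) {
            int totalNumBuffered = 0;
            int numProcessed;
            // the loop condition doubles as the termination check, so no break is needed
            while ((numProcessed = task.process()) != 0) {
                totalNumBuffered += numProcessed;
            }
            return totalNumBuffered;
        }
    }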
    // process this task's records to completion before
    // context switching to another task
    while (true) {
Do we need to check whether we should be committing in here now? Before, I think we'd process one record for each task and then check whether we needed to commit. Now we will process however many records have been buffered before we commit. That could span quite a lot of tasks, which could mean we are not committing as frequently as desired, and that could lead to lots of duplicates in failure scenarios.
It probably doesn't make much of a difference when the commit interval is large, but if it is small it might be significant.
So we have two options: 1) check if we need to commit after each record (previous behaviour), or 2) check if we need to commit after each batch of records from poll() has been processed (this patch). This patch speeds up the common case (no failures) while relying on EoS to eliminate duplicates if there is a failure.
Yeah, I'm cool with it either way. Just wanted to throw it out there as it is a change in behaviour.
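To make the trade-off concrete, here is a rough, purely illustrative sketch of option 2 (the maybeCommit and process names mirror the snippets quoted in this thread; the scaffolding around them is assumed, not the actual StreamThread code):

    // Illustrative only: the commit check runs once per drained batch rather than once per record.
    class BatchedCommitSketch {
        interface Task {
            int process(); // number of records processed, 0 when drained
        }

        private static final long COMMIT_INTERVAL_MS = 30_000L; // assumed value; configurable via commit.interval.ms
        private long lastCommitMs = System.currentTimeMillis();

        void runOnce(final Iterable<Task> tasks) {
            for (final Task task : tasks) {
                // drain everything buffered for this task since the last poll()
                while (task.process() != 0) {
                    // per-record work happens inside process()
                }
            }
            // option 2: a single commit check after the whole batch, so a long batch
            // can delay the commit beyond the configured interval
            maybeCommit(System.currentTimeMillis());
        }

        private void maybeCommit(final long nowMs) {
            if (nowMs - lastCommitMs >= COMMIT_INTERVAL_MS) {
                // commit offsets / flush state here
                lastCommitMs = nowMs;
            }
        }
    }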
@guozhangwang we will have to change the definition of the poll sensor to cover a batch of requests rather than each one individually. I think that's ok.
@enothereska wrote:
Hey, great job!
    int numProcessed;
    while ((numProcessed = task.process()) != 0) {
        totalNumBuffered += numProcessed;
    }
    requiresPoll = requiresPoll || task.requiresPoll();
Will this always be true now?
There is still one path in task.process() where requiresPoll is set to false.
    int numProcessed;
    while ((numProcessed = task.process()) != 0) {
        totalNumBuffered += numProcessed;
    }
    requiresPoll = requiresPoll || task.requiresPoll();

    streamsMetrics.processTimeSensor.record(computeLatency(), timerStartedMs);
Shouldn't this now move into the loop above?
I'd like to do stats on a batch of records now, not on each one individually. The reason is that, now that we do the processing in a tight loop, the performance of computeLatency (which calls time.milliseconds()) matters more than before (when perf was slower to start with). Basically it's either that, or this sensor needs to become a DEBUG-level sensor.
In this case, should we divide the latency by the number of records processed?
Yes, agreed.
Known problem: testReprocessingFromScratchAfterResetWithIntermediateUserTopic FAILED
Some new Jenkins failures worth mentioning:
    * Process one record
    *
-   * @return number of records left in the buffer of this task's partition group after the processing is done
+   * @return number of records processed
In this case could we just return a boolean?
Ok
    long processLatency = computeLatency();
    streamsMetrics.processTimeSensor.record(processLatency / (double) totalProcessedSinceLastCommit,
                                            timerStartedMs);
    maybeCommit(this.timerStartedMs);
OK makes sense.
@enothereska Just one minor comment, otherwise LGTM.
@guozhangwang addressed; however, let's hold off on committing since I've seen some unit test failures in trunk that need more testing. Thanks.
org.apache.kafka.streams.integration.ResetIntegrationTest > testReprocessingFromScratchAfterResetWithoutIntermediateUserTopic STARTED seems to hang. This is consistent with what I see in trunk.
retest this please
A recent commit has broken checkstyle. Not related to this PR.
@enothereska Jenkins failures seem to be due to another earlier commit on
retest this please.
LGTM. Merged to trunk.
Improves streams efficiency by more than 200K requests/second (small 100-byte requests).
Gets streams efficiency very close to that of the pure consumer (see results in https://jenkins.confluent.io/job/system-test-kafka-branch-builder/746/console).
Maintains the same fairness across tasks.
Schedules all records in the queue in between poll() calls, not just one per task.
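Putting the quoted fragments together, a condensed and purely illustrative sketch of the new inner loop might look like the following (task, requiresPoll, computeLatency, and the process-time sensor come from the diff snippets above; the rest is assumed scaffolding, not the actual StreamThread code):

    // Sketch of the "drain each task between polls" idea: every task processes all of its
    // buffered records before the thread moves on, requiresPoll is aggregated across tasks,
    // and process latency is recorded once per batch, divided by the records processed.
    class RoundRobinLoopSketch {
        interface Task {
            int process();          // records processed; 0 when this task's buffer is empty
            boolean requiresPoll(); // true when the task needs more data from poll()
        }

        private long timerStartedMs = System.currentTimeMillis();

        boolean processBufferedRecords(final Iterable<Task> tasks) {
            boolean requiresPoll = false;
            int totalProcessed = 0;
            for (final Task task : tasks) {
                int numProcessed;
                // process this task's records to completion before context switching
                while ((numProcessed = task.process()) != 0) {
                    totalProcessed += numProcessed;
                }
                requiresPoll = requiresPoll || task.requiresPoll();
            }
            if (totalProcessed > 0) {
                final long now = System.currentTimeMillis();
                // in the real code this per-record value would feed the process-time sensor
                final double perRecordLatencyMs = (now - timerStartedMs) / (double) totalProcessed;
                timerStartedMs = now;
            }
            return requiresPoll; // the caller polls again when true
        }
    }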