KAFKA-3896: Fix KStreamRepartitionJoinTest #2405

guozhangwang · 2017-01-19T04:43:22Z

The root cause of this issue is that InternalTopicManager creates topics one at a time, and this test needs 31 topics. As a result, the consumer can time out during assignment in the rebalance, and the next leader then has to repeat the same work because the "makeReady" calls are also made one at a time.

This patch batches the topics into a single create request and uses the StreamsKafkaClient directly to fetch metadata for validating the created topics. It also optimizes some inefficient code in InternalTopicManager and StreamsKafkaClient.

Minor cleanup: make the exception message more informative in integration tests.
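
To illustrate the difference described above, here is a minimal sketch that counts the requests sent by each approach. It is not the code from this patch: `TopicClient`, `RecordingClient`, and the topic names are hypothetical stand-ins, and no real broker is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchedTopicCreationSketch {

    /** Hypothetical stand-in for a client that sends create-topic requests. */
    interface TopicClient {
        void createTopics(Map<String, Integer> partitionsByTopic);
    }

    /** Records every request so the number of round trips can be counted. */
    static class RecordingClient implements TopicClient {
        final List<Map<String, Integer>> requests = new ArrayList<>();

        @Override
        public void createTopics(final Map<String, Integer> partitionsByTopic) {
            requests.add(new LinkedHashMap<>(partitionsByTopic));
        }
    }

    /** One request per topic: N topics cost N round trips. */
    static void createOneAtATime(final TopicClient client, final Map<String, Integer> topics) {
        for (final Map.Entry<String, Integer> entry : topics.entrySet()) {
            final Map<String, Integer> single = new LinkedHashMap<>();
            single.put(entry.getKey(), entry.getValue());
            client.createTopics(single);
        }
    }

    /** One request carrying every topic: a single round trip. */
    static void createBatched(final TopicClient client, final Map<String, Integer> topics) {
        if (!topics.isEmpty()) {
            client.createTopics(topics);
        }
    }

    /** Builds `count` made-up topic names, each with one partition. */
    static Map<String, Integer> sampleTopics(final int count) {
        final Map<String, Integer> topics = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            topics.put("internal-topic-" + i, 1);
        }
        return topics;
    }

    public static void main(final String[] args) {
        final RecordingClient sequential = new RecordingClient();
        createOneAtATime(sequential, sampleTopics(31));

        final RecordingClient batched = new RecordingClient();
        createBatched(batched, sampleTopics(31));

        System.out.println("one-at-a-time requests: " + sequential.requests.size());
        System.out.println("batched requests: " + batched.requests.size());
    }
}
```

With 31 topics, the sequential path issues one request per topic while the batched path issues a single request, which is why batching leaves more headroom before the consumer times out during a rebalance.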

asfbot · 2017-01-19T05:32:37Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1013/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-01-19T05:34:50Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1015/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-19T05:37:39Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1013/
Test PASSed (JDK 7 and Scala 2.10).

guozhangwang · 2017-01-19T06:08:24Z

retest this please

asfbot · 2017-01-19T06:58:01Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1017/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-19T07:30:39Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1015/
Test PASSed (JDK 8 and Scala 2.12).

guozhangwang · 2017-01-19T15:27:12Z

retest this please

asfbot · 2017-01-19T16:16:11Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1027/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-19T16:16:31Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1025/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-01-19T16:25:58Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1025/
Test PASSed (JDK 7 and Scala 2.10).

guozhangwang · 2017-01-19T16:54:23Z

retest this please

asfbot · 2017-01-19T17:42:58Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1027/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-01-19T17:45:41Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1029/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-19T19:16:26Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1027/
Test PASSed (JDK 7 and Scala 2.10).

guozhangwang · 2017-01-19T19:26:05Z

retest this please

asfbot · 2017-01-19T20:16:45Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1038/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-19T20:17:34Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1036/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2017-01-19T20:21:15Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1036/
Test PASSed (JDK 7 and Scala 2.10).

guozhangwang · 2017-01-19T20:54:25Z

retest this please

guozhangwang · 2017-01-19T23:15:39Z

retest this please

asfbot · 2017-01-20T00:05:01Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1045/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-20T00:05:49Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1043/
Test FAILed (JDK 7 and Scala 2.10).

asfbot · 2017-01-20T00:06:51Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1043/
Test PASSed (JDK 8 and Scala 2.12).


guozhangwang · 2017-01-20T08:16:16Z

I had a comment in the original KAFKA-4060 PR to batch the requests, but it was not addressed somehow. Ping @hjafarpour @mjsax @dguy @enothereska for review.

asfbot · 2017-01-20T11:18:59Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1066/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-20T11:21:49Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1064/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-01-20T11:25:37Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1064/
Test PASSed (JDK 8 and Scala 2.12).

dguy

Left a few minor comments, but I think we need a test for InternalTopicManager.getNumPartitions.

dguy · 2017-01-20T16:12:38Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

-                Collection<MetadataResponse.TopicMetadata> topicsMetadata = streamsKafkaClient.fetchTopicsMetadata();
-                validateTopicPartitons(topics, topicsMetadata);
-                Map<InternalTopicConfig, Integer> topicsToBeCreated = filterExistingTopics(topics, topicsMetadata);
+                Map<String, Integer> existingTopicPartitions = getExistingTopicNamesPartitions();


These could (should) be final?

dguy · 2017-01-20T16:12:56Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

+     * Get the number of partitions for the given topics
+     */
+    public Map<String, Integer> getNumPartitions(final Set<String> topics) {
+        Map<String, Integer> existingTopicPartitions = getExistingTopicNamesPartitions();


dguy · 2017-01-20T16:15:40Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

-    private Map<InternalTopicConfig, Integer> filterExistingTopics(final Map<InternalTopicConfig, Integer> topicsPartitionsMap, Collection<MetadataResponse.TopicMetadata> topicsMetadata) {
-        Map<String, Integer> existingTopicNamesPartitions = getExistingTopicNamesPartitions(topicsMetadata);
+    private Map<InternalTopicConfig, Integer> validateTopicPartitons(final Map<InternalTopicConfig, Integer> topicsPartitionsMap,
+                                                                     final Map<String, Integer> existingTopicNamesPartitions) {
        Map<InternalTopicConfig, Integer> nonExistingTopics = new HashMap<>();


I'd probably change the name of this to topicsToBeCreated or something similar. Also final

dguy · 2017-01-20T16:16:58Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

    }

-    private Map<String, Integer> getExistingTopicNamesPartitions(Collection<MetadataResponse.TopicMetadata> topicsMetadata) {
+    private Map<String, Integer> getExistingTopicNamesPartitions() {


getExistingPartitionCountByTopic ?

dguy · 2017-01-20T16:17:35Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

    }

-    private Map<String, Integer> getExistingTopicNamesPartitions(Collection<MetadataResponse.TopicMetadata> topicsMetadata) {
+    private Map<String, Integer> getExistingTopicNamesPartitions() {
        // The names of existing topics
        Map<String, Integer> existingTopicNamesPartitions = new HashMap<>();


existingPartitionCountByTopic?
final?

dguy · 2017-01-20T16:26:40Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java

                do {
-                    partitions = streamThread.restoreConsumer.partitionsFor(topic.name());
-                } while (partitions == null || partitions.size() != numPartitions);
+                    Map<String, Integer> partitions = internalTopicManager.getNumPartitions(topicNamesToMakeReady);


My preference here would be to extract this logic into a method like:

    private boolean allTopicsCreated(final Set<String> topicNamesToMakeReady, final Map<InternalTopicConfig, Integer> topicsToMakeReady) {
        final Map<String, Integer> partitions = internalTopicManager.getNumPartitions(topicNamesToMakeReady);
        for (Map.Entry<InternalTopicConfig, Integer> entry : topicsToMakeReady.entrySet()) {
            final Integer numPartitions = partitions.get(entry.getKey().name());
            if (numPartitions == null || !numPartitions.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

and then have:

    while (!allTopicsCreated(topicNamesToMakeReady, topicsToMakeReady)) {
        // should we add a small sleep here?
    }

I think it makes the code cleaner. It removes the temporary variable and the break (neither of which I like!).
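
For anyone who wants to try the suggested loop in isolation, a self-contained sketch follows. It is only an illustration under stated assumptions: `PartitionCountSource` is a hypothetical stand-in for `InternalTopicManager.getNumPartitions`, topics are keyed by plain names rather than `InternalTopicConfig`, and the bounded attempt count and sleep interval are additions that were not part of the suggestion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class TopicReadinessSketch {

    /** Hypothetical stand-in for InternalTopicManager.getNumPartitions. */
    interface PartitionCountSource {
        Map<String, Integer> getNumPartitions(Set<String> topics);
    }

    /** True only if every expected topic exists with the expected partition count. */
    static boolean allTopicsCreated(final Map<String, Integer> expected, final PartitionCountSource source) {
        final Map<String, Integer> actual = source.getNumPartitions(expected.keySet());
        for (final Map.Entry<String, Integer> entry : expected.entrySet()) {
            final Integer numPartitions = actual.get(entry.getKey());
            if (numPartitions == null || !numPartitions.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Polls up to maxAttempts times, sleeping between attempts (assumed policy, not from the PR). */
    static boolean waitUntilCreated(final Map<String, Integer> expected, final PartitionCountSource source,
                                    final int maxAttempts, final long sleepMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (allTopicsCreated(expected, source)) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMs);
                } catch (final InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }

    /** Test double: reports no topics for the first few calls, then the final state. */
    static class EventuallyReadySource implements PartitionCountSource {
        private final Map<String, Integer> finalState;
        private int remainingEmptyCalls;

        EventuallyReadySource(final Map<String, Integer> finalState, final int emptyCalls) {
            this.finalState = new HashMap<>(finalState);
            this.remainingEmptyCalls = emptyCalls;
        }

        @Override
        public Map<String, Integer> getNumPartitions(final Set<String> topics) {
            if (remainingEmptyCalls > 0) {
                remainingEmptyCalls--;
                return new HashMap<>();
            }
            return finalState;
        }
    }

    public static void main(final String[] args) {
        final Map<String, Integer> expected = new HashMap<>();
        expected.put("repartition-a", 3);
        expected.put("changelog-b", 1);
        final PartitionCountSource source = new EventuallyReadySource(expected, 2);
        System.out.println("ready: " + waitUntilCreated(expected, source, 5, 10L));
    }
}
```

Bounding the attempts is one possible answer to the "small sleep" question; whether an unbounded loop is acceptable depends on how the caller handles the rebalance timeout.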

mjsax

I second @dguy's comments. There are a few more vars that can be final, too. Otherwise, LGTM.

guozhangwang · 2017-01-21T21:46:42Z

@dguy @mjsax addressed your comments, please take a look again.

asfbot · 2017-01-21T22:14:31Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1089/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-21T22:18:48Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1087/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-01-21T22:46:51Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1087/
Test PASSed (JDK 8 and Scala 2.12).

mjsax

One nit comment. Otherwise LGTM.

mjsax · 2017-01-22T04:10:31Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

+                if (!existingTopicNamesPartitions.get(topic.name()).equals(topicsPartitionsMap.get(topic))) {
+                    throw new StreamsException("Existing internal topic " + topic.name() + " has invalid partitions." +
+                            " Expected: " + topicsPartitionsMap.get(topic) + " Actual: " + existingTopicNamesPartitions.get(topic.name()) +
+                            ". Use 'kafka.tools.StreamsResetter' tool to clean up invalid topics before processing.");


"Use 'kafka.tools.StreamsResetter' tool"
-> "Use '" + kafka.tools.StreamsResetter.getClass().getName() + "' tool"

The reason we do not use the class directly is that streams does not depend on kafka.tools for now, and I'd rather not do that until we have enough motivation to do so.
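
The validation step under discussion can be sketched in isolation as follows. This is a simplified illustration, not the patch itself: topics are keyed by plain names instead of `InternalTopicConfig`, `IllegalStateException` stands in for `StreamsException`, and the method name `topicsToBeCreated` is borrowed from the naming suggestion earlier in this review.

```java
import java.util.HashMap;
import java.util.Map;

public class TopicValidationSketch {

    /**
     * Returns the required topics that do not exist yet.
     * Throws if an existing topic has a different partition count than required.
     */
    static Map<String, Integer> topicsToBeCreated(final Map<String, Integer> required,
                                                  final Map<String, Integer> existing) {
        final Map<String, Integer> toCreate = new HashMap<>();
        for (final Map.Entry<String, Integer> entry : required.entrySet()) {
            final String topic = entry.getKey();
            final Integer expected = entry.getValue();
            final Integer actual = existing.get(topic);
            if (actual == null) {
                // Topic is missing, so it goes into the batch to create.
                toCreate.put(topic, expected);
            } else if (!actual.equals(expected)) {
                // IllegalStateException is a stand-in for StreamsException here.
                throw new IllegalStateException("Existing internal topic " + topic + " has invalid partitions."
                        + " Expected: " + expected + " Actual: " + actual + ".");
            }
        }
        return toCreate;
    }

    public static void main(final String[] args) {
        final Map<String, Integer> required = new HashMap<>();
        required.put("repartition-a", 2);
        required.put("changelog-b", 4);

        final Map<String, Integer> existing = new HashMap<>();
        existing.put("repartition-a", 2);

        System.out.println("to create: " + topicsToBeCreated(required, existing));
    }
}
```

Checking existing topics and collecting missing ones in a single pass means the metadata only has to be fetched once per attempt.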

dguy

LGTM

asfbot · 2017-01-24T00:12:55Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1126/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-01-24T00:58:07Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1128/
Test PASSed (JDK 8 and Scala 2.11).

asfbot · 2017-01-24T03:09:41Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1126/
Test FAILed (JDK 7 and Scala 2.10).

guozhangwang · 2017-01-24T03:30:53Z

retest this please

asfbot · 2017-01-24T04:24:25Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1133/
Test PASSed (JDK 7 and Scala 2.10).

asfbot · 2017-01-24T04:26:38Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1133/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-01-24T06:39:47Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1135/
Test FAILed (JDK 8 and Scala 2.11).

guozhangwang · 2017-01-24T18:48:51Z

I'm investigating the Jenkins failure in another JIRA / PR. Could we merge this PR as is, @hachikuji?

hachikuji · 2017-01-24T18:54:30Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java

-            final Integer numPartitions = entry.getValue().numPartitions;
+        // first construct the topics to make ready
+        Map<InternalTopicConfig, Integer> topicsToMakeReady = new HashMap<>();
+        Set<String> topicNamesToMakeReady = new HashSet<>();


Feels like this collection is redundant. You can get the name from InternalTopicConfig perhaps?

Yes, I can, but then I would need another foreach anyway to extract the names when calling the function, while doing it here saves that.

hachikuji

LGTM

The root cause of this issue is that InternalTopicManager creates topics one at a time, and this test needs 31 topics. As a result, the consumer can time out during assignment in the rebalance, and the next leader then has to repeat the same work because the "makeReady" calls are also made one at a time.

This patch batches the topics into a single create request and uses the StreamsKafkaClient directly to fetch metadata for validating the created topics. It also optimizes some inefficient code in InternalTopicManager and StreamsKafkaClient.

Minor cleanup: make the exception message more informative in integration tests.

Author: Guozhang Wang <wangguoz@gmail.com>
Reviewers: Damian Guy, Matthias J. Sax, Jason Gustafson

Closes #2405 from guozhangwang/K3896-fix-kstream-repartition-join-test

(cherry picked from commit 7837d3e)
Signed-off-by: Guozhang Wang <wangguoz@gmail.com>

guozhangwang · 2017-01-24T20:01:19Z

Thanks for your reviews, @dguy @mjsax @hachikuji. Merged to trunk and 0.10.2.

minor changes for jenkins builds

3fb4e84

guozhangwang added 2 commits January 19, 2017 22:22

Merge branch 'trunk' of https://git-wip-us.apache.org/repos/asf/kafka …

1877ac3

…into K3896-fix-kstream-repartition-join-test

batching create topics

5fe7084

guozhangwang changed the title ~~KAFKA-3896: Fix KStreamRepartitionJoinTest [WIP]~~ KAFKA-3896: Fix KStreamRepartitionJoinTest Jan 20, 2017

dguy reviewed Jan 20, 2017

mjsax reviewed Jan 20, 2017

guozhangwang added 2 commits January 20, 2017 11:07

address github comments

efc7179

github comments

0b3d346

mjsax reviewed Jan 22, 2017

mjsax approved these changes Jan 23, 2017

dguy approved these changes Jan 23, 2017

rebase from apache trunk

08b90e5

hachikuji reviewed Jan 24, 2017

hachikuji approved these changes Jan 24, 2017

asfgit closed this in 7837d3e Jan 24, 2017

guozhangwang deleted the K3896-fix-kstream-repartition-join-test branch July 15, 2017 22:07
