KAFKA-3896: Fix KStreamRepartitionJoinTest #2405
Conversation
…into K3896-fix-kstream-repartition-join-test
I had a comment in the original KAFKA-4060 PR to batch the requests, but somehow it was not addressed. Ping @hjafarpour @mjsax @dguy @enothereska for review.
Left a few minor comments. But I think we need a test for InternalTopicManager.getNumPartitions.
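For reference, a rough sketch of what such a test might look like (the stub client and the InternalTopicManager constructor arguments below are assumptions for illustration, not the actual API):

    import static org.junit.Assert.assertEquals;

    import java.util.Collections;
    import java.util.Map;
    import org.junit.Test;

    public class InternalTopicManagerGetNumPartitionsTest {

        @Test
        public void shouldReturnPartitionCountsForExistingTopics() {
            // Hypothetical stub whose fetchTopicsMetadata() reports "topic-1" with 3 partitions;
            // a real test would mock StreamsKafkaClient instead.
            final StreamsKafkaClient client = new StubStreamsKafkaClient(Collections.singletonMap("topic-1", 3));
            // Constructor arguments are assumed for this sketch.
            final InternalTopicManager manager = new InternalTopicManager(client, 1, 0);

            final Map<String, Integer> partitions = manager.getNumPartitions(Collections.singleton("topic-1"));

            assertEquals(Integer.valueOf(3), partitions.get("topic-1"));
        }
    }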
Collection<MetadataResponse.TopicMetadata> topicsMetadata = streamsKafkaClient.fetchTopicsMetadata();
validateTopicPartitons(topics, topicsMetadata);
Map<InternalTopicConfig, Integer> topicsToBeCreated = filterExistingTopics(topics, topicsMetadata);
Map<String, Integer> existingTopicPartitions = getExistingTopicNamesPartitions();
These could (should) be final?
 * Get the number of partitions for the given topics
 */
public Map<String, Integer> getNumPartitions(final Set<String> topics) {
    Map<String, Integer> existingTopicPartitions = getExistingTopicNamesPartitions();
same here
private Map<InternalTopicConfig, Integer> filterExistingTopics(final Map<InternalTopicConfig, Integer> topicsPartitionsMap, Collection<MetadataResponse.TopicMetadata> topicsMetadata) {
    Map<String, Integer> existingTopicNamesPartitions = getExistingTopicNamesPartitions(topicsMetadata);
private Map<InternalTopicConfig, Integer> validateTopicPartitons(final Map<InternalTopicConfig, Integer> topicsPartitionsMap,
                                                                 final Map<String, Integer> existingTopicNamesPartitions) {
    Map<InternalTopicConfig, Integer> nonExistingTopics = new HashMap<>();
I'd probably change the name of this to topicsToBeCreated or something similar. Also final.
}

private Map<String, Integer> getExistingTopicNamesPartitions(Collection<MetadataResponse.TopicMetadata> topicsMetadata) {
private Map<String, Integer> getExistingTopicNamesPartitions() {
getExistingPartitionCountByTopic?
}

private Map<String, Integer> getExistingTopicNamesPartitions(Collection<MetadataResponse.TopicMetadata> topicsMetadata) {
private Map<String, Integer> getExistingTopicNamesPartitions() {
    // The names of existing topics
    Map<String, Integer> existingTopicNamesPartitions = new HashMap<>();
existingPartitionCountByTopic? And final?
do {
    partitions = streamThread.restoreConsumer.partitionsFor(topic.name());
} while (partitions == null || partitions.size() != numPartitions);
Map<String, Integer> partitions = internalTopicManager.getNumPartitions(topicNamesToMakeReady);
My preference here would be to extract this logic into a method like:

    private boolean allTopicsCreated(final Set<String> topicNamesToMakeReady, final Map<InternalTopicConfig, Integer> topicsToMakeReady) {
        final Map<String, Integer> partitions = internalTopicManager.getNumPartitions(topicNamesToMakeReady);
        for (Map.Entry<InternalTopicConfig, Integer> entry : topicsToMakeReady.entrySet()) {
            final Integer numPartitions = partitions.get(entry.getKey().name());
            if (numPartitions == null || !numPartitions.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

and then have:

    while (!allTopicsCreated(topicNamesToMakeReady, topicsToMakeReady)) {
        // should we add a small sleep here?
    }

I think it makes the code cleaner. It removes the temporary variable and the break (neither of which I like!)
I second @dguy's comments. There are a few more vars that can be final, too. Otherwise, LGTM.
One nit comment. Otherwise LGTM.
if (!existingTopicNamesPartitions.get(topic.name()).equals(topicsPartitionsMap.get(topic))) {
    throw new StreamsException("Existing internal topic " + topic.name() + " has invalid partitions." +
        " Expected: " + topicsPartitionsMap.get(topic) + " Actual: " + existingTopicNamesPartitions.get(topic.name()) +
        ". Use 'kafka.tools.StreamsResetter' tool to clean up invalid topics before processing.");
"Use 'kafka.tools.StreamsResetter' tool" -> "Use '" + kafka.tools.StreamsResetter.class.getName() + "' tool"
The reason we do not use the class directly is that streams does not depend on kafka.tools for now, and I'd rather not do that until we have enough motivation to do so.
Ack.
LGTM
I'm investigating the Jenkins failure in another JIRA / PR. Could we merge this PR as-is @hachikuji?
final Integer numPartitions = entry.getValue().numPartitions;
// first construct the topics to make ready
Map<InternalTopicConfig, Integer> topicsToMakeReady = new HashMap<>();
Set<String> topicNamesToMakeReady = new HashSet<>();
Feels like this collection is redundant. You can get the name from InternalTopicConfig, perhaps?
Yes, I can, but then I'd need another foreach anyway to extract the names when calling the function, while doing it here saves that.
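For what it's worth, a minimal sketch of the single-pass construction being described here, assuming a topicsToBeCreated map as in the diff above (illustrative only, not the exact patch):

    final Map<InternalTopicConfig, Integer> topicsToMakeReady = new HashMap<>();
    final Set<String> topicNamesToMakeReady = new HashSet<>();
    // Build the config map and the name set in the same pass,
    // so no second foreach is needed just to extract the names.
    for (final Map.Entry<InternalTopicConfig, Integer> entry : topicsToBeCreated.entrySet()) {
        topicsToMakeReady.put(entry.getKey(), entry.getValue());
        topicNamesToMakeReady.add(entry.getKey().name());
    }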
Ack
LGTM
The root cause of this issue is that in InternalTopicManager we are creating topics one-at-a-time, and for this test there are 31 topics to be created. As a result it is possible that the consumer could time out during the assignment in rebalance, and the next leader has to do the same again because the "makeReady" calls are one-at-a-time.

This patch batches the topics into a single create request and also uses the StreamsKafkaClient directly to fetch metadata for validating the created topics. It also optimizes a bunch of inefficient code in InternalTopicManager and StreamsKafkaClient.

Minor cleanup: make the exception message more informative in integration tests.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Damian Guy, Matthias J. Sax, Jason Gustafson

Closes #2405 from guozhangwang/K3896-fix-kstream-repartition-join-test

(cherry picked from commit 7837d3e)

Signed-off-by: Guozhang Wang <wangguoz@gmail.com>
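As a rough illustration of the before/after behavior the commit message describes (the createTopic/createTopics calls below are stand-ins, not the actual StreamsKafkaClient API):

    // Before (sketch): one create request per topic, so 31 topics mean 31 round trips,
    // which can exceed the consumer's rebalance timeout.
    for (final Map.Entry<InternalTopicConfig, Integer> entry : topicsToBeCreated.entrySet()) {
        client.createTopic(entry.getKey(), entry.getValue());   // hypothetical per-topic call
    }

    // After (sketch): a single create request carrying all topics at once,
    // followed by one metadata fetch to validate the result.
    client.createTopics(topicsToBeCreated);                      // hypothetical batched call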
Thanks for your reviews @dguy @mjsax @hachikuji. Merged to trunk and 0.10.2.