-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-4916: test streams with brokers failing (for 0.10.2) #2793
Conversation
0.10.2 version of #2719 |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Known Kafka failures. |
retest this please |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Jenkins seems down |
retest this please |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
retest this please |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@enothereska There's a bunch of small stuff from other PRs that I can't easily judge whether they are safe in a backport. (The significant different diffstat was a dead giveaway, and differences in the licenses are almost certainly best resolved to ours when cherry-picking unless you were making changes to licenses).
We should either revert those changes or have someone conclude that they are safe to combine. And since some of them involve system tests, it might warrant running the tests before merging.
@@ -95,15 +114,17 @@ public void close() { | |||
private Map<InternalTopicConfig, Integer> validateTopicPartitions(final Map<InternalTopicConfig, Integer> topicsPartitionsMap, | |||
final Map<String, Integer> existingTopicNamesPartitions) { | |||
final Map<InternalTopicConfig, Integer> topicsToBeCreated = new HashMap<>(); | |||
for (InternalTopicConfig topic: topicsPartitionsMap.keySet()) { | |||
for (Map.Entry<InternalTopicConfig, Integer> entry : topicsPartitionsMap.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be unrelated to the original patch, it seems to be from findbugs cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks.
// flush state | ||
firstException.compareAndSet(null, flushAllState()); | ||
// only commit after all state has been flushed and there hasn't been an exception | ||
if (firstException.get() == null) { | ||
firstException.set(commitOffsets()); | ||
// TODO: currently commit failures will not be thrown to users |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also seems unrelated. It's in another patch that's being backported anyway, but probably shouldn't have made it into a cherry-pick.
@@ -211,7 +227,17 @@ public Double apply(Long value1, Long value2) { | |||
"cntByCnt" | |||
).to(stringSerde, longSerde, "tagg"); | |||
|
|||
return new KafkaStreams(builder, props); | |||
final KafkaStreams streamsClient = new KafkaStreams(builder, props); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also seems unrelated, this is from a patch for KAFKA-4885
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have eagle eyes :) Good catch again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I actually want the new change since I rely on exception handlers to catch problems.
@@ -169,7 +172,7 @@ public void run() { | |||
public void onCompletion(final RecordMetadata metadata, final Exception exception) { | |||
if (exception != null) { | |||
exception.printStackTrace(); | |||
System.exit(-1); | |||
System.exit(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like half the change from KAFKA-4039?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably harmless but reverting back.
@@ -50,6 +50,7 @@ def __init__(self, test_context, kafka, streams_class_name, user_test_args): | |||
self.kafka = kafka | |||
self.args = {'streams_class_name': streams_class_name, | |||
'user_test_args': user_test_args} | |||
self.log_level = "DEBUG" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From KAFKA-4702?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a good time to turn on logging in 0.10.2 to help debug. So this is good.
@@ -94,7 +95,7 @@ def abortThenRestart(self): | |||
self.logger.info("Restarting Kafka Streams on " + str(node.account)) | |||
self.start_node(node) | |||
|
|||
def wait(self, timeout_sec=360): | |||
def wait(self, timeout_sec=1440): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From KAFKA-4843?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I consciously upped this to match trunk since the new test also requires it. So this is good.
@@ -27,7 +27,7 @@ class StreamsSmokeTest(KafkaTest): | |||
""" | |||
|
|||
def __init__(self, test_context): | |||
super(StreamsSmokeTest, self).__init__(test_context, num_zk=1, num_brokers=2, topics={ | |||
super(StreamsSmokeTest, self).__init__(test_context, num_zk=1, num_brokers=3, topics={ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From KAFKA-4716?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually needed for this PR since this PR changes the replication factor to 3 in SmokeTestClient.java. It just so happened that in trunk we had already done that change, so it appears as if it is part of KAFKA_4716, but it's part of this PR.
@enothereska Might be easier to do the cherry-pick cleanly now that it's been squashed to a single commit on trunk. It looks like it's just 3 relatively small conflicts -- only reason I didn't resolve and commit myself is that I wasn't sure if we also wanted/needed to run system tests against the cherry-picked version given the size of the change. |
Thanks @ewencp I'll double check. Ran streams test and passed: https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/245/ |
@ewencp I'm getting on plane, don't commit until I ack since I haven't had a chance to look yet. Thanks. |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
@ewencp this is good to go now. I did a test last night https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/246/. I have another scheduled waiting on branch builder, but I think it's good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Author: Eno Thereska <eno@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #2793 from enothereska/KAFKA-4916-0.10.2
No description provided.