KAFKA-5362: Follow up to Streams EOS system test #3542
Conversation
Call for review @enothereska @dguy @bbejeck @guozhangwang
@@ -259,7 +259,7 @@ private void setState(final State newState) {
             log.info("{} State transition from {} to {}.", logPrefix, oldState, newState);
         }
         state = newState;
-        if (stateListener != null) {
+        if (stateListener != null && state != oldState) {
\cc @enothereska Just want to bring this to your attention as you recently worked on this.
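For context, a minimal sketch of how a state listener sees this guard in practice: with `state != oldState`, `onChange()` fires only on real transitions. The class name `LoggingStateListener` and the `streams` variable are made up for illustration; only `KafkaStreams.StateListener` and `setStateListener()` are the actual API.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KafkaStreams.State;

// Hypothetical listener: with the `state != oldState` guard above, onChange()
// is only invoked for real state transitions, so a listener like this never
// observes a "transition" where newState == oldState.
public class LoggingStateListener implements KafkaStreams.StateListener {

    @Override
    public void onChange(final State newState, final State oldState) {
        System.out.println("Streams state transition: " + oldState + " -> " + newState);
    }
}

// Registration on a KafkaStreams instance (variable name `streams` is assumed):
// streams.setStateListener(new LoggingStateListener());
```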
We go from 2 threads to 1 and increase the number of Streams instances from 2 to 3 to compensate. This is required to make rebalances predictable, which allows replacing sleeps with wait-conditions. Also triggered 25 rounds of system tests: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/987/
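The actual sleep-to-wait-condition changes live in the Python/ducktape system tests; the snippet below is only a plain-Java sketch of the same pattern (the `WaitUtil` helper and its names are hypothetical): poll a condition with a bounded deadline instead of sleeping for a fixed time.

```java
import java.util.function.BooleanSupplier;

public final class WaitUtil {

    private WaitUtil() { }

    // Poll a condition until it holds or the deadline expires, instead of
    // sleeping for a fixed time and hoping the rebalance has finished.
    public static void waitForCondition(final BooleanSupplier condition,
                                        final long maxWaitMs,
                                        final String errorMessage) throws InterruptedException {
        final long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError(errorMessage);
            }
            Thread.sleep(100L); // short poll interval; the overall wait is bounded by maxWaitMs
        }
    }
}

// Usage, e.g. waiting for all three Streams instances to reach RUNNING
// (the `runningInstances` counter is assumed):
// WaitUtil.waitForCondition(() -> runningInstances.get() == 3, 60_000L, "instances did not stabilize");
```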
Refer to this link for build results (access rights to CI server needed):
Test failures are unrelated. It seems to be a more general Jenkins issue with those tests -- I saw multiple reports that these SSL tests failed (I guess re-testing does not help atm). But this PR only does system test stuff anyway.
Seems like there was a system test failure in the above link?
@@ -90,7 +100,7 @@ private KafkaStreams createKafkaStreams(final File stateDir,
         props.put(StreamsConfig.APPLICATION_ID_CONFIG, APP_ID);
         props.put(StreamsConfig.STATE_DIR_CONFIG, stateDir.toString());
         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafka);
-        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
+        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
👍
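As a rough illustration (not the actual test code), a config builder along these lines shows where the single-thread setting fits; the class name, application id, and method signature are assumptions, and the exactly-once guarantee is what an EOS test client would typically set.

```java
import java.io.File;
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public final class EosTestClientConfig {

    // Hypothetical helper mirroring the kind of config the test client builds:
    // a single stream thread per instance keeps task movement during rebalances predictable.
    public static Properties buildProps(final File stateDir, final String bootstrapServers) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-test-app");      // assumed application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(StreamsConfig.STATE_DIR_CONFIG, stateDir.toString());
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
        return props;
    }
}
```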
@@ -54,7 +54,7 @@
 public class EosTestDriver extends SmokeTestUtil {

     private static final int MAX_NUMBER_OF_KEYS = 100;
-    private static final long MAX_IDLE_TIME_MS = 300000L;
+    private static final long MAX_IDLE_TIME_MS = 600000L;
👍
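To illustrate what an idle timeout like `MAX_IDLE_TIME_MS` guards against, here is a sketch of a poll loop that gives up once the output topic has been idle for too long; `IdleTimeoutReader`, its method, and the expected-record count are hypothetical, and `consumerConfig` is assumed to carry bootstrap servers and byte-array deserializers.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public final class IdleTimeoutReader {

    // Same idea as the MAX_IDLE_TIME_MS constant above: how long the output
    // topic may stay silent before the test gives up instead of hanging forever.
    private static final long MAX_IDLE_TIME_MS = 600_000L;

    // consumerConfig is assumed to contain bootstrap servers and byte-array deserializers.
    public static long drain(final Properties consumerConfig, final String topic, final long expectedRecords) {
        try (final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerConfig)) {
            consumer.subscribe(Collections.singletonList(topic));
            long received = 0;
            long lastRecordSeenMs = System.currentTimeMillis();
            while (received < expectedRecords) {
                final ConsumerRecords<byte[], byte[]> records = consumer.poll(500L);
                if (records.isEmpty()) {
                    if (System.currentTimeMillis() - lastRecordSeenMs > MAX_IDLE_TIME_MS) {
                        throw new IllegalStateException(
                            "Received only " + received + " of " + expectedRecords + " records before idle timeout");
                    }
                } else {
                    received += records.count();
                    lastRecordSeenMs = System.currentTimeMillis();
                    // actual record verification would happen here
                }
            }
            return received;
        }
    }
}
```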
Some of the changes are exactly what I had suspected during my on-call. One general question: do we want to consider checking the actual messages consumed (assuming they are sent with an incrementing number in the value) as well?
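A sketch of the kind of check suggested above, assuming the producer writes an incrementing number per key starting at 0; the `SequenceVerifier` class and its input shape are made up for illustration.

```java
import java.util.List;
import java.util.Map;

public final class SequenceVerifier {

    // If the producer writes values 0, 1, 2, ... per key, exactly-once processing
    // implies the consumed values for each key form that exact sequence,
    // with no gaps (lost records) and no duplicates (reprocessed records).
    public static void verifyIncrementingValues(final Map<String, List<Long>> valuesPerKey) {
        for (final Map.Entry<String, List<Long>> entry : valuesPerKey.entrySet()) {
            long expected = 0L;
            for (final long value : entry.getValue()) {
                if (value != expected) {
                    throw new AssertionError(
                        "key " + entry.getKey() + ": expected value " + expected + " but got " + value);
                }
                expected++;
            }
        }
    }
}
```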
Refer to this link for build results (access rights to CI server needed):
Force-pushed from 26db0ef to 8b45153
Refer to this link for build results (access rights to CI server needed):
Force-pushed from 8b45153 to cf2e286
Refer to this link for build results (access rights to CI server needed):
retest this please
Force-pushed from cf2e286 to 9b624b6
@dguy No need to trigger retests -- I am playing with this branch to figure out why system tests are still unstable...
Refer to this link for build results (access rights to CI server needed):
Force-pushed from 9b624b6 to c9ed7e5
- reduce test runtime by removing sleep calls
- improved debugging
- minor fix for KafkaStreams state listener callback

- need to check topic-partition position even if no data is returned
- should not kill producer on failure
- improved debugging
Force-pushed from c9ed7e5 to c6a6e32
Rebased and extended closing timeout. Triggered system test: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1090/
Retriggered system test: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1092/
I've just triggered another build, as the last one didn't work due to some AWS issues: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1093/
1093 still fails.
There was a 5-minute gap in the logs for one Streams instance where nothing happened... Might have been environmental. Retriggered: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1094/
I had a quick look, and it seems there may have been some connectivity issues with, and between, the brokers.
Retest this please
The last run, 1094, got more than 100 successful test runs -- we never got this many before. I guess we should merge this now. The latest changes, increasing the timeouts and adding flush statements for stdout, seem to do the trick.
retest this please
- improve tests to get rid of calls to `sleep` in Python
- fixed some flaky test conditions
- improve debugging

Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #3542 from mjsax/failing-eos-system-tests

(cherry picked from commit 5106344)

Signed-off-by: Guozhang Wang <wangguoz@gmail.com>
Merged to trunk and 1.0.