
KAFKA-5362: Follow up to Streams EOS system test #3542

Closed
wants to merge 6 commits into from


mjsax
Member

@mjsax mjsax commented Jul 18, 2017

  • improve tests to get rid of calls to `sleep` in Python
  • fix some flaky test conditions
  • improve debugging

@mjsax
Member Author

mjsax commented Jul 18, 2017

Call for review @enothereska @dguy @bbejeck @guozhangwang

@@ -259,7 +259,7 @@ private void setState(final State newState) {
             log.info("{} State transition from {} to {}.", logPrefix, oldState, newState);
         }
         state = newState;
-        if (stateListener != null) {
+        if (stateListener != null && state != oldState) {
Member Author

cc @enothereska: just wanted to bring this to your attention, as you recently worked on this.
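The one-line change in `setState` makes the listener fire only on an actual transition. A minimal sketch of that guard, using a hypothetical, simplified stand-in class: `StateListenerGuard`, its nested `State` enum, and its `StateListener` interface are illustrations, not the real `KafkaStreams` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the KafkaStreams state machine;
// not the real KafkaStreams class.
public class StateListenerGuard {

    enum State { CREATED, RUNNING, REBALANCING, PENDING_SHUTDOWN, NOT_RUNNING }

    interface StateListener {
        void onChange(State newState, State oldState);
    }

    private final Object stateLock = new Object();
    private State state = State.CREATED;
    private StateListener stateListener;

    void setStateListener(final StateListener listener) {
        this.stateListener = listener;
    }

    void setState(final State newState) {
        synchronized (stateLock) {
            final State oldState = state;
            state = newState;
            // Only notify on a real transition, so a RUNNING -> RUNNING call
            // does not produce a spurious callback.
            if (stateListener != null && state != oldState) {
                stateListener.onChange(state, oldState);
            }
        }
    }

    public static void main(final String[] args) {
        final List<String> events = new ArrayList<>();
        final StateListenerGuard streams = new StateListenerGuard();
        streams.setStateListener((n, o) -> events.add(o + "->" + n));
        streams.setState(State.RUNNING);
        streams.setState(State.RUNNING); // same state: no callback
        streams.setState(State.REBALANCING);
        System.out.println(events); // [CREATED->RUNNING, RUNNING->REBALANCING]
    }
}
```

A test that waits for a particular callback sequence can then count transitions without having to filter out no-op notifications.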

@mjsax
Member Author

mjsax commented Jul 18, 2017

We go from 2 threads to 1 and increase the number of Streams instances from 2 to 3 to compensate. This is required to make rebalances predictable, which allows us to replace sleeps with wait conditions.

Also triggered 25 rounds of system tests: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/987/
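Replacing a fixed sleep with a wait condition means polling for the expected state and returning as soon as it holds, with an upper bound on the total wait. A minimal Java sketch of such a helper, in the same spirit as ducktape's `wait_until` utility; the class name, method name, and polling interval are hypothetical, and the actual system tests are written in Python.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition until it holds or a timeout expires,
// instead of sleeping for a fixed, worst-case amount of time.
public class WaitUntil {

    static boolean waitUntil(final BooleanSupplier condition,
                             final long timeoutMs,
                             final long pollIntervalMs) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;                  // condition met: stop waiting early
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                 // timed out: the caller decides how to fail
            }
            try {
                Thread.sleep(pollIntervalMs); // short back-off between checks
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }

    public static void main(final String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after ~200 ms; a real test would
        // check something like "every instance has reached RUNNING".
        final boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 200, 5_000, 50);
        System.out.println("condition met: " + ok);
    }
}
```

The timeout only bounds the worst case; in the common case the test proceeds as soon as the condition is observed, which is where the runtime savings come from.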

@asfgit

asfgit commented Jul 18, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/6137/
Test FAILed (JDK 7 and Scala 2.11).

@asfgit

asfgit commented Jul 18, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/6122/
Test FAILed (JDK 8 and Scala 2.12).

@mjsax
Member Author

mjsax commented Jul 18, 2017

This should go to 0.11.0, too.

The test failures are unrelated. It seems to be a more general Jenkins issue with those tests: I saw multiple reports that these SSL tests failed (I guess retesting does not help at the moment). This PR only touches system tests anyway.

@enothereska
Contributor

It seems like there was a system test failure in the link above?

@@ -90,7 +100,7 @@ private KafkaStreams createKafkaStreams(final File stateDir,
         props.put(StreamsConfig.APPLICATION_ID_CONFIG, APP_ID);
         props.put(StreamsConfig.STATE_DIR_CONFIG, stateDir.toString());
         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafka);
-        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
+        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
Contributor

👍
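A sketch of the resulting per-instance configuration, using the string keys behind the `StreamsConfig` constants in the diff (`application.id`, `bootstrap.servers`, `num.stream.threads`) plus `processing.guarantee`, which is assumed here for an EOS test. The class name, application ID, and broker address are hypothetical. With one thread per instance, each of the 3 instances contributes exactly one consumer group member (3 × 1 = 3, versus 2 × 2 = 4 before).

```java
import java.util.Properties;

// Sketch only: the string keys are the values of the StreamsConfig constants;
// the app ID and broker address are made up for illustration.
public class EosTestConfigSketch {

    static Properties streamsProps(final String appId, final String bootstrapServers) {
        final Properties props = new Properties();
        props.put("application.id", appId);                // APPLICATION_ID_CONFIG
        props.put("bootstrap.servers", bootstrapServers);  // BOOTSTRAP_SERVERS_CONFIG
        props.put("processing.guarantee", "exactly_once"); // assumed for an EOS test
        // One thread per instance: each instance is then exactly one consumer
        // group member, which makes rebalance participants easy to predict.
        props.put("num.stream.threads", 1);                // NUM_STREAM_THREADS_CONFIG
        return props;
    }

    public static void main(final String[] args) {
        final int instances = 3;
        final Properties props = streamsProps("hypothetical-eos-app", "localhost:9092");
        final int threadsPerInstance = (Integer) props.get("num.stream.threads");
        System.out.println("consumer group members: " + instances * threadsPerInstance);
    }
}
```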

@@ -54,7 +54,7 @@
 public class EosTestDriver extends SmokeTestUtil {

     private static final int MAX_NUMBER_OF_KEYS = 100;
-    private static final long MAX_IDLE_TIME_MS = 300000L;
+    private static final long MAX_IDLE_TIME_MS = 600000L;
Contributor

👍
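`MAX_IDLE_TIME_MS` doubles from 300,000 ms (5 minutes) to 600,000 ms (10 minutes). Assuming the driver uses it as an idle timeout (stop reading once no new records have arrived for that long), the loop looks roughly like this hypothetical sketch; it is not the actual `EosTestDriver` code, and the injectable clock and poll function exist only to make it testable.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

// Hypothetical idle-timeout loop: keep polling until no new records have
// arrived for maxIdleTimeMs.
public class IdleTimeoutLoop {

    static int consumeUntilIdle(final IntSupplier poll,     // records returned by one poll
                                final LongSupplier clockMs, // injectable clock for testing
                                final long maxIdleTimeMs) {
        int total = 0;
        long lastDataMs = clockMs.getAsLong();
        while (clockMs.getAsLong() - lastDataMs < maxIdleTimeMs) {
            final int n = poll.getAsInt();
            if (n > 0) {
                total += n;
                lastDataMs = clockMs.getAsLong(); // new data resets the idle timer
            }
        }
        return total;
    }

    public static void main(final String[] args) {
        final long[] now = {0L};
        // Simulated polls: 5 records, an empty poll, 3 records, then nothing.
        final Deque<Integer> batches = new ArrayDeque<>(Arrays.asList(5, 0, 3));
        final int total = consumeUntilIdle(() -> {
            now[0] += 100_000L; // each simulated poll advances the clock by 100 s
            return batches.isEmpty() ? 0 : batches.poll();
        }, () -> now[0], 600_000L);
        System.out.println("records=" + total + ", stopped at t=" + now[0] + " ms");
        // records=8, stopped at t=900000 ms
    }
}
```

Under the old 300,000 ms value the same simulation stops at t=600000 ms; any stall of 5 minutes or more would end the read early.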

@bbejeck
Contributor

bbejeck commented Jul 18, 2017

Some of the changes are exactly what I had suspected during my on-call.

One general question: do we also want to consider checking the actual messages consumed (assuming they are sent with an incrementing number in the value)?
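One way the suggested check could look, assuming the producer writes a per-key counter starting at 0 and that each key maps to a single partition, so per-key order is preserved. Under exactly-once semantics every value should then appear exactly once and in order. `SequenceVerifier`, `verify`, and `rec` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical verifier: each key's values must be 0, 1, 2, ... with no gaps
// and no duplicates, in the order they were consumed.
public class SequenceVerifier {

    static Map.Entry<String, Integer> rec(final String key, final int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    /** Returns null if every per-key sequence is complete and in order, else an error message. */
    static String verify(final List<Map.Entry<String, Integer>> consumed) {
        final Map<String, Integer> expectedNext = new HashMap<>();
        for (final Map.Entry<String, Integer> entry : consumed) {
            final int expected = expectedNext.getOrDefault(entry.getKey(), 0);
            if (entry.getValue() != expected) { // a gap (lost record) or a duplicate
                return "key " + entry.getKey() + ": expected " + expected + " but got " + entry.getValue();
            }
            expectedNext.put(entry.getKey(), expected + 1);
        }
        return null;
    }

    public static void main(final String[] args) {
        System.out.println(verify(Arrays.asList(rec("a", 0), rec("b", 0), rec("a", 1)))); // null
        System.out.println(verify(Arrays.asList(rec("a", 0), rec("a", 0))));
        // key a: expected 1 but got 0
    }
}
```

Compared with counting records only, this also catches a duplicate that happens to cancel out a lost record.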

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/6296/
Test FAILed (JDK 7 and Scala 2.11).

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/6280/
Test FAILed (JDK 8 and Scala 2.12).

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/6304/
Test FAILed (JDK 7 and Scala 2.11).

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/6288/
Test FAILed (JDK 8 and Scala 2.12).

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/6292/
Test FAILed (JDK 8 and Scala 2.12).

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/6308/
Test FAILed (JDK 7 and Scala 2.11).

@dguy
Contributor

dguy commented Jul 24, 2017

retest this please

@mjsax
Member Author

mjsax commented Jul 24, 2017

@dguy No need to trigger retests: I am experimenting with this branch to figure out why the system tests are still unstable.

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/6311/
Test FAILed (JDK 7 and Scala 2.11).

@asfgit

asfgit commented Jul 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/6295/
Test FAILed (JDK 8 and Scala 2.12).

 - reduce test runtime by removing sleep calls
 - improved debugging
 - minor fix for KafkaStreams state listener callback
 - need to check topic-partition position even if no data is returned
 - should not kill producer on failure
 - improved debugging
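On "need to check topic-partition position even if no data is returned": one plausible reason is that with transactions, commit and abort markers occupy offsets but are never returned to the application, so a consumer can reach a partition's end offset through a poll that returns no records. A hypothetical sketch that compares every partition's position with its end offset after each poll; partitions are plain strings here, while real code would use `TopicPartition` with `KafkaConsumer#position` and `KafkaConsumer#endOffsets`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "caught up" check, evaluated after every poll, including polls
// that return no records. Partitions are plain strings for brevity.
public class CaughtUpCheck {

    static boolean caughtUp(final Map<String, Long> positions, final Map<String, Long> endOffsets) {
        for (final Map.Entry<String, Long> end : endOffsets.entrySet()) {
            final long position = positions.getOrDefault(end.getKey(), 0L);
            if (position < end.getValue()) {
                return false; // this partition still has offsets to read past
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final Map<String, Long> endOffsets = new HashMap<>();
        endOffsets.put("p0", 10L);
        endOffsets.put("p1", 8L); // offset 7 holds a transaction commit marker

        final Map<String, Long> positions = new HashMap<>();
        positions.put("p0", 10L);
        positions.put("p1", 7L);  // last data record returned was offset 6
        System.out.println(caughtUp(positions, endOffsets)); // false

        // The next poll returns no records but moves p1 past the marker.
        positions.put("p1", 8L);
        System.out.println(caughtUp(positions, endOffsets)); // true
    }
}
```

A check that only ran when records were returned would never see p1 reach offset 8 and would wait until a timeout.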
@mjsax
Member Author

mjsax commented Oct 2, 2017

Rebased and extended the closing timeout. Triggered a system test run: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1090/

@mjsax
Member Author

mjsax commented Oct 5, 2017

@dguy
Contributor

dguy commented Oct 5, 2017

I've just triggered another build, as the last one failed due to some AWS issues: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1093/

@guozhangwang
Contributor

1093 still fails.

@mjsax
Member Author

mjsax commented Oct 5, 2017

There was a 5-minute gap in the logs of one Streams instance where nothing happened. It might have been environmental.

Retriggered: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1094/

@dguy
Contributor

dguy commented Oct 5, 2017

I had a quick look, and it seems there may have been some connectivity issues with the brokers and between the brokers.

@mjsax
Member Author

mjsax commented Oct 6, 2017

Retest this please

@mjsax
Member Author

mjsax commented Oct 6, 2017

The last run (1094) had more than 100 successful runs; we never got this many before. I guess we should merge this now. The latest changes, increasing the timeouts and adding some flush statements for stdout, seem to do the trick.
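A sketch of the stdout-flushing pattern, assuming the Python harness watches a Java process's stdout for marker lines; the class name and marker text are hypothetical. With the JDK's default `System.out`, `println` already auto-flushes, so the explicit `flush()` mainly guards against a stdout that was replaced by a buffered, non-auto-flushing stream.

```java
// Hypothetical sketch: print a marker line that an external test harness
// watches for, and flush right away so it does not sit in a buffer.
public class FlushedLogging {

    static void printMarker(final String marker) {
        System.out.println(marker);
        // The default System.out already auto-flushes on println; the explicit
        // flush keeps this correct if stdout was replaced by a buffered stream.
        System.out.flush();
    }

    public static void main(final String[] args) {
        printMarker("ALL-RECORDS-PROCESSED"); // hypothetical marker text
    }
}
```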

@guozhangwang
Contributor

retest this please

asfgit pushed a commit that referenced this pull request Oct 7, 2017
 - improve tests to get rid of calls to `sleep` in Python
 - fixed some flaky test conditions
 - improve debugging

Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #3542 from mjsax/failing-eos-system-tests

(cherry picked from commit 5106344)
Signed-off-by: Guozhang Wang <wangguoz@gmail.com>
@guozhangwang
Contributor

Merged to trunk and 1.0.

@asfgit asfgit closed this in 5106344 Oct 7, 2017
@mjsax mjsax deleted the failing-eos-system-tests branch October 9, 2017 18:30
6 participants