Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-4916: test streams with brokers failing #2719

Closed
wants to merge 42 commits into from
Closed

KAFKA-4916: test streams with brokers failing #2719

wants to merge 42 commits into from

Conversation

enothereska
Copy link
Contributor

@enothereska enothereska commented Mar 21, 2017

Several fixes for handling broker failures:

  • default replication value for internal topics is now 3 in test itself (not in streams code, that will require a KIP.
  • streams producer waits for acks from all replicas in test itself (not in streams code, that will require a KIP.
  • backoff time for streams client to try again after a failure to contact controller.
  • fix bug related to state store locks (this helps in multi-threaded scenarios)
  • fix related to catching exceptions property for network errors.
  • system test for all the above

@enothereska
Copy link
Contributor Author

enothereska commented Mar 21, 2017

Already getting an error from this basic test, that needs to be fixed before this PR goes in.
org.apache.kafka.streams.errors.StreamsException: Could not create internal topics.
at org.apache.kafka.streams.processor.internals.InternalTopicManager.makeReady(InternalTopicManager.java:69). Looks like this might not be a real error though, since the replication factor of the internal topics is not maintained. Adding another broker and re-testing.

@asfbot
Copy link

asfbot commented Mar 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2303/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented Mar 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2307/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented Mar 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2303/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented Mar 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2320/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented Mar 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2320/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented Mar 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2324/
Test PASSed (JDK 8 and Scala 2.11).

@enothereska enothereska changed the title KAFKA-4916: test streams with brokers failing [WiP] KAFKA-4916: test streams with brokers failing Mar 22, 2017
@enothereska
Copy link
Contributor Author

@dguy want to have a look? Thanks.

Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enothereska just left a couple of questions

@@ -136,6 +136,9 @@ public void run() {
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafka);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
props.put(ProducerConfig.RETRIES_CONFIG, 5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should set this higher? is there any harm in setting it to Integer.MAX_VALUE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

# Pick a random topic and bounce it's leader
topic_index = randint(0, len(self.topics.keys()) - 1)
topic = self.topics.keys()[topic_index]
failures[failure_mode](self, topic, broker_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should maybe kill some more brokers? Or is that going to be too non-deterministic in terms of test failures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can. The main problem I have right now is how to sidestep all the Kafka corner cases when it comes to failures while still showing that streams is resilient. Stay tuned.

@asfbot
Copy link

asfbot commented Mar 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2321/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented Mar 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2325/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented Mar 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2321/
Test FAILed (JDK 7 and Scala 2.10).

@enothereska enothereska changed the title KAFKA-4916: test streams with brokers failing KAFKA-4916: test streams with brokers failing [WiP] Mar 22, 2017
@enothereska
Copy link
Contributor Author

enothereska commented Mar 22, 2017

@dguy Making as WiP again since 2 tests are failing and I need to investigate.

[2017-03-22 16:25:05,575] ERROR stream-thread [SmokeTest-b25363c8-5bce-4d72-8c51-8fe0b49715f6-StreamThread-1] Streams application error during processing: (org.apache.kafka.streams.processor.internals.StreamThread)
java.lang.IllegalStateException: Attempt to send a request to node 1 which is not ready.
at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:284)
at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:265)
at org.apache.kafka.streams.processor.internals.StreamsKafkaClient.sendRequest(StreamsKafkaClient.java:238)
at org.apache.kafka.streams.processor.internals.StreamsKafkaClient.createTopics(StreamsKafkaClient.java:170)
at org.apache.kafka.streams.processor.internals.InternalTopicManager.makeReady(InternalTopicManager.java:63)
at org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.prepareTopic(StreamPartitionAssignor.java:615)
at org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:445)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:347)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:505)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:93)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:455)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:437)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:788)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:769)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:190)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:153)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:120)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:498)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:342)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:251)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:167)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:351)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:307)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:294)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1033)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:999)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:542)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:326)

@asfbot
Copy link

asfbot commented Mar 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2572/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented Mar 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2572/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented Mar 31, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2576/
Test PASSed (JDK 8 and Scala 2.11).

@enothereska
Copy link
Contributor Author

@enothereska
Copy link
Contributor Author

@mjsax @dguy can I get a final approve so that @ijuma or others can check in? Also just opened 0.10.2 cherry picked version of this PR. Thanks.

Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enothereska, LGTM

@asfbot
Copy link

asfbot commented Apr 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2634/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented Apr 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2630/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented Apr 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2630/
Test FAILed (JDK 7 and Scala 2.10).

@enothereska
Copy link
Contributor Author

retest this please

@asfbot
Copy link

asfbot commented Apr 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2631/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented Apr 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2631/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link

asfbot commented Apr 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2635/
Test FAILed (JDK 8 and Scala 2.11).

@enothereska
Copy link
Contributor Author

Env failure, unrelated to PR

Copy link
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@enothereska
Copy link
Contributor Author

enothereska commented Apr 3, 2017

@gwenshap @ewencp @ijuma this can go in, thanks. Also the 0.10.2 version of this: #2793. Thanks.

@asfbot
Copy link

asfbot commented Apr 4, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2717/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link

asfbot commented Apr 4, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2721/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link

asfbot commented Apr 4, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2717/
Test FAILed (JDK 7 and Scala 2.10).

@asfgit asfgit closed this in 49f80b2 Apr 5, 2017
@enothereska enothereska deleted the KAFKA-4916-broker-bounce-test branch April 5, 2017 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants