Skip to content

Conversation

@zentol
Copy link
Contributor

@zentol zentol commented Dec 12, 2019

Hardens the schema registry test by retrying the setup of kafka/ZK/registry . For this the existing retry_times function was extended to optionally include a cleanup command; in this case for shutting down previously started processes.
Additionally, if kafka isn't running when shutting it down we now dump the kafka logs to ease debugging.

The last commit in this PR flags the schema test as a pre-commit tests for demonstration purposes.

The exact failure cause of the test is still unknown; what we do know however is that kafka broke down after being successfully started.

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 0c1c909 (Thu Dec 12 10:31:58 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 12, 2019

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

Copy link
Member

@GJL GJL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally good, I have left 2 questions.

if ! get_and_verify_schema_subjects_exist; then
echo "Could not start confluent schema registry"
exit 1
return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not checking the returned code. Previously, the test would prematurely stop, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes previously it would exit immediately but that makes it impossible to introduce any retry behavior. (well I suppose a sub-shell might work)
The exit code of test_confluent_schema_registry#test_setup is implicitly the return value of #start_confluent_schema_registry, and hence is evaluated by the retry loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right

}

function retry_times() {
retry_times_with_backoff_and_cleanup $1 $2 "${@:3}" "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How well does it mitigate the issue? Did you run the test with multiple iterations?

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @zentol and the review @GJL. I've tested the PR locally and it worked. Merging this PR now.

tillrohrmann pushed a commit that referenced this pull request Dec 19, 2019

function stop_kafka_cluster {
$KAFKA_DIR/bin/kafka-server-stop.sh
if ! [[ -z $(./bin/kafka-server-stop) ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is missing ".sh" here, which cause FLINK-15428. I'll fix it.
cc @zentol @GJL @tillrohrmann

@zentol zentol deleted the 13567 branch January 2, 2020 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants