Support Alpakka Kafka snapshot and troubleshoot Kafka integration tests #200

Merged
merged 17 commits into from Jun 23, 2020

Conversation

seglo
Member

@seglo seglo commented Jun 4, 2020

Possibly fixes #191.

I suspect this error is a timeout in the polling of the assertion code block in testkit.run, which uses the default single-expect-default. I also increased the time dilation for the integration tests that run on travis (3x3=9s max for a single-expect-default on travis). I repeated the test successfully 100x (with code I didn't commit here), which revealed a test cleanup problem with the projection name, so I now use the guaranteed-unique groupId instead.
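The 9s figure above is a base timeout multiplied by a time factor. A minimal sketch of that arithmetic follows. It is not the testkit's implementation: `TimeDilation` and `dilated` are hypothetical names, and the 3 second base is an assumption taken from the "3x3" in the description.

```scala
import scala.concurrent.duration._

object TimeDilation {
  // Scale a base timeout by a time factor (hypothetical helper, for illustration only).
  def dilated(base: FiniteDuration, timeFactor: Double): FiniteDuration =
    Duration.fromNanos(math.round(base.toNanos * timeFactor))

  // Assumed values: a 3s base expectation timeout and a 3.0 time factor.
  val singleExpect: FiniteDuration = dilated(3.seconds, 3.0) // 3s x 3.0 = 9s
}
```

With a factor of 1.0 the base timeout is returned unchanged, which is the behaviour discussed in the review comments below.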

@seglo seglo force-pushed the seglo/increase-timeout-for-kafka-integration branch 2 times, most recently from d4269e3 to a5023f0 Compare June 10, 2020 20:42
@seglo seglo force-pushed the seglo/increase-timeout-for-kafka-integration branch from a5023f0 to 5218e33 Compare June 10, 2020 20:55
@seglo seglo marked this pull request as draft June 10, 2020 20:57
.travis.yml Outdated
@@ -86,6 +86,8 @@ env:
  global:
    # Disable Ryuk resource reaper in travis jobs since we always spin up fresh VMs
    - TESTCONTAINERS_RYUK_DISABLED=true
    # Override default akka.actor.typed.testkit.timefactor
    - AKKA_TEST_TIMEFACTOR=5.0
Member

I understand that you are trying a few things here. Do you need any input or fresh ideas?

In general the default factor of 1.0 should be enough, also in Travis. If we have specific things that take longer to initialize, such as Testcontainers, I think we should have long explicit timeouts for those things.

Member Author

Thanks for the comment. The integration test has assertions that fail for a couple of reasons: 1) the projection doesn't exist yet because no events have flowed, and 2) some of the events have flowed, but not all have been processed yet. As a first step I'm trying to inflate the time allowed for a single expectation to rule out any possible bugs. If that works then we could leave it at that, or I could further investigate why it's slow. The testcontainer will already initialize the container to a ready state with an internal health check (AFAICT), but in some cases (maybe 1/100?) that isn't long enough.
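Both failure modes above are "not yet" failures, which is why the assertion block is polled until it passes or a deadline expires. A rough sketch of such a polling loop, using only the Scala standard library, is shown here. `PollingAssert.eventually` is a hypothetical helper written for illustration, not the ProjectionTestKit code.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object PollingAssert {
  // Re-run `assertion` every `interval` until it stops throwing or `max` elapses.
  // Returns the number of attempts on success; rethrows the last failure on timeout.
  def eventually(max: FiniteDuration, interval: FiniteDuration)(assertion: => Unit): Int = {
    val deadline = max.fromNow
    @tailrec
    def loop(attempt: Int): Int =
      Try(assertion) match {
        case Success(_)                         => attempt
        case Failure(e) if deadline.isOverdue() => throw e
        case Failure(_) =>
          Thread.sleep(interval.toMillis)
          loop(attempt + 1)
      }
    loop(1)
  }
}

object PollingAssertExample {
  // Simulates events trickling in: the assertion only holds on the third poll.
  def run(): Int = {
    var processed = 0
    PollingAssert.eventually(2.seconds, 10.millis) {
      processed += 1 // stand-in for events being processed in the background
      assert(processed == 3, s"only $processed of 3 events processed")
    }
  }
}
```

If the deadline is too short for a slow environment, the last "not yet" failure is what surfaces in the test report, which matches the symptom described here.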

Member Author

@seglo seglo Jun 15, 2020

Some ideas for a custom timeout instead of setting it for the whole project:

  1. Override the timefactor for KafkaToSlickIntegrationSpec only
  2. Add a sleep to beforeAll KafkaToSlickIntegrationSpec
  3. Use an override of ProjectionTestKit.run that takes a custom duration
  4. Produce and consume from a test topic with a longer timeout in beforeAll to ensure Kafka is fully online

Member

  1. I don't think timefactor is the way to go, will just hide the problems.
  2. maybe not sleep, but there should be a way to verify that the Kafka testcontainer is up and running before proceeding with the real test.
  3. there is already def run(projection: Projection[_], max: FiniteDuration) but if we know that the environment is up before we start the real test that might not be needed.
  4. yes, something like that if it's not possible to verify with the testcontainer api

Member Author

I added an additional startup check to KafkaContainerCluster upstream in Alpakka Kafka that produces to and consumes from a topic.

akka/alpakka-kafka#1131

I'm going to test it locally with a snapshot, but the real test will be on travis. Once that PR is merged I'll use the snapshot and continue testing, without the custom timeouts.
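The idea of a produce-and-consume startup check can be sketched roughly as below. This is not the code from the upstream PR: `StartupCheck`, `FakeBroker`, and the `roundTrip` function are hypothetical stand-ins, and the in-memory broker only imitates a service that rejects the first few requests.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical in-memory stand-in for a broker that rejects requests until it is ready.
final class FakeBroker(readyAfterAttempts: Int) {
  private var attempts = 0
  private var log = Vector.empty[String]

  def produce(msg: String): Unit = {
    attempts += 1
    if (attempts < readyAfterAttempts) throw new IllegalStateException("broker not ready")
    log :+= msg
  }

  def consume(): Option[String] = log.lastOption
}

object StartupCheck {
  // Retry `roundTrip` until it reports success or `max` elapses.
  // Exceptions count as "not ready yet" rather than as test failures.
  def awaitReady(max: FiniteDuration, interval: FiniteDuration)(roundTrip: () => Boolean): Boolean = {
    val deadline = max.fromNow
    @tailrec
    def loop(): Boolean =
      if (Try(roundTrip()).getOrElse(false)) true
      else if (deadline.isOverdue()) false
      else { Thread.sleep(interval.toMillis); loop() }
    loop()
  }
}

object StartupCheckExample {
  // Produce a probe message and read it back; true once the round trip works.
  def probe(broker: FakeBroker): Boolean =
    StartupCheck.awaitReady(1.second, 5.millis) { () =>
      broker.produce("probe")
      broker.consume().contains("probe")
    }
}
```

Running a check like this once before the real test keeps the long timeout confined to environment startup, so the test's own expectations can stay at their defaults.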

@seglo
Member Author

seglo commented Jun 15, 2020

I haven't reproduced the failure after several hundred runs with the longer timeout. I believe this indicates it's a Kafka container slow initialization issue.

@seglo
Member Author

seglo commented Jun 22, 2020

The new check makes the test runtimes more consistent for me locally. Let's run a few hundred times on travis to see if we can reproduce a failure.

Member

@patriknw patriknw left a comment

Looking good, but we should maybe not release 0.3 with an Alpakka Kafka snapshot dependency?
We could add a system property for the Alpakka version as we have for Akka so that we can use the snapshot in CI but not in release?
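One way to picture the system property suggestion is a version lookup that falls back to a default when the property is absent. The sketch below is an assumption-laden illustration, not the project's build code: the property name `example.alpakka-kafka.version` and the default version string are placeholders invented for this example.

```scala
object DependencyVersions {
  // Hypothetical property name and default version, for illustration only.
  val PropertyName = "example.alpakka-kafka.version"
  val DefaultVersion = "0.0.0-placeholder"

  // Use the override when it is set and non-blank, otherwise the default.
  def resolve(props: scala.collection.Map[String, String] = sys.props): String =
    props.get(PropertyName).map(_.trim).filter(_.nonEmpty).getOrElse(DefaultVersion)

  // A release build could refuse to proceed when this returns true.
  def isSnapshot(version: String): Boolean = version.endsWith("-SNAPSHOT")
}
```

Under these assumptions, CI would start the build with a `-D` flag setting the property to a snapshot version, while a release build would leave it unset and pick up the default.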

@seglo
Member Author

seglo commented Jun 23, 2020

> We could add a system property for the Alpakka version as we have for Akka so that we can use the snapshot in CI but not in release?

Sounds good. I'll update the PR and mark it ready for review later today.

@seglo seglo changed the title Increase Kafka integration test timeout Support Alpakka Kafka snapshot and troubleshoot Kafka integration tests Jun 23, 2020
@seglo seglo marked this pull request as ready for review June 23, 2020 15:04
Member

@patriknw patriknw left a comment

LGTM

@seglo
Member Author

seglo commented Jun 23, 2020

Logged unrelated failure #264

@seglo seglo closed this Jun 23, 2020
@seglo seglo reopened this Jun 23, 2020
@octonato octonato merged commit d94feba into master Jun 23, 2020
@octonato octonato deleted the seglo/increase-timeout-for-kafka-integration branch June 23, 2020 19:18
@octonato
Member

I squashed and merged here since there were quite a few experiments in the history.

@seglo
Member Author

seglo commented Jun 23, 2020

Thanks. I always squash and merge to avoid too much noise in master.

@patriknw
Member

Yes, squash should be our default choice if there is more than one commit, unless there are good reasons for keeping separate commits.

Development

Successfully merging this pull request may close these issues.

failed: KafkaToSlickIntegrationSpec
3 participants