Support Alpakka Kafka snapshot and troubleshoot Kafka integration tests #200
Conversation
.travis.yml (outdated)
@@ -86,6 +86,8 @@ env:
  global:
    # Disable Ryuk resource reaper in travis jobs since we always spin up fresh VMs
    - TESTCONTAINERS_RYUK_DISABLED=true
    # Override default akka.actor.testkit.typed.timefactor
    - AKKA_TEST_TIMEFACTOR=5.0
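For context, a sketch of the Akka settings an env var like this would typically override, assuming the project maps `AKKA_TEST_TIMEFACTOR` into the testkit configuration (the wiring is an assumption; the setting names are the standard Akka ones):

```hocon
# Sketch: the config keys a test time factor usually scales.
# Classic testkit:
akka.test.timefactor = 5.0
# Typed testkit:
akka.actor.testkit.typed.timefactor = 5.0
```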
I understand that you are trying a few things here. Do you need any input or fresh ideas?
In general the default factor of 1.0 should be enough, also on Travis. If we have specific things that take a longer time to initialize, such as Testcontainers, I think we should have long explicit timeouts for those things.
Thanks for the comment. The integration test has assertions that fail for a couple of reasons: 1) the projection doesn't exist yet because no events have flowed, and 2) some of the events have flowed, but not all have been processed yet. As a first step I'm inflating the time allowed for a single expectation to rule out any possible bugs. If that works we could leave it at that, or I could investigate further why it's slow. The testcontainer already initializes the container to a ready state with an internal health check (AFAICT), but in some cases (maybe 1 in 100?) that isn't long enough.
Some ideas for a custom timeout instead of setting it for the whole project:
- Override the timefactor for KafkaToSlickIntegrationSpec only
- Add a sleep to beforeAll in KafkaToSlickIntegrationSpec
- Use an override of ProjectionTestKit.run that takes a custom duration
- Produce and consume from a test topic with a longer timeout in beforeAll to assure Kafka is fully online
- I don't think the timefactor is the way to go; it will just hide the problems.
- Maybe not a sleep, but there should be a way to verify that the Kafka testcontainer is up and running before proceeding with the real test.
- There is already def run(projection: Projection[_], max: FiniteDuration), but if we know that the environment is up before we start the real test that might not be needed.
- Yes, something like that if it's not possible to verify with the testcontainer API.
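The run(projection, max) overload mentioned above polls an assertion block until it passes or the deadline expires. A minimal, self-contained sketch of that polling idea (PollUntil and its parameters are hypothetical illustrations, not the real ProjectionTestKit API):

```scala
import scala.concurrent.duration._

object PollUntil {
  // Repeatedly evaluate an assertion until it passes or `max` elapses,
  // sleeping `interval` between attempts; rethrow the last failure on timeout.
  def pollUntil(max: FiniteDuration, interval: FiniteDuration = 100.millis)(assertion: => Unit): Unit = {
    val deadline = max.fromNow
    var last: Option[Throwable] = None
    while (deadline.hasTimeLeft()) {
      try { assertion; return }
      catch {
        case e: AssertionError =>
          last = Some(e)
          Thread.sleep(interval.toMillis)
      }
    }
    throw last.getOrElse(new AssertionError(s"condition not met within $max"))
  }
}
```

A longer `max` for slow environments (like a freshly started container) is exactly what the custom-duration overload enables, without dilating every timeout in the project.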
I added an additional startup check to KafkaContainerCluster upstream in Alpakka Kafka that produces to and consumes from a topic. I'm going to test it locally with a snapshot, but the real test will be on Travis. Once that PR is merged I'll use the snapshot and continue testing, without the custom timeouts.

I haven't reproduced the failure after several hundred runs with the longer timeout, which suggests this is a slow-initialization issue with the Kafka container.

The new check makes the test runtimes more consistent for me locally. Let's run a few hundred times on Travis to see if we can reproduce a failure.
Looking good, but maybe we should not release 0.3 with an Alpakka Kafka snapshot dependency?
We could add a system property for the Alpakka version, as we have for Akka, so that we can use the snapshot in CI but not in a release?
Sounds good. I'll update the PR and mark it ready for review later today.
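An Akka-style version override via system property could be sketched like this; the property name and fallback version are assumptions for illustration, not the actual build definition:

```scala
// Sketch: pick the Alpakka Kafka dependency version from a system property
// so CI can inject a snapshot while local and release builds use the stable
// default. Property name and default version are hypothetical.
val alpakkaKafkaVersion: String =
  sys.props.getOrElse("build.alpakka.kafka.version", "2.0.0")
```

In sbt this would then feed the dependency, e.g. `"com.typesafe.akka" %% "akka-stream-kafka" % alpakkaKafkaVersion`, and CI would pass `-Dbuild.alpakka.kafka.version=<snapshot>`.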
LGTM

Logged unrelated failure #264

I squashed and merged here since there were quite a few experiments in the history.

Thanks. I always squash and merge to avoid too much noise in master.

Yes, squash should be our default choice if there is more than one commit, unless there is good reason for keeping separate commits.
Possibly fixes #191.

I suspect this error is a timeout of the polling of the assertion code block in testkit.run. I'm using the default single-expect-default. I also increased the time dilation for Travis runs of the integration tests (3x3 = 9s max for a single-expect-default on Travis). I repeated the test successfully 100x (with code I didn't commit here), which revealed a test cleanup problem with the projection name, so I use the guaranteed-unique groupId instead.
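A guaranteed-unique groupId per test run can be generated along these lines; the helper name and prefix are illustrative, not the code in this PR:

```scala
import java.util.UUID

// Sketch: a unique Kafka consumer group id per test run, so repeated runs
// never collide on committed offsets or projection state. Hypothetical helper.
def uniqueGroupId(prefix: String): String =
  s"$prefix-${UUID.randomUUID()}"
```

Because each run gets a fresh group id, leftover state from a previous run can never satisfy (or break) the current run's assertions, which removes the cleanup problem described above.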