Fix kinesis IT flakiness #12821
paul-rogers
left a comment
Thanks much for fixing this flaky test!
I wonder, is a time-based duration the right strategy? If the test runs on a busy system, won't the task still time out? Is there a way to determine that things are running (perhaps slowly) rather than having hung, etc.?
Hi @paul-rogers. The supervisor continues to create tasks until the integration test duration elapses (or until success), so multiple tasks may be created in both cases.
@AmatyaAvadhanula, thanks for the explanation. I think that the key point is still valid: tests should not assume anything about how long things take (except perhaps to impose an overall timeout). For this test, it seems we should poll to detect completion. Do we have the API we'd need to do the polling?
@paul-rogers, thank you for the feedback.
We are polling for a success condition: the row count returned by querying the ingested data must match the number of records in the stream. The test duration here is the overall timeout you are referring to.
Reverting the ingestion task duration to the older value of 2 minutes fixes a possible "misconfiguration" in the tests' ingestion spec, which may not be ideal for Druid. The test itself doesn't make any assumptions about the task duration.
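The poll-with-overall-timeout pattern described in this thread can be sketched roughly as follows. This is a minimal illustration, not the actual Druid integration-test code: `PollUntil`, `waitFor`, and the simulated row counter are hypothetical names, and the lambda stands in for a real count query against the datasource.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Druid: poll a condition until it holds
// or an overall timeout elapses.
final class PollUntil {
    private PollUntil() {}

    /**
     * Re-checks the condition every intervalMillis until it is true or
     * timeoutMillis has elapsed. Returns true if the condition held,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline; sleep at least 1 ms.
                long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
                Thread.sleep(Math.min(intervalMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long expectedRecords = 3;   // assumed number of records in the stream
        final long[] ingestedRows = {0};  // stand-in for the result of a count query
        // Each poll pretends one more row has become queryable.
        boolean ok = waitFor(() -> ++ingestedRows[0] >= expectedRecords, 2_000, 10);
        System.out.println(ok ? "ingestion complete" : "timed out");
    }
}
```

With this shape, the test asserts only on the condition and the overall deadline. How many ingestion tasks the supervisor creates in the meantime, and how long each one runs, does not affect the outcome.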
Kinesis ITs run with a task duration <= supervisor period and this may cause flakiness.
This is fixed by reverting task duration in test supervisors to 120 seconds.
This PR has: