CI: Fix Kinesis integration test race condition in LocalStack readiness and stream reset #691
Conversation
No actionable comments were generated in the recent review. 🎉
Walkthrough

Per-test unique Kinesis stream names replace the shared delete/recreate logic; the LocalStack container now waits for a "Ready." log line before use; the Kinesis adapter carries a new `describe_timeout` parameter taken from the connection URL; the CI Kinesis workflow's Python matrix explicitly lists 3.11–3.14.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Runner
    participant Fixture as Test Fixture
    participant LocalStack as LocalStack Container
    participant Kinesis as Kinesis Service
    participant Adapter as Kinesis Adapter
    rect rgba(200,200,255,0.5)
        Test->>Fixture: request kinesis fixture
        Fixture->>LocalStack: ensure container started
        LocalStack->>LocalStack: _configure() sets log wait strategy
        LocalStack->>LocalStack: _connect() wait_for_logs("Ready.", 60s)
        LocalStack-->>Fixture: ready
    end
    rect rgba(200,255,200,0.5)
        Fixture->>Fixture: generate unique stream name (UUID)
        Fixture->>Kinesis: create stream with unique name
        Fixture-->>Test: return connection URL (includes describe-timeout)
    end
    rect rgba(255,200,200,0.5)
        Test->>Adapter: Adapter init (reads describe-timeout)
        Adapter->>Kinesis: create consumer/producer with describe_timeout
        Adapter-->>Test: stream operations use per-test stream
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
@amotl hmm, looks like you might need to disable py3.10 again, but it seems DynamoDB-related?
@hampsterx: Yes, please disable Python 3.10 for the time being. Please also ignore integration tests against CrateDB Cloud: I've created an issue GH-693 to track this, certainly unrelated to your patch.
ok, even better: I have applied the same fixes from the Kinesis fixture to the DynamoDB one. 🤞
yup, have just committed with py3.10 removed, but it should be noted that it's kind of luck, timing-wise, that it works on 3.11 (which has had some async improvements), so under load the CI could fail on any version. I think async-kinesis needs some way to say it has produced records and to wait for them before stopping, which would eliminate this timing issue. Will have a look :)
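The "wait for produced records before stopping" idea could be shaped roughly like this (purely illustrative; `DrainingProducer` and its methods are hypothetical, not async-kinesis's actual API):

```python
import asyncio

class DrainingProducer:
    """Hypothetical producer that can wait until all queued records are flushed."""

    def __init__(self) -> None:
        self._pending = 0
        self._drained = asyncio.Event()
        self._drained.set()  # nothing queued yet

    async def put(self, record: bytes) -> None:
        self._pending += 1
        self._drained.clear()
        # Simulate the background flush a real producer performs after put().
        asyncio.get_running_loop().call_later(0.01, self._on_flushed)

    def _on_flushed(self) -> None:
        self._pending -= 1
        if self._pending == 0:
            self._drained.set()

    async def wait_drained(self) -> None:
        """Block until every queued record has been flushed to the stream."""
        await self._drained.wait()

async def main() -> None:
    producer = DrainingProducer()
    for i in range(3):
        await producer.put(f"event-{i}".encode())
    await producer.wait_drained()  # only now is it safe to stop the producer
    print("all records flushed")

asyncio.run(main())
```

With such an API, tests could close the producer deterministically instead of hoping a sleep was long enough.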
async-kinesis 2.3.0 fixes:
🧹 Nitpick comments (1)
tests/io/dynamodb/conftest.py (1)
38-38: Initialize `_stream_name` uniquely at construction time (optional hardening). Line 38 can use the same UUID-based pattern as `reset()` so the fixture is isolation-safe even before the first reset call.

Suggested diff:

```diff
- self._stream_name = "demo"
+ self._stream_name = f"demo-{uuid.uuid4().hex[:8]}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/io/dynamodb/conftest.py` at line 38, Initialize the fixture's _stream_name at construction to a unique value instead of the hardcoded "demo": in the fixture's __init__ (or where self._stream_name is set) generate a UUID-based name using the same pattern used in reset() so the attribute is isolation-safe before reset() is called; update the assignment to use that UUID pattern and keep reset() behavior intact (refer to _stream_name and reset()).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: eff61ead-0f4f-476d-81ef-39ad953e2945
📒 Files selected for processing (4)
- .github/workflows/kinesis.yml
- pyproject.toml
- tests/io/dynamodb/conftest.py
- tests/io/dynamodb/test_relay.py
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/kinesis.yml
- tests/io/dynamodb/test_relay.py
```python
# Allow enough time for the consumer to create its LATEST shard iterator
# before producing events. With LocalStack, stream creation + shard iterator
# setup can take longer than 1 second on slow systems.
time.sleep(3)

# Populate source database with data.
for event in events:
    table_loader.kinesis_adapter.produce(event)

# Allow time for the consumer's fetch tasks to poll Kinesis
# and deliver the records to the processing handler.
time.sleep(2)
```
Oh. We need to sleep longer than before? Based on your efforts with async-kinesis 2.3.0, I would have expected that we wouldn't need this any longer?
hah, fair question! The 2.3.0 fixes addressed the consumer/producer flush race (records getting lost on close), but this sleep is for a different race: the shard iterator setup 🤦♂️
With start=LATEST, the consumer needs to obtain its shard iterator before any events are produced, otherwise they're missed. The iterator is obtained lazily inside a background fetch task, so there's currently no way for the caller to know when it's actually ready, hence the sleep.
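Until such a readiness API exists, one generic alternative to fixed sleeps is polling an observable condition with a deadline (a sketch only; the predicate is a stand-in for whatever the test can actually check):

```python
import time

def wait_until(predicate, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())

# Example: wait until a handler has received all produced events,
# instead of sleeping for a fixed number of seconds.
received = ["a", "b", "c"]  # stand-in for records delivered to the handler
assert wait_until(lambda: len(received) == 3, timeout=5.0)
```

Polling returns as soon as the condition holds, so the happy path stays fast while slow systems still get the full timeout.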
I'll add a wait_ready() method to async-kinesis that exposes an asyncio.Event fired once shard iterators are obtained, and will raise another PR to bump the package on crate; then we should be good :)
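The proposed `wait_ready()` could look roughly like this (a sketch of the idea, not async-kinesis's actual implementation; the class and the simulated fetch task are hypothetical):

```python
import asyncio

class ShardConsumer:
    """Hypothetical consumer exposing when its shard iterators are ready."""

    def __init__(self) -> None:
        self._ready = asyncio.Event()

    async def _fetch_task(self) -> None:
        # Obtain the LATEST shard iterator (simulated); only records produced
        # after this point are visible, so signal readiness before fetching.
        await asyncio.sleep(0.01)  # stand-in for GetShardIterator
        self._ready.set()
        # ... the fetch loop would continue polling for records here ...

    async def wait_ready(self, timeout: float = 30.0) -> None:
        """Wait until shard iterators exist, so produced records are not missed."""
        await asyncio.wait_for(self._ready.wait(), timeout)

async def main() -> None:
    consumer = ShardConsumer()
    task = asyncio.create_task(consumer._fetch_task())
    await consumer.wait_ready()  # would replace the time.sleep(3) in the test
    print("iterator ready; safe to produce")
    await task

asyncio.run(main())
```

The test would then await readiness instead of guessing a sleep duration, eliminating the LATEST-iterator race deterministically.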
Summary
- Fix `LocalStackContainerWithKeepalive` not waiting for LocalStack readiness due to the MRO bypassing `LocalStackContainer.start()`
- Forward the `describe-timeout` URL parameter to the async-kinesis Consumer/Producer

Problem
Kinesis integration tests were flaky (and disabled in CI) due to two race conditions:
Container readiness: `KeepaliveContainer.start()` overrides `LocalStackContainer.start()` in the MRO, so the `wait_for_logs("Ready")` call never executes. The Kinesis API receives requests before LocalStack is ready.

Stream reset: `reset_streams()` deletes and recreates the `testdrive` stream between tests, but LocalStack's eventual consistency means `create_stream` can fail if called too soon after `delete_stream`.

Changes
- `localstack.py`: Add `_configure()` and `_connect()` hooks to wait for LocalStack's "Ready" log, matching the pattern used by `CrateDBContainer` and `InfluxDB2Container`
- `conftest.py`: Replace stream deletion with unique stream names per test (`testdrive-<uuid>`), eliminating the race entirely
- `adapter.py`: Forward the `describe-timeout` URL query parameter to the async-kinesis `Consumer` and `Producer`, giving more time for stream activation after creation
- `kinesis.yml`: Re-enable the CI matrix for Python 3.10–3.14

References
- `reset_streams()` #690

Test plan
- `TC_KEEPALIVE=true` (container reuse path)
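The `describe-timeout` forwarding listed under Changes boils down to reading a query parameter from the connection URL; a minimal sketch with `urllib.parse` (illustrative only, not the adapter's actual code; the helper name and default are assumptions):

```python
from urllib.parse import parse_qs, urlparse

def describe_timeout_from_url(url: str, default: float = 5.0) -> float:
    """Extract the describe-timeout query parameter, falling back to a default."""
    values = parse_qs(urlparse(url).query).get("describe-timeout")
    return float(values[0]) if values else default

url = "kinesis://localhost:4566/testdrive?describe-timeout=30"
assert describe_timeout_from_url(url) == 30.0
assert describe_timeout_from_url("kinesis://localhost:4566/testdrive") == 5.0
```

Keeping the timeout in the URL lets tests raise it for slow LocalStack stream activation without touching adapter defaults.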