[tests] Fix Kafka E2E test timeout and retry race condition #2358
Conversation
Increase Kafka CR readiness timeout from 60s to 300s to account for slow Strimzi startup on QEMU-based CI sandbox (4 JVM pods). Add wait-for-delete before re-applying to prevent race condition where kubectl apply hits a still-deleting resource on retry attempts.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
📝 Walkthrough

The e2e test for Kafka now adds explicit resource deletion verification using `kubectl wait --for=delete` before re-applying, and increases the Kafka readiness timeout from 60s to 300s.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request updates the Kafka end-to-end test script to ensure proper resource deletion by adding a wait command and increases the timeout for the Kafka readiness check. The review suggests replacing the hardcoded resource name with the $name variable and using the full resource type name for better consistency and maintainability.
  sleep 5
  kubectl -n tenant-test wait hr kafka-$name --timeout=30s --for=condition=ready
- kubectl wait kafkas -n tenant-test test --timeout=60s --for=condition=ready
+ kubectl wait kafkas -n tenant-test test --timeout=300s --for=condition=ready
The resource name is hardcoded as `test` here, which is inconsistent with the rest of the script that uses the `$name` variable. Additionally, using the full resource name `kafka.apps.cozystack.io` and consistent flag ordering (placing `-n` before the command) improves maintainability and aligns with the patterns used elsewhere in this file.
kubectl -n tenant-test wait kafka.apps.cozystack.io $name --timeout=300s --for=condition=ready
🧹 Nitpick comments (1)
hack/e2e-apps/kafka.bats (1)
44-44: Timeout increase is appropriate; consider using the `$name` variable for consistency.

The 300s timeout aligns well with other heavy tests (MongoDB, FoundationDB). However, the resource name is hardcoded as `test` instead of using the `$name` variable, which is inconsistent with the rest of the test (e.g., line 5 uses `$name`).

♻️ Suggested fix for consistency
- kubectl wait kafkas -n tenant-test test --timeout=300s --for=condition=ready
+ kubectl wait kafkas -n tenant-test $name --timeout=300s --for=condition=ready

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/e2e-apps/kafka.bats` at line 44, replace the hardcoded resource name "test" in the kubectl wait invocation with the existing variable $name to match the rest of the script; locate the kubectl wait call (kubectl wait kafkas -n tenant-test test --timeout=300s --for=condition=ready) and change the resource argument to use $name so it becomes consistent with other references to $name in the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hack/e2e-apps/kafka.bats`:
- Line 44: Replace the hardcoded resource name "test" in the kubectl wait
invocation with the existing variable $name to match the rest of the script;
locate the kubectl wait call (kubectl wait kafkas -n tenant-test test
--timeout=300s --for=condition=ready) and change the resource argument to use
$name so it becomes consistent with other references to $name in the test.
What this PR does
Fixes Kafka E2E test (`hack/e2e-apps/kafka.bats`) that fails consistently in CI due to two issues:

1. **Insufficient timeout** — Kafka CR readiness timeout was 60s, but Strimzi with 4 JVM pods (2 Kafka + 2 ZooKeeper) on QEMU-based CI sandbox needs 3-5 minutes to fully start. Increased to 300s, consistent with other heavyweight tests (MongoDB 300s, FoundationDB 300s, Harbor 600s).

2. **Retry race condition** — On retry attempts, `kubectl apply` could hit a still-deleting resource from the previous attempt, resulting in "unchanged" on a deleting object and no new HelmRelease being created. Added `wait --for=delete` before re-applying, following the existing pattern in `harbor.bats` and `run-kubernetes.sh` (see the sketch below).

CI evidence: https://github.com/cozystack/cozystack/actions/runs/24135612137/job/70424376181
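To make the new flow concrete, here is a minimal sketch of what the retried section ends up doing. It is not the literal `kafka.bats` contents: the manifest path `kafka.yaml` and the 120s delete timeout are illustrative assumptions, and the resource type used for the delete wait is taken from the review suggestion above; the wait-for-delete step and the 300s readiness budget are the parts this PR actually changes.

```bash
#!/usr/bin/env bash
# Minimal sketch of the adjusted retry flow -- names and paths are assumptions.

name="test"   # example resource name; the real test drives this via a variable

# New step: make sure the object from a previous attempt is fully deleted
# before re-applying, so `kubectl apply` cannot hit a still-deleting resource.
# Tolerate a non-zero exit in case the object never existed.
kubectl -n tenant-test wait kafka.apps.cozystack.io "$name" \
  --for=delete --timeout=120s 2>/dev/null || true

# Re-apply the manifest (path is illustrative) to recreate the HelmRelease.
kubectl -n tenant-test apply -f kafka.yaml

# Readiness checks: the HelmRelease wait is unchanged; the Kafka CR wait
# now gets a 300s budget instead of the previous 60s.
kubectl -n tenant-test wait hr "kafka-$name" --timeout=30s --for=condition=ready
kubectl wait kafkas -n tenant-test test --timeout=300s --for=condition=ready
```

Depending on the kubectl version, `wait --for=delete` can fail when no matching object exists at all, which is why the sketch tolerates a non-zero exit there; the reviewers' separate suggestion to replace the hardcoded `test` with `$name` in the last line still applies.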
Release note
Summary by CodeRabbit