test: stabilize distributed sub-DAG cancellation test#2007
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis test file modification updates lifecycle cancellation testing to use an error-carrying channel and adds comprehensive sub-DAG run validation. The changes enhance test verification by checking in-process agent status before cancellation and asserting sub-DAG run status becomes aborted afterward. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/intg/distr/lifecycle_test.go`:
- Around line 168-170: The test currently does a one-shot call to
f.dagWrapper.DAGRunMgr.FindSubDAGRunStatus and asserts the child is Aborted,
which is race-prone; change the assertion to poll until the terminal state is
observed (or a timeout) by repeatedly calling FindSubDAGRunStatus in a loop or
using a test helper like require.Eventually, handling temporary nil/err
responses and only asserting equality to core.Aborted after the poll succeeds;
target the call site that invokes FindSubDAGRunStatus (in lifecycle_test.go) and
replace the direct require.Equal with a retrying/wait-until check that fails the
test if the terminal state isn’t reached within a bounded timeout.
- Line 137: In the require.Eventually polling blocks, replace calls using
context.Background() with the test's cancellable agent.Context so the polling
can be cancelled; specifically call agent.Status(agent.Context) instead of
agent.Status(context.Background()) and agent.FindSubDAGRunStatus(agent.Context,
...) instead of FindSubDAGRunStatus(context.Background(), ...) inside the two
polling closures (the require.Eventually blocks) to ensure proper cancellation
of the polling paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8b575628-bba4-4fc1-8e23-70684412c57a
📒 Files selected for processing (1)
internal/intg/distr/lifecycle_test.go
Summary
Root Cause
The flaky Windows job timed out waiting for
DAGRunMgr.GetCurrentStatus/IsRunningto observe the parent as running. The parent was active while the distributed child ran, but that socket-based check could miss the state long enough for the test to fail.Testing
go test ./internal/intg/distr -run 'TestCancellation_SubDAG/cancelPropagatesToSubDAGOnWorker' -count=10go test ./internal/intg/distr -run TestCancellation_SubDAG -count=1make bingo test -timeout=20m -json ./internal/intg/distr -count=1 > /tmp/dagu-intg-distr-after-bin.jsonSummary by CodeRabbit