
test: stabilize distributed sub-DAG cancellation test#2007

Merged
yottahmd merged 4 commits into main from fix-flaky-test-24508081260
Apr 16, 2026

@yottahmd yottahmd commented Apr 16, 2026

Summary

  • stabilize the distributed sub-DAG cancellation test by waiting for the parent agent to record the running sub-DAG instead of polling socket-based current status
  • assert the child sub-DAG reaches running state before stopping the parent run
  • verify both the parent run and child sub-DAG finish as aborted

Root Cause

The flaky Windows job timed out waiting for DAGRunMgr.GetCurrentStatus / IsRunning to observe the parent as running. The parent was active while the distributed child ran, but that socket-based check could miss the state long enough for the test to fail.

Testing

  • go test ./internal/intg/distr -run 'TestCancellation_SubDAG/cancelPropagatesToSubDAGOnWorker' -count=10
  • go test ./internal/intg/distr -run TestCancellation_SubDAG -count=1
  • make bin
  • go test -timeout=20m -json ./internal/intg/distr -count=1 > /tmp/dagu-intg-distr-after-bin.json

Summary by CodeRabbit

  • Tests
    • Enhanced internal test coverage for distributed lifecycle management, including improved validation of cancellation synchronization and sub-DAG status verification during shutdown scenarios.

coderabbitai Bot commented Apr 16, 2026

Walkthrough

This test file modification updates lifecycle cancellation testing to use an error-carrying channel and adds comprehensive sub-DAG run validation. The changes enhance test verification by checking in-process agent status before cancellation and asserting sub-DAG run status becomes aborted afterward.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Lifecycle Test Enhancement**<br>`internal/intg/distr/lifecycle_test.go` | Replaced done channel synchronization with error-carrying `errCh`, added sub-DAG run status verification before and after cancellation, including extraction of `subRunID`, creation of `rootRef`, and assertions on final sub-DAG abort state. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test: stabilize distributed sub-DAG cancellation test' accurately describes the main change: improving the reliability of a test for distributed sub-DAG cancellation by replacing socket-based polling with deterministic agent status checks. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/intg/distr/lifecycle_test.go`:
- Around line 168-170: The test currently does a one-shot call to
f.dagWrapper.DAGRunMgr.FindSubDAGRunStatus and asserts the child is Aborted,
which is race-prone; change the assertion to poll until the terminal state is
observed (or a timeout) by repeatedly calling FindSubDAGRunStatus in a loop or
using a test helper like require.Eventually, handling temporary nil/err
responses and only asserting equality to core.Aborted after the poll succeeds;
target the call site that invokes FindSubDAGRunStatus (in lifecycle_test.go) and
replace the direct require.Equal with a retrying/wait-until check that fails the
test if the terminal state isn’t reached within a bounded timeout.
- Line 137: In the require.Eventually polling blocks, replace calls using
context.Background() with the test's cancellable agent.Context so the polling
can be cancelled; specifically call agent.Status(agent.Context) instead of
agent.Status(context.Background()) and agent.FindSubDAGRunStatus(agent.Context,
...) instead of FindSubDAGRunStatus(context.Background(), ...) inside the two
polling closures (the require.Eventually blocks) to ensure proper cancellation
of the polling paths.

📥 Commits

Reviewing files that changed from the base of the PR and between 8914d0e and 7a16421.

📒 Files selected for processing (1)
  • internal/intg/distr/lifecycle_test.go

Comment thread internal/intg/distr/lifecycle_test.go Outdated
Comment thread internal/intg/distr/lifecycle_test.go Outdated
@yottahmd yottahmd merged commit 3b0da99 into main Apr 16, 2026
10 checks passed
@yottahmd yottahmd deleted the fix-flaky-test-24508081260 branch April 16, 2026 15:34