fix: stabilize queued retry dispatch #2192

Merged
yohamta0 merged 3 commits into main from fix/ci-queue-dispatch-flakes
May 22, 2026
Conversation

Collaborator

@yohamta0 yohamta0 commented May 22, 2026

Summary

  • keep queue-dispatched initial runs on their queued attempt while preserving fresh attempts for queued retries
  • avoid duplicate local queue dispatch while a launched child process is still starting
  • fix notification delivery test race by waiting for delivery state

Testing

  • make lint
  • go test ./internal/service/scheduler -run TestQueueDispatcher -count=1
  • go test ./internal/service/chatbridge -run TestNotificationMonitor_SuccessEventsCanBeDeliveredByOptInTransport -count=100
  • go test ./internal/cmd -run TestRetryCommand -count=1
  • go test -race ./internal/intg/... -count=1

Summary by CodeRabbit

  • Bug Fixes

    • Improved queue dispatch retry to intelligently reuse existing queued attempts based on their status
    • Enhanced scheduler startup logic to reliably coordinate execution completion
  • Tests

    • Added test coverage for queue dispatch retry behavior with different trigger types
    • Added test for scheduler startup coordination with execution completion


coderabbitai Bot commented May 22, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d642a68a-6c3d-4bd3-a6cf-e50009d1c060

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR refactors queue dispatch retry and distributed startup coordination. The retry command now looks up queued-dispatch targets and conditionally reuses queued attempts based on trigger type. The scheduler's startup logic shifts from error-only channels to execution-done callbacks, allowing startup checks to account for whether distributed execution has completed and handle completion as a terminal condition with explicit failure paths.

Changes

Queue Dispatch Retry Enhancement

| Layer / File(s) | Summary |
| --- | --- |
| Queue dispatch retry refactoring (internal/cmd/retry.go, internal/cmd/retry_test.go) | When queue dispatch retry is enabled, runRetry now queries for retry targets and conditionally reuses queued attempts based on status. New helpers (queueDispatchRetryAttempt, queueDispatchRetryTarget, shouldUseQueuedDispatchAttempt) replace the validate-only approach. Two new tests verify reuse with webhook triggers and new attempt creation with retry triggers. |
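The reuse decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the body of shouldUseQueuedDispatchAttempt: the types `TriggerType` and `Status`, their constants, and the function `reuseQueuedAttempt` are all hypothetical stand-ins, and the rule encoded here (reuse only when the attempt is still queued and was not triggered by a retry) is inferred from the summary and the two tests it mentions.

```go
package main

import "fmt"

// TriggerType and Status are simplified stand-ins for illustration only;
// they are not the project's core types.
type TriggerType string
type Status string

const (
	TriggerWebhook TriggerType = "webhook"
	TriggerRetry   TriggerType = "retry"

	StatusQueued Status = "queued"
	StatusFailed Status = "failed"
)

// reuseQueuedAttempt is a hypothetical decision function. It reports whether
// a queue-dispatched run should continue on its existing queued attempt
// (true) or start a fresh attempt (false).
func reuseQueuedAttempt(status Status, trigger TriggerType) bool {
	// Only an attempt that is still queued can be picked up as-is.
	if status != StatusQueued {
		return false
	}
	// A queued retry gets a fresh attempt; any other trigger keeps the
	// attempt that was queued for the initial run.
	return trigger != TriggerRetry
}

func main() {
	fmt.Println(reuseQueuedAttempt(StatusQueued, TriggerWebhook))
	fmt.Println(reuseQueuedAttempt(StatusQueued, TriggerRetry))
}
```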

Distributed Execution Startup Coordination

| Layer / File(s) | Summary |
| --- | --- |
| Execution completion signaling infrastructure (internal/service/scheduler/queue_processor.go, internal/service/scheduler/queue_dispatcher.go) | Added errExecutionExitedBeforeStartup error constant and extended startupWaitState with optional execDone callback. Refactored the distributed execution goroutine to signal completion via execDoneCh closure and captured execDoneErr, replacing the prior error-only channel. |
| Startup status checking with execution completion (internal/service/scheduler/queue_dispatcher.go) | Updated waitForStartup to disable retries when execDone callback is present. Modified checkStartupStatus to consult the callback during startup-grace handling, treating execution completion as terminal and propagating execution errors or mapping clean exit to permanent failure. |
| Startup coordination test (internal/service/scheduler/queue_processor_startup_test.go) | Added test verifying waitForStartup keeps local launch in-flight during repeated startup polling, using execution-done callback with context cancellation and mock assertions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • dagucloud/dagu#2180: Modifies dispatcher startup/wait logic in queue_dispatcher.go, directly related to this PR's execDone callback and startup-completion handling refactor.
  • dagucloud/dagu#1967: Implements queue-dispatch retry guard and associated not-queued normalization in internal/cmd/retry.go, overlapping with this PR's queue-dispatch retry target/attempt selection refactor.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix: stabilize queued retry dispatch' directly aligns with the main objective of fixing issues with queued retry dispatch behavior and race conditions. |
| Description check | ✅ Passed | The PR description covers the Summary section with specific bullet points about the changes, and includes a Testing section with commands executed. However, it lacks a formal Changes list and Related Issues section as specified in the template. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/cmd/retry_test.go (1)

234-237: ⚡ Quick win

Assert the preserved trigger type in the reuse case.

This subtest proves the queued attempt ID is reused, but it would still pass if the execution path rewrote the reused run to core.TriggerTypeRetry. Adding an assertion for latestStatus.TriggerType == core.TriggerTypeWebhook would lock in the behavior this change is preserving.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cmd/retry_test.go` around lines 234 - 237, Add an assertion after
reading latestStatus in the reuse-case subtest to ensure the preserved trigger
type isn't overwritten: after calling latestAttempt.ReadStatus(th.Context) and
the existing require.Equal for core.Succeeded, assert that
latestStatus.TriggerType equals core.TriggerTypeWebhook (i.e., require.Equal(t,
core.TriggerTypeWebhook, latestStatus.TriggerType)) so the test fails if the
reused run is mistakenly changed to core.TriggerTypeRetry.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/cmd/retry.go`:
- Around line 177-185: The closure that calls queueDispatchRetryTarget must not
re-resolve ref and return a different attempt than the one whose status/dag were
previously loaded; update the code so it pins to the original attempt ID (the
previously loaded attempt variable) instead of reselecting by ref: either pass
the original attempt.ID into queueDispatchRetryTarget (or add an overload) and
have it validate/return only that attempt, or skip re-resolving and reuse the
already-loaded attempt/status/dag when constructing queuedAttempt; ensure
executeRetry/preparedAttempt and rootRun/trigger/log metadata remain paired with
the same pinned attempt ID and validate any mismatch if queueDispatchRetryTarget
must still requery.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1e5fad76-e984-4765-9949-0fa8c1ee1c43

📥 Commits

Reviewing files that changed from the base of the PR and between a1e205e and fbf760d.

📒 Files selected for processing (6)
  • internal/cmd/retry.go
  • internal/cmd/retry_test.go
  • internal/service/chatbridge/monitor_test.go
  • internal/service/scheduler/queue_dispatcher.go
  • internal/service/scheduler/queue_processor.go
  • internal/service/scheduler/queue_processor_startup_test.go

Comment thread internal/cmd/retry.go
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbf760d7a3


Comment thread internal/service/scheduler/queue_dispatcher.go
@yohamta0 yohamta0 merged commit a21c612 into main May 22, 2026
10 checks passed
@yohamta0 yohamta0 deleted the fix/ci-queue-dispatch-flakes branch May 22, 2026 07:21