refactor: separate dispatch and launcher from engine core#2233
Conversation
ShouldDispatchToCoordinator is control-plane policy: it combines DAG fields with runtime configuration to choose between local and coordinator-dispatched execution. Living in core (the workflow language types), it forced core to import cmn/config, the only such dependency in the language layer. Relocate it to a dedicated leaf package, internal/dispatch, importing only core and cmn/config. Repoint the six call sites across cmd, the frontend API, and the scheduler, and move its test to a black-box package. The core package no longer depends on cmn/config. Co-Authored-By: Claude <noreply@anthropic.com>
The dotenv loader logged eval/read failures through the platform logger and emitted a success Info line. Append failures to the existing DAG.BuildWarnings field instead and drop the success log, removing core's dependency on cmn/logger. Behavior preserved: load failures still surface to callers through the BuildWarnings field the build result already carries. No change to on-disk formats, CLI, API, or configuration; no migration required. Co-Authored-By: Claude <noreply@anthropic.com>
Move the dagu-subprocess launcher (SubCmdBuilder, CmdSpec, Run/Start/StartProcess and helpers) out of package runtime into a dedicated internal/launcher package, so callers can depend on the launcher without importing the execution engine. internal/launcher carries its own panic-recovery helper and imports nothing from internal/runtime, so there is no import cycle. runtime/subcmd.go now re-exports the launcher symbols as type aliases and variables, so every existing caller compiles unchanged; call sites migrate to internal/launcher in follow-up commits before the shim is removed. The now-unused execWithRecovery copy and its runtime/debug import are dropped from manager.go. Behavior preserved: the launcher code moved verbatim, dagu subprocess argument construction is unchanged, and there is no change to on-disk formats, CLI, API, or configuration. No migration required. Co-Authored-By: Claude <noreply@anthropic.com>
…shim Migrate every caller off the runtime re-export shim to internal/launcher directly (frontend, scheduler, worker, the runtime Manager, and test helpers) and delete runtime/subcmd.go. The launcher's own tests move to internal/launcher as launcher_test.go. internal/runtime no longer exposes the subprocess launcher, so callers depend on internal/launcher without pulling the execution engine. The worker, scheduler, and frontend dags paths that used only the launcher drop their internal/runtime import entirely. Behavior preserved: pure import repointing with no logic or argument changes; no on-disk, CLI, API, or configuration changes; no migration. Build, full test compilation, and all touched packages' tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR refactors subprocess execution and distributed-coordination infrastructure. The changes extract dispatch logic into a new package, introduce DAG build warning collection, support environment resolution with warnings, and migrate the execution layer from a ChangesRefactor Dispatch and Launcher Infrastructure
🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/core/spec/runtime_env.go (1)
72-80:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDeep-copy mutable slices before loading dotenv.
dag.Clone()does not cloneEnvorBuildWarnings, socloned.LoadDotEnv(ctx)can append into the original DAG's backing arrays when they still have spare capacity. That makesResolveEnvWithWarningsunexpectedly mutate the caller's snapshot and leak warnings/env between retries.Suggested fix
cloned := dag.Clone() if hasRuntimeParams(params) { // Recompute DAG/base-config env entries for the new runtime params instead // of short-circuiting to whatever happened to be on the current snapshot. cloned.Env = nil + } else { + cloned.Env = append([]string(nil), cloned.Env...) } + cloned.BuildWarnings = append([]string(nil), cloned.BuildWarnings...) warningStart := len(cloned.BuildWarnings) cloned.LoadDotEnv(ctx) buildWarnings := append([]string{}, cloned.BuildWarnings[warningStart:]...)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/core/spec/runtime_env.go` around lines 72 - 80, The cloned DAG returned by dag.Clone() shares backing arrays for Env and BuildWarnings, so before calling cloned.LoadDotEnv(ctx) (e.g., inside ResolveEnvWithWarnings where hasRuntimeParams(params) is checked) deep-copy cloned.Env and cloned.BuildWarnings into new slices (use make + copy) when they are non-nil so LoadDotEnv cannot append into the original DAG's backing arrays; ensure you replace cloned.Env and cloned.BuildWarnings with these newly allocated copies prior to calling cloned.LoadDotEnv(ctx).
🧹 Nitpick comments (2)
internal/launcher/launcher.go (1)
602-606: ⚡ Quick winAlign panic log field keys with the existing recovery helper.
The equivalent helper in
internal/runtime/agent/agent.go(logRecoveredPanic) emits this same "Recovered from panic" event with keyserrTypeandstackTrace, whereas this one useserr-typeandstack-trace. Logging the same event under different field names across packages fragments log queries/dashboards. Consider standardizing.♻️ Suggested alignment
logger.Error(ctx, "Recovered from panic", slog.String("err", err.Error()), - slog.String("err-type", fmt.Sprintf("%T", panicObj)), - slog.String("stack-trace", string(stack)), + slog.String("errType", fmt.Sprintf("%T", panicObj)), + slog.String("stackTrace", string(stack)), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/launcher/launcher.go` around lines 602 - 606, The panic recovery log in the logger.Error call inside launcher.go uses kebab-case keys ("err-type", "stack-trace") which diverges from the helper logRecoveredPanic in internal/runtime/agent/agent.go that emits "errType" and "stackTrace"; update the field keys in the logger.Error invocation (the call that logs "Recovered from panic" in the recovery handler) to use errType and stackTrace (match exact casing) so dashboards/queries remain consistent with logRecoveredPanic.internal/service/scheduler/dag_executor.go (1)
281-287: ⚡ Quick winTag warning logs before this helper emits them.
dagwarning.Lognow runs insideprepareDAGForSubprocess, butHandleJobaddstag.DAGandtag.RunIDto the context only after calling this helper. In the distributed enqueue path, dotenv/build warnings will therefore be logged without the identifiers needed to correlate them to a specific DAG run.Suggested change
func (e *DAGExecutor) HandleJob( ctx context.Context, dag *core.DAG, operation coordinatorv1.Operation, runID string, triggerType core.TriggerType, scheduleTime time.Time, ) error { // For distributed execution with START operation, enqueue for persistence if e.shouldUseDistributedExecution(dag) && operation == coordinatorv1.Operation_OPERATION_START { + ctx = logger.WithValues(ctx, + tag.DAG(dag.Name), + tag.RunID(runID), + ) dag, err := e.prepareDAGForSubprocess(ctx, dag, "") if err != nil { return fmt.Errorf("failed to prepare DAG env for enqueue: %w", err) } - ctx = logger.WithValues(ctx, - tag.DAG(dag.Name), - tag.RunID(runID), - ) logger.Info(ctx, "Enqueueing DAG for distributed execution", slog.Any("worker-selector", dag.WorkerSelector), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/scheduler/dag_executor.go` around lines 281 - 287, The dotenv/build warnings are being logged by dagwarning.Log inside prepareDAGForSubprocess before HandleJob attaches tag.DAG and tag.RunID to the context, so add the DAG/run tags to the context before those warnings are emitted: either (A) move the tagging in HandleJob (where tag.DAG and tag.RunID are set) to occur prior to calling prepareDAGForSubprocess/ResolveEnvWithWarnings, or (B) modify prepareDAGForSubprocess (or the call site around spec.ResolveEnvWithWarnings/result.BuildWarnings) to create a new ctxWithTags = tag.DAG/ tag.RunID added to ctx and pass that ctxWithTags into spec.ResolveEnvWithWarnings and into dagwarning.Log so all emitted warnings include the DAG and RunID identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/core/spec/runtime_env.go`:
- Around line 72-80: The cloned DAG returned by dag.Clone() shares backing
arrays for Env and BuildWarnings, so before calling cloned.LoadDotEnv(ctx)
(e.g., inside ResolveEnvWithWarnings where hasRuntimeParams(params) is checked)
deep-copy cloned.Env and cloned.BuildWarnings into new slices (use make + copy)
when they are non-nil so LoadDotEnv cannot append into the original DAG's
backing arrays; ensure you replace cloned.Env and cloned.BuildWarnings with
these newly allocated copies prior to calling cloned.LoadDotEnv(ctx).
---
Nitpick comments:
In `@internal/launcher/launcher.go`:
- Around line 602-606: The panic recovery log in the logger.Error call inside
launcher.go uses kebab-case keys ("err-type", "stack-trace") which diverges from
the helper logRecoveredPanic in internal/runtime/agent/agent.go that emits
"errType" and "stackTrace"; update the field keys in the logger.Error invocation
(the call that logs "Recovered from panic" in the recovery handler) to use
errType and stackTrace (match exact casing) so dashboards/queries remain
consistent with logRecoveredPanic.
In `@internal/service/scheduler/dag_executor.go`:
- Around line 281-287: The dotenv/build warnings are being logged by
dagwarning.Log inside prepareDAGForSubprocess before HandleJob attaches tag.DAG
and tag.RunID to the context, so add the DAG/run tags to the context before
those warnings are emitted: either (A) move the tagging in HandleJob (where
tag.DAG and tag.RunID are set) to occur prior to calling
prepareDAGForSubprocess/ResolveEnvWithWarnings, or (B) modify
prepareDAGForSubprocess (or the call site around
spec.ResolveEnvWithWarnings/result.BuildWarnings) to create a new ctxWithTags =
tag.DAG/ tag.RunID added to ctx and pass that ctxWithTags into
spec.ResolveEnvWithWarnings and into dagwarning.Log so all emitted warnings
include the DAG and RunID identifiers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 80bd8205-6d94-4933-a546-1e0d22b8c8b8
📒 Files selected for processing (28)
internal/cmd/helper.gointernal/cmd/start.gointernal/core/dag.gointernal/core/dispatch.gointernal/core/spec/builder_test.gointernal/core/spec/runtime_env.gointernal/core/spec/runtime_env_external_test.gointernal/dagwarning/dagwarning.gointernal/dispatch/dispatch.gointernal/dispatch/dispatch_test.gointernal/intg/distr/fixtures_test.gointernal/launcher/launcher.gointernal/launcher/launcher_test.gointernal/runtime/agent/agent.gointernal/runtime/agent/agent_test.gointernal/runtime/manager.gointernal/runtime/manager_test.gointernal/service/frontend/api/v1/api.gointernal/service/frontend/api/v1/dagruns.gointernal/service/frontend/api/v1/dagruns_edit_retry.gointernal/service/frontend/api/v1/dags.gointernal/service/frontend/api/v1/dags_start_internal_test.gointernal/service/scheduler/dag_executor.gointernal/service/scheduler/queue_processor_test.gointernal/service/scheduler/scheduler.gointernal/service/worker/handler.gointernal/service/worker/handler_test.gointernal/test/helper.go
💤 Files with no reviewable changes (1)
- internal/core/dispatch.go
Summary
Testing
Summary by cubic
Separated control-plane policy and subprocess launching from the engine core to reduce coupling, and tightened dotenv warning handling so core records warnings while callers log them. Env rebuilds now surface dotenv warnings via
ResolveEnvWithWarningswithout mutating DAG state.Refactors
internal/dispatchand repointed all call sites; converted test to a black-box test.internal/launcher; repointed API, scheduler, worker, manager, and tests. Moved panic-recovery helper intointernal/launcherand removed the duplicate in manager.DAG.BuildWarningsin core; addedinternal/dagwarningto log warnings outside core. IntroducedResolveEnvWithWarningsand madeResolveEnva thin wrapper. Updated API/scheduler/worker/agent/CLI restore paths to log warnings at boundaries. No migration.Bug Fixes
.envfiles now produce build warnings that are captured and logged by callers.DAG.EnvorDAG.BuildWarningsbacking slices; added tests for slice isolation and warning propagation.Written for commit feddad4. Summary will update on new commits.
Summary by CodeRabbit
Release Notes
Bug Fixes
.envfiles with warning messages during DAG builds.Refactor