Go-SDK: Implement coordinator-mode runtime entry point and task runner #67318
Draft
jason810496 wants to merge 3 commits into
First step toward landing the Go SDK coordinator-mode runtime (ADR 0003, msgpack-over-IPC). Scaffolding only -- no entry point is wired here, so go-plugin / Edge Worker behaviour is unchanged. Adds the length-prefixed msgpack frame codec and the typed message envelopes the runtime will exchange with the supervisor, the sdkcontext.SdkClientContextKey injection hook on bundlev1.Task so a follow-up PR can swap in a comm-socket-backed sdk.Client, and the small sdk surface tweaks (ConnFromAPIResponse export, VariableClient interface docs, secret-masking TODOs) the comm-socket client will rely on. Pulls in github.com/vmihailenco/msgpack/v5 -- the encoding the supervisor speaks.
Build the comm layer on top of the protocol primitives so subsequent runtime code has a single typed entry point for talking to the supervisor. CoordinatorComm runs a concurrent-safe dispatcher loop that fans inbound frames out to per-request reply channels keyed by a monotonic id, propagates context cancellation, and cleans up pending requests on SendRequest failure. SocketLogHandler streams slog records as structured JSON over the dedicated logs socket so the supervisor can demux task logs without parsing stderr. CoordinatorClient implements the sdk.Client surface (GetVariable honouring AIRFLOW_VAR_* overrides, GetConnection, XCom push/pull, deferral) by routing each method through the dispatcher and translating supervisor not-found responses into the SDK's sentinel errors. No server or task-runner loop is wired yet -- that lands in the next PR in this stack.
Wire the supervisor-launched runtime that speaks ADR 0003's coordinator protocol. execution.Serve dials the comm and logs sockets the supervisor passes via the new --comm/--logs flags, installs SocketLogHandler so slog records reach the supervisor, reads StartupDetails, and drives a single TaskInstance through task_runner.Run. The runner injects a CoordinatorClient into the user task function via sdkcontext.SdkClientContextKey so tasks written against the existing sdk.Client API run unchanged. bundlev1server.Serve grows a mode selector so the same binary still serves go-plugin when no coordinator flags are present, and exits non-zero on partial --comm/--logs misuse. DAG-file parsing is intentionally not part of this stack -- it will land in a follow-up once the parsing protocol settles.
This was referenced May 22, 2026
Why
Third and final PR in the stack carved out of #67154. With the protocol primitives merged in PR1 and the dispatcher / logger / client merged in PR2, this PR wires the entry point: the same bundle binary that today serves go-plugin can now also be launched directly by the Python supervisor, dial the supervisor's comm and logs sockets, and run a single TaskInstance. Coordinator mode is the path that lets the Python supervisor schedule Go tasks without standing up a separate worker process -- it launches the bundle binary as a child, hands it two socket addresses on the CLI, and talks the msgpack-over-IPC protocol directly -- so a Go-task DagRun looks operationally indistinguishable from a Python-task DagRun on the supervisor side. This is the smallest PR in the stack (~650 LOC) because all the heavy lifting -- frame I/O, dispatcher, slog handler, sdk.Client re-implementation -- already landed in PR1 and PR2. Dag-file parsing over the coordinator protocol is intentionally not part of this stack and will land in a follow-up once that protocol settles.
How
pkg/execution/server.go -- execution.Serve(bundle, commAddr, logsAddr) dials both supervisor sockets, defers a Close on each, installs SocketLogHandler as the slog default before any user code runs, constructs a CoordinatorComm over the comm socket, reads the initial StartupDetails, and dispatches to task_runner.Run. If Serve itself errors before the dispatcher spins up, the deferred close still releases the dialed sockets so the supervisor doesn't see a stuck child.
pkg/execution/task_runner.go -- runs a single task. Builds a context carrying the CoordinatorClient under sdkcontext.SdkClientContextKey (PR1 added the injection site in bundlev1.taskFunction.Execute), invokes bundle.LookupTask(dag, task).Execute, and sends the resulting TaskStateMsg back through the dispatcher. Terminal-state delivery is ctx.Err()-aware so a cancelled supervisor doesn't leave the runtime blocked on a send.
pkg/execution/integration_test.go -- end-to-end test that pipes a fake supervisor against the real Serve over an in-memory socket pair, exercises GetVariable / XCom push / deferral, and asserts the emitted TaskStateMsg.
bundle/bundlev1/bundlev1server/server.go -- splits Serve into a decideMode switch over (--bundle-metadata | --comm/--logs | <none>) so the same binary still serves go-plugin when no coordinator flags are present. Partial use of --comm/--logs is a hard error (ErrCoordinatorFlagsIncomplete), returned to main so the caller exits non-zero with usage rather than silently falling back to go-plugin.
example/bundle/main.go -- propagates bundlev1server.Serve's error via log.Fatal, and tightens the example connection-log to log only non-sensitive fields, matching the masker TODOs PR1 added on sdk.Client.GetConnection.
What
go-sdk/pkg/execution/{server,task_runner,integration_test}.go.
go-sdk/bundle/bundlev1/bundlev1server/server.go with coordinator-mode dispatch and ErrCoordinatorFlagsIncomplete.
go-sdk/example/bundle/main.go to propagate Serve's error and redact the connection log.
Next
Was generative AI tooling used to co-author this PR?