Go SDK: ADRs for bundle packing and coordinator-protocol runtime #67153

Open
jason810496 wants to merge 5 commits into apache:main from jason810496:refactor/go-sdk/adrs

@jason810496 jason810496 commented May 19, 2026

Final shape of the Go SDK after these ADRs land

Bundle artefact: one self-contained executable

A Go SDK bundle is a single native executable file. It runs directly;
no archive, no extraction, no per-bundle cache directory.

A footer is appended after the OS-defined end of the binary, carrying
the DAG source bytes and the airflow-metadata.yaml manifest, with a
32-byte trailer at EOF (AFBNDL01 magic, little-endian
source_len / metadata_len / footer_ver). The scanner identifies
bundles by reading the last 32 bytes of every regular file in
bundles_folder and matching the magic — filename and extension are
irrelevant.
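
A minimal sketch of the scanner-side check, under stated assumptions: the description gives only the trailer size (32 bytes), the magic, and the three field names, so the layout used here (8-byte magic first, then three little-endian uint64 fields in the order listed) is a guess, and parseTrailer and buildTrailer are hypothetical names rather than SDK functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// ASSUMPTION: 8-byte magic followed by three little-endian uint64 fields.
// The PR description does not state field widths or their order in the trailer.
const trailerSize = 32

var magic = []byte("AFBNDL01")

var errNotBundle = errors.New("not a bundle: trailer magic not found")

type trailer struct {
	SourceLen   uint64
	MetadataLen uint64
	FooterVer   uint64
}

// buildTrailer encodes a trailer in the assumed layout (used here to make test data).
func buildTrailer(sourceLen, metadataLen, footerVer uint64) []byte {
	t := append([]byte{}, magic...)
	t = binary.LittleEndian.AppendUint64(t, sourceLen)
	t = binary.LittleEndian.AppendUint64(t, metadataLen)
	t = binary.LittleEndian.AppendUint64(t, footerVer)
	return t
}

// parseTrailer looks only at the last 32 bytes of the file contents,
// so the file name and extension play no part in detection.
func parseTrailer(file []byte) (trailer, error) {
	if len(file) < trailerSize {
		return trailer{}, errNotBundle
	}
	t := file[len(file)-trailerSize:]
	if !bytes.Equal(t[:8], magic) {
		return trailer{}, errNotBundle
	}
	return trailer{
		SourceLen:   binary.LittleEndian.Uint64(t[8:16]),
		MetadataLen: binary.LittleEndian.Uint64(t[16:24]),
		FooterVer:   binary.LittleEndian.Uint64(t[24:32]),
	}, nil
}

func main() {
	// Placeholder bytes standing in for a real executable plus footer payload.
	file := append([]byte("binary+source+metadata"), buildTrailer(12, 8, 1)...)
	tr, err := parseTrailer(file)
	fmt.Println(tr, err)

	// A regular file without the magic is simply skipped by the scanner.
	_, err = parseTrailer([]byte("just some ordinary file contents, no trailer here"))
	fmt.Println(err)
}
```

A real scanner would read only the final 32 bytes from disk (for example by seeking from the end of the file) rather than loading the whole binary; the in-memory slice keeps the sketch short.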

/opt/airflow/executable-bundles/
├── example
├── pipeline
└── analytics

The manifest schema drops the executable field (the binary is the
file) and redefines source as a display filename for the UI source
view, since the bytes now live in the footer.

Bundle binary: dual-mode Serve

User code stays one line:

func main() { bundlev1server.Serve(&myBundle{}) }

Serve decides at runtime which protocol to speak based on the CLI
arguments and process environment it was invoked with:

| Trigger                               | Mode          |
|---------------------------------------|---------------|
| --bundle-metadata                     | metadata dump |
| --dump-bundle-spec                    | spec dump     |
| --comm=<host:port> --logs=<host:port> | coordinator   |
| AIRFLOW_BUNDLE_MAGIC_COOKIE env var   | go-plugin     |
  • Coordinator mode speaks the msgpack-over-IPC coordinator protocol
    to Python's ExecutableCoordinator. The first inbound frame
    (DagFileParseRequest or StartupDetails) selects the
    DAG-parsing one-shot path or the multi-round task-execution path.
    Logs go out as JSON-line records over the logs socket.
  • go-plugin mode is unchanged at the wire and source level.
    Existing Edge Worker deployments keep working without rebuilds.
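
The trigger-to-mode mapping above can be sketched as a small pure function. This is an illustration only, not the Serve implementation: the precedence between triggers (flags checked before the environment variable), the requirement that both --comm and --logs be present, and the selectMode name are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

type mode string

const (
	modeMetadataDump mode = "metadata dump"
	modeSpecDump     mode = "spec dump"
	modeCoordinator  mode = "coordinator"
	modeGoPlugin     mode = "go-plugin"
	modeUnknown      mode = "unknown"
)

// selectMode maps CLI arguments and an environment lookup onto a mode.
// ASSUMPTION: dump flags win over coordinator flags, which win over the
// go-plugin cookie; the PR description does not specify precedence.
func selectMode(args []string, getenv func(string) string) mode {
	var hasComm, hasLogs bool
	for _, a := range args {
		switch {
		case a == "--bundle-metadata":
			return modeMetadataDump
		case a == "--dump-bundle-spec":
			return modeSpecDump
		case strings.HasPrefix(a, "--comm="):
			hasComm = true
		case strings.HasPrefix(a, "--logs="):
			hasLogs = true
		}
	}
	if hasComm && hasLogs {
		return modeCoordinator
	}
	if getenv("AIRFLOW_BUNDLE_MAGIC_COOKIE") != "" {
		return modeGoPlugin
	}
	return modeUnknown
}

func main() {
	noEnv := func(string) string { return "" }
	fmt.Println(selectMode([]string{"--dump-bundle-spec"}, noEnv))
	fmt.Println(selectMode([]string{"--comm=127.0.0.1:9000", "--logs=127.0.0.1:9001"}, noEnv))
}
```

Passing the environment lookup as a function (os.Getenv in a real program) keeps the decision testable without touching the process environment.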

The two paths share the same bundlev1.BundleProvider implementation,
the same lazy RegisterDags recorder cache, and the same
pkg/worker.Worker task lookup and parameter injection. Only the
sdk.Client backend differs (Execution API URL vs. comm socket), and
the swap happens below the SDK surface, so the task code authors write
is identical in both modes.

Authoring workflow: go tool airflow-go-pack

The packer ships as a standalone binary at go-sdk/cmd/airflow-go-pack,
delivered via the Go 1.24 tool directive. Authors add one line to
their own go.mod:

tool github.com/apache/airflow/go-sdk/cmd/airflow-go-pack

and pack with one command:

go tool airflow-go-pack

By default the packer locates the main package in the current
directory, runs go build internally, execs the freshly built binary
with --dump-bundle-spec to populate the manifest, then appends the
source bytes, manifest, and trailer to the binary. Output is
<bundleName> (or <bundleName>.exe on Windows) and is byte-deterministic
for byte-identical inputs.

Escape hatches: -- <go build flags> for passthrough,
--executable <path> to skip the internal build, --source <path> to
override source detection, --output <path> for a custom output path.
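
The final append step of the default flow could look roughly like the sketch below. It is not the airflow-go-pack source: appendFooter is a hypothetical helper, the trailer layout (magic, then source_len, metadata_len, footer_ver as little-endian uint64 values) is an assumption, and the footer_ver value of 1 is a placeholder.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// appendFooter returns exe followed by the source bytes, the manifest,
// and a 32-byte trailer, in that order.
// ASSUMPTION: trailer = "AFBNDL01" + three little-endian uint64 fields.
func appendFooter(exe, source, manifest []byte) []byte {
	out := make([]byte, 0, len(exe)+len(source)+len(manifest)+32)
	out = append(out, exe...)
	out = append(out, source...)
	out = append(out, manifest...)
	out = append(out, "AFBNDL01"...)
	out = binary.LittleEndian.AppendUint64(out, uint64(len(source)))
	out = binary.LittleEndian.AppendUint64(out, uint64(len(manifest)))
	out = binary.LittleEndian.AppendUint64(out, 1) // footer_ver: placeholder value
	return out
}

func main() {
	exe := []byte("\x7fELF-placeholder") // stands in for the go build output
	src := []byte("package main\n")
	man := []byte("dags: {}\n") // stands in for airflow-metadata.yaml

	a := appendFooter(exe, src, man)
	b := appendFooter(exe, src, man)

	// No timestamps or random values are written, so equal inputs give equal bytes.
	fmt.Println("deterministic:", bytes.Equal(a, b))
	// The executable's own bytes are left untouched at the front of the file.
	fmt.Println("exe prefix intact:", bytes.HasPrefix(a, exe))
}
```

Because the footer contains only the inputs and their lengths, the determinism claim reduces to go build itself producing identical bytes for identical inputs.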

Introspection contract: --dump-bundle-spec

Every bundle binary supports a stable JSON introspection flag:

{
  "format_version": "1.0",
  "sdk": {"language": "go", "version": "<sdk version>"},
  "dags": {"<dag_id>": {"tasks": ["<task_id>", "..."]}}
}

RegisterDags remains the single authoritative source of dag/task
identity — the packer execs the binary to read it, no AST scanning
or hand-written manifests. Third-party tooling (IDE plugins,
alternative packers, CI plugins) can rely on the same contract without
taking a Go dependency on the SDK.
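
As an illustration of how tooling might consume this contract, the sketch below decodes the JSON shape shown above using only the standard library. The type and function names (bundleSpec, dagSpec, parseSpec, dagIDs) are hypothetical, and the sample document is made up for the example; real input would be the stdout of a bundle run with --dump-bundle-spec.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// dagSpec and bundleSpec mirror the JSON shape shown in the description.
type dagSpec struct {
	Tasks []string `json:"tasks"`
}

type bundleSpec struct {
	FormatVersion string `json:"format_version"`
	SDK           struct {
		Language string `json:"language"`
		Version  string `json:"version"`
	} `json:"sdk"`
	Dags map[string]dagSpec `json:"dags"`
}

// parseSpec decodes one spec document.
func parseSpec(data []byte) (bundleSpec, error) {
	var s bundleSpec
	err := json.Unmarshal(data, &s)
	return s, err
}

// dagIDs returns the dag ids in sorted order so output is stable.
func dagIDs(s bundleSpec) []string {
	ids := make([]string, 0, len(s.Dags))
	for id := range s.Dags {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Made-up sample data in the documented shape.
	sample := []byte(`{
	  "format_version": "1.0",
	  "sdk": {"language": "go", "version": "0.0.0-example"},
	  "dags": {"example_dag": {"tasks": ["extract", "load"]}}
	}`)

	spec, err := parseSpec(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, id := range dagIDs(spec) {
		fmt.Println(id, spec.Dags[id].Tasks)
	}
}
```

The same decoding is straightforward in any language with a JSON library, which is the point of keeping the contract free of Go-specific types.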

Cross-language scope

The footer format and the --dump-bundle-spec contract are
language-agnostic. Future native-SDK languages (Rust, C++, Zig) emit
the same artefact shape and implement the same packer mechanism in
their own toolchain — the consumer-side scanner reads the result
identically regardless of source language.

What's in this PR

Four ADRs under go-sdk/adr/:

  • ADR 0001 — Post-build bundle-packing options. The option
    register: nine candidate packer mechanisms (standalone CLI,
    all-in-one CLI, self-pack, introspection-based, AST scan,
    go generate, -toolexec, tool directive, build-system recipe).
    Documents the rejected options so future SDKs facing the same
    question do not have to re-derive them.
  • ADR 0002 — Use the Go 1.24 tool directive for the bundle
    packer. Selects the implementation: Option H (tool directive)
    for delivery, paired with Option A (standalone airflow-go-pack)
    and Option D (standardised --dump-bundle-spec introspection).
  • ADR 0003 — Dual-mode bundle binary. Adds the
    msgpack-over-IPC coordinator-protocol path alongside the existing
    go-plugin/Edge-Worker path. Same Serve entry point, mode
    selected from invocation.
  • ADR 0004 — Self-contained executable bundle. Replaces the
    ZIP container assumed by ADRs 0001 / 0002 with a footer appended
    to the executable. The packer mechanism from ADR 0002 is
    unchanged; only the artefact it writes changes.

ADR 0001 and ADR 0002 cross-reference ADR 0004 in their Status
sections to flag the superseded portions, so readers landing on the
older ADRs first are pointed at the current artefact shape.


Was generative AI tooling used to co-author this PR?

@jason810496

The CI failure should be fixed once the dependent PRs are merged and this branch is rebased.

2. **Extend `bundlev1server.Serve` with `--dump-bundle-spec`.** The
flag prints a JSON document of the form:

```json
@jason810496 jason810496 May 19, 2026

We could define the JSON schema of the airflow-metadata.yaml format either in this PR or the next one.

It was added in 478edab#diff-82337e6c6586f2c271a1457ae2f45acf563aa7b75ce82a78f163cf78690c91b4

3. **Bundle authors register the packer in their own `go.mod`:**

```
tool github.com/apache/airflow/go-sdk/cmd/airflow-go-pack

The release management issue is tracked in #66938

### Footer layout

A bundle file is laid out as:

@jason810496 jason810496 May 19, 2026

The user-facing doc regarding the executable bundle spec was added in 478edab#diff-0f63500bac12820edd728da72331356b8546a8d571ed949df07a96e1085b835d

+---------------------------------+ <- EOF
```

`AFBNDL01` is `0x41 0x46 0x42 0x4E 0x44 0x4C 0x30 0x31`. The two

The AFBNDL01 magic stands for Airflow BuNDLe, version 01. The value is arbitrary, so we could change it to anything else.

@jason810496 jason810496 force-pushed the refactor/go-sdk/adrs branch from a8f9a29 to d65167d Compare May 22, 2026 03:13
@jason810496 jason810496 requested a review from uranusjr May 22, 2026 08:55
@jason810496 jason810496 marked this pull request as ready for review May 22, 2026 10:03
@jason810496 jason810496 requested review from aritra24 and kaxil May 22, 2026 10:03
@jason810496 jason810496 moved this from In progress to In review in AIP-72 (addendum): Go-SDK May 22, 2026
Labels

area:go-sdk go-sdk Label to track work items for golang task sdk

Projects

Status: In review
