feat(libs/pipeline-plan): typed operator-plan IR (ADR-0045 Phase C.1) #67
Merged
Phase C.1 of ADR-0045: defines the typed operator-plan IR that replaces the free-form `--inline-sql` flag the Spark-backed runner consumed.

A Plan is a directed acyclic graph of typed Op nodes; each Op has one of nine v1 kinds (read_table, filter, project, rename, cast, aggregate, union, limit, write_table) and exactly one populated per-kind config field. Inputs reference upstream Op.IDs.

Scope is schema only: types, JSON serde, structural validation. The interpreter that consumes a Plan and executes it against Iceberg lives in a separate package (Phase C.2). Keeping the schema independent means pipeline-build-service, pipeline-runner, and any test harness can depend on the contract without dragging the Iceberg client and its dep graph (apache/iceberg-go, substrait-protobuf pin from Phase A, gocloud, …) into their build.

The operator set is the one Phase 0's inventory (docs/migration/pipeline-runner-spark-to-go-inventory.md) found sufficient for every concrete pipeline in the repo. `join` is deferred to v2; adding OpKindJoin later is wire-compatible.

Validate() runs 11 structural checks (uniqueness, kind/config consistency, input arity by kind, source/terminal presence, cycle detection via 3-colour DFS, per-kind invariants) and returns every finding in one pass so the authoring UI can highlight all broken nodes simultaneously. The ValidationError shape is JSON-serialised and ready to flow back through the existing per-node validation endpoint pipeline-authoring-service exposes.

No new dependencies; reuses libs/pipeline-expression for column types only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 17, 2026
Summary
Phase C.1 of ADR-0045: defines the typed operator-plan IR that replaces the free-form `--inline-sql` contract the Spark-backed runner consumed. This is the foundational schema every other Phase C sub-PR depends on; merging it first unblocks parallel work on the interpreter (C.2), the build-service emitter (C.4), and the runner consumer (C.5).

A `Plan` is a directed acyclic graph of typed `Op` nodes. Each `Op` has one of nine v1 `Kind`s and exactly one populated per-kind config field. `Inputs` reference upstream `Op.ID`s.

Scope
Schema only: types, JSON serde, structural validation. The interpreter that consumes a `Plan` and executes it against Iceberg lives in `libs/pipeline-runtime/` (Phase C.2). `pipeline-build-service` (C.4) and `pipeline-runner` (C.5) follow.

Keeping the schema independent means the contract package does not drag the Iceberg client and its dep graph (`apache/iceberg-go`, the `substrait-protobuf` pin from Phase A, `gocloud`, …) into every consumer's build.

Operator set (v1)
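For orientation, here is a minimal sketch of how the nine v1 kinds and the "exactly one populated per-kind config field" rule might be declared in Go. The kind names, `Op.ID`, `Inputs`, and the `OpKind…` naming pattern come from this PR; every other type name, field name, and JSON tag is an assumption made for illustration, as are the `lakekeeper.demo.*` table names. Only three per-kind configs are spelled out to keep it short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OpKind names one of the nine v1 operator kinds.
type OpKind string

const (
	OpKindReadTable  OpKind = "read_table"
	OpKindFilter     OpKind = "filter"
	OpKindProject    OpKind = "project"
	OpKindRename     OpKind = "rename"
	OpKindCast       OpKind = "cast"
	OpKindAggregate  OpKind = "aggregate"
	OpKindUnion      OpKind = "union"
	OpKindLimit      OpKind = "limit"
	OpKindWriteTable OpKind = "write_table"
)

// Hypothetical per-kind configs (field names and JSON tags are assumed).
type ReadTableConfig struct {
	Table string `json:"table"`
}

type LimitConfig struct {
	N int `json:"n"`
}

type WriteTableConfig struct {
	Table string `json:"table"`
	Mode  string `json:"mode"` // "create_or_replace" or "append"
}

// Op is one node of the plan. Exactly one config pointer should be non-nil.
type Op struct {
	ID         string            `json:"id"`
	Kind       OpKind            `json:"kind"`
	Inputs     []string          `json:"inputs,omitempty"` // upstream Op.IDs
	ReadTable  *ReadTableConfig  `json:"read_table,omitempty"`
	Limit      *LimitConfig      `json:"limit,omitempty"`
	WriteTable *WriteTableConfig `json:"write_table,omitempty"`
}

// Plan is the full operator graph.
type Plan struct {
	Ops []Op `json:"ops"`
}

// populatedConfigs counts the non-nil per-kind config fields on an Op.
func populatedConfigs(op Op) int {
	n := 0
	if op.ReadTable != nil {
		n++
	}
	if op.Limit != nil {
		n++
	}
	if op.WriteTable != nil {
		n++
	}
	return n
}

// examplePlan builds a three-node chain: read_table -> limit -> write_table.
func examplePlan() Plan {
	return Plan{Ops: []Op{
		{ID: "src", Kind: OpKindReadTable, ReadTable: &ReadTableConfig{Table: "lakekeeper.demo.events"}},
		{ID: "top", Kind: OpKindLimit, Inputs: []string{"src"}, Limit: &LimitConfig{N: 100}},
		{ID: "out", Kind: OpKindWriteTable, Inputs: []string{"top"},
			WriteTable: &WriteTableConfig{Table: "lakekeeper.demo.events_sample", Mode: "append"}},
	}}
}

// roundTrip serialises a Plan to JSON and parses it back.
func roundTrip(p Plan) (Plan, error) {
	raw, err := json.Marshal(p)
	if err != nil {
		return Plan{}, err
	}
	var out Plan
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	p, err := roundTrip(examplePlan())
	if err != nil {
		panic(err)
	}
	for _, op := range p.Ops {
		fmt.Printf("%s kind=%s inputs=%v configs=%d\n", op.ID, op.Kind, op.Inputs, populatedConfigs(op))
	}
}
```

Pointer-typed config fields with `omitempty` keep the wire format sparse, which is one plausible reason a later kind such as `OpKindJoin` could be added without breaking existing documents.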
- `read_table`: `lakekeeper.<ns>.<table>` at the current snapshot
- `filter`: `pipeline-expression` DSL string evaluates to `BOOLEAN`
- `project`: `Expr` = passthrough
- `rename`: `from → to` mapping
- `cast`: `pipeline-expression.PipelineType`
- `aggregate`: `GROUP BY` + the seven canonical aggregations (`sum`, `count`, `count_distinct`, `avg`, `stddev`, `min`, `max`)
- `union`
- `limit`
- `write_table`: `create_or_replace` (matches Spark's `df.writeTo(target).createOrReplace()`) or `append`

`join` is deferred to v2. Phase 0's inventory (`docs/migration/pipeline-runner-spark-to-go-inventory.md`) found that every concrete pipeline in the repo can be re-expressed with these nine; if a follow-up audit against production `pipeline_authoring.published_dag` turns up a pipeline that needs `join`, adding `OpKindJoin` is wire-compatible.

Validation
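The commit message names cycle detection via 3-colour DFS as one of the structural checks. The following is a minimal sketch of that technique, not the code in this PR: it works on a plain map from op ID to input IDs, and the function name `findCycleNodes` and the choice to report the op at which a back edge closes a cycle are assumptions. Dangling input references are skipped here on the assumption that a separate check reports them.

```go
package main

import (
	"fmt"
	"sort"
)

type colour int

const (
	white colour = iota // not visited yet
	grey                // on the current DFS path
	black               // fully explored
)

// findCycleNodes takes a map from op ID to the IDs of its inputs and returns
// the sorted IDs of ops that are re-entered while still on the DFS path,
// i.e. ops at which a cycle is closed. An empty result means the graph is acyclic.
func findCycleNodes(inputs map[string][]string) []string {
	state := make(map[string]colour, len(inputs))
	found := map[string]bool{}

	var visit func(id string)
	visit = func(id string) {
		state[id] = grey
		for _, up := range inputs[id] {
			switch state[up] {
			case white:
				// Skip references to unknown ops; they are not part of any cycle.
				if _, known := inputs[up]; known {
					visit(up)
				}
			case grey:
				// Back edge: up is still on the current path, so there is a cycle.
				found[up] = true
			}
		}
		state[id] = black
	}

	// Walk in sorted order so the result is deterministic.
	ids := make([]string, 0, len(inputs))
	for id := range inputs {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		if state[id] == white {
			visit(id)
		}
	}

	out := make([]string, 0, len(found))
	for id := range found {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	acyclic := map[string][]string{"src": nil, "f": {"src"}, "w": {"f"}}
	cyclic := map[string][]string{"a": {"b"}, "b": {"a"}, "c": {"c"}}
	fmt.Println(findCycleNodes(acyclic))
	fmt.Println(findCycleNodes(cyclic))
}
```

Because the walk continues after a back edge is found instead of returning early, every cycle entry point is collected in one pass, which matches the stated goal of reporting all findings at once.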
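`Plan.Validate()` returns a `ValidationErrors` slice so the authoring UI can highlight every broken node in one pass rather than one error at a time. The sentinel `ErrEmptyPlan` is reachable via `errors.Is`. Eleven structural checks, covering uniqueness, kind/config consistency, input arity by kind, source/terminal presence, cycle detection via 3-colour DFS, and per-kind invariants, including:

- Sources (`read_table`) have zero inputs.
- Per-kind invariants (`limit n > 0`, aggregation function token known, write mode valid, non-empty column names, no duplicate target columns, etc.).

Test plan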
- `go build ./...` (full repo)
- `go vet ./...` (full repo)
- `go test -race ./libs/pipeline-plan/...`: 4 sub-tests in `plan_test.go` + 11 sub-test groups in `validate_test.go` covering every error branch:
  - Valid plans (`transactions_clean` and `customer_metrics` from the Phase 0 inventory, plus a `rename + cast + limit` chain and a 2-source `union` so every v1 op is exercised by at least one valid case)
  - `ErrEmptyPlan` via `errors.Is`

Follow-ups
- `libs/pipeline-runtime/`: interpreter that executes a `Plan` against Iceberg via the `apache/iceberg-go` reads from Phase A and the HTTP append adapter from Phase B.
- Move `aggregate` from `planned` to `available` in `services/pipeline-build-service/internal/handler/transform_catalog.go:528` (catalog entry exists, runtime path is missing).
- `services/pipeline-build-service/internal/spark/spark.go` → emit `Plan` instead of `SparkApplication` CR; rename `spark` package to `dispatch`.
- `services/pipeline-runner/internal/runner/run.go` → fetch + execute `Plan`. Delete `--inline-sql` flags.
- Migrate `infra/dev/poc-pipeline-nodes.yaml` (6 pipelines with SQL → `Plan` JSON) and update READMEs.

🤖 Generated with Claude Code