feat(infra/dev): migrate PoC pipelines to pipeline-runner Job + Confi…#78
Merged
Merged
Conversation
…gMap (ADR-0045 Phase C.6)
Phase C.6 of ADR-0045 — the closing PR of Phase C. Migrates the dev
manifests in infra/dev/ from SparkApplication CRs that ran the
Scala pipeline-runner-spark JAR to batch/v1 Job + ConfigMap pairs
that run the Go pipeline-runner (Phase C.5) over a structured
pipelineplan.Plan (Phase C.1, executed by Phase C.2 against Iceberg).
Runner extension:
- internal/runner/args.go: new --plan-file flag. internal/runner/run.go
resolves the Plan from --plan-file > PIPELINE_PLAN_FILE env var >
PIPELINE_PLAN_B64 env var (the dispatcher's path). Dev YAMLs use
the file path so the Plan JSON stays readable in `kubectl describe
configmap`; the dispatcher (Phase C.4.a) keeps using the env var.
- Tests added for loadPlanFromFile happy path + missing file; one
existing test updated for the new "no Plan source" error message.
infra/dev manifests:
- poc-pipeline-nodes.yaml fully rewritten. Three pipelines ship as
Job + ConfigMap pairs:
* online-retail-clean (read → filter quantity>0 AND price>0 → project
+ derived revenue → write_table create_or_replace)
* online-retail-returns (same shape with WHERE quantity<0)
* online-retail-cust (read transactions_clean → aggregate group_by
customer_id with sum(revenue), count_distinct(invoice),
count_distinct(country) → write_table create_or_replace)
- spark-smoke.yaml → pipeline-runner-smoke.yaml: trivial one-row
read_table → limit 1 → write_table(append) for catalog
connectivity validation.
- spark-ingest-online-retail.yaml deleted. CSV DataSource API is
the Phase 0 inventory's flagged deferred case (re-route via
connector-management-service); no runner-side replacement.
- spark-rbac.yaml deleted. Spark Operator goes away in Phase D; the
pipeline-runner Jobs use the standard `pipeline-runner` service
account.
online-retail-anomalies is intentionally NOT migrated:
the CTE + CROSS JOIN shape needs the v2 `join` operator
(libs/pipeline-plan only ships union in v1, per ADR-0045 § Migration
plan / Phase C). When `join` lands the pipeline becomes a two-stage
decomposition; the inventory doc tracks it.
Documentation:
- services/pipeline-runner/README.md fully rewritten for the Phase
C surface (Plan-source priority order, CLI flags, env fallbacks,
Phase A discoveries bundled with IcebergReader, distroless image
size delta, dev manifest pointers).
- services/pipeline-runner-spark/README.md gains a SUPERSEDED
header pointing at the new flow + migration map for the three
Scala mains (PipelineRunner → pipeline-runner; IcebergToObjectStoreIndexer
→ iceberg-object-indexer (Phase A); ActionLogStreamSink →
action-log-sink (Phase B)).
- docs/migration/pipeline-runner-spark-to-go-inventory.md gains a §G
with the sub-PR map and the pre-requisite checklist showing what's
done (#1, #3) vs what's still pending (#2 CSV re-route, #4 live
smoke).
Test plan:
- go vet ./... clean across the full repo.
- go test -count=1 ./services/pipeline-runner/... green (all three
packages; new loadPlanFromFile tests + the updated empty-env
assertion).
- The dev YAMLs themselves are validated by kubectl apply against
the dev k3s cluster — that smoke is the Phase D entry gate per
ADR-0045 § Migration plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…gMap (ADR-0045 Phase C.6)
Phase C.6 of ADR-0045 — the closing PR of Phase C. Migrates the dev manifests in infra/dev/ from SparkApplication CRs that ran the Scala pipeline-runner-spark JAR to batch/v1 Job + ConfigMap pairs that run the Go pipeline-runner (Phase C.5) over a structured pipelineplan.Plan (Phase C.1, executed by Phase C.2 against Iceberg).
Runner extension:
kubectl describe configmap; the dispatcher (Phase C.4.a) keeps using the env var.infra/dev manifests:
pipeline-runnerservice account.online-retail-anomalies is intentionally NOT migrated: the CTE + CROSS JOIN shape needs the v2
joinoperator (libs/pipeline-plan only ships union in v1, per ADR-0045 § Migration plan / Phase C). Whenjoinlands the pipeline becomes a two-stage decomposition; the inventory doc tracks it.Documentation:
Test plan: