
Split amber CI tests into amber + amber-integration jobs #4870

@Yicong-Huang

Task Summary

The amber CI job currently runs every Scala test in WorkflowExecutionService (66 spec files) inside a single matrix entry that always installs both Scala and Python dependencies. Only a handful of tests actually need Python at runtime, because they spawn Python UDF workers through the e2e harness. The rest are pure-Scala unit tests that still pay for the Python install on every run. Because everything shares one job, a failure caused by the Python environment looks the same as an engine-internal regression.

Split into two jobs, incrementally:

  1. Add a class-level ScalaTest tag annotation, @IntegrationTest (FQN org.apache.texera.amber.tags.IntegrationTest), under amber/src/test/scala/...; ScalaTest recognizes class-level tag annotations, so every test in an annotated spec is tagged without per-test taggedAs(...) calls.
  2. Introduce a new amber-integration job in build.yml that mirrors the existing amber job's setup (JDK + sbt + Postgres), adds Python, and runs only tests tagged IntegrationTest: sbt 'WorkflowExecutionService/testOnly * -- -n org.apache.texera.amber.tags.IntegrationTest'.
  3. Modify the existing amber job to skip the same tag (-l ...) and drop its Python setup.
  4. Wire run_amber_integration through precheck / required-checks.yml so the new job is gated identically to amber.
  5. As the first migration, annotate engine/e2e/ReconfigurationSpec.scala (5 tests, the only e2e spec that actually spawns Python UDFs). Other e2e specs (DataProcessingSpec, PauseSpec, BatchSizePropagationSpec, PythonWorkflowWorkerSpec) can move in follow-up PRs as they are reviewed.
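The split relies on two properties of the tag: it must be visible at runtime on the spec class, and running `-n` in one job and `-l` in the other on the same tag must partition the specs so each one runs in exactly one job. The ScalaTest documentation requires tag annotations to be written in Java, annotated with `@org.scalatest.TagAnnotation`, `@Retention(RUNTIME)` and `@Target({METHOD, TYPE})`. The sketch below is a dependency-free model of the selection logic only, not ScalaTest's implementation. It omits `@TagAnnotation` so it compiles without ScalaTest. `PauseSpec` and `ReconfigurationSpec` are empty placeholder classes rather than the real specs, and `TagSplitSketch.select` is a hypothetical helper.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.texera.amber.tags.IntegrationTest. The real tag would
// additionally carry @org.scalatest.TagAnnotation (omitted: no ScalaTest dependency here).
@Retention(RetentionPolicy.RUNTIME)             // must survive to runtime so reflection can see it
@Target({ElementType.METHOD, ElementType.TYPE}) // TYPE allows tagging a whole spec class
@interface IntegrationTest {}

// Placeholder specs (hypothetical, empty): one opted in, one left untagged.
@IntegrationTest
class ReconfigurationSpec {}

class PauseSpec {}

public class TagSplitSketch {
    // include=true models `-n IntegrationTest` (amber-integration job);
    // include=false models `-l IntegrationTest` (amber job).
    static List<Class<?>> select(List<Class<?>> specs, boolean include) {
        List<Class<?>> out = new ArrayList<>();
        for (Class<?> spec : specs) {
            if (spec.isAnnotationPresent(IntegrationTest.class) == include) {
                out.add(spec);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Class<?>> all = List.of(ReconfigurationSpec.class, PauseSpec.class);
        System.out.println("amber: " + select(all, false));
        System.out.println("amber-integration: " + select(all, true));
    }
}
```

In the real test tree, a spec opts in with a single class-level annotation, e.g. `@IntegrationTest class ReconfigurationSpec extends ...`. Because the two jobs use the complementary `-n`/`-l` filters on one tag, any spec that is not annotated simply stays in amber, so migrating specs one PR at a time cannot drop or double-run a test.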

Task Type

  • Refactor / Cleanup
  • DevOps / Deployment / CI
