
feat: route computing-unit metadata over HTTP, off Postgres #5298

Draft
bobbai00 wants to merge 4 commits into apache:main from bobbai00:feat/cu-postgres-decoupling


@bobbai00 (Contributor)

What changes were proposed in this PR?

The computing unit (which runs user-defined functions) no longer needs direct Postgres access. Execution-metadata operations route to the Dashboard Service (TexeraWebApplication) via a new /api/internal/execution-metadata/* API (InternalExecutionMetadataResource) and an HTTP client (RemoteExecutionMetadata); dataset-path resolution routes to file-service's new GET /api/dataset/resolve (RemoteDatasetResolver + a one-line dispatch in FileResolver). Routing engages when SqlServer is uninitialized (the CU) and the forwarded user JWT is present; ComputingUnitMaster skips SqlServer.initConnection and DB cleanup when EXECUTION_METADATA_REMOTE=true. No behavior change with the flag off — the Dashboard Service keeps the direct-DB path. Also adds SqlServer.isInitialized and the new env-var names.
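The routing rule above is a decision on two inputs. A minimal Scala sketch, assuming the check reduces to a Boolean (what `SqlServer.isInitialized` reports) plus an optional forwarded JWT; `MetadataBackend`, `DirectDb`, `RemoteHttp`, and `MetadataRouting.choose` are hypothetical names, not the PR's classes:

```scala
// Hypothetical model of where execution-metadata calls are sent.
sealed trait MetadataBackend
case object DirectDb extends MetadataBackend
final case class RemoteHttp(userToken: String) extends MetadataBackend

object MetadataRouting {
  // Remote routing engages only when there is no local SqlServer connection (the CU)
  // AND a non-empty user JWT was forwarded. Everything else keeps the direct-DB path,
  // which is what the Dashboard Service uses. How the real code treats
  // "uninitialized but no token" is not described; this sketch simply falls through.
  def choose(sqlServerInitialized: Boolean, userToken: Option[String]): MetadataBackend =
    userToken.filter(_.nonEmpty) match {
      case Some(token) if !sqlServerInitialized => RemoteHttp(token)
      case _                                    => DirectDb
    }
}
```

Keeping the choice in one pure function makes its truth table easy to unit-test independently of any HTTP or JDBC code.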

Any related issues, documentation, discussions?

Part of #5011

How was this PR tested?

Unit tests for both HTTP clients (RemoteDatasetResolverSpec, RemoteExecutionMetadataSpec) using in-process HTTP servers — positive, 404→None, and error-status paths. Manually end-to-end: launched the full stack with the CU master in EXECUTION_METADATA_REMOTE=true mode (no STORAGE_JDBC_*; verified no Hikari/SqlServer init) against a Lakekeeper REST catalog; the IMDB example workflows (Movies, Iris) and workflow 2568 ran to completion with every execution-metadata and dataset-resolution call served over HTTP (confirmed in the Dashboard Service / file-service request logs).
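The client contract tested above (success, 404 → None, other statuses as errors) can be sketched with the JDK's `java.net.http` request builder. This is an illustration under assumptions, not the PR's `RemoteDatasetResolver`: the query parameter name `path`, the `Bearer` header scheme, and the names `DatasetResolveClientSketch` and `DatasetResolveException` are hypothetical, and actually sending the request is left out so the status mapping stays testable without a server.

```scala
import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets

// Hypothetical error type for non-2xx, non-404 responses.
final class DatasetResolveException(val status: Int, message: String)
    extends RuntimeException(message)

object DatasetResolveClientSketch {
  // Builds the GET request; the query parameter name "path" is an assumption.
  def buildRequest(fileServiceBase: String, datasetPath: String, userJwt: String): HttpRequest = {
    val query = "path=" + URLEncoder.encode(datasetPath, StandardCharsets.UTF_8)
    HttpRequest
      .newBuilder(URI.create(s"$fileServiceBase/api/dataset/resolve?$query"))
      .header("Authorization", s"Bearer $userJwt") // forward the user's JWT
      .GET()
      .build()
  }

  // Maps the three tested outcomes: 2xx -> Some(body), 404 -> None, anything else -> error.
  def interpret(status: Int, body: String): Option[String] =
    status match {
      case ok if ok >= 200 && ok < 300 => Some(body)
      case 404                         => None
      case other =>
        throw new DatasetResolveException(other, s"dataset resolve failed with HTTP $other: $body")
    }
}
```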

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

The computing unit runs user-defined functions yet shipped with Postgres
credentials (issue apache#5011). Add an opt-in path so the CU performs no direct
JDBC access:

- Execution metadata (create execution; runtime-stats / console / result
  URIs; latest-execution and result-URI lookup) routes to the Dashboard
  Service via a new /api/internal/execution-metadata/* API and an HTTP
  client; dataset-path resolution routes to file-service's new
  /api/dataset/resolve. Both forward the user's JWT.
- Routing is active when SqlServer is uninitialized (the CU) and the user
  token is present; ComputingUnitMaster skips SqlServer.initConnection and
  DB cleanup when EXECUTION_METADATA_REMOTE=true. No behavior change with
  the flag off — the Dashboard Service keeps the direct-DB path.
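Reading the opt-in flag once at startup keeps the DB-free mode behind a single switch. A hedged sketch: only `EXECUTION_METADATA_REMOTE` and `SqlServer.initConnection` come from the description; treating only the value `true` (case-insensitive) as "on", the other step labels, and `CuStartupSketch` are assumptions.

```scala
object CuStartupSketch {
  val RemoteFlag = "EXECUTION_METADATA_REMOTE"

  // Treats the flag as on only for the value "true" (case-insensitive, trimmed); unset means off.
  def metadataRemote(env: Map[String, String]): Boolean =
    env.get(RemoteFlag).exists(_.trim.equalsIgnoreCase("true"))

  // Ordered startup steps. The labels are illustrative; only SqlServer.initConnection
  // is a name taken from the PR description.
  def startupSteps(env: Map[String, String]): List[String] =
    if (metadataRemote(env)) List("start computing-unit master")
    else List("SqlServer.initConnection", "execution DB cleanup", "start computing-unit master")
}
```

In a master process this would be called as `CuStartupSketch.startupSteps(sys.env)`.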

Unit-tested both HTTP clients (RemoteDatasetResolver, RemoteExecutionMetadata).

codecov-commenter commented May 30, 2026

Codecov Report

❌ Patch coverage is 22.95918% with 151 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.02%. Comparing base (59f776c) to head (a6c34ba).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...e/texera/web/service/RemoteExecutionMetadata.scala 25.92% 38 Missing and 2 partials ⚠️
...b/resource/InternalExecutionMetadataResource.scala 0.00% 39 Missing ⚠️
...ard/user/workflow/WorkflowExecutionsResource.scala 40.47% 15 Missing and 10 partials ⚠️
...la/org/apache/texera/web/ComputingUnitMaster.scala 0.00% 19 Missing ⚠️
...web/service/ExecutionsMetadataPersistService.scala 0.00% 9 Missing and 3 partials ⚠️
...era/amber/core/storage/RemoteDatasetResolver.scala 60.86% 8 Missing and 1 partial ⚠️
...ache/texera/service/resource/DatasetResource.scala 0.00% 3 Missing ⚠️
...pache/texera/amber/core/storage/FileResolver.scala 0.00% 1 Missing and 1 partial ⚠️
...a/org/apache/texera/web/TexeraWebApplication.scala 0.00% 1 Missing ⚠️
...c/main/scala/org/apache/texera/dao/SqlServer.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5298      +/-   ##
============================================
- Coverage     49.17%   49.02%   -0.16%     
+ Complexity     2386     2380       -6     
============================================
  Files          1051     1054       +3     
  Lines         40350    40516     +166     
  Branches       4279     4313      +34     
============================================
+ Hits          19841    19861      +20     
- Misses        19352    19476     +124     
- Partials       1157     1179      +22     
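The headline figures can be cross-checked against the per-file table. A worked check, assuming Codecov counts both missing and partial lines as uncovered and computes coverage as hits / (hits + misses + partials); under that assumption the 151 uncovered lines and 22.95918% imply a 196-line patch with 45 covered lines. The project totals from the diff block are recomputed the same way.

```scala
object CoverageCheck {
  // Missing + partial lines per file, in table order, copied from the report above.
  val uncoveredPerFile: List[Int] = List(38 + 2, 39, 15 + 10, 19, 9 + 3, 8 + 1, 3, 1 + 1, 1, 1)
  val uncovered: Int = uncoveredPerFile.sum // should match the headline 151

  // Assumption: patch coverage = covered / (covered + uncovered).
  // Solving 1 - uncovered / total = 0.2295918 for total gives the patch size.
  val patchLines: Int = math.round(uncovered / (1.0 - 0.2295918)).toInt
  val covered: Int = patchLines - uncovered

  // Percentage truncated (not rounded) to `digits` decimals.
  def truncatedPercent(hits: Int, lines: Int, digits: Int): Double = {
    val scale = math.pow(10, digits)
    math.floor(hits.toDouble / lines * 100 * scale) / scale
  }

  // Project coverage from the diff block: hits / (hits + misses + partials).
  val baseProject: Double = truncatedPercent(19841, 19841 + 19352 + 1157, 2)
  val headProject: Double = truncatedPercent(19861, 19861 + 19476 + 1179, 2)
}
```

The sketch truncates because that reproduces the displayed digits; the report itself does not state Codecov's rounding rule.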
Flag Coverage Δ *Carryforward flag
access-control-service 41.89% <ø> (ø)
agent-service 33.76% <ø> (ø) Carriedforward from 59f776c
amber 51.26% <23.31%> (-0.42%) ⬇️
computing-unit-managing-service 0.00% <ø> (ø)
config-service 0.00% <ø> (ø)
file-service 38.12% <0.00%> (-0.31%) ⬇️
frontend 41.07% <ø> (ø) Carriedforward from 59f776c
python 90.79% <ø> (ø) Carriedforward from 59f776c
workflow-compiling-service 56.81% <ø> (ø)

*This pull request uses carry forward flags.


bobbai00 marked this pull request as draft on May 30, 2026 16:37
Bob Bai added 3 commits May 30, 2026 14:40
…ing-service

The computing unit re-compiled logical plans in-process (issue apache#5011), keeping
amber's WorkflowCompiler on the execution path. Route compilation to the
workflow-compiling-service over HTTP so the CU just runs a ready-made plan:

- WorkflowExecutionService.executeWorkflow POSTs the logical plan to
  /api/compile (CompilingServiceClient) and runs the returned PhysicalPlan; the
  runtime no longer needs the logical plan (Workflow holds None). A failed
  compile now reports the error and returns instead of dereferencing a null
  workflow. The dead in-process validateWorkflow is removed.
- PhysicalPlan is made JSON-serializable end to end: operator partition logic
  moves from closures to a DerivePartitionSpec ADT, with custom serdes for
  OpExecInitInfo, LocationPreference, OutputMode, and per-port PhysicalOp views,
  registered in a shared PhysicalPlanSerdeModule. WorkflowCompilationResource
  accepts workflow/execution ids and returns a typed success/failure response.
- Fix a latent LogicalLink round-trip (issue apache#5042): fromOpId / toOpId now
  serialize as bare strings (the shape the @JsonCreator constructor reads), so a
  re-serialized plan deserializes again.
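Replacing closures with an ADT is what makes the partition logic serializable: a closure is opaque code, while a case-class value can be written out and read back as data. A minimal sketch of that idea; the case names (`PassThrough`, `ToSingle`, `RehashOn`), the simplified `PartitionInfo`, the string encoding, and `PartitionSpecCodec` are all hypothetical stand-ins for the real `DerivePartitionSpec` and its Jackson serde.

```scala
// Simplified partitioning descriptions (hypothetical).
sealed trait PartitionInfo
case object UnknownPartition extends PartitionInfo
case object SinglePartition extends PartitionInfo
final case class HashPartition(keys: List[String]) extends PartitionInfo

// Data instead of a List[PartitionInfo] => PartitionInfo closure.
sealed trait DerivePartitionSpec
case object PassThrough extends DerivePartitionSpec            // keep the first input's partitioning
case object ToSingle extends DerivePartitionSpec               // collapse to one partition
final case class RehashOn(keys: List[String]) extends DerivePartitionSpec

object PartitionSpecCodec {
  // Interprets the spec, doing what the closure used to do.
  def derive(spec: DerivePartitionSpec, inputs: List[PartitionInfo]): PartitionInfo =
    spec match {
      case PassThrough    => inputs.headOption.getOrElse(UnknownPartition)
      case ToSingle       => SinglePartition
      case RehashOn(keys) => HashPartition(keys)
    }

  // Because the spec is plain data, it round-trips through a textual form.
  def encode(spec: DerivePartitionSpec): String = spec match {
    case PassThrough    => "pass-through"
    case ToSingle       => "single"
    case RehashOn(keys) => "rehash:" + keys.mkString(",")
  }

  def decode(text: String): DerivePartitionSpec = text match {
    case "pass-through" => PassThrough
    case "single"       => ToSingle
    case s if s.startsWith("rehash:") =>
      val rest = s.stripPrefix("rehash:")
      RehashOn(if (rest.isEmpty) Nil else rest.split(",").toList)
    case other => throw new IllegalArgumentException(s"unknown partition spec: $other")
  }
}
```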

Adds round-trip serde tests (hand-built, compiler-produced, and a thorough
multi-operator suite) and the LogicalLink round-trip regression.
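The LogicalLink fix is about shape symmetry: whatever the writer emits for `fromOpId`/`toOpId` must be what the reader's constructor accepts. A dependency-free sketch of that round-trip, using a `Map[String, String]` in place of JSON; the simplified fields (plain `Int` ports), `LogicalLinkCodec`, and its rejection of object-shaped ids are illustrative assumptions, whereas the real fix is a Jackson string serializer.

```scala
final case class OperatorIdentity(id: String)
// Simplified: plain Int ports stand in for the real port identities.
final case class LogicalLink(
    fromOpId: OperatorIdentity,
    fromPortId: Int,
    toOpId: OperatorIdentity,
    toPortId: Int
)

object LogicalLinkCodec {
  // Writer: operator ids are emitted as bare strings ("op-1"), not wrapped objects.
  def write(link: LogicalLink): Map[String, String] = Map(
    "fromOpId" -> link.fromOpId.id,
    "fromPortId" -> link.fromPortId.toString,
    "toOpId" -> link.toOpId.id,
    "toPortId" -> link.toPortId.toString
  )

  // Reader: mirrors a creator that expects bare-string ids, so an object-shaped value fails.
  def read(fields: Map[String, String]): LogicalLink = {
    def opId(key: String): OperatorIdentity = {
      val raw = fields(key)
      require(!raw.trim.startsWith("{"), s"$key must be a bare string, got $raw")
      OperatorIdentity(raw)
    }
    LogicalLink(opId("fromOpId"), fields("fromPortId").toInt, opId("toOpId"), fields("toPortId").toInt)
  }
}
```

A regression test then only needs `read(write(link)) == link`.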

amber and the workflow-compiling-service each carried a near-duplicate WorkflowCompiler and
logical-plan model (LogicalPlan, LogicalPlanPojo, LogicalLink); the duplication was even
flagged in-code ("we should consider merge this compile with WorkflowCompilingService's").
Move the canonical pieces into the shared workflow-operator module:

- org.apache.texera.amber.compiler.model.{LogicalLink, LogicalPlan, LogicalPlanPojo} now live
  in workflow-operator as the single copy (the LogicalLink string serializer from apache#5042 is
  preserved). The duplicates in amber and workflow-compiling-service are deleted.
- PhysicalPlanExpander.expand holds the logical-to-physical expansion the two compilers shared.
- Each compiler is now a thin wrapper over it: amber's adds result-storage planning and builds
  a runtime Workflow (carrying the logical plan) for execution; the compiling-service's adds
  output-schema collection and error reporting for the editor.
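The resulting layering (one shared expansion, two thin wrappers that add different post-steps) can be sketched with toy types. Everything below is hypothetical and heavily simplified: the real `PhysicalPlanExpander.expand` handles ports, schemas, and multi-op expansions, and the object names here are not the PR's.

```scala
final case class LogicalOp(id: String, kind: String)
final case class LogicalPlan(ops: List[LogicalOp])
final case class PhysicalOp(id: String, logicalId: String)
final case class PhysicalPlan(ops: List[PhysicalOp])

object PhysicalPlanExpanderSketch {
  // The one shared expansion: here each logical operator becomes a single physical op.
  def expand(plan: LogicalPlan): PhysicalPlan =
    PhysicalPlan(plan.ops.map(op => PhysicalOp(s"${op.id}-main", op.id)))
}

// amber-side wrapper: expand, then plan result storage; keeps the logical plan for the runtime.
final case class RuntimeWorkflow(logical: LogicalPlan, physical: PhysicalPlan, storedResults: Set[String])

object AmberCompilerSketch {
  def compile(plan: LogicalPlan, opsToViewResult: Set[String]): RuntimeWorkflow = {
    val physical = PhysicalPlanExpanderSketch.expand(plan)
    val known = plan.ops.map(_.id).toSet
    RuntimeWorkflow(plan, physical, opsToViewResult.intersect(known))
  }
}

// compiling-service wrapper: expand, then report per-operator information for the editor.
object EditorCompilerSketch {
  def compile(plan: LogicalPlan): (PhysicalPlan, Map[String, String]) = {
    val physical = PhysicalPlanExpanderSketch.expand(plan)
    // Placeholder "schema"; the real service infers output schemas and collects errors.
    val schemas = plan.ops.map(op => op.id -> s"${op.kind}-output").toMap
    (physical, schemas)
  }
}
```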

amber's references migrate to the shared model package. No behavior change: the logical-link
round-trip, compiler-produced physical-plan serde, and storage-port collection specs all pass.
… JWT auth

The client (frontend and agent service) now compiles the workflow against the
workflow-compiling-service and ships the ready-to-run PhysicalPlan to the
Computing Unit, which runs it directly — no in-process or HTTP compilation, and
no JWT authentication, so the CU no longer needs the JWT secret (issue apache#5011).

Frontend:
- ExecuteWorkflowService compiles via WorkflowCompilingService on Run and sends
  WorkflowExecuteRequest{physicalPlan, opsToViewResult}; on a compile failure it
  surfaces the error and does not start a run.
- The workflow websocket URL no longer carries an access token.
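The run flow (compile first, send only on success) is sketched below. The frontend is TypeScript; this models only the control flow, in Scala to match the other examples. `WorkflowExecuteRequest` with `physicalPlan` and `opsToViewResult` comes from the description; `CompileResult`, its cases, the simplified `PhysicalPlan`, and `RunWorkflowSketch` are hypothetical, with `compile` and `send` injected so no server is needed.

```scala
final case class PhysicalPlan(operatorIds: List[String])
final case class WorkflowExecuteRequest(physicalPlan: PhysicalPlan, opsToViewResult: List[String])

sealed trait CompileResult
final case class CompileSuccess(physicalPlan: PhysicalPlan) extends CompileResult
final case class CompileFailure(errors: Map[String, String]) extends CompileResult

object RunWorkflowSketch {
  // compile and send are injected so the control flow can be exercised without a server.
  def run(
      compile: () => CompileResult,
      send: WorkflowExecuteRequest => Unit,
      opsToViewResult: List[String]
  ): Either[Map[String, String], WorkflowExecuteRequest] =
    compile() match {
      case CompileFailure(errors) =>
        Left(errors) // surface the errors; nothing is sent, so no run starts
      case CompileSuccess(plan) =>
        val request = WorkflowExecuteRequest(plan, opsToViewResult)
        send(request)
        Right(request)
    }
}
```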

Computing Unit:
- ComputingUnitMaster drops setupJwtAuth and RolesAllowedDynamicFeature, keeping
  only the SessionUser value-factory binder so @Auth parameters on co-registered
  dashboard resources stay injectable; it registers PhysicalPlanSerdeModule.
- ServletAwareConfigurator's single-node handshake no longer parses a token.
- SyncExecutionResource (/run) accepts a PhysicalPlan and drops @Auth/@RolesAllowed.
- WorkflowExecutionService runs request.physicalPlan; InternalExecutionMetadataResource
  falls back to the metadata caller's uid when the CU sends none.
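The uid fallback amounts to an `Option` default. A sketch assuming the CU's create-execution request carries an optional uid; `CreateExecutionRequest`, its fields, and `ExecutionOwnerSketch` are hypothetical.

```scala
// Hypothetical request body: the uid is optional because the CU no longer authenticates users.
final case class CreateExecutionRequest(wid: Int, uid: Option[Int])

object ExecutionOwnerSketch {
  // Prefer the uid in the request; otherwise use the uid of the authenticated metadata caller.
  def ownerUid(request: CreateExecutionRequest, callerUid: Int): Int =
    request.uid.getOrElse(callerUid)
}
```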

Cleanup:
- Remove the now-unused CompilingServiceClient and WORKFLOW_COMPILING_SERVICE_ENDPOINT
  env var left from the abandoned CU-side compilation offload.

Test:
- Add ClientPhysicalPlanRequestSpec covering the physical-plan request round-trip.
github-actions bot added the frontend and amber-integration labels on Jun 1, 2026

Labels

amber-integration, common, engine, feature, frontend (Changes related to the frontend GUI), platform (Non-amber Scala service paths)
