Split LICENSE-binary / NOTICE-binary per module so each Docker image ships only the bundled deps #4667

@bobbai00

What happened?

Today every Docker image (texera-access-control-service, texera-config-service, texera-file-service, texera-workflow-compiling-service, texera-workflow-computing-unit-managing-service, texera-dashboard-service, texera-workflow-execution-coordinator, texera-workflow-execution-runner, texera-agent-service) ships the same monolithic root LICENSE-binary and NOTICE-binary. These files contain the union of every third-party dependency across all services, the frontend, agent-service, and Python.

The issue: the LICENSE in each image overstates what is actually bundled in that image. For example, texera-access-control-service ships only 116 third-party jars but its /texera/LICENSE references ~860 entries spanning frontend Angular packages, Python packages, agent-service npm packages, and jars only relevant to other services. ASF's licensing guidance is that the LICENSE in a binary distribution must describe the contents of that distribution.

How to reproduce?

  1. Count the third-party entries listed in the LICENSE: docker run --rm --entrypoint sh ghcr.io/apache/texera-access-control-service:latest -c 'grep -c "^ - " /texera/LICENSE'
  2. Count the files actually bundled under lib/: docker run --rm --entrypoint sh ghcr.io/apache/texera-access-control-service:latest -c 'ls /texera/lib/ | wc -l'
  3. Compare the two counts: the LICENSE references many more third-party components than the image actually bundles.
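
The two counts from the steps above can also be compared in one pass. This is a minimal sketch, assuming the LICENSE lists one third-party component per line starting with " - " (as the grep pattern in step 1 implies). The function names count_license_entries, count_bundled_files, and report_license_gap are hypothetical helpers, not part of Texera.

```shell
#!/bin/sh
# Hypothetical helpers: they take a LICENSE file and a lib directory,
# for example /texera/LICENSE and /texera/lib inside an image.

# Count LICENSE lines that start with " - " (same pattern as step 1).
count_license_entries() {
  grep -c '^ - ' "$1"
}

# Count the entries in a lib directory (same as `ls | wc -l` in step 2).
# tr strips the padding that some wc implementations add.
count_bundled_files() {
  ls "$1" | wc -l | tr -d ' '
}

# Print both counts and how many more entries are listed than bundled.
report_license_gap() {
  listed=$(count_license_entries "$1")
  bundled=$(count_bundled_files "$2")
  echo "listed=$listed bundled=$bundled excess=$((listed - bundled))"
}
```

Inside an image, the call would be report_license_gap /texera/LICENSE /texera/lib. A large positive excess indicates the LICENSE describes more than the image bundles.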

Expected behavior

Each Docker image's /texera/LICENSE should describe only the third-party components actually bundled in that image:

  • Standalone Scala services: only the jars under their lib/.
  • WorkflowExecutionService images (dashboard, coordinator, runner): the amber/ jars, plus Python deps for coordinator and runner, plus the frontend Angular bundle for the dashboard.
  • texera-agent-service: only the bun-installed npm packages.
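
The expected per-image scope listed above can be written down as a lookup, which a build script could consult when assembling each image's LICENSE. This is only a sketch: the function name expected_license_groups and the group labels (jars, amber-jars, python, frontend, npm) are hypothetical, and treating the five remaining images as the "standalone Scala services" is an assumption drawn from the image list in this issue.

```shell
#!/bin/sh
# Hypothetical lookup: image name -> dependency groups its LICENSE should cover.
# Group labels are illustrative and do not correspond to existing files.
expected_license_groups() {
  case "$1" in
    texera-dashboard-service)
      # amber/ jars plus the frontend Angular bundle
      echo "amber-jars frontend" ;;
    texera-workflow-execution-coordinator|texera-workflow-execution-runner)
      # amber/ jars plus Python deps
      echo "amber-jars python" ;;
    texera-agent-service)
      # only the bun-installed npm packages
      echo "npm" ;;
    texera-access-control-service|texera-config-service|texera-file-service|texera-workflow-compiling-service|texera-workflow-computing-unit-managing-service)
      # assumed standalone Scala services: only the jars under lib/
      echo "jars" ;;
    *)
      echo "unknown image: $1" >&2
      return 1 ;;
  esac
}
```

For example, expected_license_groups texera-agent-service prints npm, so that image's LICENSE would omit all jar, Python, and frontend entries.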

Version

1.1.0-incubating (Pre-release/Master)
