Add Coordinator Layer and Java Provider#65958

Draft
jason810496 wants to merge 48 commits intoapache:mainfrom
astronomer:task-sdk/feature/coordinator-interface

Conversation


@jason810496 jason810496 commented Apr 27, 2026

Add Coordinator Layer and Java Provider

  1. Add Java SDK #65956
  2. Add Coordinator Layer and Java Provider #65958 (this PR)
  3. Add CI, E2E Tests, and Pre-commit Hooks for Java SDK #65959
  • Try it out: A combined PoC branch with all changes cherry-picked is available at [DON'T MERGE] Java SDK All #65960 for reviewers who want to test the full integration end-to-end.

Why

Airflow's DAG file processor and task runner only understand Python. To run DAGs and tasks authored in other languages (Java now, Go/Rust later), both the parsing pipeline and the execution pipeline need a language-agnostic extension point that delegates to an external runtime subprocess.

How

The Coordinator Abstraction

A new BaseCoordinator base class in the Task SDK (task-sdk/src/airflow/sdk/execution_time/coordinator.py) defines the extension point. Language providers subclass it and implement three methods:

Method                                   Purpose
can_handle_dag_file(bundle_name, path)   File discovery (e.g., "is this a valid JAR that we can parse?")
dag_parsing_runtime_cmd(...)             Returns the subprocess command for DAG parsing
task_execution_runtime_cmd(...)          Returns the subprocess command for task execution

The base class owns the full subprocess lifecycle: TCP server creation, subprocess spawning, connection acceptance, and a selector-based byte-forwarding bridge between the Airflow supervisor (fd 0) and the language runtime (TCP socket). The shared I/O loop is extracted into selector_loop.py and reused by WatchedSubprocess.
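Concretely, a language provider only supplies the three hooks above; the base class does the rest. The sketch below illustrates the shape of such a subclass. Only the three method names come from this PR; the class name, `runtime_name`/`file_extensions` attributes, command strings, and signatures are hypothetical:

```python
from __future__ import annotations


class ExampleCoordinator:
    # In the real Task SDK this would subclass BaseCoordinator, which owns
    # the TCP server, subprocess spawning, and the selector-based bridge;
    # the class stands alone here purely for illustration.
    runtime_name = "example"
    file_extensions = (".exdag",)  # hypothetical extension

    def can_handle_dag_file(self, bundle_name: str, path: str) -> bool:
        # File discovery: claim only files this runtime can parse.
        return path.endswith(self.file_extensions)

    def dag_parsing_runtime_cmd(self, path: str, comm_addr: str) -> list[str]:
        # Command the base class spawns for DAG parsing; bytes are then
        # bridged between the supervisor (fd 0) and this subprocess.
        return ["example-runtime", "parse", path, f"--comm={comm_addr}"]

    def task_execution_runtime_cmd(self, path: str, comm_addr: str) -> list[str]:
        # Command spawned to execute a single task instance.
        return ["example-runtime", "run", path, f"--comm={comm_addr}"]
```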

Discovery and Routing

Providers register coordinators in provider.yaml under a new coordinators key. ProvidersManager (airflow-core) and ProvidersManagerTaskRuntime (task-sdk) both discover them:

  • DAG Parsing: DagFileProcessorProcess._resolve_processor_target() iterates registered coordinators — the first whose can_handle_dag_file() returns True handles the file.
  • Task Execution: task_runner._resolve_runtime_entrypoint() uses a two-step resolution: first it consults the [sdk] queue_to_sdk mapping (queue name to coordinator runtime name), then it falls back to matching DAG file extensions against registered coordinators.
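The two-step task-execution resolution can be sketched roughly as follows. Only the queue_to_sdk mapping and the fallback order come from the PR description; the function name and coordinator attributes are illustrative:

```python
def resolve_runtime(queue, dag_rel_path, queue_to_sdk, coordinators):
    """Pick the coordinator for a task, or None to use the Python path."""
    # Step 1: an explicit [sdk] queue_to_sdk entry wins.
    runtime_name = queue_to_sdk.get(queue)
    if runtime_name is not None:
        for coord in coordinators:
            if coord.runtime_name == runtime_name:
                return coord
    # Step 2: fall back to matching the DAG file extension against
    # registered coordinators.
    for coord in coordinators:
        if dag_rel_path.endswith(tuple(coord.file_extensions)):
            return coord
    return None
```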

Queue-Based Runtime Routing

Tasks are routed to non-Python runtimes via their queue assignment and a configuration mapping. Operators set queue="java-queue" (or any custom queue name), and the [sdk] queue_to_sdk config maps queue names to coordinator runtime names:

[sdk]
queue_to_sdk = {"java-queue": "java"}

This avoids adding new columns or API fields -- the existing queue field carries the routing signal from scheduling to execution, and the mapping is resolved at task execution time.

Java Provider

A new apache-airflow-providers-sdk-java provider implements JavaCoordinator:

  • can_handle_dag_file: checks that the file is a JAR with valid Airflow Java SDK manifest attributes
  • dag_parsing_runtime_cmd: constructs java -classpath <bundle>/* <MainClass> --comm=... --logs=...
  • task_execution_runtime_cmd: handles both pure Java DAGs (JAR path) and Python stub DAGs (resolves the bundle from the [java] bundles_folder config)
  • get_code_from_file: extracts the embedded .java source from the JAR for Airflow UI display
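A JAR is a ZIP archive with a META-INF/MANIFEST.MF entry, so the discovery check can be sketched as below. The manifest attribute name is a placeholder; the PR does not spell out which attributes JavaCoordinator actually inspects:

```python
import zipfile


def looks_like_airflow_jar(path: str) -> bool:
    # Short-circuit on the .jar suffix, mirroring the fix described in the
    # commit log: .py paths nested inside ZIP DAGs would otherwise raise
    # NotADirectoryError when opened as archives.
    if not path.endswith(".jar"):
        return False
    try:
        with zipfile.ZipFile(path) as jar:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
    except (OSError, KeyError, zipfile.BadZipFile):
        # Missing file, missing manifest entry, or not a real ZIP:
        # this coordinator simply does not claim the file.
        return False
    # "Airflow-Java-SDK" is a hypothetical attribute name for illustration.
    return "Airflow-Java-SDK" in manifest
```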

What

Task SDK (task-sdk/)

  • Add BaseCoordinator abstract base class with full subprocess bridge lifecycle
  • Add selector_loop.py — shared selector-based I/O utilities, refactored out of supervisor.py
  • Add _resolve_runtime_entrypoint() to task_runner.py with queue-based and file-extension-based dispatch
  • Add QueueToCoordinatorMapper for resolving queue names to coordinators via [sdk] queue_to_sdk config
  • Extract resolve_bundle() helper for reuse by both Python and coordinator paths
  • Register coordinators discovery in ProvidersManagerTaskRuntime

Airflow Core (airflow-core/)

  • Add [sdk] queue_to_sdk configuration option for queue-to-runtime mapping
  • Extend DagFileProcessorProcess.start() with _resolve_processor_target() for coordinator delegation
  • Extend DagFileProcessorManager to recognize runtime file extensions (e.g., .jar) and skip ZIP inspection for them
  • Extend DagCode.get_code_from_file() to delegate to coordinator's get_code_from_file()
  • Add coordinators extension point to provider.yaml.schema.json and provider_info.schema.json
  • Register coordinators discovery in ProvidersManager

Java Provider (providers/sdk/java/)

  • Add JavaCoordinator with DAG parsing, task execution, and code extraction
  • Add BundleScanner for JAR manifest inspection and bundle resolution
  • Add provider.yaml with coordinators registration and [java] bundles_folder config
  • Add provider packaging (pyproject.toml, docs, LICENSE, NOTICE)
  • Add java_sdk_setup.sh for Breeze development environment

Was generative AI tooling used to co-author this PR?

Co-authored-by: Tzu-ping Chung uranusjr@gmail.com

@jason810496 jason810496 removed the backport-to-v3-2-test label Apr 27, 2026
@jason810496 jason810496 self-assigned this Apr 27, 2026
@uranusjr uranusjr added the AIP-108: java-sdk and AIP-108: Coordinator labels Apr 28, 2026
Comment thread task-sdk/src/airflow/sdk/definitions/mappedoperator.py Outdated
@jason810496 jason810496 force-pushed the task-sdk/feature/coordinator-interface branch from 688d569 to 59d5a47 Compare April 28, 2026 11:02
Multi-Language extras
=====================

These are extras that add dependencies needed for integration with other language runtimes. Currently we only have the Java SDK related extra, but in the future we might add more extras for other language runtimes.
Contributor

Go SDK would not be listed here?

Member Author

@jason810496 jason810496 Apr 29, 2026

Once the go-sdk adopts the coordinator interface as a provider, I will update the description here to avoid confusion.

Comment thread airflow-core/src/airflow/config_templates/config.yml Outdated
from airflow.providers_manager import ProvidersManager

extensions: list[str] = []
for coordinator_cls in ProvidersManager().coordinators:
Contributor

Would we assume that (if there are multiple) all Dag parsers load the language interpreters? I could well imagine spinning up one (or multiple) Dag parsers for Python and one additional for Java, then deploying the JAR and JDK only to the instances where needed... and on the Python Dag parser adding the GitSyncBundle... (which on the Java side probably is not used).

Not sure if everybody likes to deploy a JDK into each Dag parser environment.

Member Author

The answer is similar to the #65956 (comment) discussion.

DAG parsing for pure-<lang> DAGs is only enabled if the dag-processor installs the target sdk.<lang> provider.

Comment thread task-sdk/src/airflow/sdk/execution_time/workloads/task.py

def _start_server() -> socket.socket:
"""Create a TCP server socket bound to a random port on localhost."""
server = socket.socket()
Contributor

As in the other PR - I am sceptical that TCP sockets should be used, and I do not think it is a good idea to define a proprietary protocol.

Member

On the other hand, the Python implementation in use since 3.0 already uses the same mechanism. (It just creates the TCP sockets in another way.)

@jason810496 jason810496 force-pushed the task-sdk/feature/coordinator-interface branch 6 times, most recently from d2f28c8 to 52dcb2a Compare April 30, 2026 14:33
Member

@uranusjr uranusjr left a comment

Same for many other tests

],
)
@time_machine.travel("2025-01-01 00:00:00", tick=False)
@time_machine.travel(datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc), tick=False)
Member

Suggested change
@time_machine.travel(datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc), tick=False)
@time_machine.travel(datetime(2025, 1, 1, tzinfo=timezone.utc), tick=False)

jason810496 added 18 commits May 4, 2026 15:52
- Update JavaCoordinator to use TaskInstanceDTO
- add compatibility check for Airflow >= 3.3.0
- Updated the Airflow issue template to include 'sdk-java' as an option.
- Added unit tests for JavaCoordinator functionality.
- Created a new test file for Java bundle scanning.
- Updated uv.lock to reflect new dependency requirements for tomli.
Replace TaskInstance with TaskInstanceDTO in StartupDetails fixtures
and add the required pool_slots, queue, and priority_weight fields.
DagCode.get_code_from_file probes every coordinator's can_handle_dag_file
on each fileloc, including .py paths nested inside ZIP DAGs (e.g.
test_zip.zip/test_zip.py). The Java coordinator opened these as JAR
files, raising NotADirectoryError because the parent path is a ZIP file
rather than a directory. Short-circuit on the .jar suffix and add
NotADirectoryError to the suppressed exceptions for safety.
The config.yml description duplicated the example field as a literal
"Example:" line in the description text. With --include-descriptions
this rendered as "# Example:", which trips
test_cli_show_config_shows_descriptions. The example is already in the
dedicated example field, so remove the duplicate from the description.
apache-airflow-providers-sdk-java requires apache-airflow>=3.3.0, so
installing it against the 2.11.1 / 3.0.6 / 3.1.8 / 3.2.1 compat
targets fails dependency resolution. Add it to remove-providers for
each older-Airflow row in PROVIDERS_COMPATIBILITY_TESTS_MATRIX.

Also silence mypy no-redef on dev/registry tomli fallback imports,
which now trip the mypy-dev hook because tomli is resolvable in the
mypy environment after recent uv.lock updates.
Import TaskInstanceDTO from the same airflow.sdk._shared.workloads
namespace that BaseCoordinator uses. The previous import via
airflow._shared.workloads pointed at the same physical file via a
symlink but mypy treated the two namespaces as distinct types,
flagging the override as a Liskov violation.
* Add 'sdk' to empty_subpackages in provider_conf so the autoapi-
  generated _api/airflow/providers/sdk/index.rst is excluded the
  same way the other namespace-only directories are. Without this,
  Sphinx warned that the document was not in any toctree.
* Fix the relative include paths in security.rst and installing-
  providers-from-sources.rst. Nested providers (those under a
  namespace package like sdk/) sit one directory deeper than
  flat providers, so the include needs four ../ segments instead
  of three to reach devel-common/src/sphinx_exts/includes/.
- Removed the shared workloads dependency from pyproject.toml and related files.
- Deleted the workloads directory and its references in the codebase.
- Refactored imports of TaskInstanceDTO to point to the new location in execution_time.workloads.task.
- Introduced new files for TaskInstanceDTO and its base class in the execution_time module.
- Updated tests to reflect the changes in TaskInstanceDTO imports.
@jason810496 jason810496 force-pushed the task-sdk/feature/coordinator-interface branch from 52dcb2a to 7f76aad Compare May 4, 2026 07:53
@eladkal
Contributor

eladkal commented May 4, 2026

I may be late to join the party and it may have already been discussed, but I don't understand the Java provider part.
We don't have a Go provider.
Since we previously took some wrong decisions about providers (fab?), I want to verify that having a Java provider is something we want to have. We should also account for the question: is it confusing for users?
Providers will appear in the doc registry, so users will find Java there but not Go.

Member Author

@jason810496 jason810496 left a comment

I may be late to join the party and it may have already been discussed, but I don't understand the Java provider part. We don't have a Go provider.

Hi Elad, thanks for the review, and you're still welcome to join in :)

Since we previously took some wrong decisions about providers (fab?), I want to verify that having a Java provider is something we want to have.

I just copied from the Dev List:

From my perspective, having each language implementation as a provider decouples the release lifecycle for each language. The core interface still lives in TaskSDK (sdk/execution_time/coordinator.py), and each language SDK just needs to implement a few methods to "translate" how to start the subprocess for the target language.

We should also account for the question: is it confusing for users? Providers will appear in the doc registry, so users will find Java there but not Go.

The Go-SDK "works", but only via the Edge-Worker plus EdgeExecutor approach. However, the new coordinator interface works with any executor (local, celery, kubernetes, etc.) as we add the interface at the TaskSDK layer!

You're right. It will only be confusing for "now", as we haven't raised the refactor PRs for the Go-SDK and the corresponding provider yet. (They're already WIP; we will raise them after the current PR settles down.)

According to the timeline, both Java-SDK and Go-SDK (with the coordinator interface refactor and the provider) will land in 3.3.

@ashb
Member

ashb commented May 5, 2026

I wonder if this could be apache-airflow-coordinator-java as the PyPI package name, and airflow.coordinator.java.* as the module names?

I haven't been following exactly, but it's just imported as a class name right?

@jscheffl
Contributor

jscheffl commented May 5, 2026

I wonder if this could be apache-airflow-coordinator-java as the PyPI package name, and airflow.coordinator.java.* as the module names?

I haven't been following exactly, but it's just imported as a class name right?

Love the idea - starting early to mark this a new provider type!

@uranusjr
Member

uranusjr commented May 5, 2026

Should the subproject still be in providers?

@jason810496
Member Author

jason810496 commented May 6, 2026

I wonder if this could be apache-airflow-coordinator-java as the PyPI package name, and airflow.coordinator.java.* as the module names?

I haven't been following exactly, but it's just imported as a class name right?

There are several thoughts that come to my mind with this direction.
Could you elaborate more on it when you have a moment? @ashb
Thanks.

a) Regarding the public interface for Airflow-Core / TaskSDK to interact with Coordinator:

  1. Use the existing ProvidersManager (keep it as-is, since ProvidersManager is currently the only interface for interacting with the optional dependencies installed by users)
  2. Introduce a new CoordinatorManager (probably living in the _shared lib, as both Airflow-Core and TaskSDK require it)

b) Regarding the module structure:

  1. Keep it just a name for a smaller subset of "provider" (or a new provider type, as @jscheffl said), but the coordinator implementations (subclasses) still live in providers/ (as @uranusjr said):
airflow-core/
task-sdk/
providers/
├── airbyte
├── akeyless
├── ...
└── coordinators/
    ├── executable
    └── java
  2. The coordinator implementations (subclasses) live in a new top-level coordinators/ folder:
airflow-core/
coordinators/
task-sdk/
providers/

@eladkal
Contributor

eladkal commented May 6, 2026

I am still confused about why coordinators need to be under providers. It feels like the only motivation for it is to leverage that providers have a separate release cycle from core. Please note that our Python Client also has a separate release from core. We can establish a similar release process for coordinators as well. I really don't see the user value in scoping coordinators under providers.

@ashb
Member

ashb commented May 6, 2026

Mostly my reason for not wanting the coordinators to live under "providers" is that 99% of those are "things you use in a Python dag" and I'd like us to keep it that way and move more things that way (i.e. move Edge and Celery out of the providers namespace over time).

As Elad said, they can be released separately, that doesn't require them to be in providers.

Given all that, I'm thinking of this sort of repo layout:

sdk/
    coordinators/
        java/
    python/
        src/airflow/sdk/...
    java/
        src/java/org/apache/airflow/...
    golang/

(As we otherwise start to have more and more top level folders and that is getting messy)

@jason810496
Member Author

jason810496 commented May 6, 2026

If users install apache-airflow-coordinators-java properly, Airflow-Core and TaskSDK should be able to import airflow.sdk.coordinators.java.JavaCoordinator with import_string directly, instead of involving the heavy ProvidersManager at all.
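For reference, import_string-style dynamic loading is just an attribute lookup on an imported module; a minimal stand-in (not Airflow's actual helper, and the airflow.sdk.coordinators.java path above remains hypothetical until the package layout is settled) looks like:

```python
from importlib import import_module


def import_string(dotted_path: str):
    # Minimal stand-in for Airflow's import_string utility: split the
    # dotted path into module and attribute, import the module, and
    # return the attribute (e.g. a coordinator class).
    module_path, _, name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), name)
```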


Labels

AIP-108: Coordinator, AIP-108: java-sdk, area:ConfigTemplates, area:DAG-processing, area:dev-tools, area:Executors-core, area:providers, area:task-sdk, kind:documentation, provider:standard
