Skip to content

Emit OpenTelemetry spans from airflow CLI entry points#66789

Open
1fanwang wants to merge 2 commits into
apache:mainfrom
1fanwang:feat/cli-otel-spans
Open

Emit OpenTelemetry spans from airflow CLI entry points#66789
1fanwang wants to merge 2 commits into
apache:mainfrom
1fanwang:feat/cli-otel-spans

Conversation

@1fanwang
Copy link
Copy Markdown
Contributor

@1fanwang 1fanwang commented May 12, 2026

Observability gap on our LinkedIn DI Airflow setup: scheduler and worker spans flow through our OTel pipeline, but operations issued via the airflow CLI (manual triggers, backfills, ad-hoc clears) don't emit spans. So when someone reports an unexpected Dag state, we can't trace it back through OTel to find the CLI command that caused it. This PR emits spans at the CLI entry points.

AIP-59 added OpenTelemetry traces inside the scheduler, dag processor, and task supervisor. The CLI entry points that drive those subsystems (airflow tasks test, airflow dags trigger, airflow dags test, airflow backfill create) are not wrapped in spans, so when one of these commands is invoked from a wrapper that already has trace context — a CI pipeline, a parent workflow, a debug harness — the inbound trace dies at the CLI binary and the resulting task and DagRun spans show up as a separate trace. This PR wires those four entry points into the existing tracer so the caller's trace and Airflow's downstream spans stitch into a single distributed trace.

Problem

airflow tasks test ... shelled out from a developer's terminal or airflow dags trigger ... issued by an external orchestrator both create a meaningful unit of work inside Airflow. Today there's no span to anchor that work to the caller's trace. The caller has to manually correlate trace IDs via logs, or accept a broken trace boundary at the CLI.

Fix

Add a small cli_span context-manager helper to airflow_shared.observability.traces (next to the existing AIP-59 tracer setup). It reads TRACEPARENT (and optionally TRACESTATE) from the environment using the W3C TraceContext propagator, opens a span parented to that context, and yields. When the env vars are absent it produces a root span; when OTel is not configured it falls back to the global no-op tracer and stays inert.

Apply the helper at four CLI entry points:

  • airflow.cli.commands.task_command.task_test → span cli.tasks.test
  • airflow.cli.commands.dag_command.dag_trigger → span cli.dags.trigger
  • airflow.cli.commands.dag_command.dag_test → span cli.dags.test
  • airflow.cli.commands.backfill_command.create_backfill → span cli.backfill.create

Each span carries airflow.dag_id plus the most useful context for that command (task_id, run_id, logical_date, dry_run flag).

Reproducer

Without this PR:

export OTEL_TRACES_EXPORTER=console
export AIRFLOW__TRACES__OTEL_ON=True
TRACEPARENT="00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
  airflow dags test example_bash_operator

The console exporter prints task spans, but they're under a fresh root trace — the 0af76519… trace id from TRACEPARENT is not used. With this PR, the same invocation emits a cli.dags.test span as a child of b7ad6b7169203331, and the downstream task spans share the trace id from the caller.

Tests

  • shared/observability/tests/observability/test_traces.py — unit tests for the helper: traceparent extraction, tracestate propagation, malformed-header tolerance, env-var fallback, no-op-tracer safety.
  • airflow-core/tests/unit/cli/commands/test_cli_trace.py — integration tests for the four CLI entry points, asserting both the no-traceparent and with-traceparent paths via an InMemorySpanExporter.

16 tests, all passing. The existing CLI tests for task_test, dag_test, dag_trigger, and create_backfill still pass unchanged.

Notes

  • The helper is intentionally additive — if otel_on is false (the default), the global no-op tracer kicks in and the wrapper has effectively zero overhead.
  • Only the CLI entry points that initiate work (tasks test, dags trigger, dags test, backfill create) are wrapped. Read-only commands (dags list, tasks state, etc.) don't need a span — they're not the kind of operation a caller threads trace context through.
  • I deliberately did not wrap the helper around the whole action_cli decorator: that would emit a span for every CLI invocation including airflow info, airflow db check, etc., which is more noise than value. The explicit per-entry-point wiring keeps the span set meaningful.

Evidence

Captured against this branch by driving airflow dags trigger example_bash_operator --run-id repro_run through the public CLI entry point with an InMemorySpanExporter installed, TRACEPARENT=00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01 in the environment, and the metadata-DB-bound get_current_api_client mocked so the run completes without DB initialization. For the before pass the same entry point is invoked with cli_span patched to a no-op context manager, simulating the pre-fix code path with everything else identical.

Before (no cli_span wrapping the entry point):

(no airflow.cli.* spans emitted)

After (this PR):

{
  "name": "cli.dags.trigger",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "69f1f3e7418b8123",
  "parent_span_id": "b7ad6b7169203331",
  "attributes": {
    "airflow.dag_id": "example_bash_operator",
    "airflow.dag_run.run_id": "repro_run"
  }
}

The span's trace_id matches the inbound TRACEPARENT trace id and its parent_span_id matches the inbound parent span id, confirming W3C context propagation from caller into Airflow. The same wiring covers cli.dags.test, cli.tasks.test, and cli.backfill.create via the shared cli_span helper, and airflow-core/tests/unit/cli/commands/test_cli_trace.py asserts the same shape for each.

Closes #66906.

1fanwang added 2 commits May 13, 2026 11:52
Wrap airflow tasks test, airflow dags trigger, airflow dags test,
and airflow backfill create with a span context that honours W3C
TRACEPARENT / TRACESTATE environment variables.

This lets an external caller (a CI step, parent workflow, or debug harness
running a DAG locally) propagate trace context into Airflow. Without the
wrapper a trace started by the caller terminates at the CLI binary and the
downstream task / DagRun spans show up as a separate trace.

The new `cli_span` helper lives next to the existing AIP-59 tracer setup
in `airflow_shared.observability.traces` so the CLI surface and the
scheduler / API server share one extraction path.
The CLI entry-point span tests drive real airflow CLI subcommands
(`dags trigger`, `dags test`, `tasks test`, `backfill create`)
which touch the metadata DB, so they must run in the DB-tests job, not
the Non-DB job that sets _AIRFLOW_SKIP_DB_TESTS.

Signed-off-by: 1fanwang <1fannnw@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:CLI ready for maintainer review Set after triaging when all criteria pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CLI commands don't emit OTel spans — breaks trace propagation from external orchestrators

2 participants