Add a DagRun and TaskInstance event listener under the Kafka provider #68082

Open
xBis7 wants to merge 1 commit into apache:main from xBis7:kafka-listener

@xBis7
Contributor

@xBis7 xBis7 commented Jun 5, 2026


This PR adds a listener under the Kafka provider that implements the DagRun and TaskInstance state change event hooks.

The listener publishes a message to a pre-existing Kafka topic for every event. Each message has some extra metadata for the dag or task that the event belongs to.
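To make the shape of such a message concrete, here is a minimal sketch of how one event could be serialized and handed to a producer. The field names (`event`, `dag_id`, `run_id`, `task_id`, `state`, `source`) and the helpers `build_event_message` and `publish_event` are assumptions for illustration only, not the schema or API this PR actually uses. `produce` stands in for whatever producer call the listener makes.

```python
import json
from typing import Callable, Optional


def build_event_message(
    event: str,
    dag_id: str,
    run_id: str,
    state: str,
    task_id: Optional[str] = None,
    source: Optional[str] = None,
) -> bytes:
    """Serialize one state change event to JSON (hypothetical schema)."""
    payload = {"event": event, "dag_id": dag_id, "run_id": run_id, "state": state}
    # task_id is only present for TaskInstance events.
    if task_id is not None:
        payload["task_id"] = task_id
    # source lets consumers tell Airflow installations apart.
    if source is not None:
        payload["source"] = source
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_event(produce: Callable[[str, bytes], None], topic: str, message: bytes) -> None:
    """Hand the serialized message to a producer callable (a stand-in, not a real client)."""
    produce(topic, message)
```

Keeping serialization separate from the producer call makes the payload easy to unit test without a broker.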

DagRun events and TaskInstance events are separated and guarded behind different config flags. Both flags are disabled by default because each listener adds load to Airflow that users might not want.

The listener expects the user to have already created the topic and to have set its name in the listener's config section. If the topic doesn't exist, the listener doesn't fail. Instead, it logs a warning and checks for the topic again after a configured interval.
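The warn-and-recheck behaviour could be sketched roughly as follows. This is not the PR's implementation: `TopicAvailability`, `topic_exists`, and `recheck_interval` are hypothetical names, the broker lookup is injected as a plain callable, and the sketch assumes that a topic found once is not looked up again.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger(__name__)


class TopicAvailability:
    """Remembers a failed topic check and retries only after `recheck_interval` seconds."""

    def __init__(
        self,
        topic_exists: Callable[[], bool],  # hypothetical broker lookup
        recheck_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._topic_exists = topic_exists
        self._recheck_interval = recheck_interval
        self._clock = clock
        self._available = False
        self._last_check: Optional[float] = None

    def is_available(self) -> bool:
        if self._available:
            return True  # assumption: once found, the topic is not re-verified
        now = self._clock()
        if self._last_check is not None and now - self._last_check < self._recheck_interval:
            return False  # still inside the wait window, skip the broker lookup
        self._last_check = now
        self._available = self._topic_exists()
        if not self._available:
            log.warning(
                "Kafka topic not found; will check again in %s seconds",
                self._recheck_interval,
            )
        return self._available
```

With this shape, an event hook would call `is_available()` and silently skip publishing while it returns `False`, so a missing topic never fails a dag run or task.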

Users can also filter events for dag runs based on dag_id and for tasks based on dag_id + task_id. For example, someone could choose to get only dag run state events for a particular dag and nothing else.
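A minimal sketch of that filtering logic, assuming the filters are loaded into sets and that an unset filter (`None`) lets everything through. The function names and the representation of the filters are illustrative assumptions, not the PR's config format.

```python
from typing import AbstractSet, Optional, Tuple


def should_publish_dag_run(dag_id: str, allowed_dag_ids: Optional[AbstractSet[str]]) -> bool:
    """Dag run events are matched on dag_id alone; None means no filter is configured."""
    return allowed_dag_ids is None or dag_id in allowed_dag_ids


def should_publish_task(
    dag_id: str,
    task_id: str,
    allowed_tasks: Optional[AbstractSet[Tuple[str, str]]],
) -> bool:
    """Task events are matched on the (dag_id, task_id) pair; None means no filter."""
    return allowed_tasks is None or (dag_id, task_id) in allowed_tasks
```

For instance, `should_publish_dag_run("dag1", {"dag1"})` passes while `should_publish_dag_run("dag2", {"dag1"})` does not, which corresponds to receiving dag run state events for one particular dag only.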

By adding a listener to the provider we can have dags consume from the Kafka topic and react to messages. When a particular message arrives, a deferred task could be triggered. This could also work for dags across multiple Airflow installations, all sharing the common topic. For that reason, I also added a source config key, so that users can distinguish where the messages came from.

The listener follows the same format as the OpenLineage one:

https://github.com/apache/airflow/blob/main/providers/openlineage/src/airflow/providers/openlineage/plugins/listener.py

Testing

I've added unit tests and also a simple integration test that uses an actual broker. I've also tested the changes manually.

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    Claude Code, Opus 4.7

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

Comment thread airflow-core/tests/integration/otel/test_otel.py
@jscheffl removed the backport-to-v3-2-test label Jun 5, 2026
@vikramkoka
Contributor

I am confused by the use case here.

Shouldn't this be an Asset Watcher?

@xBis7 force-pushed the kafka-listener branch from 46ac924 to ae76051 June 5, 2026 21:52
@xBis7
Contributor Author

xBis7 commented Jun 5, 2026

> Shouldn't this be an Asset Watcher?

@vikramkoka No, but it could be used together with an Asset Watcher. An Asset Watcher sits on the consumer side of events, while this PR is on the producer side.

The new listener produces a message to a Kafka topic for every state change event. We can consume the messages from the topic either with the existing consumer hook from the Kafka provider or with an Asset Watcher.

For example, team A, running Airflow installation A, has enabled the listener and triggers dag1. The listener publishes an event to a Kafka topic for every state change:

  1. dag_run.dag1.running
  2. task1.running
  3. task1.success
  4. task2.running
  5. task2.success
  6. dag_run.dag1.success

Team B, running Airflow installation B, monitors the same Kafka topic and has a deferred task that waits to run until it consumes the message task1.success. The message can be consumed using an Asset Watcher.
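On the consumer side, team B would need some predicate that recognises the message it is waiting for. A minimal sketch, assuming the message value is a JSON object with `dag_id`, `task_id`, `state`, and `source` keys (a hypothetical schema, as is the `matches_task_success` helper), and that the raw bytes have already been extracted from the consumed Kafka message:

```python
import json
from typing import Optional


def matches_task_success(
    raw: bytes,
    dag_id: str,
    task_id: str,
    source: Optional[str] = None,
) -> bool:
    """Return True when `raw` is a success event for the given task (hypothetical schema)."""
    try:
        payload = json.loads(raw)
    except ValueError:
        # Covers malformed JSON and undecodable bytes; unrelated messages are ignored.
        return False
    if not isinstance(payload, dict):
        return False
    # Optionally restrict matches to one Airflow installation.
    if source is not None and payload.get("source") != source:
        return False
    return (
        payload.get("dag_id") == dag_id
        and payload.get("task_id") == task_id
        and payload.get("state") == "success"
    )
```

Filtering on `source` is what would let several installations share one topic without reacting to each other's events by accident.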

@xBis7 force-pushed the kafka-listener branch from ae76051 to 093a2ed June 6, 2026 07:10
@xBis7 force-pushed the kafka-listener branch from 093a2ed to a7e4f81 June 6, 2026 14:59