Render Dataset Conditions in DAG Graph view #41137

bbovenzi · 2024-07-30T21:43:10Z

Previously, we only rendered a JSON object of the any/all dataset-event conditions required for a DAG to run. Now we interpret that object and render it in the graph view with logic gates.
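As a rough illustration of the interpretation step, the sketch below walks a nested any/all expression and emits gate nodes plus edges. The JSON shape (single-key dicts keyed by "any" or "all", with dataset URI strings as leaves) and every name here (`expression_to_graph`, the `dataset:`/`gate:` id scheme) are assumptions for illustration only; the actual change lives in the UI code and may differ.

```python
from __future__ import annotations

from itertools import count


def expression_to_graph(expression):
    """Flatten a nested any/all expression into (nodes, edges, root_id).

    Assumed shape: a leaf is a dataset URI string; a condition is a
    single-key dict such as {"any": [...]} or {"all": [...]}.
    """
    nodes: list[dict] = []
    edges: list[tuple[str, str]] = []
    gate_ids = count(1)

    def visit(item) -> str:
        if isinstance(item, str):
            node_id = f"dataset:{item}"
            # Reuse the node when the same dataset appears more than once.
            if all(n["id"] != node_id for n in nodes):
                nodes.append({"id": node_id, "type": "dataset", "label": item})
            return node_id
        ((operator, children),) = item.items()
        if operator not in ("any", "all"):
            raise ValueError(f"unsupported operator: {operator!r}")
        gate_id = f"gate:{next(gate_ids)}"
        # "any" is drawn as an OR gate, "all" as an AND gate.
        gate_type = "or-gate" if operator == "any" else "and-gate"
        nodes.append({"id": gate_id, "type": gate_type, "label": operator})
        for child in children:
            # Each child (dataset or nested gate) feeds into this gate.
            edges.append((visit(child), gate_id))
        return gate_id

    root_id = visit(expression)
    return nodes, edges, root_id


# Hypothetical input: one dataset OR (two datasets AND-ed together).
expr = {"any": ["s3://bucket/my-task", {"all": ["2", "3"]}]}
nodes, edges, root_id = expression_to_graph(expr)
```

A graph layout library could then place the dataset nodes upstream of the gates and connect the root gate to the DAG's first tasks; that wiring is left out here.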

Datasets that actually had events will be highlighted with a different border so it's easy to see what triggered the selected run.

Also added a check so that a dataset node is still created when there is a dataset event, even if the getDatasets endpoint didn't return anything.
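A minimal sketch of the highlighting and fallback behaviour described above, written in Python for readability. The function name `build_dataset_nodes` and the `highlighted` flag are hypothetical; the inputs stand in for whatever the datasets lookup and the selected run's dataset events actually return.

```python
from __future__ import annotations


def build_dataset_nodes(dataset_uris, event_uris):
    """Return one node per dataset, flagging those that had events.

    dataset_uris: URIs returned by the datasets lookup (may be empty).
    event_uris: URIs of dataset events attached to the selected run.
    """
    triggered = set(event_uris)
    nodes = []
    seen = set()
    # Known datasets first, then any event-only datasets the lookup missed.
    for uri in [*dataset_uris, *event_uris]:
        if uri in seen:
            continue
        seen.add(uri)
        nodes.append({"uri": uri, "highlighted": uri in triggered})
    return nodes


# Dataset "3" has an event but was not returned by the lookup.
dataset_nodes = build_dataset_nodes(["1", "2"], ["2", "3"])
```

Here "2" and "3" would get the distinct border, and "3" still gets a node even though the lookup did not return it.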

These are both workarounds. It would be best to refactor the DAG graph Python code to accept a with_datasets param, handle this logic there, and include dataset aliases too. Hopefully I can do that in a follow-up PR.

Lee-W · 2024-07-31T04:03:57Z

The UI looks great! I tested it with the following dag but couldn't see the graph. Is there anything I missed? Thanks!

from __future__ import annotations

import pendulum

from airflow import DAG
from airflow.datasets import Dataset
from airflow.decorators import task

with DAG(
    dag_id="issue_856",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    schedule=Dataset("s3://bucket/my-task") | Dataset("2") & Dataset("3") | (Dataset("4") & Dataset("5")),
    catchup=False,
    tags=["producer", "dataset"],
):

    @task
    def produce_dataset_events():
        pass

    produce_dataset_events()
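For reference, Python's `&` binds tighter than `|`, so the schedule above groups as dataset 1 OR (2 AND 3) OR (4 AND 5). The sketch below shows that grouping with a tiny stand-in class; `Cond` is hypothetical and is not Airflow's `Dataset`. It does not merge chained `|` operands, so the result is nested rather than flat.

```python
from __future__ import annotations


class Cond:
    """Tiny stand-in (not Airflow's Dataset) that records how | and & group."""

    def __init__(self, expr):
        self.expr = expr

    def __or__(self, other):
        return Cond({"any": [self.expr, other.expr]})

    def __and__(self, other):
        return Cond({"all": [self.expr, other.expr]})


a, b, c, d, e = (Cond(name) for name in ("s3://bucket/my-task", "2", "3", "4", "5"))

# Same operator layout as the schedule expression in the DAG above.
grouped = (a | b & c | (d & e)).expr
```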

eladkal

Nice!

bbovenzi · 2024-07-31T16:39:49Z

> The UI looks great! I tested it with the following dag but couldn't see the graph. Is there anything I missed? Thanks!

This is on the DAG graph not the datasets dependency graph. Can you share a screenshot?

Lee-W · 2024-08-01T01:26:18Z

Yep, it looks like this.

import pendulum

from airflow import DAG
from airflow.datasets import Dataset

with DAG(
    dag_id="issue_consumer",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    schedule=Dataset("1") | Dataset("2"),
    catchup=False,
    tags=["consumer", "dataset"],
):
    ...

Lee-W · 2024-08-01T01:27:18Z

oh, I finally got what you mean!
It looks super cool. Thanks @bbovenzi !

bbovenzi added 2 commits July 30, 2024 17:33

Render dataset expression

efabb20

Only fetch events if a run is selected

c7392c8

bbovenzi added this to the Airflow 2.10.0 milestone Jul 30, 2024

bbovenzi requested review from ryanahamilton, ashb and pierrejeambrun as code owners July 30, 2024 21:43

boring-cyborg bot added the area:UI and area:webserver labels Jul 30, 2024

eladkal approved these changes Jul 31, 2024

phanikumv merged commit 16ed4df into apache:main Jul 31, 2024
48 checks passed

phanikumv deleted the dag_outlets branch July 31, 2024 12:59

utkarsharma2 added the type:improvement label Jul 31, 2024
