-
Notifications
You must be signed in to change notification settings - Fork 16.6k
Description
Description
Use Case:
Provide an end-to-end view of data loading for a single table using multiple data workflows.
Flow Diagram:
Data Uri example -> file://tmp/txn_leads.csv
Thoughts:
- Provides a consolidated view of data moving into a target table.
- Not related to scheduling dependencies like "Assets".
- It does not replace data lineage tools like Collibra or OpenLineage etc., but offers a basic utility for a high-level overview.
Nice to have:
- Ability to click on the DAG name or data URI to navigate to the respective DAG or task.
- Display the number of records or extract information under the data URI for a quick glance.
Airflow Version: 3
Use case/motivation
Example: In a Data Vault 2.0 model, a HUB table receives data from various sources using different data rules. It is useful to understand how data moves into a single table through different DAGs and tasks.
Lineage SQL: May not be the most optimized or accurate
select
TODR.DAG_ID,
TODR.TASK_ID,
TODR.priority_weight,
INA.URI IN_URI,
OUA.URI AS OUT_URI
from
public.task_instance TODR
inner join public.task_inlet_asset_reference INTIAR
on (INTIAR.task_id = TODR.task_id and INTIAR.dag_id = TODR.dag_id)
inner join public.asset INA
on (INTIAR.asset_id = INA.id )
inner join public.task_outlet_asset_reference OUTAR
on (OUTAR.task_id = TODR.task_id and OUTAR.dag_id = TODR.dag_id)
inner join public.asset OUA
on (OUTAR.asset_id = OUA.id )
where
(TODR.TASK_ID,TODR.DAG_ID,TODR.start_date) in (select TASK_ID,DAG_ID,MAX(start_date) from public.task_instance group by TASK_ID,DAG_ID )
and TODR.DAG_ID in ('source_system_1','source_system_2','source_system_3','source_system_4')
order by TODR.DAG_ID DESC,TODR.priority_weight DESC;
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct