[FEATURE] TaskDependencySensor improvements #101

Open · 4 tasks done
maxim-mityutko opened this issue Mar 11, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

maxim-mityutko (Contributor) commented Mar 11, 2024

Is your feature request related to a problem? Please describe.

  1. A consistent approach is required across all sensors for determining the base time used to decide whether the upstream criteria are fulfilled. This approach should make sensor executions reproducible, whether the `brickflow_start_time` / `brickflow_start_date` custom parameters are set manually or the job is scheduled and the execution date is derived from the CRON expression. Currently, `TaskDependencySensor` uses `datetime.now()` (the reproducibility problem is illustrated after this list):

     ```python
     execution_window_tz = (datetime.now() + execution_delta).strftime(
         "%Y-%m-%dT%H:%M:%SZ"
     )
     ```

  2. From the sensor-usage perspective, the user does not care whether the upstream DAG failed; what matters is the ability to trigger downstream tasks once the upstream dependency is fulfilled, even if it was delayed by the need to restart a failed DAG. This means that failing the task and the workflow is not a desired scenario for this operator:

     ```python
     if status == "failed":
         log.error(
             f"Upstream dag {external_dag_id} failed at {external_task_id} task "
         )
         raise Exception("Upstream Dag Failed")
     ```

  3. Along the same lines as (2), failing the sensor when the upstream execution is not found is not a desirable flow:

     ```python
     if response.status_code == 401:
         raise Exception(
             f"No Runs found for {external_dag_id} dag after {execution_window_tz}, Please check upstream dag"
         )
     ```

     The argument that the execution is always created by Airflow, even if it has not yet started, is not valid: if the DAG parameter `depends_on_past` is used, new executions won't be created unless the older ones are successful.
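
To illustrate (1): a minimal, self-contained sketch of why a wall-clock base is not reproducible. The timestamps and the look-back `execution_delta` below are made up for the example:

```python
from datetime import datetime, timedelta

execution_delta = timedelta(hours=-2)  # assumed look-back, as in the snippet in (1)

# The same logical run evaluated at 09:00 and retried at 11:30 produces two
# different windows, so each attempt may match different upstream executions.
for now in (datetime(2024, 3, 11, 9, 0), datetime(2024, 3, 11, 11, 30)):
    execution_window_tz = (now + execution_delta).strftime("%Y-%m-%dT%H:%M:%SZ")
    print(execution_window_tz)
# -> 2024-03-11T07:00:00Z
# -> 2024-03-11T09:30:00Z
```

Basing the window on `context["execution_date"]` or `brickflow_start_time` pins it to the logical run, so retries evaluate exactly the same window.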

Cloud Information

  - [x] AWS
  - [x] Azure
  - [x] GCP
  - [x] Other

Describe the solution you'd like

  1. Use `context["execution_date"]`, which is available to Airflow operators, or `brickflow_start_time` from the Brickflow context.
  2. Continue poking the upstream instead of failing.
  3. Log that the upstream execution was not found and continue poking.
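
A rough sketch of how the three changes could fit together in the sensor's `poke` method. This is an assumption-laden illustration, not the actual Brickflow implementation: `get_execution_stats` and the attribute names are hypothetical placeholders.

```python
from datetime import datetime
from typing import Optional


def poke(self, context: dict) -> bool:  # method-body sketch, not standalone
    # (1) Base the window on the logical execution date (or on
    # brickflow_start_time) instead of datetime.now(), so reruns of
    # the same logical run evaluate the same window.
    execution_date: datetime = context["execution_date"]
    execution_window_tz = (execution_date + self.execution_delta).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )

    # Hypothetical helper: returns the upstream task state, or None
    # when no matching execution exists yet.
    status: Optional[str] = self.get_execution_stats(execution_window_tz)

    # (3) No execution found yet (e.g. held back by depends_on_past):
    # log and keep poking instead of raising.
    if status is None:
        self.log.info(
            "No runs found for %s after %s; continuing to poke",
            self.external_dag_id,
            execution_window_tz,
        )
        return False

    # (2) Upstream failed: it may be restarted, so keep poking
    # instead of failing the whole workflow.
    if status == "failed":
        self.log.warning(
            "Upstream dag %s failed at task %s; continuing to poke",
            self.external_dag_id,
            self.external_task_id,
        )
        return False

    return status == "success"
```

Returning `False` keeps the standard sensor retry semantics, so a hard deadline can still be enforced through the sensor's timeout rather than an immediate failure.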

Describe alternatives you've considered

Additional context
