
Databricks Provider - Task within a Workflow to handle different "run if dependencies" configuration (currently only supports default ALL_SUCCEEDED) #42822

@RafaelCartenet

Description


Concerns airflow.providers.databricks.operators.databricks

When creating a task inside a Workflow in Databricks, you can configure "Run if dependencies"; see the screenshot below.

https://docs.databricks.com/en/jobs/run-if.html

[Screenshot: "Run if dependencies" selector in the Databricks task configuration UI]

The workflow JSON contains this information at the task level, for example:

{
  "task_key": "C",
  "depends_on": [
    {
      "task_key": "A"
    },
    {
      "task_key": "B"
    }
  ],
  "run_if": "ALL_SUCCEEDED",
  ...
}

This is not currently supported by the Databricks provider: in the API call that creates the workflow, the field is ignored and the default value "ALL_SUCCEEDED" is used.
It would be awesome to be able to set this information at the task level so that we can handle more dependency types.

I think the best option would be to leverage Airflow's generic operator trigger_rule, but I'm not too sure how to implement that, or whether it's doable. A rough mapping between the two is sketched below.
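For illustration, a minimal sketch of how Airflow trigger rules could translate to Databricks run_if values. The helper name `_trigger_rule_to_run_if` is hypothetical (nothing like it exists in the provider today), and apart from "ALL_SUCCEEDED" the run_if value names on the right-hand side are my assumptions and would need to be checked against the Databricks Jobs API docs:

```python
from airflow.exceptions import AirflowException
from airflow.utils.trigger_rule import TriggerRule

# Assumed mapping from Airflow trigger rules to Databricks "run_if" values.
# Only "ALL_SUCCEEDED" is confirmed by the workflow JSON above; the other
# value names are guesses based on the "run if" documentation page.
_TRIGGER_RULE_TO_RUN_IF = {
    TriggerRule.ALL_SUCCESS: "ALL_SUCCEEDED",
    TriggerRule.ALL_DONE: "ALL_DONE",
    TriggerRule.ALL_FAILED: "ALL_FAILED",
    TriggerRule.ONE_SUCCESS: "AT_LEAST_ONE_SUCCEEDED",
    TriggerRule.ONE_FAILED: "AT_LEAST_ONE_FAILED",
    TriggerRule.NONE_FAILED: "NONE_FAILED",
}


def _trigger_rule_to_run_if(trigger_rule: TriggerRule) -> str:
    """Translate an Airflow trigger rule into a Databricks run_if value."""
    try:
        return _TRIGGER_RULE_TO_RUN_IF[trigger_rule]
    except KeyError:
        raise AirflowException(
            f"Trigger rule {trigger_rule!r} has no Databricks run_if equivalent"
        )
```

Not every Airflow trigger rule has a Databricks counterpart (and vice versa), which is probably why this needs discussion before a PR.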

I think the easiest approach would be to add a parameter to the DatabricksNotebookOperator that overrides the run_if field in the job JSON object, along the lines of the usage sketch below.
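As a sketch of what the DAG-side API could look like: the `run_if` parameter here is the proposed addition (it does not exist in the operator today), and the connection id and notebook path are placeholders:

```python
from airflow.providers.databricks.operators.databricks import DatabricksNotebookOperator

# task_a and task_b would be DatabricksNotebookOperator instances defined
# the same way; only task C is shown here.
task_c = DatabricksNotebookOperator(
    task_id="C",
    databricks_conn_id="databricks_default",  # placeholder connection id
    notebook_path="/Workspace/notebooks/C",   # placeholder path
    source="WORKSPACE",
    # Proposed (not yet existing) parameter: forwarded as "run_if" in the
    # task's JSON when the workflow is created, overriding the implicit
    # ALL_SUCCEEDED default.
    run_if="AT_LEAST_ONE_SUCCEEDED",
)
task_c.set_upstream([task_a, task_b])  # rendered as "depends_on" in the job spec
```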

I'm happy to help with a PR

Use case/motivation

I have a complex job in Databricks that I am trying to migrate to code, and I'm blocked because I can't reproduce the dependency configuration described above.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

