Lock in Databricks workflow depends_on parent-key behavior (#47614) by Vamsi-klu · Pull Request #66681 · apache/airflow

Vamsi-klu · 2026-05-11T03:08:07Z

The runtime fix for issue #47614 shipped in #48492 (commit 9dcce2f).
The GitHub issue stayed open because:

The existing unit test (test_convert_to_databricks_workflow_task) passed
relevant_upstreams = [MagicMock(task_id="upstream_task")] — a list of
mock objects, not strings — so the task_id in relevant_upstreams filter
was silently always-False and the depends_on branch was never exercised.
The fix could regress without anyone noticing.
A few small follow-on quality issues in the same area were worth tightening.
The issue was not linked to the merged PR, so it wasn't auto-closed.
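
The always-False membership check described in the first point can be reproduced in isolation. This is a minimal sketch using only unittest.mock; the variable names are illustrative and are not taken from the provider code.

```python
from unittest.mock import MagicMock

# Task ids that a child task reports as its upstreams (always strings).
upstream_task_ids = {"upstream_task"}

# What the old test passed: a mock that merely carries a task_id attribute.
mocked_upstreams = [MagicMock(task_id="upstream_task")]
# What the task group passes at runtime: plain task-id strings.
string_upstreams = ["upstream_task"]

# A str never compares equal to a MagicMock, so nothing survives this filter
# and the depends_on branch is skipped without any error.
matched_with_mocks = [t for t in upstream_task_ids if t in mocked_upstreams]

# With strings, the same filter keeps the upstream id.
matched_with_strings = [t for t in upstream_task_ids if t in string_upstreams]
```

The first filter comes back empty without raising, which is why the old test could pass while never reaching the depends_on code.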

This PR is therefore a lock-in / regression / hardening change, not a new fix.

Changes

databricks.py: corrected relevant_upstreams annotation from
list[BaseOperator] to list[str] on both DatabricksTaskBaseOperator
and DatabricksNotebookOperator overrides.
databricks_workflow.py: changed self.relevant_upstreams = [task_id]
to self.relevant_upstreams: list[str] = []. Behavior is unchanged
(the launch task's raw, unprefixed task_id was previously filtered out
only by an accidental prefix mismatch); the new initializer makes the
intent explicit.
test_databricks_workflow.py: new TestWorkflowDependsOn class with 7
end-to-end tests that build a real DAG + DatabricksWorkflowTaskGroup
with real DatabricksNotebookOperator tasks and assert on the
tasks[*]['depends_on'] payload returned by create_workflow_json().
Covers default keys, custom parent key, >100-char parent key, diamond,
fan-out, root tasks (launch task is filtered out), and external
upstream filtering.
test_databricks.py: fixed the existing
test_convert_to_databricks_workflow_task to pass strings instead of
mocks and to assert the correct parent-keyed depends_on. Added
test_generate_databricks_task_key_requires_task_dict_when_task_id_passed.
changelog.rst: Bug Fixes entry under 7.14.0.
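
To illustrate why swapping the [task_id] initializer for an empty list leaves behavior unchanged, here is a small sketch of the prefix mismatch. The group id "wf" and the id strings are assumptions chosen for illustration; the real values depend on the DAG.

```python
group_id = "wf"  # hypothetical task group id

# Inside the group, child tasks see the launch task under its prefixed id.
child_upstream_task_ids = {f"{group_id}.launch"}

# Old initializer: seeded with the raw, unprefixed launch task_id.
old_relevant_upstreams = ["launch"]
# New initializer: explicitly empty.
new_relevant_upstreams = []

# The bare "launch" never equals "wf.launch", so both filters yield nothing.
old_matches = [t for t in child_upstream_task_ids if t in old_relevant_upstreams]
new_matches = [t for t in child_upstream_task_ids if t in new_relevant_upstreams]
```

Both filters produce the same empty result, so the old seed value had no effect on the generated depends_on.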

closes: #47614

Was generative AI tooling used to co-author this PR?

Yes — Claude Code (Opus 4.7)

Generated-by: Claude Code (Opus 4.7) following the guidelines

Vamsi-klu · 2026-05-11T03:11:34Z

@eladkal @jscheffl @shahar1 — could one of you take a look when you have a moment? This PR is regression coverage / type-hint cleanup for the depends_on bug whose runtime fix you reviewed in #48492. The actual behavior is unchanged; the goal is to lock it in with end-to-end tests so the same bug can't silently come back, and to close out #47614.

eladkal · 2026-05-11T04:23:51Z

cc @moomindani for review

moomindani

The diff itself looks good to me — the type annotation is the right one, the relevant_upstreams = [task_id] → [] initializer is genuinely dead code being cleaned up (the bare "launch" could never match the child tasks' prefixed "<group>.launch" upstreams, so the original list was always semantically empty), the existing MagicMock-based test fix actually exercises the depends_on branch now, and TestWorkflowDependsOn covers default/custom/oversize key, diamond, fan-out, root-task filtering, and external-upstream filtering. Changelog revert in 025e8a4 is clean.

A couple of things before I'd be comfortable approving:

CI is red. The remaining failure (Compat 3.0.6:P3.10) is gh run download artifact infra, not your code, but we shouldn't merge on red — could you re-trigger it (or push an empty commit) so we get a clean build?
Real-workspace validation. The runtime fix has been live since #48492 in April 2025, so the user-visible behavior change here is small, but this PR does touch production code (relevant_upstreams = [task_id] → []). Have you (or anyone) actually run a DatabricksWorkflowTaskGroup with a parent/child task pair against a Databricks workspace and confirmed the rendered Jobs API JSON has depends_on: [{task_key: <parent_key>}]? The unit tests build real DAG objects but assert on the in-process create_workflow_json() payload, not what arrives at Databricks. An airflow dags test run + a screenshot of the resulting Databricks job graph would close the loop (similar pattern to what I used in #66613).

Once CI is green and there's some evidence the wire payload is right, LGTM.

Vamsi-klu · 2026-05-19T05:55:33Z

@moomindani thanks for the review. Two things addressed:

(A) CI: the failing provider distributions tests / Compat 3.0.6:P3.10 shard was an artifact infra failure; gh run download couldn't find the CI-image stash ci-image-save-v3-linux_amd64-3.10-66681_merge. All other compat jobs (2.11.1 / 3.1.8 / 3.2.1), DB tests, mypy, static checks, and integration tests were green. I've rebased onto current main (the branch was 129 commits behind) and pushed with --force-with-lease, which re-runs the full matrix from scratch.

(B) Wire-payload validation: added a new TestWorkflowDependsOnWirePayload class in providers/databricks/tests/unit/databricks/operators/test_databricks_workflow.py with two tests that drive _CreateDatabricksWorkflowOperator._create_or_reset_job end to end and assert on what the hook receives:

test_create_job_payload_carries_parent_depends_on: captures launch_task._hook.create_job.call_args.args[0] (the new-job path) and asserts tasks[task_b]["depends_on"] == [{"task_key": <md5(task_a)>}].
test_reset_job_payload_carries_parent_depends_on: captures launch_task._hook.reset_job.call_args.args[1] (the existing-job path, job_id=42) and asserts the same shape.

Both tests build a real DAG + DatabricksWorkflowTaskGroup populated with real DatabricksNotebookOperators (no operator mocks), so the captured job_spec is exactly what would be POSTed to /api/2.1/jobs/create and /api/2.1/jobs/reset. The existing TestWorkflowDependsOn class still covers the in-process create_workflow_json payload; this new class closes the loop at the wire boundary and lives in CI forever.

I don't have a Databricks workspace handy for the live airflow dags test + job-graph screenshot, but the call-args assertion verifies the same JSON the Jobs API would receive. The runtime fix has also been live since #48492 (April 2025) without regressions reported on the depends_on path. Happy to add the screenshot too if you'd still like it.
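
For readers unfamiliar with the call-args pattern, the sketch below shows the capture technique on its own. It is not the test code from this PR: create_or_reset_job is a hypothetical stand-in for the operator's create-or-reset branch, the hook is a bare MagicMock, and the task keys are placeholder strings rather than md5 digests.

```python
from unittest.mock import MagicMock

hook = MagicMock()  # stand-in for the Databricks hook, not the real class

job_spec = {
    "tasks": [
        {"task_key": "parent_key"},
        {"task_key": "child_key", "depends_on": [{"task_key": "parent_key"}]},
    ]
}


def create_or_reset_job(hook, job_spec, existing_job_id=None):
    """Hypothetical stand-in: create a job, or reset it if one already exists."""
    if existing_job_id is None:
        hook.create_job(job_spec)
    else:
        hook.reset_job(existing_job_id, job_spec)


def depends_on_by_key(spec):
    """Map each task_key to its depends_on list (empty when absent)."""
    return {t["task_key"]: t.get("depends_on", []) for t in spec["tasks"]}


# New-job path: the spec is the first positional argument.
create_or_reset_job(hook, job_spec)
created_spec = hook.create_job.call_args.args[0]

# Existing-job path: job id first, spec second.
create_or_reset_job(hook, job_spec, existing_job_id=42)
reset_job_id, reset_spec = hook.reset_job.call_args.args
```

Asserting on the captured arguments checks the payload at the last point the code controls before it goes over the network.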

moomindani

Real-workspace validation

Drove the PR branch end-to-end through airflow dags test against a Databricks workspace with a minimal DatabricksWorkflowTaskGroup (task_a >> task_b, both DatabricksNotebookOperator), then queried the created Databricks job back via 2.2/jobs/list. The Jobs API payload is correctly parent-keyed:

{
  "tasks": [
    {
      "task_key": "d13f300f3f3b28c910b3710198314589"
      // md5("pr66681_realenv__wf.task_a"); no depends_on
    },
    {
      "task_key": "dcf56d6d083da88df804ce506620a7a2",  // md5("pr66681_realenv__wf.task_b")
      "depends_on": [
        {"task_key": "d13f300f3f3b28c910b3710198314589"}  // parent task_a's key, not its own
      ]
    }
  ]
}

Closes the wire-payload loop on top of the unit tests.
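
A payload like the one above can also be sanity-checked mechanically. The helper below is a hypothetical sketch, not part of the provider or of this PR: it flags any task whose depends_on points at its own key or at a key that is not in the job.

```python
def check_depends_on(tasks):
    """Return a list of problems found in a Jobs-API-style tasks list."""
    known_keys = {t["task_key"] for t in tasks}
    problems = []
    for t in tasks:
        for dep in t.get("depends_on", []):
            if dep["task_key"] == t["task_key"]:
                problems.append(f"{t['task_key']} depends on itself")
            elif dep["task_key"] not in known_keys:
                problems.append(f"{t['task_key']} depends on unknown {dep['task_key']}")
    return problems


# Task keys copied from the payload quoted above.
tasks = [
    {"task_key": "d13f300f3f3b28c910b3710198314589"},
    {
        "task_key": "dcf56d6d083da88df804ce506620a7a2",
        "depends_on": [{"task_key": "d13f300f3f3b28c910b3710198314589"}],
    },
]

# A deliberately broken payload in which a task references its own key.
self_referencing = [{"task_key": "b", "depends_on": [{"task_key": "b"}]}]
```

Running check_depends_on(tasks) on the real payload returns an empty list, while the self-referencing example produces one problem.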

Reproduction

export AIRFLOW_CONN_DATABRICKS_DEFAULT='{"conn_type":"databricks","host":"https://<workspace>","password":"<token>"}'
export PR66681_NOTEBOOK_PATH=/Users/<you>/airflow-pr66681-noop  # any existing notebook in the workspace
airflow dags test pr66681_realenv

Then verify the resulting Databricks job via:

curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  "https://<workspace>/api/2.2/jobs/list?name=pr66681_realenv.wf&limit=1&expand_tasks=true" | jq '.jobs[0].settings.tasks'

dags/dag_pr66681_realenv.py

"""Real-environment validation DAG for PR #66681 (GH-47614)."""
from __future__ import annotations

import hashlib
import os
from datetime import datetime

from airflow.providers.databricks.hooks.databricks import DatabricksHook
from airflow.providers.databricks.operators.databricks import DatabricksNotebookOperator
from airflow.providers.databricks.operators.databricks_workflow import DatabricksWorkflowTaskGroup
from airflow.sdk import DAG, task

NOTEBOOK_PATH = os.environ["PR66681_NOTEBOOK_PATH"]
DAG_ID = "pr66681_realenv"
GROUP_ID = "wf"
CONN_ID = "databricks_default"


def _hook() -> DatabricksHook:
    return DatabricksHook(databricks_conn_id=CONN_ID)


def _expected_task_key(task_id_in_group: str) -> str:
    full_task_id = f"{GROUP_ID}.{task_id_in_group}"
    return hashlib.md5(f"{DAG_ID}__{full_task_id}".encode()).hexdigest()


with DAG(dag_id=DAG_ID, start_date=datetime(2026, 1, 1), schedule=None, catchup=False) as dag:
    with DatabricksWorkflowTaskGroup(
        group_id=GROUP_ID,
        databricks_conn_id=CONN_ID,
        job_clusters=[
            {
                "job_cluster_key": "shared",
                "new_cluster": {
                    "spark_version": "15.4.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 0,
                    "spark_conf": {"spark.master": "local[*]"},
                    "custom_tags": {"ResourceClass": "SingleNode"},
                },
            }
        ],
    ) as wf:
        task_a = DatabricksNotebookOperator(
            task_id="task_a", notebook_path=NOTEBOOK_PATH, source="WORKSPACE", job_cluster_key="shared"
        )
        task_b = DatabricksNotebookOperator(
            task_id="task_b", notebook_path=NOTEBOOK_PATH, source="WORKSPACE", job_cluster_key="shared"
        )
        task_a >> task_b

    @task(task_id="verify_depends_on", trigger_rule="all_done")
    def verify_depends_on() -> None:
        jobs = _hook()._do_api_call(
            ("GET", "2.2/jobs/list"),
            {"name": f"{DAG_ID}.{GROUP_ID}", "limit": 1, "expand_tasks": True},
        )
        job = jobs["jobs"][0]
        tasks_by_key = {t["task_key"]: t for t in job["settings"]["tasks"]}
        a_key = _expected_task_key("task_a")
        b_key = _expected_task_key("task_b")
        assert tasks_by_key[a_key].get("depends_on", []) == []
        assert tasks_by_key[b_key]["depends_on"] == [{"task_key": a_key}]
        _hook()._do_api_call(("POST", "2.2/jobs/delete"), {"job_id": job["job_id"]})

    wf >> verify_depends_on()

Approving — note I'm not a committer, so this is only a non-binding LGTM; a committer still needs to sign off before merge.

eladkal · 2026-05-20T13:17:41Z

    def _convert_to_databricks_workflow_task(
        self,
-        relevant_upstreams: list[BaseOperator],
+        relevant_upstreams: list[str],


Can you clarify why this change is needed?

Good catch. This is correcting the annotation to match the runtime value rather than changing behavior.

relevant_upstreams is populated from task.task_id in DatabricksWorkflowTaskGroup.__exit__, so it is a list of task-id strings. _convert_to_databricks_workflow_task() then compares those strings with self.upstream_task_ids:

for task_id in self.upstream_task_ids if task_id in relevant_upstreams

The old list[BaseOperator] annotation was misleading: passing operators there would make that membership check fail because it compares str task IDs to operator objects. The actual BaseOperator instances are still carried separately in task_dict, which is used when resolving the parent task’s Databricks task key.

So this change is mainly a type-hint cleanup that also makes the tests reflect the real call shape.
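
As a rough illustration of the split between relevant_upstreams (strings) and task_dict (operators), here is a simplified sketch. FakeOperator and build_depends_on are hypothetical names and are not the provider's implementation. The md5("<dag_id>__<task_id>") default key shape is borrowed from the validation DAG earlier in this thread.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeOperator:
    """Hypothetical stand-in for an operator; only what this sketch needs."""

    dag_id: str
    task_id: str
    custom_key: Optional[str] = None

    @property
    def databricks_task_key(self) -> str:
        # Prefer an explicit key, else fall back to md5("<dag_id>__<task_id>").
        if self.custom_key:
            return self.custom_key
        return hashlib.md5(f"{self.dag_id}__{self.task_id}".encode()).hexdigest()


def build_depends_on(upstream_task_ids, relevant_upstreams, task_dict):
    """Filter upstream ids by string membership, then resolve each parent's key."""
    return [
        {"task_key": task_dict[task_id].databricks_task_key}
        for task_id in sorted(upstream_task_ids)
        if task_id in relevant_upstreams
    ]


task_a = FakeOperator("pr66681_realenv", "wf.task_a")
task_dict = {task_a.task_id: task_a}

# Strings in relevant_upstreams: the in-group parent is kept, the external one dropped.
deps = build_depends_on({"wf.task_a", "external_task"}, ["wf.task_a"], task_dict)

# Operators in relevant_upstreams: the str-vs-object membership check never matches.
deps_with_operators = build_depends_on({"wf.task_a"}, [task_a], task_dict)
```

The second call shows what the old list[BaseOperator] annotation invited: passing operators makes the filter drop every upstream.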

The runtime fix for issue apache#47614 shipped in PR apache#48492; this PR adds end-to-end regression coverage so the bug cannot silently regress, plus small type-hint and constructor-clarity follow-ups in the same area. Tests build a real DAG + DatabricksWorkflowTaskGroup with DatabricksNotebookOperator tasks and assert depends_on payloads for the default-key, custom-key, >100-char-key, diamond, fan-out, root-task, and external-upstream cases. Also fixes the existing test_convert_to_databricks_workflow_task to pass strings (not mocks) so the depends_on branch is actually exercised, and adds a one-line check that _generate_databricks_task_key raises when called with a parent task_id but no task_dict. closes: apache#47614

Provider changelogs are regenerated from git log by the release manager and should not be edited by hand.

Existing TestWorkflowDependsOn coverage verified the in-process create_workflow_json() output. The new TestWorkflowDependsOnWirePayload class drives _CreateDatabricksWorkflowOperator._create_or_reset_job end to end and asserts on the spec that DatabricksHook.create_job / DatabricksHook.reset_job actually receive, i.e. the payload that lands in /api/2.1/jobs/create and /api/2.1/jobs/reset. Both branches (no existing job -> create_job, existing job -> reset_job) are exercised; both assert tasks[child].depends_on == [{task_key: md5(parent)}] and tasks[parent].depends_on == [].

boring-cyborg Bot added area:providers kind:documentation provider:databricks labels May 11, 2026

Vamsi-klu marked this pull request as ready for review May 11, 2026 03:10

eladkal reviewed May 11, 2026

Comment thread providers/databricks/docs/changelog.rst Outdated

eladkal requested a review from pankajkoti May 13, 2026 04:47

eladkal reviewed May 13, 2026

Comment thread providers/databricks/docs/changelog.rst Outdated

moomindani reviewed May 19, 2026

Vamsi-klu force-pushed the fix-issue-47614-databricks-depends-on branch from 025e8a4 to c231337 Compare May 19, 2026 05:55

moomindani approved these changes May 20, 2026

eladkal reviewed May 20, 2026

Vamsi-klu added 3 commits May 21, 2026 08:44

Remove manually added databricks changelog entry

f90a8ff

Vamsi-klu force-pushed the fix-issue-47614-databricks-depends-on branch from c231337 to 684bfee Compare May 21, 2026 15:45

Vamsi-klu wants to merge 3 commits into apache:main from Vamsi-klu:fix-issue-47614-databricks-depends-on
