Make partition_key provenance-only and inherit it onto asset events#67718
Make partition_key provenance-only and inherit it onto asset events#67718Lee-W wants to merge 1 commit into
Conversation
8b575fc to
87119dd
Compare
…vent - Drop runtime back-fill so task-emitted keys no longer overwrite DagRun.partition_key; the run's key stays provenance-only. - AssetEvent.partition_key now inherits the run's key when a task emits an event without an explicit key, instead of staying None (which left downstream partitioned consumers unable to trigger). - Reject a manual partition_key on the REST trigger endpoint for non-partitioned Dags with a 400. All behaviour is unreleased (AIP-76 first ships in 3.2.0), so changes land in place with no compat shim and no newsfragment. closes: apache#67368
87119dd to
dd8e7c4
Compare
| and not dag.timetable.partitioned | ||
| and not dag.timetable.partitioned_at_runtime | ||
| ): | ||
| raise ValueError( |
There was a problem hiding this comment.
This rejection only runs on the public REST trigger path. The Execution API route (trigger_dag_run in execution_api/routes/dag_runs.py:144) passes partition_key straight into trigger_dag then create_dagrun with no partitionability check, so a non-partitioned run can still be created with a key there, and with the new inheritance it then emits partitioned-looking AssetEvents. TriggerDagRunOperator doesn't expose partition_key today so it isn't reachable through the shipped operator, but the wire field is forwarded ungated. Putting the check in create_dagrun/DagRun.__init__ would cover both entrypoints instead of just the request body.
kaxil
left a comment
There was a problem hiding this comment.
Approving. The one inline note (the partition_key rejection lives in the REST request body, so the Execution API trigger path at dag_runs.py:144 still forwards a key into create_dagrun ungated) is an optional suggestion, not a blocker, since TriggerDagRunOperator doesn't expose partition_key today. Worth considering folding the check into create_dagrun so both entrypoints share one gate.
Why
partition_keysemantics forDagRunandAssetEventhad three rough edges, all on unreleased AIP-76 behaviour (first ships in 3.2.0):DagRun.partition_keyvia a runtime back-fill. That blurred ownership of the run's key — it should be provenance set by the scheduler / trigger side, not rewritten by task emission.partition_key = None, which left downstream partitioned consumers unable to route off it — even though the run it came from carried a key.partition_keyfor Dags that are not partitioned at all, silently ignoring it.closes: #67368
What
register_asset_changes_in_db. A task emitting outlet events no longer writesDagRun.partition_key; the run's key stays provenance-only, set by the scheduler / trigger side.AssetEventemitted without an explicitpartition_keynow inherits the run'spartition_keyinstead of stayingNone, so downstream partitioned consumers can route off it.partition_keyfor non-partitioned Dags with a400(Dags that are neithertimetable.partitionednortimetable.partitioned_at_runtime). The validation runs inside the route's error-handling block so the domainValueErrorsurfaces as400rather than500.All behaviour is unreleased, so the changes land in place with no compatibility shim and no newsfragment.
Was generative AI tooling used to co-author this PR?
Generated-by: [Claude] following the guidelines
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.