
Do not deserialize trigger_kwargs when loading serialized DAGs #66002

Open
amoghrajesh wants to merge 3 commits into apache:main from astronomer:dont-deser-triggers-on-scheduler-and-api-server

@amoghrajesh
Contributor

@amoghrajesh amoghrajesh commented Apr 28, 2026


Was generative AI tooling used to co-author this PR?
  • Yes: Claude Sonnet 4.5

After #55068 was merged, _decode_start_trigger_args has been unnecessarily deserializing trigger_kwargs and next_kwargs whenever a serialized DAG is loaded. These fields hold serialized trigger state that only the Triggerer needs: it picks up a deferred task, inflates the kwargs, and instantiates the trigger class.

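The Scheduler and API Server load serialized DAGs but never need these values in inflated form (the Scheduler only passes them through to Trigger, and the API Server ignores them), so deserializing them there is wasted work.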

This PR removes the deserialization and keeps trigger_kwargs and next_kwargs as raw JSON on StartTriggerArgs. The Triggerer reads trigger kwargs directly from the Trigger DB row (not through DAG deserialization), so its path is unaffected: airflow-core/src/airflow/models/trigger.py#L170-L179
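A minimal sketch of the before/after behaviour, under stated assumptions: `toy_deserialize` is a hypothetical stand-in for `BaseSerialization.deserialize` that only understands a toy `dict`/`timedelta` encoding, and `StartTriggerArgsSketch`, `decode_before` and `decode_after` are illustrative names, not Airflow's API (the real `StartTriggerArgs` has more fields).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta
from typing import Any


@dataclass
class StartTriggerArgsSketch:
    # Hypothetical, simplified stand-in for StartTriggerArgs.
    trigger_cls: str
    next_method: str
    trigger_kwargs: Any = None
    next_kwargs: Any = None


def toy_deserialize(encoded: Any) -> Any:
    """Hypothetical stand-in for BaseSerialization.deserialize (toy dict/timedelta only)."""
    if isinstance(encoded, dict) and "__type" in encoded:
        kind, var = encoded["__type"], encoded["__var"]
        if kind == "dict":
            return {k: toy_deserialize(v) for k, v in var.items()}
        if kind == "timedelta":
            return timedelta(seconds=var)
    return encoded


def decode_before(data: dict[str, Any]) -> StartTriggerArgsSketch:
    # Old behaviour: inflate the kwargs into Python objects on every DAG load.
    return StartTriggerArgsSketch(
        trigger_cls=data["trigger_cls"],
        next_method=data["next_method"],
        trigger_kwargs=toy_deserialize(data["trigger_kwargs"]),
        next_kwargs=toy_deserialize(data["next_kwargs"]),
    )


def decode_after(data: dict[str, Any]) -> StartTriggerArgsSketch:
    # New behaviour: keep the kwargs exactly as they were serialized.
    return StartTriggerArgsSketch(
        trigger_cls=data["trigger_cls"],
        next_method=data["next_method"],
        trigger_kwargs=data["trigger_kwargs"],
        next_kwargs=data["next_kwargs"],
    )


encoded = {
    "trigger_cls": "example.MyTrigger",  # hypothetical class path
    "next_method": "execute_complete",
    "trigger_kwargs": {"__type": "dict", "__var": {"delta": {"__type": "timedelta", "__var": 10.0}}},
    "next_kwargs": None,
}
print(decode_before(encoded).trigger_kwargs)
print(decode_after(encoded).trigger_kwargs)
```

The "after" variant does no per-field work at all; the raw dict is simply carried along until something that actually needs the values (the Triggerer, via the Trigger row) decodes it.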

What changes?

Why the changes to enum.py? See the author's comment on #66002 below.

Scheduler

Before:
DagSerialization.from_dict()
  → _decode_start_trigger_args()
  → BaseSerialization.deserialize(trigger_kwargs)   # inflate to Python objects
  → StartTriggerArgs.trigger_kwargs = {delta: timedelta(...)}
TI.schedule_tis()
  → Trigger(kwargs=trigger_kwargs)
  → encrypt_kwargs() → serde.serialize() → json → fernet → encrypted_kwargs (DB)

After:
DagSerialization.from_dict()
  → _decode_start_trigger_args()
  → StartTriggerArgs.trigger_kwargs = {"__type": "dict", "__var": {...}}  # raw JSON, no work done
TI.schedule_tis()
  → Trigger(kwargs=trigger_kwargs)
  → encrypt_kwargs() → serde.serialize() → json → fernet → encrypted_kwargs (DB)

Trigger.__init__ passes whatever it receives through encrypt_kwargs(), which calls serde.serialize(), so deserializing the kwargs first and then re-serializing them was pure waste.

API Server

Before: loads the serialized DAG → _decode_start_trigger_args() deserializes trigger_kwargs into Python objects, which are never used and get thrown away.

After: same path, but trigger_kwargs stays raw JSON. No functional difference; the work is simply skipped.

Triggerer

Before and after:

The triggerer also loads the serialized DAG, at https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L717-L735, to check start_from_trigger, so it does go through _decode_start_trigger_args() too. It never uses trigger_kwargs from that path, though: it reads trigger.encrypted_kwargs from the DB row directly below in the same code. The fix is therefore safe for the triggerer as well.
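The data flow above can be modelled in a few lines. This is a hypothetical sketch, not the triggerer's code: `DecodedTaskSketch`, `TriggerRowSketch` and `triggerer_inputs` are invented names, and the row's `kwargs` stand in for the decrypted `encrypted_kwargs`. It shows that only the `start_from_trigger` flag is taken from the DAG, so the shape of the decoded trigger kwargs (inflated or raw) cannot change what the trigger is built from.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta
from typing import Any


@dataclass
class DecodedTaskSketch:
    # Hypothetical: what the triggerer gets from the serialized DAG.
    start_from_trigger: bool
    start_trigger_kwargs: Any  # inflated before this PR, raw JSON after it


@dataclass
class TriggerRowSketch:
    # Hypothetical stand-in for a trigger table row; in Airflow these kwargs
    # would come from decrypting encrypted_kwargs.
    classpath: str
    kwargs: dict[str, Any]


def triggerer_inputs(task: DecodedTaskSketch, row: TriggerRowSketch) -> tuple[bool, str, dict[str, Any]]:
    # Only the flag is read from the DAG; the trigger is built from the row.
    return task.start_from_trigger, row.classpath, row.kwargs


row = TriggerRowSketch(classpath="example.MyTrigger", kwargs={"delta": timedelta(seconds=10)})
inflated = DecodedTaskSketch(True, {"delta": timedelta(seconds=10)})
raw = DecodedTaskSketch(True, {"__type": "dict", "__var": {"delta": {"__type": "timedelta", "__var": 10.0}}})

# Identical inputs either way: start_trigger_kwargs never reaches the trigger.
assert triggerer_inputs(inflated, row) == triggerer_inputs(raw, row)
```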



@potiuk
Member

potiuk commented Apr 28, 2026

Just one test left!

Member

@potiuk potiuk left a comment


Simple, straightforward.

@amoghrajesh
Contributor Author

The changes to the Encoding enum are due to this: https://github.com/apache/airflow/actions/runs/25040217658/job/73343984827?pr=66002

_decode_start_trigger_args previously called BaseSerialization.deserialize() on trigger_kwargs when loading a serialized DAG, which inflated the BaseSerialization-encoded dict back to Python objects before storing on StartTriggerArgs. This PR stops that deserialization and keeps trigger_kwargs as raw JSON.

The raw JSON has Encoding enum instances as dict keys, because that is how BaseSerialization.serialize() encodes dicts. When defer_task() passes trigger_kwargs to Trigger(kwargs=...), encrypt_kwargs calls serde.serialize(), which converts dict keys via str(k).
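To make the key problem concrete, here is a small sketch. It redefines a two-member `Encoding` locally (mirroring the REPL session in this thread rather than importing Airflow's), and `toy_serde_dict` is a hypothetical, heavily simplified model of serde's dict handling that only applies `str(k)` to keys; the real `serde.serialize()` does more. The member's `.value` and `json.dumps` both produce `"__type"`, whereas keys produced via `str(k)` are whatever `Encoding.__str__` returns.

```python
from __future__ import annotations

import json
from enum import Enum, unique
from typing import Any


@unique
class Encoding(str, Enum):
    # Local copy of the two members shown in this thread.
    TYPE = "__type"
    VAR = "__var"


def toy_serde_dict(o: Any) -> Any:
    """Hypothetical model of serde's dict branch: every key goes through str(k)."""
    if isinstance(o, dict):
        return {str(k): toy_serde_dict(v) for k, v in o.items()}
    return o


# A BaseSerialization-style encoded dict whose keys are Encoding members.
raw = {Encoding.TYPE: "dict", Encoding.VAR: {"delta": 10.0}}

print(Encoding.TYPE.value)  # the value is always "__type"
print(json.dumps(raw))      # json writes the underlying str data of each key
# With str(k), the key text depends on Encoding.__str__ instead:
print(list(toy_serde_dict(raw)))
```

Plain-string keys pass through `toy_serde_dict` unchanged; only enum-typed keys are exposed to the `__str__` behaviour.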

Python 3.11+:

In [3]: str(Encoding.TYPE)
Out[3]: '__type'

Python 3.10:

Python 3.10.19 (main, Feb 12 2026, 00:36:33) [Clang 21.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from enum import Enum, unique
>>> @unique
... class Encoding(str, Enum):
...     TYPE = "__type"
...     VAR = "__var"
...
>>> str(Encoding.TYPE)
'Encoding.TYPE'

On Python ≤3.10, str(Encoding.TYPE) returns "Encoding.TYPE" (Enum's default class-and-member-name string) instead of "__type" (its value), which mangles the keys so the Triggerer cannot read them back.

Adding __str__ to Encoding makes str(Encoding.TYPE) return "__type" consistently across all Python versions, so serde's key conversion produces the right output. This is a pre-existing inconsistency that only became observable once we stopped deserializing trigger_kwargs early.
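A sketch of the fix as described: the `__str__` override below, returning `self.value`, is an assumption about how it could be written, not necessarily the exact code in the PR, and `convert_keys` is a hypothetical stand-in for serde's `str(k)` key conversion. Because `__str__` is defined in the class body, Enum does not replace it, so `str()` returns the value on every Python version.

```python
from enum import Enum, unique


@unique
class Encoding(str, Enum):
    """Sketch: Encoding with an explicit __str__ that returns the member's value."""

    TYPE = "__type"
    VAR = "__var"

    def __str__(self) -> str:
        # Return "__type" rather than Enum's default "Encoding.TYPE".
        return self.value


def convert_keys(d: dict) -> dict:
    # Hypothetical stand-in for serde's str(k) key conversion.
    return {str(k): v for k, v in d.items()}


payload = {Encoding.TYPE: "dict", Encoding.VAR: {"delta": 10.0}}
print(str(Encoding.TYPE))     # __type
print(convert_keys(payload))  # {'__type': 'dict', '__var': {'delta': 10.0}}
```

Since Encoding still subclasses str, comparisons such as `Encoding.TYPE == "__type"` keep working; only the text produced by `str()` changes.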

@amoghrajesh amoghrajesh self-assigned this Apr 28, 2026