Allow future logical_date for manual and operator-triggered DAG runs #65856
ZhaoMJ wants to merge 4 commits into apache:main
PR apache#46663 removed the `allow_trigger_in_future` config and hardcoded a blanket ban on future `logical_date`. Both the scheduler and the `RunnableExecDateDep` task dependency unconditionally reject any DAG run whose `logical_date` is in the future, regardless of run type.
There are legitimate use cases (testing, pre-staging runs, Calendar view organization) where users want to manually trigger a DAG with a future `logical_date`. This change blocks future `logical_date` only for scheduled runs, not manual ones.
Changes:
- Scheduler (`_schedule_dag_run`): skip future-date check for MANUAL runs
- Task dep (`RunnableExecDateDep`): skip future-date check for MANUAL runs
- Tests: updated existing tests, added scheduler-level tests for both manual (allowed) and scheduled (still blocked) future logical_date
The previous commit removed the scheduler/task-dep hard block for manual runs with a future logical_date. This extends the same treatment to operator-triggered runs (TriggerDagRunOperator) and adds a UI checkbox so users can choose whether a future-dated manual run executes immediately or waits until the logical date.
- Scheduler: also skip the future logical_date block for OPERATOR_TRIGGERED runs (run_after query filter is sufficient)
- Task dep (RunnableExecDateDep): skip future-date check for both MANUAL and OPERATOR_TRIGGERED run types
- UI: show "Run immediately" checkbox when logical date is in the future; when unchecked, set run_after=logicalDate so the run waits
- Add translated strings for all 21 locales
- Add tests for operator-triggered runs in both scheduler and dep tests
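The run-type exemption described in these two commits can be sketched in isolation. This is a minimal, self-contained illustration and not the scheduler's actual code: `RunType`, `Run`, `EXEMPT_RUN_TYPES`, and `blocked_by_future_logical_date` are hypothetical stand-ins (the enum values are placeholders, not Airflow's), and only the shape of the condition follows the diff under review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class RunType(Enum):
    # Hypothetical stand-in for Airflow's DagRunType; values are placeholders.
    SCHEDULED = "scheduled"
    MANUAL = "manual"
    OPERATOR_TRIGGERED = "operator_triggered"


# Run types that are exempt from the future-logical_date block in this sketch.
EXEMPT_RUN_TYPES = frozenset({RunType.MANUAL, RunType.OPERATOR_TRIGGERED})


@dataclass
class Run:
    # Hypothetical, minimal model of a DAG run.
    run_type: RunType
    logical_date: Optional[datetime]


def blocked_by_future_logical_date(run: Run, now: datetime) -> bool:
    """Return True if the run must wait because its logical_date is in the future."""
    return (
        run.logical_date is not None
        and run.logical_date > now
        and run.run_type not in EXEMPT_RUN_TYPES
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    tomorrow = now + timedelta(days=1)
    for run_type in RunType:
        blocked = blocked_by_future_logical_date(Run(run_type, tomorrow), now)
        print(run_type.name, "blocked" if blocked else "allowed")
```

Under these assumptions only the scheduled run is held back; the `run_after` query filter remains the gate for the other two run types.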
Thread `run_after` through the full operator → exception → task runner → supervisor → SDK client → execution API chain so users can control when an operator-triggered DAG run becomes eligible for scheduling. When `logical_date` is explicitly set but `run_after` is not, `run_after` defaults to `logical_date` so the run waits until that time. When neither is set, `run_after` is left to the API server default (`utcnow()`). Setting `run_after=None` explicitly forces immediate execution even with a future `logical_date`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
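The `run_after` defaulting rules from the third commit message can be sketched as a small resolution function. This is an illustration under stated assumptions, not the operator's implementation: `_NotSet`, `NOTSET`, and `resolve_run_after` are hypothetical names, and an unset `logical_date` is simplified to `None`. A return value of `None` stands for "leave `run_after` to the API server default".

```python
from datetime import datetime, timezone
from typing import Optional, Union


class _NotSet:
    """Hypothetical sentinel type meaning 'the caller did not pass this argument'."""

    def __repr__(self) -> str:
        return "NOTSET"


NOTSET = _NotSet()


def resolve_run_after(
    logical_date: Optional[datetime],
    run_after: Union[datetime, None, _NotSet] = NOTSET,
) -> Optional[datetime]:
    """Return the run_after to send, or None to defer to the server default."""
    if not isinstance(run_after, _NotSet):
        # Explicit value wins, including an explicit None (run immediately).
        return run_after
    # Not passed: wait until logical_date if one was given, else server default.
    return logical_date


if __name__ == "__main__":
    future = datetime(2030, 1, 1, tzinfo=timezone.utc)
    print(resolve_run_after(None))                    # neither set
    print(resolve_run_after(future))                  # logical_date only
    print(resolve_run_after(future, run_after=None))  # explicit None
```

The sentinel is what lets the function tell "not passed" apart from an explicit `None`, which is the distinction the commit message relies on.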
SameerMesiah97 left a comment
I can understand the rationale behind this PR, but this looks like a partial implementation toward decoupling logical_date from scheduling and execution semantics.
Right now, the scheduler explicitly uses logical_date as an ordering criterion when selecting task instances to queue (e.g. ORDER BY priority_weight, logical_date). This means the system assumes that execution progresses in logical_date order. I have left a comment where I go into this in a bit more detail, but my point is that implementing this fully may require removing logical_date from scheduling order and redefining how task prioritization works.
Have you discussed this with, or sought alignment from, the maintainers?
    dag_run.logical_date
    and dag_run.logical_date > timezone.utcnow()
    and dag_run.run_type not in (DagRunType.MANUAL, DagRunType.OPERATOR_TRIGGERED)
):
This is not the only place where logical_date is being used as an invariant. If you look at other parts of scheduler_job_runner.py (specifically _executable_task_instances_to_queued), you will see that task instances are ordered by logical_date before starvation filters are applied. This means that if we allow a manually triggered run with a future logical_date to start executing early, it can still be deprioritized relative to earlier logical dates during scheduling.
So instead of being fully unblocked, these runs may end up in a state where they start but then make slower or inconsistent progress depending on what other runs exist. More generally, we now allow execution to violate logical ordering, while the scheduler still uses logical_date to enforce it.
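The ordering concern can be illustrated with a small sort. This is a sketch, not the scheduler's query: `TaskCandidate` and `queue_order` are hypothetical names, and the sort key assumes higher `priority_weight` first, then earlier `logical_date`, which is one reading of the `ORDER BY priority_weight, logical_date` mentioned above.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class TaskCandidate(NamedTuple):
    # Hypothetical, minimal stand-in for a schedulable task instance.
    name: str
    priority_weight: int
    logical_date: datetime


def queue_order(candidates: List[TaskCandidate]) -> List[TaskCandidate]:
    """Sort by descending priority_weight, then ascending logical_date (assumed)."""
    return sorted(candidates, key=lambda c: (-c.priority_weight, c.logical_date))


if __name__ == "__main__":
    utc = timezone.utc
    candidates = [
        TaskCandidate("future_manual", 1, datetime(2030, 1, 1, tzinfo=utc)),
        TaskCandidate("backlog_a", 1, datetime(2024, 1, 1, tzinfo=utc)),
        TaskCandidate("backlog_b", 1, datetime(2024, 1, 2, tzinfo=utc)),
    ]
    for candidate in queue_order(candidates):
        print(candidate.name)
```

At equal priority the future-dated run sorts behind every earlier logical date, so with limited slots it would be considered last. Raising its `priority_weight` changes that in this sketch.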
Thanks for the feedback, interesting observation! I haven't sought alignment from the maintainers in this regard. From my (limited) understanding, the direction is to pivot from logical_date to run_after entirely in scheduling, but I think it's a separate concern from this PR.
The goal of this PR is to add a non-breaking and non-confusing option to allow immediate execution of runs with a future logical date. I'm afraid changing TI scheduling order to use run_after instead of logical_date would affect existing workflows, and would be too impactful, risky and breaking.
The mismatch between run_after affecting dagrun scheduling decisions and logical_date affecting task prioritization already exists today — I'm just extending it to future logical dates.
That said, decoupling logical_date from scheduling and execution semantics feels like a separate meaningful discussion if the community wants to pursue it, and I'm happy to follow up on it.
Regarding the specific code here, I'm not quite sure how this changes anything regarding logical ordering violation. I believe you can already specify mismatched run_after and logical_date using the API today, and so allow a dagrun with an earlier logical_date to execute later than another one with a later logical_date but an earlier run_after.
Besides, per our current definition of logical_date:
A date-time that logically identifies the current Dag run. This value does not contain any semantics, but is simply a value for identification.
I'd argue that the "logical" is really to the user rather than to Airflow internals, and it shouldn't be applied to ordering if not for backward compatibility reasons.
That said, looking at this part again and considering that run_after is defined as "A pendulum.DateTime instance that tells the scheduler when the Dag run can be scheduled", I feel like maybe we shouldn't be checking logical_date here at all.
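The mismatch described in this reply can be shown with a toy eligibility filter. This is a sketch that assumes eligibility depends only on `run_after <= now`; `PendingRun` and `eligible_run_ids` are hypothetical names, and the real scheduler query has more conditions.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class PendingRun(NamedTuple):
    # Hypothetical, minimal model of a DAG run waiting to be picked up.
    run_id: str
    logical_date: datetime
    run_after: datetime


def eligible_run_ids(runs: List[PendingRun], now: datetime) -> List[str]:
    """Return ids of runs whose run_after has passed; logical_date is ignored."""
    return [run.run_id for run in runs if run.run_after <= now]


if __name__ == "__main__":
    utc = timezone.utc
    runs = [
        # Earlier logical_date, but not eligible until later.
        PendingRun("a", datetime(2025, 1, 1, tzinfo=utc), datetime(2025, 1, 10, tzinfo=utc)),
        # Later logical_date, eligible sooner.
        PendingRun("b", datetime(2025, 1, 5, tzinfo=utc), datetime(2025, 1, 2, tzinfo=utc)),
    ]
    print(eligible_run_ids(runs, datetime(2025, 1, 3, tzinfo=utc)))
```

In this sketch run "b" becomes eligible before run "a" even though its logical date is later, which is the ordering mismatch the reply says is already possible.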
I assume such a change requires a small [DISCUSS] thread on the devlist, as it changes the semantics of scheduling and can have a couple of side effects. No opinion on my side, and I did not take a look at the code.
Thanks @jscheffl. I didn't start a discussion thread first because I believed the changes were not breaking. That said, I agree it is a good idea to have some discussion first, given the side effects and that the PR is more complex than I thought. I've started a discussion thread: #65949. (I also sent an email to the devlist, but that didn't seem to go through for some reason.) Happy to incorporate feedback from there.
Discussions on GitHub are "just support"; they are not real devlist discussions, and many contributors will miss them. If you don't see the email on the devlist, you might need to subscribe yourself?
Yep, agree with @jscheffl, we'll need a dev mailing list discussion for this. I'm a bit surprised by the change. You can join the dev list by following the instructions here: https://airflow.apache.org/community/
@Lee-W @jscheffl Thanks for the reply! I finally managed to subscribe and send the discussion email to the devlist after switching from Outlook to Gmail. Looking forward to the discussion there: https://lists.apache.org/thread/5j6ljfq6tdyo2jy63dvr209hthwhbykl
Summary
PR #46663 removed the `allow_trigger_in_future` config and hardcoded a blanket ban on future `logical_date`. Both the scheduler (`_schedule_dag_run`) and the `RunnableExecDateDep` task dependency unconditionally reject any DAG run whose `logical_date` is in the future, regardless of run type. The result: when a user triggers a DAG with a future `logical_date`, the DagRun appears as "running" but tasks never execute, and the run can't be deleted until the future date either.
This change relaxes the restriction for manually triggered and operator-triggered runs, allowing them to execute immediately even with a future `logical_date`. Scheduled runs with a future `logical_date` are still blocked. A new "Run immediately" checkbox in the Trigger DAG UI gives users explicit control over whether a future-dated run executes immediately or waits until the logical date. The default behavior is unchanged, so existing workflows are not affected.
Key insight: The API already has separate `logical_date` and `run_after` fields. `run_after` controls when the scheduler picks up the run (`WHERE run_after <= NOW()`). Setting `run_after=now()` with `logical_date=future` should work, but a hard block in the scheduler and a task dep (`RunnableExecDateDep`) prevented it.
Motivation
The rename from `execution_date` to `logical_date` in Airflow 3 reflects a deliberate design shift: `logical_date` is a logical organizer for DAG runs, not a constraint on when tasks execute. Blocking future values undermines this.
This is blocking our migration from Airflow 2 to 3. In Airflow 2, `allow_trigger_in_future=True` supported our workflows. That config was removed in Airflow 3 with no replacement, breaking existing production patterns.
Suggested workarounds don't work
In #46663, it was suggested: "Instead of using a future date, you can trigger with a None logical date whenever you want. A custom run_id can be supplied if you want it. If a date is needed, it can be passed as a DAG param instead."
However:
- […] `logical_date` to `None`. The Trigger DAG form always pre-fills logical date with `now()` and provides no way to clear it.
- […] `{{ logical_date }}` are used extensively throughout task definitions
- […] `logical_date`; params are invisible there
- […] `logical_date`, not params
Use cases
Custom session date models: Organizations that maintain their own concept of "session date" (e.g., triggering a Monday run on Friday night) where the session date maps directly to Airflow's `logical_date`. These runs should execute immediately at trigger time, not wait until the logical date arrives. This is our primary use case: we model business session dates that don't align with calendar dates, and `logical_date` is the natural representation for this in Airflow.
Cross-timezone event-driven ETL: DAGs triggered by external webhooks that serve data pipelines across multiple timezones. The `logical_date` represents the business date being processed, not when the task should execute. For Asia-Pacific timezones (e.g., `Asia/Tokyo`, `Australia/Sydney`), the business day rolls over hours before UTC midnight. When a webhook fires at 16:00 UTC to process the next business day's data for Tokyo (already April 27 in JST while still April 26 in UTC), the `logical_date` must be set to the next UTC day, which is a future date from the server's perspective.
Pre-staging and testing: Trigger a run for a future date to verify pipelines ahead of time (e.g., end-of-month processing, quarterly reports).
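The cross-timezone case above can be checked with a little date arithmetic. This is a sketch under assumptions: the year is arbitrary (the description gives only the month and day), JST is modeled as a fixed UTC+9 offset, and `business_logical_date` is a hypothetical helper, not part of Airflow.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

# JST modeled as a fixed UTC+9 offset (Japan has no daylight saving time).
JST = timezone(timedelta(hours=9), "JST")


def business_logical_date(fired_at: datetime, business_tz: tzinfo) -> datetime:
    """Map the webhook time to midnight UTC of the local business date (hypothetical)."""
    business_date = fired_at.astimezone(business_tz).date()
    return datetime.combine(business_date, time.min, tzinfo=timezone.utc)


if __name__ == "__main__":
    # Webhook fires at 16:00 UTC on April 26; the year is an arbitrary assumption.
    fired_at = datetime(2025, 4, 26, 16, 0, tzinfo=timezone.utc)
    logical_date = business_logical_date(fired_at, JST)
    print(fired_at.astimezone(JST).isoformat())
    print(logical_date.isoformat(), "future:", logical_date > fired_at)
```

Because 16:00 UTC is already 01:00 the next day in JST, the derived `logical_date` lies eight hours ahead of the server's current time, which is the situation the blanket ban rejected.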
Forecast/forward-looking data: Pipelines that process weather forecasts, financial projections, or other data indexed by future dates naturally need future `logical_date` values.
Calendar view positioning: The Calendar view positions runs by `logical_date`. Without future `logical_date` support, manually triggered runs always cluster at today's date instead of appearing at their intended position.
Changes
Scheduler and task dependency (commits 1–2)
- Scheduler (`_schedule_dag_run`): skip future `logical_date` block for `MANUAL` and `OPERATOR_TRIGGERED` run types (the `run_after <= now()` query filter is the primary gate)
- Task dep (`RunnableExecDateDep`): skip future-date check for `MANUAL` and `OPERATOR_TRIGGERED` run types
- UI (`TriggerDAGForm.tsx`, `types.ts`, `useTrigger.ts`): add "Run immediately" checkbox that appears when logical date is in the future. When unchecked, `run_after = logicalDate` so the run waits; when checked, `run_after` defaults to `now()` and the run executes immediately
- […] `runImmediately` string for all 21 locales
- […] `logical_date` tests for both scheduler and task dep; existing manual and scheduled tests remain
Optional `run_after` on TriggerDagRunOperator (commit 3)
Adds an optional `run_after` parameter to `TriggerDagRunOperator`, threaded through the full chain: operator → `DagRunTriggerException` → task runner → supervisor → SDK client → Execution API → `trigger_dag()`.
Default behavior is unchanged:
- Neither `logical_date` nor `run_after` is set: `run_after=None` → `trigger_dag()` defaults to `utcnow()` → run executes immediately (same as before)
- `logical_date` is explicitly set but `run_after` is not: `run_after` defaults to `logical_date`, so the run waits until that time (same as before; previously enforced by `RunnableExecDateDep`, now by `run_after`)
- `run_after` is explicitly set: user controls exactly when the run becomes eligible
- `run_after=None` is explicitly set with a future `logical_date`: forces immediate execution
Files changed:
- `TriggerDagRunOperator`: new `run_after` param with NOTSET default, added to `template_fields`
- `DagRunTriggerException`: new `run_after` field
- […] (`_handle_trigger_dag_run`): passes `run_after` to `TriggerDagRun` message
- […] `run_after` to `client.dag_runs.trigger()`
- […] (`DagRunOperations.trigger`): new `run_after` param, included in `TriggerDAGRunPayload`
- […] (`TriggerDAGRunPayload`): new `run_after` field
- […] `run_after` to `trigger_dag()`
- […] `AddRunAfterToTriggerPayload` version change for backward compat
- […] `run_after` field added to `TriggerDAGRunPayload`
related: #46663
related: #60443
"Run immediately" checkbox
Calendar view with future `logical_date`
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Claude Opus 4.6) following the guidelines