Fix `airflow dags clear` clearing the wrong day for non-UTC partitioned timetables #67717

Open
Lee-W wants to merge 1 commit into apache:main from astronomer:dag-command-partition-date-tz

Lee-W (Member) commented May 29, 2026

Why

airflow dags clear --partition-date-start/--partition-date-end parsed the user-supplied dates as UTC and compared those bounds directly against DagRun.partition_date. For a timetable whose timezone is not UTC, that comparison is off by a day:

  • UTC+ zones (e.g. Asia/Taipei, UTC+8): the start day's runs were dropped — the lower bound excluded partitions that belong to the requested local day.
  • UTC- zones (e.g. America/New_York, EST): the end day's runs were dropped — the upper bound excluded partitions that belong to the requested local day.

A user asking to clear a local-calendar day expects the runs of that local day, not a UTC-shifted window.
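
The two failure modes can be reproduced with the standard library alone. This is a minimal sketch, assuming (as the code comment in the diff states) that a partition is stored as the UTC instant of local midnight in the timetable timezone. The helper `local_midnight_utc` and the sample dates are illustrative and not part of Airflow.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_midnight_utc(year: int, month: int, day: int, tz_name: str) -> datetime:
    """Return the UTC instant of local midnight for a calendar day in tz_name."""
    local = datetime(year, month, day, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


# Bounds as the old code saw them: the user's dates read as UTC midnight.
utc_start = datetime(2026, 2, 19, tzinfo=timezone.utc)
utc_end = datetime(2026, 2, 20, tzinfo=timezone.utc)

# Assumed stored partition instants for the requested local days.
taipei_start_day = local_midnight_utc(2026, 2, 19, "Asia/Taipei")
new_york_end_day = local_midnight_utc(2026, 2, 20, "America/New_York")

# UTC+ zone: local midnight falls on the previous UTC day, below the lower bound.
start_day_dropped = not (taipei_start_day >= utc_start)
# UTC- zone: local midnight falls after UTC midnight, above the inclusive upper bound.
end_day_dropped = not (new_york_end_day <= utc_end)

print(taipei_start_day.isoformat(), start_day_dropped)
print(new_york_end_day.isoformat(), end_day_dropped)
```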

What

  • Resolve the partition window through the timetable's timezone instead of comparing UTC-parsed dates directly:
    • The --partition-date-start date is interpreted as local midnight in the timetable timezone, converted to UTC, and used as an inclusive lower
      bound (partition_date >= lower_utc).
    • The --partition-date-end date is interpreted as local midnight of the next day in the timetable timezone, converted to UTC, and used as an
      exclusive upper bound (partition_date < upper_utc). This half-open range keeps the whole requested end day inside the window.
    • This timezone resolution only applies when the timetable is partitioned and exposes a timezone; otherwise the previous inclusive >= start / <= end behaviour is preserved as a fallback.
  • Add a public CronMixin.timezone accessor so dag_clear can read the timetable timezone without reaching into the private _timezone attribute.
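
The window resolution described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual code: `resolve_partition_window` and `in_window` are hypothetical names, and the real change applies the bounds as SQL filters on `DagRun.partition_date` rather than in Python.

```python
import datetime as dt
from zoneinfo import ZoneInfo


def resolve_partition_window(
    start_day: dt.date, end_day: dt.date, tz: dt.tzinfo
) -> tuple[dt.datetime, dt.datetime]:
    """Return (lower_utc, upper_utc): inclusive lower bound, exclusive upper bound."""
    # Local midnight of the start day, converted to UTC.
    lower = dt.datetime.combine(start_day, dt.time.min, tzinfo=tz)
    # Local midnight of the day after the end day, converted to UTC.
    next_day = end_day + dt.timedelta(days=1)
    upper = dt.datetime.combine(next_day, dt.time.min, tzinfo=tz)
    return lower.astimezone(dt.timezone.utc), upper.astimezone(dt.timezone.utc)


def in_window(partition_date: dt.datetime, lower_utc: dt.datetime, upper_utc: dt.datetime) -> bool:
    """Half-open membership test: lower_utc <= partition_date < upper_utc."""
    return lower_utc <= partition_date < upper_utc


taipei = ZoneInfo("Asia/Taipei")
lower_utc, upper_utc = resolve_partition_window(
    dt.date(2026, 2, 19), dt.date(2026, 2, 19), taipei
)

# Partition instants for three consecutive local days, stored as UTC.
feb_18 = dt.datetime(2026, 2, 18, tzinfo=taipei).astimezone(dt.timezone.utc)
feb_19 = dt.datetime(2026, 2, 19, tzinfo=taipei).astimezone(dt.timezone.utc)
feb_20 = dt.datetime(2026, 2, 20, tzinfo=taipei).astimezone(dt.timezone.utc)

selected = [d for d in (feb_18, feb_19, feb_20) if in_window(d, lower_utc, upper_utc)]
```

With a single-day request, only the partition for the local day 2026-02-19 falls inside the window. The exclusive upper bound is what keeps the next day's local-midnight partition out.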

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [Claude] following the guidelines


Lee-W force-pushed the dag-command-partition-date-tz branch from 2fd152c to c675fe8 May 29, 2026 13:56
Lee-W changed the title from "fix(cli): clear correct partition window for non-UTC timetables" to "Fix airflow dags clear clearing the wrong day for non-UTC partitioned timetables" May 29, 2026
Lee-W moved this to In Progress in AIP-76 Asset Partitioning May 29, 2026
Lee-W force-pushed the dag-command-partition-date-tz branch 2 times, most recently from fe52efa to effb74c May 29, 2026 15:03
Lee-W marked this pull request as ready for review May 29, 2026 15:05
- `airflow dags clear --partition-date-start/end` compared the UTC-parsed
  bounds straight against DagRun.partition_date, so non-UTC timetables cleared
  the wrong day (UTC+ zones dropped the start day, UTC- zones the end day).
- Convert the bounds through the timetable timezone into a half-open UTC range;
  add a public CronMixin.timezone accessor.
Lee-W force-pushed the dag-command-partition-date-tz branch from effb74c to cab5a31 May 29, 2026 15:15
Lee-W moved this from In Progress to In Review in AIP-76 Asset Partitioning May 29, 2026
    query = query.where(DagRun.partition_date >= args.partition_date_start)
if args.partition_date_end is not None:
    query = query.where(DagRun.partition_date <= args.partition_date_end)
tt_tz = getattr(dag.timetable, "timezone", None) if dag.timetable.partitioned else None

tt_tz is resolved by probing for a .timezone attribute, which only CronMixin-based timetables have (this PR adds the property). PartitionedAssetTimetable (timetables/simple.py:267) is also partitioned = True but has no .timezone, so it silently takes the no-tz branch. That's correct today if asset-partition dates are genuinely UTC-anchored, but the dispatch is duck-typed: any future tz-aware partitioned timetable that doesn't expose .timezone will silently fall back to the UTC branch and reintroduce the exact off-by-one this PR fixes, with no error to flag it. Worth either putting the tz accessor on the partitioned-timetable contract so it's explicit which timetables are tz-aware, or branching on a known type.

Minor, related: the two day-bound blocks are nearly identical across the tz and no-tz paths and could share a _day_bounds(label, tz) helper to keep them from drifting.
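
One way to make the contract explicit, sketched under assumptions: every class and property name below (`PartitionedTimetableContract`, `partition_timezone`, `CronPartitioned`, `AssetPartitioned`, `ForgotTimezone`) is hypothetical and does not exist in Airflow. The idea is that a partitioned timetable must declare its timezone, so a subclass that omits it fails loudly at construction instead of silently taking the UTC branch.

```python
from abc import ABC, abstractmethod
from datetime import timezone, tzinfo
from zoneinfo import ZoneInfo


class PartitionedTimetableContract(ABC):
    """Hypothetical base class: every partitioned timetable declares its timezone."""

    partitioned = True

    @property
    @abstractmethod
    def partition_timezone(self) -> tzinfo:
        """Timezone in which partition days are anchored."""


class CronPartitioned(PartitionedTimetableContract):
    """Stand-in for a cron-based partitioned timetable."""

    def __init__(self, tz_name: str) -> None:
        self._timezone = ZoneInfo(tz_name)

    @property
    def partition_timezone(self) -> tzinfo:
        return self._timezone


class AssetPartitioned(PartitionedTimetableContract):
    """Stand-in for an asset-partitioned timetable."""

    @property
    def partition_timezone(self) -> tzinfo:
        # Assumption from the review comment: asset-partition dates are UTC-anchored.
        return timezone.utc


class ForgotTimezone(PartitionedTimetableContract):
    """A future timetable that does not declare its timezone."""


def contract_is_enforced() -> bool:
    """Return True if instantiating the incomplete subclass raises TypeError."""
    try:
        ForgotTimezone()
    except TypeError:
        return True
    return False
```

With this shape the caller reads `timetable.partition_timezone` unconditionally for partitioned timetables, and the `getattr(..., None)` probe is no longer needed.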

# Partitioned runs are stored as local-midnight UTC instants; compare at day
# granularity in the timetable's timezone rather than at the raw UTC instant.
if args.partition_date_start is not None:
    start_label = args.partition_date_start.date()

parsedate returns the parsed instant in UTC (naive input is read as UTC), so .date() takes the UTC calendar day and then re-anchors it to midnight in the timetable tz. For naive values that's intuitive, but a tz-aware CLI value can shift the day: --partition-date-start 2026-02-19T07:00:00+08:00 parses to 2026-02-18T23:00Z, and .date() yields 2026-02-18, not the user's local 2026-02-19. The help text says time-of-day is ignored, but not that the calendar day is read from the parsed (UTC) instant rather than re-projected into the timetable tz. A one-line note would make the as-typed behaviour explicit. Same applies to the end bound at line 182.
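
The day shift in that example can be checked directly. This sketch uses `datetime.fromisoformat` followed by a UTC conversion as a stand-in for `parsedate`; it assumes only what the comment states (the parsed value ends up as a UTC instant) and does not call Airflow itself.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Stand-in for the parsed CLI value: a tz-aware input normalised to UTC.
cli_value = datetime.fromisoformat("2026-02-19T07:00:00+08:00")
as_utc = cli_value.astimezone(timezone.utc)

# What .date() on the UTC instant yields: the UTC calendar day.
utc_day = as_utc.date()

# What re-projecting into the timetable timezone first would yield.
local_day = as_utc.astimezone(ZoneInfo("Asia/Taipei")).date()

print(as_utc.isoformat(), utc_day, local_day)
```

Here `utc_day` is 2026-02-18 while `local_day` is 2026-02-19, which is the one-day difference the help text would need to call out.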

if args.partition_date_end is not None:
    end_label = args.partition_date_end.date()
    # Half-open upper bound: include all of the end local calendar day.
    next_day = datetime.date(end_label.year, end_label.month, end_label.day) + datetime.timedelta(

end_label is already a datetime.date (from .date() above), so datetime.date(end_label.year, end_label.month, end_label.day) just rebuilds the same date. This can be next_day = end_label + datetime.timedelta(days=1), which is the simpler form the no-tz branch below already uses.
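
A quick check of the equivalence, using an arbitrary sample date chosen so that the increment crosses a month boundary:

```python
import datetime

end_label = datetime.date(2026, 2, 28)

# Rebuilding a date from its own fields returns an equal date.
rebuilt = datetime.date(end_label.year, end_label.month, end_label.day)

# The simpler form suggested in the review.
next_day = end_label + datetime.timedelta(days=1)
```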
