-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for interval timetable catchup=False #19145
Add test for interval timetable catchup=False #19145
Conversation
Also includes some minor refactorings and a fix on edge case boundaries that only happen when the current time lies on the interval boundary with microsecond accuracy (i.e. not gonna happen in the real world).
elif earliest is not None: | ||
earliest = self._align(earliest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation in #19130 missed this call. Without this alignment call, the next run's data interval would start at the current time (e.g. 2021-10-22T10:43:57.43256) instead of the interval boundary (e.g. 2021-10-22T10:00:00.00000) and cause inconsistencies. So I moved the call here to make sure it is always done correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant good spot
next_start = self._get_next(current_time) | ||
last_start = self._get_prev(current_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation
next_start = self._get_next(current_time)
last_start = self._get_prev(current_time)
had a bug if current_time
falls right on the interval boundary (e.g. the full hour mark for a @hourly
schedule interval) because croniter
would make next_start
and last_start
two hours apart instead of one. So this is changed to
last_start = self._get_prev(current_time)
next_start = self._get_next(last_start)
toe ensure the two starts are one hour apart, and current_time == next_start
for the interval boundary case. This is not really practically relevant (what's the chance a now()
call falls directly one that time with microsecond accurary), but is an issue in unit tests.
Only failing on Postgres, so probably not related...? But DAG scheduling is so core I can't really be sure. |
Can we also add the test I dropped at #19130, might be useful |
Added (with some minor changes, adding |
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
Test failures went away, probably flaky. |
self.scheduler_job._schedule_dag_run(dr, session) | ||
session.flush() | ||
|
||
dag.catchup = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth a comment here as to why we are doing this. Or you could freezetime to queue and schedule dagruns at the dag start date then outside of the freezetime block queue another dagrun and then confirm its the latest execution date?
(cherry picked from commit 205219c)
(cherry picked from commit 205219c)
(cherry picked from commit 205219c)
(cherry picked from commit 205219c)
Test case for #19130 + some minor fixes I missed earlier.
cc @robinedwards