Add test for interval timetable catchup=False #19145

Merged 2 commits on Oct 22, 2021


uranusjr
Member

Test case for #19130 + some minor fixes I missed earlier.

cc @robinedwards

Also includes some minor refactorings and a fix on edge case boundaries
that only happen when the current time lies on the interval boundary
with microsecond accuracy (i.e. not gonna happen in the real world).
Comment on lines +77 to +78
elif earliest is not None:
    earliest = self._align(earliest)
Member Author

The previous implementation in #19130 missed this call. Without this alignment call, the next run's data interval would start at the current time (e.g. 2021-10-22T10:43:57.43256) instead of the interval boundary (e.g. 2021-10-22T10:00:00.00000) and cause inconsistencies. So I moved the call here to make sure it is always done correctly.
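The inconsistency described above can be illustrated with plain `datetime` values. This is a minimal sketch that assumes an hourly schedule; `floor_to_hour` and `is_on_boundary` are hypothetical helpers written only to reproduce the two timestamps from the example. It is not Airflow's `_align`, whose exact rounding behavior is not shown in this thread.

```python
from datetime import datetime, timezone


def floor_to_hour(moment):
    """Hypothetical helper: snap a timestamp down to the hour mark."""
    return moment.replace(minute=0, second=0, microsecond=0)


def is_on_boundary(moment):
    """True if the timestamp already sits exactly on an hour mark."""
    return moment == floor_to_hour(moment)


# The "current time" from the example in the comment.
now = datetime(2021, 10, 22, 10, 43, 57, 432560, tzinfo=timezone.utc)

# Without alignment, the data interval would start at an arbitrary instant.
unaligned_start = now

# With alignment (assumed here to round down, as in the example),
# the interval starts on the schedule boundary.
aligned_start = floor_to_hour(now)

print(unaligned_start.isoformat(), is_on_boundary(unaligned_start))
print(aligned_start.isoformat(), is_on_boundary(aligned_start))
```

An interval that starts off the boundary would not line up with the intervals of earlier or later runs, which is the inconsistency the alignment call avoids.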

Contributor

Brilliant, good spot.

next_start = self._get_next(current_time)
last_start = self._get_prev(current_time)
Member Author

@uranusjr uranusjr Oct 22, 2021

The previous implementation

next_start = self._get_next(current_time)
last_start = self._get_prev(current_time)

had a bug if current_time falls right on the interval boundary (e.g. the full hour mark for a @hourly schedule interval) because croniter would make next_start and last_start two hours apart instead of one. So this is changed to

last_start = self._get_prev(current_time)
next_start = self._get_next(last_start)

to ensure the two starts are one hour apart, and current_time == next_start for the interval boundary case. This is not really practically relevant (what's the chance a now() call falls directly on that time with microsecond accuracy?), but it is an issue in unit tests.
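The boundary case above can be reproduced without croniter. The sketch below assumes an `@hourly` schedule and uses hypothetical stand-ins `get_next` and `get_prev` that return the hour mark strictly after and strictly before the given time, which is the behavior the comment attributes to croniter. `old_bounds` and `new_bounds` are illustrative names for the two orderings, not functions in Airflow.

```python
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)


def floor_to_hour(moment):
    return moment.replace(minute=0, second=0, microsecond=0)


def get_next(moment):
    """Stand-in: first hour mark strictly after `moment`."""
    return floor_to_hour(moment) + HOUR


def get_prev(moment):
    """Stand-in: last hour mark strictly before `moment`."""
    floored = floor_to_hour(moment)
    return floored - HOUR if floored == moment else floored


def old_bounds(current_time):
    # Previous ordering: both values derived from current_time.
    next_start = get_next(current_time)
    last_start = get_prev(current_time)
    return last_start, next_start


def new_bounds(current_time):
    # Changed ordering: next_start derived from last_start.
    last_start = get_prev(current_time)
    next_start = get_next(last_start)
    return last_start, next_start


on_boundary = datetime(2021, 10, 22, 10, 0, tzinfo=timezone.utc)
off_boundary = datetime(2021, 10, 22, 10, 43, 57, 432560, tzinfo=timezone.utc)

old_last, old_next = old_bounds(on_boundary)
new_last, new_next = new_bounds(on_boundary)

old_gap = old_next - old_last  # two hours apart on the boundary
new_gap = new_next - new_last  # one hour apart, and new_next == on_boundary
```

Away from the boundary, both orderings give the same pair of hour marks, so only the exact-boundary case changes.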

@uranusjr uranusjr added this to the Airflow 2.2.1 milestone Oct 22, 2021
@uranusjr
Member Author

tests/models/test_cleartasks.py::TestClearTasks::test_clear_task_instances: assert 3 == 1
 +  where 3 = <TaskInstance: test_clear_task_instances.1 scheduled__2016-01-01T00:00:00+00:00 [None]>.max_tries
tests/models/test_cleartasks.py::TestClearTasks::test_clear_task_instances_with_task_reschedule: AssertionError: assert 0 == 1
 +  where 0 = <function TestClearTasks.test_clear_task_instances_with_task_reschedule.<locals>.count_task_reschedule at 0x7f3f4c267b70>('1')
 +    where '1' = <TaskInstance: test_clear_task_instances_with_task_reschedule.1 scheduled__2016-01-01T00:00:00+00:00 [up_for_reschedule]>.task_id
tests/models/test_cleartasks.py::TestClearTasks::test_dag_clear: assert 3 == 1
 +  where 3 = <TaskInstance: test_dag_clear.test_dag_clear_task_1 scheduled__2016-01-01T00:00:00+00:00 [None]>.max_tries

Only failing on Postgres, so probably not related...? But DAG scheduling is so core I can't really be sure.

@uranusjr uranusjr marked this pull request as ready for review October 22, 2021 03:37
@ephraimbuddy
Contributor

Can we also add the test I dropped at #19130? It might be useful.

@uranusjr
Member Author

Added (with some minor changes, adding session=session etc.)

@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Oct 22, 2021
@uranusjr uranusjr merged commit 205219c into apache:main Oct 22, 2021
@uranusjr uranusjr deleted the timetable-interval-catchup-test branch October 22, 2021 07:39
@uranusjr
Member Author

Test failures went away, probably flaky.

self.scheduler_job._schedule_dag_run(dr, session)
session.flush()

dag.catchup = False
Contributor

Perhaps worth a comment here as to why we are doing this. Or you could use freezetime to queue and schedule DAG runs at the DAG start date, then, outside the freezetime block, queue another DAG run and confirm it's the latest execution date?

jedcunningham pushed a commit that referenced this pull request Oct 22, 2021
jedcunningham pushed a commit that referenced this pull request Oct 24, 2021
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Oct 26, 2021
sharon2719 pushed a commit to sharon2719/airflow that referenced this pull request Oct 27, 2021
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request Oct 27, 2021
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Apr 7, 2022
Labels
full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
4 participants