Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable health check threshold for triggerer #33089

Conversation

o-nikolas
Copy link
Contributor

Recently the Triggerer was forced to a health check threshold of trigger_heartbeat * 2.1 with a default of 5s for the heartbeat (as of #32123), this generates a threshold of 10.5s, the previous threshold was 30s, this leads to us observing a very unstable situation in many Airflow environments where Triggerers are not given a reasonable amount of time to heartbeat and their triggers are taken from them and given to another "healthy" Triggerer. This repeats back and fourth between multiple Triggerers.

This change allows the user to configure this threshold the same way we do for the scheduler.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:Scheduler Scheduler or dag parsing Issues area:Triggerer labels Aug 3, 2023


@pytest.mark.parametrize("check_triggerer_heartrate", [10, 60, 300])
def test_assign_unassigned_missing_heartbeat(session, create_task_instance, check_triggerer_heartrate):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new assert was added to the above test that covers the usecase this tests was trying to cover. This test was quite long and more complicated than I think required to test this particular code branch.

Recently the Triggerer was forced to a health check threshold of
`trigger_heartbeat * 2.1` with a default of 5s for the heartbeat, this
generates a threshold of 10.5s, the previous threshold was 30s, this
leads to a very unstable situation where Triggerers are not given a
reasonable amount of time to heartbeat and their triggers are taken
from them.

This change allows the user to configure this threshold the same way we
do for the scheduler.
@o-nikolas o-nikolas force-pushed the onikolas/configurable_triggerer_heartbeat_gracetime branch from c7185b7 to 3f65b7a Compare August 4, 2023 05:17
@o-nikolas
Copy link
Contributor Author

CC @potiuk @hussein-awala

@potiuk potiuk added this to the Airflow 2.7.0 milestone Aug 6, 2023
@potiuk
Copy link
Member

potiuk commented Aug 6, 2023

cc: @hussein-awala @ephraimbuddy -> i think this one should go to RC1/

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, LGTM

@hussein-awala hussein-awala merged commit 6ec3b9a into apache:main Aug 6, 2023
42 checks passed
@hussein-awala
Copy link
Member

I think this one should go to RC1

+1

ephraimbuddy pushed a commit that referenced this pull request Aug 8, 2023
Recently the Triggerer was forced to a health check threshold of
`trigger_heartbeat * 2.1` with a default of 5s for the heartbeat, this
generates a threshold of 10.5s, the previous threshold was 30s, this
leads to a very unstable situation where Triggerers are not given a
reasonable amount of time to heartbeat and their triggers are taken
from them.

This change allows the user to configure this threshold the same way we
do for the scheduler.

(cherry picked from commit 6ec3b9a)
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler Scheduler or dag parsing Issues area:Triggerer type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants