heartbeat recovery message #34457

Merged: 1 commit, Apr 5, 2024


Bowrna
Contributor

@Bowrna Bowrna commented Sep 18, 2023

related: #31810
closed: #31810

This covers point C of the issue.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Sep 18, 2023
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch 4 times, most recently from df94736 to 55faf46 Compare September 19, 2023 07:31
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch from 5c097d4 to a33cf10 Compare September 21, 2023 17:09
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch from a33cf10 to 1730227 Compare October 1, 2023 04:45
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch from d051cb4 to 9d82803 Compare October 13, 2023 12:35
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch 8 times, most recently from c21c011 to 54e6c96 Compare November 1, 2023 07:10
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch from 54e6c96 to f77fa86 Compare November 1, 2023 09:11
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch 5 times, most recently from 0328759 to 139f4aa Compare November 2, 2023 04:34
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch 2 times, most recently from 984c74c to dfaaabe Compare December 3, 2023 15:27
@potiuk
Member

potiuk commented Mar 9, 2024

This looks good, @Bowrna (except for the failing static check), but I have one small proposal. It took me a bit of time to see why grace_multiplier is not passed to the method in heartbeat, and I found that even though it is specified in the is_alive method, it is never set to anything other than 2.1. I also think it is in the wrong place: it should not be a parameter of is_alive, but should instead be added as a Job field in Job's constructor (similar to heartrate, for example).

self.grace_multiplier = 2.1

And then it should be taken from the job whenever needed (similar to how we take job.job_type).

It would be great to change it here, because we are touching the same code anyhow, but if you think it's too much, I am fine with approving it even without it (and you could do it as a follow-up, for example).
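
A minimal sketch of the proposal above, for illustration only. `SketchJob` is a hypothetical stand-in and not Airflow's actual `Job` class; the constructor defaults, the `now` argument, and the timeout formula (heartrate times grace_multiplier) are assumptions made for the example. Only the idea of storing grace_multiplier on the job, with a default of 2.1, comes from the comment.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


class SketchJob:
    """Hypothetical stand-in for a job record; not Airflow's real Job class."""

    def __init__(self, heartrate: float = 5.0, grace_multiplier: float = 2.1) -> None:
        # Seconds between expected heartbeats (assumed default for this sketch).
        self.heartrate = heartrate
        # Stored on the job itself, like heartrate, instead of being
        # passed as a parameter to is_alive.
        self.grace_multiplier = grace_multiplier
        self.latest_heartbeat = datetime.now(timezone.utc)

    def is_alive(self, now: datetime | None = None) -> bool:
        # `now` is injectable only so the example can be checked deterministically.
        now = now or datetime.now(timezone.utc)
        # Assumed rule: the job counts as alive while the last heartbeat is
        # younger than heartrate * grace_multiplier seconds.
        allowed = timedelta(seconds=self.heartrate * self.grace_multiplier)
        return now - self.latest_heartbeat < allowed
```

With this shape, a caller reads `job.grace_multiplier` wherever it needs the value, and a different multiplier can still be supplied at construction time if a case for overriding it comes up.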

@Bowrna
Contributor Author

Bowrna commented Mar 13, 2024

@potiuk I will add the changes you suggested.

@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch 2 times, most recently from 3176485 to f28a5b7 Compare March 20, 2024 06:27
@Bowrna
Contributor Author

Bowrna commented Mar 20, 2024

@potiuk I made the changes regarding grace_multiplier. If I remember correctly, grace_multiplier could be set as a param value on the config side, but I don't see that part of the code currently. Am I missing something here?

Member

@potiuk potiuk left a comment

LGTM. But I would love for another committer to take a look: it's quite a crucial part of our processing, and we have had some issues there that were overlooked in the past, for example #37992.

@potiuk
Member

potiuk commented Mar 20, 2024

@potiuk I made the changes regarding grace_multiplier. If I remember correctly, grace_multiplier could be set as a param value on the config side, but I don't see that part of the code currently. Am I missing something here?

Yes. grace_multiplier is currently not set anywhere to a value other than the default 2.1.

@potiuk
Member

potiuk commented Mar 20, 2024

(or so I saw from reviewing the code myself). It's still good to keep it, though, in case we ever want to override it.

@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch from f28a5b7 to eef776f Compare March 20, 2024 09:19
@Bowrna Bowrna closed this Mar 20, 2024
@Bowrna Bowrna reopened this Mar 20, 2024
@potiuk
Member

potiuk commented Mar 20, 2024

Looks like real issues.

@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch 3 times, most recently from 2e4b788 to fd2e4e1 Compare March 20, 2024 14:54
@Bowrna
Contributor Author

Bowrna commented Mar 21, 2024

The tests are failing in the health API call test cases. Let me debug the issue and fix it by EOD.

@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch from fd2e4e1 to ddc7d44 Compare March 21, 2024 08:43
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch 2 times, most recently from d0b3f94 to 40c4b33 Compare April 4, 2024 06:26
@Bowrna Bowrna force-pushed the heartbeat-message-enhancement branch from 40c4b33 to dda9b36 Compare April 4, 2024 06:43
@Bowrna
Contributor Author

Bowrna commented Apr 4, 2024

All green now :)

@potiuk potiuk merged commit 6c866f4 into apache:main Apr 5, 2024
41 checks passed
@potiuk
Member

potiuk commented Apr 5, 2024

🎉

@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Jun 3, 2024
Labels
area:Scheduler including HA (high availability) scheduler type:improvement Changelog: Improvements
Development

Successfully merging this pull request may close these issues.

Improve log messages of heartbeat connection errors and recovery
5 participants