heartbeat logmessage fix #31996
This PR implements some of the suggestions provided in the issue by @hterik.
The current changes cover A, B and D. I will figure out how to detect whether the heartbeat is recovering and then finish part C.
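One possible way to approach part C (detecting recovery) is to count consecutive failed heartbeats and report when a success follows a failure. The sketch below is only an illustration under that assumption: `HeartbeatTracker` and its `record` method are hypothetical names and are not part of `airflow/jobs/job.py` or of this PR.

```python
import logging

log = logging.getLogger("heartbeat_example")


class HeartbeatTracker:
    """Hypothetical helper that counts consecutive heartbeat failures."""

    def __init__(self) -> None:
        self.consecutive_failures = 0

    def record(self, succeeded: bool) -> str:
        """Record one heartbeat attempt and return 'ok', 'failed' or 'recovered'."""
        if not succeeded:
            self.consecutive_failures += 1
            log.warning("Heartbeat failed (%d in a row)", self.consecutive_failures)
            return "failed"

        failures = self.consecutive_failures
        self.consecutive_failures = 0
        if failures > 0:
            # A success right after one or more failures counts as a recovery.
            log.info("Heartbeat recovered after %d failed attempt(s)", failures)
            return "recovered"
        return "ok"
```

A caller would invoke `record(True)` after a successful heartbeat update and `record(False)` from the exception handler, so the recovery message is logged exactly once per outage.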
airflow/jobs/job.py
Outdated
if self.is_alive():
    self.log.error("%s heartbeat failed with error. Scheduler may go into unhealthy state", self.__class__.__name__)
else:
    self.log.error("%s heartbeat failed with error. Scheduler is in unhealthy state", self.__class__.__name__)
As a user of Airflow reading the logs of a DAG, I would not understand what this means for me. Is this something I have to react to? Do I need to contact my admins? Are the DAG results corrupted? Should I restart the scheduler?
This error isn't necessarily a problem with the scheduler. More often, the executor cannot reach the database because of transient network problems. As long as the error is transient and recovers shortly, there is usually no consequence. The log message should reflect this. If that is too much to fit into a log message, linking to the architecture documentation at airflow.apache.org, as suggested by potiuk above, sounds like a good proposal.
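To make the request above concrete, here is a minimal sketch of a message that tells the reader whether any action is needed. It is an illustration only, not the wording adopted in this PR: `heartbeat_failure_message` and `DOCS_HINT` are hypothetical names, and the `still_alive` flag stands in for the `self.is_alive()` check shown in the diff.

```python
# Hypothetical pointer to the docs; the exact page is left unspecified on purpose.
DOCS_HINT = "See the architecture documentation at airflow.apache.org for details."


def heartbeat_failure_message(job_name: str, still_alive: bool) -> str:
    """Build a heartbeat failure message that states what the reader should do."""
    if still_alive:
        # The job is still considered healthy, so the failure is likely transient.
        return (
            f"{job_name} heartbeat failed, most likely because the database was "
            "temporarily unreachable. No action is needed if this recovers shortly. "
            + DOCS_HINT
        )
    # The job is no longer considered alive, so someone should investigate.
    return (
        f"{job_name} heartbeat keeps failing and the job is now considered unhealthy. "
        "Check database connectivity or contact your administrator. " + DOCS_HINT
    )
```

The resulting string could be passed to `self.log.error(...)` in place of the two literals in the diff, which keeps the wording in one place.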
@potiuk In this PR, I am specifically working on the heartbeat failure that occurs when updating the heartbeat timestamp in the table. So it is more related to DB problems when the executor wants to update the table. But with the comments above, I think the scope of this issue is expanding. Is my understanding right, @potiuk?
That's the proposal, but it's just a suggestion. If you want to stay with a small change, that's up to you.
@potiuk Thanks for the update. Let me break the proposal into a few more tasks and work on it.
airflow/jobs/job.py
Outdated
self.log.error(
    "%s heartbeat failed with error. Scheduler is in unhealthy state", self.__class__.__name__
)
# self.log.exception("%s heartbeat got an exception", self.__class__.__name__)
LGTM, with the small nit of removing the commented-out line.
Thanks @potiuk, let me fix it.
I mistakenly force-pushed with changes from the main branch. Apologies. I will clean up and commit the changes.
closes: #31810
relates: #31810