-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression on pid reset to allow task start after heartbeat #17333
Regression on pid reset to allow task start after heartbeat #17333
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
Do you have a dag that could be used to reproduce this behaviour? |
@ephraimbuddy IMO it's not a bug that can be reproduced in a simply dag cause it not concern task execution code but task runner.
If the starting process take more time than heartbeat (5s default), and it's not the first TI run, the problem occur. I don't know if you have another idea to reproduce that in a dag ? In my case, it occur more specifically in production at night, when the scheduler process our data of last day, and lot of task is running in parallel that generate load on our servers and our airflow db. |
Maybe our first log with that error from airflow can help
|
airflow/models/taskinstance.py
Outdated
@@ -1111,6 +1111,7 @@ def check_and_change_state_before_execution( | |||
if not test_mode: | |||
session.add(Log(State.RUNNING, self)) | |||
self.state = State.RUNNING | |||
self.pid = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be up with lines 1043-1044 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, i put it back on line where it was in 2.0 , but I'm going to push a commit to put it next to the hostname which I think is a good place too. https://github.com/apache/airflow/blob/2.0.2/airflow/models/taskinstance.py#L1076
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite understandable(makes sense to me). Let's wait for @ashb to also take a look
1eaa20b
to
428cc76
Compare
4f6b399
to
428cc76
Compare
LGTM @ephraimbuddy |
Awesome work, congrats on your first merged pull request! |
I incorporated the change https://github.com/apache/airflow/blob/2.0.2/airflow/models/taskinstance.py#L1076 and I am still getting the error .... |
Regression on PID reset to allow task start after heartbeat Co-authored-by: Nicolas MEHRAEIN <nicolas.mehraein@adevinta.com> (cherry picked from commit ed99eaa)
It's a bug we have found in production in my company.
Here is the bug pattern:
It's critical in case of sensors with reschedule mode cause it trigger a fail immediately.