AirflowTaskTimeout should inherit BaseException instead of Exception? #35644
Comments
Yes. good idea. Might be worth implementing it. |
That is exactly what Option 1 proposes 😄 |
Proposed change here #35653 |
Yeah. But it's not the same issue (or at least I believe it's not). I believe the original issue was connected with SIGALRM not being properly handled by long-running low-level C code. Catching and ignoring all exceptions is a bad programming habit for anyone who runs a task in a loop, as in the example described by @hterik. If you want to handle exceptions inside a loop in the fashion described here, you should generally ignore only known exceptions, not all of them. We should still change AirflowTaskTimeout to inherit from BaseException to handle this "bad programming" case, of course (we do not want our users to shoot themselves in the foot by not knowing that they are doing something wrong), but I think the Kafka case from #35474 was something entirely different (only loosely related). I highly doubt someone in the Kafka code handles all exceptions this way; my guess is that SIGALRM was not handled there :) |
To be honest, we don't know what the reason is in this particular case. It could be both:
In the second case, when the Airflow task's SIGALRM works, we do not control at which place in the main thread the code returns. In this case it will break the Airflow timeout, and in addition it might break something else:
    try:
        ...  # <- returns here
    except Exception:
        pass
If the handler returns in this code block, then changing the inheritance wouldn't help at all:
    try:
        ...  # <- returns here
    except:  # or except BaseException:
        pass
|
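The difference between the two guards above can be checked with a small sketch. The two timeout classes and the helper names here are hypothetical stand-ins, not Airflow's classes; the `raise` simply plays the role of the timeout being injected inside the `try` body.

```python
class ExceptionBasedTimeout(Exception):
    """Hypothetical timeout derived from Exception."""


class BaseExceptionBasedTimeout(BaseException):
    """Hypothetical timeout derived from BaseException."""


def guarded_by_except_exception(exc_type):
    try:
        raise exc_type("timed out")  # stands in for the injected timeout
    except Exception:
        pass


def guarded_by_bare_except(exc_type):
    try:
        raise exc_type("timed out")
    except:  # noqa: E722 (behaves like `except BaseException:`)
        pass


def is_swallowed(guard, exc_type):
    """Return True if the guard absorbs the timeout instead of letting it propagate."""
    try:
        guard(exc_type)
    except BaseException:
        return False
    return True


# Map (guard name, timeout class name) -> whether the timeout was silently absorbed.
swallowed = {
    (guard.__name__, exc_type.__name__): is_swallowed(guard, exc_type)
    for guard in (guarded_by_except_exception, guarded_by_bare_except)
    for exc_type in (ExceptionBasedTimeout, BaseExceptionBasedTimeout)
}
```

Only one of the four combinations lets the timeout through: `except Exception` with the BaseException-based class. A bare `except` absorbs both, which is the case where changing the inheritance does not help.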
Code that normally catches Exception should not implicitly ignore interrupts from AirflowTaskTimout. Fixes apache#35644 Partially addresses apache#35474
Yep. There might be many reasons. That's why I would leave #35474 open (and likely someone will take a stab at it). For me, managing the timeout from the parent process, which can perform signal escalation and eventually SIGKILL the task, is the only "ultimate" solution that works in all cases (though defining Timeout as BaseException first is nicer, because it allows for much more controlled and gentle timeout handling, with closing of all resources, in case the user does, indeed, catch and does not bubble up |
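The parent-process escalation idea can be sketched roughly as follows. This is only an illustration using the standard `subprocess` module, not how Airflow supervises tasks; `run_with_escalation`, `timeout` and `grace` are hypothetical names, and the signal behaviour described in the comments assumes a POSIX system.

```python
import subprocess
import sys


def run_with_escalation(cmd, timeout, grace):
    """Run cmd; after `timeout` seconds ask it to stop, after `grace` more seconds force it."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout)
        return "finished", proc.returncode
    except subprocess.TimeoutExpired:
        proc.terminate()  # polite request: SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)
        return "terminated", proc.returncode
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort: SIGKILL on POSIX, cannot be caught or ignored
        proc.wait()
        return "killed", proc.returncode


if __name__ == "__main__":
    # A child that ignores SIGTERM, standing in for a task that swallows its timeout.
    stubborn = (
        "import signal, time; "
        "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
        "time.sleep(60)"
    )
    print(run_with_escalation([sys.executable, "-c", stubborn], timeout=1.0, grace=0.5))
```

Because the final step does not depend on anything the child's Python code does, it works even when the task catches every exception, at the cost of skipping the child's own cleanup.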
The problem is a bit deeper: all current exceptions are part of the Public Interface of Airflow, which I think they should not be. There are only a couple of exceptions which should be part of the public interface and IMO About SIGKILL: it would be a trade-off between work almost always and complete |
Yes. I think we should remove those "non-public" exceptions from the list. The list might contain things that should not be there, and we are free to remove them if they found their way in by mistake (like any other bugfix). Might be a good idea to do so as a separate PR. But even if not, I think changing the base of the exception does not qualify as a "Breaking change" IMHO. It's still an exception and it does not change the usage of it. |
Just a comment on why we can modify the list - SemVer is not a |
I've found some grey areas in the Public Interface, but let's focus on exceptions and the current case. Many of these things point to future improvements or clarifications. Maybe you could help me with this puzzle: Statement 1. The same valid with |
There are probably plenty. It's open for contribution to clarify them, as usual :). SemVer is very clear about this (https://semver.org/): SemVer is a "communication" tool. It is about communicating our intentions about what our public API does, not about making sure all implementation details follow a strict "I do not change this / I change that" policy.
I'd modify it a bit. The existence of AirflowTaskTimeout (with the intent of being an exception thrown when a timeout occurs) is Public. The implementation, and the fact that it derives from BaseException or Exception, is an internal implementation detail. If we change it, we should say so, but it does not change the intention. Public APIs are usually about intentions, not about details. Someone COULD rely on the implementation detail of it being based on Exception, but this was not the intention we had. Hyrum's law is very clear about it (and I fully agree with it): any change has the potential of breaking someone's workflow. But Public APIs are intentions, not promises, and the change I see here is that we clarify the intention we had with that exception.
Yes. Applies well - intention of signifying timeout is Public, all the other details being internal implementation details.
    try:
        ...  # do something
    except Timeout:
        ...  # react somehow
All is fine and well. We will have cases where we break someone's workflow. TOUGH. We will have to bite the bullet. If we have not clarified our intentions well enough before, it might even be seen as our fault. Also tough. But by clarifying intentions we are moving closer and closer to the place where most of our Public APIs are listed and their intentions explained. This should be our goal. |
And yes. This is inevitable. It's our job as maintainers to take a calculated risk and decide whether it's worth it when we clarify our intention and leave "other ways" in which things can be broken. This is of course an extreme case, but it very nicely shows that SemVer and our decisions are never 0/1. It's always a continuous spectrum of us impacting other workflows. And we should decide how far we want to go when we do, actually, clarify our intentions. And a comment on this one (the Timeout case):
Of course the only thing you can do is to "attempt" it. People will misinterpret things and will use things in unintended ways (as in the comic above). And you can do absolutely nothing about it. But you can at least make sure that your intentions are clearer and clearer with every iteration. And this should be IMHO our goal with the "Public Interface" page: continue updating it and explaining things every time we find any misinterpretation of it, and sometimes allow changes even if we know some workflows will be broken, as long as we make sure our intentions are clarified. |
So ? Any more concerns :) ? |
The bug report and discussion seem to be a bit stale. Just looking through the PR, it looks good and reasonable to me. There might be side effects, but we can handle this via a newsfragment, I think. |
Code that normally catches Exception should not implicitly ignore interrupts from AirflowTaskTimout. Fixes apache#35644 apache#35474
Discussed in #35643
Originally posted by hterik November 14, 2023
Apache Airflow version
2.7.3
What happened
Assume the following code runs inside an Airflow task:
If the Airflow task times out, Airflow injects an AirflowTaskTimeout exception at the point where the code is currently running. If the code is designed to capture exceptions, it will capture the timeout and potentially continue running for several hours anyway.
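A minimal sketch of the behaviour described above, assuming a POSIX system and a SIGALRM-based timer whose handler raises the timeout inside whatever code is running. `TaskTimeout`, `HardTaskTimeout` and `run_retry_loop` are hypothetical stand-ins and do not reproduce Airflow's timeout implementation; the loop that swallows `Exception` plays the role of the task code.

```python
import signal
import time


class TaskTimeout(Exception):
    """Hypothetical timeout derived from Exception."""


class HardTaskTimeout(BaseException):
    """Hypothetical timeout derived from BaseException."""


def run_retry_loop(timeout_type, iterations=3):
    """Run a loop that swallows Exception while a one-shot alarm injects timeout_type."""

    def on_alarm(signum, frame):
        # Raised in the main thread, wherever the code happens to be executing.
        raise timeout_type("task timed out")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, 0.1)  # fire once, 0.1 s from now
    swallowed = 0
    try:
        for _ in range(iterations):
            try:
                time.sleep(0.3)  # stands in for one unit of long-running work
            except Exception:
                swallowed += 1  # "log and carry on", which also hides the timeout
        return "completed", swallowed
    except timeout_type:
        return "aborted", swallowed
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler
```

With `TaskTimeout` the loop absorbs the interrupt and finishes every iteration; with `HardTaskTimeout` the same loop is aborted on the first interrupt. The sketch has to run in the main thread, since Python only allows signal handlers to be installed there.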
What you think should happen instead
I think it would be better if AirflowTaskTimeout was treated similarly to KeyboardInterrupt, so that it needs an explicit except block to capture. Moving it outside of the Exception inheritance tree to inherit directly from BaseException would solve that. See https://docs.python.org/3/library/exceptions.html#exception-hierarchy
Timeouts are generally in place as a last resort for the higher-level system to abort a process if the lower-level code has bugs in it, so it's safer to assume that code that doesn't deal with catching the timeout should be aborted.
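As a rough illustration of the KeyboardInterrupt-like behaviour proposed here, the sketch below uses a hypothetical `HypotheticalTaskTimeout` class (not Airflow's) placed directly under `BaseException`. It shows that a generic `except Exception` handler no longer sees the timeout, that `finally` cleanup still runs, and that an explicit `except` for the timeout still catches it.

```python
class HypotheticalTaskTimeout(BaseException):
    """Stand-in for a timeout that sits next to KeyboardInterrupt in the hierarchy."""


def run(task):
    """Run task and record which handlers fired, in order."""
    events = []
    try:
        try:
            task()
        except Exception:
            events.append("generic handler")
        finally:
            events.append("cleanup")  # runs for both ordinary failures and timeouts
    except HypotheticalTaskTimeout:
        events.append("explicit timeout handler")
    return events


def times_out():
    raise HypotheticalTaskTimeout("timed out")


def fails():
    raise ValueError("ordinary failure")


# Built-in precedent: KeyboardInterrupt is a BaseException but not an Exception.
keyboard_interrupt_is_exception = issubclass(KeyboardInterrupt, Exception)
timeout_events = run(times_out)
failure_events = run(fails)
```

Code that wants to react to the timeout keeps working as long as it names the timeout class explicitly; only code that relied on a blanket `except Exception` changes behaviour.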
How to reproduce
See above
Operating System
NA
Versions of Apache Airflow Providers
apache-airflow==2.7.1
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct