-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
fix(aci): Retry process_delayed_workflows timeouts #95379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## master #95379 +/- ##
===========================================
+ Coverage 36.78% 87.77% +50.98%
===========================================
Files 9915 10509 +594
Lines 557022 606328 +49306
Branches 23720 23720
===========================================
+ Hits 204927 532193 +327266
+ Misses 351732 73772 -277960
Partials 363 363 |
def retry_timeouts(func): | ||
""" | ||
Schedule a task retry if the function raises ProcessingDeadlineExceeded. | ||
This exists because the standard retry decorator doesn't allow BaseExceptions. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you find yourself using this frequently, we could introduce deadline approaching exceptions that happen some time before the processing deadline is crossed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Timeout Handling Fails in Retry Decorator
The retry_timeouts
decorator has two issues. First, an exception raised by sentry_sdk.capture_exception()
will mask the original ProcessingDeadlineExceeded
and prevent retry_task()
from being called. Second, if retry_task()
unexpectedly doesn't raise an exception (despite its NoReturn
annotation), the wrapper implicitly returns None
, potentially masking the timeout.
src/sentry/workflow_engine/tasks/utils.py#L97-L105
sentry/src/sentry/workflow_engine/tasks/utils.py
Lines 97 to 105 in ce8936f
@wraps(func) | |
def wrapper(*args, **kwargs): | |
try: | |
return func(*args, **kwargs) | |
except ProcessingDeadlineExceeded: | |
sentry_sdk.capture_exception(level="info") | |
retry_task() | |
Was this report helpful? Give feedback by reacting with 👍 or 👎
@@ -330,7 +330,7 @@ def _process_resource_change( | |||
except model.DoesNotExist as e: | |||
# Explicitly requeue the task, so we don't report this to Sentry until | |||
# we hit the max number of retries. | |||
return retry_task(e) | |||
retry_task(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: why did we have to remove return
from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I redefined retry_task
as officially never returning as far as the type system is concerned.
So, this return
was kinda incorrect, and due to my change mypy was able to flag it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 more question/thought
@@ -760,6 +761,7 @@ def repr_keys[T, V](d: dict[T, V]) -> dict[str, V]: | |||
), | |||
) | |||
@retry | |||
@retry_timeouts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any thoughts on some convenience decorator to group the 2 retries together. something like @retry_including_timeouts
probably will increase adoption. there are a lot of ecosystem tasks i would use this for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow-up coming soon.
Timeouts are expected here and assumed transient, and the existing
retry
decorator and config don't support retryingProcessingDeadlineExceeded
.So, we add a new task decorator to trigger retries on timeouts. This avoids opening the door to other BaseExceptions and keeps it opt-in as designed.