-
Notifications
You must be signed in to change notification settings - Fork 11
Fix bundle analysis processor infinite retry loop from visibility timeout #633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bundle analysis processor infinite retry loop from visibility timeout #633
Conversation
…eout The bundle analysis processor was checking self.request.retries instead of self.attempts when determining if max retries were exceeded. This caused tasks to continue retrying after hitting max retries when visibility timeout caused re-deliveries, because: - self.request.retries only counts intentional retries via self.retry() - self.attempts includes visibility timeout re-deliveries via the attempts header When a task times out and gets redelivered due to visibility timeout, self.request.retries stays the same but self.attempts increases, causing the max retry check to fail. Changes: - Use self.attempts instead of self.request.retries when checking max retries - Pass self.attempts to lock_manager.locked() as documented - Pass max_retries to lock_manager.locked() for proper retry limit checking - Add test to catch visibility timeout re-delivery scenario Fixes infinite retry loop for task 61998622-ac1a-4861-a459-98bb4f51b8ed
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #633 +/- ##
=======================================
Coverage 93.90% 93.90%
=======================================
Files 1286 1286
Lines 46802 46803 +1
Branches 1517 1517
=======================================
+ Hits 43951 43952 +1
Misses 2542 2542
Partials 309 309
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
The apply_async method was always setting attempts=1, even when opt_headers already contained an attempts value. This caused tasks to reset their retry counter when recreated via apply_async (e.g., via chain or re-scheduling). Now we preserve the existing attempts value from opt_headers if present, only defaulting to 1 for new task creations. This ensures retry counters are properly maintained across task recreations.
| log.error( | ||
| "Bundle analysis processor exceeded max retries", | ||
| extra={ | ||
| "attempts": attempts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just do self.attempts here and remove the call of attempts = self.attempts
apps/worker/tasks/base.py
Outdated
| **opt_headers, | ||
| "created_timestamp": current_time.isoformat(), | ||
| "attempts": 1, | ||
| "attempts": attempts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be inlined also with opt_headers.get("attempts", 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, is this needed? we call **opt_headers
|
Regarding the question "is this needed? we call Yes, the fix is needed. While headers = {
**opt_headers, # Spreads existing headers including any "attempts"
"created_timestamp": ...,
"attempts": 1, # ⚠️ ALWAYS overwrites to 1!
}In Python dicts, later keys override earlier ones. So even if This is exactly the bug causing the infinite retry loop - when tasks are recreated via The fix ensures we preserve the existing attempts value (or default to 1 for new tasks): headers = {
**opt_headers,
"created_timestamp": ...,
"attempts": opt_headers.get("attempts", 1), # Preserves existing, defaults to 1
}I'll inline this as you suggested in the earlier comment. |
- Add debug logging for session creation/cleanup in apply_async and run - Log session_id, task name, and whether transaction was open - Preserve attempts header from PR #633 to fix visibility timeout tracking - Logging helps verify session cleanup is working correctly in production
Problem
The bundle analysis processor was experiencing infinite retry loops due to two related bugs:
self.request.retriesinstead ofself.attemptswhen determining if max retries were exceededapply_asyncmethod was always resettingattemptsto 1, even when recreating tasks that had already been retriedRoot Cause
Bug #1: Wrong retry counter
self.request.retriesonly counts intentional retries viaself.retry()self.attemptsincludes visibility timeout re-deliveries via theattemptsheaderWhen a task times out and gets redelivered due to visibility timeout,
self.request.retriesstays the same butself.attemptsincreases, causing the max retry check to fail.Bug #2: Task recreation resetting counter
apply_asyncwas always settingattempts: 1in headers, overwriting any existingattemptsvalueapply_async(e.g., via chain or re-scheduling), the retry counter would reset to 1Example
Task
61998622-ac1a-4861-a459-98bb4f51b8edkept retrying after hitting max retries because:self.request.retrieswas 10 (at max_retries)self.attemptswas 11+ (exceeded due to visibility timeout re-deliveries)self.request.retries >= self.max_retrieswhich didn't account for the additional attemptsapply_async, theattemptsheader would reset to 1, allowing infinite retriesSolution
Fix #1: Use correct retry counter
self.attemptsinstead ofself.request.retrieswhen checking max retriesself.attemptstolock_manager.locked()as documented in the method signaturemax_retriestolock_manager.locked()for proper retry limit checkingself._has_exceeded_max_attempts()helper method which correctly usesself.attemptsFix #2: Preserve attempts counter
attemptsvalue fromopt_headersif present when callingapply_asyncattempts: 1for new task creationsChanges
apps/worker/tasks/bundle_analysis_processor.py: Fixed retry check to useself.attemptsinstead ofself.request.retriesapps/worker/tasks/base.py: Fixedapply_asyncto preserve existingattemptsheader instead of always resetting to 1apps/worker/tasks/tests/unit/test_bundle_analysis_processor_task.py: Added test to catch visibility timeout re-delivery scenarioTesting
Added test
test_bundle_analysis_processor_task_max_retries_exceeded_visibility_timeoutthat simulates:self.request.retries = 5(below max_retries of 10)self.attempts = 11(exceeds max_retries due to visibility timeout re-deliveries)previous_resultinstead of continuing to retryThis test would have caught both bugs before they were deployed.
Note
Addresses infinite retry loops in bundle analysis processing caused by visibility-timeout re-deliveries being ignored.
self.attemptsand passretry_num=self.attempts, max_retries=self.max_retriestoLockManager.locked(); check limits via_has_exceeded_max_attempts()and enhance error loggingattemptsinBaseCodecovTask.apply_asyncheaders instead of resetting to 1test_bundle_analysis_processor_task_max_retries_exceeded_visibility_timeoutto ensure tasks stop retrying when attempts exceed maxWritten by Cursor Bugbot for commit 21fdf2f. This will update automatically on new commits. Configure here.