Fix concurrency slot release bug causing task overlap #223
When a task was blocked via `ConcurrencyBlocked` (e.g., a redelivery of a task that's still running), the `finally` block would still call `_release_concurrency_slot` with the same `execution.key`, removing the slot that the original delivery was actively using. This allowed other tasks to acquire that slot and run concurrently.

The fix tracks whether we actually acquired a slot (`acquired_concurrency_slot`) and only releases it in the `finally` block if we did.

Also cleaned up the Lua script:

- Removed `ZREMRANGEBYSCORE`, which was expiring slots based on `redelivery_timeout` (causing slots to be evicted while tasks were still running)
- Changed the sorted-set member from `worker_id:task_key` to just `task_key`, enabling an O(1) `ZSCORE` lookup to detect redelivery of the same task
- Added `slot_timeout` logic to block fresh redeliveries but allow takeover of stale slots from crashed workers

The test now uses two workers to properly stress cross-worker concurrency limits.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
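The worker-side fix can be illustrated with a small, self-contained sketch. It is not the project's actual code: `SlotStore`, `run_execution`, and the in-memory set standing in for the Redis slot set are hypothetical, and the real worker presumably reschedules a blocked task rather than just returning `False`. What it does show is the `acquired_concurrency_slot` flag gating the release in `finally`.

```python
from collections.abc import Callable


class ConcurrencyBlocked(Exception):
    """Hypothetical stand-in for the worker's ConcurrencyBlocked signal."""


class SlotStore:
    """Hypothetical in-memory stand-in for the Redis set of active slots."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self.active: set[str] = set()

    def acquire(self, key: str) -> bool:
        # A redelivery of a still-running task finds its own key already held.
        if key in self.active or len(self.active) >= self.max_concurrent:
            return False
        self.active.add(key)
        return True

    def release(self, key: str) -> None:
        self.active.discard(key)


def run_execution(store: SlotStore, key: str, body: Callable[[], None]) -> bool:
    """Run body under a concurrency slot; return False if the run was blocked."""
    acquired_concurrency_slot = False
    try:
        if not store.acquire(key):
            raise ConcurrencyBlocked(key)
        acquired_concurrency_slot = True
        body()
        return True
    except ConcurrencyBlocked:
        # The real worker would reschedule here; this sketch just reports it.
        return False
    finally:
        # The fix: only release a slot this delivery actually acquired. Releasing
        # unconditionally would delete the slot the original delivery still holds.
        if acquired_concurrency_slot:
            store.release(key)


# Usage: a redelivery of "task-1" arrives while the original is still running.
store = SlotStore(max_concurrent=1)
observed = []

def original_body() -> None:
    observed.append(run_execution(store, "task-1", lambda: None))  # blocked
    observed.append("task-1" in store.active)  # original still holds its slot

print(run_execution(store, "task-1", original_body))  # True
print(observed)  # [False, True]
print(store.active)  # set()
```

With the old unconditional release, the blocked inner call would have removed `"task-1"` from the set, the second observation would be `False`, and another task could then have claimed the slot while the original was still running.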
Claude finished @chrisguidry's task

Code Review Feedback

✅ Summary

This PR effectively fixes a critical concurrency bug where slots were being incorrectly released when tasks were blocked by concurrency limits. The solution is clean and well-tested.

🔍 Key Changes Analysis

Bug Fix: Concurrency Slot Tracking (
📚 Documentation has been built for this PR! You can download the documentation directly here:
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

| Coverage Diff | main | #223 | +/- |
|---|---|---|---|
| Coverage | 100.00% | 100.00% | |
| Files | 48 | 48 | |
| Lines | 7745 | 7745 | |
| Branches | 403 | 402 | -1 |
| Hits | 7745 | 7745 | |

Flags with carried forward coverage won't be shown.
PR #223 fixed a concurrency slot release bug but removed the `ZREMRANGEBYSCORE` that cleaned up stale entries. This could cause memory growth over time if slots become orphaned (e.g., a worker crashes between acquiring and releasing a slot).

The fix adds `ZREMRANGEBYSCORE` back to the Lua script, but with the correct cutoff: `slot_timeout` (`redelivery_timeout + 5`) instead of `redelivery_timeout`. This is safe because any slot older than `slot_timeout` is either:

1. Orphaned (it should already have been released), so it is safe to clean up, or
2. Held by a task that has exceeded `redelivery_timeout`, which will be redelivered anyway.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
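The acquire path described above can be modelled in plain Python to show how the pieces interact. This is a sketch under stated assumptions, not the actual Lua script: a dict mapping `task_key` to an acquisition timestamp stands in for the Redis sorted set, times are assumed to be seconds, and `try_acquire_slot`, `release_slot`, and `slot_timeout_for` are hypothetical names. Each step is labelled with the Redis command it imitates. In this sketch, takeover of a stale slot happens implicitly: an entry older than `slot_timeout` is removed by the cleanup step before the `ZSCORE`-style check runs.

```python
def slot_timeout_for(redelivery_timeout: float) -> float:
    # From the commit message: slot_timeout = redelivery_timeout + 5.
    return redelivery_timeout + 5


def try_acquire_slot(
    active: dict[str, float],
    task_key: str,
    now: float,
    max_concurrent: int,
    redelivery_timeout: float,
) -> bool:
    """Model one acquire call; `active` maps task_key -> acquisition time."""
    cutoff = now - slot_timeout_for(redelivery_timeout)
    # ZREMRANGEBYSCORE key -inf cutoff: drop slots older than slot_timeout.
    for member in [m for m, acquired_at in active.items() if acquired_at <= cutoff]:
        del active[member]
    # ZSCORE key task_key: the member is the bare task_key, so one lookup
    # detects a redelivery of a task whose slot is still fresh.
    if task_key in active:
        return False
    # ZCARD key: enforce the concurrency limit across all workers.
    if len(active) >= max_concurrent:
        return False
    # ZADD key now task_key
    active[task_key] = now
    return True


def release_slot(active: dict[str, float], task_key: str) -> None:
    # ZREM key task_key
    active.pop(task_key, None)


# Usage, assuming times in seconds and redelivery_timeout = 60 (slot_timeout = 65).
active: dict[str, float] = {}
print(try_acquire_slot(active, "task-1", 0.0, 2, 60))   # True: first delivery
print(try_acquire_slot(active, "task-1", 61.0, 2, 60))  # False: redelivery, slot still fresh
print(try_acquire_slot(active, "task-1", 70.0, 2, 60))  # True: stale slot cleaned up and retaken
```

Keying the member on `task_key` alone, rather than `worker_id:task_key`, is what lets a redelivery picked up by a different worker collide with the original delivery's slot instead of silently taking a second one.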