@chrisguidry
Owner

This fixes a flaky test that was failing intermittently in CI because it was testing the wrong thing. We have a zero-tolerance policy for test flakes, so this needed immediate attention.

The Problem

The test test_concurrency_blocked_task_executes_exactly_once was asserting exact execution counts, but what actually matters for concurrency limits is that executions never overlap in time, not how many times a task runs because of redelivery.

Tasks could be redelivered before being ACKed when they took longer than the 50ms redelivery timeout. This is acceptable behavior - the important guarantee is that concurrency semantics are preserved (no overlapping executions per customer_id).
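
To illustrate why an exact-count assertion is fragile here, the following is a minimal sketch using a made-up execution log. The tuple layout, the task keys, and the timings are all assumptions for illustration, not data from the actual test. It shows a task that runs twice because of redelivery while every run for the customer still happens strictly one after another.

```python
from collections import Counter

# Hypothetical log entries: (task_key, customer_id, start_seconds, end_seconds).
# "task-b" appears twice, as if it had been redelivered before it was ACKed.
log = [
    ("task-a", 1, 0.000, 0.080),
    ("task-b", 1, 0.080, 0.160),
    ("task-b", 1, 0.160, 0.240),  # redelivered copy, runs after the first finishes
]


def runs_are_sequential(entries):
    """True if, ordered by start time, each run begins at or after the previous run's end."""
    ordered = sorted(entries, key=lambda entry: entry[2])
    return all(prev[3] <= nxt[2] for prev, nxt in zip(ordered, ordered[1:]))


# The old style of assertion: every task key executes exactly once.
runs_per_task = Counter(task_key for task_key, _, _, _ in log)
exactly_once = all(count == 1 for count in runs_per_task.values())

# The guarantee that matters: runs for a given customer never overlap.
no_overlap = runs_are_sequential([entry for entry in log if entry[1] == 1])

print(f"exactly_once={exactly_once}, no_overlap={no_overlap}")
```

In this made-up log the count-based check fails even though the concurrency guarantee holds, which is the kind of false failure described above.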

Related flake: https://github.com/chrisguidry/docket/actions/runs/19515494612/job/55866051167

The Fix

Rewrote the test to validate what actually matters: no temporal overlaps per customer. Instead of asserting execution counts, the test now:

  • Records start/end timestamps for each execution
  • Groups by customer_id
  • Validates that no two executions for the same customer overlap in time, where two executions overlap when start1 < end2 and start2 < end1
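
The three steps above can be sketched roughly as follows. This is an illustration only, not the code from the test: the `Execution` record, its field names, and the `find_overlaps` helper are hypothetical names chosen for the example. Only the interval condition `start1 < end2 and start2 < end1` comes from the description.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Execution:
    """One recorded run of a task (hypothetical record shape)."""

    customer_id: int
    start: float
    end: float


def find_overlaps(executions):
    """Return every pair of executions for the same customer that overlap in time."""
    # Group the recorded runs by customer.
    by_customer = defaultdict(list)
    for execution in executions:
        by_customer[execution.customer_id].append(execution)

    # Compare every pair within a customer using the interval-overlap condition.
    overlaps = []
    for runs in by_customer.values():
        for first, second in combinations(runs, 2):
            if first.start < second.end and second.start < first.end:
                overlaps.append((first, second))
    return overlaps


recorded = [
    Execution(customer_id=1, start=0.00, end=0.10),
    Execution(customer_id=1, start=0.10, end=0.20),  # back-to-back, not overlapping
    Execution(customer_id=2, start=0.05, end=0.15),  # different customer, ignored
]
assert find_overlaps(recorded) == []
```

Because both comparisons are strict, a run that starts at the exact instant the previous one ends is not counted as an overlap. The pairwise comparison is quadratic in the number of runs per customer, which should be fine for the handful of executions a test records.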

Result: 200/200 passes (previously ~40-50% failure rate)

Bonus: Test Reorganization

While fixing this, consolidated all concurrency tests into tests/concurrency_limits/ with logical grouping:

  • test_basic.py - Core ConcurrencyLimit API (12 tests, 371 LOC)
  • test_redelivery.py - Redelivery interaction (6 tests, 321 LOC)
  • test_worker_mechanics.py - Worker internals (11 tests, 288 LOC)
  • test_execution_patterns.py - Execution patterns (7 tests, 220 LOC)
  • test_errors_and_resilience.py - Error handling (7 tests, 238 LOC)

Removed 2 duplicate tests in the process. All 450 tests passing with 100% coverage maintained.

🤖 Generated with Claude Code

chrisguidry and others added 2 commits November 19, 2025 17:27
This fixes a flaky test that was failing intermittently in CI due to testing the wrong thing. The test was checking exact execution counts, but what matters for concurrency limits is temporal overlap, not how many times tasks execute due to redelivery.

Changes:
- Rewrote `test_concurrency_blocked_task_executes_exactly_once` to validate no temporal overlaps instead of execution counts
- Reorganized all concurrency tests into `tests/concurrency_limits/` with logical grouping:
  - `test_basic.py` - Core ConcurrencyLimit API (12 tests, 371 LOC)
  - `test_redelivery.py` - Redelivery interaction (6 tests, 321 LOC)
  - `test_worker_mechanics.py` - Worker internals (11 tests, 288 LOC)
  - `test_execution_patterns.py` - Execution patterns (7 tests, 220 LOC)
  - `test_errors_and_resilience.py` - Error handling (7 tests, 238 LOC)
- Removed 2 duplicate tests
- All 450 tests passing with 100% coverage
- Fixed test now passes 200/200 iterations (previously ~40-50% failure rate)

The root issue was that tasks could be redelivered before being ACKed when they took longer than the 50ms redelivery timeout. This is acceptable behavior - what matters is that concurrency semantics are preserved (no overlapping executions per customer_id).

Related: https://github.com/chrisguidry/docket/actions/runs/19515494612/job/55866051167

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude
Contributor

claude bot commented Nov 19, 2025

Claude finished @chrisguidry's task


PR Review: Fix flaky concurrency test and reorganize test suite

✅ Test Fix Analysis

The fix for test_concurrency_blocked_task_executes_exactly_once correctly addresses the root issue:

Previous approach (flaky):

  • Checked exact execution counts
  • Failed when tasks were redelivered due to 50ms timeout
  • Testing implementation detail rather than actual guarantee

New approach (robust):

  • Records start/end timestamps for each execution (tests/concurrency_limits/test_redelivery.py:271-274)
  • Validates temporal non-overlap using proper interval logic: overlap = start1 < end2 and start2 < end1 (test_redelivery.py:302)
  • Correctly allows redeliveries while ensuring concurrency semantics

This is the right fix - what matters for concurrency limits is preventing simultaneous execution, not counting deliveries.

✅ Test Reorganization

The reorganization into tests/concurrency_limits/ is well-structured:

  • test_basic.py (12 tests): Core ConcurrencyLimit API and validation
  • test_redelivery.py (6 tests): Redelivery timeout interaction
  • test_worker_mechanics.py (11 tests): Internal worker mechanisms
  • test_execution_patterns.py (7 tests): Task execution patterns
  • test_errors_and_resilience.py (7 tests): Error handling and stress testing

👍 Strengths

  1. Proper test isolation: The temporal overlap check correctly validates the actual guarantee
  2. Logical grouping: Tests are now organized by concern rather than scattered
  3. Comprehensive coverage: 43 tests total (27 removed from test_worker.py, 41 async + 2 sync added)
  4. Good documentation: Each module has clear docstrings explaining its focus
  5. No functionality lost: All test scenarios preserved or improved

🔍 Minor Observations

  1. Test duplication check: Found 2 tests were removed as duplicates - good cleanup
  2. Coverage maintained: PR claims 100% coverage is maintained
  3. Naming consistency: All test files follow clear naming patterns

🎯 Recommendations

  1. Consider adding a comment in the test explaining why temporal overlap is the correct validation approach
  2. The 50ms redelivery timeout in the test is quite aggressive - document why this stress-test value was chosen
  3. Good use of execution keys to track individual task instances

✅ Verdict

This PR successfully fixes a legitimate test flake by correcting the test logic to validate the actual guarantee (no concurrent executions) rather than an implementation detail (execution count). The test reorganization improves maintainability without losing any coverage. The fix has been validated with 200/200 passes.

Approved for merge - excellent work fixing the flake and improving test organization!

@github-actions

📚 Documentation has been built for this PR!

You can download the documentation directly here:
https://github.com/chrisguidry/docket/actions/runs/19519839786/artifacts/4621439960

@codecov-commenter

codecov-commenter commented Nov 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1f62900) to head (bfe720e).

@@            Coverage Diff            @@
##              main      #202   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           43        45    +2     
  Lines         7219      7237   +18     
  Branches       369       375    +6     
=========================================
+ Hits          7219      7237   +18     

Flag          Coverage Δ
python-3.10   100.00% <100.00%> (ø)
python-3.11   98.97% <87.13%> (-0.11%) ⬇️
python-3.12   100.00% <100.00%> (ø)
python-3.13   100.00% <100.00%> (ø)
python-3.14   100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines                                Coverage Δ
src/docket/execution.py                                 100.00% <100.00%> (ø)
src/docket/worker.py                                    100.00% <ø> (ø)
tests/concurrency_limits/test_basic.py                  100.00% <100.00%> (ø)
...s/concurrency_limits/test_errors_and_resilience.py   100.00% <100.00%> (ø)
...ests/concurrency_limits/test_execution_patterns.py   100.00% <100.00%> (ø)
tests/concurrency_limits/test_redelivery.py             100.00% <100.00%> (ø)
tests/concurrency_limits/test_worker_mechanics.py       100.00% <100.00%> (ø)
tests/test_worker.py                                    100.00% <100.00%> (ø)

@chrisguidry chrisguidry merged commit 787dcc9 into main Nov 19, 2025
25 checks passed
@chrisguidry chrisguidry deleted the fix-concurrency-blocked-flake branch November 19, 2025 23:56