Add support for execution_ttl=0 and timeout/deadline parameters #189

chrisguidry · 2025-11-07T13:20:08Z

This adds several improvements for high-throughput applications and API clarity:

Support execution_ttl=timedelta(0) to skip state/result persistence entirely, returning to "fire-and-forget" mode with no observability overhead
Reduce default execution_ttl from 1 hour to 15 minutes
Add both timeout (timedelta) and deadline (datetime) parameters to get_result() following standard naming conventions
Add tests for perpetual tasks to verify they correctly reuse the same key across cycles without accumulating state records

When execution_ttl is 0, the system skips storing results and exceptions entirely, and deletes execution state records immediately upon completion. This provides maximum throughput for applications that don't need observability.

The get_result() method now accepts either timeout (a duration) or deadline (an absolute datetime), with validation to prevent specifying both. This makes the API more intuitive and follows Python conventions where timeout is a duration and deadline is a point in time.

🤖 Generated with Claude Code

This adds several improvements for high-throughput applications and API clarity: - Support execution_ttl=timedelta(0) to skip state/result persistence entirely, returning to "fire-and-forget" mode with no observability overhead - Reduce default execution_ttl from 1 hour to 15 minutes - Add both timeout (timedelta) and deadline (datetime) parameters to get_result() following standard naming conventions - Add tests for perpetual tasks to verify they correctly reuse the same key across cycles without accumulating state records When execution_ttl is 0, the system skips storing results and exceptions entirely, and deletes execution state records immediately upon completion. This provides maximum throughput for applications that don't need observability. The get_result() method now accepts either timeout (a duration) or deadline (an absolute datetime), with validation to prevent specifying both. This makes the API more intuitive and follows Python conventions where timeout is a duration and deadline is a point in time. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

claude · 2025-11-07T13:20:22Z

Claude finished @chrisguidry's task —— View job

Code Review for PR #189

Review Tasks

Read and analyze main implementation changes
Review test coverage and quality
Check for potential bugs and edge cases
Evaluate performance implications
Assess security concerns
Provide comprehensive feedback

Overall Assessment

This PR provides valuable performance and API improvements for high-throughput scenarios. The implementation is solid with excellent test coverage maintained at 100%. The changes are well-structured and backwards-compatible.

✅ Strengths

Fire-and-forget mode (execution_ttl=0): Excellent implementation that completely skips state persistence for maximum throughput. The immediate deletion of state records in mark_as_completed() and mark_as_failed() (src/docket/execution.py:711-715, 748-753) is clean and efficient.
API improvements: The addition of both timeout (timedelta) and deadline (datetime) parameters to get_result() (src/docket/execution.py:767-794) follows Python conventions and improves usability. The validation preventing both parameters is appropriate.
Test coverage: Comprehensive test suite in tests/test_ttl_zero.py covering:
- Basic execution with TTL=0
- State record deletion verification
- Result storage behavior
- Failed task handling
- Mixed TTL workloads
Perpetual task testing: New tests in tests/test_perpetual_state.py verify that perpetual tasks correctly reuse the same key across cycles without accumulating state records - important for preventing memory leaks.
Deterministic test synchronization: The improvements in tests/cli/waiting.py and tests/cli/test_watch.py using wait_for_watch_subscribed() eliminate race conditions in tests.

🔍 Potential Issues & Suggestions

Reduced default TTL impact: The reduction from 1 hour to 15 minutes (src/docket/docket.py:153) is reasonable but could impact users relying on longer observability windows. Consider documenting this breaking change prominently in release notes.
Edge case in result storage: When execution_ttl=0, results are not stored (src/docket/worker.py:692-703, 729-739). The tests show this causes TimeoutError when calling get_result(), but the error message could be more informative:
```
# Consider adding a check in get_result():
if self.docket.execution_ttl == timedelta(0):
    raise ValueError("Cannot retrieve results when execution_ttl is 0 (fire-and-forget mode)")
```
Race condition possibility: In mark_as_completed() and mark_as_failed() when execution_ttl=0, the state is written then immediately deleted (src/docket/execution.py:706-715). This could theoretically allow a brief window where partial state exists. Consider using a Lua script for atomicity or skipping the write entirely when TTL=0.
Documentation: The docstring for execution_ttl parameter (src/docket/docket.py:169-170) should mention that setting it to 0 enables fire-and-forget mode with no observability.

⚡ Performance Considerations

Positive: Fire-and-forget mode (execution_ttl=0) eliminates all Redis state operations after task completion, significantly improving throughput for high-volume scenarios.
Positive: Skipping result serialization when execution_ttl=0 saves CPU cycles.
Neutral: The reduced default TTL (15 minutes) will reduce Redis memory usage but may require users to adjust monitoring tools.
Consider: The conditional checks for execution_ttl in the worker hot path are minimal and shouldn't impact performance.

🔒 Security Assessment

No security concerns identified. The changes:

Don't introduce new attack vectors
Maintain proper input validation (timeout/deadline parameters)
Don't expose sensitive information
Preserve existing security boundaries

🎯 Recommendations

Document the breaking change: The default TTL reduction should be prominently documented in release notes and migration guides.
Improve error messaging: When get_result() is called with execution_ttl=0, provide a clearer error message explaining why results aren't available.
Consider atomic operations: For the edge case in state deletion with TTL=0, consider making the operation atomic or skip the write entirely.
Add usage examples: Consider adding documentation examples showing when to use execution_ttl=0 vs normal TTL settings.

✅ Test Coverage

Test coverage remains at 100% with excellent new tests:

test_ttl_zero.py: 166 lines covering all TTL=0 scenarios
test_perpetual_state.py: 179 lines verifying perpetual task behavior
Enhanced test_results.py with timeout/deadline parameter tests
Improved test synchronization utilities in waiting.py

Conclusion

This is a well-implemented PR that adds valuable functionality for high-throughput use cases while maintaining backward compatibility. The code quality is high, test coverage is comprehensive, and the implementation is thoughtful. With the minor suggestions above (especially around documentation and error messages), this PR is ready for merge.

Great work on maintaining 100% test coverage and adding deterministic test synchronization! 🎉

github-actions · 2025-11-07T13:20:28Z

📚 Documentation has been built for this PR!

You can download the documentation directly here:
https://github.com/chrisguidry/docket/actions/runs/19172859974/artifacts/4500632549

codecov-commenter · 2025-11-07T13:21:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (799bb6d) to head (088940b).

Additional details and impacted files

@@            Coverage Diff             @@
##              main      #189    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           39        41     +2     
  Lines         6557      6781   +224     
  Branches       326       345    +19     
==========================================
+ Hits          6557      6781   +224

Flag	Coverage Δ
python-3.10	`100.00% <100.00%> (ø)`
python-3.11	`99.02% <100.00%> (+0.03%)`	⬆️
python-3.12	`100.00% <100.00%> (ø)`
python-3.13	`100.00% <100.00%> (ø)`
python-3.14	`100.00% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
src/docket/cli.py	`100.00% <ø> (ø)`
src/docket/docket.py	`100.00% <ø> (ø)`
src/docket/execution.py	`100.00% <100.00%> (ø)`
src/docket/worker.py	`100.00% <100.00%> (ø)`
tests/cli/test_watch.py	`100.00% <100.00%> (ø)`
tests/cli/waiting.py	`100.00% <100.00%> (ø)`
tests/test_perpetual_state.py	`100.00% <100.00%> (ø)`
tests/test_results.py	`100.00% <100.00%> (ø)`
tests/test_ttl_zero.py	`100.00% <100.00%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Add pragma: no branch to async with blocks in tests to fix coverage reporting. These blocks are fully executed but coverage tracks context manager entry/exit as separate branches, causing false positives. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

The watch tests were flaking in CI because subprocess startup timing created race conditions with fast-completing tasks, especially when execution_ttl=0 deletes state immediately upon completion. Tests used fixed sleeps hoping the watch subprocess would be ready, but on slow CI runners (particularly Python 3.12 + Valkey 8.0) the subprocess could take 0.5-1.0s to start, causing the task to complete before watch even subscribed to pub/sub events. ## Changes **Added deterministic synchronization:** - New `wait_for_watch_subscribed()` helper uses Redis `PUBSUB NUMSUB` to detect when watch subprocess has actually subscribed to the state channel - Modified `test_watch_running_task_until_completion` to use coordination key pattern: task waits for watch to subscribe before completing - Eliminates all timing assumptions and sleep-based coordination **Added coverage pragmas:** - `src/docket/cli.py` lines 920, 926: Progress task initialization at watch startup (legitimately hard to hit with subprocess overhead) - `tests/test_ttl_zero.py` lines 83, 109, 165: `pytest.raises()` context manager branches (no-exception branch is test failure, not code path) ## Testing - Previously flaking test now passes 20/20 consecutive runs locally - Full test suite: 420 tests, 100% coverage - No more timing dependencies or spray-and-pray sleeps This should eliminate the flakes seen in https://github.com/chrisguidry/docket/actions/runs/19141324949/job/54706906078 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

These tools help with local flake detection and validation: - pytest-repeat: Run tests multiple times (`pytest --count=N`) - pytest-flakefinder: Hunt for intermittent failures (`pytest --flake-finder`) Not used in CI - just for local development when fixing flakes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

… persistence Documents features added in PRs #181, #184, and #189: - Add comprehensive "Task State and Progress Monitoring" section to advanced-patterns.md covering execution state machine, Redis data model, pub/sub events, progress reporting, result retrieval, and CLI monitoring - Add Progress() dependency documentation to dependencies.md with usage examples and links to detailed patterns - Add "State and Result Storage" section to production.md documenting execution_ttl configuration, fire-and-forget mode (execution_ttl=0), and custom result storage backends - Add "Task Observability" section to getting-started.md introducing state tracking, progress reporting, and result retrieval with links to detailed documentation All documentation includes working code examples and follows the existing tone and style. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

… persistence (#191) ## Summary Adds comprehensive documentation for the execution state tracking, progress monitoring, and result persistence features introduced in PRs #181, #184, and #189. **Related PRs:** - #181 - Add execution state tracking and progress monitoring - #184 - Add result persistence - #189 - Add support for execution_ttl=0 and timeout/deadline parameters ## Documentation Changes ### docs/advanced-patterns.md Added a new major section **"Task State and Progress Monitoring"** covering: - **High-Level Design** - Architecture overview explaining the execution state machine, Redis data model, pub/sub event system, and result storage - **Tracking Execution State** - How to query and monitor task states - **Reporting Task Progress** - Using the `Progress()` dependency with `ExecutionProgress` API - **Monitoring Progress in Real-Time** - Subscribing to progress and state events programmatically - **Advanced Progress Patterns** - Incremental progress, batch updates, and nested tracking patterns - **Retrieving Task Results** - Using `get_result()` with timeout/deadline parameters and exception handling - **CLI Monitoring with Watch** - Using the `docket watch` command for real-time monitoring - **Fire-and-Forget Mode** - Documenting `execution_ttl=0` for high-throughput scenarios ### docs/dependencies.md Added **"Reporting Task Progress"** subsection to Built-in Context Dependencies: - Documents the `Progress()` dependency injection pattern - Explains `ExecutionProgress` methods (`set_total`, `increment`, `set_message`, `sync`) - Highlights key characteristics (atomic, real-time, observable, ephemeral) - Links to detailed documentation in advanced-patterns.md ### docs/production.md Added **"State and Result Storage"** section: - Execution state tracking configuration with `execution_ttl` - Fire-and-forget mode using `execution_ttl=0` for maximum throughput - Result storage configuration with custom backends (RedisStore, MemoryStore) - Best practices for TTL management, separate databases, and large results - Monitoring state and result storage in Redis ### docs/getting-started.md Added **"Task Observability"** section: - Introduction to execution state tracking - Basic progress reporting examples - Task result retrieval with `get_result()` - Links to detailed documentation in advanced-patterns.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>

chrisguidry and others added 3 commits November 7, 2025 09:06

chrisguidry requested a review from desertaxle November 7, 2025 15:16

Adds pytest-flakefinder and pytest-repeat as devdeps

088940b

chrisguidry merged commit 5842d32 into main Nov 7, 2025
25 checks passed

chrisguidry deleted the disable-state-tracking branch November 7, 2025 15:27

desertaxle mentioned this pull request Nov 7, 2025

Add documentation for state tracking, progress monitoring, and result persistence #191

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add support for execution_ttl=0 and timeout/deadline parameters #189

Add support for execution_ttl=0 and timeout/deadline parameters #189

Uh oh!

chrisguidry commented Nov 7, 2025

Uh oh!

claude bot commented Nov 7, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 7, 2025 •

edited

Loading

Uh oh!

codecov-commenter commented Nov 7, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add support for execution_ttl=0 and timeout/deadline parameters #189

Add support for execution_ttl=0 and timeout/deadline parameters #189

Uh oh!

Conversation

chrisguidry commented Nov 7, 2025

Uh oh!

claude bot commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for PR #189

Review Tasks

Overall Assessment

✅ Strengths

🔍 Potential Issues & Suggestions

⚡ Performance Considerations

🔒 Security Assessment

🎯 Recommendations

✅ Test Coverage

Conclusion

Uh oh!

github-actions bot commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

claude bot commented Nov 7, 2025 •

edited

Loading

github-actions bot commented Nov 7, 2025 •

edited

Loading

codecov-commenter commented Nov 7, 2025 •

edited

Loading