Move caching from plan-build time to task execution time #3
Merged
Cache detection now happens inside execute_task(): if manifest.json already exists at the storage prefix, the task returns immediately with cached=True without re-running. This removes the is_cached callback from Pipeline.build_plan(), TaskNode.cached, make_cache_checker(), and all "all-cached" shortcuts in the Celery and Step Functions backends. https://claude.ai/code/session_01FKpHFKYAG2Gig6YVehiQuF
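A minimal sketch of the execution-time check described above, not the actual muflow code. The `execute_task(prefix, run)` signature, the one-field `ExecutionResult`, and the manifest contents are assumptions made for illustration; only the `manifest.json` marker and the `cached=True` result come from the commit message.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    # Hypothetical stand-in; the real class may carry other fields.
    cached: bool


def execute_task(prefix, run):
    """Run `run(prefix)` unless manifest.json already exists under `prefix`."""
    manifest = os.path.join(prefix, "manifest.json")
    if os.path.exists(manifest):
        # Cache hit: return immediately without re-running the task body.
        return ExecutionResult(cached=True)
    os.makedirs(prefix, exist_ok=True)
    run(prefix)
    # Simplified: the manifest contents here are a placeholder.
    with open(manifest, "w") as fh:
        json.dump({"status": "ok"}, fh)
    return ExecutionResult(cached=False)
```

Because the check lives in the task itself, the plan builder no longer needs to know anything about storage state.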
Tasks communicate through file writes to the storage context, not return values. The files_written list in ExecutionResult was redundant and unused — Celery tasks are immutable so return values aren't passed between tasks, and Step Functions discards Lambda output via ResultPath:null. https://claude.ai/code/session_01FKpHFKYAG2Gig6YVehiQuF
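The file-only hand-off can be illustrated roughly as follows. The function names, the `result.json` filename, and the use of local directories as storage prefixes are all hypothetical; the point is only that the downstream task reads what the upstream task wrote rather than receiving a return value.

```python
import json
import os
import tempfile


def produce(prefix):
    # Upstream task: writes its result to storage and returns nothing.
    os.makedirs(prefix, exist_ok=True)
    with open(os.path.join(prefix, "result.json"), "w") as fh:
        json.dump({"value": 21}, fh)


def consume(upstream_prefix, prefix):
    # Downstream task: reads the upstream file instead of a return value.
    with open(os.path.join(upstream_prefix, "result.json")) as fh:
        value = json.load(fh)["value"]
    os.makedirs(prefix, exist_ok=True)
    with open(os.path.join(prefix, "result.json"), "w") as fh:
        json.dump({"value": value * 2}, fh)
```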
- Add PlanHandle (muflow/backends/handle.py): serializable reference to a submitted plan execution. Supports get_state(), cancel(), to_json(), and from_json() for all three backends without retaining the backend instance. Enables the polling pattern for Django-style DB updates.
- Refactor CompletionCallback to notify(plan_id, success, error), which removes the domain-specific analysis_id parameter. CeleryBackend wires completion via a new muflow.send_completion Celery task registered by create_celery_task(). StepFunctionsBackend logs a warning and defers to PlanHandle.get_state() polling.
- submit_plan() now returns PlanHandle on all three backends; raw str return removed. on_node_start/complete/failure remain on LocalBackend as non-protocol parameters but are removed from the ExecutionBackend protocol.

https://claude.ai/code/session_01FKpHFKYAG2Gig6YVehiQuF
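As a rough illustration of the two pieces above, the sketch below shows a handle that round-trips through JSON and a callback that matches the `notify(plan_id, success, error)` shape. The field names `backend` and `backend_config`, and the `RecordingCallback` class, are assumptions for this example; the real PlanHandle likely stores different data and also implements `get_state()` and `cancel()`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional, Protocol


@dataclass
class PlanHandle:
    # Field names other than plan_id are assumptions for illustration only.
    plan_id: str
    backend: str
    backend_config: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Plain data only, so the handle can be stored in a DB row.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "PlanHandle":
        return cls(**json.loads(payload))


class CompletionCallback(Protocol):
    def notify(self, plan_id: str, success: bool, error: Optional[str]) -> None: ...


class RecordingCallback:
    """Hypothetical callback that just records what it was told."""

    def __init__(self):
        self.events = []

    def notify(self, plan_id, success, error=None):
        self.events.append((plan_id, success, error))
```

Storing `handle.to_json()` alongside a database record and rebuilding it later with `PlanHandle.from_json()` is what makes polling possible without keeping the backend instance alive.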
A node is complete when manifest.json exists at its storage prefix. This adds a storage-layer ProgressChecker protocol with Local and S3 implementations, plus PlanHandle.get_progress() that uses it.

- muflow/storage/progress.py: ProgressChecker protocol, LocalProgressChecker (os.path.exists), S3ProgressChecker (HEAD per prefix), make_progress_checker() factory
- PlanHandle gains node_prefixes, storage_type, storage_config fields and get_progress() -> PlanProgress with total/completed/fraction/node_breakdown
- PlanProgress: total, completed, node_breakdown dict, fraction and is_complete properties
- All three backends populate the new PlanHandle fields in submit_plan()
- 26 new tests covering both checker implementations and integration with LocalBackend end-to-end

https://claude.ai/code/session_01FKpHFKYAG2Gig6YVehiQuF
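The progress computation could look something like the sketch below. The checker method name `is_complete(prefix)`, the free function `get_progress()`, the boolean values in `node_breakdown`, and the choice to report a fraction of 0.0 for an empty plan are all assumptions; the commit message only fixes the class names and the PlanProgress attributes.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Protocol


class ProgressChecker(Protocol):
    def is_complete(self, prefix: str) -> bool: ...


class LocalProgressChecker:
    def is_complete(self, prefix: str) -> bool:
        # A node counts as complete once its manifest.json exists.
        return os.path.exists(os.path.join(prefix, "manifest.json"))


@dataclass
class PlanProgress:
    total: int
    completed: int
    node_breakdown: Dict[str, bool]

    @property
    def fraction(self) -> float:
        # Assumption: an empty plan reports 0.0 rather than raising.
        return self.completed / self.total if self.total else 0.0

    @property
    def is_complete(self) -> bool:
        return self.completed == self.total


def get_progress(node_prefixes: Dict[str, str], checker: ProgressChecker) -> PlanProgress:
    # One existence check per node prefix.
    breakdown = {node: checker.is_complete(p) for node, p in node_prefixes.items()}
    return PlanProgress(len(breakdown), sum(breakdown.values()), breakdown)
```

An S3 variant would implement the same method with one HEAD request per prefix, which is why the protocol sits in the storage layer rather than in any backend.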
Documents all changes from this release cycle: execution-time caching, files_written removal, PlanHandle, CompletionCallback signature change, ProgressChecker, and CeleryCompletionCallback wiring. https://claude.ai/code/session_01FKpHFKYAG2Gig6YVehiQuF
DESIGN.md documents the architectural decisions discussed during this development cycle: separation of concerns, content-addressed storage, execution-time caching, file-only task communication, PlanHandle rationale, ProgressChecker placement in the storage layer, completion callback homogenisation, and the no-callback polling pattern. README.md: replace stale is_cached caching example with the current automatic caching behaviour; update backend section to show PlanHandle return type and its get_state()/get_progress()/to_json() API. https://claude.ai/code/session_01FKpHFKYAG2Gig6YVehiQuF
The document incorrectly stated that a failed task does not prevent re-execution. In fact write_manifest() is in a finally block and runs unconditionally, so a failed node writes manifest.json and will be silently skipped as cached on any subsequent run. Clarify that deleting the storage prefix is required to re-run a failed node. https://claude.ai/code/session_01FKpHFKYAG2Gig6YVehiQuF
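The behaviour can be reproduced with a small sketch. `execute_with_finally` and its string return values are hypothetical stand-ins, and `write_manifest` here writes an empty JSON object; only the `finally` placement and the resulting cache hit on a failed node mirror what the commit message describes.

```python
import json
import os
import shutil
import tempfile


def write_manifest(prefix):
    # Placeholder manifest; the real contents are not shown in this PR.
    with open(os.path.join(prefix, "manifest.json"), "w") as fh:
        json.dump({}, fh)


def execute_with_finally(prefix, run):
    if os.path.exists(os.path.join(prefix, "manifest.json")):
        return "cached"
    os.makedirs(prefix, exist_ok=True)
    try:
        run(prefix)
        return "ran"
    finally:
        # Runs on success and on failure, so a failed node still
        # leaves a manifest behind and looks cached on the next run.
        write_manifest(prefix)
```

In this sketch, a call whose `run` raises still leaves `manifest.json` in place, so the next call returns `"cached"`; removing the prefix with `shutil.rmtree(prefix)` is what lets the node run again.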