
feat(parallelism): make sub-application ID an overridable hook (#761)#781

Open
elijahbenizzy wants to merge 3 commits into main from fix/761-customizable-sub-app-id

Conversation

elijahbenizzy (Contributor) commented May 16, 2026

Adds an overridable sub_application_id hook on TaskBasedParallelAction for users who want sub-app cache semantics different from the parent's (e.g. fresh-each-invocation under a cascading state initializer).

Background

TaskBasedParallelAction (and its MapStates / MapActions / MapActionsAndStates subclasses) computes each sub-application id as a deterministic hash of (parent_app_id, key). That hash is load-bearing for retry-on-failure and crash recovery: when the parent is rebuilt (e.g. inside a retry loop, or via initialize_from), each sub-task gets the same id it had on the prior attempt, so a cascading state initializer can find its persisted checkpoint and resume. The retry-on-failure path in tests/test_end_to_end.py::test_end_to_end_parallel_collatz_many_unreliable_tasks depends on this.
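The internal _stable_app_id_hash isn't shown here, but the property it provides is easy to sketch. A minimal stand-in using hashlib (illustrative only, not Burr's actual implementation):

```python
import hashlib


def stable_app_id_hash(parent_app_id: str, key: str) -> str:
    # Deterministic: the same (parent_app_id, key) pair always yields the
    # same sub-app id, so a rebuilt parent re-derives identical child ids
    # and a persister can locate each child's prior checkpoint.
    digest = hashlib.sha256(f"{parent_app_id}:{key}".encode()).hexdigest()
    return f"{parent_app_id}-{digest[:16]}"


# Two parent rebuilds produce the same sub-app id for the same task key:
assert stable_app_id_hash("parent-123", "task_0") == stable_app_id_hash("parent-123", "task_0")
```

Any id scheme with this rebuild-stable property gets the same resume behavior; anything that varies per invocation does not.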

This default is consistent with initialize_from(..., resume_at_next_action=True): you opted into cascading checkpoint resume, and you get it on sub-apps too. That's by design. But users sometimes want different cache semantics for sub-apps -- e.g. fresh execution per parent invocation, or pinning to a business key. Today there's no clean way to express that without overriding tasks() and reaching into task.application_id post-hoc.

What this PR does

Promotes the inline sub-app id computation into a named, overridable method: TaskBasedParallelAction.sub_application_id(key, state, context). The default implementation is unchanged -- it still returns _stable_app_id_hash(parent_app_id, key) -- so existing behavior is preserved bit-for-bit. Users who want different sub-app cache behavior override the hook on their subclass.

Example

```python
import uuid
from burr.core import ApplicationContext, State
from burr.core.parallelism import MapStates

class MyParallel(MapStates):
    # ... states(), action(), reduce(), reads, writes ...

    def sub_application_id(
        self, key: str, state: State, context: ApplicationContext
    ) -> str:
        # Fresh per invocation -- trades resume-on-rebuild for fresh execution
        # under a cascading state initializer.
        return f"{context.app_id}:{key}:{uuid.uuid4()}"

```
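For the other use case mentioned above -- pinning to a business key -- the override body reduces to a pure function of a key carried in state. A sketch (customer_id is an illustrative field name, not part of Burr's API):

```python
def business_key_sub_app_id(customer_id: str, key: str) -> str:
    # Sub-app id derived from a business key instead of the parent app id:
    # checkpoints survive parent rebuilds *and* parent app-id changes, but
    # two parents sharing a customer_id will share sub-app checkpoints.
    return f"customer-{customer_id}:{key}"


# Inside a MapStates subclass this would be wired up as:
#     def sub_application_id(self, key, state, context):
#         return business_key_sub_app_id(state["customer_id"], key)
print(business_key_sub_app_id("42", "task_0"))  # customer-42:task_0
```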

Background context

PR #778 was an earlier attempt that changed the default to salt with context.sequence_id -- that broke the retry-on-failure path because sequence_id advances on every parent rebuild. That PR was closed after analysis. See #778 for the diagnostic discussion. Conclusion: keep the default, surface the choice to the user.
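To see concretely why the #778 salt broke resume, compare the two schemes across a parent rebuild (a toy sketch; the salt argument stands in for context.sequence_id, which advances on every rebuild):

```python
import hashlib


def make_id(parent_app_id: str, key: str, salt: str = "") -> str:
    # Toy sub-app id: hash of (parent_app_id, key) plus an optional salt.
    return hashlib.sha256(f"{parent_app_id}:{key}:{salt}".encode()).hexdigest()[:16]


# Attempt 1: parent runs; sub-task checkpoints are persisted under these ids.
first = make_id("parent-123", "task_0")
salted_first = make_id("parent-123", "task_0", salt="1")  # sequence_id == 1

# Attempt 2: parent is rebuilt for a retry; sequence_id has advanced.
second = make_id("parent-123", "task_0")
salted_second = make_id("parent-123", "task_0", salt="2")  # sequence_id == 2

assert first == second                # default: retry finds the old checkpoint
assert salted_first != salted_second  # #778's salt: retry cannot find it
```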

#761 is the user report that surfaced this. Per the analysis there, the reported behavior is by design (cascading initializer + deterministic ids = checkpoint resume on sub-apps too). This PR adds the escape hatch.

Test plan

  • All existing tests in tests/core/test_parallelism.py pass (19/19) -- default hook returns the same value the inline computation did.
  • tests/test_end_to_end.py::test_end_to_end_parallel_collatz_many_unreliable_tasks passes -- the default is preserved, so the retry path is unaffected.
  • New: tests/core/test_parallelism.py::test_sub_application_id_override_enables_fresh_execution_with_cascading_initializer -- overrides the hook to demonstrate fresh-execution-per-invocation, asserts all 9 sub-task executions run rather than 3 running and 6 replaying.

Previously, the sub-application id for parallel sub-apps was computed
inline inside MapActionsAndStates._create_task as a deterministic hash
of (parent_app_id, key). That hash is what enables resume-on-rebuild
for parallel sub-apps (retry-on-failure, crash recovery), but it also
means that with a cascading state initializer the sub-apps will replay
stale state on every parent invocation -- the case reported in #761.

The previous attempt (#778, closed) tried to auto-fix this by salting
sub-app ids with context.sequence_id. That broke
test_end_to_end_parallel_collatz_many_unreliable_tasks, because
sequence_id advances on parent rebuilds and the retry path could no
longer find the previously-persisted sub-app checkpoints.

This change keeps the default behavior unchanged (deterministic
(parent_app_id, key) hash, so retry-on-failure still works) and instead
exposes the id computation as a named, overridable method on
TaskBasedParallelAction. Users who hit #761 override the hook to add
whatever salt they need; everyone else is unaffected.

Adds one test exercising the new hook with a cascading initializer.

Labels

area/core (Application, State, Graph, Actions), area/streaming (Streaming actions, parallel streams)
