Design and merge plan: operator output port result cache (MVP) #5880

Xiao-zhen-Liu (Collaborator) · 2026-06-22T05:20:48Z

In current Texera main, the engine runs a workflow from the start every time, even when the user changed only one operator near the end. This proposal adds a result cache so that, on a re-run, an output port whose upstream computation logic is unchanged reads its saved result instead of recomputing it. The code is written and working on a prototype branch. This post describes the design and the plan to bring it into main as small PRs, so anyone can raise concerns before the PRs go up.

Matching results across executions

Each output port has a cache key built from its upstream operators, their parameters, their output schemas, and the wiring between them. Two ports with the same cache key produce the same result (output port equivalence). When a run saves a port's result, we record (workflow, port, cache key) -> result location. On a later run, a port whose cache key has a recorded result is a matched port, and that result can be reused. Any edit upstream of a port changes its cache key, so its old result is no longer matched.
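A minimal sketch of how such a cache key could be computed, assuming the plan is an acyclic graph and that a SHA-256 hash over a canonical JSON payload is acceptable. The `Operator` class, the `cache_key` function, and the edge tuple layout are hypothetical names for illustration, not the prototype's actual code. The operator id is deliberately left out of the hash so that the key depends only on upstream logic.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Operator:
    # Hypothetical stand-in for a physical operator (not Texera's class).
    op_type: str
    params: dict
    output_schema: tuple  # e.g. (("age", "int"),)


def cache_key(op_id, port, operators, edges, memo=None):
    """Hash an output port together with everything upstream of it.

    operators: {op_id: Operator}
    edges: list of (src_op, src_port, dst_op, dst_port) tuples
    """
    memo = {} if memo is None else memo
    if (op_id, port) in memo:
        return memo[(op_id, port)]
    op = operators[op_id]
    # Keys of the ports feeding this operator, ordered by input port (the wiring).
    inputs = sorted(
        (dst_port, cache_key(src, src_port, operators, edges, memo))
        for (src, src_port, dst, dst_port) in edges
        if dst == op_id
    )
    payload = json.dumps(
        {
            "type": op.op_type,
            "params": op.params,
            "schema": op.output_schema,
            "port": port,
            "inputs": inputs,
        },
        sort_keys=True,
    )
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    memo[(op_id, port)] = key
    return key


schema = (("age", "int"),)
operators = {
    "scan": Operator("CSVScan", {"file": "data.csv"}, schema),
    "filter": Operator("Filter", {"predicate": "age > 30"}, schema),
    "limit": Operator("Limit", {"n": 10}, schema),
}
edges = [("scan", 0, "filter", 0), ("filter", 0, "limit", 0)]
before = {op: cache_key(op, 0, operators, edges) for op in operators}

# Edit the filter's predicate and recompute every key.
operators["filter"] = Operator("Filter", {"predicate": "age > 40"}, schema)
after = {op: cache_key(op, 0, operators, edges) for op in operators}
changed = sorted(op for op in operators if before[op] != after[op])
print(changed)  # ['filter', 'limit']
```

In this sketch the edit changes the keys of the edited operator's port and of every port downstream of it, while the scan's key, and therefore its saved result, stays matched.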

Scope (MVP)

In scope: reuse the saved result at every matched port (full reuse), match by cache key, invalidate entries that no longer match after an edit, and read, write, and clear the cache from the UI.

Out of scope (future work, not in these PRs): choosing per port whether reuse is cheaper than recompute (cost-based reuse planning), and removing results under storage limits (eviction). The merged code always reuses a matched port's result.
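The in-scope behavior (match by cache key, invalidate entries that no longer match) can be sketched with an in-memory table. This is only an illustration under assumptions: the `ResultCache` class, its method names, and the location strings are hypothetical, the real feature stores entries in a cache table, and whether invalidation deletes or merely marks entries is not stated in the proposal (this sketch deletes them).

```python
class ResultCache:
    """Hypothetical in-memory stand-in for the cache table."""

    def __init__(self):
        # (workflow_id, port_id, cache_key) -> result location
        self._entries = {}

    def record(self, workflow_id, port_id, key, location):
        self._entries[(workflow_id, port_id, key)] = location

    def lookup(self, workflow_id, port_id, key):
        # Returns the saved location for a matched port, or None.
        return self._entries.get((workflow_id, port_id, key))

    def invalidate_stale(self, workflow_id, current_keys):
        """Drop this workflow's entries whose key differs from the port's
        current key. current_keys: {port_id: cache_key}.
        Returns the removed result locations so storage can be cleaned up."""
        stale = [
            entry
            for entry in self._entries
            if entry[0] == workflow_id and current_keys.get(entry[1]) != entry[2]
        ]
        return [self._entries.pop(entry) for entry in stale]


cache = ResultCache()
cache.record("wf1", "scan:0", "key-a", "results/wf1/scan-0")
cache.record("wf1", "filter:0", "key-b", "results/wf1/filter-0")
cache.record("wf2", "filter:0", "key-b", "results/wf2/filter-0")

# After an edit to the filter in wf1, its port has a new key.
removed = cache.invalidate_stale("wf1", {"scan:0": "key-a", "filter:0": "key-c"})
```

Because the MVP always reuses a matched port's result, a lookup that returns a location is the whole reuse decision in this sketch; no cost comparison is involved.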

How it fits the current system

Current main (figure below): the Workflow Compiler builds a physical plan, CostBasedScheduleGenerator builds a schedule of regions, and the executor runs the regions on workers that read and write tables in storage.

[Figure: execution components in current main]

With the cache MVP (figure below), the engine adds two modules, the cache service and the skeleton generator, and extends the existing scheduler:

Cache service: at submission, find the matched ports for this workflow (cache-key lookup); during execution, record cache metadata.
Skeleton generator: remove the operators and edges whose results are reused, leaving the run-skeleton, the part that still needs to run.
Scheduler (CostBasedScheduleGenerator): schedule the run-skeleton as it does today. The removed part becomes regions that are skipped, and operators that read from it use the saved result locations.

The executor saves results to the cached-result storage as ports finish.

[Figure: execution components with the cache MVP]
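One way the skeleton generator's pruning could work is a backward walk from the terminal operators that stops at matched ports. The sketch below is an assumption about the approach, not the prototype's algorithm: `run_skeleton` and its argument shapes are hypothetical, and it assumes every operator has at least one output port.

```python
def run_skeleton(operators, edges, matched):
    """Return the part of the plan that still needs to run.

    operators: {op_id: [output_port, ...]}
    edges: list of (src_op, src_port, dst_op, dst_port) tuples
    matched: set of (op_id, output_port) whose saved result can be reused
    Returns (ops_to_run, kept_edges, cached_inputs).
    """
    has_consumer = {src for (src, _, _, _) in edges}
    # Start from terminal operators that still have an unmatched output port.
    stack = [
        op
        for op, ports in operators.items()
        if op not in has_consumer and any((op, p) not in matched for p in ports)
    ]
    to_run = set()
    kept_edges, cached_inputs = [], []
    while stack:
        op = stack.pop()
        if op in to_run:
            continue
        to_run.add(op)
        for edge in edges:
            src, src_port, dst, _ = edge
            if dst != op:
                continue
            if (src, src_port) in matched:
                # The consumer reads the saved result; the producer is not visited.
                cached_inputs.append(edge)
            else:
                kept_edges.append(edge)
                stack.append(src)
    return to_run, kept_edges, cached_inputs


operators = {"scan": [0], "filter": [0], "limit": [0]}
edges = [("scan", 0, "filter", 0), ("filter", 0, "limit", 0)]

# The limit operator was edited, so only its upstream ports are matched.
partial = run_skeleton(operators, edges, {("scan", 0), ("filter", 0)})
# No matched ports: the skeleton is the whole plan.
full = run_skeleton(operators, edges, set())
```

In the first call only `limit` remains in the skeleton and its input edge is listed under `cached_inputs`, which corresponds to an operator reading from a saved result location. In the second call all three operators and both edges are kept.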

Nothing changes when there are no matched ports

On the first run of a workflow, or on a run after an edit that changes every port's cache key (for example, an edit to the only source operator), there are no matched ports: the run-skeleton is the whole plan and the schedule is the same as today. The cache changes behavior only when a matched port exists, so the code can land inactive and turn on once results are saved.

Merge plan: five PRs

In dependency order; each has its own issue:

1. Storage foundation (Add operator output port cache storage (table, DAO, cache key) #5882): the cache table, the code that reads and writes it, and the cache-key computation.
2. Cache state and statistics (Add a completed-from-cache operator state and cached-region statistics handling #5883): a "completed from cache" operator state and the matching statistics handling.
3. Scheduler (Add cache-reuse planning to the scheduler (skeleton generation and schedule assembly) #5884): the reuse planner (full reuse), skeleton generator, and the change that schedules the run-skeleton and combines it with the skipped regions.
4. Turn the feature on (Wire cache lookup, result saving, REST endpoints, and cleanup into execution #5885): the submission-time cache-matcher lookup, saving results as ports finish, the cache endpoints, and cleanup on deletion.
5. Frontend (Add cache panel and canvas display to the workflow editor #5886): the cache panel and the canvas display.

PRs 1 and 2 are independent. PR 3 needs 1 and 2; PR 4 needs 3; PR 5 needs 4.
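As a quick check of the stated dependencies, the merge order can be derived with Python's standard `graphlib` module (Python 3.9+). The dictionary below only encodes the sentence above, with PRs numbered 1 to 5 in the order listed.

```python
from graphlib import TopologicalSorter

# PR number -> set of PRs it depends on, as stated in the merge plan.
depends_on = {1: set(), 2: set(), 3: {1, 2}, 4: {3}, 5: {4}}

# Any valid merge order puts PRs 1 and 2 first (in either order), then 3, 4, 5.
order = list(TopologicalSorter(depends_on).static_order())
```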

chenlica (Collaborator) · 2026-06-22T06:31:00Z

@Xiao-zhen-Liu Thanks for the great summary. Please also describe our plan to manage the lifecycle of the cached results.
