The ListActors operation is currently unpaginated and loads all actors from all Redis shards into memory at once. At scale (potentially billions of records), this will cause severe memory exhaustion (OOM) and massive latency spikes.
This issue tracks the effort to redesign ListActors to support pagination across a sharded Redis cluster.
Context & Motivation
ListActors is primarily an administrative/"ops" command. It is not on the critical scheduling path. Because of this:
Usage Frequency: Expected to be rare.
Performance: It is acceptable for this operation to be relatively slow, provided it is memory-safe and paginated.
Overhead: We want to avoid introducing massive secondary indexes (like global sorted sets) just to support deterministic ordering for this one ops command, as that could add hundreds of gigabytes of overhead to the dataset.
Current Architecture & Limitations
Currently, ListActors executes via a ForEachMaster loop across all Redis shards in parallel. Inside each shard's loop, it uses SCAN and sequentially calls GET for every matching key.
Memory Exhaustion: It buffers all actors into a single []*ateapipb.Actor slice before returning to the client.
N+1 Query Problem: Every key found by SCAN triggers its own synchronous GET round trip.
No Resumption: There is no concept of a page token or page size in the ateapipb API or store.Interface.
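One way to carry resumption state is an opaque page token. The following is a minimal sketch, assuming a hypothetical token that records a shard index and that shard's SCAN cursor. The names pageToken, encodeToken, and decodeToken are illustrative only; nothing like them exists in ateapipb or store.Interface today.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// pageToken is a hypothetical resumption state: which shard to read next
// and the SCAN cursor to resume from on that shard.
type pageToken struct {
	Shard  int    `json:"s"`
	Cursor uint64 `json:"c"`
}

// encodeToken serializes the state into an opaque, URL-safe string.
func encodeToken(t pageToken) string {
	b, _ := json.Marshal(t) // marshaling this plain struct cannot fail
	return base64.RawURLEncoding.EncodeToString(b)
}

// decodeToken parses a token. An empty string means "start from the beginning".
func decodeToken(s string) (pageToken, error) {
	var t pageToken
	if s == "" {
		return t, nil
	}
	b, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return pageToken{}, errors.New("invalid page token")
	}
	if err := json.Unmarshal(b, &t); err != nil || t.Shard < 0 {
		return pageToken{}, errors.New("invalid page token")
	}
	return t, nil
}

func main() {
	tok := encodeToken(pageToken{Shard: 2, Cursor: 1234})
	got, err := decodeToken(tok)
	fmt.Println(tok, got, err)
}
```

A shard index is only meaningful if the order of primaries is stable between calls. If the cluster topology can change mid-iteration, the token would likely need to identify the shard some other way, which is an open design question rather than something this sketch settles.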
Design Principles & "Soft" Guarantees
To maintain scalability, we will implement the "softest" possible guarantees for list operations:
Non-deterministic Ordering: There is no guarantee of a specific order for the results.
Handling Flux & Duplicates: The system does not guarantee the absence of duplicates across pages, nor does it guarantee that actors created, updated, or deleted while an iteration is in progress are included in or excluded from the results.
Empty Pages: Following AIP-158 (pagination), it is acceptable to return a page with zero results and a non-empty continuation token (next_page_token in AIP terms). The client must then call again with that token to continue scanning.
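The guarantees above can be illustrated with in-memory stand-ins for the shards. This is a minimal sketch, not the planned implementation: the shard type, scan, listPage, listAll, and the token struct are all hypothetical. The fake scan returns at most count keys, whereas real SCAN treats COUNT only as a hint.

```go
package main

import (
	"fmt"
	"strings"
)

// shard is an in-memory stand-in for one Redis primary.
type shard struct {
	keys []string          // iteration order of the fake SCAN
	data map[string]string // key -> serialized actor
}

// scan mimics SCAN with MATCH: it inspects up to count keys starting at
// cursor, returns those with the given prefix, and the next cursor
// (0 once the shard is exhausted).
func (s *shard) scan(cursor uint64, prefix string, count int) ([]string, uint64) {
	start := int(cursor)
	if start > len(s.keys) {
		start = len(s.keys)
	}
	end := start + count
	if end > len(s.keys) {
		end = len(s.keys)
	}
	var out []string
	for _, k := range s.keys[start:end] {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	if end == len(s.keys) {
		return out, 0
	}
	return out, uint64(end)
}

// token is the (unencoded) resumption state: current shard and its cursor.
type token struct {
	Shard  int
	Cursor uint64
}

// listPage performs exactly one scan step. It may return zero actors together
// with a non-nil next token; a nil next token means iteration is complete.
func listPage(shards []*shard, t token, pageSize int) (actors []string, next *token) {
	if t.Shard >= len(shards) {
		return nil, nil
	}
	s := shards[t.Shard]
	keys, cur := s.scan(t.Cursor, "actor:", pageSize)
	for _, k := range keys {
		if v, ok := s.data[k]; ok { // the key may have been deleted since the scan
			actors = append(actors, v)
		}
	}
	if cur != 0 {
		return actors, &token{Shard: t.Shard, Cursor: cur}
	}
	if t.Shard+1 < len(shards) {
		return actors, &token{Shard: t.Shard + 1}
	}
	return actors, nil
}

// listAll is a client loop that tolerates empty pages and drops duplicates.
// It assumes each serialized actor is unique, so the value can act as an ID.
func listAll(shards []*shard, pageSize int) (all []string, calls int) {
	seen := map[string]bool{}
	t := &token{}
	for t != nil {
		var page []string
		page, t = listPage(shards, *t, pageSize)
		calls++
		for _, a := range page {
			if !seen[a] {
				seen[a] = true
				all = append(all, a)
			}
		}
	}
	return all, calls
}

func main() {
	shards := []*shard{
		{
			keys: []string{"actor:a", "actor:b", "lock:x", "lock:y", "actor:c"},
			data: map[string]string{"actor:a": "A", "actor:b": "B", "actor:c": "C"},
		},
		{
			keys: []string{"actor:d"},
			data: map[string]string{"actor:d": "D"},
		},
	}
	all, calls := listAll(shards, 2)
	fmt.Println(all, calls)
}
```

In this example the second call lands on two non-actor keys, so it yields an empty page that still carries a token. Memory use per call is bounded by one scan step rather than by the size of the dataset, at the cost of more calls.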
sequenceDiagram
    participant CLI as kubectl-ate
    participant API as ateapi (controlapi)
    participant Store as ateredis (Store Interface)
    participant Redis1 as Redis Primary Shard 1
    participant Redis2 as Redis Primary Shard 2
    CLI->>API: ListActors(ListActorsRequest)
    API->>Store: ListActors(ctx)
    par ForEachMaster (Parallel execution)
        Store->>Redis1: Scan(0, "actor:*")
        loop For each matching key
            Store->>Redis1: Get(key)
            Redis1-->>Store: JSON string
            Store->>Store: protojson.Unmarshal
        end
    and
        Store->>Redis2: Scan(0, "actor:*")
        loop For each matching key
            Store->>Redis2: Get(key)
            Redis2-->>Store: JSON string
            Store->>Store: protojson.Unmarshal
        end
    end
    Store->>Store: Append to result []Actor (mutex protected)
    Store-->>API: []*ateapipb.Actor
    API-->>CLI: ListActorsResponse{Actors}
    CLI->>CLI: PrintActors
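The per-key Get calls in the loops above are where the N+1 cost comes from. One possible mitigation, offered here only as an assumption and not as a decided design, is to send the GETs for one SCAN batch together (for example as a pipeline) so that a batch costs one round trip. The sketch below uses a hypothetical fakeConn that merely counts round trips; it does not talk to Redis.

```go
package main

import "fmt"

// reply models the result of one GET; ok is false when the key is missing.
type reply struct {
	val string
	ok  bool
}

// fakeConn stands in for a connection to one shard and counts round trips.
type fakeConn struct {
	data       map[string]string
	roundTrips int
}

// get models a single synchronous GET: one round trip per key.
func (c *fakeConn) get(key string) reply {
	c.roundTrips++
	v, ok := c.data[key]
	return reply{v, ok}
}

// getBatch models a group of GETs flushed together: one round trip per batch.
// Replies are returned in the same order as the requested keys.
func (c *fakeConn) getBatch(keys []string) []reply {
	c.roundTrips++
	out := make([]reply, len(keys))
	for i, k := range keys {
		v, ok := c.data[k]
		out[i] = reply{v, ok}
	}
	return out
}

func main() {
	keys := []string{"actor:a", "actor:b", "actor:c"}
	data := map[string]string{"actor:a": "A", "actor:b": "B", "actor:c": "C"}

	seq := &fakeConn{data: data}
	for _, k := range keys {
		seq.get(k)
	}

	batched := &fakeConn{data: data}
	batched.getBatch(keys)

	fmt.Println("sequential round trips:", seq.roundTrips)
	fmt.Println("batched round trips:", batched.roundTrips)
}
```

Note that in Redis Cluster a multi-key command such as MGET requires all keys to hash to the same slot, so pipelined single-key GETs are the more likely fit for keys returned by a per-shard SCAN.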