
[Feature] Implement scalable pagination for ListActors #153

@MushuEE

Description

Overview

The ListActors operation is currently unpaginated and loads all actors from all Redis shards into memory at once. At scale (potentially billions of records), this will cause severe memory exhaustion (OOM) and massive latency spikes.
This issue tracks the effort to redesign ListActors to support pagination across a sharded Redis cluster.

Context & Motivation

ListActors is primarily an administrative/"ops" command. It is not on the critical scheduling path. Because of this:

  • Usage Frequency: Expected to be rare.
  • Performance: It is acceptable for this operation to be relatively slow, provided it is memory-safe and paginated.
  • Overhead: We want to avoid introducing massive secondary indexes (like global sorted sets) just to support deterministic ordering for this one ops command, as that could add hundreds of gigabytes of overhead to the dataset.

Current Architecture & Limitations

Currently, ListActors executes via a ForEachMaster loop across all Redis shards in parallel. On each shard, it iterates the keyspace with SCAN and issues a separate GET for every matching key, one after another.

  • Memory Exhaustion: It buffers all actors into a single []*ateapipb.Actor slice before returning anything to the client.
  • N+1 Query Problem: Every key found by SCAN triggers its own synchronous GET round trip.
  • No Resumption: There is no concept of a page token or page size in the ateapipb API or store.Interface.
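A Redis SCAN cursor is only meaningful on the node that issued it, so one way to add resumption is a composite page token that records which primary to scan and where to resume on it. The sketch below assumes primaries are walked one at a time in a stable order (for example sorted by address) instead of in parallel. `scanPosition`, `encodePageToken` and `decodePageToken` are hypothetical names, not existing ateapi or store.Interface APIs.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
)

// scanPosition is a hypothetical resumption point: the index of a primary in
// a stable, ordered list of shards, plus the SCAN cursor on that shard.
type scanPosition struct {
	Shard  uint32
	Cursor uint64
}

// encodePageToken packs a scanPosition into an opaque, URL-safe string so
// clients cannot depend on its internal layout.
func encodePageToken(p scanPosition) string {
	buf := make([]byte, 12)
	binary.BigEndian.PutUint32(buf[0:4], p.Shard)
	binary.BigEndian.PutUint64(buf[4:12], p.Cursor)
	return base64.RawURLEncoding.EncodeToString(buf)
}

// decodePageToken reverses encodePageToken. An empty token means "first call":
// start at shard 0 with SCAN cursor 0.
func decodePageToken(tok string) (scanPosition, error) {
	if tok == "" {
		return scanPosition{}, nil
	}
	buf, err := base64.RawURLEncoding.DecodeString(tok)
	if err != nil {
		return scanPosition{}, err
	}
	if len(buf) != 12 {
		return scanPosition{}, errors.New("malformed page token")
	}
	return scanPosition{
		Shard:  binary.BigEndian.Uint32(buf[0:4]),
		Cursor: binary.BigEndian.Uint64(buf[4:12]),
	}, nil
}
```

If the cluster topology changes between calls, a stored cursor may no longer line up with the shard it names; under the soft guarantees described in this issue, the server could treat that as "results may be missed or repeated" rather than as a hard error.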

Design Principles & "Soft" Guarantees

To maintain scalability, we will implement the "softest" possible guarantees for list operations:

  1. Non-deterministic Ordering: There is no guarantee of a specific order for the results.
  2. Handling Flux & Duplicates: The system does not guarantee the absence of duplicates across pages, nor does it guarantee the inclusion/exclusion of operations that occur concurrently while iterating.
  3. Empty Pages: Following AIP-158 (pagination), it is acceptable to return a page with zero results and a non-empty next_page_token, forcing the client to call again to continue scanning.
Current ListActors call flow:

```mermaid
sequenceDiagram
    participant CLI as kubectl-ate
    participant API as ateapi (controlapi)
    participant Store as ateredis (Store Interface)
    participant Redis1 as Redis Primary Shard 1
    participant Redis2 as Redis Primary Shard 2

    CLI->>API: ListActors(ListActorsRequest)
    API->>Store: ListActors(ctx)

    par ForEachMaster (Parallel execution)
        Store->>Redis1: Scan(0, "actor:*")
        loop For each matching key
            Store->>Redis1: Get(key)
            Redis1-->>Store: JSON string
            Store->>Store: protojson.Unmarshal
        end
    and
        Store->>Redis2: Scan(0, "actor:*")
        loop For each matching key
            Store->>Redis2: Get(key)
            Redis2-->>Store: JSON string
            Store->>Store: protojson.Unmarshal
        end
    end

    Store->>Store: Append to result []Actor (mutex protected)
    Store-->>API: []*ateapipb.Actor
    API-->>CLI: ListActorsResponse{Actors}
    CLI->>CLI: PrintActors
```
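One possible server-side shape for a single page, combining a resume position with batched fetches: scan one primary at a time, fetch each SCAN batch in one round trip, and stop when the page is full or a scan budget is spent. This is a sketch under assumptions: `shard`, `GetMany`, `position`, `listActorsPage` and `fakeShard` are hypothetical; values stay as raw JSON strings (the real store would protojson.Unmarshal them into ateapipb.Actor); and GetMany stands in for something like pipelined GETs, because a plain MGET fails with CROSSSLOT in Redis Cluster when the keys span hash slots.

```go
package main

import "strings"

// shard is a hypothetical view of one Redis primary.
type shard interface {
	// Scan behaves like SCAN cursor MATCH match COUNT count: it returns a batch
	// of matching keys and the next cursor, which is 0 once the scan is done.
	Scan(cursor uint64, match string, count int64) (keys []string, next uint64)
	// GetMany fetches values for keys in one round trip; missing keys yield "".
	GetMany(keys []string) []string
}

// position says which shard to scan next and where to resume on it.
type position struct {
	Shard  int
	Cursor uint64
}

// listActorsPage returns roughly one page of raw actor JSON, the position to
// resume from, and whether every shard has been fully scanned.
func listActorsPage(shards []shard, pos position, pageSize, maxScans int) ([]string, position, bool) {
	var vals []string
	for scans := 0; pos.Shard < len(shards) && len(vals) < pageSize && scans < maxScans; scans++ {
		s := shards[pos.Shard]
		keys, next := s.Scan(pos.Cursor, "actor:*", int64(pageSize-len(vals)))
		if len(keys) > 0 {
			// One batched fetch per SCAN batch instead of one GET per key.
			for _, v := range s.GetMany(keys) {
				if v != "" { // the key may have been deleted after SCAN saw it
					vals = append(vals, v)
				}
			}
		}
		if next == 0 {
			pos = position{Shard: pos.Shard + 1} // this shard is exhausted
		} else {
			pos.Cursor = next
		}
	}
	return vals, pos, pos.Shard >= len(shards)
}

// fakeShard is an in-memory stand-in used only to exercise listActorsPage.
// Its cursor is an index into keys, and it only supports "prefix*" patterns.
type fakeShard struct {
	keys []string
	data map[string]string
}

func (f *fakeShard) Scan(cursor uint64, match string, count int64) ([]string, uint64) {
	prefix := strings.TrimSuffix(match, "*")
	start := int(cursor)
	end := start + int(count)
	if end > len(f.keys) {
		end = len(f.keys)
	}
	var out []string
	for _, k := range f.keys[start:end] {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	if end >= len(f.keys) {
		return out, 0
	}
	return out, uint64(end)
}

func (f *fakeShard) GetMany(keys []string) []string {
	out := make([]string, len(keys))
	for i, k := range keys {
		out[i] = f.data[k]
	}
	return out
}
```

SCAN's COUNT is only a hint, so a page can slightly exceed pageSize; truncating instead would drop keys, because the cursor cannot resume in the middle of a batch. The maxScans budget is what produces the empty pages with a continuation token that the design principles allow, for example when a shard holds few actor keys.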