ZenosInteractive · virtexalejandro · Apr 28, 2026 · Apr 23, 2026 · Apr 28, 2026 · Apr 28, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,11 +10,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 
 - **scripts**: `scripts/release_sdk.sh` -- Linux/macOS counterpart to `scripts/release_sdk.bat`.  Builds the SDK libs + `vtx_cli` in Release mode and installs into `./dist`.  Removes the build/release script asymmetry between Windows and Linux
+- **reader/api**: `ReaderContext::IsReady()`, `IsReadyFailed()`, `GetReadyError()`, `WaitUntilReady()` + `WaitUntilReady(std::chrono::milliseconds)` for explicit "first chunk in RAM" signalling, plus new `ReplayReaderEvents::OnReady` / `OnReadyFailed` callbacks.  Previously `ReaderContext::Loaded()` flipped to `true` the instant `OpenReplayFile()` returned -- header and footer parsed, property-address cache built, seek table ready, but zero chunks decompressed in RAM.  The first `GetFrameSync()` call still paid the full ZSTD + deserialise cost synchronously, and the Inspector already carried a redundant `is_file_loaded_` flag alongside `Loaded()` to paper over the gap (`tools/inspector/include/inspector_session.h:25`).  Now `OpenReplayFile()` eagerly kicks off an async load of chunk 0 as part of opening (via the existing `WarmAt(0)` / `UpdateCacheWindow` pipeline; empty 0-frame replays flip the flag vacuously through a new `MarkReadyVacuous()` facade hook so waiters never hang).  Callers consume the signal in whichever style they prefer: poll (`while (!ctx.IsReady() && !ctx.IsReadyFailed()) ...`), block (`ctx.WaitUntilReady(2s)`), or register a callback (`OnReady` / `OnReadyFailed` fire exactly once each, single-shot guarded under `ready_mutex_` so racing async + sync load paths cannot double-fire).  Failure semantics: a corrupt or unreadable chunk 0 does NOT fail `OpenReplayFile()` itself -- the reader is still constructed, `IsReadyFailed()` returns `true`, `GetReadyError()` carries the message, and downstream `GetFrame*()` calls behave as before (return `nullptr` / empty).  The header-parsed-ok-but-chunk-zero-broken state stays useful to inspector-style tools that want to show partial file info.  Destructor best-effort unblocks any waiter by flipping `ready_failed_` + notifying the condition variable under `ready_mutex_`; callers remain responsible for joining their waiter threads before destroying the `ReaderContext` (C++ standard requires no blocked waiters at condition-variable destruction time)
+- **tests**: six new cases in `tests/reader/test_reader_context.cpp` under "§READY: chunk-0 ready signalling".  `ReaderContextHappy.ReadyFlipsWithinTimeoutOnValidReplay` asserts `WaitUntilReady(5s)` returns `true` on a well-formed replay; `ReadyIsStableAcrossRepeatedQueries` pins the terminal-state stability guarantee; `WaitUntilReadyIsIdempotent` asserts repeated calls after ready return immediately; `ReaderContextReady.OnReadyFiresOnDirectFacadeWithPreWiredEvents` uses `CreateFlatBuffersFacade()` directly, wires events before `WarmAt(0)`, and polls an atomic counter to verify single-shot firing; `ReadyIsVacuousForZeroFrameReplay` exercises the `MarkReadyVacuous` path with a `GTEST_SKIP` fallback if the writer refuses a 0-frame replay; `ReadyFailsOnCorruptChunkZero` writes a valid file then overwrites its middle third with `0xFF` bytes and verifies `WaitUntilReady` returns `false` + `IsReadyFailed()` + non-empty `GetReadyError()`.  No destruction-race test: destroying `std::condition_variable` / `std::mutex` while waiters are blocked is UB per the standard, so the API contract is "join waiters before destroying" and the dtor's `notify_all` is best-effort only
 
 ### Changed
 
 - **repo layout**: all five build/clean/release wrappers moved from the repo root into `scripts/` (`build_sdk.bat`, `build_sdk.sh`, `clean.bat`, `clean.sh`, `release_sdk.bat`).  Each script now `cd`s to the repo root internally so invocations like `./scripts/build_sdk.sh` or `scripts\build_sdk.bat` work from any working directory.  Documentation references (README, CONTRIBUTING, docs/BUILD.md) updated accordingly
 - **repo layout**: `reports/benchmarks/` renamed to `docs/benchmarks/` to signal that the committed baseline outputs are reference documentation (co-located with `docs/PERFORMANCE.md` which narrates them) rather than stray CI artefacts.  `reports/` directory removed.  References in `docs/PERFORMANCE.md`, `docs/BUILD.md`, and the benchmark write-ups updated
+- **reader/api**: `OpenReplayFile()` now triggers an eager prefetch of chunk 0 via the existing async pipeline before returning.  Open latency on the calling thread is unchanged because the load runs on the same background thread `WarmAt` / `UpdateCacheWindow` already dispatches to; the prior "first `GetFrame*()` is slow" cost is moved off the first access onto the open-time spawn path (same total work, just overlapped with caller init).  Only chunk 0 is warmed -- the facade temporarily narrows the cache window to `(0, 0)` around the warm call and restores it to the default `(2, 2)` immediately after, so callers that set a narrow window right after `OpenReplayFile()` (memory-constrained tools, tests that isolate a single chunk) observe exactly the cache contents they asked for.  `ReaderContext::Loaded()` semantics are unchanged: still means "reader object exists".  New concept is `IsReady()` == "chunk 0 decompressed and deserialised in RAM"
 
 ## [0.1.0] - 2026-04-24
 

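The single-shot guard described in the `reader/api` entry above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the SDK's implementation: `ReadyLatch` and its method names are hypothetical, and only the behaviour (one terminal transition, waiters released on either outcome, timed wait returning the ready flag) is taken from the changelog text.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the reader's ready state (not an SDK class).
class ReadyLatch {
    enum class State { Pending, Ready, Failed };

public:
    // Each Mark* call returns true only for the caller that performed the
    // terminal transition, so that caller alone may fire its callback.
    bool MarkReady() { return Finish(State::Ready, std::string{}); }
    bool MarkFailed(std::string error) { return Finish(State::Failed, std::move(error)); }

    bool IsReady() const { return Load() == State::Ready; }
    bool IsFailed() const { return Load() == State::Failed; }

    std::string Error() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return error_;
    }

    // Blocks until a terminal state is reached or the timeout expires.
    // Returns true only when the terminal state is Ready.
    bool WaitUntilReady(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait_for(lock, timeout, [this] { return state_ != State::Pending; });
        return state_ == State::Ready;
    }

private:
    bool Finish(State terminal, std::string error) {
        assert(terminal != State::Pending);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (state_ != State::Pending) {
                return false; // a racing load path already finished
            }
            state_ = terminal;
            error_ = std::move(error);
        }
        cv_.notify_all(); // wake every blocked waiter
        return true;
    }

    State Load() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

    mutable std::mutex mutex_;
    std::condition_variable cv_;
    State state_ = State::Pending;
    std::string error_;
};
```

In this sketch a racing async and sync load path would both call `MarkReady()`, and only the one that receives `true` would invoke the user callback. As with any `std::condition_variable`, threads blocked in `WaitUntilReady` have to be joined before the latch is destroyed.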
diff --git a/benchmarks/CMakeLists.txt b/benchmarks/CMakeLists.txt
@@ -54,6 +54,7 @@ endif()
 # Initialize + RunSpecifiedBenchmarks + Shutdown.
 add_executable(vtx_benchmarks
     bench_reader.cpp
+    bench_reader_ready.cpp
     bench_writer.cpp
     bench_differ.cpp
     bench_property_cache.cpp

diff --git a/benchmarks/bench_reader_ready.cpp b/benchmarks/bench_reader_ready.cpp
@@ -0,0 +1,122 @@
+// VTX SDK -- reader "ready" benchmarks.
+//
+// What the eager-chunk-0 warm changes
+//   OpenReplayFile() used to return as soon as header + footer were parsed;
+//   the first GetFrame* call then paid the full ZSTD decompress +
+//   deserialise cost synchronously.  Now OpenReplayFile() kicks off an
+//   async load of chunk 0 as part of opening, so the decompress runs on a
+//   background thread and typically overlaps with caller initialisation.
+//
+// Scenarios
+//   BM_ReaderOpenOnly             OpenReplayFile + return (no wait)
+//   BM_ReaderOpenToReady          OpenReplayFile + WaitUntilReady
+//   BM_ReaderOpenToFirstFrame     OpenReplayFile + GetFrameSync(0)
+//
+//   The gap between BM_ReaderOpenOnly and BM_ReaderOpenToReady is the
+//   chunk-0 load time the caller still observes after OpenReplayFile()
+//   returns -- low when the OS file cache is warm, larger on first open.
+//   BM_ReaderOpenToFirstFrame measures the same path a 0.1-style caller
+//   still takes (no explicit wait); it should match BM_ReaderOpenToReady
+//   closely because GetFrameSync falls through to the same sync path
+//   when the async load is not yet in cache.
+//
+// Fixture
+//   synth_10k.vtx, same fixture as bench_reader.cpp.  VTX_BENCH_FIXTURES_DIR
+//   is set by benchmarks/CMakeLists.txt via target_compile_definitions.
+
+#include "vtx/common/vtx_logger.h"
+#include "vtx/reader/core/vtx_reader_facade.h"
+
+#include "bench_utils.h"
+
+#include <benchmark/benchmark.h>
+
+#include <chrono>
+#include <string>
+
+namespace {
+
+    std::string FixturePath(const char* name) {
+        return std::string(VTX_BENCH_FIXTURES_DIR) + "/" + name;
+    }
+
+    struct SilenceDebugLogsAtInit {
+        SilenceDebugLogsAtInit() { VTX::Logger::Instance().SetDebugEnabled(false); }
+    };
+    const SilenceDebugLogsAtInit silence_debug_logs_at_init {};
+
+} // namespace
+
+// Baseline: just open the file and immediately drop the context.  Measures
+// the synchronous cost on the calling thread -- header + footer parse,
+// property-address cache build, seek-table ingestion, plus the one-shot
+// std::async spawn for the eager chunk-0 warm.  Should be sub-millisecond.
+static void BM_ReaderOpenOnly(benchmark::State& state) {
+    const std::string path = FixturePath("synth_10k.vtx");
+    VtxBench::WarmFileCache(path);
+
+    for (auto _ : state) {
+        auto ctx = VTX::OpenReplayFile(path);
+        if (!ctx) {
+            state.SkipWithError("OpenReplayFile failed");
+            break;
+        }
+        benchmark::DoNotOptimize(ctx.reader.get());
+        // ctx goes out of scope here: reader dtor cancels the in-flight
+        // chunk-0 load, so the measured cost here does not pay the
+        // decompress.  That is intentional -- this benchmark isolates
+        // the synchronous open path.
+    }
+}
+BENCHMARK(BM_ReaderOpenOnly)->Unit(benchmark::kMicrosecond);
+
+// OpenReplayFile + WaitUntilReady.  Measures the end-to-end "file is
+// fully usable" latency, i.e. open + chunk-0 ZSTD decompress + FB /
+// protobuf deserialise, serialised onto the calling thread via the cv
+// wait.  This is the number to quote as "time to first frame".
+static void BM_ReaderOpenToReady(benchmark::State& state) {
+    const std::string path = FixturePath("synth_10k.vtx");
+    VtxBench::WarmFileCache(path);
+
+    for (auto _ : state) {
+        auto ctx = VTX::OpenReplayFile(path);
+        if (!ctx) {
+            state.SkipWithError("OpenReplayFile failed");
+            break;
+        }
+        const bool ready = ctx.WaitUntilReady(std::chrono::seconds(5));
+        if (!ready) {
+            state.SkipWithError("WaitUntilReady timed out");
+            break;
+        }
+        benchmark::DoNotOptimize(ctx.IsReady());
+    }
+}
+BENCHMARK(BM_ReaderOpenToReady)->Unit(benchmark::kMicrosecond);
+
+// OpenReplayFile + GetFrameSync(0).  Mirrors what a pre-ready-API caller
+// would do: no explicit ready wait, just ask for frame 0.  Under the
+// eager-warm pipeline GetFrameSync either hits the cache (async worker
+// finished first) or falls through to the sync load path; the caller
+// sees a non-null Frame* in both cases.  The measured cost is dominated
+// by ZSTD + deserialise, same as BM_ReaderOpenToReady -- by design, the
+// two should track each other within noise.
+static void BM_ReaderOpenToFirstFrame(benchmark::State& state) {
+    const std::string path = FixturePath("synth_10k.vtx");
+    VtxBench::WarmFileCache(path);
+
+    for (auto _ : state) {
+        auto ctx = VTX::OpenReplayFile(path);
+        if (!ctx) {
+            state.SkipWithError("OpenReplayFile failed");
+            break;
+        }
+        const VTX::Frame* f = ctx.reader->GetFrameSync(0);
+        if (!f) {
+            state.SkipWithError("GetFrameSync(0) returned null");
+            break;
+        }
+        benchmark::DoNotOptimize(f);
+    }
+}
+BENCHMARK(BM_ReaderOpenToFirstFrame)->Unit(benchmark::kMicrosecond);
diff --git a/samples/CMakeLists.txt b/samples/CMakeLists.txt
@@ -56,6 +56,11 @@ add_executable(vtx_sample_diff basic_diff.cpp)
 target_link_libraries(vtx_sample_diff PRIVATE vtx_reader vtx_differ)
 vtx_configure_sample(vtx_sample_diff)
 
+# --- ready_api (chunk-0 "ready" signalling: poll / block / callback) ---
+add_executable(vtx_sample_ready_api ready_api.cpp)
+target_link_libraries(vtx_sample_ready_api PRIVATE vtx_reader)
+vtx_configure_sample(vtx_sample_ready_api)
+
 # ==============================================================================
 # Arena-specific codegen (arena_data.proto + arena_data.fbs)
 #

diff --git a/samples/ready_api.cpp b/samples/ready_api.cpp
@@ -0,0 +1,213 @@
+// ready_api.cpp -- Demonstrates the three consumption styles for the
+// reader's chunk-0 "ready" signal introduced with the eager-warm change.
+//
+// Purpose
+//   After OpenReplayFile() returns, an async load of chunk 0 is already in
+//   flight.  The sample shows three ways a caller can wait for that load
+//   to complete before the first GetFrame* call, plus how to observe a
+//   failed load.
+//
+//   Style A -- Blocking wait with timeout    (simplest)
+//   Style B -- Polling loop                   (useful when you have other
+//                                              work to interleave, e.g. UI)
+//   Style C -- Callback (OnReady / OnReadyFailed)
+//                                              (reactive / pre-wired)
+//
+// Default input
+//   content/reader/arena/arena_from_fbs_ds.vtx
+//
+//   (same file vtx_sample_read uses).  Any .vtx path can be passed as
+//   argv[1] instead.
+//
+// Build
+//   Link against vtx_reader (vtx_common is transitive).  See
+//   samples/CMakeLists.txt.
+
+#include "vtx/reader/core/vtx_reader_facade.h"
+#include "vtx/common/vtx_logger.h"
+#include "vtx/common/vtx_types.h"
+
+#include <atomic>
+#include <chrono>
+#include <cstring>
+#include <string>
+#include <thread>
+
+namespace {
+
+    // --- Style A ---------------------------------------------------------
+    // Block the current thread (with a deadline) until chunk 0 is ready
+    // or the load fails.  WaitUntilReady(timeout) returns IsReady().
+    int RunBlockingStyle(const std::string& path) {
+        VTX_INFO("--- Style A: WaitUntilReady with 5s timeout ---");
+
+        auto ctx = VTX::OpenReplayFile(path);
+        if (!ctx) {
+            VTX_ERROR("OpenReplayFile failed: {}", ctx.error);
+            return 1;
+        }
+
+        const auto t0 = std::chrono::steady_clock::now();
+        const bool ready = ctx.WaitUntilReady(std::chrono::seconds(5));
+        const auto elapsed_ms =
+            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - t0).count();
+
+        if (!ready) {
+            if (ctx.IsReadyFailed()) {
+                VTX_ERROR("Chunk 0 failed after {} ms: {}", elapsed_ms, ctx.GetReadyError());
+            } else {
+                VTX_ERROR("Chunk 0 not ready after {} ms (timeout)", elapsed_ms);
+            }
+            return 1;
+        }
+
+        VTX_INFO("Ready after {} ms. Total frames: {}", elapsed_ms, ctx.reader->GetTotalFrames());
+
+        // First frame access now hits the warm cache.
+        const VTX::Frame* first = ctx.reader->GetFrameSync(0);
+        VTX_INFO("Frame 0 buckets: {}", first ? first->GetBuckets().size() : 0);
+        return 0;
+    }
+
+    // --- Style B ---------------------------------------------------------
+    // Poll IsReady() in a loop while doing other work.  Good fit for UI
+    // event loops that want to update a spinner / progress bar while the
+    // reader warms up, without committing a whole thread to blocking.
+    int RunPollingStyle(const std::string& path) {
+        VTX_INFO("--- Style B: Polling IsReady() with UI-tick cadence ---");
+
+        auto ctx = VTX::OpenReplayFile(path);
+        if (!ctx) {
+            VTX_ERROR("OpenReplayFile failed: {}", ctx.error);
+            return 1;
+        }
+
+        constexpr auto kTick = std::chrono::milliseconds(16); // ~60 Hz
+        const auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
+        int ticks = 0;
+
+        while (!ctx.IsReady() && !ctx.IsReadyFailed()) {
+            if (std::chrono::steady_clock::now() >= deadline) {
+                VTX_ERROR("Timed out after {} polls", ticks);
+                return 1;
+            }
+            // Imagine the UI advancing a spinner frame here.
+            ++ticks;
+            std::this_thread::sleep_for(kTick);
+        }
+
+        if (ctx.IsReadyFailed()) {
+            VTX_ERROR("Chunk 0 failed after {} polls: {}", ticks, ctx.GetReadyError());
+            return 1;
+        }
+
+        VTX_INFO("Ready after {} polls (~{} ms). Total frames: {}", ticks, ticks * kTick.count(), ctx.reader->GetTotalFrames());
+        return 0;
+    }
+
+    // --- Style C ---------------------------------------------------------
+    // Pre-wire OnReady / OnReadyFailed on a direct facade, then trigger
+    // the warm ourselves.  This is the path to use when you want the
+    // callback to run exactly once from the async worker thread without
+    // any chance of a race with OpenReplayFile's own event wiring.
+    //
+    // Under the OpenReplayFile() flow the context's chunk-state events
+    // are wired internally before WarmAt(0) fires, so user callbacks
+    // registered AFTER OpenReplayFile() returns may miss the single-shot
+    // signal (it's already fired).  Driving the facade directly avoids
+    // that race.
+    int RunCallbackStyle(const std::string& path) {
+        VTX_INFO("--- Style C: Pre-wired OnReady / OnReadyFailed ---");
+
+        std::atomic<bool> done {false};      // declared before `facade` so the
+        std::atomic<bool> succeeded {false}; // callbacks' [&] captures outlive it
+
+        auto facade = VTX::CreateFlatBuffersFacade(path);
+        if (!facade) {
+            VTX_ERROR("CreateFlatBuffersFacade failed for: {}", path);
+            return 1;
+        }
+
+        VTX::ReplayReaderEvents events;
+        events.OnReady = [&]() {
+            VTX_INFO("[callback] OnReady fired (from worker thread)");
+            succeeded.store(true);
+            done.store(true);
+        };
+        events.OnReadyFailed = [&](const std::string& err) {
+            VTX_ERROR("[callback] OnReadyFailed: {}", err);
+            done.store(true);
+        };
+        facade->SetEvents(events);
+
+        // Kick off the async warm.  Returns immediately.
+        facade->WarmAt(0);
+
+        // Wait for either callback to fire.  In a real app you would
+        // not spin -- you'd let the event loop run and handle the
+        // callback when it lands.
+        const auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
+        while (!done.load() && std::chrono::steady_clock::now() < deadline) {
+            std::this_thread::sleep_for(std::chrono::milliseconds(5));
+        }
+
+        if (!done.load()) {
+            VTX_ERROR("Callback did not fire within 5s");
+            return 1;
+        }
+        if (!succeeded.load()) {
+            return 1;
+        }
+
+        VTX_INFO("Total frames: {}", facade->GetTotalFrames());
+        return 0;
+    }
+
+    void PrintUsage(const char* exe) {
+        VTX_INFO("Usage: {} [--style=a|b|c|all] [replay.vtx]", exe);
+        VTX_INFO("  --style=a    Blocking WaitUntilReady (default)");
+        VTX_INFO("  --style=b    Polling loop");
+        VTX_INFO("  --style=c    Pre-wired callback on direct facade");
+        VTX_INFO("  --style=all  Run all three styles in sequence");
+    }
+
+} // namespace
+
+int main(int argc, char* argv[]) {
+    const char* style = "a";
+    std::string path = "content/reader/arena/arena_from_fbs_ds.vtx";
+
+    for (int i = 1; i < argc; ++i) {
+        const char* arg = argv[i];
+        if (std::strncmp(arg, "--style=", 8) == 0) {
+            style = arg + 8;
+        } else if (std::strcmp(arg, "--help") == 0 || std::strcmp(arg, "-h") == 0) {
+            PrintUsage(argv[0]);
+            return 0;
+        } else {
+            path = arg;
+        }
+    }
+
+    VTX_INFO("Reading: {}", path);
+
+    if (std::strcmp(style, "a") == 0)
+        return RunBlockingStyle(path);
+    if (std::strcmp(style, "b") == 0)
+        return RunPollingStyle(path);
+    if (std::strcmp(style, "c") == 0)
+        return RunCallbackStyle(path);
+    if (std::strcmp(style, "all") == 0) {
+        int rc = RunBlockingStyle(path);
+        if (rc != 0)
+            return rc;
+        rc = RunPollingStyle(path);
+        if (rc != 0)
+            return rc;
+        return RunCallbackStyle(path);
+    }
+
+    VTX_ERROR("Unknown --style value: {}", style);
+    PrintUsage(argv[0]);
+    return 2;
+}
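Style C's comment says a real application would not spin on an atomic flag. One way to avoid the sleep loop is to bridge the two single-shot callbacks into a `std::future`, sketched below. Everything here is an assumption made for illustration: `ReadyEvents` only mirrors the shape of the `OnReady` / `OnReadyFailed` members used in the sample, `BindReadyFuture` and `SimulateLoad` are hypothetical helpers, and the sketch relies on at most one of the two callbacks firing per load.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>

// Hypothetical event struct mirroring the two callbacks used in the sample.
struct ReadyEvents {
    std::function<void()> OnReady;
    std::function<void(const std::string&)> OnReadyFailed;
};

// Wires both callbacks to one shared promise.  The future yields an empty
// string on success and the error message on failure.  The callbacks own
// the promise through a shared_ptr, so a callback that lands after the
// caller has given up never touches destroyed stack state.
std::future<std::string> BindReadyFuture(ReadyEvents& events) {
    auto promise = std::make_shared<std::promise<std::string>>();
    std::future<std::string> result = promise->get_future();
    events.OnReady = [promise] { promise->set_value(std::string{}); };
    events.OnReadyFailed = [promise](const std::string& err) { promise->set_value(err); };
    return result;
}

// Stand-in for the async worker: fires exactly one callback after a delay.
std::thread SimulateLoad(const ReadyEvents& events, bool succeed) {
    assert(events.OnReady && events.OnReadyFailed);
    return std::thread([events, succeed] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        if (succeed) {
            events.OnReady();
        } else {
            events.OnReadyFailed("simulated chunk 0 failure");
        }
    });
}
```

A caller would then wait with `result.wait_for(std::chrono::seconds(5))` and compare against `std::future_status::ready` in place of the sleep loop, or hand the future to whatever event loop it already runs.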