3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -10,11 +10,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- **scripts**: `scripts/release_sdk.sh` -- Linux/macOS counterpart to `scripts/release_sdk.bat`. Builds the SDK libs + `vtx_cli` in Release mode and installs into `./dist`. Removes the build/release script asymmetry between Windows and Linux
- **reader/api**: `ReaderContext::IsReady()`, `IsReadyFailed()`, `GetReadyError()`, `WaitUntilReady()` + `WaitUntilReady(std::chrono::milliseconds)` for explicit "first chunk in RAM" signalling, plus new `ReplayReaderEvents::OnReady` / `OnReadyFailed` callbacks. Previously `ReaderContext::Loaded()` flipped to `true` the instant `OpenReplayFile()` returned -- header and footer parsed, property-address cache built, seek table ready, but zero chunks decompressed in RAM. The first `GetFrameSync()` call still paid the full ZSTD + deserialise cost synchronously, and the Inspector already carried a redundant `is_file_loaded_` flag alongside `Loaded()` to paper over the gap (`tools/inspector/include/inspector_session.h:25`). Now `OpenReplayFile()` eagerly kicks off an async load of chunk 0 as part of opening (via the existing `WarmAt(0)` / `UpdateCacheWindow` pipeline; empty 0-frame replays flip the flag vacuously through a new `MarkReadyVacuous()` facade hook so waiters never hang). Callers consume the signal in whichever style they prefer: poll (`while (!ctx.IsReady()) ...`), block (`ctx.WaitUntilReady(2s)`), or register a callback (`OnReady` / `OnReadyFailed` fire exactly once each, single-shot guarded under `ready_mutex_` so racing async + sync load paths cannot double-fire). Failure semantics: a corrupt or unreadable chunk 0 does NOT fail `OpenReplayFile()` itself -- the reader is still constructed, `IsReadyFailed()` returns `true`, `GetReadyError()` carries the message, and downstream `GetFrame*()` calls behave as before (return `nullptr` / empty). The header-parsed-ok-but-chunk-zero-broken state stays useful to inspector-style tools that want to show partial file info. 
Destructor best-effort unblocks any waiter by flipping `ready_failed_` + notifying the condition variable under `ready_mutex_`; callers remain responsible for joining their waiter threads before destroying the `ReaderContext` (C++ standard requires no blocked waiters at condition-variable destruction time)
- **tests**: six new cases in `tests/reader/test_reader_context.cpp` under "§READY: chunk-0 ready signalling". `ReaderContextHappy.ReadyFlipsWithinTimeoutOnValidReplay` asserts `WaitUntilReady(5s)` returns `true` on a well-formed replay; `ReadyIsStableAcrossRepeatedQueries` pins the terminal-state stability guarantee; `WaitUntilReadyIsIdempotent` asserts repeated calls after ready return immediately; `ReaderContextReady.OnReadyFiresOnDirectFacadeWithPreWiredEvents` uses `CreateFlatBuffersFacade()` directly, wires events before `WarmAt(0)`, and polls an atomic counter to verify single-shot firing; `ReadyIsVacuousForZeroFrameReplay` exercises the `MarkReadyVacuous` path with a `GTEST_SKIP` fallback if the writer refuses a 0-frame replay; `ReadyFailsOnCorruptChunkZero` writes a valid file then overwrites its middle third with `0xFF` bytes and verifies `WaitUntilReady` returns `false` + `IsReadyFailed()` + non-empty `GetReadyError()`. No destruction-race test: destroying `std::condition_variable` / `std::mutex` while waiters are blocked is UB per the standard, so the API contract is "join waiters before destroying" and the dtor's `notify_all` is best-effort only
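
The single-shot guard and terminal-state stability described in the entries above can be illustrated with a minimal, self-contained sketch. `ReadySignalSketch` and its private helpers are hypothetical names invented for illustration; this is not the SDK's `ReaderContext`, only one plausible way to get the same observable behaviour with a mutex and a condition variable.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical model of a single-shot ready signal. Method names mirror the
// changelog entry, but this class is an illustration, not SDK code.
class ReadySignalSketch {
public:
    // Each Mark* call returns true only if it performed the transition out
    // of Pending, so the winning load path can fire its callback once.
    bool MarkReady() { return Transition(State::Ready, {}); }
    bool MarkFailed(std::string error) { return Transition(State::Failed, std::move(error)); }

    // Blocks until a terminal state is reached or the timeout expires.
    // Returns true only when the terminal state is Ready.
    bool WaitUntilReady(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait_for(lock, timeout, [this] { return state_ != State::Pending; });
        return state_ == State::Ready;
    }

    bool IsReady() const { return Load() == State::Ready; }
    bool IsReadyFailed() const { return Load() == State::Failed; }

    std::string GetReadyError() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return error_;
    }

private:
    enum class State { Pending, Ready, Failed };

    State Load() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

    bool Transition(State next, std::string error) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (state_ != State::Pending) return false;  // already terminal
            state_ = next;
            error_ = std::move(error);
        }
        cv_.notify_all();  // wake every blocked WaitUntilReady caller
        return true;
    }

    mutable std::mutex mutex_;
    std::condition_variable cv_;
    State state_ = State::Pending;
    std::string error_;
};
```

In this sketch a racing async and sync load path would both call `MarkReady()`, and only the caller that gets `true` back would invoke the user callback. As with the real API, any thread blocked in `WaitUntilReady` has to be joined before the object is destroyed.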

### Changed

- **repo layout**: all five build/clean/release wrappers moved from the repo root into `scripts/` (`build_sdk.bat`, `build_sdk.sh`, `clean.bat`, `clean.sh`, `release_sdk.bat`). Each script now `cd`s to the repo root internally so invocations like `./scripts/build_sdk.sh` or `scripts\build_sdk.bat` work from any working directory. Documentation references (README, CONTRIBUTING, docs/BUILD.md) updated accordingly
- **repo layout**: `reports/benchmarks/` renamed to `docs/benchmarks/` to signal that the committed baseline outputs are reference documentation (co-located with `docs/PERFORMANCE.md` which narrates them) rather than stray CI artefacts. `reports/` directory removed. References in `docs/PERFORMANCE.md`, `docs/BUILD.md`, and the benchmark write-ups updated
- **reader/api**: `OpenReplayFile()` now triggers an eager prefetch of chunk 0 via the existing async pipeline before returning. Open latency on the calling thread is unchanged because the load runs on the same background thread `WarmAt` / `UpdateCacheWindow` already dispatches to; the prior "first `GetFrame*()` is slow" cost is moved off the first access onto the open-time spawn path (same total work, just overlapped with caller init). Only chunk 0 is warmed -- the facade temporarily narrows the cache window to `(0, 0)` around the warm call and restores it to the default `(2, 2)` immediately after, so callers that set a narrow window right after `OpenReplayFile()` (memory-constrained tools, tests that isolate a single chunk) observe exactly the cache contents they asked for. `ReaderContext::Loaded()` semantics are unchanged: still means "reader object exists". New concept is `IsReady()` == "chunk 0 decompressed and deserialised in RAM"
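
The narrow-then-restore behaviour around the warm call can be sketched with a small RAII guard. Everything here is a hypothetical stand-in (`CacheWindowSketch`, `FacadeSketch`, `ScopedWindow`, `EagerWarmChunkZero`), and reading `(2, 2)` as "chunks behind, chunks ahead" is an assumption; the facade's real window API is not part of this diff. The guard restores whatever window was active before the warm, which at open time would be the default.

```cpp
#include <vector>

// Hypothetical stand-ins; the SDK's real cache-window API is not shown here.
struct CacheWindowSketch {
    int back;     // chunks kept behind the current one (assumed meaning)
    int forward;  // chunks kept ahead of the current one (assumed meaning)
};

struct FacadeSketch {
    CacheWindowSketch window{2, 2};  // starts at the default (2, 2)
    int total_chunks = 0;
    std::vector<int> requested;      // chunk indices a warm call asked for

    // Requests every existing chunk inside the current window around `chunk`.
    void WarmAt(int chunk) {
        for (int c = chunk - window.back; c <= chunk + window.forward; ++c) {
            if (c >= 0 && c < total_chunks) requested.push_back(c);
        }
    }
};

// RAII guard: narrows the window on construction, restores it on scope exit.
class ScopedWindow {
public:
    ScopedWindow(FacadeSketch& facade, CacheWindowSketch narrow)
        : facade_(facade), saved_(facade.window) {
        facade_.window = narrow;
    }
    ~ScopedWindow() { facade_.window = saved_; }
    ScopedWindow(const ScopedWindow&) = delete;
    ScopedWindow& operator=(const ScopedWindow&) = delete;

private:
    FacadeSketch& facade_;
    CacheWindowSketch saved_;
};

// Warms only chunk 0 and leaves the caller-visible window as it was.
void EagerWarmChunkZero(FacadeSketch& facade) {
    ScopedWindow narrow(facade, CacheWindowSketch{0, 0});
    facade.WarmAt(0);
}
```

Using a guard rather than two explicit assignments means the window is restored even if the warm call throws, which is one reason to prefer this shape in a sketch like this.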

## [0.1.0] - 2026-04-24

1 change: 1 addition & 0 deletions benchmarks/CMakeLists.txt
@@ -54,6 +54,7 @@ endif()
# Initialize + RunSpecifiedBenchmarks + Shutdown.
add_executable(vtx_benchmarks
    bench_reader.cpp
    bench_reader_ready.cpp
    bench_writer.cpp
    bench_differ.cpp
    bench_property_cache.cpp
122 changes: 122 additions & 0 deletions benchmarks/bench_reader_ready.cpp
@@ -0,0 +1,122 @@
// VTX SDK -- reader "ready" benchmarks.
//
// What the eager-chunk-0 warm changes
//   OpenReplayFile() used to return as soon as header + footer were parsed;
//   the first GetFrame* call then paid the full ZSTD decompress +
//   deserialise cost synchronously. Now OpenReplayFile() kicks off an
//   async load of chunk 0 as part of opening, so the decompress runs on a
//   background thread and typically overlaps with caller initialisation.
//
// Scenarios
//   BM_ReaderOpenOnly           OpenReplayFile + return (no wait)
//   BM_ReaderOpenToReady        OpenReplayFile + WaitUntilReady
//   BM_ReaderOpenToFirstFrame   OpenReplayFile + GetFrameSync(0)
//
//   The gap between BM_ReaderOpenOnly and BM_ReaderOpenToReady is the
//   chunk-0 load time the caller actually waits for after the open
//   returns -- small when the OS file cache is warm, larger on a first
//   (cold) open.
//   BM_ReaderOpenToFirstFrame measures the path a 0.1-style caller
//   still takes (no explicit wait); it should match BM_ReaderOpenToReady
//   closely because GetFrameSync falls through to the same sync path
//   when the async load is not yet in cache.
//
// Fixture
//   synth_10k.vtx, same fixture as bench_reader.cpp. VTX_BENCH_FIXTURES_DIR
//   is set by benchmarks/CMakeLists.txt via target_compile_definitions.

#include "vtx/common/vtx_logger.h"
#include "vtx/reader/core/vtx_reader_facade.h"

#include "bench_utils.h"

#include <benchmark/benchmark.h>

#include <chrono>
#include <string>

namespace {

std::string FixturePath(const char* name) {
    return std::string(VTX_BENCH_FIXTURES_DIR) + "/" + name;
}

struct SilenceDebugLogsAtInit {
    SilenceDebugLogsAtInit() { VTX::Logger::Instance().SetDebugEnabled(false); }
};
const SilenceDebugLogsAtInit silence_debug_logs_at_init {};

} // namespace

// Baseline: just open the file and immediately drop the context. Measures
// the synchronous cost on the calling thread -- header + footer parse,
// property-address cache build, seek-table ingestion, plus the one-shot
// std::async spawn for the eager chunk-0 warm. Should be sub-millisecond.
static void BM_ReaderOpenOnly(benchmark::State& state) {
    const std::string path = FixturePath("synth_10k.vtx");
    VtxBench::WarmFileCache(path);

    for (auto _ : state) {
        auto ctx = VTX::OpenReplayFile(path);
        if (!ctx) {
            state.SkipWithError("OpenReplayFile failed");
            break;
        }
        benchmark::DoNotOptimize(ctx.reader.get());
        // ctx goes out of scope here: reader dtor cancels the in-flight
        // chunk-0 load, so the measured cost here does not pay the
        // decompress. That is intentional -- this benchmark isolates
        // the synchronous open path.
    }
}
BENCHMARK(BM_ReaderOpenOnly)->Unit(benchmark::kMicrosecond);

// OpenReplayFile + WaitUntilReady. Measures the end-to-end "file is
// fully usable" latency, i.e. open + chunk-0 ZSTD decompress + FB /
// protobuf deserialise, serialised onto the calling thread via the cv
// wait. This is the number to quote as "time to first frame".
static void BM_ReaderOpenToReady(benchmark::State& state) {
    const std::string path = FixturePath("synth_10k.vtx");
    VtxBench::WarmFileCache(path);

    for (auto _ : state) {
        auto ctx = VTX::OpenReplayFile(path);
        if (!ctx) {
            state.SkipWithError("OpenReplayFile failed");
            break;
        }
        const bool ready = ctx.WaitUntilReady(std::chrono::seconds(5));
        if (!ready) {
            state.SkipWithError("WaitUntilReady timed out");
            break;
        }
        benchmark::DoNotOptimize(ctx.IsReady());
    }
}
BENCHMARK(BM_ReaderOpenToReady)->Unit(benchmark::kMicrosecond);

// OpenReplayFile + GetFrameSync(0). Mirrors what a pre-ready-API caller
// would do: no explicit ready wait, just ask for frame 0. Under the
// eager-warm pipeline GetFrameSync either hits the cache (async worker
// finished first) or falls through to the sync load path; the caller
// sees a non-null Frame* in both cases. The measured cost is dominated
// by ZSTD + deserialise, same as BM_ReaderOpenToReady -- by design, the
// two should track each other within noise.
static void BM_ReaderOpenToFirstFrame(benchmark::State& state) {
    const std::string path = FixturePath("synth_10k.vtx");
    VtxBench::WarmFileCache(path);

    for (auto _ : state) {
        auto ctx = VTX::OpenReplayFile(path);
        if (!ctx) {
            state.SkipWithError("OpenReplayFile failed");
            break;
        }
        const VTX::Frame* f = ctx.reader->GetFrameSync(0);
        if (!f) {
            state.SkipWithError("GetFrameSync(0) returned null");
            break;
        }
        benchmark::DoNotOptimize(f);
    }
}
BENCHMARK(BM_ReaderOpenToFirstFrame)->Unit(benchmark::kMicrosecond);
5 changes: 5 additions & 0 deletions samples/CMakeLists.txt
@@ -56,6 +56,11 @@ add_executable(vtx_sample_diff basic_diff.cpp)
target_link_libraries(vtx_sample_diff PRIVATE vtx_reader vtx_differ)
vtx_configure_sample(vtx_sample_diff)

# --- ready_api (chunk-0 "ready" signalling: poll / block / callback) ---
add_executable(vtx_sample_ready_api ready_api.cpp)
target_link_libraries(vtx_sample_ready_api PRIVATE vtx_reader)
vtx_configure_sample(vtx_sample_ready_api)

# ==============================================================================
# Arena-specific codegen (arena_data.proto + arena_data.fbs)
#
213 changes: 213 additions & 0 deletions samples/ready_api.cpp
@@ -0,0 +1,213 @@
// ready_api.cpp -- Demonstrates the three consumption styles for the
// reader's chunk-0 "ready" signal introduced with the eager-warm change.
//
// Purpose
//   After OpenReplayFile() returns, an async load of chunk 0 is already in
//   flight. The sample shows three ways a caller can wait for that load
//   to complete before the first GetFrame* call, plus how to observe a
//   failed load.
//
//     Style A -- Blocking wait with timeout   (simplest)
//     Style B -- Polling loop                 (useful when you have other
//                                              work to interleave, e.g. UI)
//     Style C -- Callback (OnReady / OnReadyFailed)
//                                             (reactive / pre-wired)
//
// Default input
//   content/reader/arena/arena_from_fbs_ds.vtx
//
//   (same file vtx_sample_read uses). Any .vtx path can be passed as
//   argv[1] instead.
//
// Build
//   Link against vtx_reader (vtx_common is transitive). See
//   samples/CMakeLists.txt.

#include "vtx/reader/core/vtx_reader_facade.h"
#include "vtx/common/vtx_logger.h"
#include "vtx/common/vtx_types.h"

#include <atomic>
#include <chrono>
#include <cstring>
#include <string>
#include <thread>

namespace {

// --- Style A ---------------------------------------------------------
// Block the current thread (with a deadline) until chunk 0 is ready
// or the load fails. WaitUntilReady(timeout) returns IsReady().
int RunBlockingStyle(const std::string& path) {
    VTX_INFO("--- Style A: WaitUntilReady with 5s timeout ---");

    auto ctx = VTX::OpenReplayFile(path);
    if (!ctx) {
        VTX_ERROR("OpenReplayFile failed: {}", ctx.error);
        return 1;
    }

    const auto t0 = std::chrono::steady_clock::now();
    const bool ready = ctx.WaitUntilReady(std::chrono::seconds(5));
    const auto elapsed_ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - t0).count();

    if (!ready) {
        if (ctx.IsReadyFailed()) {
            VTX_ERROR("Chunk 0 failed after {} ms: {}", elapsed_ms, ctx.GetReadyError());
        } else {
            VTX_ERROR("Chunk 0 not ready after {} ms (timeout)", elapsed_ms);
        }
        return 1;
    }

    VTX_INFO("Ready after {} ms. Total frames: {}", elapsed_ms, ctx.reader->GetTotalFrames());

    // First frame access now hits the warm cache.
    const VTX::Frame* first = ctx.reader->GetFrameSync(0);
    VTX_INFO("Frame 0 buckets: {}", first ? first->GetBuckets().size() : 0);
    return 0;
}

// --- Style B ---------------------------------------------------------
// Poll IsReady() in a loop while doing other work. Good fit for UI
// event loops that want to update a spinner / progress bar while the
// reader warms up, without committing a whole thread to blocking.
int RunPollingStyle(const std::string& path) {
    VTX_INFO("--- Style B: Polling IsReady() with UI-tick cadence ---");

    auto ctx = VTX::OpenReplayFile(path);
    if (!ctx) {
        VTX_ERROR("OpenReplayFile failed: {}", ctx.error);
        return 1;
    }

    constexpr auto kTick = std::chrono::milliseconds(16); // ~60 Hz
    const auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
    int ticks = 0;

    while (!ctx.IsReady() && !ctx.IsReadyFailed()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            VTX_ERROR("Timed out after {} polls", ticks);
            return 1;
        }
        // Imagine the UI advancing a spinner frame here.
        ++ticks;
        std::this_thread::sleep_for(kTick);
    }

    if (ctx.IsReadyFailed()) {
        VTX_ERROR("Chunk 0 failed after {} polls: {}", ticks, ctx.GetReadyError());
        return 1;
    }

    // Derive the estimate from kTick so it stays correct if the cadence changes.
    VTX_INFO("Ready after {} polls (~{} ms). Total frames: {}", ticks, ticks * kTick.count(),
             ctx.reader->GetTotalFrames());
    return 0;
}

// --- Style C ---------------------------------------------------------
// Pre-wire OnReady / OnReadyFailed on a direct facade, then trigger
// the warm ourselves. This is the path to use when you want the
// callback to run exactly once from the async worker thread without
// any chance of a race with OpenReplayFile's own event wiring.
//
// Under the OpenReplayFile() flow the context's chunk-state events
// are wired internally before WarmAt(0) fires, so user callbacks
// registered AFTER OpenReplayFile() returns may miss the single-shot
// signal (it's already fired). Driving the facade directly avoids
// that race.
int RunCallbackStyle(const std::string& path) {
    VTX_INFO("--- Style C: Pre-wired OnReady / OnReadyFailed ---");

    auto facade = VTX::CreateFlatBuffersFacade(path);
    if (!facade) {
        VTX_ERROR("CreateFlatBuffersFacade failed for: {}", path);
        return 1;
    }

    std::atomic<bool> done {false};
    std::atomic<bool> succeeded {false};

    VTX::ReplayReaderEvents events;
    events.OnReady = [&]() {
        VTX_INFO("[callback] OnReady fired (from worker thread)");
        succeeded.store(true);
        done.store(true);
    };
    events.OnReadyFailed = [&](const std::string& err) {
        VTX_ERROR("[callback] OnReadyFailed: {}", err);
        done.store(true);
    };
    facade->SetEvents(events);

    // Kick off the async warm. Returns immediately.
    facade->WarmAt(0);

    // Wait for either callback to fire. In a real app you would
    // not spin -- you'd let the event loop run and handle the
    // callback when it lands.
    const auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
    while (!done.load() && std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }

    if (!done.load()) {
        VTX_ERROR("Callback did not fire within 5s");
        return 1;
    }
    if (!succeeded.load()) {
        return 1;
    }

    VTX_INFO("Total frames: {}", facade->GetTotalFrames());
    return 0;
}

void PrintUsage(const char* exe) {
    VTX_INFO("Usage: {} [--style=a|b|c|all] [replay.vtx]", exe);
    VTX_INFO("  --style=a    Blocking WaitUntilReady (default)");
    VTX_INFO("  --style=b    Polling loop");
    VTX_INFO("  --style=c    Pre-wired callback on direct facade");
    VTX_INFO("  --style=all  Run all three styles in sequence");
}

} // namespace

int main(int argc, char* argv[]) {
    const char* style = "a";
    std::string path = "content/reader/arena/arena_from_fbs_ds.vtx";

    for (int i = 1; i < argc; ++i) {
        const char* arg = argv[i];
        if (std::strncmp(arg, "--style=", 8) == 0) {
            style = arg + 8;
        } else if (std::strcmp(arg, "--help") == 0 || std::strcmp(arg, "-h") == 0) {
            PrintUsage(argv[0]);
            return 0;
        } else {
            path = arg;
        }
    }

    VTX_INFO("Reading: {}", path);

    if (std::strcmp(style, "a") == 0)
        return RunBlockingStyle(path);
    if (std::strcmp(style, "b") == 0)
        return RunPollingStyle(path);
    if (std::strcmp(style, "c") == 0)
        return RunCallbackStyle(path);
    if (std::strcmp(style, "all") == 0) {
        int rc = RunBlockingStyle(path);
        if (rc != 0)
            return rc;
        rc = RunPollingStyle(path);
        if (rc != 0)
            return rc;
        return RunCallbackStyle(path);
    }

    VTX_ERROR("Unknown --style value: {}", style);
    PrintUsage(argv[0]);
    return 2;
}
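
Style C captures its `done` / `succeeded` flags by reference. If the 5 s wait times out and the function returns while the worker is still running, a late callback could in principle touch flags that no longer exist; whether that can happen depends on facade teardown behaviour not shown in this diff. A possible hardening, sketched below with hypothetical names (`ReadyEventsSketch`, `ReadyOutcome`, `MakeReadyEvents`; none of them are SDK types), is to keep the flags in shared state that the callbacks own a copy of.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <string>

// Hypothetical mirror of the two callbacks; not the SDK's ReplayReaderEvents.
struct ReadyEventsSketch {
    std::function<void()> OnReady;
    std::function<void(const std::string&)> OnReadyFailed;
};

// Outcome flags live on the heap and are shared with the callbacks.
struct ReadyOutcome {
    std::atomic<bool> done{false};
    std::atomic<bool> succeeded{false};
};

// Each lambda copies the shared_ptr, so the flags stay alive for as long as
// any callback object exists, even after the waiting function has returned.
ReadyEventsSketch MakeReadyEvents(const std::shared_ptr<ReadyOutcome>& outcome) {
    ReadyEventsSketch events;
    events.OnReady = [outcome] {
        outcome->succeeded.store(true);
        outcome->done.store(true);
    };
    events.OnReadyFailed = [outcome](const std::string&) {
        outcome->done.store(true);
    };
    return events;
}
```

The waiting code would poll `outcome->done` exactly as Style C polls `done`; the difference is only in who owns the storage.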