Phase0 6/bench harness tiers by thorrester · Pull Request #300 · demml/scouter

thorrester · 2026-05-14T01:55:15Z

Pull Request

Short Summary

Adds a tiered OLAP benchmark harness with committed Tier 0 baseline artifacts and wires Phase 0 observability instrumentation through the full trace query stack — object store → Delta table lifecycle → DataFusion query pipeline → HTTP route handlers. The harness is the mechanism that makes the instrumentation measurable at PR time.

Context

Tiered benchmark harness (benches/tiers.rs, 1077 lines)

New BenchTier enum (Tier0/1/2) gates every benchmark group via tier_guard_for(bench, group), which reads SCOUTER_BENCH_TIER from env. Three make targets drive the tiers:

make bench.core — Tier 0, PR-blocking smoke suite (~50–120s per group)
make bench.extended — Tier 1, CI-gated extended load tests
make bench.certification — Tier 2, full certification against object storage

Each benchmark binary that was previously monolithic now calls tier_guard_for at the top of every group function, so a Tier 0 run only executes T0 groups and exits fast. Criterion still does the measuring; the tier system just controls which groups run and writes JSON artifacts to bench_metrics/.
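A minimal sketch of how such a tier gate could look, not the code in benches/tiers.rs. The env encoding ("0", "1", "2"), the fallback to Tier 0, the helper name `should_run`, and the rule that a group runs when its tier is at or below the active tier are all assumptions made for illustration; the real `tier_guard_for(bench, group)` signature is not reproduced here.

```rust
use std::env;

/// Benchmark tiers, ordered from cheapest to most expensive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum BenchTier {
    Tier0,
    Tier1,
    Tier2,
}

impl BenchTier {
    /// Parse a raw env value. Assumption: unset or unknown values mean Tier 0.
    pub fn parse(raw: Option<&str>) -> BenchTier {
        match raw.map(str::trim) {
            Some("1") => BenchTier::Tier1,
            Some("2") => BenchTier::Tier2,
            _ => BenchTier::Tier0,
        }
    }

    /// Read the active tier from SCOUTER_BENCH_TIER.
    pub fn from_env() -> BenchTier {
        BenchTier::parse(env::var("SCOUTER_BENCH_TIER").ok().as_deref())
    }
}

/// Hypothetical guard: a group runs when its tier is at or below the active tier.
pub fn should_run(group_tier: BenchTier, active: BenchTier) -> bool {
    group_tier <= active
}
```

A group function would call the guard first and return early when it yields false, so Criterion never sets up the skipped group.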

Committed Tier 0 baseline artifacts (bench_metrics/*.json)

Four JSON files are checked in as the initial Tier 0 baseline:

| Artifact | Entrypoint | p50 end-to-end | Result rows |
| --- | --- | --- | --- |
| `t0_bifrost_smoke` | `dataset_engine_manager.query` | 927 µs | 256 |
| `t0_cold_query_smoke` | `trace_query_service.query_spans` | 3,498 µs | 5 |
| `t0_hot_path_cold_query_smoke` | `trace_query_service.query_spans` | 4,059 µs | 5 |
| `t0_refresh_origin_sentinel` | — | — | 0 (guards refresh accounting) |

make bench.core also runs bench_compare, a binary that loads these committed artifacts and fails if the current run regresses on bench.query.end_to_end or violates object-store operation counts. The sentinel exists specifically to assert that the refresh-on-request path produces zero LIST calls during normal query execution.
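The comparison described above can be sketched roughly as follows. This is an illustration only: the `BenchArtifact` struct, its field names, and the percentage tolerance are hypothetical, and the real bench_compare binary reads the committed JSON artifacts, whose schema is not shown in this PR description.

```rust
/// Hypothetical, simplified view of one committed or freshly produced artifact.
#[derive(Debug, Clone)]
pub struct BenchArtifact {
    pub name: String,
    /// p50 of bench.query.end_to_end, in microseconds.
    pub end_to_end_p50_us: u64,
    /// Number of object-store LIST calls observed during the run.
    pub list_calls: u64,
}

/// Compare a current run against its baseline and return every violation.
/// `max_regression_pct` is an assumed tolerance, not a value from the PR.
pub fn compare(
    baseline: &BenchArtifact,
    current: &BenchArtifact,
    max_regression_pct: u64,
) -> Vec<String> {
    let mut failures = Vec::new();

    // Latency gate: the current p50 may exceed the baseline by at most the tolerance.
    let allowed_us = baseline.end_to_end_p50_us * (100 + max_regression_pct) / 100;
    if current.end_to_end_p50_us > allowed_us {
        failures.push(format!(
            "{}: p50 {} us exceeds allowed {} us",
            current.name, current.end_to_end_p50_us, allowed_us
        ));
    }

    // Operation-count gate: a sentinel with a baseline of zero LIST calls
    // fails as soon as a single LIST shows up.
    if current.list_calls > baseline.list_calls {
        failures.push(format!(
            "{}: {} LIST calls, baseline allows {}",
            current.name, current.list_calls, baseline.list_calls
        ));
    }

    failures
}
```

Returning all violations instead of stopping at the first one keeps a failing PR run informative when latency and operation counts regress together.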

Object store observability (parquet/utils.rs, ~480 new lines)

ObjectStoreRequestTelemetry is the production instrumentation primitive. Every object store call now gets an object_store.request span with:

object_store.operation — list, get, get_range, head, put, delete, copy
object_store.path_kind — delta_log, parquet_data, checkpoint, unknown
object_store.backend — local, s3, gcs, azure, cache
object_store.status — ok, error
object_store.cache.hit — true/false/unknown

Three Prometheus metrics accompany the spans: scouter_trace_object_store_requests_total, scouter_trace_object_store_request_duration_ms, scouter_trace_object_store_bytes_total.
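To make the object_store.path_kind attribute concrete, here is a rough sketch of a path classifier. It is not the classifier in parquet/utils.rs: the rules below are assumptions based on the standard Delta Lake layout (commit files and checkpoints under `_delta_log/`, data files ending in `.parquet`), and the names `PathKind` and `classify_path` are hypothetical.

```rust
/// The four path kinds listed for object_store.path_kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathKind {
    DeltaLog,
    ParquetData,
    Checkpoint,
    Unknown,
}

impl PathKind {
    /// Attribute value as it would appear on the span.
    pub fn as_str(self) -> &'static str {
        match self {
            PathKind::DeltaLog => "delta_log",
            PathKind::ParquetData => "parquet_data",
            PathKind::Checkpoint => "checkpoint",
            PathKind::Unknown => "unknown",
        }
    }
}

/// Classify an object-store path by its location and file name.
pub fn classify_path(path: &str) -> PathKind {
    let in_log = path.split('/').any(|segment| segment == "_delta_log");
    if in_log {
        // Checkpoints live inside the log directory, so check them first.
        let file = path.rsplit('/').next().unwrap_or("");
        if file == "_last_checkpoint" || file.contains(".checkpoint") {
            PathKind::Checkpoint
        } else {
            PathKind::DeltaLog
        }
    } else if path.ends_with(".parquet") {
        PathKind::ParquetData
    } else {
        PathKind::Unknown
    }
}
```

Checking the log directory before the `.parquet` suffix matters because checkpoint files are themselves Parquet and would otherwise be counted as data reads.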

observe_object_meta_stream wraps the lazy list() stream so object metadata is counted as it arrives, not after collection.
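The counting-as-it-arrives behaviour can be illustrated with a std-only analog. The sketch below wraps a plain `Iterator` rather than the async stream returned by `list()`, and `observe_items` is a hypothetical name; it only demonstrates that the counter advances per item pulled, not after the whole listing is collected.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Wrap a lazy iterator so each yielded item bumps a shared counter.
/// Nothing is counted until the consumer actually pulls an item.
pub fn observe_items<I>(inner: I, seen: Arc<AtomicU64>) -> impl Iterator<Item = I::Item>
where
    I: Iterator,
{
    inner.inspect(move |_| {
        seen.fetch_add(1, Ordering::Relaxed);
    })
}
```

A consumer that stops early therefore reports only the entries it actually read, which keeps the metric honest for paginated or truncated listings.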

CachingStore instrumentation (caching_store.rs)

CachingStore previously passed through to the inner store silently. Every method (put_opts, put_multipart_opts, get_opts, get_range, delete_stream, list, list_with_delimiter, copy) now wraps with ObjectStoreRequestTelemetry, including the cache-hit path which records cache.hit = true without an inner span.
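As a rough illustration of the cache-hit path, the sketch below records a hit without touching the inner store and records a miss after delegating. It is a simplified stand-in, not the CachingStore implementation: `CachingReader`, `CacheHit`, and the in-memory `events` log (in place of real spans) are all hypothetical.

```rust
use std::collections::HashMap;

/// Simplified cache.hit attribute value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheHit {
    True,
    False,
}

/// A read-through cache in front of an inner fetch function.
pub struct CachingReader<F: Fn(&str) -> Vec<u8>> {
    cache: HashMap<String, Vec<u8>>,
    fetch_inner: F,
    /// Stand-in for emitted spans: (path, cache.hit) per request.
    pub events: Vec<(String, CacheHit)>,
}

impl<F: Fn(&str) -> Vec<u8>> CachingReader<F> {
    pub fn new(fetch_inner: F) -> Self {
        CachingReader {
            cache: HashMap::new(),
            fetch_inner,
            events: Vec::new(),
        }
    }

    pub fn get(&mut self, path: &str) -> Vec<u8> {
        // Hit: record it and return without calling the inner store.
        if let Some(bytes) = self.cache.get(path) {
            self.events.push((path.to_string(), CacheHit::True));
            return bytes.clone();
        }
        // Miss: delegate, populate the cache, then record the miss.
        let bytes = (self.fetch_inner)(path);
        self.cache.insert(path.to_string(), bytes.clone());
        self.events.push((path.to_string(), CacheHit::False));
        bytes
    }
}
```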

Delta table lifecycle spans (engine.rs)

Five named spans now wrap the Delta table operations that were previously invisible:

| Span | When |
| --- | --- |
| `delta.table.load` | Table probe at startup, existing-table open |
| `delta.snapshot.refresh` | `refresh_table()` background tick |
| `update_incremental` | Delta log catch-up inside refresh |
| `delta.catalog.swap` | Provider swap after write, optimize, vacuum, expire, refresh |
| `delta.optimize` | ZOrder compaction |

Query pipeline spans (queries.rs)

TraceQueryBuilder operations now emit spans that expose where query time goes:

df.table.resolve → df.logical.build → df.physical.plan → df.collect → arrow.convert → trace.tree.build

All carry endpoint and table attributes so you can filter by query type in Jaeger or Grafana.

Phase 0 observability contract (scouter_tracing/src/tracer.rs)

phase0_observability is a new public module that centralizes route constants, span name constants, and attribute key constants. A PHASE0_SPAN_NAMES BTreeSet constant and two contract tests (phase0_span_names_are_complete_and_unique, phase0_route_contract_preserves_in_scope_trace_endpoints) will fail at compile/test time if any name is renamed or dropped without updating the contract.
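A contract of this shape can be sketched as below. The constant and function names are hypothetical, and the list holds only a handful of span names taken from this description; the actual contents of PHASE0_SPAN_NAMES and the bodies of the two contract tests are not reproduced here.

```rust
use std::collections::BTreeSet;

// Span name constants. Call sites reference these instead of string literals,
// so removing one breaks compilation wherever it is used.
pub const SPAN_OBJECT_STORE_REQUEST: &str = "object_store.request";
pub const SPAN_DELTA_TABLE_LOAD: &str = "delta.table.load";
pub const SPAN_DELTA_SNAPSHOT_REFRESH: &str = "delta.snapshot.refresh";
pub const SPAN_DF_COLLECT: &str = "df.collect";
pub const SPAN_TRACE_TREE_BUILD: &str = "trace.tree.build";

/// Every span name that belongs to the contract (illustrative subset).
pub const ALL_SPAN_NAMES: &[&str] = &[
    SPAN_OBJECT_STORE_REQUEST,
    SPAN_DELTA_TABLE_LOAD,
    SPAN_DELTA_SNAPSHOT_REFRESH,
    SPAN_DF_COLLECT,
    SPAN_TRACE_TREE_BUILD,
];

/// The names as an ordered set, which collapses duplicates.
pub fn span_name_set() -> BTreeSet<&'static str> {
    ALL_SPAN_NAMES.iter().copied().collect()
}

/// True when no name appears twice in the list.
pub fn names_are_unique() -> bool {
    span_name_set().len() == ALL_SPAN_NAMES.len()
}
```

A contract test would then compare this set against a hard-coded expected list, so that a rename shows up as a test failure rather than as a silently broken dashboard query.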

HTTP route spans (trace/route.rs)

All five trace handlers (paginated_traces, get_trace_spans, get_trace_spans_by_id, trace_metrics, v1_otel_traces) now declare 17 Phase 0 span fields upfront (trace.query.endpoint, trace.query.kind, trace.query.window_ms, trace.query.result.rows, trace.query.cache.hit, etc.) and call record_trace_query_common + record_trace_query_result to fill them at runtime.

Minor fixes

TransportConfig::is_mock() replaces the SCOUTER_OFFLINE env-var check in py_queue.rs — the offline guard was incorrectly tied to an env var rather than the configured transport type.
AgentEvalProfile::reset_workflow_agents() extracts the workflow reset logic so py_queue.rs can call it without reaching into the workflow field directly.
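The first fix can be pictured with the sketch below, in which the offline decision follows the configured transport instead of an environment variable. The variants shown (`Http`, `Mock`) and the `should_skip_network` helper are hypothetical; only the method name `TransportConfig::is_mock()` comes from this PR.

```rust
/// Hypothetical, reduced transport configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TransportConfig {
    Http { server_uri: String },
    Mock,
}

impl TransportConfig {
    /// True only for the mock transport, regardless of process environment.
    pub fn is_mock(&self) -> bool {
        matches!(self, TransportConfig::Mock)
    }
}

/// Illustrative guard: skip network-bound setup when the transport is a mock.
pub fn should_skip_network(config: &TransportConfig) -> bool {
    config.is_mock()
}
```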

| File | Change |
| --- | --- |
| `benches/tiers.rs` | New tiered harness: `BenchTier`, `tier_guard_for`, JSON artifact writer, `bench_compare` entry point |
| `bench_metrics/*.json` | Four committed Tier 0 baseline artifacts |
| `benches/counting_object_store.rs` | `ObjectStoreCounts` + `CountingObjectStore` for bench artifact output |
| `benches/dataset_benchmark.rs` | Tier guards + `bench_t0_bifrost_smoke` |
| `benches/hot_path_bench.rs` | Tier guards + `benchmark_t0_cold_query_smoke`, `seed_small_fixture` |
| `benches/trace_service_benchmark.rs` | Tier guards on all four existing groups |
| `benches/planner_bench.rs`, `session_config_bench.rs`, `stress_test.rs` | Tier guards |
| `src/parquet/utils.rs` | `ObjectStoreRequestTelemetry`, `ObservingObjectStore`, `observe_object_meta_stream`, path classifier, Prometheus counters |
| `src/caching_store.rs` | Full `ObjectStore` method instrumentation via `ObjectStoreRequestTelemetry` |
| `src/parquet/tracing/engine.rs` | Delta lifecycle spans: load, refresh, swap, optimize |
| `src/parquet/tracing/queries.rs` | Query pipeline spans: table resolve → collect → arrow convert → tree build |
| `src/parquet/tracing/summary.rs` | Summary path span coverage |
| `crates/scouter_tracing/src/tracer.rs` | `phase0_observability` contract module + contract tests |
| `crates/scouter_server/src/api/routes/trace/route.rs` | Phase 0 span fields on all five handlers, `record_trace_query_common/result` |
| `crates/scouter_events/src/queue/py_queue.rs` | `is_mock()` replaces `SCOUTER_OFFLINE` check |
| `crates/scouter_events/src/queue/types.rs` | `TransportConfig::is_mock()` |
| `crates/scouter_types/src/agent/profile.rs` | `AgentEvalProfile::reset_workflow_agents()` |
| `makefile` | `bench.core`, `bench.extended`, `bench.certification` targets |

Is this a Breaking Change?

No. All new observability is purely additive — new spans, new Prometheus counters, new span attributes. Existing HTTP response shapes, public Rust API signatures, Python bindings, database schema, and config keys are unchanged.

codecov-commenter · 2026-05-14T02:36:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.57%. Comparing base (597d5d7) to head (420df65).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #300   +/-   ##
=======================================
  Coverage   76.57%   76.57%           
=======================================
  Files          26       26           
  Lines         918      918           
=======================================
  Hits          703      703           
  Misses        215      215



thorrester added 5 commits May 13, 2026 17:22

Add phase 0 observability contract (#297)

7b19ab3

Add phase0 trace query spans (#299)

94fa74c

Add object store observability wrapper (#298)

4116932

Add tiered OLAP benchmark harness

553a607

updating bench

420df65

thorrester marked this pull request as ready for review May 14, 2026 10:39

thorrester merged commit a375b30 into main May 14, 2026
20 of 21 checks passed

thorrester added a commit that referenced this pull request May 14, 2026

Revert "Phase0 6/bench harness tiers (#300)"

f294366

This reverts commit a375b30.

thorrester added a commit that referenced this pull request May 14, 2026

Revert "Phase0 6/bench harness tiers (#300)"

c908915

This reverts commit a375b30. (cherry picked from commit f294366)

thorrester merged 5 commits into main from phase0_6/bench-harness-tiers
