Phase0 6/bench harness tiers #300
Merged
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #300 +/- ##
=======================================
Coverage 76.57% 76.57%
=======================================
Files 26 26
Lines 918 918
=======================================
Hits 703 703
Misses 215 215
thorrester added a commit that referenced this pull request on May 14, 2026: This reverts commit a375b30.
thorrester added a commit that referenced this pull request on May 14, 2026.
Pull Request
Short Summary
Adds a tiered OLAP benchmark harness with committed Tier 0 baseline artifacts and wires Phase 0 observability instrumentation through the full trace query stack — object store → Delta table lifecycle → DataFusion query pipeline → HTTP route handlers. The harness is the mechanism that makes the instrumentation measurable at PR time.
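The tier gating this PR describes (a `BenchTier` enum selected through the `SCOUTER_BENCH_TIER` environment variable) can be pictured with a minimal sketch. This is not the code in `benches/tiers.rs`: the accepted spellings in `parse`, the Tier 0 default, and the exact-match rule are assumptions made for illustration, and `should_run_group` is a hypothetical stand-in for `tier_guard_for`.

```rust
use std::env;

/// Benchmark tiers as named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BenchTier {
    Tier0,
    Tier1,
    Tier2,
}

impl BenchTier {
    /// Parse a tier from text. The accepted spellings are an assumption.
    pub fn parse(raw: &str) -> Option<Self> {
        match raw.trim().to_ascii_lowercase().as_str() {
            "0" | "t0" | "tier0" => Some(Self::Tier0),
            "1" | "t1" | "tier1" => Some(Self::Tier1),
            "2" | "t2" | "tier2" => Some(Self::Tier2),
            _ => None,
        }
    }

    /// Read the selected tier from SCOUTER_BENCH_TIER.
    /// Falling back to Tier0 when the variable is unset or unrecognized
    /// is an assumption, chosen so the cheapest suite is the default.
    pub fn from_env() -> Self {
        env::var("SCOUTER_BENCH_TIER")
            .ok()
            .and_then(|raw| Self::parse(&raw))
            .unwrap_or(Self::Tier0)
    }
}

/// Hypothetical guard: a group runs only when its tier matches the
/// selected tier exactly, so a Tier 0 run skips Tier 1 and Tier 2 groups.
pub fn should_run_group(selected: BenchTier, group_tier: BenchTier) -> bool {
    selected == group_tier
}
```

A group function would call something like `should_run_group(BenchTier::from_env(), BenchTier::Tier1)` first and return early when it yields `false`, leaving the measurement itself to Criterion.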
Context
Tiered benchmark harness (`benches/tiers.rs`, 1077 lines)

New `BenchTier` enum (Tier0/1/2) gates every benchmark group via `tier_guard_for(bench, group)`, which reads `SCOUTER_BENCH_TIER` from env. Three make targets drive the tiers:

- `make bench.core`: Tier 0, PR-blocking smoke suite (~50–120s per group)
- `make bench.extended`: Tier 1, CI-gated extended load tests
- `make bench.certification`: Tier 2, full certification against object storage

Each benchmark binary that was previously monolithic now calls `tier_guard_for` at the top of every group function, so a Tier 0 run only executes T0 groups and exits fast. Criterion still does the measuring; the tier system just controls which groups run and writes JSON artifacts to `bench_metrics/`.

Committed Tier 0 baseline artifacts (`bench_metrics/*.json`)

Four JSON files are checked in as the initial Tier 0 baseline:

- `t0_bifrost_smoke`: `dataset_engine_manager.query`
- `t0_cold_query_smoke`: `trace_query_service.query_spans`
- `t0_hot_path_cold_query_smoke`: `trace_query_service.query_spans`
- `t0_refresh_origin_sentinel`

`make bench.core` also runs `bench_compare`, a binary that loads these committed artifacts and fails if the current run regresses on `bench.query.end_to_end` or violates object-store operation counts. The sentinel exists specifically to assert that the `refresh-on-request` path produces zero `LIST` calls during normal query execution.

Object store observability (`parquet/utils.rs`, ~480 new lines)

`ObjectStoreRequestTelemetry` is the production instrumentation primitive. Every object store call now gets an `object_store.request` span with:

- `object_store.operation`: `list`, `get`, `get_range`, `head`, `put`, `delete`, `copy`
- `object_store.path_kind`: `delta_log`, `parquet_data`, `checkpoint`, `unknown`
- `object_store.backend`: `local`, `s3`, `gcs`, `azure`, `cache`
- `object_store.status`: `ok`, `error`
- `object_store.cache.hit`: `true` / `false` / `unknown`

Three Prometheus counters accompany the spans: `scouter_trace_object_store_requests_total`, `scouter_trace_object_store_request_duration_ms`, `scouter_trace_object_store_bytes_total`.

`observe_object_meta_stream` wraps the lazy `list()` stream so object metadata is counted as it arrives, not after collection.

`CachingStore` instrumentation (`caching_store.rs`)

`CachingStore` previously passed through to the inner store silently. Every method (`put_opts`, `put_multipart_opts`, `get_opts`, `get_range`, `delete_stream`, `list`, `list_with_delimiter`, `copy`) now wraps with `ObjectStoreRequestTelemetry`, including the cache-hit path, which records `cache.hit = true` without an inner span.

Delta table lifecycle spans (`engine.rs`)

Five named spans now wrap the Delta table operations that were previously invisible:

- `delta.table.load`
- `delta.snapshot.refresh` (`refresh_table()` background tick)
- `update_incremental`
- `delta.catalog.swap`
- `delta.optimize`

Query pipeline spans (`queries.rs`)

`TraceQueryBuilder` operations now emit spans that expose where query time goes:

`df.table.resolve` → `df.logical.build` → `df.physical.plan` → `df.collect` → `arrow.convert` → `trace.tree.build`

All carry `endpoint` and `table` attributes so you can filter by query type in Jaeger or Grafana.

Phase 0 observability contract (`scouter_tracing/src/tracer.rs`)

`phase0_observability` is a new public module that centralizes route constants, span name constants, and attribute key constants. A `PHASE0_SPAN_NAMES` `BTreeSet` constant and two contract tests (`phase0_span_names_are_complete_and_unique`, `phase0_route_contract_preserves_in_scope_trace_endpoints`) will fail at compile/test time if any name is renamed or dropped without updating the contract.

HTTP route spans (`trace/route.rs`)

All five trace handlers (`paginated_traces`, `get_trace_spans`, `get_trace_spans_by_id`, `trace_metrics`, `v1_otel_traces`) now declare 17 Phase 0 span fields upfront (`trace.query.endpoint`, `trace.query.kind`, `trace.query.window_ms`, `trace.query.result.rows`, `trace.query.cache.hit`, etc.) and call `record_trace_query_common` + `record_trace_query_result` to fill them at runtime.

Minor fixes

- `TransportConfig::is_mock()` replaces the `SCOUTER_OFFLINE` env-var check in `py_queue.rs`; the offline guard was incorrectly tied to an env var rather than the configured transport type.
- `AgentEvalProfile::reset_workflow_agents()` extracts the workflow reset logic so `py_queue.rs` can call it without reaching into the workflow field directly.

Files changed

- `benches/tiers.rs`: `BenchTier`, `tier_guard_for`, JSON artifact writer, `bench_compare` entry point
- `bench_metrics/*.json`
- `benches/counting_object_store.rs`: `ObjectStoreCounts` + `CountingObjectStore` for bench artifact output
- `benches/dataset_benchmark.rs`: `bench_t0_bifrost_smoke`
- `benches/hot_path_bench.rs`: `benchmark_t0_cold_query_smoke`, `seed_small_fixture`
- `benches/trace_service_benchmark.rs`
- `benches/planner_bench.rs`, `session_config_bench.rs`, `stress_test.rs`
- `src/parquet/utils.rs`: `ObjectStoreRequestTelemetry`, `ObservingObjectStore`, `observe_object_meta_stream`, path classifier, Prometheus counters
- `src/caching_store.rs`: `ObjectStore` method instrumentation via `ObjectStoreRequestTelemetry`
- `src/parquet/tracing/engine.rs`
- `src/parquet/tracing/queries.rs`
- `src/parquet/tracing/summary.rs`
- `crates/scouter_tracing/src/tracer.rs`: `phase0_observability` contract module + contract tests
- `crates/scouter_server/src/api/routes/trace/route.rs`: `record_trace_query_common/result`
- `crates/scouter_events/src/queue/py_queue.rs`: `is_mock()` replaces `SCOUTER_OFFLINE` check
- `crates/scouter_events/src/queue/types.rs`: `TransportConfig::is_mock()`
- `crates/scouter_types/src/agent/profile.rs`: `AgentEvalProfile::reset_workflow_agents()`
- `makefile`: `bench.core`, `bench.extended`, `bench.certification` targets

Is this a Breaking Change?
No. All new observability is purely additive — new spans, new Prometheus counters, new span attributes. Existing HTTP response shapes, public Rust API signatures, Python bindings, database schema, and config keys are unchanged.
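As one illustration of the additive span attributes, the `object_store.path_kind` values (`delta_log`, `parquet_data`, `checkpoint`, `unknown`) could be derived from an object path roughly as follows. This is a sketch, not the path classifier in `parquet/utils.rs`: `PathKind` and `classify_path` are hypothetical names, and the rules assume the usual Delta Lake layout, where commit files and `*.checkpoint.parquet` files live under `_delta_log/`.

```rust
/// The four path kinds listed for the `object_store.path_kind` attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathKind {
    DeltaLog,
    ParquetData,
    Checkpoint,
    Unknown,
}

impl PathKind {
    /// Attribute value as it would appear on a span.
    pub fn as_str(self) -> &'static str {
        match self {
            PathKind::DeltaLog => "delta_log",
            PathKind::ParquetData => "parquet_data",
            PathKind::Checkpoint => "checkpoint",
            PathKind::Unknown => "unknown",
        }
    }
}

/// Hypothetical classifier based only on the path text.
/// Assumption: anything under a `_delta_log` directory is log metadata,
/// and a file name containing ".checkpoint" there is a checkpoint.
pub fn classify_path(path: &str) -> PathKind {
    let in_delta_log = path.split('/').any(|segment| segment == "_delta_log");
    // rsplit always yields at least one item, so this is the last segment.
    let file_name = path.rsplit('/').next().unwrap_or(path);

    if in_delta_log {
        if file_name.contains(".checkpoint") {
            PathKind::Checkpoint
        } else {
            PathKind::DeltaLog
        }
    } else if file_name.ends_with(".parquet") {
        PathKind::ParquetData
    } else {
        PathKind::Unknown
    }
}
```

Checking the `_delta_log` directory before the `.parquet` suffix matters here, because checkpoint files also end in `.parquet` and would otherwise be counted as data files.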