v0.25.0

@thorrester thorrester released this 26 Mar 10:34
· 248 commits to main since this release
3c509f3

v0.25.0 Release Summary

What changed

This release fixes a race condition in the trace storage layer. The old two-step of deregister_table followed by register_table left a window in which concurrent queries could fail with "table not found." All TableProvider updates are now atomic swaps through a DashMap-backed custom catalog (TraceCatalogProvider). The summary engine also gets a refresh ticker, so read pods pick up commits from the write pod without a restart.


Breaking changes

None for deployed services: no schema changes, no migration required. There is one Rust API change: the ctx field on TraceSpanDBEngine is now private; use the new ctx() method instead. This only matters if you construct or test the engine directly, outside the service layer.


Changes

Trace storage: atomic TableProvider swaps

The built-in DataFusion catalog (datafusion.public) was replaced with TraceCatalogProvider, backed by a DashMap. All engines (TraceSpanDBEngine, TraceSummaryDBEngine, bifrost) now call catalog.swap(table_name, provider) instead of the deregister/register two-step.

Before:

self.ctx.deregister_table(TRACE_SPAN_TABLE_NAME)?;
self.ctx.register_table(TRACE_SPAN_TABLE_NAME, updated_table.table_provider().await?)?;

After:

let new_provider = updated_table.table_provider().await?;
self.catalog.swap(TRACE_SPAN_TABLE_NAME, new_provider);

DashMap::insert() is atomic with respect to readers of that key: it replaces the entry while holding the shard's write lock. Concurrent readers see either the old provider or the new one, never a gap between them.

TraceSpanDBEngine and TraceSummaryDBEngine share the same TraceCatalogProvider via Arc. The span engine creates it; the summary service gets it through TraceSpanService::catalog. JOIN queries between trace_spans and trace_summaries work because both tables are in the same catalog.
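The swap-and-share pattern can be sketched in a few lines. This is a minimal illustration, not the TraceCatalogProvider implementation: SketchCatalog and Provider are hypothetical names, and a std RwLock<HashMap> stands in for the DashMap so the sketch has no external dependencies.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a DataFusion TableProvider; only the version matters here.
#[derive(Debug, PartialEq)]
pub struct Provider {
    pub version: u64,
}

/// Hypothetical simplified catalog. The release uses a DashMap; a std
/// RwLock<HashMap> is substituted here (an assumption made for portability).
#[derive(Default)]
pub struct SketchCatalog {
    tables: RwLock<HashMap<String, Arc<Provider>>>,
}

impl SketchCatalog {
    /// Replace the provider for `name` in a single step and return the
    /// previous one, if any. There is no moment where the entry is missing.
    pub fn swap(&self, name: &str, provider: Arc<Provider>) -> Option<Arc<Provider>> {
        self.tables
            .write()
            .unwrap()
            .insert(name.to_string(), provider)
    }

    /// Look up the current provider for `name`.
    pub fn table(&self, name: &str) -> Option<Arc<Provider>> {
        self.tables.read().unwrap().get(name).cloned()
    }
}
```

Two engines holding clones of the same Arc<SketchCatalog> see each other's swaps immediately, which is the property the shared catalog relies on for JOINs across both tables.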

Trace storage: summary engine refresh ticker

TraceSummaryDBEngine now has a background refresh loop (same as the span engine) that calls update_incremental() on the Delta table and swaps the TableProvider when a new version is found.
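As a rough sketch of what one pass of such a loop decides, the snippet below swaps only when the observed table version is newer than the one currently served. Refresher and tick are hypothetical names, and the version numbers stand in for what update_incremental() would report; the real engine's logic may differ.

```rust
/// Hypothetical per-engine refresh state: the version currently served
/// and a counter of how many swaps have been performed.
#[derive(Debug, Default)]
pub struct Refresher {
    pub current_version: u64,
    pub swaps: u32,
}

impl Refresher {
    /// One tick of the loop. `latest_version` is the table version seen
    /// after an incremental update. Returns true if a swap was performed.
    pub fn tick(&mut self, latest_version: u64) -> bool {
        if latest_version > self.current_version {
            // A newer commit exists: this is where the new provider
            // would be swapped into the catalog.
            self.current_version = latest_version;
            self.swaps += 1;
            true
        } else {
            // Nothing new: keep serving the existing provider.
            false
        }
    }
}
```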

The refresh interval is set by SCOUTER_TRACE_REFRESH_INTERVAL_SECS, which is already in the server config. Values below 1 second are clamped up, because tokio::time::interval panics when given Duration::ZERO.
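The clamping can be sketched as follows. The helper name refresh_interval, the fallback default, and the choice to clamp to exactly 1 second are assumptions for illustration; the server's actual parsing code is not shown in these notes.

```rust
use std::time::Duration;

/// Hypothetical helper: parse the raw setting, fall back to `default_secs`
/// when it is missing or not a number, and never return less than 1 second
/// so the result is always safe to hand to an interval timer.
pub fn refresh_interval(raw: Option<&str>, default_secs: u64) -> Duration {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default_secs);
    // Clamp up: 0 would otherwise produce Duration::ZERO.
    Duration::from_secs(secs.max(1))
}
```

A caller could pass std::env::var("SCOUTER_TRACE_REFRESH_INTERVAL_SECS").ok().as_deref() as the first argument.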

test_distributed_refresh covers the two-pod case: writer commits summaries, reader with a 1s ticker picks them up in the next query.

DataFusion session construction: get_session_with_catalog

ObjectStore has a new get_session_with_catalog(catalog_name, schema_name) method. It sets the named catalog as the SQL default, so unqualified table names and ctx.table(name) calls resolve through it instead of datafusion.public.

get_session() is unchanged. build_session_config() is now a private helper shared by both paths.

Bifrost engine: torn-write fix on refresh

The bifrost refresh path was calling table_provider() twice: once to update the write context, once for the catalog swap. If the second call failed, the write context would be updated but the catalog would not. Now there's one call, the result is shared, and the write context is only deregistered if the fetch succeeds.
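The shape of the fix, fetch once and only then touch state, can be sketched like this. State and refresh are hypothetical names, and plain version numbers stand in for the TableProvider; this is an illustration of the ordering, not the bifrost code.

```rust
/// Hypothetical view of the two places a provider is installed.
#[derive(Debug, Default, PartialEq)]
pub struct State {
    pub write_ctx_version: Option<u64>,
    pub catalog_version: Option<u64>,
}

/// Fetch the new provider exactly once. If the fetch fails, return early
/// with both the write context and the catalog untouched. If it succeeds,
/// install the same result in both places.
pub fn refresh<F>(state: &mut State, fetch: F) -> Result<(), String>
where
    F: FnOnce() -> Result<u64, String>,
{
    // `?` returns before any state is modified.
    let provider = fetch()?;
    state.write_ctx_version = Some(provider);
    state.catalog_version = Some(provider);
    Ok(())
}
```

Because there is only one fallible call and it happens first, the two views cannot diverge on a failed refresh.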


Upgrading from v0.24.0

Nothing to do. The catalog wires up at server startup. SCOUTER_TRACE_REFRESH_INTERVAL_SECS already controls both the span and summary refresh rates.