v0.21.0

@thorrester thorrester released this 12 Mar 17:00
· 396 commits to main since this release

v0.21.0 Release Summary

What Changed

v0.21.0 adds a read-side caching layer over cloud object stores and tunes the DataFusion SessionContext for higher-concurrency GCS/S3 workloads. A bug where the trace summary table skipped vacuum after compaction is fixed. An internal record type rename is propagated across all crates.


Breaking Changes

None. No schema changes, no migration required.


Changes

Object store caching layer (CachingStore)

A new CachingStore<T: ObjectStore> wrapper in scouter_dataframe caches head() responses and small get_range() reads (≤2 MB) from cloud object stores.

After Delta Lake Z-ORDER compaction, Parquet files are immutable: the same path always returns the same bytes. DataFusion issues repeated HEAD and footer range reads on every query, and without caching each read is a separate cloud round-trip (~30–60 ms on GCS). CachingStore avoids these repeated round-trips by serving the reads from an in-process mini_moka cache.

Cache configuration:

| Setting | Default | Env var |
| --- | --- | --- |
| Max cache size | 64 MB | SCOUTER_OBJECT_CACHE_MB |
| TTL | 1 hour | |
| Max cacheable range read | 2 MB | |

All mutating and streaming operations (put, delete, list, get for large ranges) pass through to the inner store uncached.
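The read-through behavior described above can be illustrated with a small standard-library sketch. This is not the CachingStore implementation: `FakeRemote` and `RangeCache` are hypothetical names, the 2 MB limit is assumed to mean 2 MiB, and size-based eviction (which the real mini_moka cache would handle) is left out.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::time::{Duration, Instant};

/// Largest range read that is cached (2 MB per the notes; MiB assumed here).
pub const MAX_CACHEABLE_RANGE: u64 = 2 * 1024 * 1024;

/// Hypothetical stand-in for a remote store holding one object; counts round-trips.
pub struct FakeRemote {
    pub data: Vec<u8>,
    pub round_trips: usize,
}

impl FakeRemote {
    pub fn get_range(&mut self, range: Range<u64>) -> Vec<u8> {
        self.round_trips += 1;
        self.data[range.start as usize..range.end as usize].to_vec()
    }
}

/// Toy read-through cache keyed by (path, start, end) with a fixed TTL.
/// No size-based eviction: that part is omitted from this sketch.
pub struct RangeCache {
    ttl: Duration,
    entries: HashMap<(String, u64, u64), (Instant, Vec<u8>)>,
}

impl RangeCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    pub fn get_range(&mut self, remote: &mut FakeRemote, path: &str, range: Range<u64>) -> Vec<u8> {
        // Large reads pass straight through to the remote, uncached.
        if range.end.saturating_sub(range.start) > MAX_CACHEABLE_RANGE {
            return remote.get_range(range);
        }
        let key = (path.to_string(), range.start, range.end);
        // Serve a fresh cached copy without touching the remote.
        if let Some((stored_at, bytes)) = self.entries.get(&key) {
            if stored_at.elapsed() < self.ttl {
                return bytes.clone();
            }
        }
        // Miss or expired entry: fetch once and remember the bytes.
        let bytes = remote.get_range(range);
        self.entries.insert(key, (Instant::now(), bytes.clone()));
        bytes
    }
}
```

Keying on the exact path and byte range is only safe because the files are immutable after compaction; a store with mutable objects would need invalidation on write.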

DataFusion SessionContext tuning

The shared SessionContext used for trace queries now includes explicit read-path and write-path settings:

| Setting | Old | New | Why |
| --- | --- | --- | --- |
| metadata_size_hint | 512 KB | 1 MB | Captures bloom filter + footer + column indexes in one GCS round-trip instead of the default multi-step chain |
| bloom_filter_on_read | default | true | Activates bloom filters on trace_id and entity_id to skip non-matching row groups before decoding |
| schema_force_view_types | default | true | Zero-copy Utf8View/BinaryView; prevents DataFusion from downgrading these on read-back from Parquet |
| meta_fetch_concurrency | 32 | 64 | Parallel HEAD stats during Delta log replay; matches pool_max_idle_per_host |
| maximum_parallel_row_group_writers | default | 4 | Concurrent row group encoding during compaction and flush |
| maximum_buffered_record_batches_per_stream | default | 8 | Smooths bursty reads from GCS |
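As a rough illustration, the "New" column can be written out as key/value pairs. The dotted key names below are an assumption based on DataFusion's `datafusion.execution.*` configuration namespace, not something these notes state, and 1 MB is assumed to mean 1 MiB. The function names are hypothetical.

```rust
/// (key, value) pairs for the new settings.
/// Key names are assumed from DataFusion's config namespace; values come from the notes.
pub fn trace_session_settings() -> Vec<(&'static str, String)> {
    vec![
        // 1 MB metadata hint, expressed in bytes (MiB assumed).
        ("datafusion.execution.parquet.metadata_size_hint", 1_048_576u64.to_string()),
        ("datafusion.execution.parquet.bloom_filter_on_read", true.to_string()),
        ("datafusion.execution.parquet.schema_force_view_types", true.to_string()),
        ("datafusion.execution.meta_fetch_concurrency", 64u64.to_string()),
        ("datafusion.execution.parquet.maximum_parallel_row_group_writers", 4u64.to_string()),
        ("datafusion.execution.parquet.maximum_buffered_record_batches_per_stream", 8u64.to_string()),
    ]
}

/// Look up one setting by its full key.
pub fn setting(key: &str) -> Option<String> {
    trace_session_settings()
        .into_iter()
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}
```

Pairs in this string form are the shape that string-keyed configuration setters accept, so a list like this could be applied in a loop when building the session; how Scouter actually sets them is not shown in these notes.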

Connection pool tuning

Cloud object store HTTP client settings updated:

| Setting | Old | New |
| --- | --- | --- |
| pool_max_idle_per_host | 16 | 64 |
| pool_idle_timeout | 90s | 120s |

Request timeout: 30s
Connect timeout: 5s

Bug fix: vacuum missing after summary optimize

TraceSummaryDBEngine::run_maintenance() called optimize_table() but not vacuum_table() afterward. Compaction tombstones old Parquet files; without an immediate vacuum those files remain on storage until the next scheduled vacuum cycle.

Fixed to vacuum immediately after a successful optimize:

Ok(()) => {
    if let Err(e) = self.vacuum_table(0).await {
        error!("Post-optimize vacuum failed: {}", e);
    }
    // release task ...
}

This matches the existing behavior in TraceSpanDBEngine.
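The control flow of the fix can be sketched synchronously with a mock engine. The trait and `RecordingEngine` are hypothetical and exist only for illustration; the real methods are async, the meaning of the `0` argument to vacuum_table is not stated in these notes, and propagating an optimize failure to the caller is an assumption of this sketch.

```rust
/// Hypothetical minimal interface; the real engine methods are async.
pub trait TableMaintenance {
    fn optimize_table(&mut self) -> Result<(), String>;
    /// The argument's meaning (likely a retention setting) is assumed, not documented here.
    fn vacuum_table(&mut self, retention: u64) -> Result<(), String>;
}

/// Vacuum immediately after a successful optimize.
/// A vacuum failure is logged but does not fail the maintenance run.
pub fn run_maintenance<T: TableMaintenance>(engine: &mut T) -> Result<(), String> {
    // Assumption: an optimize failure is returned and vacuum is skipped.
    engine.optimize_table()?;
    if let Err(e) = engine.vacuum_table(0) {
        eprintln!("Post-optimize vacuum failed: {e}");
    }
    Ok(())
}

/// Mock engine that records which operations ran, in order.
pub struct RecordingEngine {
    pub calls: Vec<&'static str>,
    pub optimize_ok: bool,
    pub vacuum_ok: bool,
}

impl TableMaintenance for RecordingEngine {
    fn optimize_table(&mut self) -> Result<(), String> {
        self.calls.push("optimize");
        if self.optimize_ok { Ok(()) } else { Err("optimize failed".into()) }
    }

    fn vacuum_table(&mut self, _retention: u64) -> Result<(), String> {
        self.calls.push("vacuum");
        if self.vacuum_ok { Ok(()) } else { Err("vacuum failed".into()) }
    }
}
```

Logging rather than propagating the vacuum error mirrors the snippet above: tombstoned files left behind by a failed vacuum can still be removed by the next scheduled cycle.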

Internal record type rename (PR #221)

An internal record type is renamed across scouter_client, scouter_drift, scouter_evaluate, scouter_events, scouter_server, and py-scouter. There is no public API change for Python users; the stub files are updated.


Upgrading from v0.20.0

No action required. All changes are additive or internal.

SCOUTER_OBJECT_CACHE_MB is optional. The default (64 MB) is appropriate for most deployments. Increase it if you have many concurrent readers querying large numbers of Parquet files.
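For sizing, a minimal sketch of reading the variable with a fallback to the default, plus a back-of-envelope capacity estimate. The function names are hypothetical, and falling back to 64 MB on an unparsable value is an assumption; the actual parsing behavior is not described in these notes.

```rust
use std::env;

/// Default cache size in MB, per the release notes.
pub const DEFAULT_OBJECT_CACHE_MB: u64 = 64;

/// Parse an optional raw value, using the default when it is absent or invalid.
/// (Falling back on invalid input is an assumption of this sketch.)
pub fn parse_cache_mb(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_OBJECT_CACHE_MB)
}

/// Read SCOUTER_OBJECT_CACHE_MB from the process environment.
pub fn object_cache_mb_from_env() -> u64 {
    parse_cache_mb(env::var("SCOUTER_OBJECT_CACHE_MB").ok().as_deref())
}

/// Upper bound on how many maximum-size (2 MB) range reads fit in the cache,
/// ignoring head() entries and per-entry overhead.
pub fn max_full_size_entries(cache_mb: u64) -> u64 {
    cache_mb / 2
}
```

With the default, at most 32 maximum-size range reads fit at once. Footer reads are often much smaller than 2 MB, so the practical entry count is usually higher, but this gives a rough sense of when many readers over many files might outgrow 64 MB.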