v0.20.0 Release Summary

What Changed

v0.20.0 removes PostgreSQL from the trace/span pipeline entirely. All trace ingestion, storage, querying, and maintenance now run through DataFusion and Delta Lake only. This release also adds a distributed coordination layer for multi-pod compaction, a pre-aggregated trace summary table for fast listing, cloud storage fixes for GCS/S3/Azure, and three new tuning env vars.

Breaking Changes

The PostgreSQL trace schema is no longer used. Any tooling, migrations, or queries that read trace data from PostgreSQL will stop working after upgrading. The scouter_sql aggregator is now a thin forwarding layer; all trace reads and writes go through Delta Lake via scouter_dataframe.

Changes

Traces now fully on Delta Lake + DataFusion

PostgreSQL has been removed from the trace read/write path. The architecture is now:

gRPC / HTTP ingest → in-memory buffer (actor) → Delta Lake (span table + summary table)
                                                         ↑
                                              DataFusion query engine

The scouter_sql aggregator retains its interface for compatibility but no longer writes span data to PostgreSQL.

Trace summary table

A new trace_summaries Delta Lake table stores one row per trace with pre-computed fields:

Column	Type	Description
`trace_id`	`FixedSizeBinary(16)`	Trace identifier
`service_name`	`Dictionary(Int32, Utf8)`	Service that produced the root span
`root_operation`	`Utf8`	Name of the root span
`start_time` / `end_time`	`Timestamp(µs, UTC)`	Trace wall-clock bounds
`duration_ms`	`Int64`	End-to-end latency in milliseconds
`span_count`	`Int64`	Total spans in the trace
`error_count`	`Int64`	Spans with error status
`search_blob`	`Utf8`	Concatenated attribute text for full-text search
`entity_ids`	`List<Utf8>`	Application entity IDs attached to the trace
`queue_ids`	`List<Utf8>`	Queue message IDs attached to the trace

This table is partitioned by partition_date (Date32). Listing traces and applying filters no longer require scanning the full span table.
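
To make "one row per trace" concrete, here is a minimal sketch of folding a trace's spans into a few of the pre-computed fields above. The Span and TraceSummary structs and the summarize function are hypothetical, use plain Rust types in place of Arrow types, and assume every span in the slice already shares one trace_id. This is an illustration, not Scouter's implementation.

```rust
/// Hypothetical, simplified span: only the fields needed to build a summary.
#[derive(Debug, Clone)]
pub struct Span {
    pub trace_id: [u8; 16],
    pub start_time_us: i64, // microseconds since the Unix epoch (UTC)
    pub end_time_us: i64,
    pub is_error: bool,
}

/// Hypothetical summary row mirroring a subset of the trace_summaries columns.
#[derive(Debug, PartialEq)]
pub struct TraceSummary {
    pub trace_id: [u8; 16],
    pub start_time_us: i64,
    pub end_time_us: i64,
    pub duration_ms: i64,
    pub span_count: i64,
    pub error_count: i64,
}

/// Fold all spans of one trace into a single summary row.
/// Returns `None` when the slice is empty.
pub fn summarize(spans: &[Span]) -> Option<TraceSummary> {
    let first = spans.first()?;
    // Trace bounds are the earliest start and the latest end across spans.
    let start = spans.iter().map(|s| s.start_time_us).min()?;
    let end = spans.iter().map(|s| s.end_time_us).max()?;
    Some(TraceSummary {
        trace_id: first.trace_id,
        start_time_us: start,
        end_time_us: end,
        duration_ms: (end - start) / 1_000,
        span_count: spans.len() as i64,
        error_count: spans.iter().filter(|s| s.is_error).count() as i64,
    })
}
```

Because these fields are computed once at write time, a listing query can filter on duration_ms or error_count without touching individual spans.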

Distributed compaction control table

A new _scouter_control Delta Lake table coordinates compaction, retention, and vacuum tasks across pods. Each task (summary_optimize, etc.) has a single row with a status (idle or processing), a pod_id, and a next_run_at timestamp. Locks older than 30 minutes are automatically reclaimed.

This prevents multiple pods from running conflicting Z-ORDER optimize operations simultaneously against shared object storage.
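
The claim decision described above can be sketched as follows. This is a minimal in-memory illustration only: ControlRow, Status, try_claim, and the locked_at field are hypothetical names (the notes do not say how lock age is recorded), and the sketch does not show how the row update is made atomic against shared object storage.

```rust
use std::time::{Duration, SystemTime};

/// Locks older than this are treated as abandoned (30 minutes, per the notes above).
const STALE_LOCK: Duration = Duration::from_secs(30 * 60);

#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Idle,
    Processing,
}

/// Hypothetical in-memory view of one control-table row.
#[derive(Debug, Clone)]
pub struct ControlRow {
    pub task: String,
    pub status: Status,
    pub pod_id: String,
    pub locked_at: SystemTime, // assumed field: when the current lock was taken
    pub next_run_at: SystemTime,
}

/// Try to claim a task for `pod_id` at time `now`.
/// An idle task is claimable once it is due; a processing task is claimable
/// only if its lock has outlived STALE_LOCK. Returns true if the claim succeeded.
pub fn try_claim(row: &mut ControlRow, pod_id: &str, now: SystemTime) -> bool {
    let claimable = match row.status {
        Status::Idle => now >= row.next_run_at,
        Status::Processing => now
            .duration_since(row.locked_at)
            .map(|age| age > STALE_LOCK)
            .unwrap_or(false), // lock timestamp in the future: leave it alone
    };
    if claimable {
        row.status = Status::Processing;
        row.pod_id = pod_id.to_string();
        row.locked_at = now;
    }
    claimable
}
```

In this sketch a second pod that asks for the same task one minute later is refused, while a pod that asks 31 minutes later takes over the stale lock.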

New attribute search UDF

A custom DataFusion scalar UDF (match_attr_expr) enables full-text attribute search against the search_blob column. This replaces SQL LIKE patterns that required per-attribute column scans.

// DataFusion query predicate
match_attr_expr(col("search_blob"), lit("user_id=abc123"))

New trace query API routes

Two new HTTP endpoints were added to scouter-server:

GET /traces/:trace_id/spans — returns all spans for a specific trace ID
POST /traces/spans/filter — returns spans matching TraceFilters (service name, time range, attribute values, entity ID, etc.)

Typed DataFusion predicates for Parquet pruning

Query helpers ts_lit() and date_lit() emit typed Timestamp(Microsecond, UTC) and Date32 literals. These match column types exactly, enabling Parquet row-group min/max pruning and partition directory skipping without type coercion overhead.
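
As a rough illustration of the values such typed literals carry: Arrow's Date32 is a 32-bit count of days since the Unix epoch, and Timestamp(Microsecond) is a 64-bit count of microseconds. The sketch below derives the partition date for a microsecond timestamp and the range of partitions a time window touches. The function names are hypothetical and this is not the ts_lit() / date_lit() implementation.

```rust
const MICROS_PER_DAY: i64 = 86_400 * 1_000_000;

/// Days since 1970-01-01 for a microsecond UTC timestamp, the representation
/// Arrow's Date32 uses. Euclidean division rounds toward negative infinity,
/// so pre-1970 timestamps land on the correct day.
pub fn date32_from_micros(ts_us: i64) -> i32 {
    ts_us.div_euclid(MICROS_PER_DAY) as i32
}

/// Inclusive range of partition dates touched by the window [start_us, end_us].
/// A query planner only needs to open partition directories in this range.
pub fn partition_range(start_us: i64, end_us: i64) -> std::ops::RangeInclusive<i32> {
    date32_from_micros(start_us)..=date32_from_micros(end_us)
}
```

For example, a window that starts one microsecond before 2024-01-01T00:00:00Z and ends exactly at that instant spans two partition dates, 19722 and 19723.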

Cloud storage fixes

GCS / S3 / Azure: storage_root() now correctly extracts only the bucket name from URIs like gs://my-bucket/path/to/prefix. Previously it returned the full path that remained after stripping the scheme prefix, which caused object store initialization failures.
Azure: Fixed path construction for Delta table locations.
PassthroughLogStoreFactory added for cloud log store registration when using GCS.
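
The bucket-extraction behavior described in the first fix can be sketched like this. The bucket_name function is a hypothetical stand-in written for illustration, not the actual storage_root() code.

```rust
/// Hypothetical helper: return only the bucket (or container) name from a
/// cloud storage URI, e.g. "gs://my-bucket/path/to/prefix" -> "my-bucket".
/// Returns `None` when the URI has no "scheme://" part or the bucket is empty.
pub fn bucket_name(uri: &str) -> Option<&str> {
    // Drop the scheme, then keep everything up to the first path separator.
    let (_scheme, rest) = uri.split_once("://")?;
    let bucket = rest.split('/').next()?;
    if bucket.is_empty() {
        None
    } else {
        Some(bucket)
    }
}
```

With this shape, gs://my-bucket/path/to/prefix yields my-bucket rather than my-bucket/path/to/prefix, and a local path with no scheme yields None.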

Span schema changes

Columns removed from the span table:

root_span_id — derivable from the summary table
depth, span_order, path — unused by query layer

Columns added:

search_blob — concatenated attribute text for UDF-based search
queue_ids — list of queue message IDs

New configuration env vars

Variable	Default	Description
`SCOUTER_TRACE_COMPACTION_INTERVAL_HOURS`	`24`	How often Delta Lake Z-ORDER optimize runs for trace tables
`SCOUTER_TRACE_FLUSH_INTERVAL_SECS`	`5`	How often the in-memory span buffer flushes to Delta Lake
`SCOUTER_TRACE_BUFFER_SIZE`	`10000`	Span buffer capacity before a forced flush

Larger SCOUTER_TRACE_BUFFER_SIZE values reduce the number of small Parquet files written to cloud storage but increase the window of data that could be lost on a crash.
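
A minimal sketch of how these three settings and the size-or-interval flush trade-off fit together. TraceTuning and its methods are hypothetical. Falling back to the default on an unparsable value, and skipping interval flushes when the buffer is empty, are assumptions rather than documented Scouter behavior.

```rust
use std::time::Duration;

/// Hypothetical holder for the three settings; defaults follow the table above.
#[derive(Debug, PartialEq)]
pub struct TraceTuning {
    pub compaction_interval: Duration,
    pub flush_interval: Duration,
    pub buffer_size: usize,
}

/// Parse an optional raw value, using `default` when it is unset or invalid
/// (assumed behavior).
fn parse_or<T: std::str::FromStr>(raw: Option<String>, default: T) -> T {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

impl TraceTuning {
    /// Build settings from any lookup function, so the logic can be exercised
    /// without touching the process environment.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
        let hours: u64 = parse_or(get("SCOUTER_TRACE_COMPACTION_INTERVAL_HOURS"), 24);
        let secs: u64 = parse_or(get("SCOUTER_TRACE_FLUSH_INTERVAL_SECS"), 5);
        let size: usize = parse_or(get("SCOUTER_TRACE_BUFFER_SIZE"), 10_000);
        Self {
            compaction_interval: Duration::from_secs(hours.saturating_mul(3600)),
            flush_interval: Duration::from_secs(secs),
            buffer_size: size,
        }
    }

    /// Read the settings from the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }

    /// Flush when the buffer is full, or when the interval has elapsed and
    /// there is at least one buffered span.
    pub fn should_flush(&self, buffered: usize, since_last_flush: Duration) -> bool {
        buffered >= self.buffer_size
            || (buffered > 0 && since_last_flush >= self.flush_interval)
    }
}
```

Under these assumptions, raising SCOUTER_TRACE_BUFFER_SIZE only reduces file counts while ingest is fast enough to fill the buffer within the flush interval; otherwise the interval decides when files are written.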

Upgrading from v0.19.0

Remove any direct PostgreSQL queries against trace tables. These tables may still exist but are no longer written to.
Set SCOUTER_STORAGE_URI to a writable location (local path, s3://, gs://, or az://). This was required in v0.19.0 for spans and is now required for summaries and the control table as well.
On first startup, the server creates the trace_summaries and _scouter_control Delta tables automatically. No migration script is needed.
If running multiple server replicas, all replicas must share the same SCOUTER_STORAGE_URI. The control table coordinates cross-pod compaction; replicas pointing at different storage paths will not coordinate.
