v0.20.0

@thorrester thorrester released this 11 Mar 19:52
aa7ba7d

v0.20.0 Release Summary

What Changed

v0.20.0 removes PostgreSQL from the trace/span pipeline entirely. All trace ingestion, storage, querying, and maintenance now run through DataFusion and Delta Lake only. This release also adds a distributed coordination layer for multi-pod compaction, a pre-aggregated trace summary table for fast listing, cloud storage fixes for GCS/S3/Azure, and three new tuning environment variables.


Breaking Changes

The PostgreSQL trace schema is no longer used. Any tooling, migrations, or queries that read trace data from PostgreSQL will stop working after upgrading. The scouter_sql aggregator is now a thin forwarding layer; all trace reads and writes go through Delta Lake via scouter_dataframe.


Changes

Traces now fully on Delta Lake + DataFusion

PostgreSQL has been removed from the trace read/write path. The architecture is now:

gRPC / HTTP ingest → in-memory buffer (actor) → Delta Lake (span table + summary table)
                                                         ↑
                                              DataFusion query engine

The scouter_sql aggregator retains its interface for compatibility but no longer writes span data to PostgreSQL.
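The buffer stage in the diagram can be pictured as a batch collector that hands off its contents either when it fills up or when a timer fires. The following is a minimal sketch under that assumption; `SpanBuffer`, `push`, and `tick` are hypothetical names for illustration and are not taken from the Scouter codebase, and the actual actor may behave differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory buffer that batches items before a Delta Lake write.
pub struct SpanBuffer<T> {
    items: Vec<T>,
    capacity: usize,
    flush_interval: Duration,
    last_flush: Instant,
}

impl<T> SpanBuffer<T> {
    pub fn new(capacity: usize, flush_interval: Duration, now: Instant) -> Self {
        Self {
            items: Vec::with_capacity(capacity),
            capacity,
            flush_interval,
            last_flush: now,
        }
    }

    /// Adds an item. Returns a batch to write when the buffer reaches capacity
    /// (the "forced flush" case).
    pub fn push(&mut self, item: T, now: Instant) -> Option<Vec<T>> {
        self.items.push(item);
        if self.items.len() >= self.capacity {
            Some(self.drain(now))
        } else {
            None
        }
    }

    /// Called from a timer. Returns a batch when the flush interval has elapsed
    /// and there is something to write.
    pub fn tick(&mut self, now: Instant) -> Option<Vec<T>> {
        let elapsed = now.duration_since(self.last_flush);
        if !self.items.is_empty() && elapsed >= self.flush_interval {
            Some(self.drain(now))
        } else {
            None
        }
    }

    /// Empties the buffer and records the flush time.
    fn drain(&mut self, now: Instant) -> Vec<T> {
        self.last_flush = now;
        std::mem::take(&mut self.items)
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real actor would more likely read the clock itself and perform the Delta Lake write where this sketch returns the batch.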

Trace summary table

A new trace_summaries Delta Lake table stores one row per trace with pre-computed fields:

Column                  Type                      Description
trace_id                FixedSizeBinary(16)       Trace identifier
service_name            Dictionary(Int32, Utf8)   Service that produced the root span
root_operation          Utf8                      Name of the root span
start_time / end_time   Timestamp(µs, UTC)        Trace wall-clock bounds
duration_ms             Int64                     End-to-end latency in milliseconds
span_count              Int64                     Total spans in the trace
error_count             Int64                     Spans with error status
search_blob             Utf8                      Concatenated attribute text for full-text search
entity_ids              List<Utf8>                Application entity IDs attached to the trace
queue_ids               List<Utf8>                Queue message IDs attached to the trace

This table is partitioned by partition_date (Date32). Listing traces and applying filters no longer require scanning the full span table.
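To illustrate what "pre-computed" means here, the sketch below folds the spans of one trace into a single summary row. The `Span` and `TraceSummary` types, their field names, and the rule that the root span is the one without a parent are assumptions made for this example; only the summary column names come from the table above, and the list-valued and search columns are omitted for brevity.

```rust
/// Hypothetical in-memory span; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct Span {
    pub trace_id: [u8; 16],
    pub parent_span_id: Option<[u8; 8]>,
    pub service_name: String,
    pub name: String,
    pub start_time_us: i64, // microseconds since the Unix epoch, UTC
    pub end_time_us: i64,
    pub is_error: bool,
}

/// Subset of the trace_summaries columns.
#[derive(Debug, PartialEq)]
pub struct TraceSummary {
    pub trace_id: [u8; 16],
    pub service_name: String,
    pub root_operation: String,
    pub start_time_us: i64,
    pub end_time_us: i64,
    pub duration_ms: i64,
    pub span_count: i64,
    pub error_count: i64,
}

/// Folds all spans of one trace into a single summary row.
/// Returns None for an empty slice or when no parentless (root) span exists.
pub fn summarize(spans: &[Span]) -> Option<TraceSummary> {
    // Assumption: the root span is the one without a parent.
    let root = spans.iter().find(|s| s.parent_span_id.is_none())?;
    let start = spans.iter().map(|s| s.start_time_us).min()?;
    let end = spans.iter().map(|s| s.end_time_us).max()?;
    Some(TraceSummary {
        trace_id: root.trace_id,
        service_name: root.service_name.clone(),
        root_operation: root.name.clone(),
        start_time_us: start,
        end_time_us: end,
        duration_ms: (end - start) / 1_000, // microseconds to milliseconds
        span_count: spans.len() as i64,
        error_count: spans.iter().filter(|s| s.is_error).count() as i64,
    })
}
```

With one row per trace, a listing query only has to read rows from this table rather than aggregate over every span at query time.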

Distributed compaction control table

A new _scouter_control Delta Lake table coordinates compaction, retention, and vacuum tasks across pods. Each task (summary_optimize, etc.) has a single row with a status (idle or processing), a pod_id, and a next_run_at timestamp. Locks older than 30 minutes are automatically reclaimed.

This prevents multiple pods from running conflicting Z-ORDER optimize operations simultaneously against shared object storage.
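The claim rule described above can be sketched as a small state transition on one control row. This is an in-memory illustration only: `TaskRow`, `try_claim`, `release`, and the `locked_at_secs` field are hypothetical, and the sketch does not model the atomic, conflict-checked commit against Delta Lake that a real multi-pod claim would need.

```rust
use std::time::Duration;

/// Locks held longer than this are treated as abandoned (30 minutes).
pub const STALE_LOCK: Duration = Duration::from_secs(30 * 60);

#[derive(Debug, Clone, PartialEq)]
pub enum TaskStatus {
    Idle,
    Processing,
}

/// Hypothetical in-memory view of one control-table row.
#[derive(Debug, Clone)]
pub struct TaskRow {
    pub task: String,
    pub status: TaskStatus,
    pub pod_id: Option<String>,
    pub locked_at_secs: u64,   // assumed field: when the current lock was taken
    pub next_run_at_secs: u64, // Unix seconds
}

/// Tries to claim the task for `pod_id` at `now_secs` (Unix seconds).
/// An idle task is claimable once it is due; a processing task is claimable
/// only when its lock is older than STALE_LOCK.
pub fn try_claim(row: &mut TaskRow, pod_id: &str, now_secs: u64) -> bool {
    let claimable = match row.status {
        TaskStatus::Idle => now_secs >= row.next_run_at_secs,
        TaskStatus::Processing => {
            now_secs.saturating_sub(row.locked_at_secs) > STALE_LOCK.as_secs()
        }
    };
    if claimable {
        row.status = TaskStatus::Processing;
        row.pod_id = Some(pod_id.to_string());
        row.locked_at_secs = now_secs;
    }
    claimable
}

/// Marks the task finished and schedules the next run.
pub fn release(row: &mut TaskRow, now_secs: u64, interval_secs: u64) {
    row.status = TaskStatus::Idle;
    row.pod_id = None;
    row.next_run_at_secs = now_secs + interval_secs;
}
```

The stale-lock branch is what lets another pod take over if the original holder crashes mid-compaction without releasing the row.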

New attribute search UDF

A custom DataFusion scalar UDF (match_attr_expr) enables full-text attribute search against the search_blob column. This replaces SQL LIKE patterns that required per-attribute column scans.

// DataFusion query predicate
match_attr_expr(col("search_blob"), lit("user_id=abc123"))
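These notes do not spell out how search_blob is encoded or how match_attr_expr compares its arguments. As a rough mental model only, the sketch below assumes attributes are joined as space-separated key=value tokens and that a match is a plain substring test. Both functions are hypothetical and are not part of Scouter or DataFusion.

```rust
/// Hypothetical blob builder: joins attributes as `key=value` tokens
/// separated by a single space. The real encoding may differ.
pub fn build_search_blob(attrs: &[(&str, &str)]) -> String {
    attrs
        .iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical per-row predicate: true when the blob contains `needle`
/// as a substring. A substring test also matches longer values that merely
/// start with the needle, so the real UDF may be stricter than this.
pub fn blob_matches(blob: &str, needle: &str) -> bool {
    blob.contains(needle)
}
```

The point of a single concatenated column is that one predicate over one column can answer "does any attribute match", which is what lets the query avoid per-attribute scans.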

New trace query API routes

Two new HTTP endpoints were added to scouter-server:

  • GET /traces/:trace_id/spans — returns all spans for a specific trace ID
  • POST /traces/spans/filter — returns spans matching TraceFilters (service name, time range, attribute values, entity ID, etc.)

Typed DataFusion predicates for Parquet pruning

Query helpers ts_lit() and date_lit() emit typed Timestamp(Microsecond, UTC) and Date32 literals. These match column types exactly, enabling Parquet row-group min/max pruning and partition directory skipping without type coercion overhead.
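For context on the two encodings: in Arrow, a Timestamp(Microsecond) value is an i64 count of microseconds since the Unix epoch, and a Date32 value is an i32 count of days since the Unix epoch. The sketch below shows how a time-range filter could be turned into a matching partition_date range using only that arithmetic. The function names are hypothetical and this is not how ts_lit() or date_lit() are implemented.

```rust
/// Microseconds in one day.
const MICROS_PER_DAY: i64 = 86_400 * 1_000_000;

/// Converts a UTC timestamp in microseconds since the Unix epoch into the
/// Date32 encoding (days since the Unix epoch). Euclidean division keeps
/// pre-1970 timestamps on the correct calendar day.
pub fn date32_from_micros(ts_us: i64) -> i32 {
    ts_us.div_euclid(MICROS_PER_DAY) as i32
}

/// Inclusive range of partition dates that a [start, end] time window can touch.
pub fn partition_range(start_us: i64, end_us: i64) -> (i32, i32) {
    (date32_from_micros(start_us), date32_from_micros(end_us))
}
```

When the literal already has the column's type, the filter can be compared directly against partition values and row-group statistics, which is the pruning benefit described above.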

Cloud storage fixes

  • GCS / S3 / Azure: storage_root() now correctly extracts only the bucket name from URIs like gs://my-bucket/path/to/prefix. Previously it returned the full path after stripping the scheme prefix, which caused object store initialization failures.
  • Azure: Fixed path construction for Delta table locations.
  • PassthroughLogStoreFactory added for cloud log store registration when using GCS.
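The bucket extraction described in the first fix amounts to taking the authority segment of the URI. A minimal sketch of that behavior follows; `bucket_name` is a hypothetical stand-in and not the actual storage_root() implementation, and returning scheme-less inputs unchanged is an assumption of this example.

```rust
/// Returns the bucket (first segment after "scheme://") of an object-store URI.
/// Inputs without a "scheme://" prefix are returned unchanged, on the
/// assumption that they are local paths.
pub fn bucket_name(uri: &str) -> &str {
    match uri.split_once("://") {
        // `split` always yields at least one item, so the fallback is never hit.
        Some((_scheme, rest)) => rest.split('/').next().unwrap_or(rest),
        None => uri,
    }
}
```

The earlier behavior corresponds to returning everything after the scheme, for example my-bucket/path/to/prefix, which is not a valid bucket name.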

Span schema changes

Columns removed from the span table:

  • root_span_id — derivable from the summary table
  • depth, span_order, path — unused by query layer

Columns added:

  • search_blob — concatenated attribute text for UDF-based search
  • queue_ids — list of queue message IDs

New configuration env vars

Variable                                  Default   Description
SCOUTER_TRACE_COMPACTION_INTERVAL_HOURS   24        How often Delta Lake Z-ORDER optimize runs for trace tables
SCOUTER_TRACE_FLUSH_INTERVAL_SECS         5         How often the in-memory span buffer flushes to Delta Lake
SCOUTER_TRACE_BUFFER_SIZE                 10000     Span buffer capacity before a forced flush

Larger SCOUTER_TRACE_BUFFER_SIZE values reduce the number of small Parquet files written to cloud storage but increase the window of data that could be lost on a crash.
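For deployments that wrap or validate these settings in their own tooling, the sketch below reads the three variables with the documented defaults. `TraceTuning` and `parse_or` are hypothetical helpers, and silently falling back to the default on a malformed value is an assumption of this example; the server's own handling of bad values is not described here.

```rust
use std::env;
use std::str::FromStr;

/// Parses an optional raw value, falling back to `default` when the value is
/// missing or malformed (assumed behavior for this sketch).
pub fn parse_or<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

/// Hypothetical holder for the three tuning values; not a Scouter type.
#[derive(Debug, PartialEq)]
pub struct TraceTuning {
    pub compaction_interval_hours: u64,
    pub flush_interval_secs: u64,
    pub buffer_size: usize,
}

impl TraceTuning {
    /// Reads the variables from the process environment, using the
    /// defaults documented in the table above.
    pub fn from_env() -> Self {
        let get = |key: &str| env::var(key).ok();
        Self {
            compaction_interval_hours: parse_or(
                get("SCOUTER_TRACE_COMPACTION_INTERVAL_HOURS").as_deref(),
                24,
            ),
            flush_interval_secs: parse_or(
                get("SCOUTER_TRACE_FLUSH_INTERVAL_SECS").as_deref(),
                5,
            ),
            buffer_size: parse_or(get("SCOUTER_TRACE_BUFFER_SIZE").as_deref(), 10_000),
        }
    }
}
```

Keeping the parsing in a pure function makes the defaults easy to check without touching the process environment.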


Upgrading from v0.19.0

  1. Remove any direct PostgreSQL queries against trace tables. These tables may still exist but are no longer written to.
  2. Set SCOUTER_STORAGE_URI to a writable location (local path, s3://, gs://, or az://). This was required in v0.19.0 for spans and is now required for summaries and the control table as well.
  3. On first startup, the server creates the trace_summaries and _scouter_control Delta tables automatically. No migration script is needed.
  4. If running multiple server replicas, all replicas must share the same SCOUTER_STORAGE_URI. The control table coordinates cross-pod compaction; replicas pointing at different storage paths will not coordinate.