v0.24.0 Release Summary

What Changed

This release introduces Bifrost, a Delta Lake-backed dataset storage and query system that turns Pydantic models into queryable tables. Combined with streamlined event queue infrastructure and expanded evaluation capabilities, v0.24.0 adds a production-grade data layer to Scouter for storing and querying high-volume records in AI applications.

Breaking Changes

None. No schema migrations, no database changes, no API removals. Existing drift, evaluation, and tracing functionality is unchanged.

Changes

Bifrost: Delta Lake Dataset Storage

Bifrost turns a Pydantic model into a Delta Lake table. You define the schema, push records through a gRPC queue, and query with SQL.

Write path:

DatasetProducer.insert(record) — serializes to JSON and sends via an unbounded channel (returns immediately, sub-microsecond latency).
Background batching — accumulates records until batch_size or scheduled_delay_secs triggers a flush.
Arrow serialization — DynamicBatchBuilder converts JSON rows to Arrow RecordBatch, injecting system columns automatically.
gRPC transport — batch sent to server as Arrow IPC bytes.
Delta Lake — server appends to table partitioned by scouter_partition_date.
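The batching step in the write path can be sketched with the standard library alone. This is a minimal illustration of "flush when batch_size is reached or scheduled_delay_secs has elapsed", not Scouter's implementation: the BatchingQueue class, its flush callback, and the use of a Python thread are assumptions made for the example. Only the parameter names batch_size and scheduled_delay_secs come from the notes above.

```python
import queue
import threading
import time
from typing import Callable, List

_STOP = object()  # sentinel that tells the worker to drain and exit


class BatchingQueue:
    """Hypothetical size-or-time batcher; a stand-in, not Scouter's code."""

    def __init__(
        self,
        flush: Callable[[List[str]], None],
        batch_size: int,
        scheduled_delay_secs: float,
    ) -> None:
        self._flush = flush
        self._batch_size = batch_size
        self._delay = scheduled_delay_secs
        self._queue: queue.Queue = queue.Queue()  # no maxsize, so unbounded
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, record_json: str) -> None:
        # Enqueue and return at once; batching happens on the worker thread.
        self._queue.put(record_json)

    def shutdown(self) -> None:
        # Ask the worker to flush whatever is pending, then wait for it.
        self._queue.put(_STOP)
        self._worker.join()

    def _run(self) -> None:
        batch: List[str] = []
        deadline = time.monotonic() + self._delay
        while True:
            try:
                remaining = max(0.0, deadline - time.monotonic())
                item = self._queue.get(timeout=remaining)
            except queue.Empty:
                item = None  # the delay elapsed with nothing new to add
            if item is _STOP:
                break
            if item is not None:
                batch.append(item)
            # Flush on whichever trigger fires first: size or elapsed time.
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                if batch:
                    self._flush(batch)
                    batch = []
                deadline = time.monotonic() + self._delay
        if batch:
            self._flush(batch)  # final partial batch on shutdown


if __name__ == "__main__":
    sent: List[List[str]] = []
    q = BatchingQueue(sent.append, batch_size=3, scheduled_delay_secs=5.0)
    for i in range(7):
        q.insert('{"i": %d}' % i)
    q.shutdown()
    print([len(b) for b in sent])
```

The shutdown() call matters for the same reason the notes ask for it on the long-lived clients: without it, a partial batch still sitting in the queue would never be flushed.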

Read path:

DatasetClient.sql(query) — sends SQL to server via gRPC.
DataFusion execution — full SQL support (joins, CTEs, window functions, aggregations).
Zero-copy delivery — results returned as Arrow IPC bytes.
Format conversion — call .to_arrow(), .to_polars(), .to_pandas() to convert.
Strict reads — DatasetClient.read() returns validated Pydantic model instances.

Schema validation (schema-on-write):

Pydantic JSON Schema → Arrow schema conversion, fingerprinted.
Fingerprint checked on every batch write.
Schema mismatch caught before data lands.
System columns injected automatically: scouter_created_at (microsecond timestamp), scouter_partition_date (Date32), scouter_batch_id (UUID v7).
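The fingerprint check can be illustrated with a short sketch. The hashing scheme here is an assumption: the notes do not say which algorithm Bifrost uses or exactly which representation it hashes, so this example uses SHA-256 over a canonical JSON encoding of a schema dict. The function names schema_fingerprint and check_batch are hypothetical.

```python
import hashlib
import json


def schema_fingerprint(json_schema: dict) -> str:
    """Hash a schema so that key order does not affect the result.

    Assumption: SHA-256 over canonical JSON; the real scheme may differ.
    """
    canonical = json.dumps(json_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_batch(registered_fingerprint: str, batch_schema: dict) -> None:
    """Reject a batch whose schema differs from the registered one."""
    actual = schema_fingerprint(batch_schema)
    if actual != registered_fingerprint:
        raise ValueError(
            f"schema mismatch: expected {registered_fingerprint[:12]}, "
            f"got {actual[:12]}"
        )


if __name__ == "__main__":
    registered = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    drifted = {
        "type": "object",
        "properties": {"id": {"type": "string"}, "name": {"type": "string"}},
    }
    fingerprint = schema_fingerprint(registered)
    check_batch(fingerprint, registered)  # same schema, passes silently
    try:
        check_batch(fingerprint, drifted)
    except ValueError as err:
        print(err)
```

Comparing one short digest per batch is cheap, which is what makes a check on every write practical.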

Supported types:

Primitives: str, int, float, bool, datetime, date
Optional and collection types: Optional[T], List[T] (nesting supported)
Enums: Enum → Dictionary(Int16, Utf8)
Nested models: BaseModel → Struct(...) (recursive, up to 32 levels)

Clients:

Bifrost — unified read/write (long-lived, call shutdown() on exit)
DatasetProducer — write only (background queue, call shutdown() on exit)
DatasetClient — read only (stateless queries bound to a table via TableConfig)

All clients use gRPC transport configured via GrpcConfig. See Bifrost docs for examples.

Event Queue Refactor

Queue infrastructure refactored for clarity and maintainability:

Queue traits and implementations reorganized in scouter-events.
DatasetQueue added for high-throughput dataset inserts.
Existing Kafka, RabbitMQ, and Redis adapters unchanged.

gRPC API Expansion

New gRPC endpoints for dataset operations:

CreateDataset — register a table with fingerprint validation.
InsertBatch — append Arrow IPC bytes to Delta Lake.
QueryDataset — execute SQL and return Arrow IPC results.
ReadDataset — read records matching a filter.

Protobuf definitions updated in scouter.grpc.v1.proto.

Evaluation Improvements

Agent assertions:

TraceAssertionTask added — assertions on OpenTelemetry spans fetched from Delta Lake.
trace_id added to agent assertion context (enables cross-span evaluation).

Test coverage:

New test_agent_assertion.py with 84 lines of test cases.
New test_eval_orchestrator.py with 173 lines covering offline eval orchestration.
Trace evaluator test expanded.

Documentation

New Bifrost docs section:

Overview — architecture and design
Quickstart — end-to-end write and read example
Writing Data — producer config and patterns
Reading Data — SQL queries and format conversions
Schema Reference — TableConfig, type mapping, fingerprinting

Upgrading from v0.23.0

No action required.

Server: Standard build and deployment. No database migrations. Bifrost uses object storage (local, S3, GCS, Azure) configured via SCOUTER_STORAGE_URI.
Python client: Standard rebuild with make setup.project (rebuilds Rust extension).
Existing workflows: Drift, evaluation, and tracing work exactly as before.

To use Bifrost, define a Pydantic schema, create a TableConfig, and use Bifrost or DatasetProducer/DatasetClient to write and query data.
