v0.24.0
v0.24.0 Release Summary
What Changed
This release introduces Bifrost, a Delta Lake-backed dataset storage and query system that turns Pydantic models into queryable tables. Combined with streamlined event queue infrastructure and expanded evaluation capabilities, v0.24.0 adds a production-grade data layer to Scouter for storing and querying high-volume records in AI applications.
Breaking Changes
None. No schema migrations, no database changes, no API removals. Existing drift, evaluation, and tracing functionality is unchanged.
Changes
Bifrost: Delta Lake Dataset Storage
Bifrost turns a Pydantic model into a Delta Lake table. You define the schema, push records through a gRPC queue, and query with SQL.
Write path:
- DatasetProducer.insert(record): serializes to JSON and sends via an unbounded channel (returns immediately, sub-microsecond latency).
- Background batching: accumulates records until batch_size or scheduled_delay_secs triggers a flush.
- Arrow serialization: DynamicBatchBuilder converts JSON rows to Arrow RecordBatch, injecting system columns automatically.
- gRPC transport: batch sent to server as Arrow IPC bytes.
- Delta Lake: server appends to table partitioned by scouter_partition_date.
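The first two write-path steps (a non-blocking insert followed by size- or time-triggered flushing) can be sketched with the Python standard library. This is an illustration of the pattern only, not Scouter's implementation: the class name BatchingQueue and its flush callback are hypothetical, and only batch_size and scheduled_delay_secs are names taken from the notes above.

```python
import queue
import threading
import time


class BatchingQueue:
    """Hypothetical stand-in for a background batching producer."""

    _STOP = object()     # sentinel: flush what is left and exit
    _TIMEOUT = object()  # sentinel: the timer expired with no new record

    def __init__(self, flush, batch_size=3, scheduled_delay_secs=0.2):
        self._flush = flush                # callback that receives a list of records
        self._batch_size = batch_size
        self._delay = scheduled_delay_secs
        self._queue = queue.Queue()        # unbounded, so put() never blocks
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, record):
        # Returns immediately; batching happens on the worker thread.
        self._queue.put(record)

    def shutdown(self):
        # Ask the worker to flush the remaining records, then wait for it.
        self._queue.put(self._STOP)
        self._worker.join()

    def _run(self):
        batch = []
        deadline = time.monotonic() + self._delay
        while True:
            timeout = max(0.0, deadline - time.monotonic())
            try:
                item = self._queue.get(timeout=timeout)
            except queue.Empty:
                item = self._TIMEOUT
            if item is self._STOP:
                if batch:
                    self._flush(batch)
                return
            if item is not self._TIMEOUT:
                batch.append(item)
            timed_out = time.monotonic() >= deadline
            # Flush on either trigger: batch is full, or the delay elapsed.
            if len(batch) >= self._batch_size or (timed_out and batch):
                self._flush(batch)
                batch = []
            if timed_out or not batch:
                deadline = time.monotonic() + self._delay
```

With batch_size=3 and a long delay, seven inserts followed by shutdown() produce batches of 3, 3, and 1 records. The final partial batch is why the notes ask callers to invoke shutdown() on exit.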
Read path:
- DatasetClient.sql(query): sends SQL to server via gRPC.
- DataFusion execution: full SQL support (joins, CTEs, window functions, aggregations).
- Zero-copy delivery: results returned as Arrow IPC bytes.
- Format conversion: call .to_arrow(), .to_polars(), or .to_pandas() to convert.
- Strict reads: DatasetClient.read() returns validated Pydantic model instances.
Schema validation (schema-on-write):
- Pydantic JSON Schema → Arrow schema conversion, fingerprinted.
- Fingerprint checked on every batch write.
- Schema mismatch caught before data lands.
- System columns injected automatically: scouter_created_at (microsecond timestamp), scouter_partition_date (Date32), scouter_batch_id (UUID v7).
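The fingerprint check and the time-based system columns can be sketched as follows. The release notes do not state how the fingerprint is computed, so SHA-256 over a canonical JSON rendering is an assumption made for illustration, and the helper names fingerprint, system_columns, and check_batch are hypothetical. scouter_batch_id is left out to keep the sketch limited to the standard library.

```python
import hashlib
import json
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fingerprint(json_schema: dict) -> str:
    """Hash a canonical JSON rendering so key order does not change the result.

    Assumption: SHA-256 is used here only as an example digest.
    """
    canonical = json.dumps(json_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_batch(registered_fp: str, batch_schema: dict) -> None:
    """Reject a batch whose schema fingerprint differs from the registered one."""
    if fingerprint(batch_schema) != registered_fp:
        raise ValueError("schema fingerprint mismatch; batch rejected")


def system_columns(now: datetime) -> dict:
    """Derive the two time-based system columns from one timezone-aware timestamp."""
    delta = now - EPOCH
    micros = (
        delta.days * 86_400_000_000
        + delta.seconds * 1_000_000
        + delta.microseconds
    )
    return {
        "scouter_created_at": micros,          # microseconds since the Unix epoch
        "scouter_partition_date": delta.days,  # Date32: days since the Unix epoch
    }
```

Because the check runs before any rows are appended, a producer built against a changed model fails at write time instead of leaving mixed-schema data in the table.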
Supported types:
- Primitives: str, int, float, bool, datetime, date
- Collections: Optional[T], List[T] (nested supported)
- Enums: Enum → Dictionary(Int16, Utf8)
- Nested models: BaseModel → Struct(...) (recursive, up to 32 levels)
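A recursive type mapping of this shape can be sketched in plain Python. Several things here are assumptions: dataclasses stand in for Pydantic BaseModel so the example needs only the standard library, the output is Arrow-style type names as strings instead of real Arrow objects, and the primitive targets (Utf8, Int64, and so on) are guesses. Only the Enum and nested-model targets and the 32-level limit come from the list above.

```python
import dataclasses
import enum
import typing
from datetime import date, datetime

MAX_DEPTH = 32  # nesting limit quoted in the release notes

# Assumed primitive mapping; the notes do not spell these out.
PRIMITIVES = {
    str: "Utf8",
    int: "Int64",
    float: "Float64",
    bool: "Boolean",
    datetime: "Timestamp(us)",
    date: "Date32",
}


def arrow_type(tp, depth=0):
    """Return an Arrow-style type name for a Python type annotation."""
    if depth > MAX_DEPTH:
        raise ValueError("nesting deeper than 32 levels")
    if tp in PRIMITIVES:
        return PRIMITIVES[tp]
    origin, args = typing.get_origin(tp), typing.get_args(tp)
    if origin is typing.Union and len(args) == 2 and type(None) in args:
        inner = next(a for a in args if a is not type(None))
        return arrow_type(inner, depth)  # Optional[T]: same type, nullable column
    if origin is list:
        return f"List({arrow_type(args[0], depth + 1)})"
    if isinstance(tp, type) and issubclass(tp, enum.Enum):
        return "Dictionary(Int16, Utf8)"
    if dataclasses.is_dataclass(tp):
        hints = typing.get_type_hints(tp)
        fields = ", ".join(
            f"{name}: {arrow_type(t, depth + 1)}" for name, t in hints.items()
        )
        return f"Struct({fields})"
    raise TypeError(f"unsupported type: {tp!r}")


class Color(enum.Enum):
    RED = "red"


@dataclasses.dataclass
class Point:
    x: float
    tags: typing.List[str]


@dataclasses.dataclass
class Record:
    name: str
    color: Color
    origin: typing.Optional[Point]


record_type = arrow_type(Record)
```

The depth counter increases on every list element or nested model, so a self-referencing model fails with a clear error instead of recursing without bound.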
Clients:
- Bifrost: unified read/write (long-lived; call shutdown() on exit)
- DatasetProducer: write only (background queue; call shutdown() on exit)
- DatasetClient: read only (stateless queries bound to a table via TableConfig)
All clients use gRPC transport configured via GrpcConfig. See Bifrost docs for examples.
Event Queue Refactor
Queue infrastructure refactored for clarity and maintainability:
- Queue traits and implementations reorganized in scouter-events.
- DatasetQueue added for high-throughput dataset inserts.
- Existing Kafka, RabbitMQ, and Redis adapters unchanged.
gRPC API Expansion
New gRPC endpoints for dataset operations:
- CreateDataset: register a table with fingerprint validation.
- InsertBatch: append Arrow IPC bytes to Delta Lake.
- QueryDataset: execute SQL and return Arrow IPC results.
- ReadDataset: read records matching a filter.
Protobuf definitions updated in scouter.grpc.v1.proto.
Evaluation Improvements
Agent assertions:
- TraceAssertionTask added: assertions on OpenTelemetry spans fetched from Delta Lake.
- trace_id added to agent assertion context (enables cross-span evaluation).
Test coverage:
- New test_agent_assertion.py with 84 lines of test cases.
- New test_eval_orchestrator.py with 173 lines covering offline eval orchestration.
- Trace evaluator test expanded.
Documentation
New Bifrost docs section:
- Overview: architecture and design
- Quickstart: end-to-end write and read example
- Writing Data: producer config and patterns
- Reading Data: SQL queries and format conversions
- Schema Reference: TableConfig, type mapping, fingerprinting
Upgrading from v0.23.0
No action required.
- Server: Standard build and deployment. No database migrations. Bifrost uses object storage (local, S3, GCS, Azure) configured via SCOUTER_STORAGE_URI.
- Python client: Standard rebuild with make setup.project (rebuilds Rust extension).
- Existing workflows: Drift, evaluation, and tracing work exactly as before.
To use Bifrost, define a Pydantic schema, create a TableConfig, and use Bifrost or DatasetProducer/DatasetClient to write and query data.