v0.24.0

@thorrester thorrester released this 25 Mar 12:57
· 255 commits to main since this release
94afba6

v0.24.0 Release Summary

What Changed

This release introduces Bifrost, a Delta Lake-backed dataset storage and query system that turns Pydantic models into queryable tables. Together with a refactored event queue and expanded evaluation capabilities, v0.24.0 adds a production-grade data layer to Scouter for storing and querying high-volume records in AI applications.


Breaking Changes

None. No schema migrations, no database changes, no API removals. Existing drift, evaluation, and tracing functionality is unchanged.


Changes

Bifrost: Delta Lake Dataset Storage

Bifrost turns a Pydantic model into a Delta Lake table. You define the schema, push records through a gRPC queue, and query with SQL.

Write path:

  • DatasetProducer.insert(record) — serializes to JSON and sends via an unbounded channel (returns immediately, sub-microsecond latency).
  • Background batching — accumulates records until batch_size or scheduled_delay_secs triggers a flush.
  • Arrow serialization — DynamicBatchBuilder converts JSON rows to Arrow RecordBatch, injecting system columns automatically.
  • gRPC transport — batch sent to server as Arrow IPC bytes.
  • Delta Lake — server appends to table partitioned by scouter_partition_date.
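
The first two steps of the write path (non-blocking insert, then a flush on either a size or a time trigger) can be sketched with the standard library alone. This is a minimal illustration, not Scouter's implementation: `BatchingProducer` and its `flush` callback are hypothetical names, and a plain list of JSON strings stands in for the Arrow and gRPC steps. Only `batch_size` and `scheduled_delay_secs` are taken from the notes above.

```python
import json
import queue
import threading
import time
from typing import Callable, List


class BatchingProducer:
    """Hypothetical stand-in for a background-batching producer (not Scouter's API)."""

    def __init__(
        self,
        flush: Callable[[List[str]], None],
        batch_size: int = 3,
        scheduled_delay_secs: float = 0.2,
    ) -> None:
        self._flush = flush
        self._batch_size = batch_size
        self._delay = scheduled_delay_secs
        self._queue: queue.Queue = queue.Queue()  # no maxsize, so put() never blocks
        self._stop = object()  # sentinel that tells the worker to exit
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, record: dict) -> None:
        # Serialize and enqueue; no network I/O happens on the caller's thread.
        self._queue.put(json.dumps(record))

    def shutdown(self) -> None:
        # Ask the worker to flush whatever is left, then wait for it to exit.
        self._queue.put(self._stop)
        self._worker.join()

    def _run(self) -> None:
        batch: List[str] = []
        deadline = time.monotonic() + self._delay
        while True:
            timeout = max(0.0, deadline - time.monotonic())
            try:
                item = self._queue.get(timeout=timeout)
            except queue.Empty:
                item = None  # the timer fired with nothing new in the queue
            if item is self._stop:
                if batch:
                    self._flush(batch)
                return
            if item is not None:
                batch.append(item)
            # Flush on whichever trigger comes first: size or elapsed time.
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                if batch:
                    self._flush(batch)
                    batch = []
                deadline = time.monotonic() + self._delay
```

In this sketch, forgetting to call `shutdown()` would drop any records still sitting in the partial batch, which is one plausible reason the long-lived clients below ask for `shutdown()` on exit.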

Read path:

  • DatasetClient.sql(query) — sends SQL to server via gRPC.
  • DataFusion execution — full SQL support (joins, CTEs, window functions, aggregations).
  • Zero-copy delivery — results returned as Arrow IPC bytes.
  • Format conversion — call .to_arrow(), .to_polars(), or .to_pandas() on the result.
  • Strict reads — DatasetClient.read() returns validated Pydantic model instances.

Schema validation (schema-on-write):

  • Pydantic JSON Schema → Arrow schema conversion, fingerprinted.
  • Fingerprint checked on every batch write.
  • Schema mismatch caught before data lands.
  • System columns injected automatically: scouter_created_at (microsecond timestamp), scouter_partition_date (Date32), scouter_batch_id (UUID v7).
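
To make the fingerprint check concrete, here is a minimal sketch of schema-on-write validation. The notes do not say how Scouter computes its fingerprint, so the SHA-256 over canonical JSON used here is an assumption, and `schema_fingerprint`, `check_batch`, and `SchemaMismatchError` are hypothetical names chosen for illustration.

```python
import hashlib
import json


def schema_fingerprint(json_schema: dict) -> str:
    """Hash a canonical JSON rendering of the schema (assumed scheme, not Scouter's)."""
    # sort_keys makes the fingerprint independent of dictionary key order.
    canonical = json.dumps(json_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaMismatchError(ValueError):
    """Raised when a batch was built against a different schema than the table's."""


def check_batch(table_fingerprint: str, batch_schema: dict) -> None:
    # Reject the batch before any data is written.
    if schema_fingerprint(batch_schema) != table_fingerprint:
        raise SchemaMismatchError("batch schema does not match the registered table schema")
```

A caller would compute the fingerprint once when the table is registered and then run `check_batch` for every incoming batch, so that a changed field type fails fast instead of corrupting the table.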

Supported types:

  • Primitives: str, int, float, bool, datetime, date
  • Collections: Optional[T], List[T] (nested supported)
  • Enums: Enum → Dictionary(Int16, Utf8)
  • Nested models: BaseModel → Struct(...) (recursive, up to 32 levels)
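
The recursive shape of this mapping can be sketched as a small type walker. This is an illustration under stated assumptions, not Bifrost's converter: standard-library dataclasses stand in for Pydantic models, the output is a plain string rather than a real Arrow type, and the primitive targets (`Utf8`, `Int64`, `Float64`, `Boolean`, `Timestamp(us)`, `Date32`) are guesses, since the notes list the Python primitives without their Arrow counterparts. Only the enum mapping, the struct mapping, and the 32-level limit come from the list above.

```python
import dataclasses
import enum
from datetime import date, datetime
from typing import List, Optional, Union, get_args, get_origin, get_type_hints

MAX_DEPTH = 32  # nesting limit stated in the notes

# Assumed primitive targets; the notes do not spell these out.
PRIMITIVES = {
    str: "Utf8",
    int: "Int64",
    float: "Float64",
    bool: "Boolean",
    datetime: "Timestamp(us)",
    date: "Date32",
}


def arrow_type(tp, depth: int = 0) -> str:
    """Describe the Arrow type a Python annotation would map to (illustrative only)."""
    if depth > MAX_DEPTH:
        raise ValueError("nesting exceeds 32 levels")
    if tp in PRIMITIVES:
        return PRIMITIVES[tp]
    origin = get_origin(tp)
    if origin is Union:
        # Optional[T] is Union[T, None]; treat it as a nullable T.
        args = get_args(tp)
        inner = [a for a in args if a is not type(None)]
        if len(args) == 2 and len(inner) == 1:
            return arrow_type(inner[0], depth) + " (nullable)"
    if origin is list:
        (item,) = get_args(tp)
        return f"List<{arrow_type(item, depth + 1)}>"
    if isinstance(tp, type) and issubclass(tp, enum.Enum):
        return "Dictionary(Int16, Utf8)"
    if dataclasses.is_dataclass(tp):
        # Dataclass used here as a stand-in for a Pydantic BaseModel.
        hints = get_type_hints(tp)
        fields = ", ".join(f"{name}: {arrow_type(t, depth + 1)}" for name, t in hints.items())
        return f"Struct({fields})"
    raise TypeError(f"unsupported type: {tp!r}")
```

Passing the depth down through list items and struct fields is what lets a walker like this reject pathologically deep models instead of recursing without bound.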

Clients:

  • Bifrost — unified read/write (long-lived, call shutdown() on exit)
  • DatasetProducer — write only (background queue, call shutdown() on exit)
  • DatasetClient — read only (stateless queries bound to a table via TableConfig)

All clients use gRPC transport configured via GrpcConfig. See Bifrost docs for examples.

Event Queue Refactor

Queue infrastructure refactored for clarity and maintainability:

  • Queue traits and implementations reorganized in scouter-events.
  • DatasetQueue added for high-throughput dataset inserts.
  • Existing Kafka, RabbitMQ, and Redis adapters unchanged.

gRPC API Expansion

New gRPC endpoints for dataset operations:

  • CreateDataset — register a table with fingerprint validation.
  • InsertBatch — append Arrow IPC bytes to Delta Lake.
  • QueryDataset — execute SQL and return Arrow IPC results.
  • ReadDataset — read records matching a filter.

Protobuf definitions updated in scouter.grpc.v1.proto.

Evaluation Improvements

Agent assertions:

  • TraceAssertionTask added — assertions on OpenTelemetry spans fetched from Delta Lake.
  • trace_id added to agent assertion context (enables cross-span evaluation).

Test coverage:

  • New test_agent_assertion.py with 84 lines of test cases.
  • New test_eval_orchestrator.py with 173 lines covering offline eval orchestration.
  • Trace evaluator test expanded.

Documentation

A new Bifrost docs section has been added.


Upgrading from v0.23.0

No action required.

  • Server: Standard build and deployment. No database migrations. Bifrost uses object storage (local, S3, GCS, Azure) configured via SCOUTER_STORAGE_URI.
  • Python client: Standard rebuild with make setup.project (rebuilds Rust extension).
  • Existing workflows: Drift, evaluation, and tracing work exactly as before.

To use Bifrost, define a Pydantic schema, create a TableConfig, and use Bifrost or DatasetProducer/DatasetClient to write and query data.