
Real-time observability tool for verifiers training and evaluation #433

@kaushikb11

Description


Hey all! 👋🏽

First of all, kudos for the amazing work on Verifiers!

As the open-source RL ecosystem matures and Verifiers begins to standardize how we build and share environments, it’s becoming clear that we need observability tooling that truly understands RL primitives.

This week, I decided to experiment with building an initial version of an open-source, RL-native observability framework. Thought it would be fun and interesting to hack.

https://github.com/kaushikb11/verifiers-monitor

Running RL experiments without direct visibility into rollout quality, reward distributions, or failure modes is chaotic.
Monitor provides live tracking, per-example and failure inspection, and programmatic access—so you can see what’s happening during runs and debug what went wrong afterward.

- Multi-rollout analysis highlights high-variance examples where your model behaves inconsistently.
- Reward attribution surfaces which reward functions drive the final score.
- Session comparison lets you track metrics across iterations.
- The Verifiers Monitor SDK gives you structured, programmatic access to data from your past runs.
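To make the multi-rollout analysis concrete, here is a minimal sketch of how flagging high-variance examples could work: group rewards by example, compute per-example reward variance, and surface the inconsistent ones. This is plain Python with illustrative data, not the monitor's actual API; the record shape and threshold are assumptions.

```python
import statistics
from collections import defaultdict

# Toy rollout records: (example_id, reward). In practice these would come
# from stored run data; the shape here is illustrative only.
rollouts = [
    ("ex1", 1.0), ("ex1", 1.0), ("ex1", 0.9),  # consistently high
    ("ex2", 1.0), ("ex2", 0.0), ("ex2", 1.0),  # inconsistent
    ("ex3", 0.0), ("ex3", 0.1), ("ex3", 0.0),  # consistently low
]

def high_variance_examples(rollouts, threshold=0.1):
    """Return examples whose reward variance across rollouts exceeds threshold."""
    by_example = defaultdict(list)
    for example_id, reward in rollouts:
        by_example[example_id].append(reward)
    return {
        example_id: statistics.pvariance(rewards)
        for example_id, rewards in by_example.items()
        if statistics.pvariance(rewards) > threshold
    }

flagged = high_variance_examples(rollouts)
print(flagged)  # only ex2 is flagged (variance ≈ 0.22)
```

Variance (rather than mean reward) is the useful signal here: ex3 scores low but consistently, so it points to a hard example, while ex2's instability points to a model or environment component worth inspecting.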

I believe the future of RL observability could look like this:

You’re working alongside your model, spawning multiple versions of your environment by tweaking components at different points, much like using git worktrees for RL experiments.

Would love to learn what the verifiers team and community think about it! 🤗
