Hey all! 👋🏽
First of all, kudos for the amazing work on @Verifiers!
As the open-source RL ecosystem matures and as verifiers begin to standardize how we build and share environments, it’s becoming clear that we need observability tooling that truly understands RL primitives.
This week, I decided to experiment with building an initial version of an open-source, RL-native observability framework. Thought it would be fun and interesting to hack.
https://github.com/kaushikb11/verifiers-monitor
Running RL experiments without direct visibility into rollout quality, reward distributions, or failure modes is chaotic.
Verifiers Monitor provides live tracking, per-example and failure inspection, and programmatic access—so you can see what's happening during runs and debug what went wrong afterward.
Multi-rollout analysis highlights high-variance examples where your model behaves inconsistently.
Reward attribution surfaces which functions drive the final score. Session comparison lets you track metrics across iterations.
The Verifiers Monitor SDK gives you structured, programmatic access to the data from your past runs.
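To make the idea concrete, here is a minimal, self-contained sketch of the kind of analysis that structured access to rollout data enables—flagging high-variance examples where the model behaves inconsistently. The record shape and function names here are purely illustrative, not the actual Verifiers Monitor SDK schema.

```python
# Hypothetical sketch: ranking examples by reward variance across rollouts.
# The Rollout record and high_variance_examples helper are illustrative,
# not part of the real Verifiers Monitor SDK.
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    example_id: str
    reward: float


def high_variance_examples(rollouts, top_k=3):
    """Group rollouts by example and rank by reward spread (population std dev)."""
    by_example = {}
    for r in rollouts:
        by_example.setdefault(r.example_id, []).append(r.reward)
    ranked = sorted(
        ((ex, pstdev(rs), mean(rs)) for ex, rs in by_example.items() if len(rs) > 1),
        key=lambda t: t[1],
        reverse=True,
    )
    return ranked[:top_k]


rollouts = [
    Rollout("ex1", 1.0), Rollout("ex1", 0.0), Rollout("ex1", 1.0),  # inconsistent
    Rollout("ex2", 0.5), Rollout("ex2", 0.5), Rollout("ex2", 0.5),  # stable
]
for ex, spread, avg in high_variance_examples(rollouts):
    print(f"{ex}: std={spread:.2f} mean={avg:.2f}")
```

With programmatic access like this, the same handful of lines could drive dashboards, regression checks between sessions, or automated flagging of unstable prompts.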
I believe the future of RL observability could look like this:
You’re working alongside your model, spawning multiple versions of your environment by tweaking components at different points, much like using git worktrees for RL experiments.
Would love to learn what the verifiers team and community think about it! 🤗