v0.2.0
25 PRs merged since v0.1.0.
Highlights
- Leaderboard pipeline rebuild: Two-stage extract/refine with schema validation and fabrication guardrails. Monthly updates via AI-assisted pipeline. #37, #38, #39, #40, #42, #43, #45
- Per-benchmark/model README docs: Every benchmark and model server now has its own directory with a README containing setup, configs, and Docker image info. Browse configs/ to get started. #61, #62, #63
- Composite EpisodeRecorder: Pluggable per-episode recording (MP4 + JSONL) with --dev bind-mount for live code editing in Docker. #52, #65
- SimplerEnv Variant Aggregation (VA): Aggregate per-variant scores matching the SimplerEnv paper protocol. #60
- Model weight availability checking: require_model_available() in dirs.py gives instant failure with download instructions instead of minutes-long timeouts. Same pattern as ensure_license(). #67
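The fail-fast check described under "Model weight availability checking" could look roughly like the sketch below. The signature, the exception class, and the "empty directory means missing weights" rule are assumptions for illustration only; the actual require_model_available() in dirs.py may differ.

```python
from pathlib import Path


class ModelNotAvailableError(RuntimeError):
    """Raised when expected model weights are missing on disk (hypothetical)."""


def require_model_available(model_dir: Path, download_hint: str) -> Path:
    # Hypothetical signature: the real function in dirs.py may take other arguments.
    model_dir = Path(model_dir)
    # Assumption: a missing or empty directory means the weights were never downloaded.
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        raise ModelNotAvailableError(
            f"Model weights not found at {model_dir}.\n"
            f"To download them: {download_hint}"
        )
    return model_dir
```

Calling a check like this before starting the model server surfaces the problem immediately, with the download instructions in the error message, rather than after a client-side timeout.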
New benchmarks
- BEHAVIOR-1K: OmniGibson, R1Pro action space, demo-replay baseline, runtime license confirmation. #57
- LIBERO-Plus: Perturbation evaluation with 5 difficulty-classified suites. #47
- MolmoSpaces-Bench: MolmoSpaces simulation. #33
New model servers
- VLANeXt: HuggingFace auto-download, LIBERO multi-suite configs. #34 by @hiteshK03
- MolmoBot: Molmo2-4B + DiT flow-matching action head. #33
- MME-VLA: pi0.5 baseline + 14 memory-augmented variants for RoboMME. #30
Dependency and reproducibility
- All model server git deps moved to [tool.uv.sources] with pinned commit hashes and exclude-newer dates. #67
- Missing upstream deps annotated: molmobot (torchmetrics), pi0/mme_vla (pytest/chex). #67
Docker and infrastructure
- --help for build.sh/push.sh, derived image support, curobo v0.7.8 pin, RLBench license gating. #55, #63, #67
- Smoke test timeout raised from 300s to 600s, cogact TF GPU init fix, RTC default level fallback. #67
- Orchestrator output_dir resolved to absolute path for Docker. #67
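Why the absolute path matters: Docker treats a -v source that does not start with / as a named volume rather than a host directory, so a relative output_dir would not be bind-mounted as intended. A minimal sketch of the resolution step follows; the helper name and the /output container path are hypothetical, not taken from the orchestrator.

```python
from pathlib import Path


def docker_volume_arg(output_dir: str, container_path: str = "/output") -> str:
    # Hypothetical helper: expand ~ and resolve relative paths against the
    # current working directory so Docker receives an absolute host path.
    host_path = Path(output_dir).expanduser().resolve()
    return f"{host_path}:{container_path}"


# Example: a relative "results" directory becomes "<cwd>/results:/output".
print(docker_volume_arg("results"))
```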
Bug fixes
- molmobot: fix uv cache corruption with #subdirectory= wheel builds, use snapshot_download for path resolution, position-based image access, bounded obs_history via deque. #67
- config_loader: fix cast() for Python 3.8 Docker environments.
- serve: support dict type in auto-generated CLI args.
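For reference, the bounded obs_history pattern mentioned in the molmobot fix relies on the maxlen argument of collections.deque, which evicts the oldest entry on append once the buffer is full. The window size and observation values below are made up for illustration.

```python
from collections import deque

# Hypothetical window size; the value used by the molmobot server is not stated here.
obs_history = deque(maxlen=3)

for step in range(5):
    # Once three items are stored, each append drops the oldest one.
    obs_history.append(f"obs_{step}")

print(list(obs_history))  # ['obs_2', 'obs_3', 'obs_4']
```

Memory use stays constant over long episodes, unlike an unbounded list that grows with every step.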
Breaking changes
- Config directory structure: configs/*.yaml is now configs/benchmarks/<name>/ and configs/model_servers/<name>/.