v0.2.0

Latest

@MilkClouds released this 09 May 03:17 · 4 commits to main since this release · f740abc

25 PRs merged since v0.1.0.

Highlights

  • Leaderboard pipeline rebuild: Two-stage extract/refine flow with schema validation and fabrication guardrails. The leaderboard is updated monthly via the AI-assisted pipeline. #37, #38, #39, #40, #42, #43, #45

  • Per-benchmark/model README docs: Every benchmark and model server now has its own directory with a README containing setup, configs, and Docker image info. Browse configs/ to get started. #61, #62, #63

  • Composite EpisodeRecorder: Pluggable per-episode recording (MP4 + JSONL) with --dev bind-mount for live code editing in Docker. #52, #65

  • SimplerEnv Variant Aggregation (VA): Aggregate per-variant scores matching the SimplerEnv paper protocol. #60

  • Model weight availability checking: require_model_available() in dirs.py fails immediately with download instructions instead of waiting out a minutes-long timeout. Same pattern as ensure_license(). #67
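
The fail-fast check described in the last highlight can be sketched roughly as follows. This is a minimal illustration, not the project's code: only the name require_model_available() and its location in dirs.py come from these notes, while the signature, the ModelNotAvailableError exception, and the "missing or empty directory" rule are assumptions.

```python
from pathlib import Path


class ModelNotAvailableError(RuntimeError):
    """Hypothetical exception raised when weights are missing locally."""


def require_model_available(weights_dir, download_hint):
    """Return the weights directory, or fail immediately with instructions.

    Assumed signature for illustration; the real helper in dirs.py may differ.
    """
    path = Path(weights_dir).expanduser()
    # Assumption: a missing or empty directory means "not downloaded yet".
    if not path.is_dir() or not any(path.iterdir()):
        raise ModelNotAvailableError(
            f"Model weights not found at {path}.\n"
            f"To download them: {download_hint}"
        )
    return path
```

Calling a check like this before the server starts loading surfaces the problem in well under a second, which is the behavior the highlight contrasts with a timeout.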

New benchmarks

  • BEHAVIOR-1K: OmniGibson, R1Pro action space, demo-replay baseline, runtime license confirmation. #57
  • LIBERO-Plus: Perturbation evaluation with 5 difficulty-classified suites. #47
  • MolmoSpaces-Bench: MolmoSpaces simulation. #33

New model servers

  • VLANeXt: HuggingFace auto-download, LIBERO multi-suite configs. #34 by @hiteshK03
  • MolmoBot: Molmo2-4B + DiT flow-matching action head. #33
  • MME-VLA: pi0.5 baseline + 14 memory-augmented variants for RoboMME. #30

Dependency and reproducibility

  • All model server git deps moved to [tool.uv.sources] with pinned commit hashes and exclude-newer dates. #67
  • Missing upstream deps annotated: molmobot (torchmetrics), pi0/mme_vla (pytest/chex). #67

Docker and infrastructure

  • --help for build.sh / push.sh, derived image support, curobo v0.7.8 pin, RLBench license gating. #55, #63, #67
  • Smoke test timeout raised from 300s to 600s, cogact TF GPU init fix, RTC default level fallback. #67
  • Orchestrator output_dir resolved to absolute path for Docker. #67
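
As an illustration of the output_dir change, the sketch below resolves a user-supplied path to an absolute one before it is used as a bind-mount source. The helper names and the /outputs container path are hypothetical, and the motivation given in the comment (Docker's -v can treat a non-absolute source as a named volume) is an assumption about why the fix was needed, not something stated in the notes.

```python
from pathlib import Path


def resolve_output_dir(output_dir):
    """Return an absolute path suitable as a Docker bind-mount source."""
    # resolve() anchors a relative path at the current working directory
    # and collapses ".." segments. Assumed motivation: `docker run -v`
    # may interpret a non-absolute source as a named volume.
    return Path(output_dir).expanduser().resolve()


def bind_mount_arg(output_dir, container_path="/outputs"):
    """Build a value for `docker run -v`; /outputs is a made-up target."""
    return f"{resolve_output_dir(output_dir)}:{container_path}"
```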

Bug fixes

  • molmobot: fix uv cache corruption with #subdirectory= wheel builds, use snapshot_download for path resolution, position-based image access, bounded obs_history via deque. #67
  • config_loader: fix cast() for Python 3.8 Docker environments.
  • serve: support dict type in auto-generated CLI args.
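
The "bounded obs_history via deque" fix can be pictured with the standard library's collections.deque and its maxlen argument. The sketch below is an assumption about the general shape of such a buffer; the ObservationHistory class and its methods are hypothetical and are not taken from the molmobot server code.

```python
from collections import deque


class ObservationHistory:
    """Keeps only the most recent `max_len` observations."""

    def __init__(self, max_len):
        # A deque with maxlen discards the oldest item once it is full,
        # so memory stays constant over arbitrarily long episodes.
        self._buffer = deque(maxlen=max_len)

    def add(self, obs):
        self._buffer.append(obs)

    def latest(self, n):
        """Return up to the last n observations, oldest first."""
        if n <= 0:
            return []
        return list(self._buffer)[-n:]

    def __len__(self):
        return len(self._buffer)
```

Compared with appending to a plain list, this avoids unbounded growth without any explicit trimming logic.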

Breaking changes

  • Config directory structure: the flat configs/*.yaml files have moved into per-component directories, configs/benchmarks/<name>/ and configs/model_servers/<name>/.
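
For scripts that previously globbed configs/*.yaml, a small helper along these lines may ease the migration. It is only a sketch: the list_configs name is hypothetical, and it assumes the YAML files sit directly inside each <name>/ directory with a .yaml extension, which these notes do not confirm.

```python
from pathlib import Path


def list_configs(configs_root):
    """Group YAML file names by kind and component under the new layout.

    Returns {"benchmarks": {name: [files]}, "model_servers": {name: [files]}}.
    """
    root = Path(configs_root)
    result = {}
    for kind in ("benchmarks", "model_servers"):
        by_name = {}
        kind_dir = root / kind
        if kind_dir.is_dir():
            # Assumption: configs live one level down, as <kind>/<name>/*.yaml.
            for path in sorted(kind_dir.glob("*/*.yaml")):
                by_name.setdefault(path.parent.name, []).append(path.name)
        result[kind] = by_name
    return result
```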