Release v0.3.0 · NVIDIA-NeMo/Gym

Release Summary

NeMo Gym v0.3.0 ships alongside the NVIDIA Nemotron 3 Ultra model release, open sourcing the environments and corresponding datasets used during training.

Highlights:

70+ new environments, including benchmarks such as Tau2 and Nemotron RL training environments
Popular agent harnesses, such as Claude Code and Hermes, available out of the box
Integrations with OpenEnv and Harbor - use environments from these libraries directly with NeMo Gym
Integration with VeRL - train with VeRL and scale rollout collection with NeMo Gym

First-Time Contributors

We welcomed 30+ new contributors to this release! Here are a few highlights:

@grace-lam added the integration to run Harbor environments with NeMo Gym
@aleksficek added the Competitive Coding Challenges environment
@jthomson04 improved rollout resilience when models emit malformed tool-call arguments or missing message content

Thank you to all the new contributors for helping make NeMo Gym better!

New Environments & Benchmarks

Added 70+ new environments including novel datasets and integrations of popular benchmarks. New coverage spans:

Coding — competitive programming, code infilling, SQL generation, and software-engineering benchmarks with execution-based verification
Math & proofs — olympiad-style problems, proof grading and validation, and formal verification (including Lean)
Knowledge & science — graduate-level QA, chemistry and physics tasks, and lab-style reasoning (including multimodal figure, table, and protocol tasks)
Agentic — multi-turn tool use, search, sandboxed execution, finance workflows, and tau-bench-style conversational agents
Instruction following — format constraints, citation compliance, and IFBench-style rule verification
Safety & RLHF — jailbreak detection, abstention calibration, prompt-injection resistance, and generative reward modeling
Multimodal, speech & translation — VLM benchmarks, visual grounding, ASR evaluation, and machine-translation quality metrics
Chat & broad knowledge — arena-style preference evaluation and MMLU-family benchmarks
Interactive RL — Gymnasium-style multi-step environments for spatial and game-based training

See the Available Environments table for the full list.

Configure Agent Harnesses

Claude Code — available out of the box in NeMo Gym
Hermes — available out of the box in NeMo Gym
LangGraph agent: an adapter that lets you build custom agents using LangGraph patterns (reflection, subagent orchestration, parallel thinking, ReWOO)
Gymnasium agent — generic multi-turn harness for use with OpenAI Gym-style environments

Configure Models

Optional max_concurrent_requests on the OpenAI model server to cap in-flight API calls — useful for rate-limited external endpoints when rollout concurrency is high
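
The changelog describes this cap as a per-server semaphore. As a rough illustration of that idea only, the sketch below limits concurrent calls with `asyncio.Semaphore`. The `CappedClient` class, its `_send` stand-in, and `run_rollouts` are hypothetical names invented for this example; they are not NeMo Gym APIs, and the real model server may be structured differently.

```python
import asyncio
from typing import Optional


class CappedClient:
    """Hypothetical client that caps in-flight calls with a semaphore."""

    def __init__(self, max_concurrent_requests: Optional[int] = None):
        # None means "no cap", mirroring an optional setting.
        self._sem = (
            asyncio.Semaphore(max_concurrent_requests)
            if max_concurrent_requests is not None
            else None
        )
        self.in_flight = 0
        self.peak_in_flight = 0

    async def _send(self, prompt: str) -> str:
        # Stand-in for a real HTTP call to a rate-limited endpoint.
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        await asyncio.sleep(0.01)
        self.in_flight -= 1
        return f"response to {prompt}"

    async def request(self, prompt: str) -> str:
        if self._sem is None:
            return await self._send(prompt)
        # Callers beyond the cap wait here until a slot frees up.
        async with self._sem:
            return await self._send(prompt)


async def run_rollouts(cap: Optional[int], n: int):
    # Build the client inside the running event loop.
    client = CappedClient(max_concurrent_requests=cap)
    results = await asyncio.gather(*(client.request(f"p{i}") for i in range(n)))
    return results, client.peak_in_flight
```

With a cap of 3 and 10 concurrent rollouts, at most 3 requests are in flight at any time while the remaining callers queue on the semaphore; without a cap, all 10 are issued at once.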

Rollout Collection & Profiling

New ng_aggregate_rollouts command to merge rollout shards collected independently across multiple nodes, enabling distributed eval without requiring a single coordinated collection job
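
To make the idea of shard merging concrete, here is a conceptual sketch that concatenates independently written shards into one file. It assumes each shard is a JSONL file with one rollout per line, which these notes do not state. The function name `merge_rollout_shards` is hypothetical, and the sketch does not reproduce the behavior, flags, or output format of `ng_aggregate_rollouts`.

```python
import json
from pathlib import Path


def merge_rollout_shards(shard_paths, output_path):
    """Concatenate JSONL shards into one file and return the row count."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        # Sort shard paths so the merged order does not depend on argument order.
        for shard in sorted(map(Path, shard_paths)):
            with open(shard, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    json.loads(line)  # fail fast on a corrupt row
                    out.write(line + "\n")
                    count += 1
    return count
```

Because each node writes its own shard, no coordination is needed during collection; the merge is a single pass that runs once all shards exist.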

Environment Library Integrations

OpenEnv — combine OpenEnv environments with NeMo Gym environments
Harbor — combine Harbor environments with NeMo Gym environments

Deprecation Notices

Documentation has moved from Sphinx to Fern. Old Sphinx URLs redirect to the new site at docs.nvidia.com/nemo/gym. The docs/ directory is no longer used for publishing.

Bug Fixes

Fixed aiohttp connection limit exhaustion under FastAPI/Uvicorn with multiple workers
Fixed session cookie propagation for Starlette >= 1.0.0
Fixed duplicated usage counting and errors on empty usage in subsequent model calls
Improved rollout resilience when models emit malformed tool-call arguments or missing message content
Fixed prompt-key hashing when inputs contain Pydantic BaseModel objects
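
As an illustration of the last item, one common way to hash inputs that mix plain dictionaries with model objects is to give `json.dumps` a `default` hook that converts models to dictionaries first. The sketch below is an assumption about the general technique, not the actual fix in NeMo Gym. `prompt_key`, `_to_jsonable`, and `FakeMessage` are hypothetical names, and `FakeMessage` only mimics the `model_dump()` method of a Pydantic v2 model so the example runs without Pydantic installed.

```python
import hashlib
import json


def _to_jsonable(obj):
    # Called by json.dumps only for objects it cannot serialize itself.
    # Pydantic v2 models expose model_dump(), so convert via that method.
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    raise TypeError(f"Cannot hash object of type {type(obj).__name__}")


def prompt_key(inputs) -> str:
    """Return a stable SHA-256 key for a JSON-like input structure."""
    # sort_keys makes the key independent of dictionary insertion order.
    payload = json.dumps(inputs, sort_keys=True, default=_to_jsonable)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class FakeMessage:
    """Stand-in for a Pydantic model (hypothetical, for illustration)."""

    def __init__(self, role, content):
        self.role = role
        self.content = content

    def model_dump(self):
        return {"role": self.role, "content": self.content}
```

Under this scheme, an input containing `FakeMessage("user", "hi")` hashes to the same key as one containing the equivalent plain dictionary, so the key does not depend on how the caller represented the message.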

Documentation

New concepts pages for environments, evaluation, and training
Improved Architecture page to clarify how environments map to NeMo Gym components
Consolidated detailed setup and quickstart into a single improved quickstart with clearer descriptions
Expanded Ecosystem page with environment library, training framework, and agent harness integrations

Changelog Details

feat: VLM circle click environment (#837) by @cmunley1
feat: LocalVLLMModel bump to vLLM 0.17.0 (#839) by @bxyu-nvidia
feat: Status updates for agent refs during rollout collection (#843) by @bxyu-nvidia
feat: ether0 chemistry benchmark environment (#838) by @cmunley1
docs: prime intellect verifiers dataset generation instruction update (#851) by @cmunley1
Finance Agent Environment (#742) by @ushnish-de
feat: Add XSTest safety benchmark resource server (#764) by @dcfarris
Create a guide to build environments in NeMo Gym (#711) by @shashank3959
Add multi-step tool-calling data generation example (#778) by @shashank3959
docs: Fix TRL docs link (#857) by @bxyu-nvidia
Swap readme table columns (to main) (#856) by @fsiino-nvidia
Introduce Benchmarks directory (#858) by @gwarmstrong
add gpqa diamond dataset (#845) by @azkalot1
docs: rl <> gym compatibility table (#803) by @lbliii
Updated contributing guide message (#862) by @cwing-nvidia
docs: Nemotron 3 Super recipe link (#863) by @bxyu-nvidia
Gym 0.2.0 huggingface dataset pointers (#859) by @fsiino-nvidia
Add support for SWE-Multilingual benchmark (#822) by @roclark
chore: Bump python package version to 0.3.0.rc0 and descriptions (#883) by @chtruong814
feat: add Harbor integration (#751) by @grace-lam
docs: Fix MultiChallenge train dataset description (#885) by @bxyu-nvidia
docs: update GPQA-D readme (#888) by @cmunley1
feat: add spider2_lite resource server (#864) by @ryan-lempka
Add prompt config for templating (#861) by @gwarmstrong
Compute aggregate metrics (#890) by @gwarmstrong
Streamline Benchmark rollouts and add aime24/math_with_judge metrics (#891) by @gwarmstrong
added bbh-train support to gym (#894) by @arnavkomaragiri
updated README with license info (#895) by @arnavkomaragiri
feat: VLMEvalKit (#872) by @vadam5
bug: Fix README table display (#897) by @bxyu-nvidia
feat: Initial integration with OpenEnv (#898) by @ahmadki
feat: add aime25 benchmark (#899) by @gwarmstrong
GPQA benchmark (#903) by @gwarmstrong
Structured Outputs update with YAML and XML (#865) by @jkyi-nvidia
feat: langgraph integration (#877) by @vadam5
Add proof environments (#907) by @smahdavi4
feat: Benchmark infra refactors (#906) by @bxyu-nvidia
[Fix] use venv Python for swerl_gen Ray workers instead of hardcoded PYTHONPATH (#920) by @spacegoing
[Fix] guard nltk download with local find() to avoid unnecessary remote fetch (#919) by @spacegoing
[fix] (code_gen): use runtime_env py_executable for Ray workers (#913) by @spacegoing
docs: version bump, CTA link changes (#880) by @vadam5
Add zero reward group option for proof judge environment (#923) by @smahdavi4
fix: always send session cookie for starlette >= 1.0.0 (#942) by @cmunley1
feat: Fix duplicated usage counting and errors on empty usage in subsequent model calls (#939) by @bxyu-nvidia
benchmark: LiveCodeBench v5 and v6 (#933) by @bxyu-nvidia
fix: reasoning gym duplicate license (#947) by @cmunley1
SWE agent refactor (#934) by @sdevare-nv
feat: tee gym server subprocess logs to a configurable directory (#950) by @ananthsub
feat: Browsecomp benchmark exposure (#944) by @bxyu-nvidia
ci: upgrade GitHub Actions for Node.js 24 compatibility (#932) by @ko3n1g
docs: add aiohttp-over-httpx guidance and multi-turn agent patterns (#957) by @cwing-nvidia
feat: add dataset preparation script for spider2_lite (#959) by @ryan-lempka
feat: Start Nemotron 3 Ultra benchmarks config; expose Spider 2 lite and XSTest benchmarks (#958) by @bxyu-nvidia
docs: dataset availability (#962) by @cmunley1
fix: Match torch backend auto in genrm model (#963) by @bxyu-nvidia
Support for multiple gold choices in swerl_llm_judge (#956) by @atefehsz
feat(ether0): Add boxed and Answer: LETTER extraction fallbacks (#925) by @jubick1337
fix: RMtree ignores errors (#964) by @bxyu-nvidia
feat: AALCR and Ruler benchmarks; Misc infra (#966) by @bxyu-nvidia
terminus judge improvement for sim only mode (#968) by @jialeiwang
Abstention Environment (HotpotQA) (#954) by @MahanFathi
chore: bump _code_freeze workflow to v0.86.0 (#978) by @ko3n1g
SWE: update OH version (#979) by @sdevare-nv
fix: Handle BaseModel inputs in prompt-key hashing. (#991) by @ffrujeri
docs: llm-as-a-judge (#926) by @fsiino-nvidia
Add the RDKit-Chemistry RL Environment (#984) by @danecor
feat: mmlu_pro and mmlu_prox benchmarks (#988) by @fsiino-nvidia
feat: Misc infra (#970) by @bxyu-nvidia
feat: Introduce NVARC Resource Server with inductive and transductive modes (#1003) by @cmunley1
Add CVDP benchmark resource server with apptainer instead of docker (#928) by @arti4nvj
feat: add ifbench (#999) by @fsiino-nvidia
Upstream 20260408 (#1039) by @bxyu-nvidia
fix: GenRM lock in order to properly handle concurrent requests. (#1041) by @ffrujeri
Tau2 benchmark (#1049) by @bxyu-nvidia
Add tau2 to Nemotron 3 Ultra benchmarks (#1052) by @bxyu-nvidia
feat: Fix sequential reasoning allowed (#1053) by @bxyu-nvidia
Fix aiohttp connection limit under FastAPI/Uvicorn workers > 1 (#1054) by @bxyu-nvidia
fix: pypi (#1056) by @cmunley1
Additional Tau2 metrics (#1064) by @bxyu-nvidia
Bump version to 0.2.1 and make wheel test mandatory (#1065) by @kajalj22
renamed simple_agent to cvdp_agent for consistency (#1024) by @arti4nvj
feat: VLM counting environment (#930) by @cmunley1
fix: add value field to circle vlm envs (#1074) by @cmunley1
Update ns_tools to use NeMo Skills nemo-skills-tools subpackage (#1078) by @gwarmstrong
Update lc_judge.yaml (#1082) by @fayejf
fix: remove XSTest string-match fallback, require judge model (#1058) by @dcfarris
New structured outputs formats envs (#1037) by @jkyi-nvidia
fix: Revert package info version to 0.3.0.rc0 (#1088) by @chtruong814
StructEval (Text) Environment (#1085) by @jkyi-nvidia
terminal pivot multi harness (#1036) by @jialeiwang
Competitive Coding Challenges Gym Environment (#994) by @aleksficek
Update jailbreak env: response policy based verification (#1059) by @prasoonvarshney
RL Environment for Indirect Prompt Injection (#1051) by @makeshn
fix: remove mini-swe dummy resources server (#1077) by @cmunley1
feat: add labbench2 VLM benchmark (#1093) by @azkalot1
feat: add new env for lc retrieval & count ability (#927) by @fayejf
Add omniscience benchmark and resource server (#1095) by @gwarmstrong
ci: Fix release workflow (#1084) by @chtruong814
Add birdbench benchmark and bird_sql resource server (#1098) by @gwarmstrong
Add MRCR benchmark and resource server (#1100) by @gwarmstrong
feat: add new browsecomp benchmark (#1087) by @yuki-97
fix: pass num_repeats_add_seed via metadata.extra_body (#1099) by @gwarmstrong
docs: add GitHub badges to README (#1002) by @cwing-nvidia
Add gsm8k and hendrycks_math benchmarks (#1104) by @gwarmstrong
feat: add miniF2F benchmark (#1111) by @stephencge
feat: add ProofNet benchmark (#1114) by @stephencge
feat: add PutnamBench benchmark (#1115) by @stephencge
feat: Gymnasium style base environment (#1072) by @cmunley1
feat: add MOBench benchmark (#1113) by @stephencge
Simplify verifiers_agent to use upstream NeMoRLChatCompletionsClient (#1076) by @mferrato
Add hmmt_feb25 benchmark (#1112) by @gwarmstrong
fix: include agent-only environments in readme table (#1091) by @cmunley1
Add hmmt_nov25 benchmark (#1117) by @gwarmstrong
Fern docs migration with fidelity fixes (#1045) by @lbliii
Add proof_bench_judge benchmark and resource server (#1118) by @gwarmstrong
Add AIME24-X, AIME25-X, and GPQA-X benchmarks (#1120) by @wedu-nvidia
Add APEX Shortlist benchmark (#1105) by @gwarmstrong
feat: add aime26 benchmark (symbolic-only, MathArena source) (#1123) by @gwarmstrong
feat: improve browsecomp (#1109) by @yuki-97
feat: support disable interleaved reasoning (#1110) by @yuki-97
Add Stirrup agent + GDPVal eval/RL environment (#1090) by @Kh4L
rename gymnasium (#1136) by @cmunley1
add mmlu, mmmlu, and mmlu-redux benchmarks (#1125) by @wedu-nvidia
Simple version of #700 (#1138) by @tdene
structured outputs v4 tool calls (#1127) by @jkyi-nvidia
fix(stirrup_agent): pin Ray worker venv via runtime_env (fixes GDPVal reward=0) (#1140) by @agronskiy
Add task-info logging to nemo_gym/rollout_collection.py through optional logging flag. + Add codex debugging skill (#1142) by @jkyi-nvidia
Add LibriSpeech-PC benchmark, asr_with_pc resource server, and audio sidechannel in vllm_model (#1144) by @gwarmstrong
fix(ruler): factor RULER's thread-unsafe nltk init into its own module (#1150) by @agronskiy
Add ioi benchmark and resource server (#1124) by @gwarmstrong
Add ifeval benchmark (data-only) (#1158) by @gwarmstrong
Add longbench-v2 benchmark (data-only) (#1159) by @gwarmstrong
Add longcodebench benchmark (data-only) (#1157) by @gwarmstrong
Add answer-judge, global-piqa, math-500, proof-arena-judge, and supergpqa (#1151) by @wedu-nvidia
Add livecodebench-x benchmark (data-only) (#1169) by @gwarmstrong
fix(stirrup_agent): accept local paths in reference_file_urls (#1173) by @Kh4L
Add file-path audio support to the vllm_model audio sidechannel (#1170) by @gwarmstrong
SWE Updates 0428 (#1172) by @sdevare-nv
fix(gdpval): plumb judge_responses_create_params_overrides into create() call (#1174) by @Kh4L
Add imo_answerbench benchmark (#1155) by @gwarmstrong
[StirrupAgent] Persist GDPVal deliverables per repeat (task_X/repeat_N/) (#1183) by @Kh4L
fix: don't crash rollouts on malformed tool-call arguments and missing message content (#1180) by @jthomson04
[Stirrup] Fix per-task Apptainer code_exec for GDPVal (#1182) by @Kh4L
fix(gdpval): compare each eval rollout against all reference repeats (#1198) by @agronskiy
[Stirrup] Lift LLM kwargs from config and clear stale deliverables (#1187) by @Kh4L
ci: selective PR tests, full suite on merge, health-check polling (#1149) by @kajalj22
fix: Tau2 propagates max_output_tokens (#1202) by @bxyu-nvidia
ci(fern): track latest Fern CLI via npx instead of pinning (#1194) by @lbliii
docs(fern): drop scheme from instance url, enable basepath-aware (#1156) by @lbliii
feat(benchmarks): BrowseComp fixes and efficiency improvements (#1203) by @e-dobrowolska
docs: expand CI/CD section in development setup guide (#1212) by @kajalj22
Fix stirrup summarization tool history (#1181) by @syadav481
fix: verifiers agent ToolEnv downstream use (#1214) by @cmunley1
[GDPVal] Add opt-in persistence of raw judge responses (#1225) by @Kh4L
fix(stirrup_agent): parse Tavily key list + rotate on 401/403/429 (#1226) by @agronskiy
fix(gdpval): make Office→PDF preconvert actually work in comparison mode (#1228) by @agronskiy
fix(gdpval): bound /verify wallclock on multimodal long-tail tasks (#1229) by @agronskiy
HLE Benchmark (#1028) by @fsiino-nvidia
docs: remove "Design a customer evaluation" page (#1240) by @cwing-nvidia
docs: document how VLLMModel handles max_seq_length exceeded errors (#1207) by @cwing-nvidia
docs: verl integration (#1116) by @cmunley1
docs: improve product overview (#1186) by @cwing-nvidia
Turing Envs (Covers Multichallenge, InverseIFEval, CFBench and SysBench datasets) (#951) by @MahanFathi
Add simpleqa benchmark + simpleqa resource server (#1162) by @gwarmstrong
Add physics benchmark + physics_judge resource server (#1163) by @gwarmstrong
Add imo-gradingbench benchmark + imo_gradingbench resource server (#1161) by @gwarmstrong
feat: hermes agent harness (#1033) by @cmunley1
Add frontierscience-olympiad benchmark + frontierscience_judge resource server (#1164) by @gwarmstrong
Add hotpotqa_closedbook benchmark + hotpotqa_qa resource server (#1166) by @gwarmstrong
Add wmt24pp benchmark and wmt_translation resource server (#1199) by @gwarmstrong
Add human-eval benchmark and resource server (#1201) by @gwarmstrong
Add arena-hard-v2 benchmark and resource server (#1122) by @gwarmstrong
ci: skip tests for benchmark-only changes (#1260) by @kajalj22
Add flores200 benchmark (#1259) by @gwarmstrong
Add ugphysics benchmark + ugphysics_judge resource server (#1167) by @gwarmstrong
Add mbpp benchmark (#1257) by @gwarmstrong
Add arena-hard benchmark (#1261) by @gwarmstrong
Add m-arena-hard-v2 benchmark (#1262) by @gwarmstrong
Add m-arena-hard benchmark (#1263) by @gwarmstrong
Add human-eval-infilling (FIM) benchmark + code_fim resource server (#1258) by @gwarmstrong
Add speed-bench benchmark and resource server (#1232) by @gwarmstrong
Add imo_proofbench benchmark and imo_proofbench_judge resource server (#1230) by @gwarmstrong
Add bigcodebench benchmark and resource server (#1211) by @gwarmstrong
feat(asr_with_pc): add Hallucination and ASR_LEADERBOARD task_types (#1177) by @gwarmstrong
Add polymath benchmark + polymath resource server (weighted, per-language) (#1168) by @gwarmstrong
feat(benchmarks): add musan benchmark (data-only) (#1179) by @gwarmstrong
feat(benchmarks): add numb3rs benchmark (data-only) (#1178) by @gwarmstrong
feat(benchmarks): add asr_leaderboard benchmark (data-only) (#1176) by @gwarmstrong
docs: add explicit guidance on when to use NeMo Gym (#1266) by @cwing-nvidia
fix(gdpval): normalize python-docx ns0 namespacing before LibreOffice convert (#1270) by @agronskiy
chore: pin verifiers to 0.1.14 (#1271) by @cmunley1
benchmark: protocolqa2 labbench2 (#1238) by @azkalot1
docs: add release notes page to About section (#1279) by @cwing-nvidia
ci: Major refactor of release-workflows (#1242) by @ko3n1g
docs: generalize training workflow reference in vLLM page (#1287) by @cwing-nvidia
fix(docs): replace generic "Index" link text with actual page titles (#1284) by @cwing-nvidia
fix(gdpval): also apt-install JRE when libreoffice is pre-baked into image (#1268) by @agronskiy
CVDP Resources Server Fixes for commercial tooling support (#1276) by @arti4nvj
docs: add Daytona Harbor tutorial (#1227) by @mu-hashmi
docs(fern): main + latest GA alias, rename v0.2 → v0.2.1, fix CI fork-secret access (#1241) by @lbliii
SWE - Update openhands, add skip eval (#1288) by @sdevare-nv
SWE: add golden patch validation (#1296) by @sdevare-nv
docs: restructure concepts section (#1278) by @cwing-nvidia
ci(fern): generate library reference before publishing docs (#1297) by @lbliii
Rollout to Metrics Mapping + Partial Reward Profiling + Reward Profiling Skill (#1145) by @jkyi-nvidia
docs(fern): fix sidebar ordering to match original site (#1299) by @lbliii
add openai headers (#1027) by @cdreetz
add pivot dataset creation skill (#1308) by @jkyi-nvidia
Codex/equiv judge extraction (#1313) by @jiacheng-xu
Support sharded rollout aggregation via ng_aggregate_rollouts (#1314) by @gwarmstrong
docs: improve getting started (#1283) by @cwing-nvidia
docs: add environment concepts pages (#1265) by @cwing-nvidia
docs: move index page to About section (#1320) by @cwing-nvidia
docs: make Main the default docs version (#1321) by @cwing-nvidia
docs(fern): fix redirects so Sphinx URLs actually resolve (#1310) by @lbliii
fix(stirrup_agent): libreoffice whitespace + reference_files double-nest (#1333) by @agronskiy
feat(openai_model): opt-in concurrency cap via per-server semaphore (#1208) by @agronskiy
docs(fern): retire Latest slug, flatten /latest/ redirects to /main/, drop Main beta, fix duplicate H1 (#1328) by @lbliii
feat: GRL Tetris Gymnasium Environment (#1331) by @cmunley1
feat: GRL Sokoban Gymnasium Environment (#1330) by @cmunley1
fix(stirrup): restore tool messages for model calls (#1277) by @syadav481
docs: generalize training card links to all training tutorials (#1355) by @cwing-nvidia
Consolidate benchmarks/prompts/ onto NeMo Skills' directory layout and naming (#1316) by @gwarmstrong
build: add root Makefile with Fern dev convenience targets (#1255) by @lbliii
Port codex skills to claude (#1369) by @jkyi-nvidia
fix(security): upgrade dependencies for CVE remediation (#1370) by @kajalj22
feat: Gracefully handle hangs in math verify calls (#1354) by @Kipok
Harden finance_sec_search resource server and agent for GRPO training (#1304) by @ushnish-de
fix(stirrup_agent): stage GDPVal reference files on Ray worker, not head (#1366) by @agronskiy
feat(stirrup_agent): per-task timeout + walltime-resilient failure routing (#1367) by @agronskiy
example dataset/pipeline for terminus judge (#1374) by @kbhardwaj-nvidia
feat: add environments/ and example_environments/ (#1324) by @cmunley1
fix(security): upgrade transformers 4.x → 5.8.1 (CVE remediation) (#1372) by @kajalj22
feat: claude code agent harness (#1336) by @cmunley1
docs: add Verification Patterns section and move LLM-as-Judge into it (#1364) by @cwing-nvidia
Revert OpenAI model override (#1380) by @bxyu-nvidia
ci: remove build-docs and build-test-publish-wheel workflows (#1293) by @ko3n1g
docs: point root README at Fern canonical URLs (#1325) by @lbliii
fix(stirrup_agent): reasoning fallback + surface tool-arg validation errors (#1397) by @agronskiy
feat: example multi turn gymnasium env (#1332) by @cmunley1
docs: fix agent_ref description to reference agent server instead of resources server (#1400) by @JOBEBOLDER
fix: wmt_translation - stage COMET Python mirror per-writer to avoid races (#1407) by @ananthsub
docs(fern): adopt NVIDIA global theme as source of truth (#1413) by @lbliii
docs: replace core-components with architecture page (#1323) by @cwing-nvidia
ci: validate release branch-rules (#1392) by @ko3n1g
chore: align CLAUDE.md with current docs, reduce drift (#1438) by @cwing-nvidia
docs: retire docs/ Sphinx tree, replace with pointer to fern/ (#1376) by @lbliii
docs: restore GitHub badges to README (#1443) by @cwing-nvidia
ci: add request-nvskills-ci workflow (#1441) by @ananthsub
ci: exclude .oms.sig from secrets-detector baseline (#1444) by @ananthsub
docs: add tip for FlashInfer JIT cache install for vLLM model server (#1405) by @ananthsub
docs: fix agent_ref description to reference agent server (#1418) by @ananthsub
fix: return in-scope-filtered agent configs from load_and_validate_server_instance_configs (#1410) by @ananthsub
feat: add '/' endpoint to SimpleServer and HeadServer (#1431) by @marta-sd
docs: fix fern docs issues from NVBugs 6193091 (#1393) by @lbliii
fix: skip multi-call assistant targets in chat and conversational converters for pivot datasets (#1409) by @ananthsub
fix: rename workplace assistant environment (#1462) by @cmunley1
Rename Turing VIF environment into VerifIF (#1470) by @odelalleau
fix: process cleanup in CLI / rdkit_chemistry / ns_tools / code_gen (#1406) by @ananthsub
Patch to CCC Environment Code and Docs (#1121) by @aleksficek
docs: fix dead core-components links and canonical /main URLs (#1480) by @cwing-nvidia
fix: process lifecycle for apptainer & nstools (#1507) by @ananthsub
docs: update verl pin (#1504) by @cmunley1
ci: Add community bot labeler (#1512) by @chtruong814
ci: Update community bot version to add token fix (#1516) by @chtruong814
docs: add v0.3.0 release notes (#1486) by @cwing-nvidia
docs: update RL framework compatibility table (#1523) by @ananthsub
feat: add ultra-v3 post-training environments and agent updates (#1529) by @ananthsub
fix: add example data + metrics for swe_pivot and inverse_if (#1530) by @ananthsub
fix: add example rollouts for swe_pivot/inverse_if and restore core coverage (#1531) by @ananthsub
docs(fern): add v0.3.0 version snapshot for GA release (#1521) by @kajalj22
chore: drop rc0 pre-release tag for 0.3.0 release by @kajalj22
