fix: increase max_new_tokens to 2048 and make configurable via GRPOConfig by abrichr · Pull Request #62 · OpenAdaptAI/openadapt-ml

abrichr · 2026-03-23T18:50:12Z

Fixes zero-reward GRPO runs caused by a 100-token generation limit that truncated reasoning-model output.

The GRPO rollout prompt was missing the "Thought:" line and action history that the SFT training uses. Models fine-tuned via SFT output "Thought: ...\nAction: CLICK(...)" but the GRPO prompt didn't prompt for this format, causing verbose free-form output that couldn't be parsed → reward 0.0 → zero gradients.

Changes:
- Add "Thought:" and "Action:" prompt lines matching SFT format
- Add action_history parameter for step context
- Parser extracts action from "Action: ..." line before regex matching
- Parser handles JSON format {"action_type": "click", "coordinate": [x,y]}
- Debug logging of raw VLM output for zero-reward diagnosis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
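The parser changes listed above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `parse_action`, the returned dict shape, the choice to use the last "Action:" line, and the `CLICK(x=..., y=...)` argument syntax are all assumptions made for illustration. Only the "Action: ..." line convention and the JSON shape `{"action_type": "click", "coordinate": [x,y]}` come from the commit message.

```python
import json
import re
from typing import Optional

# Matches a line of the form "Action: <something>" anywhere in the output.
_ACTION_LINE_RE = re.compile(r"^Action:[ \t]*(.+)$", re.MULTILINE)

# ASSUMPTION: the CLICK argument syntax is not given in the PR; this pattern
# accepts CLICK(x=0.5, y=0.3) purely for illustration.
_CLICK_RE = re.compile(
    r"CLICK\(\s*x\s*=\s*([\d.]+)\s*,\s*y\s*=\s*([\d.]+)\s*\)", re.IGNORECASE
)


def parse_action(raw: str) -> Optional[dict]:
    """Return a click action parsed from raw model output, or None."""
    # Step 1: narrow to the "Action: ..." line if one exists, so that
    # free-form "Thought:" text is not fed to the regex.
    # ASSUMPTION: when several Action lines exist, the last one wins.
    lines = _ACTION_LINE_RE.findall(raw)
    candidate = lines[-1].strip() if lines else raw.strip()

    # Step 2: try the JSON format before falling back to regex matching.
    if candidate.startswith("{"):
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and obj.get("action_type") == "click":
            coord = obj.get("coordinate")
            if isinstance(coord, list) and len(coord) == 2:
                return {"type": "click", "x": float(coord[0]), "y": float(coord[1])}

    # Step 3: regex matching on the DSL form.
    match = _CLICK_RE.search(candidate)
    if match:
        return {"type": "click", "x": float(match.group(1)), "y": float(match.group(2))}

    # Unparseable output; the caller decides how to handle it.
    return None
```

Returning `None` for unparseable output, rather than silently substituting a default action, keeps the failure visible to whatever debug logging wraps the call.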

The default of 100 tokens truncated reasoning models mid-thought, producing unparseable output → DONE → reward 0.0 → zero gradients. Caused 4 failed training runs (~20 GPU-hours wasted).

- Add max_new_tokens to GRPOConfig (default 2048)
- Use config value instead of hardcoded 100
- Add truncation warning when generation hits the limit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
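The configuration change described above can be sketched as follows. This is an illustrative stand-in, not the repository's actual `GRPOConfig`: the real class has other fields that are omitted here, and the helper `hit_token_limit` and its warning text are hypothetical. Only the field name `max_new_tokens` and its default of 2048 come from the commit message.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class GRPOConfig:
    # Stand-in for the real config class; only the field named in this PR
    # is shown. Default taken from the commit message.
    max_new_tokens: int = 2048


def hit_token_limit(num_generated_tokens: int, config: GRPOConfig) -> bool:
    """Hypothetical helper: warn and return True if generation was cut off."""
    truncated = num_generated_tokens >= config.max_new_tokens
    if truncated:
        # A generation that uses the full budget was very likely cut off
        # mid-thought, which is the failure mode this PR addresses.
        logger.warning(
            "Generation hit max_new_tokens=%d; output may be truncated",
            config.max_new_tokens,
        )
    return truncated
```

A caller would pass `config.max_new_tokens` to the generation call instead of a literal `100`, then check `hit_token_limit` on the number of newly generated tokens before parsing.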

abrichr and others added 2 commits March 22, 2026 18:17

abrichr merged commit fecf461 into main Mar 23, 2026
0 of 4 checks passed

abrichr merged 2 commits into main from fix/max-new-tokens

