Skip to content

fix: critical TRL trainer bugs — wrong prompt, ignored task_ids, DSL parsing#236

Merged
abrichr merged 2 commits into
mainfrom
fix/trl-critical-bugs
Mar 29, 2026
Merged

fix: critical TRL trainer bugs — wrong prompt, ignored task_ids, DSL parsing#236
abrichr merged 2 commits into
mainfrom
fix/trl-critical-bugs

Conversation

@abrichr
Copy link
Copy Markdown
Member

@abrichr abrichr commented Mar 29, 2026

Summary

Three critical bugs reported from client testing the TRL path on a live GPU:

  • Garbage output (Thought: # # # # # # #): TRL used a JSON system prompt but the model (Qwen2.5-VL-7B) was SFT'd on the DSL format. Now imports SYSTEM_PROMPT from the standalone trainer so both paths use the identical prompt.
  • task_ids ignored: trl_wrapper loaded ALL tasks from task_dir, ignoring TrainingConfig.task_ids. Now filters by task_ids when specified.
  • DSL parsing missing: parse_action_json only handled JSON, but constrained decoding produces DSL (CLICK(x=0.5, y=0.3)). Now falls back to DSL regex with fractional coordinate preservation.

Root cause analysis

The standalone trainer produces valid output because it uses SYSTEM_PROMPT from prompt.py — the exact same prompt format the base model was SFT'd on. The TRL wrapper used a completely different JSON-format prompt that the model had never seen during training. This is equivalent to asking someone a question in a language they don't speak.

Test plan

  • All 27 TRL-specific tests pass
  • Full CI suite: 1452 passed, 50 skipped
  • parse_action_json handles JSON input (existing tests)
  • parse_action_json handles DSL input (falls through to regex)
  • Client should re-test TRL path with this fix

🤖 Generated with Claude Code

abrichr and others added 2 commits March 29, 2026 13:12
…parsing

Three bugs reported from client testing the TRL path:

1. Garbage output: TRL used a JSON system prompt but the model was SFT'd
   on DSL format (Thought/Action). Now imports SYSTEM_PROMPT from the
   standalone trainer so both paths use the identical prompt.

2. task_ids ignored: trl_wrapper loaded ALL tasks from task_dir into the
   TRL dataset, ignoring TrainingConfig.task_ids. Now filters task_configs
   by task_ids when specified (matching by id or name).

3. parse_action_json only handled JSON: constrained decoding produces DSL
   (CLICK(x=0.5, y=0.3)), but the parser only tried JSON. Now falls back
   to DSL regex parsing, keeping fractional coordinates for pixel_action.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lth check

- Add system_prompt parameter to make_waa_rollout_func (default = DSL
  prompt from standalone trainer). Users can override if they SFT on
  a different format.
- Log the system prompt at startup for debugging.
- Make Outlines failure loud: ImportError raises instead of silent
  fallback. Other failures log CRITICAL warning.
- Fix health check to skip mock adapters (unittest.mock.MagicMock).
- Fix test mocks to accept **kwargs for stuck_threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr abrichr merged commit de515b8 into main Mar 29, 2026
1 check passed
abrichr added a commit that referenced this pull request Mar 29, 2026
- Add openadapt-types>=0.1.0 to core dependencies (canonical action
  schema for the OpenAdapt ecosystem — Pydantic v2, lightweight)
- Add _AgentOutput Pydantic model for future Outlines JSON schema
  constrained decoding (currently unused — default is DSL regex)
- Does NOT change the system prompt (DSL format, matching #236 fix)

The _AgentOutput model enables switching to outlines.json(model, schema)
once models are SFT'd on JSON format. For now, DSL regex remains default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abrichr added a commit that referenced this pull request Mar 29, 2026
- Add openadapt-types>=0.1.0 to core dependencies (canonical action
  schema for the OpenAdapt ecosystem — Pydantic v2, lightweight)
- Add _AgentOutput Pydantic model for future Outlines JSON schema
  constrained decoding (currently unused — default is DSL regex)
- Does NOT change the system prompt (DSL format, matching #236 fix)

The _AgentOutput model enables switching to outlines.json(model, schema)
once models are SFT'd on JSON format. For now, DSL regex remains default.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant