
Wrappers etc #23

Merged
lkevinzc merged 14 commits into main from logging-wrapper
Jun 17, 2025

@anyasims anyasims commented Jun 16, 2025

Changes (sorry, they're all in one pull request):

  • Added TrackingEnvWrapper for tracking episode metrics, e.g. env steps, cumulative reward, number of tool calls, etc. (Useful for debugging; see e.g. gem/eval/eval.py.)

  • Refactored the observation wrapper and added one that concatenates both the observation and the action but without the chat template. (This matches the ToRL format.)

  • Edited the python tool:

  • Logic: The tool finds the first complete python call (if there is one). Instead of appending the whole action to the history, the python tool returns a parsed action, which is the action truncated at the end of the tool call, and only this is added to the observation history.

  • Traceback: Added an include_traceback option to control whether the python tool returns the traceback in errors. The default is False, which aligns with the ToRL format.

  • Together, the changes above improve eval: ToRL and VerlTool models now both get 75% (e.g. python -m eval.eval --env_name eval:MATH500 --model_name GAIR/ToRL-1.5B --num_episodes 20 --batch_size 5 --wrappers "'python_tool,concat_with_action,episode_tracking'")
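
The python tool logic described above (find the first complete call, then truncate the action there) could be sketched roughly as follows. This is an illustration only, not the code from this PR: the name parse_first_python_call is hypothetical, and it assumes tool calls are delimited by a triple-backtick fence tagged python, which may differ from the actual format.

```python
import re
from typing import Optional, Tuple

# Assumption: a tool call is a triple-backtick fence tagged "python".
FENCE = "`" * 3
_CALL = re.compile(
    re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def parse_first_python_call(action: str) -> Tuple[Optional[str], str]:
    """Return (code, parsed_action) for the first complete python call.

    parsed_action is the action truncated at the end of that call, so any
    text generated after it is dropped. If no complete call is found, the
    code is None and the action is returned unchanged.
    """
    match = _CALL.search(action)
    if match is None:
        return None, action
    return match.group(1), action[: match.end()]
```

Only the returned parsed_action would then be appended to the observation history, so text produced after the first tool call (including a guessed tool output) never reaches the next observation.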

@anyasims

This also includes another attempt at an async environment (the previous asyncio implementation was not compatible with math_verify), but it still gives errors when called with the LLM. To reproduce, uncomment the last test and run python -m tests.test_tool.test_python_code_tool llm_episode --env_name eval:MATH500 --model_name Qwen/Qwen3-0.6B-Base, which gives:

Traceback (most recent call last):
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/tests/test_tool/test_python_code_tool.py", line 202, in <module>
    fire.Fire(
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/tests/test_tool/test_python_code_tool.py", line 97, in test_episode
    run_and_print_episode(
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/gem/utils/debug.py", line 27, in run_and_print_episode
    next_obs, reward, terminated, truncated, _ = env.step(action)
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/gem/vector/async_vector_env.py", line 52, in step
    obs, reward, terminated, truncated, info = res.get(timeout=10)
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'generator' object
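
The traceback fails in the pool's _handle_tasks while the task is being sent to a worker, which suggests that something reachable from the task arguments holds a generator. The sketch below reproduces that class of error in isolation and shows one possible workaround (materializing the generator before sending). The names squares, is_picklable, and the payload dictionaries are hypothetical and are not taken from async_vector_env.py.

```python
import pickle


def squares(n):
    """A generator function; the generator objects it returns cannot be pickled."""
    for i in range(n):
        yield i * i


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, which multiprocessing relies on."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# A payload holding a live generator fails with
# "cannot pickle 'generator' object".
payload_with_generator = {"action": "step", "items": squares(3)}

# Materializing the generator into a list makes the same payload picklable.
payload_materialized = {"action": "step", "items": list(squares(3))}
```

A check like is_picklable on the environment object and on each step argument may help locate which attribute carries the generator.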

Maybe we should look into math_verify and see if we can make it work, though, since even with the standard SyncVectorEnv there are errors when MathEnv calls math_verify, e.g.:

E0616 22:39:58.289302 140654826985024 parser.py:699] Error parsing: 8282
[actor_0_0/0] Traceback (most recent call last):
[actor_0_0/0]   File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/math_verify/parser.py", line 692, in parse
[actor_0_0/0]     return timeout(timeout_seconds=parsing_timeout)(extract_target_from_pred)(
[actor_0_0/0]   File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/math_verify/utils.py", line 48, in wrapper
[actor_0_0/0]     signal.signal(signal.SIGALRM, handler)
[actor_0_0/0]   File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/signal.py", line 56, in signal
[actor_0_0/0]     handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
[actor_0_0/0] ValueError: signal only works in main thread of the main interpreter
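
The ValueError above comes from Python itself: signal.signal may only be called from the main thread, so a signal-based timeout fails as soon as it runs in a worker thread. The sketch below reproduces that restriction and shows one thread-friendly alternative, bounding the wait with a future. The helper names try_install_handler and call_with_timeout are hypothetical, and this is not how math_verify implements its timeout.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor


def try_install_handler(errors: list) -> None:
    """Try to install a signal handler and record the error if it is refused."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        errors.append(str(exc))


# Installing a handler from a non-main thread raises ValueError.
errors = []
worker = threading.Thread(target=try_install_handler, args=(errors,))
worker.start()
worker.join()


def call_with_timeout(fn, timeout_seconds, *args, **kwargs):
    """Run fn in a helper thread and wait at most timeout_seconds for it.

    Raises concurrent.futures.TimeoutError if the result is not ready in
    time. The helper thread is not killed; it keeps running in the
    background until fn returns.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    finally:
        pool.shutdown(wait=False)
```

Because the helper thread cannot be interrupted, this only bounds how long the caller waits. Running the verification in a separate process would be needed to actually stop a runaway parse.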


@lkevinzc lkevinzc left a comment

LGTM!

@lkevinzc lkevinzc merged commit 49fa361 into main Jun 17, 2025
@lkevinzc lkevinzc deleted the logging-wrapper branch June 17, 2025 16:40