Wrappers etc by anyasims · Pull Request #23 · axon-rl/gem

anyasims · 2025-06-16T22:40:25Z

Changes: (Sorry they're all in one pull request!)

TrackingEnvWrapper for tracking episode metrics, e.g. env steps, cumulative reward, number of tool calls, etc. (Useful for debugging; see e.g. gem/eval/eval.py.)
Refactored the observation wrapper and added one that concatenates both obs and action, but without the chat template. (This matches the ToRL format.)
Edited the python tool:
Logic: Finds the first complete python call (if possible). Instead of appending the whole action to the history, the python tool returns a parsed action, which is the action truncated at the end of the tool call, and only this is added to the observation history.
Traceback: Added the option include_traceback to control whether the python tool returns the traceback in errors. Default = False, which aligns with the ToRL format.
The changes above improve eval: ToRL and VerlTool models now both get 75% (e.g. python -m eval.eval --env_name eval:MATH500 --model_name GAIR/ToRL-1.5B --num_episodes 20 --batch_size 5 --wrappers "'python_tool,concat_with_action,episode_tracking'")
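
For reference, a minimal sketch of the truncation logic described for the python tool. This is not the code in the PR: the function name truncate_at_first_python_call is hypothetical, and it assumes the tool call is written as a markdown-style fenced python block, which may differ from the delimiters the tool actually parses.

```python
import re

# Assumption: a tool call is a fenced block opened with three backticks + "python"
# and closed with three backticks. The real tool may use other delimiters.
FENCE = "`" * 3
PYTHON_CALL = re.compile(
    re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def truncate_at_first_python_call(action):
    """Return (parsed_action, code, found).

    parsed_action is the action cut off right after the first complete
    python call. If no complete call exists, the action is returned unchanged.
    """
    match = PYTHON_CALL.search(action)
    if match is None:
        return action, None, False
    parsed_action = action[: match.end()]  # drop everything after the call
    code = match.group(1)                  # code between the fences
    return parsed_action, code, True


action = (
    "Let me compute.\n" + FENCE + "python\nprint(1 + 1)\n" + FENCE
    + "\nThe answer is 2.\n" + FENCE + "python\nprint(3)\n" + FENCE
)
parsed_action, code, found = truncate_at_first_python_call(action)
```

Only parsed_action would then be added to the observation history, so text the model generated after the first call (including any hallucinated tool output) is dropped.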

anyasims · 2025-06-16T22:45:31Z

This also includes another attempt at an async environment (since the previous asyncio implementation was not compatible with math_verify), but it still gives errors when called with the LLM. E.g. uncomment the last test and run python -m tests.test_tool.test_python_code_tool llm_episode --env_name eval:MATH500 --model_name Qwen/Qwen3-0.6B-Base:

Traceback (most recent call last):
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/tests/test_tool/test_python_code_tool.py", line 202, in <module>
    fire.Fire(
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/tests/test_tool/test_python_code_tool.py", line 97, in test_episode
    run_and_print_episode(
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/gem/utils/debug.py", line 27, in run_and_print_episode
    next_obs, reward, terminated, truncated, _ = env.step(action)
  File "/home/aiops/simsay/Documents/llm_tiny_ideas/agent-oat-outer/goat/gem/gem/vector/async_vector_env.py", line 52, in step
    obs, reward, terminated, truncated, info = res.get(timeout=10)
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'generator' object
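
A small sketch of what this error means in general, not a diagnosis of this PR: multiprocessing pickles every task it sends to a worker, and generator objects cannot be pickled. Assuming the generator sits somewhere inside the task payload, one possible workaround is to materialize it into a list first. The helpers is_picklable and materialize_generators are hypothetical names used for illustration.

```python
import pickle
import types


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, which multiprocessing relies on."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


def materialize_generators(obj):
    """Recursively replace generators with lists inside dicts, lists and tuples."""
    if isinstance(obj, types.GeneratorType):
        return [materialize_generators(item) for item in obj]
    if isinstance(obj, dict):
        return {key: materialize_generators(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [materialize_generators(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(materialize_generators(item) for item in obj)
    return obj


# Hypothetical payload: a generator hidden inside the arguments of a task.
payload = {"action": "x", "tokens": (tok for tok in ["a", "b"])}
picklable_before = is_picklable(payload)
safe_payload = materialize_generators(payload)
picklable_after = is_picklable(safe_payload)
```

If the generator lives on the environment object itself (e.g. a dataset iterator held as an attribute), converting the arguments is not enough and the object would need to be constructed inside the worker process instead.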

Maybe we should look into math_verify and see if we can make it work, though, since even with the standard SyncVectorEnv there are errors when MathEnv calls math_verify, e.g.

E0616 22:39:58.289302 140654826985024 parser.py:699] Error parsing: 8282
[actor_0_0/0] Traceback (most recent call last):
[actor_0_0/0]   File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/math_verify/parser.py", line 692, in parse
[actor_0_0/0]     return timeout(timeout_seconds=parsing_timeout)(extract_target_from_pred)(
[actor_0_0/0]   File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/site-packages/math_verify/utils.py", line 48, in wrapper
[actor_0_0/0]     signal.signal(signal.SIGALRM, handler)
[actor_0_0/0]   File "/home/aiops/simsay/miniconda3/envs/agent-oat/lib/python3.10/signal.py", line 56, in signal
[actor_0_0/0]     handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
[actor_0_0/0] ValueError: signal only works in main thread of the main interpreter
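
A minimal sketch reproducing this class of error and showing one thread-safe alternative. Python only allows signal.signal to be called from the main thread, so a SIGALRM-based timeout fails inside an actor thread. The helper call_with_timeout is a hypothetical name and is not necessarily how the later "math_verify fix" commits address it; it waits on a future with a timeout instead of using signals.

```python
import signal
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def signal_error_in_worker_thread():
    """Try to install a signal handler from a non-main thread; return the error text."""
    result = {"error": None}

    def target():
        try:
            signal.signal(signal.SIGINT, signal.SIG_DFL)
        except ValueError as exc:  # raised because this is not the main thread
            result["error"] = str(exc)

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    return result["error"]


def call_with_timeout(fn, *args, timeout_seconds=5.0, default=None):
    """Run fn(*args) in a helper thread and stop waiting after timeout_seconds.

    Caveat: the helper thread is not killed on timeout; it keeps running
    in the background until fn returns.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return default
    finally:
        executor.shutdown(wait=False)


def slow_parse(text):
    """Stand-in for a slow parsing call."""
    time.sleep(0.3)
    return text


error_text = signal_error_in_worker_thread()
timed_out = call_with_timeout(slow_parse, "8282", timeout_seconds=0.05)
finished = call_with_timeout(slow_parse, "8282", timeout_seconds=2.0)
```

Because the abandoned thread cannot be interrupted, a separate process would be the safer choice when the wrapped call can hang indefinitely.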

lkevinzc

LGTM!

anyasims added 7 commits June 16, 2025 20:58

tool_env_wrapper

20c59ec

edit assert print

c12f70a

episode tracking wrapper

befc607

eval with episode tracker

b5cac65

wrappers and eval

d49aa09

async

ac9a43d

old delete old async test

9f71ab6

anyasims added 7 commits June 17, 2025 07:02

test async with llm

fa7f321

cleaning

8b5b61e

fixed async math_verify error

301cdc8

make format

a1d0744

lowering mastermind-easy env steps. math_verify fix 2

b6b66cc

math verify fix 3

359a6da

handling timeout

09fcb39

lkevinzc approved these changes Jun 17, 2025

lkevinzc merged commit 49fa361 into main Jun 17, 2025

lkevinzc deleted the logging-wrapper branch June 17, 2025 16:40

lkevinzc merged 14 commits into main from logging-wrapper
