Hi, I believe I have found a couple of issues in the Maze/AntMaze environments. I have resolved both of these issues in commit 5573d5e, and I’m happy to submit a PR.
1) AntMaze sparse reward always zero
For continuing tasks in AntMazeEnv, the sparse reward is always zero. This is because at each step, .compute_terminated() moves the goal as soon as the Ant is within the goal radius, and only afterwards is the reward calculated, against the new (distant) goal.
Here’s a code example to test this:
import numpy as np
import gymnasium as gym
import gymnasium_robotics  # registers the Maze/AntMaze environments with gymnasium

env = gym.make("AntMaze_UMaze-v3", continuing_task=True, reward_type="sparse")
total_reward = 0.0
# 1000 "episodes" of 100 random steps each
for _ in range(1000):
    env.reset()
    for _ in range(100):
        action = env.action_space.sample()
        _, rew, _, _, _ = env.step(action)
        total_reward += rew
env.close()
print(total_reward)
Currently this prints exactly 0 because no reward is ever collected; moving the .compute_reward() call above .compute_terminated() gives a non-zero total. The same issue was fixed for PointMaze in commit ace181e.
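To make the ordering problem concrete, here is a toy model of the continuing-task goal logic. It is a minimal sketch, not the actual MazeEnv code: the class ToyContinuingMaze, its simplified compute_reward/compute_terminated signatures (the real methods take achieved_goal, desired_goal and info), the fixed list of successive goals and the straight-line trajectory are all hypothetical. Only the 0.45 goal radius and the call order come from the issue.

```python
import numpy as np

GOAL_RADIUS = 0.45  # distance at which the goal counts as reached


class ToyContinuingMaze:
    """Hypothetical stand-in for the continuing-task goal logic (not MazeEnv)."""

    def __init__(self, goals):
        self.goals = list(goals)
        self.goal = self.goals.pop(0)

    def compute_reward(self, achieved):
        # Sparse reward: 1.0 inside the goal radius, 0.0 otherwise.
        return float(np.linalg.norm(achieved - self.goal) <= GOAL_RADIUS)

    def compute_terminated(self, achieved):
        # Continuing task: never terminate, but move the goal once it is reached.
        if np.linalg.norm(achieved - self.goal) <= GOAL_RADIUS and self.goals:
            self.goal = self.goals.pop(0)
        return False

    def step(self, achieved, reward_first):
        if reward_first:  # proposed order
            reward = self.compute_reward(achieved)
            terminated = self.compute_terminated(achieved)
        else:  # current order: the goal may already have moved
            terminated = self.compute_terminated(achieved)
            reward = self.compute_reward(achieved)
        return reward, terminated


def total_reward(reward_first):
    env = ToyContinuingMaze(
        goals=[np.array([1.0, 0.0]), np.array([-3.0, 2.0]), np.array([5.0, 5.0])]
    )
    # The agent walks straight onto the first goal at (1, 0).
    trajectory = [np.array([x, 0.0]) for x in np.linspace(0.0, 1.0, 5)]
    return sum(env.step(pos, reward_first)[0] for pos in trajectory)


print(total_reward(reward_first=False))  # 0.0
print(total_reward(reward_first=True))   # 1.0
```

With the current order the arrival at the first goal is never rewarded, because by the time the reward is computed the goal has already been replaced; swapping the two calls recovers it.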
2) AntMaze can reset into a terminal state
The Ant will sometimes start within the goal radius. This is because the distance check in MazeEnv.generate_reset_pos() is missing a maze_size_scaling factor.
In AntMaze, maze_size_scaling = 4, so the xy position noise can be up to 1.0 in each direction. An unfortunate combination of goal noise and reset noise can therefore place the Ant within 0.45 of self.goal at reset. PointMaze is not affected because there maze_size_scaling = 1.
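A rough bound shows why the scaling factor matters. This assumes (not confirmed above) that the reset loop resamples cell centres $c_r$ until $\|c_r - g\|$ exceeds a threshold of 0.5 today, or $0.5\,s$ with the fix, where $g$ is the already-noised goal, $s$ is maze_size_scaling, and the start position $p_0 = c_r + \epsilon_r$ gets uniform noise of up to $0.25\,s$ per axis:

$$
\begin{aligned}
&\|\epsilon_r\|_\infty \le 0.25\,s = 1.0, \qquad \|\epsilon_r\|_2 \le \sqrt{2} \approx 1.414,\\
&\text{unscaled: } \|c_r - g\| > 0.5 \text{ still allows } c_r \text{ to be the goal's own cell, so } \|p_0 - g\| < 0.45 \text{ is possible},\\
&\text{scaled: } \|p_0 - g\| \ge \|c_r - g\| - \|\epsilon_r\|_2 > 0.5\,s - \sqrt{2} = 2.0 - 1.414 \approx 0.586 > 0.45.
\end{aligned}
$$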
Here’s a code example to test this:
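import numpy as np
import gymnasium as gym
import gymnasium_robotics  # registers the Maze/AntMaze environments with gymnasium

env = gym.make("AntMaze_UMaze-v3", continuing_task=True)
for _ in range(1000):
    obs, _ = env.reset()
    # Distance between the Ant's starting xy position and the goal
    dist = np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"])
    assert dist > 0.45, f"reset inside the goal radius: dist = {dist:.3f}"
env.close()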
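The same effect can be reproduced without MuJoCo with a small Monte Carlo sketch of the reset logic. Everything here is a hypothetical simplification: the OPEN_CELLS layout, the helper names, and the assumption that the current check compares against 0.5 (with the fix multiplying it by maze_size_scaling). The noise model of up to 0.25 · maze_size_scaling per axis matches the "up to 1.0" figure above.

```python
import numpy as np

SCALING = 4.0  # maze_size_scaling in AntMaze
# Hypothetical U-shaped maze: (row, col) of open cells, mapped to xy cell centres.
OPEN_CELLS = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 2), (3, 1)]
CENTRES = np.array([[c * SCALING, -r * SCALING] for r, c in OPEN_CELLS])


def add_xy_noise(pos, rng):
    # Uniform noise of up to 0.25 * SCALING (= 1.0 here) on each axis.
    return pos + rng.uniform(-0.25, 0.25, size=2) * SCALING


def sample_start_distance(rng, threshold):
    # Noisy goal, then resample cell centres until one is beyond `threshold`.
    goal = add_xy_noise(CENTRES[rng.integers(len(CENTRES))], rng)
    reset_pos = goal.copy()
    while np.linalg.norm(reset_pos - goal) <= threshold:
        reset_pos = CENTRES[rng.integers(len(CENTRES))].copy()
    # Noise is added to the start position only after the check.
    reset_pos = add_xy_noise(reset_pos, rng)
    return np.linalg.norm(reset_pos - goal)


def min_start_distance(threshold, n=10_000, seed=0):
    rng = np.random.default_rng(seed)
    return min(sample_start_distance(rng, threshold) for _ in range(n))


unscaled = min_start_distance(0.5)          # check without maze_size_scaling
scaled = min_start_distance(0.5 * SCALING)  # check with maze_size_scaling
print(f"min start-goal distance, unscaled check: {unscaled:.3f}")
print(f"min start-goal distance, scaled check:   {scaled:.3f}")
```

In this toy the unscaled check occasionally lets the start cell be the goal's own cell, so the minimum distance drops below 0.45, while the scaled check keeps every start outside the goal radius.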