[Bug Report] Maze/AntMaze issues #155

alexdavey · 2023-06-15T20:30:50Z

Hi, I believe I have found a couple of issues in the Maze/AntMaze environments. I have resolved both of these issues in commit 5573d5e, and I’m happy to submit a PR.

1) AntMaze sparse reward always zero

For continuing tasks in AntMazeEnv, the sparse reward is always zero. This is because at each step, .compute_terminated() resets the goal when the Ant is sufficiently close, before the reward is calculated.

Here’s a code example to test this:

import numpy as np
import gymnasium as gym

env = gym.make("AntMaze_UMaze-v3", continuing_task=True, reward_type="sparse")

total_reward = 0

# 1000 "episodes" of 100 steps
for _ in range(1000):
    env.reset()

    for _ in range(100):
        action = env.action_space.sample()
        _, rew, _, _, _ = env.step(action)
        total_reward += rew

print(total_reward)

Currently this returns exactly zero since no reward is collected, but placing .compute_reward() above .compute_terminated() gives a non-zero reward. In PointMaze, this issue was fixed in commit ace181e.

2) AntMaze can reset into a terminal state

The Ant will sometimes start within the goal radius. This is because there is a maze_size_scaling factor missing in the distance check in MazeEnv.generate_reset_pos().

In AntMaze maze_size_scaling = 4, so the xy position noise can be up to 1.0 in each direction. An unfortunate combination of goal noise and reset noise can cause the Ant to start within 0.45 of self.goal. This issue does not affect PointMaze because there maze_size_scaling = 1.

Here’s a code example to test this:

import numpy as np
import gymnasium as gym

env = gym.make("AntMaze_UMaze-v3", continuing_task=True)

for _ in range(1000):
    obs, _ = env.reset()
    dist = np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"])
    assert dist > 0.45

Checklist

I have checked that there is no similar issue in the repo (required)

The text was updated successfully, but these errors were encountered:

Kallinteris-Andreas · 2023-06-16T04:59:12Z

Your test for sparse rewards, is not correct, since using random actions is very unlikely to get you to the terminal state

Does this also affect the PointEnv?

alexdavey · 2023-06-16T12:35:54Z

Thanks for taking a look at this issue.

I agree that it is not a robust test in general, it’s primarily just a check to compare pre/post commit (after the suggested changes, it does get a very small but non-zero reward). If you have suggestions for a more robust test I can implement that, but I think the issue is fairly clear from the step method of AntMaze:

Gymnasium-Robotics/gymnasium_robotics/envs/maze/ant_maze.py

Lines 277 to 289 in d0a10dc

    
           def step(self, action): 
        
               ant_obs, _, _, _, info = self.ant_env.step(action) 
        
               obs = self._get_obs(ant_obs) 
        
               terminated = self.compute_terminated(obs["achieved_goal"], self.goal, info) 
        
               truncated = self.compute_truncated(obs["achieved_goal"], self.goal, info) 
        
               reward = self.compute_reward(obs["achieved_goal"], self.goal, info) 
        
               if self.render_mode == "human": 
        
                   self.render() 
        
               return obs, reward, terminated, truncated, info

and comparing to the PointMaze step:

Gymnasium-Robotics/gymnasium_robotics/envs/maze/point_maze.py

Lines 379 to 391 in d0a10dc

    
           def step(self, action): 
        
               obs, _, _, _, info = self.point_env.step(action) 
        
               obs_dict = self._get_obs(obs) 
        
               info["success"] = bool( 
        
                   np.linalg.norm(obs_dict["achieved_goal"] - self.goal) <= 0.45 
        
               ) 
        
               reward = self.compute_reward(obs_dict["achieved_goal"], self.goal, info) 
        
               terminated = self.compute_terminated(obs_dict["achieved_goal"], self.goal, info) 
        
               truncated = self.compute_truncated(obs_dict["achieved_goal"], self.goal, info) 
        
               return obs_dict, reward, terminated, truncated, info

.compute_terminated() is common to both and resets the goal location if within 0.45.

This issue previously existed for PointMaze. I’m proposing to make the same fix as commit ace181e, but for AntMaze.

The PointEnv/PointMaze env’s are not affected by either of these issues, due to the commit referenced above, and because maze_size_scaling = 1 so the missing factor does not change anything.

Kallinteris-Andreas · 2023-06-16T13:44:33Z

I would say it is worrying that compute_terminated is not a pure function, it should be refactored into 2 functions: compute_terminated (but now it just computes if it is terminated), and update_goal

These changes would requite a new revision ("AntMaze_UMaze-v4")

@rodrigodelazcano

Kallinteris-Andreas · 2023-06-16T14:09:47Z

@alexdavey feel free to make a PR

alexdavey mentioned this issue Jun 20, 2023

Maze/AntMaze issues #156

Merged

10 tasks

rodrigodelazcano closed this as completed in #156 Jun 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug Report] Maze/AntMaze issues #155

[Bug Report] Maze/AntMaze issues #155

alexdavey commented Jun 15, 2023

Kallinteris-Andreas commented Jun 16, 2023

alexdavey commented Jun 16, 2023

Kallinteris-Andreas commented Jun 16, 2023 •

edited

Kallinteris-Andreas commented Jun 16, 2023

[Bug Report] Maze/AntMaze issues #155

[Bug Report] Maze/AntMaze issues #155

Comments

alexdavey commented Jun 15, 2023

1) AntMaze sparse reward always zero

2) AntMaze can reset into a terminal state

Checklist

Kallinteris-Andreas commented Jun 16, 2023

alexdavey commented Jun 16, 2023

Kallinteris-Andreas commented Jun 16, 2023 • edited

Kallinteris-Andreas commented Jun 16, 2023

Kallinteris-Andreas commented Jun 16, 2023 •

edited