
[Bug Report] Maze/AntMaze issues #155

Closed · 1 task done
alexdavey opened this issue Jun 15, 2023 · 4 comments · Fixed by #156

@alexdavey (Contributor)
Hi, I believe I have found a couple of issues in the Maze/AntMaze environments. I have resolved both of these issues in commit 5573d5e, and I’m happy to submit a PR.

1) AntMaze sparse reward always zero

For continuing tasks in AntMazeEnv, the sparse reward is always zero. This is because .compute_terminated() runs before the reward is calculated at each step, and it resets the goal whenever the Ant is sufficiently close, so .compute_reward() only ever measures the distance to the freshly relocated goal.

Here’s a code example to test this:

import gymnasium as gym

env = gym.make("AntMaze_UMaze-v3", continuing_task=True, reward_type="sparse")

total_reward = 0

# 1000 "episodes" of 100 steps
for _ in range(1000):
    env.reset()

    for _ in range(100):
        action = env.action_space.sample()
        _, rew, _, _, _ = env.step(action)
        total_reward += rew

print(total_reward)  # prints 0.0 under the current step() ordering

Currently this prints exactly zero, since no reward is ever collected, but placing .compute_reward() above .compute_terminated() gives a non-zero total. In PointMaze, this issue was fixed in commit ace181e.
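For concreteness, here is a minimal sketch of the fix: the method body matches the current AntMazeEnv.step() (quoted in full in the discussion below), with only the order of the compute_* calls changed:

def step(self, action):
    ant_obs, _, _, _, info = self.ant_env.step(action)
    obs = self._get_obs(ant_obs)

    # Compute the reward against the *current* goal first, so a successful
    # step is rewarded before compute_terminated() can relocate the goal.
    reward = self.compute_reward(obs["achieved_goal"], self.goal, info)
    terminated = self.compute_terminated(obs["achieved_goal"], self.goal, info)
    truncated = self.compute_truncated(obs["achieved_goal"], self.goal, info)

    if self.render_mode == "human":
        self.render()
    return obs, reward, terminated, truncated, info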

2) AntMaze can reset into a terminal state

The Ant will sometimes start within the goal radius. This is because there is a maze_size_scaling factor missing in the distance check in MazeEnv.generate_reset_pos().

In AntMaze, maze_size_scaling = 4, so the xy position noise can be up to 1.0 in each direction. An unfortunate combination of goal noise and reset noise can therefore place the Ant within 0.45 of self.goal at the start of an episode. This issue does not affect PointMaze, because there maze_size_scaling = 1.

Here’s a code example to test this:

import numpy as np
import gymnasium as gym

env = gym.make("AntMaze_UMaze-v3", continuing_task=True)

# With the missing scaling factor, this assertion occasionally fails:
# the Ant can reset inside the 0.45 success radius of the goal.
for _ in range(1000):
    obs, _ = env.reset()
    dist = np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"])
    assert dist > 0.45
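For reference, a hedged sketch of the fix in MazeEnv.generate_reset_pos() (this is a reconstruction; the exact body and attribute names in gymnasium-robotics may differ, and the 0.5 base threshold is an assumption). The point is only that the reset-to-goal distance threshold must scale with maze_size_scaling:

def generate_reset_pos(self) -> np.ndarray:
    reset_pos = self.goal.copy()
    # Resample until the reset position is far enough from the goal.
    # Multiplying the threshold by maze_size_scaling (4 in AntMaze) keeps
    # the agent outside the 0.45 goal radius even after position noise.
    while (
        np.linalg.norm(reset_pos - self.goal)
        <= 0.5 * self.maze.maze_size_scaling
    ):
        reset_index = self.np_random.integers(
            low=0, high=len(self.maze.unique_reset_locations)
        )
        reset_pos = self.maze.unique_reset_locations[reset_index].copy()
    return reset_pos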

Checklist

  • I have checked that there is no similar issue in the repo (required)
@Kallinteris-Andreas (Collaborator)

Your test for sparse rewards is not correct, since random actions are very unlikely to reach the terminal state.

Does this also affect the PointEnv?

@alexdavey (Contributor, Author)

Thanks for taking a look at this issue.

  1. I agree that it is not a robust test in general; it is primarily a check to compare behaviour before and after the commit (after the suggested changes, it does collect a very small but non-zero reward). If you have suggestions for a more robust test I can implement that (one possible direction is sketched at the end of this comment), but I think the issue is fairly clear from the step method of AntMaze:

def step(self, action):
    ant_obs, _, _, _, info = self.ant_env.step(action)
    obs = self._get_obs(ant_obs)
    terminated = self.compute_terminated(obs["achieved_goal"], self.goal, info)
    truncated = self.compute_truncated(obs["achieved_goal"], self.goal, info)
    reward = self.compute_reward(obs["achieved_goal"], self.goal, info)
    if self.render_mode == "human":
        self.render()
    return obs, reward, terminated, truncated, info

and comparing to the PointMaze step:

def step(self, action):
    obs, _, _, _, info = self.point_env.step(action)
    obs_dict = self._get_obs(obs)
    info["success"] = bool(
        np.linalg.norm(obs_dict["achieved_goal"] - self.goal) <= 0.45
    )
    reward = self.compute_reward(obs_dict["achieved_goal"], self.goal, info)
    terminated = self.compute_terminated(obs_dict["achieved_goal"], self.goal, info)
    truncated = self.compute_truncated(obs_dict["achieved_goal"], self.goal, info)
    return obs_dict, reward, terminated, truncated, info

.compute_terminated() is common to both and resets the goal location whenever the agent is within 0.45 of it.

This issue previously existed for PointMaze. I’m proposing to make the same fix as commit ace181e, but for AntMaze.

  2. The PointEnv/PointMaze envs are not affected by either of these issues, due to the commit referenced above, and because maze_size_scaling = 1 there, so the missing factor changes nothing.
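As for a more robust test, here is a hedged sketch of one possibility (not from the thread; it still relies on random actions stumbling into the goal, so the success branch fires rarely, but it asserts the reward/goal-reset coupling directly instead of comparing totals; the 0.45 radius is taken from the step methods above, and a positive sparse reward on success is assumed):

import numpy as np
import gymnasium as gym

env = gym.make("AntMaze_UMaze-v3", continuing_task=True, reward_type="sparse")
obs, _ = env.reset(seed=0)

for _ in range(100_000):
    # Capture the goal before stepping; compute_terminated() may move it.
    goal_before = obs["desired_goal"].copy()
    obs, rew, terminated, truncated, _ = env.step(env.action_space.sample())
    if np.linalg.norm(obs["achieved_goal"] - goal_before) <= 0.45:
        # A successful step should earn the sparse reward, but with the
        # current ordering the reward is computed against the new goal.
        assert rew > 0
    if terminated or truncated:
        obs, _ = env.reset()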

@Kallinteris-Andreas (Collaborator) commented Jun 16, 2023

  1. I would say it is worrying that compute_terminated is not a pure function. It should be refactored into two functions: compute_terminated (which then only computes whether the episode has terminated) and update_goal; see the sketch below.

These changes would require a new revision ("AntMaze_UMaze-v4").
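A minimal sketch of that split, assuming the current behaviour of compute_terminated (names and thresholds are illustrative, not the library's actual API):

def compute_terminated(self, achieved_goal, desired_goal, info) -> bool:
    # Pure predicate: no side effects, just the success check.
    if not self.continuing_task:
        return bool(np.linalg.norm(achieved_goal - desired_goal) <= 0.45)
    # Continuing tasks never terminate; the goal is relocated instead.
    return False

def update_goal(self, achieved_goal) -> None:
    # Side-effecting goal relocation, previously hidden inside
    # compute_terminated(): once the agent reaches the goal, move it.
    if (
        self.continuing_task
        and np.linalg.norm(achieved_goal - self.goal) <= 0.45
    ):
        self.goal = self.generate_target_goal()  # hypothetical helper name

step() would then call compute_reward(), compute_terminated(), and finally update_goal(), in that order.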

@rodrigodelazcano

@Kallinteris-Andreas (Collaborator)

@alexdavey feel free to make a PR
