
[Question] Algorithm / Parameters for Ant Maze #193

Open
meppe opened this issue Dec 10, 2023 · 5 comments

@meppe

meppe commented Dec 10, 2023

Question

I have tried hard to train an agent to solve any of the AntMaze environments. I tried the Stable Baselines3 implementations of SAC (with dense and sparse rewards) and PPO, but could not solve even a small open AntMaze environment. I tried random goal and start positions as well as fixed positions. Has anyone successfully trained an agent on one of the maze envs so far? If so, which algorithm and parameters were used, and which variant of the environment?

What happens a lot in my case is that the Ant flips upside down. It cannot walk on its "elbows" in that position, so it remains stuck.

Thanks a lot!
Manfred (Hamburg University of Technology, Germany)

@Kallinteris-Andreas
Collaborator

Hey,

It sounds like your agent is incapable of locomotion. Can you solve PointMaze? Can you get the Ant environment to locomote by itself?

@rodrigodelazcano may know which algorithms can solve it.
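
As a quick sanity check for the locomotion part, something like this should learn to walk within a few hundred thousand steps (a minimal sketch, assuming Stable Baselines3 and the plain Gymnasium Ant-v4 environment, not the maze):

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Plain Ant locomotion, no maze: if this does not learn to walk,
# the problem is upstream of the maze task itself.
env = gym.make("Ant-v4")

model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
```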

@meppe
Author

meppe commented Dec 21, 2023

Hi,
yes, PointMaze works really well.

However, I have to add that I changed the reward a bit. I remember that SAC with sparse rewards and HER worked really well with the Ant4Rooms environment from the hierarchical actor-critic paper by Levy et al.: https://www.youtube.com/watch?v=TY3gr4SRmPk&ab_channel=AndrewLevy
That setup was originally for hierarchical actor-critic, but it also worked fine without hierarchical approaches. There, the reward was -1 (goal not reached) or 0 (goal reached).

After I found that your original version of AntMaze with its original reward function did not work with SAC, regardless of whether the rewards were sparse or dense, I tried the [-1, 0] reward, but without improvement. I also played around with the continuing_task and reset_target parameters, etc., with no success in any configuration. I usually stop training after 500k steps, which should be more than sufficient.
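
For reference, this is roughly the kind of setup I have been running (a minimal sketch, assuming Stable Baselines3's SAC with HerReplayBuffer; the AntMaze_UMaze-v4 id and the flag values are just one of the configurations I tried):

```python
import gymnasium as gym
import gymnasium_robotics  # importing this registers the AntMaze environments
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make(
    "AntMaze_UMaze-v4",
    reward_type="sparse",   # binary goal-reached reward
    continuing_task=True,   # one of the flags I varied
    reset_target=False,     # the other flag I varied
)

model = SAC(
    "MultiInputPolicy",     # dict observation: observation / achieved_goal / desired_goal
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    verbose=1,
)
model.learn(total_timesteps=500_000)
```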

My team's general goal is to add a working config for AntMaze to our Scilab-RL framework you can find here: https://scilab-rl.github.io/Scilab-RL/

Any hint is appreciated!
Best, Manfred

@meppe
Author

meppe commented Dec 21, 2023

Oh, and yes, the locomotion itself works. At least sometimes, and only until it flips over.
Here are some videos:
https://api.wandb.ai/links/manfred-eppe/avqvho1o

@Kallinteris-Andreas
Collaborator

Hey, I am familiar with this trained locomotion behavior of the Ant model from the Gymnasium/MuJoCo/Ant environment (it also happens with other quadruped robots, from my testing), where:

  • the Ant falls over
  • the Ant jumps really high and fails to land back on its feet

This was addressed in the Gymnasium/MuJoCo/Ant environment by adding the healthy_z_range check, which penalizes the agent (it stops receiving the per-step healthy reward) and terminates the episode when the ant falls over or jumps really high.
https://gymnasium.farama.org/main/environments/mujoco/ant/#termination

You could try implementing that by creating a wrapper or forking the environment class.
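
A minimal wrapper sketch of that idea (not the actual Gymnasium implementation; it assumes the maze env exposes the underlying Ant's MuJoCo data as unwrapped.ant_env, which may differ between gymnasium-robotics versions):

```python
import gymnasium as gym


class HealthyZTermination(gym.Wrapper):
    """Mimic the healthy_z_range check of Gymnasium's Ant: end the episode
    (with a small penalty) when the torso height leaves the healthy range."""

    def __init__(self, env, healthy_z_range=(0.2, 1.0), penalty=-1.0):
        super().__init__(env)
        self.healthy_z_range = healthy_z_range
        self.penalty = penalty

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Assumption: the AntMaze env keeps the underlying Ant as `ant_env`;
        # qpos[2] is the torso z-coordinate in the Ant model.
        z = self.env.unwrapped.ant_env.data.qpos[2]
        min_z, max_z = self.healthy_z_range
        if not (min_z <= z <= max_z):
            reward += self.penalty
            terminated = True
        return obs, reward, terminated, truncated, info
```

Wrap the env before handing it to the learner, e.g. env = HealthyZTermination(gym.make("AntMaze_Open-v4")).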

@Kallinteris-Andreas
Collaborator

You could also check Minari's datasets.
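
Something along these lines (the dataset id is illustrative; minari.list_remote_datasets() shows what is actually available):

```python
import minari

# The dataset id below is only an example; check minari.list_remote_datasets().
minari.download_dataset("antmaze-umaze-v0")
dataset = minari.load_dataset("antmaze-umaze-v0")
print(dataset.total_episodes, "episodes,", dataset.total_steps, "steps")

# Episodes can be replayed for inspection or used for offline RL / imitation learning.
for episode in dataset.iterate_episodes():
    print(episode.rewards.sum())
    break
```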
