
[Feature Request] Add Maze-like env #2

Open
carlosluis opened this issue Dec 5, 2023 · 6 comments
Labels
enhancement (New feature or request) · good first issue (Good for newcomers)

Comments


carlosluis commented Dec 5, 2023

Hi!

Awesome job on the repo!

Feel free to ignore this request if it's not part of your roadmap. It's more of a suggestion to have other types of exploration tasks.

There's partial code in Farama-Foundation/Minigrid#317 to generate feasible mazes (with a unique direct path to the goal, I believe) based on MiniGrid envs. Taking that code, I was able to generate envs such as these:
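For reference, here is a self-contained sketch of the kind of generator that PR implements: an iterative depth-first backtracker that carves a "perfect" maze, i.e., one with a unique path between any two cells. The function name and grid encoding are illustrative, not taken from that PR.

```python
import random

def generate_maze(width, height, seed=0):
    """Carve a 'perfect' maze (unique path between any two cells) with
    iterative depth-first backtracking. Walls are 1, passages are 0.
    The grid is (2*height+1) x (2*width+1) so walls occupy their own cells."""
    rng = random.Random(seed)
    grid = [[1] * (2 * width + 1) for _ in range(2 * height + 1)]
    stack = [(0, 0)]
    visited = {(0, 0)}
    grid[1][1] = 0  # open the starting cell
    while stack:
        x, y = stack[-1]
        # unvisited orthogonal neighbours of the current cell
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < width and 0 <= y + dy < height
                     and (x + dx, y + dy) not in visited]
        if neighbors:
            nx, ny = rng.choice(neighbors)
            # knock down the wall between (x, y) and (nx, ny), open the cell
            grid[y + ny + 1][x + nx + 1] = 0
            grid[2 * ny + 1][2 * nx + 1] = 0
            visited.add((nx, ny))
            stack.append((nx, ny))
        else:
            stack.pop()  # dead end: backtrack
    return grid
```

Because each newly visited cell carves exactly one wall, the result is a spanning tree over the cells, which is what guarantees the unique path.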

I thought it might make for an interesting meta-RL exploration benchmark, i.e., can your algorithm learn to exhaustively explore the maze until it finds the goal? In principle it might not be much different from exploring an open-space grid, but who knows! Maybe the more constrained state space might even accelerate (or slow down) training progress.

Cheers!

@Howuhh added the enhancement and good first issue labels on Dec 5, 2023

Howuhh commented Dec 5, 2023

Hi @carlosluis! This is actually a very important suggestion, and we plan to add procedural generation in some form sooner or later anyway. However, in our experience (and this is actually one of the reasons it's still not there), procedural map generation is quite difficult to express in an efficient, jit-compatible way (e.g. recursive maze generation algorithms). There are some successful examples, though, for example in Jumanji or in minimax.
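As an illustration of the jit-compatibility point: some maze algorithms need no recursion or data-dependent control flow at all. The "binary tree" algorithm below is a NumPy sketch (illustrative, not from the repo); each cell independently carves one passage with fixed-shape array ops, so the same code would port almost verbatim to jax.numpy under jit.

```python
import numpy as np

def binary_tree_maze(width, height, seed=0):
    """'Binary tree' maze generation: every cell independently carves a
    passage either west or north (edge cells have no choice). No recursion,
    no data-dependent shapes, so the same ops work in jax.numpy under jit."""
    rng = np.random.default_rng(seed)
    coin = rng.random((height, width)) < 0.5          # True -> prefer west
    grid = np.ones((2 * height + 1, 2 * width + 1), dtype=np.uint8)
    ys, xs = np.mgrid[0:height, 0:width]
    grid[2 * ys + 1, 2 * xs + 1] = 0                  # open every cell
    # top row must carve west, left column must carve north,
    # top-left corner carves nothing
    carve_west = (coin | (ys == 0)) & (xs > 0)
    carve_north = ~carve_west & (ys > 0)
    grid[2 * ys + 1, 2 * xs] &= np.where(carve_west, 0, 1).astype(np.uint8)
    grid[2 * ys, 2 * xs + 1] &= np.where(carve_north, 0, 1).astype(np.uint8)
    return grid
```

Each cell except the corner carves exactly one wall, so this is also a perfect maze (a spanning tree), just with a strong diagonal bias; it trades maze quality for trivial jit-compatibility.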

We're unfortunately unlikely to do this anytime soon (it's planned for post v1.0, ~2-3 months), as we're currently busy getting XLand-MiniGrid to a full paper and focused on the meta-RL part (benchmarks). But we welcome any contributions: grid randomization will definitely add new challenges to meta-learning, and it would also allow porting the procedural multi-room envs from the original MiniGrid. Thus, it's a highly valuable addition.

P.S. I think maze exploration alone is not a meta-RL problem, since a new maze can be solved zero-shot without any adaptation, only generalization (as in ProcGen).

@alexunderch

Maybe it's worth adding some simple procedural generation algorithm to test the concept; it might not be that hard. JAX can be paired with recursive algorithms (for tree search, for example), and a simple example could be a way to start.

Sounds promising 🤗


Howuhh commented Dec 5, 2023

There's another problem at the moment: the agent can see through walls 🥲! Unfortunately, a naive port of the FOV algorithm from MiniGrid slows things down too much (it is available in the current version, but disabled). We haven't come up with a replacement yet, although we've tried different things (like simple ray casting). Without it, I think a maze would be easy enough to solve. We're open to any suggestions/help on this! For now we just reduce the FOV size in most cases to make things a bit harder.
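For context, a minimal sketch of the naive ray-casting idea mentioned above (not the repo's implementation): for each cell in range it walks the straight line back to the agent, which is roughly O(radius^3) work per observation and hints at why a naive port is slow.

```python
import numpy as np

def visible_mask(grid, ay, ax, radius):
    """Naive grid ray casting: a cell is visible if the straight line from
    the agent (ay, ax) to it crosses no wall cell (1 = wall). The wall cell
    itself stays visible. Pure-Python loops: a cost illustration, not a fix."""
    h, w = grid.shape
    vis = np.zeros_like(grid, dtype=bool)
    for ty in range(max(0, ay - radius), min(h, ay + radius + 1)):
        for tx in range(max(0, ax - radius), min(w, ax + radius + 1)):
            steps = max(abs(ty - ay), abs(tx - ax), 1)
            blocked = False
            for s in range(1, steps):  # sample interior points along the ray
                y = round(ay + (ty - ay) * s / steps)
                x = round(ax + (tx - ax) * s / steps)
                if grid[y, x] == 1:
                    blocked = True
                    break
            if not blocked:
                vis[ty, tx] = True
    return vis
```

A jit-friendly variant would have to replace the early-exit inner loop with a fixed-length scan and a running "blocked" flag, which is exactly where the overhead comes from.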

@carlosluis (Author)

Thank you all for having a look at this so quickly!

I understand the challenges of jitting procedural generation algorithms, but why not start simple and take maze generation outside of jit? Basically, pre-generate a bunch of mazes on initialization and then sample from this list whenever the meta-RL algorithm asks for a new task. Maybe I'm being naive here and missing a key detail of why this wouldn't work.
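The suggested scheme could look like the sketch below (hypothetical names; random grids stand in for real mazes): generation happens eagerly in Python, and task sampling reduces to an integer index into a static array, which is exactly the kind of operation that is trivially jit-compatible in JAX (`bank[idx]` with `bank` a device array and `jax.random.randint` for the index).

```python
import numpy as np

# Pre-generate a fixed bank of maps once, outside any jit. Any Python-side
# generator works here; random wall grids stand in for real mazes.
rng = np.random.default_rng(0)
NUM_MAZES, H, W = 1024, 9, 9
bank = (rng.random((NUM_MAZES, H, W)) < 0.3).astype(np.uint8)

def sample_task(step_rng):
    """Sampling a new task is just an array lookup by random index.
    In JAX this would be jax.random.randint(key, (), 0, NUM_MAZES)
    followed by bank[idx], both fine under jit."""
    idx = step_rng.integers(NUM_MAZES)
    return bank[idx]

maze = sample_task(np.random.default_rng(42))
```

The trade-off, as discussed below, is that the whole bank has to live in memory somewhere (GPU or CPU).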

@Howuhh re: why maze exploration may or may not be a good benchmark for meta-RL.
I agree that normally this is a test of generalization rather than adaptation, but the line between the two can be quite fuzzy. You're right that a new maze requires no adaptation, but it does require following a "good" exploration strategy, i.e., exhaustively searching for the goal location. Once the goal is found, the task is completely identified, and from then on the agent should take the shortest path to the now-known goal location. The benchmark would then test whether meta-RL algorithms can learn this exploration behavior during meta-training under very sparse rewards, i.e., you meta-learn the generalization. From this perspective, I do see value in testing meta-RL on such benchmarks!

Happy to hear your thoughts and arguments here though, I think it's an interesting discussion without a clear right/wrong answer.


Howuhh commented Dec 6, 2023

Maybe I'm being naive here and missing a key detail of why this wouldn't work.

There are actually two reasons why I haven't already done this: first, the inconvenience of having to store and download the maps separately in addition to the benchmarks; second, a million maps in uint8 can take up a lot of GPU memory (height x width x 2 channels x 1 byte x 1M ≈ at least 0.5 GB). That's actually quite a lot, as GPU memory is highly valuable. We could store them on the CPU, though additional FPS benchmarks would be needed for that case; maybe the overhead is low.
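The estimate checks out as back-of-envelope arithmetic, assuming (hypothetically) 16x16 grids with two uint8 channels per cell:

```python
# Memory footprint of a pre-generated map bank:
# num_maps * height * width * channels * bytes_per_uint8
num_maps = 1_000_000
height = width = 16          # plausible grid size (assumption)
channels, bytes_per = 2, 1   # e.g. tile type + color, each uint8
total = num_maps * height * width * channels * bytes_per
print(total / 2**30)         # ~0.48 GiB
```

Larger grids scale quadratically, so 100x100 maps would already be ~19 GiB per million.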

But it's probably the only way. I'll see if I can get it done in time alongside the main roadmap.

@carlosluis (Author)

I see, that makes it inconvenient, I agree! Also, an appropriate sample size would depend on the size of the maze: maybe 1M maps is overkill for 10x10 mazes but insufficient for 100x100 mazes. It's hard to tell a priori what a good value would be, although I believe you can get a lot of signal about the effectiveness of meta-RL exploration with relatively small mazes.
