[Proposal] Environment API changes #32

Open
DavidSlayback opened this issue Aug 29, 2022 · 3 comments

Comments

@DavidSlayback
Contributor

Hey, first off, I love this project and the general idea of defining environments in JAX so that they can be easily batched and integrated into RL training loops!

I tend to do a lot of work with POMDPs and have built a few branches in my own fork that implement various POMDP environments. It works fine for my purposes, but I've run into a couple of instances where I just ignore the base Environment API.

Specifically:

  1. In typical POMDP formulations, the observation o is a function of state AND action (i.e., o ~ O(s,a)), but currently get_obs() only uses state
  2. Similarly, there are many instances where we need the state AND/OR action to determine if the episode is done (e.g., in the Tiger problem, it ends when you open a door, but the state is just which door the tiger is behind). is_terminal() and discount() only use state
  3. Finally, in the same way that your FourRooms example has noisy actions, there are many environments where a noisy observation is core to the environment, requiring get_obs to also have an RNG key

Obviously I'm just overriding the methods with the extra arguments as needed for my own environments, but some of this might be common enough to justify a different base API?
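As a rough illustration of the three points above, here is a minimal JAX sketch of the Tiger problem with the extended signatures. The action encoding, the `accuracy` parameter, and the "null" observation value are assumptions made for this example; none of this is the existing gymnax API.

```python
import jax
import jax.numpy as jnp

# Hypothetical action encoding for the Tiger problem (not part of gymnax).
LISTEN, OPEN_LEFT, OPEN_RIGHT = 0, 1, 2


def get_obs(state, action, key, accuracy=0.85):
    """Sample o ~ O(s, a): a noisy hint about the tiger's door after listening."""
    # `accuracy` is an assumed listening accuracy, chosen for illustration.
    correct = jax.random.bernoulli(key, accuracy)
    heard = jnp.where(correct, state, 1 - state)  # 0 = left, 1 = right
    # Opening a door gives no information; 2 is used as a "null" observation.
    return jnp.where(action == LISTEN, heard, 2)


def is_terminal(state, action):
    """The episode ends when a door is opened, whatever the hidden state is."""
    return action != LISTEN


key = jax.random.PRNGKey(0)
state = jnp.array(1)  # tiger behind the right door
obs = get_obs(state, jnp.array(LISTEN), key)
done = is_terminal(state, jnp.array(OPEN_LEFT))
```

Because both functions are pure and take the key explicitly, they can be batched with `jax.vmap` over states, actions, and keys.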

@RobertTLange
Owner

Hi @DavidSlayback! Thank you so much for raising this, and for the suggestions and the kind words. Please excuse the late response. These are all good and valid points. What POMDP environments have you been working with? I am generally open to adapting the API and adding more (somewhat classic) environments to gymnax. It is a bit of a fine line, since I also want to avoid bloating the package, and each new env will require some testing against a NumPy version. With regard to your three points:

  1. Yes, that sounds reasonable. It would require a small modification to each environment (e.g., adding an optional action input to get_obs and is_terminal). Alternatively, one could absorb the action into the state, but I think treating it as a separate input is cleaner.
  2. See 1.
  3. Sounds good to me as well. I am also planning on adding more general wrappers for sticky actions and termination handling at some point.
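One way the "optional action input" idea could look is sketched below. `SketchEnv`, `DoorEnv`, and the `step` signature are hypothetical names invented for this example and are not the gymnax base class; the point is only that defaulting `action` and `key` to `None` keeps state-only environments working unchanged.

```python
import jax
import jax.numpy as jnp


class SketchEnv:
    """Hypothetical base class (not gymnax's Environment)."""

    def get_obs(self, state, action=None, key=None):
        # Default behaviour: the observation depends on the state only.
        return state

    def is_terminal(self, state, action=None):
        # Default behaviour: termination depends on the state only.
        return state >= 10.0

    def step(self, key, state, action):
        key_obs, _ = jax.random.split(key)
        next_state = state + action
        done = self.is_terminal(next_state, action)
        obs = self.get_obs(next_state, action, key_obs)
        return obs, next_state, done


class DoorEnv(SketchEnv):
    """Overrides both hooks to use the extra arguments."""

    def get_obs(self, state, action=None, key=None):
        # Noisy observation when a key is supplied, exact otherwise.
        noise = 0.0 if key is None else 0.1 * jax.random.normal(key)
        return state + noise

    def is_terminal(self, state, action=None):
        base = super().is_terminal(state)
        if action is None:  # resolved in Python, so it is safe under jit
            return base
        # Assumed rule for illustration: action 0 always ends the episode.
        return jnp.logical_or(base, action == 0)


env = DoorEnv()
obs, next_state, done = env.step(jax.random.PRNGKey(0), jnp.array(1.0), jnp.array(0))
```

The `None` checks happen at trace time rather than on traced values, so existing call sites such as `env.get_obs(state)` would not need to change.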

My hands are currently tied up with my internship, but feel free to open a PR! I would be happy to merge it in if all the unit tests pass. Also feel free to open PRs for your environments. I would love to see what you have been brewing up.

@DavidSlayback
Contributor Author

Most of my POMDPs are the classic ones from POMDP literature as seen here. Things like Tiger, HeavenHell, RockSample, Hallway. I also have a few of my own, like a multistory FourRooms variant (with various observation functions), partially-observable Taxi, some modifications of continuous control domains, etc. I'm specifically interested in ones that require extremely long-term memory and reasoning.

I'll definitely look at doing some PRs for some of the more "classic" ones then! For already-implemented environments, I think an option to choose between different observation functions at environment creation (like you already have in your FourRooms domain) could be a good way to expand some of these without growing the repository too much.
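A minimal sketch of selecting the observation function at creation time might look like the following. The class name `GridSketch`, the `obs_type` argument, and the two observation functions are assumptions for illustration and do not reflect how the existing FourRooms environment is implemented.

```python
import jax
import jax.numpy as jnp


def full_obs(state, action, key):
    """Fully observable: return the agent position unchanged."""
    return state


def noisy_obs(state, action, key):
    """Partially observable: position corrupted by Gaussian noise."""
    return state + 0.1 * jax.random.normal(key, state.shape)


# Hypothetical registry of observation functions sharing one signature.
OBS_FNS = {"full": full_obs, "noisy": noisy_obs}


class GridSketch:
    """Hypothetical environment whose observation model is chosen on creation."""

    def __init__(self, obs_type="full"):
        self.obs_fn = OBS_FNS[obs_type]

    def get_obs(self, state, action, key):
        return self.obs_fn(state, action, key)


key = jax.random.PRNGKey(0)
state = jnp.array([1.0, 2.0])
exact = GridSketch("full").get_obs(state, None, key)
noisy = GridSketch("noisy").get_obs(state, None, key)
```

Keeping one `(state, action, key)` signature for every observation function means a single environment class can cover several POMDP variants.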

@carlosgmartin

pgx and OpenSpiel put the observation, action mask, reward, discount, current player (for sequential games), etc. in the state. That corresponds to o = env.step(s, a, key).observation.
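The "everything lives in the state" pattern described above could be sketched as follows. The `State` fields, the Tiger dynamics, and the reward values (the conventional -1 / -100 / +10) are assumptions chosen for this example; this is not the actual pgx or OpenSpiel API.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    """Hypothetical state that carries the outputs of the last step."""
    tiger: jnp.ndarray        # hidden: 0 = left door, 1 = right door
    observation: jnp.ndarray  # 0/1 = heard left/right, 2 = null
    reward: jnp.ndarray
    terminated: jnp.ndarray


def step(state, action, key):
    """Action 0 listens; actions 1 and 2 open the left and right door."""
    listened = action == 0
    correct = jax.random.bernoulli(key, 0.85)  # assumed listening accuracy
    heard = jnp.where(correct, state.tiger, 1 - state.tiger)
    observation = jnp.where(listened, heard, 2)
    opened_door = action - 1
    open_reward = jnp.where(opened_door == state.tiger, -100.0, 10.0)
    reward = jnp.where(listened, -1.0, open_reward)
    return State(state.tiger, observation, reward, jnp.logical_not(listened))


key = jax.random.PRNGKey(0)
s0 = State(jnp.array(1), jnp.array(2), jnp.array(0.0), jnp.array(False))
o = step(s0, jnp.array(0), key).observation  # o = env.step(s, a, key).observation
```

Since `step` sees both the previous state and the action, an action-dependent observation and termination flag fall out naturally, at the cost of a larger state object.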
