[Question] action_space as gym.spaces.Dict is supported? Is it going to be supported? #731
Comments
Hello,
It is not planned for now, but you can have a look at projects that implement it. Duplicate of #527. You can probably take a look at https://github.com/minerllabs/basalt_competition_baseline_submissions/blob/master/basalt_utils/src/basalt_utils/sb3_compat/distributions.py#L9
Hello Antonin. A number of environments implemented in Godot RL Agents include independent discrete and continuous action spaces, represented as a Gym Dict with nested Box and Discrete spaces. I was curious whether you have any ideas about how to handle this use case? Thanks
As I wrote, you can take a look at https://github.com/minerllabs/basalt_competition_baseline_submissions/blob/master/basalt_utils/src/basalt_utils/sb3_compat/distributions.py#L9 to combine different distributions. If you have only discrete and continuous actions, you could also hack it quickly by transforming the discrete actions into continuous ones (or the other way around: discretize the continuous actions).
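The "transform the discrete actions to continuous ones" hack could be sketched as a small action wrapper. This is only an illustration, not the linked implementation; the `DictToBoxAction` class and the `"discrete"`/`"continuous"` keys are made up for the example:

```python
import numpy as np
import gym
from gym import spaces


class DictToBoxAction(gym.ActionWrapper):
    """Sketch of the hack above: expose a Dict(Discrete, Box) action space
    as a single flat Box so algorithms without Dict support can train on it.

    The first n entries of the flat action are treated as logits for the
    Discrete part (argmax picks the choice); the remaining entries are
    passed through unchanged as the Box part.
    Assumes the (hypothetical) keys "discrete" and "continuous".
    """

    def __init__(self, env):
        super().__init__(env)
        assert isinstance(env.action_space, spaces.Dict)
        self.n = env.action_space["discrete"].n
        box = env.action_space["continuous"]
        low = np.concatenate([-np.ones(self.n, dtype=np.float32), box.low])
        high = np.concatenate([np.ones(self.n, dtype=np.float32), box.high])
        self.action_space = spaces.Box(low=low, high=high, dtype=np.float32)

    def action(self, act):
        # Split the flat vector back into the original Dict action.
        return {
            "discrete": int(np.argmax(act[: self.n])),
            "continuous": act[self.n:],
        }
```

With this wrapper in place, any continuous-action algorithm can be used on the wrapped env; the trade-off is that the argmax over logits is not differentiable and discards the structure a proper joint distribution would exploit.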
I think this adaptation should be taken into account at some point. Sometimes you have to choose an entity (Discrete) and then an action to perform with it (Binary, Discrete, Continuous, ...). The clearest example is OpenAI Five, the agent that beat Dota 2. Of course you can turn the combinations of some of those actions into a MultiDiscrete action space, but I have an environment where taking that combination yields a Discrete space of 145k actions instead of a Dict action space. RLlib has this implemented and, as you said, there are some implementations available. Is there any strong reason for this not being on the roadmap? I think the library could be much more versatile with this.
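The combinatorial blow-up described above is easy to make concrete. The sizes below (145 entities, 1000 actions) are illustrative, chosen so the product matches the ~145k figure in the comment:

```python
from gym import spaces

# Hypothetical entity/action decomposition, sizes chosen for illustration:
# pick one of 145 entities, then one of 1000 actions to perform with it.
entity = spaces.Discrete(145)
action = spaces.Discrete(1000)

# As a Dict, a policy needs two small output heads: 145 + 1000 = 1145 logits.
dict_space = spaces.Dict({"entity": entity, "action": action})

# Flattened into a single Discrete, the same choice becomes one head with
# 145 * 1000 = 145000 logits, one per (entity, action) combination.
flat_space = spaces.Discrete(entity.n * action.n)
```

This is why a Dict (or factored) action space scales so much better than enumerating every combination: the number of outputs grows additively per sub-space instead of multiplicatively.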
Question
I have seen that, little by little, the algorithms are being adapted to accept dictionary-type observations, as specified here, but I have not seen any reference to algorithms that apply the same strategy to the action space. Is it possible to do this with SB3, or do you plan to work in this direction?
Additional context
I am trying to assess SB3 as an option for the grid2op environments of the L2RPN competition. These environments can be translated into Gym environments, but both the action space and the observation space are dictionary-typed. More information can be found here.
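For the observation side of a Dict-typed environment, one generic workaround is to flatten the Dict space into a single Box using gym's own `gym.spaces.utils` helpers. This is a minimal sketch of that idea, not grid2op's actual converters; the `FlattenDictObs` class name is made up for the example:

```python
import gym
from gym.spaces.utils import flatten, flatten_space


class FlattenDictObs(gym.ObservationWrapper):
    """Minimal sketch: flatten a Dict observation space into one Box
    (Discrete sub-spaces become one-hot vectors, Boxes are concatenated)
    so algorithms without Dict support can consume the observations.
    """

    def __init__(self, env):
        super().__init__(env)
        # flatten_space(Dict(...)) returns a single flat Box.
        self.observation_space = flatten_space(env.observation_space)

    def observation(self, obs):
        # Flatten each incoming Dict observation to match the flat space.
        return flatten(self.env.observation_space, obs)
```

The cost of this workaround is that the flattened vector loses the per-key structure that SB3's Dict-observation support (multi-input feature extractors) is designed to exploit.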