Standard action space for DiscreteSpace #23

Open
darsnack opened this issue Apr 23, 2019 · 4 comments

Comments

@darsnack
Member

darsnack commented Apr 23, 2019

Currently, the DiscreteSpace is defined as {1, ..., n} (as it should be), but the lines in CartPole.jl that map {1, 2} --> {-1, 1} are commented out. Additionally, the assertion is commented out. Is there a reason for this? Someone has already written the code to transfer the step! logic to a {1, ..., n} action space, so why aren't we using it?

If there is a reason, can we settle what the standard action space should be?
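For reference, a minimal sketch of the kind of mapping and assertion in question. The name `discrete_to_force` and the `force_mag` keyword are hypothetical and are not taken from CartPole.jl:

```julia
# Hypothetical helper: map a discrete action in {1, 2} to a signed force.
# `force_mag` is an assumed parameter, not a name from CartPole.jl.
function discrete_to_force(action::Integer; force_mag::Float32 = 1f0)
    # Reject anything outside the DiscreteSpace {1, 2}.
    @assert action in 1:2 "action must be 1 or 2, got $action"
    # {1, 2} --> {-1, 1}, scaled by the force magnitude.
    return (2 * action - 3) * force_mag
end
```

With this, `discrete_to_force(1)` pushes in the negative direction and `discrete_to_force(2)` in the positive direction.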

@tejank10
Contributor

tejank10 commented Apr 28, 2019

Hi @darsnack! With a discrete space, the environment is not differentiable, because we extract the index and pass it to step!. Mapping {1, 2} --> {-1, 1} is just a hack we found to turn CartPole's action space into a continuous one. In the long term, we would want to be able to use a discrete space and still keep it differentiable; a hack to map {1, ..., n} --> some continuous space would also be helpful.

@darsnack
Member Author

I think logically, a discrete to continuous mapping would be {1, ..., n} --> [1.0, n]. Beyond that, I think it is unique to each environment. For example, in CartPole, we would have the standard mapping {1, 2} --> [1.0, 2.0], then CartPole would calculate force = 2f0 * (continuous_action - 1f0) - 1f0. Is this along the lines you are thinking?
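A rough sketch of that two-stage idea, assuming hypothetical helper names (`to_continuous`, `cartpole_force`) that do not exist in the package:

```julia
# Hypothetical helpers sketching the two-stage mapping described above.

# Stage 1 (generic): embed a discrete action in {1, ..., n} into [1.0, n].
to_continuous(action::Integer) = Float32(action)

# Stage 2 (CartPole-specific): rescale [1.0, 2.0] onto [-1.0, 1.0].
cartpole_force(continuous_action::Real) = 2f0 * (continuous_action - 1f0) - 1f0
```

Here action 1 ends up at the lower end of the force range, action 2 at the upper end, and the midpoint 1.5 corresponds to zero force.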

@tejank10
Contributor

tejank10 commented Apr 29, 2019

Yeah, right, it depends on the environment. Ideally, I would like to keep an environment's discrete action space as it is and introduce a black box between the model and step! that would take the gradient and pass it through the index that the action value came from.
The hack you proposed should also work. A continuous action space runs from -inf to inf, and negative and positive values are equally likely, which makes it a natural target for a discrete space of size 2. By mapping {1, 2} --> [1.0, 2.0], I assume we would shift the origin to 1.5, so that anything below it is rounded to 1 and anything above it to 2.
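The rounding rule described here could be sketched as follows. `to_discrete` is a hypothetical name, and sending a value of exactly 1.5 to action 2 is an assumption, since the comment does not say how ties are handled:

```julia
# Hypothetical helper: snap a continuous action back to {1, 2},
# using 1.5 as the decision boundary.
# Assumption: a value of exactly 1.5 is assigned to action 2.
to_discrete(continuous_action::Real) = continuous_action < 1.5 ? 1 : 2
```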

@darsnack
Member Author

Been thinking about this recently. Should we establish an experimental Zygote branch that uses custom adjoints to implement differentiable DiscreteSpaces?
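To make the idea concrete, here is a dependency-free sketch of what such an adjoint might compute: the forward pass selects the action value at a discrete index, and the backward pass routes the incoming gradient to that same index. The name `select_with_pullback` is hypothetical, and this is only one possible reading of the "black box" idea discussed above. In a Zygote branch, a rule like this would presumably be registered through `Zygote.@adjoint` rather than called by hand.

```julia
# Hypothetical sketch of a hand-written adjoint for index selection.
# Returns the selected value and a pullback closure.
function select_with_pullback(values::AbstractVector{<:Real}, index::Integer)
    selected = values[index]
    # Backward pass: the gradient flows only to the entry that was selected;
    # every other entry receives zero.
    function pullback(upstream_grad)
        grad = zeros(float(eltype(values)), length(values))
        grad[index] = upstream_grad
        return grad
    end
    return selected, pullback
end
```

For example, selecting index 2 from a two-element vector of action values yields a pullback that places the whole upstream gradient in the second slot.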


2 participants