
[Question] Flatten Discrete box potentially problematic #102

Closed
rusu24edward opened this issue Oct 31, 2022 · 17 comments · Fixed by #164
Labels
question Further information is requested

Comments

@rusu24edward
Contributor

Question

Flattening a Discrete space to a Box space may be problematic. The flatten wrapper converts Discrete to Box as a one-hot encoding. Suppose the original space is Discrete(3); then:

0 maps to [1, 0, 0]
1 maps to [0, 1, 0]
2 maps to [0, 0, 1]
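
For illustration, a quick way to reproduce this mapping with gym's own utilities (output shown as comments; exact dtype and formatting may vary by gym version):

from gym.spaces import Discrete
from gym.spaces.utils import flatten

space = Discrete(3)
for i in range(3):
    print(i, '->', flatten(space, i))
# 0 -> [1 0 0]
# 1 -> [0 1 0]
# 2 -> [0 0 1]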

When we sample the action space for random actions, it samples the Box, which can produce any of the eight combinations of 0s and 1s in a three-element array, namely:

[0, 0, 0],
[0, 0, 1], *
[0, 1, 0], *
[0, 1, 1],
[1, 0, 0], *
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]

Only the three of these eight that I’ve starred are usable in the strict sense of the mapping. The unflatten function for a Discrete space uses np.nonzero(x)[0][0]; here’s a table of what the above arrays map to:

+ ------------------ + ---------------- + --------------------------------------------- +
| In Flattened Space | np.nonzero(x)[0] | np.nonzero(x)[0][0] (aka discrete equivalent) |
+ ------------------ + ---------------- + --------------------------------------------- +
| 0, 0, 0            | Error            | Error                                         |
| 0, 0, 1            | [2]              | 2                                             |
| 0, 1, 0            | [1]              | 1                                             |
| 0, 1, 1            | [1, 2]           | 1                                             |
| 1, 0, 0            | [0]              | 0                                             |
| 1, 0, 1            | [0, 2]           | 0                                             |
| 1, 1, 0            | [0, 1]           | 0                                             |
| 1, 1, 1            | [0, 1, 2]        | 0                                             |
+ ------------------ + ---------------- + --------------------------------------------- +
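
For reference, a minimal numpy sketch reproducing the table's middle and right columns:

import numpy as np

for bits in [(0, 0, 0), (0, 1, 1), (1, 1, 1)]:
    x = np.array(bits)
    idx = np.nonzero(x)[0]  # indices of the nonzero entries
    print(bits, '->', idx, '->', idx[0] if idx.size else 'Error')
# (0, 0, 0) -> [] -> Error   (unflatten raises IndexError here)
# (0, 1, 1) -> [1 2] -> 1
# (1, 1, 1) -> [0 1 2] -> 0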

Implications

Obviously, [0, 0, 0] will fail because there is no nonzero element.

More importantly, only one eighth of the random samples will map to 2, one fourth will map to 1, and one half will map to 0. This has important implications for exploration, especially if action 2 is the “correct” action throughout much of the simulation. I'm very curious why I have not seen this come up before. This kind of skew in the random sampling can significantly affect the way the algorithm explores and learns, and the problem is exacerbated when n is large in Discrete(n). Am I missing something here?

Solution

This is unique to Discrete spaces. Instead of mapping to a one-hot encoding, we could map to a Box of a single element with the appropriate range: Discrete(n) maps to Box(0, n-1, (1,), int) instead of Box(0, 1, (n,), int).
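
A hypothetical sketch of the proposed mapping (this is not gym's actual flatten_space; the function name is illustrative):

import numpy as np
from gym.spaces import Box, Discrete

def flatten_space_proposed(space: Discrete) -> Box:
    # Discrete(n) -> Box(0, n-1, (1,), int): every Box sample is then a
    # valid (wrapped) discrete action, and sampling stays uniform over n values.
    return Box(low=0, high=space.n - 1, shape=(1,), dtype=np.int64)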

@pseudo-rnd-thoughts
Member

Thanks for moving the PR to Gymnasium. Could you explain the context of the problem?

My initial idea for solving this was changing the flattened Discrete type to MultiBinary; however, that has the same issue.

Your solution doesn't seem to change anything; it just replaces Discrete with Box.

There are two options to solve this:

  1. Add a new space; however, it is not obvious how such a space would be used outside of this particular case.
  2. Do not guarantee flatten -> sample -> unflatten.

Do you have any other ideas?

@rusu24edward
Contributor Author

As for context, many RL algorithms utilize randomness in their process, such as filling a buffer with data based on random actions at the beginning of training, or taking random actions to explore. Some implementations use action_space.sample() to generate these random actions. If the space has been flattened via the flatten wrapper, then the random sampling will occur in the flattened space, and the unflatten process will have the results I explain above.

To clarify, my solution is not to just change Discrete to Box; this is what flatten already does. My suggestion is to set up the Box differently. Right now, it's set up like so:

Discrete(n) -> Box(0, 1, (n,))
sample() -> [0, 1, 0, 0, 1, 1, ...] # n-element vector, not one-hot encoded

I suggest changing it to this:

Discrete(n) -> Box(0, n-1, (1,), int)
sample() -> [k] # single-element vector, k between 0 and n-1

Why do this at all instead of just keeping it as a Discrete space? Frankly, I think that's a great question, and it's not clear to me why the current implementation flattens to an n-element array instead of using Discrete as is. It seems that the purpose is to represent it as a one-hot encoded vector, and if so, then that needs to be enforced in the resulting space. Otherwise, flattening Discrete should not do anything. I think this argument extends to MultiDiscrete and MultiBinary spaces.

As you say, it may not make sense to add a new space for a flattened Discrete (and MultiDiscrete) space since it would only be used in this one case. It may be okay not to guarantee that sampling from the flattened space is "path-independent", but that does present a "gotcha" that some developers (such as myself) may fall into when designing their RL frameworks.

@pseudo-rnd-thoughts
Member

My suggestion would be to require users to flatten their environments such that observations / actions are np.ndarray / int.
Then environments that already use np.ndarray would not need to be flattened.
I think this should avoid your issue.

@rusu24edward
Contributor Author

Mark, thank you for the thought you have given to this ticket so far. For the software I develop, I take the mindset that anything I provide to my users should work as expected. When something does not work as expected, I see it as either a bug or an opportunity to clarify expectations. I believe the question I have raised reveals a potential flaw in the design of gym's spaces. Based on the language used in the flatten wrapper, it seems that flattening a Discrete (and also MultiDiscrete) space should result in a one-hot-encoded vector, but the one-hot encoding is not enforced in the way that object is then used (since it has just been turned into a Box). Personally, I would classify this as a bug, and I would either fix it with some clever redesign or update the documentation to make it abundantly clear that the flatten wrapper is limited. Telling my users to pre-flatten their own spaces is not really an option for me. I like all the gym spaces, I want to support all of them, and I don't want to burden my users with implementation details that make their environments messier.

This discussion has helped inform me on how I should change my own flatten wrapper to work with gym spaces more cleanly. I hope that the gymnasium team can see it as an opportunity to look at the gym spaces from a new perspective and make changes as needed.

@pseudo-rnd-thoughts
Member

Thanks for the response. I could imagine adding a OneHot space; however, it feels inefficient to add.
Thinking about this, I can't see an elegant solution that both fixes the problem and maintains backward compatibility.

The best idea I have is Discrete -> MultiBinary and MultiDiscrete -> MultiBinary, where MultiBinary has a new parameter, one_hot, that enforces one-hot sampling.
There is a possible issue with multi-axis MultiDiscrete spaces that we would need to solve; we would just need to follow a similar method to the current flatten function.
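
A hypothetical sketch of what one-hot sampling could look like (not the real MultiBinary API; the helper name is illustrative):

import numpy as np

def sample_one_hot(n: int, rng: np.random.Generator) -> np.ndarray:
    # Pick one of n positions uniformly and set exactly that bit, so
    # every sample is a valid one-hot vector.
    out = np.zeros(n, dtype=np.int8)
    out[rng.integers(n)] = 1
    return out

print(sample_one_hot(3, np.random.default_rng()))  # e.g. [0 0 1]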

What do you think of this solution? If you are happy with it and @RedTachyon agrees (he might take a couple of days to respond, as he is currently at a conference), we can make a PR and include the change in v27.

@rusu24edward
Contributor Author

This sounds good to me. You may need to flatten to Box instead of MultiBinary in order to handle nested spaces. In fact, one could make the argument that Discrete itself really is just MultiBinary with one-hot encoding turned on.

@RedTachyon
Member

I'm not yet completely sure what we want to do here. I'd definitely be opposed to adding an extra space just for the right flattening, unless we completely change our philosophy with regards to the spaces.

If this one change makes it so that we can always ensure the reversibility of flatten -> unflatten, then it'd be tempting.

At the same time, the whole idea of flatten and unflatten is not necessarily well thought-out, since it was introduced in some OpenAI code and then received various updates which might or might not have followed the original intentions.

My interpretation of this functionality is more or less "I don't care what this fancy space is, I want a fixed sized float32 vector for my NN". In which case any reversibility fails due to the dtypes. But we don't actually do any automatic type conversion.

I also feel like flattening into one-hot is much better on the algorithmic side. If you want to "flatten" a discrete action, you definitely don't want to keep the ordinal structure -- i.e. action 1 isn't between action 0 and action 2 in any meaningful sense, so a one-hot embedding is likely to work much better.

@pseudo-rnd-thoughts
Member

@rusu24edward Would you be able to write a test that checks the basic idea: flatten a space -> sample from the flattened space -> unflatten the sample -> check that the result is contained within the original space?
If the only spaces that currently fail this are Discrete and MultiDiscrete (composite spaces might also fail due to containing Discrete), then could you implement the change I proposed?
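
A minimal sketch of such a round-trip check, assuming gym's flatten_space / unflatten utilities (the function name is illustrative):

from gym.spaces import Space
from gym.spaces.utils import flatten_space, unflatten

def roundtrip_contained(space: Space, n_trials: int = 100) -> bool:
    flat = flatten_space(space)
    for _ in range(n_trials):
        sample = flat.sample()
        try:
            if unflatten(space, sample) not in space:
                return False
        except IndexError:  # e.g. the all-zero "one-hot" vector for Discrete
            return False
    return True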

@rusu24edward
Contributor Author

In which case any reversibility fails due to the dtypes. But we don't actually do any automatic type conversion.

I don't understand this, can you say a bit more about it?

@pseudo-rnd-thoughts I will take a look

@rusu24edward
Contributor Author

Some issues right off the bat:

  1. Flattening MD and D to MB won't play nicely with a nested space. The end result would be a Box.
  2. So we should use a Box with a one-hot flag. However, this doesn't make sense for Boxes generically because there is no notion of a "null value" for the Box. You can have a box that spans from -8 to -2, and in that case, a one-hot encoding would set the unchosen elements to...?

My interpretation of this functionality is more or less "I don't care what this fancy space is, I want a fixed sized float32 vector for my NN".

Totally agree, and yet, is this even tenable? It's curious that this issue has not come up sooner in gym's history. Several RL libraries make heavy use of gym's spaces; perhaps they're just sampling more intelligently? I bring this up because I am curious to know how many libraries make use of gym's flatten. I asked RLlib specifically to see what they did. This issue should appear any time you unflatten the action output from a NN. Are users designing their NN with a softmax on some part of the output? Is anyone actually using the flatten and unflatten features? How are they getting around this problem?

I'm posting these questions since I think y'all would have a better idea of the context for flatten since you engage with the users regularly.

@RedTachyon
Member

I don't understand this, can you say a bit more about it?

Let's say you have an unknown space. It might be Discrete(2), it might be Dict(Discrete, Box, Dict(Tuple(Box, Box), Discrete, Box)). You don't care what the internal structure is, you just want to flatten it and feed it into a neural network - which is the case in like 90% of RL. To do this, after flattening you need to cast it to a common dtype, which in 99% of cases will be float32.

The moment you cast to float32, you lose reversibility if you try to do flatten+cast -> sample -> unflatten, because if you do this with Discrete(2) and sample 0.5, what do you "unflatten" it to?
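
A small sketch of the dtype problem (values illustrative):

import numpy as np
from gym.spaces import Discrete
from gym.spaces.utils import flatten, unflatten

# A flattened sample cast to float32, as it would be fed to a NN:
x = flatten(Discrete(2), 1).astype(np.float32)  # [0., 1.]
print(unflatten(Discrete(2), x))  # 1 -- this exact vector is still invertible

# But an arbitrary float32 vector, e.g. a raw NN output, has no principled inverse:
y = np.array([0.5, 0.5], dtype=np.float32)
print(unflatten(Discrete(2), y))  # 0 -- np.nonzero treats any nonzero float as set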

This issue should appear anytime you unflatten the action output from a NN.

I was not aware that this is something people would do, probably mainly because complex action spaces that would require it are rather rare. And if you use a space like this, you're probably better off writing your own utilities and definitely not using sample.

@rusu24edward
Contributor Author

Good point, definitely complicates the situation...

It seems like flatten was not meant to be followed up with a sample; one should sample the space, then flatten the sample. I'm not sure how this is maintained on the other side of the NN when the action comes out and is unflattened, but as you said, this is something that users would enforce themselves.

Maybe the solution here is to add some documentation guiding the users to sample before flattening and warning them about the shortcomings of unflattening?

@pseudo-rnd-thoughts
Member


It seems that it is not possible to unflatten a flattened sample.
We can add this to the documentation to note this fact, but I just realised: why are you doing this?
If you need to unflatten a flattened sample, can you not just sample from the original space?
Or, if you need a sample and its flattened version, can you not generate a normal sample and then flatten it?
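
For reference, the sample-then-flatten pattern being suggested here, as a sketch:

from gym.spaces import Discrete
from gym.spaces.utils import flatten

space = Discrete(3)
action = space.sample()               # uniform over {0, 1, 2}
flat_action = flatten(space, action)  # always a valid one-hot vector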

@rusu24edward
Contributor Author

Or if you need a sample and its flattened version, then can you not generate a normal sample and then flatten it?

Yes, I can do this. I didn't want the extra nuance, but it's fine.

If you need to unflatten a flattened sample, can you not just use sample from the original space?

If I'm just sampling, then yes, I can do everything in the original space. However, if it's output from a NN or some other function and it's in the flattened state, then no, I will need to unflatten it directly. And in this case, I'll need to design the NN or function to enforce the structure.
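
One way to enforce that structure, as a sketch: collapse the network's raw output to a valid one-hot vector before unflattening (the helper name is illustrative):

import numpy as np

def logits_to_one_hot(logits: np.ndarray) -> np.ndarray:
    # argmax picks the single most likely action, guaranteeing a
    # well-formed one-hot vector that unflatten can invert.
    one_hot = np.zeros_like(logits, dtype=np.int64)
    one_hot[np.argmax(logits)] = 1
    return one_hot

print(logits_to_one_hot(np.array([0.1, 2.3, -0.7])))  # [0 1 0]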

@rusu24edward
Contributor Author

Just for conciseness, here's an example script:

from gym.spaces import Discrete
from gym.spaces.utils import flatten_space, unflatten

def to_string(array):
    return ''.join(str(i) for i in array)

# Sampling the original Discrete space: uniform over the three actions.
discrete_space = Discrete(3)
discrete_sample_counter = {0: 0, 1: 0, 2: 0}
for _ in range(10000):
    sample = discrete_space.sample()
    discrete_sample_counter[sample] += 1

print(discrete_sample_counter)

# Sampling the flattened space: uniform over all eight binary vectors.
flattened_space = flatten_space(discrete_space)
flattened_sample_counter = {
    '000': 0,
    '001': 0,
    '010': 0,
    '011': 0,
    '100': 0,
    '101': 0,
    '110': 0,
    '111': 0
}
unflattened_sample_counter = {0: 0, 1: 0, 2: 0, 'error': 0}
for _ in range(10000):
    sample = flattened_space.sample()
    flattened_sample_counter[to_string(sample)] += 1

    try:
        unflattened_sample = unflatten(discrete_space, sample)
        unflattened_sample_counter[unflattened_sample] += 1
    except IndexError:
        # np.nonzero on the all-zero vector yields an empty index array
        unflattened_sample_counter['error'] += 1

print(flattened_sample_counter)
print(unflattened_sample_counter)

# Output:
# discrete_sample_counter    >> {0: 3462, 1: 3333, 2: 3205}
#   ^ Roughly uniform: each of the three options is chosen about 1/3 of the time.
# flattened_sample_counter   >> {'000': 1241, '001': 1266, '010': 1211, '011': 1245, '100': 1251, '101': 1279, '110': 1284, '111': 1223}
#   ^ Roughly uniform: each of the eight options is chosen about 1/8 of the time.
# unflattened_sample_counter >> {0: 5037, 1: 2456, 2: 1266, 'error': 1241}
#   ^ Skewed: as in the table above, 1/2 of the samples map to 0, 1/4 to 1, 1/8 to 2, and the remaining 1/8 error out.

@pseudo-rnd-thoughts
Member

@rusu24edward As we discussed above, it doesn't seem possible to accurately unflatten a flattened space's sample.
Would you be able to make a PR adding a note to the flatten and unflatten functions?

@rusu24edward
Contributor Author

@pseudo-rnd-thoughts Yup, just wanted to provide an example script in case anyone else comes by this post.

Sure, I can add that
