
[Question] Flatten Discrete box potentially problematic #102

Closed
rusu24edward opened this issue Oct 31, 2022 · 17 comments · Fixed by #164
Labels
question Further information is requested

Comments

@rusu24edward
Contributor

Question

Flattening a Discrete space to a Box space may be problematic. The flatten wrapper converts Discrete to Box as a one-hot encoding. Suppose the original space is Discrete(3); then:

0 maps to [1, 0, 0]
1 maps to [0, 1, 0]
2 maps to [0, 0, 1]
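
For illustration, a quick way to reproduce this mapping with gym's own utilities (output shown as comments; exact dtype and formatting may vary by gym version):

from gym.spaces import Discrete
from gym.spaces.utils import flatten

space = Discrete(3)
for i in range(3):
    print(i, '->', flatten(space, i))
# 0 -> [1 0 0]
# 1 -> [0 1 0]
# 2 -> [0 0 1]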

When we sample the action space for random actions, it samples the Box, which can produce any of the eight combinations of 0s and 1s in a three-element array, namely:

[0, 0, 0],
[0, 0, 1], *
[0, 1, 0], *
[0, 1, 1],
[1, 0, 0], *
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]

Only the three of these eight that I’ve starred are usable in the strict sense of the mapping. The unflatten function for a Discrete space uses np.nonzero(x)[0][0]; here’s a table of what the above arrays map to:

+ ------------------ + ---------------- + --------------------------------------------- +
| In Flattened Space | np.nonzero(x)[0] | np.nonzero(x)[0][0] (aka discrete equivalent) |
+ ------------------ + ---------------- + --------------------------------------------- +
| 0, 0, 0            | Error            | Error                                         |
| 0, 0, 1            | [2]              | 2                                             |
| 0, 1, 0            | [1]              | 1                                             |
| 0, 1, 1            | [1, 2]           | 1                                             |
| 1, 0, 0            | [0]              | 0                                             |
| 1, 0, 1            | [0, 2]           | 0                                             |
| 1, 1, 0            | [0, 1]           | 0                                             |
| 1, 1, 1            | [0, 1, 2]        | 0                                             |
+ ------------------ + ---------------- + --------------------------------------------- +
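
For reference, a minimal numpy sketch reproducing the table's middle and right columns:

import numpy as np

for bits in [(0, 0, 0), (0, 1, 1), (1, 1, 1)]:
    x = np.array(bits)
    idx = np.nonzero(x)[0]  # indices of the nonzero entries
    print(bits, '->', idx, '->', idx[0] if idx.size else 'Error')
# (0, 0, 0) -> [] -> Error   (unflatten raises IndexError here)
# (0, 1, 1) -> [1 2] -> 1
# (1, 1, 1) -> [0 1 2] -> 0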

Implications

Obviously, [0, 0, 0] will fail because there is no nonzero element.

More importantly, only one eighth of the random samples will map to 2, one fourth will map to 1, and one half will map to 0. This has important implications for exploration, especially if action 2 is the “correct” action throughout much of the simulation. I'm very curious why I have not seen this come up before. This kind of skew in the random sampling can significantly affect the way the algorithm explores and learns, and the problem is exacerbated when n is large in Discrete(n). Am I missing something here?

Solution

This is unique to Discrete spaces. Instead of mapping to a one-hot encoding, we could map to a Box of a single element with the appropriate range: Discrete(n) maps to Box(0, n-1, (1,), int) instead of Box(0, 1, (n,), int).
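
A hypothetical sketch of the proposed mapping (this is not gym's actual flatten_space; the function name is illustrative):

import numpy as np
from gym.spaces import Box, Discrete

def flatten_space_proposed(space: Discrete) -> Box:
    # Discrete(n) -> Box(0, n-1, (1,), int): every Box sample is then a
    # valid (wrapped) discrete action, and sampling stays uniform over n values.
    return Box(low=0, high=space.n - 1, shape=(1,), dtype=np.int64)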

@pseudo-rnd-thoughts
Member

Thanks for moving the PR to Gymnasium. Could you explain the context of the problem?

My initial idea for solving this was changing the flattened Discrete type to MultiBinary; however, that has the same issue.

Your solution doesn't seem to change anything; it just replaces Discrete with Box.

There are two options to solve this:

  1. Add a new space; however, it is not obvious how such a space would be used outside of this particular case.
  2. Do not guarantee flatten -> sample -> unflatten.

Do you have any other ideas?

@rusu24edward
Contributor Author

As for context, many RL algorithms utilize randomness in their process, such as filling a buffer with data based on random actions at the beginning of training, or taking random actions to explore. Some implementations use action_space.sample() to generate these random actions. If the space has been flattened via the flatten wrapper, then the random sampling will occur in the flattened space, and the unflatten process will have the results I explain above.

To clarify, my solution is not to just change Discrete to Box; this is what flatten already does. My suggestion is to set up the Box differently. Right now, it's set up like so:

Discrete(n) -> Box(0, 1, (n,))
sample() -> [0, 1, 0, 0, 1, 1, ...] # n-element vector, not one-hot encoded

I suggest changing it to this:

Discrete(n) -> Box(0, n-1, (1,), int)
sample() -> [k] # single-element vector, k between 0 and n-1

Why do this at all instead of just keeping it as a Discrete space? Frankly, I think that's a great question, and it's not clear to me why the current implementation flattens to an n-element array instead of using Discrete as is. It seems that the purpose is to represent it as a one-hot encoded vector, and if so, then that needs to be enforced in the resulting space. Otherwise, flattening Discrete should not do anything. I think this argument extends to MultiDiscrete and MultiBinary spaces.

As you say, it may not make sense to add a new space for a flattened Discrete (and MultiDiscrete) space since it would only be used in this one case. It may be okay not to guarantee that sampling from the flattened space is "path-independent", but that does present a "gotcha" that some developers (such as myself) may fall into when designing their RL frameworks.

@pseudo-rnd-thoughts
Member

My suggestion would be to require users to flatten their environments such that observations / actions are np.ndarray / int.
Then environments that already use np.ndarray would not need to be flattened.
I think this should avoid your issue.

@rusu24edward
Contributor Author

Mark, thank you for the thought you have given to this ticket so far. For the software I develop, I take the mindset that anything I provide to my users should work as expected. When something does not work as expected, I see it as either a bug or an opportunity to clarify expectations. I believe the question I have raised reveals a potential flaw in the design of gym's spaces. Based on the language used in the flatten wrapper, it seems that flattening a Discrete (and also MultiDiscrete) space should result in a one-hot-encoded vector, but the one-hot encoding is not enforced in the way that object is then used (since it has just been turned into a Box). Personally, I would classify this as a bug, and I would either fix it with some clever redesign or update the documentation to make it abundantly clear that the flatten wrapper is limited. Telling my users to pre-flatten their own spaces is not really an option for me. I like all the gym spaces, I want to support all of them, and I don't want to burden my users with implementation details that make their environments messier.

This discussion has helped inform me on how I should change my own flatten wrapper to work with gym spaces more cleanly. I hope that the gymnasium team can see it as an opportunity to look at the gym spaces from a new perspective and make changes as needed.

@pseudo-rnd-thoughts
Member

Thanks for the response. I could imagine adding a OneHot space; however, it feels inefficient to add.
Thinking about this, I can't see an elegant solution that both fixes the problem and maintains backward compatibility.

The best idea I have is Discrete -> MultiBinary and MultiDiscrete -> MultiBinary, where MultiBinary has a new parameter, one_hot, that enforces one-hot sampling.
There is a possible issue with multi-axis MultiDiscrete spaces that we would need to solve; we would just need to follow a similar method to the current flatten function.
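
A hypothetical sketch of what one-hot sampling could look like (not the real MultiBinary API; the helper name is illustrative):

import numpy as np

def sample_one_hot(n: int, rng: np.random.Generator) -> np.ndarray:
    # Pick one of n positions uniformly and set exactly that bit, so
    # every sample is a valid one-hot vector.
    out = np.zeros(n, dtype=np.int8)
    out[rng.integers(n)] = 1
    return out

print(sample_one_hot(3, np.random.default_rng()))  # e.g. [0 0 1]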

What do you think of this solution? If you are happy with it and @RedTachyon agrees (he might take a couple of days to respond, as he is currently at a conference), we can make a PR and include the change in v27.

@rusu24edward
Contributor Author

This sounds good to me. You may need to flatten to Box instead of MultiBinary in order to handle nested spaces. In fact, one could make the argument that Discrete itself really is just MultiBinary with one-hot encoding turned on.

@RedTachyon
Member

I'm not yet completely sure what we want to do here. I'd definitely be opposed to adding an extra space just for the right flattening, unless we completely change our philosophy with regards to the spaces.

If this one change makes it so that we can always ensure the reversibility of flatten -> unflatten, then it'd be tempting.

At the same time, the whole idea of flatten and unflatten is not necessarily well thought-out, since it was introduced in some OpenAI code and then received various updates which might or might not have followed the original intentions.

My interpretation of this functionality is more or less "I don't care what this fancy space is, I want a fixed sized float32 vector for my NN". In which case any reversibility fails due to the dtypes. But we don't actually do any automatic type conversion.

I also feel like flattening into one-hot is much better on the algorithmic side. If you want to "flatten" a discrete action, you definitely don't want to keep the ordinal structure -- i.e. action 1 isn't between action 0 and action 2 in any meaningful sense, so a one-hot embedding is likely to work much better.

@pseudo-rnd-thoughts
Member

@rusu24edward Would you be able to write a test that checks the basic idea: flatten a space -> sample from the flattened space -> unflatten the sample -> check that the result is contained within the original space?
If the only spaces that currently fail this are Discrete and MultiDiscrete (composite spaces might also fail due to containing Discrete), then could you implement the change I proposed?
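
A minimal sketch of such a round-trip check, assuming gym's flatten_space / unflatten utilities (the function name is illustrative):

from gym.spaces import Space
from gym.spaces.utils import flatten_space, unflatten

def roundtrip_contained(space: Space, n_trials: int = 100) -> bool:
    flat = flatten_space(space)
    for _ in range(n_trials):
        sample = flat.sample()
        try:
            if unflatten(space, sample) not in space:
                return False
        except IndexError:  # e.g. the all-zero "one-hot" vector for Discrete
            return False
    return True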

@rusu24edward
Contributor Author

In which case any reversibility fails due to the dtypes. But we don't actually do any automatic type conversion.

I don't understand this, can you say a bit more about it?

@pseudo-rnd-thoughts I will take a look

@rusu24edward
Contributor Author

Some issues right off the bat:

  1. Flattening MD and D to MB won't play nicely with a nested space. The end result would be a Box.
  2. So we should use a Box with a one-hot flag. However, this doesn't make sense for Boxes generically because there is no notion of a "null value" for the Box. You can have a box that spans from -8 to -2, and in that case, a one-hot encoding would set the unchosen elements to...?

My interpretation of this functionality is more or less "I don't care what this fancy space is, I want a fixed sized float32 vector for my NN".

Totally agree, and yet, is this even tenable? It's curious that this issue has not come up sooner in gym's history. Several RL libraries make heavy use of gym's spaces; perhaps they're just sampling more intelligently? I bring this up because I am curious to know how many libraries make use of gym's flatten. I asked RLlib specifically to see what they did. This issue should appear any time you unflatten the action output from a NN. Are users designing their NN with a softmax on some part of the output? Is anyone actually using the flatten and unflatten features? How are they getting around this problem?

I'm posting these questions since I think y'all would have a better idea of the context for flatten since you engage with the users regularly.

@RedTachyon
Member

I don't understand this, can you say a bit more about it?

Let's say you have an unknown space. It might be Discrete(2), it might be Dict(Discrete, Box, Dict(Tuple(Box, Box), Discrete, Box)). You don't care what the internal structure is, you just want to flatten it and feed it into a neural network - which is the case in like 90% of RL. To do this, after flattening you need to cast it to a common dtype, which in 99% of cases will be float32.

The moment you cast to float32, you lose reversibility if you try to do flatten+cast -> sample -> unflatten, because if you do this with Discrete(2) and sample 0.5, what do you "unflatten" it to?
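
A small sketch of the dtype problem (values illustrative):

import numpy as np
from gym.spaces import Discrete
from gym.spaces.utils import flatten, unflatten

# A flattened sample cast to float32, as it would be fed to a NN:
x = flatten(Discrete(2), 1).astype(np.float32)  # [0., 1.]
print(unflatten(Discrete(2), x))  # 1 -- this exact vector is still invertible

# But an arbitrary float32 vector, e.g. a raw NN output, has no principled inverse:
y = np.array([0.5, 0.5], dtype=np.float32)
print(unflatten(Discrete(2), y))  # 0 -- np.nonzero treats any nonzero float as set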

This issue should appear anytime you unflatten the action output from a NN.

I was not aware that this is something people would do, probably mainly because complex action spaces that would require it are rather rare. And if you use a space like this, you're probably better off writing your own utilities and definitely not using sample.

@rusu24edward
Contributor Author

Good point, definitely complicates the situation...

It seems like flatten was not meant to be followed up with a sample; one should sample the space, then flatten the sample. I'm not sure how this is maintained on the other side of the NN when the action comes out and is unflattened, but as you said, this is something that users would enforce themselves.

Maybe the solution here is to add some documentation guiding the users to sample before flattening and warning them about the shortcomings of unflattening?

@pseudo-rnd-thoughts
Member


It seems that it is not possible to unflatten a flattened sample.
We can add this to the documentation to note this fact, but I just realised: why are you doing this?
If you need to unflatten a flattened sample, can you not just sample from the original space?
Or, if you need a sample and its flattened version, can you not generate a normal sample and then flatten it?
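
For reference, the sample-then-flatten pattern being suggested here, as a sketch:

from gym.spaces import Discrete
from gym.spaces.utils import flatten

space = Discrete(3)
action = space.sample()               # uniform over {0, 1, 2}
flat_action = flatten(space, action)  # always a valid one-hot vector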

@rusu24edward
Contributor Author

Or if you need a sample and its flattened version, then can you not generate a normal sample and then flatten it?

Yes, I can do this. I didn't want the extra nuance, but it's fine.

If you need to unflatten a flattened sample, can you not just use sample from the original space?

If I'm just sampling, then yes, I can do everything in the original space. However, if it's output from a NN or some other function and it's in the flattened state, then no, I will need to unflatten it directly. And in this case, I'll need to design the NN or function to enforce the structure.
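
One way to enforce that structure, as a sketch: collapse the network's raw output to a valid one-hot vector before unflattening (the helper name is illustrative):

import numpy as np

def logits_to_one_hot(logits: np.ndarray) -> np.ndarray:
    # argmax picks the single most likely action, guaranteeing a
    # well-formed one-hot vector that unflatten can invert.
    one_hot = np.zeros_like(logits, dtype=np.int64)
    one_hot[np.argmax(logits)] = 1
    return one_hot

print(logits_to_one_hot(np.array([0.1, 2.3, -0.7])))  # [0 1 0]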

@rusu24edward
Contributor Author

Just for conciseness, here's an example script:

from gym.spaces import Discrete
from gym.spaces.utils import flatten_space, unflatten

def to_string(array):
    return ''.join(str(i) for i in array)

# Sampling the original Discrete space: uniform over the three actions.
discrete_space = Discrete(3)
discrete_sample_counter = {0: 0, 1: 0, 2: 0}
for _ in range(10000):
    sample = discrete_space.sample()
    discrete_sample_counter[sample] += 1

print(discrete_sample_counter)

# Sampling the flattened space: uniform over all eight binary vectors.
flattened_space = flatten_space(discrete_space)
flattened_sample_counter = {
    '000': 0,
    '001': 0,
    '010': 0,
    '011': 0,
    '100': 0,
    '101': 0,
    '110': 0,
    '111': 0
}
unflattened_sample_counter = {0: 0, 1: 0, 2: 0, 'error': 0}
for _ in range(10000):
    sample = flattened_space.sample()
    flattened_sample_counter[to_string(sample)] += 1

    try:
        unflattened_sample = unflatten(discrete_space, sample)
        unflattened_sample_counter[unflattened_sample] += 1
    except IndexError:
        # np.nonzero on the all-zero vector yields an empty index array
        unflattened_sample_counter['error'] += 1

print(flattened_sample_counter)
print(unflattened_sample_counter)

# Output:
# discrete_sample_counter    >> {0: 3462, 1: 3333, 2: 3205}
#   ^ Roughly uniform: each of the three options is chosen about 1/3 of the time.
# flattened_sample_counter   >> {'000': 1241, '001': 1266, '010': 1211, '011': 1245, '100': 1251, '101': 1279, '110': 1284, '111': 1223}
#   ^ Roughly uniform: each of the eight options is chosen about 1/8 of the time.
# unflattened_sample_counter >> {0: 5037, 1: 2456, 2: 1266, 'error': 1241}
#   ^ Skewed: as in the table above, 1/2 of the samples map to 0, 1/4 to 1, 1/8 to 2, and the remaining 1/8 error out.

@pseudo-rnd-thoughts
Member

@rusu24edward As we discussed above, it doesn't seem possible to accurately unflatten a flattened space's sample.
Would you be able to make a PR adding a note to the flatten and unflatten functions?

@rusu24edward
Contributor Author

@pseudo-rnd-thoughts Yup, just wanted to provide an example script in case anyone else comes by this post.

Sure, I can add that
