Fix Box casting and sampling edge-cases #774

Merged
merged 9 commits into from
Mar 23, 2024

Jammf
Contributor

@Jammf Jammf commented Nov 12, 2023

Description

Changes Box to raise an error during init if either low or high is outside the limits of the Box's dtype. Values that are np.inf, -np.inf, or np.nan are ignored.
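
A minimal sketch of the kind of range check described above, assuming a hypothetical helper name (`check_bound_fits_dtype`) and a `ValueError`; neither is taken from the PR's actual code. Non-finite entries are skipped, matching the description.

```python
import numpy as np


def check_bound_fits_dtype(bound, dtype, name="low"):
    """Hypothetical helper: raise if a finite entry of `bound` is outside `dtype`'s range."""
    bound = np.atleast_1d(np.asarray(bound, dtype=np.float64))
    dtype = np.dtype(dtype)
    # Integer and floating dtypes expose their limits through different helpers.
    info = np.iinfo(dtype) if dtype.kind in "iu" else np.finfo(dtype)
    # inf, -inf and nan are skipped, as in the description above.
    finite = np.isfinite(bound)
    out_of_range = finite & ((bound < info.min) | (bound > info.max))
    if np.any(out_of_range):
        raise ValueError(
            f"{name} has values outside the range of {dtype}: {bound[out_of_range]}"
        )


# In-range and infinite values pass; 300 does not fit into int8.
check_bound_fits_dtype([0, 100, np.inf], np.int8, name="high")
try:
    check_bound_fits_dtype([0, 300], np.int8, name="high")
    rejected = False
except ValueError:
    rejected = True
```

The check runs on a float64 copy, so it is only an illustration for small dtypes; limits near the edge of int64 are not exactly representable in float64.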

Also changes Box's sampling to prevent an edge-case. Having a low value near a dtype's max int and an unbounded high value (or vice versa) would lead to an integer overflow (underflow) and invalid sampled values. Now the values are clipped before casting to ensure they're within the limits of the Box's dtype. For np.int64, it also clips again after casting, since the cast itself can cause the value to go outside the Box bounds for extreme values.
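
To illustrate the clip-before-cast idea, here is a small sketch under stated assumptions: `clip_then_cast` is a hypothetical name, not the function used in the PR, and the example uses np.int8 so every value is exactly representable. The last two lines show why np.int64 needs extra care: its maximum is not representable as a float64 and rounds up to 2**63.

```python
import numpy as np


def clip_then_cast(samples, low, high, dtype):
    """Hypothetical helper: clip float samples into the valid range, then cast."""
    info = np.iinfo(dtype)
    # An infinite bound is replaced by the dtype's own limit.
    lo = max(low, info.min)
    hi = min(high, info.max)
    return np.clip(np.floor(samples), lo, hi).astype(dtype)


raw = np.array([125.2, 127.9, 130.4, 500.0])
clipped = clip_then_cast(raw, low=126, high=np.inf, dtype=np.int8)

# float64 cannot hold 2**63 - 1 exactly, so the int64 maximum rounds up.
int64_max = np.iinfo(np.int64).max
rounds_up = float(int64_max) == 2.0**63
```

Because clipping to `float(int64_max)` can still leave a value of 2**63, a second clip after the cast is one way to keep the result inside the Box bounds.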

I also had to change a unit test for the DtypeObservation wrapper, since CartPole's observation space now cannot be cast to np.int32 due to it using large (but finite) bounds outside the np.int32 range.

Fixes #768

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@Jammf
Contributor Author

Jammf commented Nov 13, 2023

@pseudo-rnd-thoughts Looks like casting np.inf to a uint has undefined behavior, and macOS sets it to MAX_INT while the CI machine sets it to 0. There's special handling in box._broadcast() for signed ints, but nothing for uints. We can either define a behavior for uint (which would break backward compatibility for some platforms) or we can just disable those tests and leave things undefined?

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Nov 14, 2023

@Jammf That seems pretty bad.

Before we make a change, can we be precise about the current functionality?

  • Are low and high cast to the dtype in __init__?
  • What happens to low and high if outside the bounds of the dtype in __init__?
  • What happens during sample if low and high are outside the bounds of the dtype?
  • What is the value of bounded_below and bounded_above with inf and val == dtype.max (is the Box only bounded if inf is used or if the low or high is equal to the max value)?

If you can think of any more relevant questions, please add them; I would like to not mess up the Box space for old users where there isn't an issue already.
Then we can specify which of these answers we wish to change.

@Kallinteris-Andreas
Collaborator

@Jammf how are you casting np.inf to integer, that should result in an arithmetic error

>>> import numpy as np
>>> int(np.inf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot convert float infinity to integer
>>> np.inf.astype(np.int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'astype'

@Jammf
Contributor Author

Jammf commented Nov 14, 2023

@Kallinteris-Andreas The relevant code is in box._broadcast(), and it looks like the logic for inf-to-int conversion was originally added in 3746741. Here's what's in main currently:

elif isinstance(value, np.ndarray):
    # this is needed because we can't stuff np.iinfo(int).min into an array of dtype float
    casted_value = value.astype(dtype)
    # change bounds only if values are negative or positive infinite
    if np.dtype(dtype).kind == "i":
        casted_value[np.isneginf(value)] = np.iinfo(dtype).min + 2
        casted_value[np.isposinf(value)] = np.iinfo(dtype).max - 2

(There's also similar logic for casting+broadcasting scalar np.inf and -np.inf)

It looks like np.inf and -np.inf are set to MAX_INT - 2 and MIN_INT + 2 for signed ints only, and uints were just left undefined.
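
For illustration, the signed-int special case can be sketched as a standalone function. This is a hedged rewrite, not the code in main: the name `cast_bounds_signed` is hypothetical, and it zeroes the non-finite entries before casting so the cast itself never sees inf.

```python
import numpy as np


def cast_bounds_signed(value, dtype):
    """Hypothetical sketch: cast float bounds to a signed int dtype, mapping infs to magic values."""
    value = np.asarray(value, dtype=np.float64)
    info = np.iinfo(dtype)
    # Replace non-finite entries with 0 so the cast below is well defined.
    finite_only = np.where(np.isfinite(value), value, 0.0)
    casted = finite_only.astype(dtype)
    # Same magic values as the snippet quoted from main.
    casted[np.isneginf(value)] = info.min + 2
    casted[np.isposinf(value)] = info.max - 2
    return casted


example = cast_bounds_signed([-np.inf, 3.0, np.inf], np.int16)
```

Nothing equivalent exists for unsigned dtypes in the quoted snippet, which is why the uint result depends on the platform.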

With just numpy's casting, since ndarray.astype defaults to unsafe casting, it only emits a warning instead of raising an error:

>>> import numpy as np
>>> a = np.array(np.inf)
>>> a.dtype
dtype('float64')

>>> a.astype(np.int16)
<stdin>:1: RuntimeWarning: invalid value encountered in cast
array(-1, dtype=int16)

>>> a.astype(np.int16, casting="safe")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot cast scalar from dtype('float64') to dtype('int16') according to the rule 'safe'

@Jammf
Contributor Author

Jammf commented Nov 18, 2023

@pseudo-rnd-thoughts I spent some time looking into it over the past few days, and it does seem pretty messy. I tried to condense the issues down to just the main points, but let me know if anything doesn't make sense.

Core issues:

  • out-of-range float low/high args act different from np.inf float low/high args
  • out-of-range sint or uint low/high args can overflow, which causes errors during sampling
  • -inf/inf sint low/high args can overflow when sampled
  • inf uint box.high has different values depending on platform (and also can overflow when sampled)
  • nans can be passed as low/high, which seems intentionally allowed by unit tests?

Other related discussions are at

Current behavior

I made some charts to capture the current casting behavior in main (it doesn't cover any changes in this PR at all, since I'll probably end up needing to recommit new changes regardless). The first column is the input values (regardless of the dtype of the input arrays), the second column is the value of the box.low or box.high attributes, the third is the value of the box.bounded_below or box.bounded_above attributes, and the fourth is the sampling method and any special sampling issues or behavior.

FLOATS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range | inf | True (!) | Uniform (!)
inf/-inf | inf/-inf | False | Exp/Normal
nan | nan | False |

SIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | overflow value (!) | True | Uniform
inf high | MAX-2 | False | Exp, can overflow (!)
out-of-range low | underflow value (!) | True | Uniform
-inf low | MIN+2 | False | Exp, can underflow (!)
nan | 0 or MIN (depends on arch. and precision) | False |

UNSIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | overflow value (!) | True | Uniform
inf high | MAX (arm64) or 0 (x86_64) | False | Exp, can overflow (!)
out-of-range low | underflow value (!) | True | Uniform
-inf low | 0 | False | Exp, can underflow (!)
nan | 0 | False |

Also, Exponential sampling is replaced with Normal if both high and low are unbounded, but I didn't include those cases. Normal sampling should lead to valid samples for floats and sints, and have a 50% chance of generating an invalid value for uints. Sampling from something as biased as a standard normal or exponential distribution doesn't seem ideal, but that's really a separate discussion.
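
The 50% figure for uints follows from the symmetry of the standard normal around zero: about half of the draws are negative and so cannot be represented by an unsigned dtype. A quick empirical check (the seed and sample size are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Fraction of draws below zero, i.e. not representable by any unsigned dtype.
negative_fraction = float(np.mean(samples < 0))
```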

As for potential fixes, I have a few ideas ordered by how much it'd break existing code. There's probably other solutions I missed, or mixes of different solutions would work. And of course, there's always the option to just do nothing.

Proposal 1

We could change uints/sints to be like how inf sints are currently handled, add clipping to sampling to prevent invalid samples, and raise errors for clearly nonsensical values (having nans, or bounded_below=False uints). This is the only solution that allows infinite-bounds integer boxes, by using magic values to represent inf and -inf. Those magic values could also be MAX and MIN instead of MAX-2 and MIN+2, as long as we're clipping properly during sampling.

FLOATS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range | inf | False | Exp/Normal
inf/-inf | inf/-inf | False | Exp/Normal
nan | nan | False |

SIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | MAX-2 | False | Exp, clipped to MAX-2
inf high | MAX-2 | False | Exp, clipped to MAX-2
out-of-range low | MIN+2 | False | Exp, clipped to MIN+2
-inf low | MIN+2 | False | Exp, clipped to MIN+2
nan | raise Error | n/a | n/a

UNSIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | MAX-2 | False | Exp, clipped to MAX-2
inf high | MAX-2 | False | Exp, clipped to MAX-2
out-of-range low | raise Error | n/a | n/a
-inf low | raise Error | n/a | n/a
nan | raise Error | n/a | n/a

Proposal 2

We could change infs and out-of-range ints to be like they were before d1f35fe, making it so they're always bounded.

FLOATS | attr. value | bounded? | sampling
out-of-range | inf | False | Exp/Normal
inf/-inf | inf/-inf | False | Exp/Normal
nan | nan | False |

SIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | MAX | True | Uniform
inf high | MAX | True | Uniform
out-of-range low | MIN | True | Uniform
-inf low | MIN | True | Uniform
nan | raise Error | n/a | n/a

UNSIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | MAX | True | Uniform
inf high | MAX | True | Uniform
out-of-range low | raise Error | n/a | n/a
-inf low | raise Error | n/a | n/a
nan | raise Error | n/a | n/a

Proposal 3

We could raise an error if an out-of-range uint/sint is passed to low/high. This would force the caller to handle the casting.

FLOATS | attr. value | bounded? | sampling
out-of-range | inf | False | Exp/Normal
inf/-inf | inf/-inf | False | Exp/Normal
nan | nan | False |

SIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | raise Error | n/a | n/a
inf high | raise Error | n/a | n/a
out-of-range low | raise Error | n/a | n/a
-inf low | raise Error | n/a | n/a
nan | raise Error | n/a | n/a

UNSIGNED INTS | attr. value | bounded? | sampling
in-range | (value) | True | Uniform
out-of-range high | raise Error | n/a | n/a
inf high | raise Error | n/a | n/a
out-of-range low | raise Error | n/a | n/a
-inf low | raise Error | n/a | n/a
nan | raise Error | n/a | n/a

Proposal 4

We could make Box a continuous space only, plan to deprecate non-float Box, and redirect to the proper discrete spaces (MultiDiscrete and MultiBinary) instead. This would break existing pixel-observation envs that use Box to encode image observations (and most of them probably do use Box), and would make a lot of tutorials incorrect.

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Nov 20, 2023

@Jammf Thank you for your highly detailed comment, this is as complex as I feared

I found the proposals a bit confusing so used your tables and some testing to propose what I think is the optimal solution that makes the most sense (and preserves backward compatibility)

Proposed (optimal) solution

This is a bit different from all of your proposals: my primary change is to raise more errors in cases where sampling will most likely not make sense or not work for users, and to change nothing anywhere else. What are your thoughts?

Edit: the error raising would happen within __init__ not sample despite the column name. Second, we should include a table like this within the Box documentation

FLOATS | attr value | bounded? | current sampling | proposed sampling
in-range | value | True | uniform(low, high) | uniform(low, high)
-inf / in-range | inf / value | False / True | high - exponential | high - exponential
in-range / inf | value / inf | True / False | low + exponential | low + exponential
-inf / inf | -inf / inf | False | normal(0, 1) | normal(0, 1)
out-of-range | -inf / inf | True | uniform(low, high) with overflow | raise error!
nan | nan / nan | False | normal(0, 1) | raise error!

SIGNED INTS | attr value | bounded? | current sampling | proposed sampling
in-range | value | True | uniform(low, high) | uniform(low, high)
-inf / in-range | MIN-2 / value | False / True | high - exponential | high - exponential
in-range / inf | value / MAX-2 | True / False | low + exponential | low + exponential
-inf / inf | MIN-2 / MAX-2 | False | normal(0, 1) | normal(0, 1)
out-of-range | under/overflow! | True | uniform(low, high) with overflow | raise error!
nan | MIN-2 / MAX-2 | False | normal(0, 1) | raise error!

UNSIGNED INTS | attr value | bounded? | current sampling | proposed sampling
in-range | value | True | uniform(low, high) | uniform(low, high)
-inf / in-range | 0 / value | False / True | high - exponential | raise error!
in-range / inf | value / 0 | True / False | low + exponential | raise error!
-inf / inf | 0 / 0 | False | normal(0, 1) | raise error!
out-of-range | under/overflow! | True | uniform(low, high) with overflow | raise error!
nan | 0 / 0 | False | normal(0, 1) | raise error!

The exponential sampling doesn't make much sense as the exponential distribution values (with scale=1.0) are so small compared to the range of values. However, due to backward compatibility, I don't think we can change this.
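
To put a number on "so small": for an exponential distribution with scale s, the tail probability is P(X > x) = exp(-x / s). The sketch below uses only the standard library; the threshold of 20 and the int32 comparison are illustrative choices, not values from the PR.

```python
import math


def exponential_tail(x, scale=1.0):
    """P(X > x) for an exponential distribution with the given scale."""
    return math.exp(-x / scale)


# Probability that a scale=1.0 sample lands more than 20 away from the finite bound.
tail_beyond_20 = exponential_tail(20.0)

# Share of the non-negative int32 range that lies within 20 of the bound.
covered_share = 20 / (2**31 - 1)
```

With scale=1.0, nearly all samples fall within a few units of the finite bound, which is a vanishing fraction of an integer dtype's range.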

We could add clipping at the end to ensure that samples are within the bounds, but out-of-bounds samples are very unlikely: they are only possible in the extreme case where one bound is inf and the other is very close to the dtype's limit.
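
As a hedged illustration of that extreme case, the sketch below measures how often `low + exponential` leaves the dtype's range when low sits one step below the maximum. It uses np.int8 to keep the numbers small and stays in floating point, so no out-of-range cast is ever performed; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
info = np.iinfo(np.int8)

low = info.max - 1  # 126, one step below the int8 maximum
raw = low + rng.exponential(scale=1.0, size=10_000)

# A floored sample above 127 would not fit into int8 without clipping.
overflow_fraction = float(np.mean(np.floor(raw) > info.max))
```

In theory this fraction is exp(-2), roughly 13.5%, since the sample overflows whenever the exponential draw is at least 2. The further low is from the limit, the faster this probability decays.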

@Kallinteris-Andreas @jjshoots @RedTachyon do any of you have additional comments or thoughts?

@jjshoots
Member

I like @pseudo-rnd-thoughts's approach, makes it very verbose and doesn't silently allow the user to do something that the Box space was not supposed to do.
However, I'm not sure why we're raising errors only on sampling and not on init - for example, for unsigned ints, putting negative values or negative infinities should cause it to raise errors from the start.
Is this for legacy support?

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Nov 20, 2023

@jjshoots apologies I didn't put in my comment that the error raising would happen all within the __init__ as you noted. It was easiest to note that information in that column

@Kallinteris-Andreas
Collaborator

Kallinteris-Andreas commented Nov 24, 2023

Foremost, the most important purpose of the spaces is to inform the learning algorithm what its observation/action spaces are, and secondly to sample an action space.

FLOATS | attr value | bounded? | proposed sampling
in-range | value | True | uniform(low, high)
-inf / in-range | inf / value | False / True | low + exponential
in-range / inf | value / inf | True / False | high - exponential
-inf / inf | -inf / inf | False | normal(0, 1)
out-of-range | raise error! [1] | raise error! [1] | raise error!
nan | raise error! [1] | raise error! [1] | raise error!

SIGNED INTS | attr value | bounded? | proposed sampling
in-range | value | True | uniform(low, high)
-inf / in-range | MIN / value [3] or [3b] | False / True or [3b] | low + geometric [2] or [3b]
in-range / inf | value / MAX [3] or [3b] | True / False or [3b] | high - geometric [2] or [3b]
-inf / inf | MIN / MAX [3] or [3b] | False or [3b] | normal(0, 1) or [3b]
out-of-range | raise error! [1] | raise error! [1] | raise error!
nan | raise error! [1] | raise error! [1] | raise error!

UNSIGNED INTS | attr value | bounded? | proposed sampling
in-range | value | True | uniform(low, high)
-inf / in-range | raise error! [1] | raise error! [1] | raise error!
in-range / inf | value / 0 or [3b] | True / False or [3b] | low + geometric [2, 4] or [3b]
-inf / inf | raise error! [1] | raise error! [1] | raise error!
out-of-range | raise error! [1] | raise error! [1] | raise error!
nan | raise error! [1] | raise error! [1] | raise error!

[1] As @jjshoots mentioned, this should be an error.
[2] The exponential distribution does not exist for integers; geometric should be used instead (keep in mind that geometric distributions may still overflow).
[3] There is no reason to restrict the range to MIN-2/MAX-2 (even if we still need it, it should be MIN+2).
[3b] Alternatively, raise an error for anything involving inf and integers, since it overcomplicates the code and only provides the benefit of an alternative sample() method.
[4] I see no reason not to allow uint's "in-range / inf" if int's are also allowed.

The exponential sampling doesn't make much sense as the exponential distribution values (with scale=1.0) are so small compared to the range of values. However, due to backward compatibility, I don't think we can change this.

To address this, we could add an optional argument (example name sample_variable) which would be normal distribution's std, exponential distribution's scale and geometric distribution's p.
(this could be a separate PR)

@jjshoots
Copy link
Member

I can't quite remember why I did min+2/max-2, but it was fairly important for some other part of the Box space. Perhaps we'll figure out why that's needed when we remove it and see if tests fail.

@Jammf
Contributor Author

Jammf commented Nov 26, 2023

I like the idea of raising errors for nan and out-of-range values; that's a much cleaner solution than what I had originally come up with.

For MIN and MAX instead of MIN+2 and MAX-2, as far as I can tell it seems to work fine? All unit tests are passing at least. The only issue I can see is if the value were to be hardcoded in a custom environment somewhere. I think the initial issue was perhaps an edge case related to sampling, which is fixed by clipping the sample to ensure it's in-bounds.

For unsigned in-range / inf, the existing casting behavior is different depending on (probably) cpu architecture, with infs being cast to MAX (arm64) or 0 (x86_64). Considering that we'll be forced to break one or the other in order to make things consistent, this could potentially be a good opportunity to just deprecate uint infs altogether. But perhaps we should keep them, assuming we can come up with a compelling enough use case.

I also agree with @Kallinteris-Andreas that we should either have infs for both uints and ints, or for neither, as long as it's consistent. Personally, I'm in favor of just deprecating/removing infs entirely for non-floating Box (including bool Box). They aren't supported by the underlying numpy arrays, so as long as a large part of gymnasium's functionality is coupled with numpy it'd just be safer to follow their rules for number representations. Later on, if gymnasium ever removes the hard-coded numpy internals, then allowing infinite integer boxes could be reevaluated.

For sampling, I think that geometric is more correct conceptually than rounding an exponential sample. But it would also break backward compatibility. But if users need that precise of reproducibility, then they'll probably be pinning their gymnasium version anyway, so perhaps it's not that much of an issue?

I also like the idea of adding a sample_variable argument, but that'd definitely be for a separate PR.

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Nov 28, 2023

I suspect that MAX-2 and MIN+2 were used to account for the np.inf and np.nan special values within the encoding, but if it works without them then we should be able to remove them.

I believe we are very close to an agreed change to Box.sample, my proposed sampling, building on @Kallinteris-Andreas solution is:

FLOATS | attr value | bounded? | proposed sampling
in-range | value | True | uniform(low, high)
-inf / in-range | inf / value | False / True | low + exponential(scaling)
in-range / inf | value / inf | True / False | high - exponential(scaling)
-inf / inf | -inf / inf | False | normal(0, scaling)
out-of-range | raise error! | raise error! | raise error!
nan | raise error! | raise error! | raise error!

SIGNED INTS | attr value | bounded? | proposed sampling
in-range | value | True | uniform(low, high)
-inf / in-range | MIN / value | False / True | low + exponential(scale)
in-range / inf | value / MAX | True / False | high - exponential(scale)
-inf / inf | MIN / MAX | False | normal(0, scale)
out-of-range | raise error! | raise error! | raise error!
nan | raise error! | raise error! | raise error!

UNSIGNED INTS | attr value | bounded? | proposed sampling
in-range | value | True | uniform(low, high)
-inf / in-range | raise error! | raise error! | raise error!
in-range / inf | raise error! | raise error! | raise error!
-inf / inf | raise error! | raise error! | raise error!
out-of-range | raise error! | raise error! | raise error!
nan | raise error! | raise error! | raise error!

This adds the scale implicitly and removes unsigned integers in-range / inf and still uses exponential rather than geometric.

For geometric vs exponential, there seems to be minimal difference between the distributions (someone can correct me, but I believe that the exponential is the continuous version of the geometric distribution), so I'm not sure it makes sense to change.
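
The relationship can be checked numerically: flooring an exponential sample with rate 1 gives a geometric distribution on {0, 1, 2, ...} with success probability p = 1 - exp(-1), because P(floor(X) = k) = exp(-k) * (1 - exp(-1)). The following is only an empirical sketch with an arbitrary seed and sample size.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Floor exponential(scale=1.0) samples to get integer values starting at 0.
floored = np.floor(rng.exponential(scale=1.0, size=200_000)).astype(np.int64)

p = 1.0 - math.exp(-1.0)  # success probability of the matching geometric distribution
empirical = [float(np.mean(floored == k)) for k in range(4)]
theoretical = [p * (1.0 - p) ** k for k in range(4)]
```

Note that this geometric variant counts failures before the first success and so starts at 0, whereas NumPy's `Generator.geometric` returns values starting at 1.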

Looking at the tests that will need changing, could we add more testing for non-standard dtypes such as np.bool_, np.uint8 and np.float16?

I would call the scaling variable sample_scaling: float = 1.0 within __init__ to make its meaning clear to users, and it should be completed in another PR as @Kallinteris-Andreas said.
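
A rough sketch of how such a parameter might influence sampling for a box that is bounded below only. Everything here is hypothetical: the function name `sample_unbounded_above` is invented for illustration, and `sample_scaling` is only the name proposed in this thread, not an existing Box argument.

```python
import numpy as np


def sample_unbounded_above(rng, low, size, sample_scaling=1.0):
    """Hypothetical sketch: low + exponential(scale) for a box with no upper bound."""
    return low + rng.exponential(scale=sample_scaling, size=size)


rng = np.random.default_rng(0)
narrow = sample_unbounded_above(rng, low=0.0, size=50_000)
wide = sample_unbounded_above(rng, low=0.0, size=50_000, sample_scaling=100.0)
```

The default of 1.0 keeps the current behavior, while a larger value spreads samples further from the finite bound.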

@Kallinteris-Andreas
Collaborator

Kallinteris-Andreas commented Nov 28, 2023

removes unsigned integers in-range / inf

What is the reason for this choice

Also, I do not think np.bool makes sense for Box

(other than that I agree)

@Jammf
Contributor Author

Jammf commented Nov 28, 2023

I found relatively few usages of bool Box in a search of public GitHub repos, so it'd probably only have minor impact to remove it or redirect to MultiBinary. (Though, the link only has single-line usages, I wasn't able to figure out how to get a multi-line regex to work nicely)

@jjshoots jjshoots closed this Nov 28, 2023
@jjshoots
Member

Crap thought I was looking at a different PR. My bad.

@jjshoots jjshoots reopened this Nov 28, 2023
@pseudo-rnd-thoughts
Member

I will try to finish this PR by the end of the week. As the technical details are complex, I've decided to start with the tests to outline our expectations of what the space should be. This doesn't cover the sampling currently but that hasn't changed.

I found it easier to rewrite most of the tests to order them into different groupings: shape, dtype, low/high and infinite space

I will work on implementing the changes this evening hopefully

@pseudo-rnd-thoughts
Member

Over a month later, I've finished this PR by spending 5 hours on it and rewriting most of the Box init function.
The result is a horribly complex function, but it adds the functionality discussed above: nan, out-of-range values, and unsigned int inf and -inf values are now disallowed.

A large part of the function processes the low and high values, because the out-of-range check has to be completed before they are converted to the final dtype.
However, this required specialised code for scalars and arrays, repeated for low and high (see _cast_low and _cast_high).
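
A short illustration of why the range check has to come before the cast, assuming nothing about the internals of _cast_low or _cast_high: an integer-to-integer cast in NumPy wraps silently, so an out-of-range bound looks perfectly valid afterwards.

```python
import numpy as np

requested_high = np.array([300])            # does not fit into uint8
casted = requested_high.astype(np.uint8)    # wraps modulo 256, no warning

limit = np.iinfo(np.uint8).max
fits_before_cast = bool(requested_high[0] <= limit)
looks_valid_after_cast = bool(casted[0] <= limit)
```

Here the cast turns 300 into 44, so a check performed after conversion would accept a bound that the user never asked for.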

Collaborator

@Kallinteris-Andreas Kallinteris-Andreas left a comment


Looks fine from a first pass; I will have to do a second (more detailed) review pass.

assert (
    high.shape == shape
), f"high.shape doesn't match provided shape, high.shape: {high.shape}, shape: {shape}"
# most of the code between these two functions are copied with minor changes for < and >
Collaborator


What is the comment for?

Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts Mar 21, 2024


It refers to the next two function calls; I can remove it.


super().__init__(self.shape, self.dtype, seed)

def _cast_low(self, low, dtype_min):
Collaborator


I would like a docstring here


Added

tests/spaces/test_box.py (resolved)
@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 89bedf1 into Farama-Foundation:main Mar 23, 2024
11 checks passed
Development

Successfully merging this pull request may close these issues.

[Bug Report] Box casting during init can cause invalid sampled values
4 participants