Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with array creation functions #107

Closed
asmeurer opened this issue Jan 8, 2021 · 9 comments · Fixed by #167
Closed

Issues with array creation functions #107

asmeurer opened this issue Jan 8, 2021 · 9 comments · Fixed by #167

Comments

@asmeurer
Copy link
Member

asmeurer commented Jan 8, 2021

Some issues I noticed in the array creation functions from adding tests to the test suite:

arange

eye

full (and full_like)

  • Says "If dtype is None, the output array data type must be the default floating-point data type." I think the default should be a corresponding dtype to the input value (we don't have a notion of a "default" integer dtype).

linspace

  • It's a bit ambiguous whether it actually says this right now, but I think the stop value should not be required to be included (when endpoint=True). Consider
>>> np.linspace(0, 9288674231451855, 2, dtype=np.int64)
array([               0, 9288674231451856])

The stop value is different from what is given because of floating point loss of precision when computing the linspace.

@rgommers
Copy link
Member

rgommers commented Jan 9, 2021

Thanks @asmeurer, good points. I agree with everything except the linspace point, that seems like a bug in numpy due to having a dtype keyword but not having a separate integer implementation for it.

@kgryte
Copy link
Contributor

kgryte commented Jan 11, 2021

Re: full_like. The dtype is inferred from the provided array, not the fill value.

Re: linspace. This is a bug in NumPy. If we are not going to support endpoint inclusion because of precision issues, then the endpoint keyword should be dropped altogether, as would not be possible to support.

Re: arange. The reason for floating-point as the default is that, even if the inputs are integers, we cannot guarantee that the computed increment will allow for evenly spaced numbers without precision loss. In this case, floating-point is the safest choice.

Re: other points. Seem reasonable. Will submit a follow-up PR.

@asmeurer
Copy link
Member Author

Re: linspace. This is a bug in NumPy. If we are not going to support endpoint inclusion because of precision issues, then the endpoint keyword should be dropped altogether, as would not be possible to support.

Just to be clear, the spec doesn't explicitly say the endpoint should be exactly included. So if that is desired, we should probably state that.

Re: arange. The reason for floating-point as the default is that, even if the inputs are integers, we cannot guarantee that the computed increment will allow for evenly spaced numbers without precision loss. In this case, floating-point is the safest choice.

I'm not sure I follow here. I mean the case where start, stop, and step are integers. There should be no precision issues there because all the values will be exact integers between start and stop (assume start and stop are within the range of the dtype, which I mentioned as another point). I agree if any of them, including step is a float, then the result should be floating point. Note that converting to float could actually lose precision because many integer dtypes have values that are not exactly representable as a float:

>>> np.arange(9223372036854775805, 9223372036854775807, dtype=np.float64)
array([9.22337204e+18, 9.22337204e+18])
>>> np.unique(np.arange(9223372036854775805, 9223372036854775807, dtype=np.float64))
array([9.22337204e+18]

@shoyer
Copy link
Contributor

shoyer commented Jan 30, 2021

A few other suggestions:

  • To resurface concerns from Arguments need not be strictly positional or keyword-only for creation and manipulation functions #85:
    • Why allow shape arguments to be passed as keyword arguments, too? This is a pretty unambiguous name, and code can be clearer when it uses names.
    • Likewise, could num be allowed as a keyword for linespace and fill_value be allowed as a keyword for full/full_like?
  • empty and empty_like may not necessarily make sense for libraries that do not support mutation (e.g., JAX, Dask, TensorFlow) It might be worth calling this out.

@asmeurer
Copy link
Member Author

asmeurer commented Feb 1, 2021

empty and empty_like may not necessarily make sense for libraries that do not support mutation (e.g., JAX, Dask, TensorFlow) It might be worth calling this out.

Could these libraries simply alias empty to zeros?

Also, a common use of empty is to create size 0 arrays, like empty((0, 1)).

@shoyer
Copy link
Contributor

shoyer commented Feb 1, 2021

empty and empty_like may not necessarily make sense for libraries that do not support mutation (e.g., JAX, Dask, TensorFlow) It might be worth calling this out.

Could these libraries simply alias empty to zeros?

Fair enough -- in fact this is exactly what JAX already does.

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit that referenced this issue Apr 27, 2021
* Update specification for arange

Addresses comments in gh-85 and gh-107

* Update the specification for `full` and `full_like`

Addresses comments in gh-85 and gh-107

* Update specification for `linspace`

Addresses comments in gh-85 and gh-107

* Update specification for `empty`, `ones`, `zeros`

Addresses comments in gh-85 and gh-107

* Update specification for `eye`

This is useful/needed because `M` is not a descriptive name
and that name does not match between different array libraries.

* Update specification for `expand_dims`, `roll` and `reshape`

Address comment in gh-85

* One more change to `eye`, more descriptive positional arguments

* Address the default integer dtype issue for 32/64-bit Python

Closes gh-151

* Update signature of `broadcast_to`

Address a review comment; makes it consistent with other functions
using `shape`.
@asmeurer
Copy link
Member Author

Turns out the linspace issue isn't just about the endpoint:

>>> np.linspace(-9007199254740993, 0, 1, dtype=np.int64)
array([-9007199254740992])

I think this is related to this NumPy issue numpy/numpy#16813.

@asmeurer
Copy link
Member Author

The spec is maybe a little unclear, but I read it as saying the start should always be included. But the NumPy implementation clearly involves divisions which round back to integers, making this not guaranteed. It's not clear to me if this should be considered an implementation bug, of these sorts of subtleties in the implementation should be expected and hence should not be required. I opened numpy/numpy#18881 upstream about this.

@rgommers
Copy link
Member

rgommers commented May 1, 2021

That looks like a clear bug, not even a subtle one. The integer start point and step size calculation should just not use floats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants