Issues with array creation functions #107

asmeurer · 2021-01-08T22:03:59Z

Some issues I noticed in the array creation functions from adding tests to the test suite:

arange

- stop and step should not be keyword-only (this was also raised in Arguments need not be strictly positional or keyword-only for creation and manipulation functions #85).
- The spec does not specify the behavior if start or stop is out of range for the dtype.
- It says "If dtype is None, the output array data type must be the default floating-point data type." I think the default for arange should be an integer dtype if all the arguments are integers.
- step cannot be 0.
- The function is only defined for numeric dtypes (Issues with "Mixing arrays and Python scalars" section #98).
- "The length of the output array must be ceil((stop-start)/step)" should be caveated: it only holds when stop and step are provided, and when stop >= start for step > 0 or stop <= start for step < 0.
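The length caveat and the integer-default proposal can be sketched in pure Python. The helpers `arange_length` and `int_arange` are hypothetical names, not part of the spec or any library; the sketch assumes integer start, stop and step and uses exact integer arithmetic, so no float rounding is involved.

```python
def arange_length(start: int, stop: int, step: int) -> int:
    """Number of elements in [start, stop) for an integer step (hypothetical helper)."""
    if step == 0:
        # A zero step never reaches stop, so it is rejected outright.
        raise ValueError("step cannot be 0")
    # Exact integer ceiling division: ceil(a / b) == -((-a) // b), with a = stop - start.
    n = -((start - stop) // step)
    # If stop lies "behind" start for the sign of step, the formula goes negative;
    # the result is then an empty array.
    return max(0, n)


def int_arange(start: int, stop: int, step: int = 1) -> list:
    """Integer-only arange sketch: every element is an exact Python int."""
    return [start + i * step for i in range(arange_length(start, stop, step))]


print(int_arange(0, 10, 3))    # [0, 3, 6, 9]
print(int_arange(10, 0, -3))   # [10, 7, 4, 1]
print(arange_length(5, 0, 1))  # 0
```

The `max(0, ...)` clamp is exactly the caveat above: the bare ceil formula is only the length when the direction of step points from start toward stop.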

eye

- It may be worth stating explicitly that the element at index (i, j) should be 1 if j - i = k and 0 otherwise.
- The function is only defined for numeric dtypes (Issues with "Mixing arrays and Python scalars" section #98).
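The explicit element-wise definition for eye could look like the following NumPy sketch. `eye_reference` is a hypothetical name used only to illustrate the "1 if j - i = k, else 0" rule; it is compared against `np.eye` with the same shape and offset.

```python
import numpy as np


def eye_reference(n_rows: int, n_cols: int, k: int = 0) -> np.ndarray:
    # Row index i varies along axis 0, column index j along axis 1.
    i = np.arange(n_rows)[:, None]
    j = np.arange(n_cols)[None, :]
    # Element (i, j) is 1 exactly when j - i == k, and 0 otherwise.
    return (j - i == k).astype(np.float64)


print(np.array_equal(eye_reference(3, 4, k=1), np.eye(3, 4, k=1)))  # True
```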

full (and full_like)

- It says "If dtype is None, the output array data type must be the default floating-point data type." I think the default should be the dtype corresponding to the fill value (we don't have a notion of a "default" integer dtype).
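Inferring the dtype from the fill value could be sketched as below. `default_full_dtype` is a hypothetical helper, and the concrete results `"int64"` and `"float64"` are assumptions, since the spec at this point had no default integer dtype.

```python
def default_full_dtype(fill_value) -> str:
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(fill_value, bool):
        return "bool"
    if isinstance(fill_value, int):
        return "int64"    # assumed default integer dtype
    if isinstance(fill_value, float):
        return "float64"  # assumed default floating-point dtype
    raise TypeError(f"unsupported fill value type: {type(fill_value).__name__}")


print(default_full_dtype(True))  # bool
print(default_full_dtype(3))     # int64
print(default_full_dtype(3.0))   # float64
```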

linspace

It's a bit ambiguous whether the spec currently says this, but I think the stop value should not be required to be included (when endpoint=True). Consider:

>>> np.linspace(0, 9288674231451855, 2, dtype=np.int64)
array([               0, 9288674231451856])

The final value differs from the given stop because of floating-point precision loss when computing the linspace.
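The precision loss can be reproduced in pure Python, assuming (as the output suggests) that the stop value passes through float64 during the computation. Above 2**53 the gap between adjacent float64 values is at least 2, so an odd integer such as this stop cannot be represented and rounds to an even neighbour.

```python
import math

stop = 9288674231451855
print(stop > 2**53)                  # True: beyond the range where every integer is exact
print(math.ulp(float(stop)))         # 2.0: spacing between adjacent float64 values here
print(int(float(stop)))              # 9288674231451856, the value NumPy returned
print(float(2**53 - 1) == 2**53 - 1) # True: integers up to 2**53 round-trip exactly
```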


rgommers · 2021-01-09T15:25:58Z

Thanks @asmeurer, good points. I agree with everything except the linspace point; that seems like a bug in NumPy, caused by having a dtype keyword but no separate integer implementation for it.
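A separate integer implementation could avoid floats entirely. The sketch below is a hypothetical `int_linspace`, not NumPy's algorithm: it multiplies before dividing with exact Python ints, so the start is always preserved and, with endpoint=True, the stop is hit exactly. Intermediate points use floor division, which rounds toward negative infinity; that rounding rule is an assumption, and an implementation could reasonably pick another.

```python
def int_linspace(start: int, stop: int, num: int, endpoint: bool = True) -> list:
    """Integer-only linspace sketch (hypothetical helper)."""
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return []
    div = (num - 1) if endpoint else num
    if div == 0:
        # num == 1 with endpoint=True: only the start value is produced.
        return [start]
    span = stop - start
    # Multiply first, then floor-divide: i == 0 gives start exactly and,
    # with endpoint=True, i == num - 1 gives stop exactly.
    return [start + (i * span) // div for i in range(num)]


print(int_linspace(0, 9288674231451855, 2))  # [0, 9288674231451855]
print(int_linspace(0, 10, 5))                # [0, 2, 5, 7, 10]
```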

kgryte · 2021-01-11T20:00:01Z

Re: full_like. The dtype is inferred from the provided array, not the fill value.
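NumPy already behaves this way: `np.full_like` takes its dtype from the input array, and the fill value is cast to that dtype, even when the cast truncates.

```python
import numpy as np

x = np.arange(6)            # integer array
y = np.full_like(x, 0.1)    # 0.1 is cast to x's integer dtype
print(y.dtype == x.dtype)   # True
print(y)                    # [0 0 0 0 0 0]
```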

Re: linspace. This is a bug in NumPy. If we are not going to support endpoint inclusion because of precision issues, then the endpoint keyword should be dropped altogether, as it would not be possible to support.

Re: arange. The reason for floating-point as the default is that, even if the inputs are integers, we cannot guarantee that the computed increment will allow for evenly spaced numbers without precision loss. In this case, floating-point is the safest choice.

Re: other points. These seem reasonable. I will submit a follow-up PR.

asmeurer · 2021-01-12T01:00:56Z

> Re: linspace. This is a bug in NumPy. If we are not going to support endpoint inclusion because of precision issues, then the endpoint keyword should be dropped altogether, as it would not be possible to support.

Just to be clear, the spec doesn't explicitly say the endpoint should be exactly included. So if that is desired, we should probably state that.

> Re: arange. The reason for floating-point as the default is that, even if the inputs are integers, we cannot guarantee that the computed increment will allow for evenly spaced numbers without precision loss. In this case, floating-point is the safest choice.

I'm not sure I follow here. I mean the case where start, stop, and step are all integers. There should be no precision issues there, because all the values will be exact integers between start and stop (assuming start and stop are within the range of the dtype, which I mentioned as another point). I agree that if any of them, including step, is a float, then the result should be floating point. Note that converting to float could actually lose precision, because many integer dtypes have values that are not exactly representable as a float:

>>> np.arange(9223372036854775805, 9223372036854775807, dtype=np.float64)
array([9.22337204e+18, 9.22337204e+18])
>>> np.unique(np.arange(9223372036854775805, 9223372036854775807, dtype=np.float64))
array([9.22337204e+18])
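The same collapse shows up in pure Python. Near 2**63 the spacing between adjacent float64 values is 1024, so these two distinct integers both round to 2.0**63, whereas exact integer arithmetic keeps them apart.

```python
start, stop = 9223372036854775805, 9223372036854775807
exact = list(range(start, stop))      # two distinct integers
as_float = [float(v) for v in exact]  # both round to the same float64 value

print(len(set(exact)))         # 2
print(len(set(as_float)))      # 1
print(as_float[0] == 2.0**63)  # True
```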

shoyer · 2021-01-30T23:13:55Z

A few other suggestions:

To resurface concerns from Arguments need not be strictly positional or keyword-only for creation and manipulation functions #85:
- Why not allow shape arguments to be passed as keyword arguments, too? This is a pretty unambiguous name, and code can be clearer when it uses names.
- Likewise, could num be allowed as a keyword for linspace, and fill_value as a keyword for full/full_like?
empty and empty_like may not necessarily make sense for libraries that do not support mutation (e.g., JAX, Dask, TensorFlow). It might be worth calling this out.
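In Python terms, the keyword suggestions amount to making shape and fill_value positional-or-keyword instead of positional-only, while options such as dtype stay keyword-only. The wrapper below is a hypothetical signature that delegates to `np.full`; it is not the spec's wording.

```python
import numpy as np


def full(shape, fill_value, *, dtype=None):
    # shape and fill_value: positional-or-keyword; dtype: keyword-only (after the bare *).
    return np.full(shape, fill_value, dtype=dtype)


a = full((2, 3), 7)                   # positional call
b = full(shape=(2, 3), fill_value=7)  # keyword call, arguably clearer to read
print(np.array_equal(a, b))           # True
```

Passing dtype positionally, as in `full((2, 3), 7, "float64")`, raises a TypeError with this signature.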

asmeurer · 2021-02-01T20:36:00Z

> empty and empty_like may not necessarily make sense for libraries that do not support mutation (e.g., JAX, Dask, TensorFlow). It might be worth calling this out.

Could these libraries simply alias empty to zeros?

Also, a common use of empty is to create size-0 arrays, like empty((0, 1)).

shoyer · 2021-02-01T20:51:00Z

> > empty and empty_like may not necessarily make sense for libraries that do not support mutation (e.g., JAX, Dask, TensorFlow). It might be worth calling this out.

> Could these libraries simply alias empty to zeros?

Fair enough; in fact, this is exactly what JAX already does.
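For a library without mutation, aliasing could be as small as the sketch below. It uses NumPy purely for illustration and is not JAX's actual code; returning zeros is valid because callers must never rely on the contents of an "empty" array. The size-0 use case still works.

```python
import numpy as np


def empty(shape, *, dtype=None):
    # Hypothetical immutable-library variant: zeros are a valid "uninitialized" result.
    return np.zeros(shape, dtype=dtype)


z = empty((0, 1))       # size-0 array, a common use of empty
print(z.shape, z.size)  # (0, 1) 0
```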

Addresses comments in gh-85 and gh-107

* Update specification for arange. Addresses comments in gh-85 and gh-107.
* Update the specification for `full` and `full_like`. Addresses comments in gh-85 and gh-107.
* Update specification for `linspace`. Addresses comments in gh-85 and gh-107.
* Update specification for `empty`, `ones`, `zeros`. Addresses comments in gh-85 and gh-107.
* Update specification for `eye`. This is useful/needed because `M` is not a descriptive name and that name does not match between different array libraries.
* Update specification for `expand_dims`, `roll` and `reshape`. Address comment in gh-85.
* One more change to `eye`, more descriptive positional arguments.
* Address the default integer dtype issue for 32/64-bit Python. Closes gh-151.
* Update signature of `broadcast_to`. Address a review comment; makes it consistent with other functions using `shape`.

asmeurer · 2021-04-30T22:35:00Z

Turns out the linspace issue isn't just about the endpoint:

>>> np.linspace(-9007199254740993, 0, 1, dtype=np.int64)
array([-9007199254740992])

I think this is related to NumPy issue numpy/numpy#16813.

asmeurer · 2021-04-30T22:42:59Z

The spec is maybe a little unclear, but I read it as saying the start should always be included. However, the NumPy implementation clearly involves divisions whose results are rounded back to integers, so this is not guaranteed. It's not clear to me whether this should be considered an implementation bug, or whether these sorts of subtleties in the implementation should be expected and hence should not be required. I opened numpy/numpy#18881 upstream about this.

rgommers · 2021-05-01T07:50:43Z

That looks like a clear bug, not even a subtle one. The integer start point and step size calculation should just not use floats.

asmeurer mentioned this issue Apr 5, 2021

RESOLVED: Support Python Array API pytorch/pytorch#54581

Closed

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update specification for arange

d01fd1d

Addresses comments in gh-85 and gh-107

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update the specification for full and full_like

6a3a411

Addresses comments in gh-85 and gh-107

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update specification for linspace

042a99a

Addresses comments in gh-85 and gh-107

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update specification for empty, ones, zeros

5b4d4ae

Addresses comments in gh-85 and gh-107

rgommers mentioned this issue Apr 20, 2021

Update API specification for creation and manipulation functions #167

Merged

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update specification for arange

f31fc5c

Addresses comments in gh-85 and gh-107

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update the specification for full and full_like

3691f8c

Addresses comments in gh-85 and gh-107

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update specification for linspace

3cd1a25

Addresses comments in gh-85 and gh-107

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021

Update specification for empty, ones, zeros

87dd193

Addresses comments in gh-85 and gh-107

rgommers closed this as completed in #167 Apr 27, 2021

asmeurer mentioned this issue Oct 28, 2021

Refactor assertions in test_creation.py data-apis/array-api-tests#32

Merged
