Consistent default integer and floating-point types #151

Closed
leofang opened this issue Mar 25, 2021 · 6 comments · Fixed by #167

@leofang
Contributor

leofang commented Mar 25, 2021

In today's call we discussed that the current Array API standard does not regulate the default integer and floating-point types across implementations; it only requires that each implementation pick one, document it clearly, and stick to it. However, this is not strong enough, as there could be cross-platform portability issues.

For example, NumPy handles Python integers inconsistently between Windows and Linux:

import numpy as np
a = np.arange(10, dtype=int)
a.dtype  # np.int32 on Windows, np.int64 on Linux

Similar issues can be found in other APIs.
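
For context, and as far as I understand it: in NumPy releases before 2.0 the default integer maps to the C `long` type, which is 32 bits wide on 64-bit Windows and 64 bits wide on 64-bit Linux. The sketch below uses only the standard library to report that width on the current platform. The helper name `c_long_bits` is made up for illustration and is not part of NumPy or the standard.

```python
import struct


def c_long_bits() -> int:
    """Return the width in bits of the platform's native C long."""
    # "l" is the struct format code for a native C long;
    # calcsize returns its size in bytes.
    return struct.calcsize("l") * 8


print(f"C long is {c_long_bits()} bits on this platform")
```

Running this on both platforms should show the same split as the `a.dtype` comment above.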

It's worth noting that, as pointed out by @kgryte, the standard does not permit passing dtype=int to most functions, which could eliminate a large class of such inconsistencies. But it's still good to be explicit in the standard to ensure portability.

@rgommers
Member

I can't find any functions that actually permit dtype=int, but I agree that this would be good to add explicitly anyway. Maybe a .. note:: in https://data-apis.org/array-api/latest/API_specification/data_types.html?

rgommers added the "Narrative Content" label Mar 29, 2021
@asmeurer
Member

asmeurer commented Apr 1, 2021

Even without dtype=int, there's still ambiguity for creation functions that can take Python scalars, like full or asarray. What should the dtype of asarray(1) be? Presumably we should use the same rule as for floats: a library should have a consistent default integer dtype that isn't value-based (value-based casting is even more relevant for ints than for floats).

Regarding the question of consistency of the same library on different platforms, does it make sense to allow a library to have different defaults depending on what architecture it is compiled on?

@leofang
Contributor Author

leofang commented Apr 2, 2021

> What should the dtype of asarray(1) be?

@asmeurer, I suppose you are saying that, in addition to having a default integer type and a default float type, each library should also decide whether the default (when nothing is given) is an integer or a float?

> Regarding the question of consistency of the same library on different platforms, does it make sense to allow a library to have different defaults depending on what architecture it is compiled on?

This is unfortunately what NumPy does currently, and sometimes users don't even get to say "I don't wanna set dtype=int"; it could be buried in an implementation that is out of the user's reach. So I am arguing it does not make sense.

@asmeurer
Member

asmeurer commented Apr 2, 2021

No, I think asarray(1) should definitely have an integer dtype. But which integer dtype should it be? All the spec says is "If dtype is None, the output array data type must be inferred from the data type(s) in obj", which isn't clear. To me it even leaves open the possibility of value-based casting, which I think we want to avoid.
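
To make the distinction concrete, here is a rough sketch contrasting the two inference strategies for Python scalars. Both function names, the dtype strings, and the `int64`/`float64` defaults are assumptions chosen for illustration; they are not taken from the standard or from any library.

```python
def infer_dtype_by_type(obj, default_int="int64", default_float="float64"):
    """Type-based inference: the result depends only on the Python type."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(obj, bool):
        return "bool"
    if isinstance(obj, int):
        return default_int
    if isinstance(obj, float):
        return default_float
    raise TypeError(f"unsupported scalar type: {type(obj).__name__}")


def infer_dtype_by_value(obj):
    """Value-based inference for ints: pick the smallest signed dtype that fits."""
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= obj < 2 ** (bits - 1):
            return f"int{bits}"
    raise OverflowError("value does not fit in a 64-bit signed integer")


# Type-based: every Python int gets the same dtype regardless of magnitude.
print(infer_dtype_by_type(1), infer_dtype_by_type(2**40))
# Value-based: the dtype changes with the magnitude of the input.
print(infer_dtype_by_value(1), infer_dtype_by_value(2**40))
```

With the type-based rule, code that works for small inputs keeps the same dtype for large ones, which is the property being asked for here.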

@leofang
Contributor Author

leofang commented Apr 3, 2021

Ah ok, thanks for clarifying, Aaron. I agree with your interpretation. Currently the standard only covers the rules for mixing a Python type and an array dtype, but doesn't clearly define how to handle Python types in the creation functions.

@rgommers
Member

This should be addressed in gh-167. I did put in one exception for default integers: 32-bit vs. 64-bit Python. It's possible to avoid that too, of course, but it'd be a fairly large change for limited benefit. So "may vary" on 32-bit seems okay.
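
A minimal sketch of what such an exception could look like on the library side, assuming the default integer follows the pointer width of the Python build. The function names and the `int32`/`int64` choice are illustrative assumptions, not wording from gh-167.

```python
import sys


def is_64bit_python() -> bool:
    """Report whether the running interpreter is a 64-bit build."""
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
    return sys.maxsize > 2**32


def default_int_dtype_name() -> str:
    """Pick a default integer dtype name based on the interpreter's bitness."""
    # Assumption for illustration: int64 on 64-bit Python, int32 on 32-bit Python.
    return "int64" if is_64bit_python() else "int32"


print(default_int_dtype_name())
```

Unlike the Windows/Linux case above, this depends only on the Python build and not on the operating system.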

rgommers added a commit to rgommers/array-api that referenced this issue Apr 20, 2021
rgommers added a commit that referenced this issue Apr 27, 2021
* Update specification for arange

Addresses comments in gh-85 and gh-107

* Update the specification for `full` and `full_like`

Addresses comments in gh-85 and gh-107

* Update specification for `linspace`

Addresses comments in gh-85 and gh-107

* Update specification for `empty`, `ones`, `zeros`

Addresses comments in gh-85 and gh-107

* Update specification for `eye`

This is useful/needed because `M` is not a descriptive name
and that name does not match between different array libraries.

* Update specification for `expand_dims`, `roll` and `reshape`

Address comment in gh-85

* One more change to `eye`, more descriptive positional arguments

* Address the default integer dtype issue for 32/64-bit Python

Closes gh-151

* Update signature of `broadcast_to`

Address a review comment; makes it consistent with other functions
using `shape`.
3 participants