Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clip to the specification #715

Merged
merged 6 commits into from
Jan 22, 2024
Merged

Add clip to the specification #715

merged 6 commits into from
Jan 22, 2024

Conversation

kgryte
Copy link
Contributor

@kgryte kgryte commented Nov 30, 2023

This PR

  • resolves Add clip #482 by adding clip to the Array API specification for clamping each element of an input array to a specified range.
  • requires that the output data type be the same as the input array. One could argue that the output data type follow type promotion rules (between x, min, max). This would be in the spirit of earlier efforts to ensure that type promotion rules are applied consistently throughout the specification. However, in contrast to Update output dtypes for bitwise shift operations #201, min and max are kwargs, and we do not have, TMK, any precedent for array kwargs influencing the data type of the output array. Furthermore, when clamping, users are more likely to want an output array of the same data type as the input array (this was also raised on the NumPy issue tracker: ENH: Ensure that output of np.clip has the same dtype as the main array numpy/numpy#24976).
  • allows the input array x to be broadcast, thus allowing the output array to have a rank greater than the input array. This differs from TensorFlow, which requires that the output array shape be the same as the input array shape. NumPy, however, supports such broadcasting behavior. Note that x can be broadcast is somewhat at odds with not allowing type promotion. For the output data type, I argued that min and max should not affect the output data type, but, in allowing x to be broadcast, this would mean that min and max should affect the output array shape. This is likely fine and consistent with the rest of the specification, where we have plenty of kwargs which affect the output array shape, although this would be the first, TMK, involving broadcasting.
  • specifies that if min > max, behavior is unspecified. NumPy et al set output values to max; however, other implementations should be free to raise an exception or support alternative behavior.
  • allows both min and max to be optional. When both min and max are None, the function is essentially a no-op. This follows PyTorch, but differs from NumPy which allows min and max be None, but not at the same time.
  • does not prohibit mixed data types, but leaves behavior unspecified (e.g., when x is an integer data type and min or max is a floating-point data type), which is consistent elsewhere in the specification. TensorFlow raises an exception in such a scenario.
  • makes min and max positional and keyword arguments.
  • Follows NumPy, JAX, CuPy, and Dask in naming the API clip. TensorFlow uses the name clip_by_value. PyTorch also includes clip, but this aliases to clamp.

Note that this PR would introduce changes to existing clip functionality in NumPy et al. Namely,

  • min and max are positional and keyword arguments; whereas, in NumPy, a_min and a_max are positional.
  • deviates from NumPy's naming convention of a_min and a_max.
  • NumPy requires that only one of min or max be allowed to be None at the same time.

@kgryte kgryte added the API extension Adds new functions or objects to the API. label Nov 30, 2023
@kgryte kgryte added this to the v2023 milestone Nov 30, 2023
@kgryte kgryte changed the title Add clamp to the specification Add clip to the specification Jan 11, 2024
@kgryte
Copy link
Contributor Author

kgryte commented Jan 11, 2024

I've updated this PR based on feedback from the 30 November 2023 workgroup meeting. Namely,

  • I renamed the API from clamp to clip. Given widespread usage of this API, it was considered undesirable to rename and deprecate clip in NumPy et al. Even though this PR introduces behavior which differs from NumPy (e.g., kwargs, output data type, etc), it was considered better if NumPy simply moved to adopt specified behavior.
  • I updated the guidance regarding the output data type to be the same as the input array x and not the result of type promotion. An argument could be made either way; however, as discussed during the workgroup meeting, user expectation is most likely to be that the output data type matches x, and the specification does not have precedent (TMK) for array kwargs influencing the output data type. As such, specification was guidance was updated accordingly. Note, however, that this differs from current behavior in, e.g., NumPy.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes min and max keyword-only arguments.

The majority of usage today is like this, with scalar values but more importantly positional use of min/max (and not even named that in numpy today): clip(x, -1, 4). This doesn't seem unreasonable, so I'd prefer positional usage I think. Makes introduction a lot less disruptive.

The discussion in gh-482 leans towards this I'd say:

def clip(x, /, min=None, max=None):

Introducing that in NumPy would be fine in a minor release.

The rest of the semantics for min/max all LGTM.

I updated the guidance regarding the output data type to be the same as the input array x and not the result of type promotion. An argument could be made either way; however, as discussed during the workgroup meeting, user expectation is most likely to be that the output data type matches x, and the specification does not have precedent (TMK) for array kwargs influencing the output data type. As such, specification was guidance was updated accordingly. Note, however, that this differs from current behavior in, e.g., NumPy.

This seems reasonable. NumPy is already a little inconsistent with itself, sometimes scalars cause type promotion, sometimes not. E.g.:

>>> x = np.arange(6).astype(np.int8)
>>> np.clip(x, -1, 4.5)  # yields float64
array([0. , 1. , 2. , 3. , 4. , 4.5])
>>> np.clip(x, -1, 4)  # no promotion, all fits in int8
>>> xu = x.astype(np.uint8)
>>> np.clip(xu, -1, 4)  # now min value doesn't cause upcasting to int16
...
OverflowError: Python integer -1 out of bounds for uint8

Mostly the result is upcast though:

>>> x_f32 = np.linspace(0, 1, num=5, dtype=np.float32)
>>> np.clip(x_f32, 0.2, 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ], dtype=float32)
>>> np.clip(x_f32, np.array([0.2], dtype=np.float64), 0.8)
array([0.2 , 0.25, 0.5 , 0.75, 0.8 ])

The cross-kind promotion or comparison is problematic anyway. It's not entirely clear from the language here what the expected result is for clip(x_int32, 0.6, 2.3) - the minimum value in the output could be 1 or 0 depending how you'd implement the function, because some internal casting will have to happen.

Probably there should be an additional note that cross-kind dtypes are unspecified behavior?

@kgryte
Copy link
Contributor Author

kgryte commented Jan 11, 2024

@rgommers I've updated min and max to be both positional and kwarg. And I added a note stating that arguments having different data type kinds will result in implementation-dependent behavior.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM now.

I'll leave it open for a bit in case more people have comments.

@kgryte
Copy link
Contributor Author

kgryte commented Jan 22, 2024

As this PR has received the OK and has not received any further comments, will go ahead and merge. Thanks all!

@kgryte kgryte merged commit 6e320d0 into data-apis:main Jan 22, 2024
3 checks passed
@kgryte kgryte deleted the feat/clip branch January 22, 2024 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API extension Adds new functions or objects to the API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add clip
2 participants