RFC: add isin for elementwise set inclusion test #854

Open
@lucascolley

Description

@lucascolley
Member

Prior art

Motivation

This function is used in scikit-learn. They've implemented it in terms of the standard, and that implementation could find a home in array-api-extra: data-apis/array-api-extra#34. @asmeurer suggested there that we should also consider adding this to the standard.

Activity

asmeurer commented on Nov 22, 2024

@asmeurer
Member

Another potential reason for adding it is that it uses a nontrivial implementation which depends on some heuristics based on the input size.
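As a rough illustration of what such a size-dependent implementation can look like, here is a minimal sketch written with NumPy for runnability. The function names and the `small` threshold are hypothetical and are not taken from NumPy, scikit-learn, or array-api-extra; NaN handling is ignored.

```python
import numpy as np


def isin_broadcast(x, test_elements):
    # Compare every element of x with every test element.
    # Simple, but needs O(x.size * test_elements.size) memory.
    t = np.reshape(test_elements, (-1,))
    return np.any(x[..., None] == t, axis=-1)


def isin_sorted(x, test_elements):
    # Sort the test elements once, then binary-search each element of x.
    t = np.sort(np.reshape(test_elements, (-1,)))
    if t.shape[0] == 0:
        return np.zeros(x.shape, dtype=bool)
    idx = np.searchsorted(t, x)
    # searchsorted can return len(t); clip so the lookup stays in bounds.
    idx = np.clip(idx, 0, t.shape[0] - 1)
    return t[idx] == x


def isin_sketch(x, test_elements, small=16):
    # `small` is an arbitrary illustrative cutoff, not a tuned heuristic.
    x = np.asarray(x)
    test_elements = np.asarray(test_elements)
    if test_elements.size <= small:
        return isin_broadcast(x, test_elements)
    return isin_sorted(x, test_elements)
```

Real implementations pick the cutoff from both input sizes, which is the part that is awkward for every downstream library to reimplement on its own.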

rgommers commented on Nov 22, 2024

@rgommers
Member

Thanks @lucascolley. I've added ndonnx (which has it) and MLX (which doesn't) to the issue description.

This seems like a very reasonable proposal to me. Implementing isin in terms of other primitives in the standard is a little complex indeed.

The return type should always be a boolean array. The NumPy docs say it can be a bool for a single input element, but that's actually a bug in the docs. I checked NumPy, JAX, and PyTorch and all return a 0-D array.
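A quick way to check the 0-D behavior described above, shown here for NumPy only (the variable name is just for illustration):

```python
import numpy as np

# A single input element, passed as a 0-D array.
result = np.isin(np.asarray(3), np.asarray([1, 2, 3]))

# Inspect the container type, dimensionality, and dtype of the result.
print(type(result).__name__, result.ndim, result.dtype)
```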

added the RFC label (Request for comments. Feature requests and proposed changes.) on Nov 22, 2024
moved this to Stage 0 in Proposals on Nov 22, 2024
rgommers commented on Nov 28, 2024

@rgommers
Member

I think the thing to discuss here is which keywords are desired. NumPy and Dask use:

def isin(element, test_elements, assume_unique=False, invert=False, *, kind=None)

The private scikit-learn implementation here is:

def isin(element, test_elements, xp, assume_unique=False, invert=False)

JAX:

def isin(element, test_elements, assume_unique=False, invert=False, *, method='auto')

PyTorch:

def isin(elements, test_elements, *, assume_unique=False, invert=False)

ndonnx:

def isin(x: Array, items: Sequence[Scalar]) -> Array

The assume_unique and invert keywords seem easy to support and useful. So this should probably work:

def isin(x: Array, test_elements: Array, /, *, assume_unique: bool = False, invert: bool = False) -> Array[bool]
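For reference, this is how the two keywords behave in NumPy's existing `isin`. It illustrates current NumPy semantics only, not the signature proposed above.

```python
import numpy as np

x = np.asarray([0, 1, 2, 5])
test_elements = np.asarray([1, 2, 3])

# Default: True where an element of x occurs in test_elements.
inside = np.isin(x, test_elements)

# invert=True returns the complement, equivalent to ~np.isin(...).
outside = np.isin(x, test_elements, invert=True)

# assume_unique=True is a promise by the caller that neither input
# contains duplicates; it may speed things up but does not change the result.
fast = np.isin(x, test_elements, assume_unique=True)
```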

The type of test_elements is still TBD; it could perhaps be a union of arrays and sequences.

rgommers commented on Nov 28, 2024

@rgommers
Member

We discussed this in the community meeting today. A summary with a couple of points to follow up on:

  • In general, folks were 👍🏼 on adding isin
  • For the second argument, accepting arrays and scalars but not sequences seemed preferred.
  • There was some discussion about promotion behavior, which needs to be specified. Since the semantics of isin are element-wise and comparison-like, it is probably a good idea to match what == does.
  • For the second argument: limit to 1-D, or accept any shape and then reshape to 1-D? NumPy does the latter.
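The last two points can be illustrated with NumPy's current behavior. This is a sketch of what NumPy does today, not of what the standard will specify; the array values are arbitrary.

```python
import numpy as np

# A multi-dimensional second argument is treated as a flat set of values.
x = np.asarray([1, 2, 3, 4], dtype=np.int32)
test_2d = np.asarray([[2, 9], [4, 9]], dtype=np.int32)
from_2d = np.isin(x, test_2d)
from_1d = np.isin(x, np.reshape(test_2d, (-1,)))

# Mixed integer dtypes: the result agrees with a broadcast == comparison,
# which promotes int32 and int64 to int64 before comparing.
xs = np.asarray([1, 2], dtype=np.int32)
ts = np.asarray([2, 2**40], dtype=np.int64)
mixed = np.isin(xs, ts)
via_eq = np.any(xs[:, None] == ts[None, :], axis=1)
```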
added this to the v2025 milestone on May 29, 2025
added a commit that references this issue on Jun 12, 2025
linked a pull request that will close this issue on Jun 12, 2025
moved this from Stage 0 to Stage 2 in Proposals on Jun 12, 2025
kgryte commented on Jun 12, 2025

@kgryte
Contributor

A PR adding isin to the specification is now up for review: #959

Metadata

Labels: API extension (Adds new functions or objects to the API.), RFC (Request for comments. Feature requests and proposed changes.)
Status: Stage 2
Participants: @asmeurer, @rgommers, @kgryte, @lucascolley

        RFC: add `isin` for elementwise set inclusion test · Issue #854 · data-apis/array-api