ARROW-640: [Python] Implement __hash__ and equality for Array scalar values Arrow scalar values #1765

Closed
wants to merge 5 commits into from
2 changes: 2 additions & 0 deletions python/pyarrow/scalar.pxi
@@ -73,6 +73,8 @@ cdef class ArrayValue(Scalar):
            raise NotImplementedError(
                "Cannot compare Arrow values that don't support as_py()")

    def __hash__(self):
Member

Ok, I think we misunderstood each other.
When you asked about restricting yourself to integers, I thought you meant writing a fast path that avoids calling as_py(). If there is no fast path, then there's no need to check for ints or any other types.
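
For illustration only, here is a rough sketch of what an integer fast path would have to reproduce so that a scalar still hashes like the equivalent Python int. It assumes CPython's documented numeric hashing (reduction modulo `sys.hash_info.modulus`, with -1 remapped to -2). The name `int_fast_hash` is hypothetical and is not part of pyarrow or of this PR.

```python
import sys

# Prime modulus CPython uses when hashing numeric types.
_HASH_MODULUS = sys.hash_info.modulus


def int_fast_hash(value: int) -> int:
    """Hypothetical helper: compute hash(value) for an int without calling hash().

    Assumes CPython's integer hashing rule; other interpreters may differ.
    """
    magnitude = abs(value) % _HASH_MODULUS
    result = -magnitude if value < 0 else magnitude
    # CPython reserves -1 as an error marker, so it is remapped to -2.
    return -2 if result == -1 else result
```

A real fast path would do this arithmetic on the raw C integer inside Cython; the point of the sketch is only that the result must stay equal to `hash()` of the Python value, otherwise equal objects would hash differently.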

Author

Sorry about that. I've switched it to hash() using as_py() without any type checks. Per the JIRA ticket can we consider a separate ticket for a fast path custom hash on ints?
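
A minimal sketch of why delegating to as_py() is enough, using a pure-Python stand-in rather than the real Cython class. `ValueSketch` is a hypothetical name; only the `as_py()`, `__eq__` and `__hash__` shapes mirror what is discussed in this PR.

```python
class ValueSketch:
    """Hypothetical stand-in for an Arrow scalar value; not the pyarrow class."""

    def __init__(self, py_value):
        self._py_value = py_value

    def as_py(self):
        # Return the plain Python object this scalar wraps.
        return self._py_value

    def __eq__(self, other):
        # Unwrap another scalar, then compare as ordinary Python values.
        if isinstance(other, ValueSketch):
            other = other.as_py()
        return self.as_py() == other

    def __hash__(self):
        # Hash through the Python value so equal objects share a hash.
        return hash(self.as_py())


values = [ValueSketch(v) for v in [1, 1, 2, 1]]
unique = set(values)  # duplicates collapse because hash and equality agree
```

Because both methods go through the same Python value, the usual contract (objects that compare equal must have equal hashes) holds for every type as_py() can return, with no per-type checks.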

Member

> Per the JIRA ticket can we consider a separate ticket for a fast path custom hash on ints?

Definitely.

Author

Ticket created https://issues.apache.org/jira/browse/ARROW-2339. Anything else you would suggest for this PR?

        return hash(self.as_py())

cdef class BooleanValue(ArrayValue):

27 changes: 27 additions & 0 deletions python/pyarrow/tests/test_scalars.py
@@ -171,3 +171,30 @@ def test_dictionary(self):
categorical.categories)
for i, c in enumerate(values):
assert v[i].as_py() == c

    def test_int_hash(self):
        # ARROW-640
        int_arr = pa.array([1, 1, 2, 1])
        assert hash(int_arr[0]) == hash(1)

    def test_float_hash(self):
        # ARROW-640
        float_arr = pa.array([1.4, 1.2, 2.5, 1.8])
        assert hash(float_arr[0]) == hash(1.4)

    def test_string_hash(self):
        # ARROW-640
        str_arr = pa.array(["foo", "bar"])
        assert hash(str_arr[1]) == hash("bar")

    def test_bytes_hash(self):
        # ARROW-640
        byte_arr = pa.array([b'foo', None, b'bar'])
        assert hash(byte_arr[2]) == hash(b"bar")

    def test_array_to_set(self):
        # ARROW-640
        arr = pa.array([1, 1, 2, 1])
        set_from_array = set(arr)
        assert isinstance(set_from_array, set)
        assert set_from_array == {1, 2}