GH-35112: [Python] Expose keys_sorted in python MapType #35113

Merged

Conversation

0x26res
Contributor

@0x26res 0x26res commented Apr 13, 2023

Rationale for this change

It is not possible to read keys_sorted in the Python API.

What changes are included in this PR?

  • expose keys_sorted in cdef class MapType / types.pxi
  • add tests

Are these changes tested?

yes

Are there any user-facing changes?

We're exposing keys_sorted but I guess the documentation will update itself from the """ pydoc (?)

This is not an API breaking change

@0x26res 0x26res requested a review from AlenkaF as a code owner April 13, 2023 17:43
@github-actions

⚠️ GitHub issue #35112 has been automatically assigned in GitHub to PR creator.

@danepitkin
Member

LGTM! I'll let someone with committer rights finalize this review.

I believe you can ignore the AppVeyor error, since there have been issues on main with pytest runs timing out.

@AlenkaF
Member

AlenkaF commented Apr 14, 2023

The binding looks good to me as well, +1. Thank you for the contribution!

I do have a general, probably silly, question about the keyword. Looking at the C++ and the tests, it is meant as a "metadata" keyword and not a "check" that the data is actually sorted, right? What I mean is, you can define a MapType with keys_sorted=True, but when you use it in a scalar, for example, the keys do not actually have to be sorted (ascending?):

>>> ty = pa.map_(pa.string(), pa.int8(), keys_sorted=True)
>>> v = [('b', 2), ('a', 1)]
>>> s = pa.scalar(v, type=ty)
>>> s
<pyarrow.MapScalar: [('b', 2), ('a', 1)]>

And Dane is correct, the failing tests are not connected to this PR.

@danepitkin
Member

Good catch, @AlenkaF. Is it possibly a bug that the underlying C++ implementation doesn't maintain a sorted order?

@AlenkaF
Member

AlenkaF commented Apr 17, 2023

I am not sure. I think the type only has keys_sorted defined as metadata. I do not see any reference to it in pyarrow.MapArray or pyarrow.MapScalar.

So maybe the question is: do we want to add a check for it in PyArrow Array and Scalar?

@0x26res
Contributor Author

0x26res commented Apr 17, 2023

Are we then saying that:

  • This field is pure metadata and it's left to the user to provide sorted keys?
  • It won't be enforced by Arrow?

For context, I'm not really using that field. I just need to be able to access it in order to create slightly modified copies of schemas. For example, if I want to change the type of nested fields (int32 -> int64), I need to make a copy of the pa.map_ types while preserving the keys_sorted metadata.

| So maybe the question is do we want to add a check for it in PyArrow Array and Scalar?

Maybe we should create a follow-up issue to do this. It would involve a change that may break existing code at runtime (if someone was previously providing unsorted data with keys_sorted=True).

As far as this PR is concerned, I think we should just improve the doc for that field (and probably update the doc in the C++ MapType class).

@AlenkaF
Member

AlenkaF commented Apr 18, 2023

Yeah, the question of whether keys_sorted is only metadata should be kept separate from this PR. We can open a new issue to start a discussion on that.

I agree that for this PR the aim is to expose the parameter as a property so we are able to get the information from the data type.

My suggestion for the docs would be to only make it explicit in PyArrow, for example: "Should the entries be sorted according to keys."

@0x26res
Contributor Author

0x26res commented Apr 18, 2023

@AlenkaF thanks for the suggestion, I've updated the comment.

Member

@AlenkaF AlenkaF left a comment

Thanks!

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Apr 18, 2023
@AlenkaF AlenkaF merged commit 1deb740 into apache:main Apr 18, 2023
@ursabot

ursabot commented Apr 20, 2023

Benchmark runs are scheduled for baseline = 9f852d4 and contender = 1deb740. 1deb740 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️7.65% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.54% ⬆️0.09%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 1deb740e ec2-t3-xlarge-us-east-2
[Failed] 1deb740e test-mac-arm
[Finished] 1deb740e ursa-i9-9960x
[Finished] 1deb740e ursa-thinkcentre-m75q
[Finished] 9f852d46 ec2-t3-xlarge-us-east-2
[Failed] 9f852d46 test-mac-arm
[Finished] 9f852d46 ursa-i9-9960x
[Finished] 9f852d46 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot

ursabot commented Apr 20, 2023

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…#35113)

### Rationale for this change

It is not possible to read `keys_sorted` in the Python API

### What changes are included in this PR?

- expose keys_sorted in `cdef class MapType` / types.pxi
- add tests

### Are these changes tested?

yes

### Are there any user-facing changes?

We're exposing keys_sorted but I guess the documentation will update itself from the `"""` pydoc (?)

This is not an API breaking change

* Closes: apache#35112

Authored-by: aandres <aandres@tradewelltech.co>
Signed-off-by: Alenka Frim <frim.alenka@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request May 16, 2023