Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Improve usability of pc.map_lookup / MapLookupOptions #36045

Closed
jorisvandenbossche opened this issue Jun 13, 2023 · 4 comments · Fixed by #36387
Closed

[Python] Improve usability of pc.map_lookup / MapLookupOptions #36045

jorisvandenbossche opened this issue Jun 13, 2023 · 4 comments · Fixed by #36387
Assignees
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 13, 2023

Trying to use pc.map_lookup today, I noticed some minor usability issues that could be improved:

arr = pa.array([{'a': 'a1', 'b': 'b1'}, {'a': 'a2', 'b': 'b2'}, {'a': 'a1', 'b': 'b3'}], 
               type=pa.map_(pa.string(), pa.string()))

1. Passing the key value as a python string instead of scalar errors:

# works fine
>>> pc.map_lookup(arr, pa.scalar("a"), "all")
<pyarrow.lib.ListArray object at 0x7f099d2fabc0>
[
  [
    "a1"
  ],
  [
    "a2"
  ],
  [
    "a1"
  ]
]

# gives confusing error
>>> pc.map_lookup(arr, "a", "all")
...
ArrowInvalid: map_lookup: query_key can't be empty.

If not passing a Scalar object, we should probably just pass the python value to pa.scalar(..) for convenience.

2. Allow to specify the occurence keyword, given that this is not necessarily obvious, it's nice to be able to specify the named argument:

>>> pc.map_lookup(arr, pa.scalar("a"), occurence="first")
...
TypeError: __init__() takes exactly 3 positional arguments (2 given)

I suppose we need to give it a default value for this, though (to make our auto-generation work, or we can look into improving the autogeneration of the signature)

@jorisvandenbossche
Copy link
Member Author

2. Allow to specify the occurence keyword, given that this is not necessarily obvious, it's nice to be able to specify the named argument:

Actually, that errors only because I mis-spelled the keyword name. With the correct name it works:

>>> pc.map_lookup(arr, pa.scalar("a"), occurrence="all")
<pyarrow.lib.ListArray object at 0x7f099c96bd00>
[
  [
    "a1"
  ],
  [
    "a2"
  ],
  [
    "a1"
  ]
]

Wondering if we can provide a better error message in that case, but that might just be a Python/cython behaviour of first checking the positional keywords before the named ones)

@jorisvandenbossche
Copy link
Member Author

Wondering if we can provide a better error message in that case, but that might just be a Python/cython behaviour of first checking the positional keywords before the named ones)

Yes, that's a cython issue -> cython/cython#1281

@R-JunmingChen
Copy link
Contributor

R-JunmingChen commented Jun 29, 2023

  1. Passing the key value as a python string instead of scalar errors:

So, we should only handle the issue one. I can make a quick implementation for it

@R-JunmingChen
Copy link
Contributor

take

jorisvandenbossche added a commit that referenced this issue Jul 5, 2023
…ns (#36387)

### Rationale for this change
This PR is for #36045, which aims to improve usability of pc.map_lookup / MapLookupOptions  

### What changes are included in this PR?
For `query_key` which is not a subclass of `pyarrow.lib.Scalar`, we convert it to `scalar`

### Are these changes tested?

Yes, add one test in tests/test_compute.py::test_map_lookup
* Closes: #36045

Lead-authored-by: Junming Chen <junming.chen.r@outlook.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche added this to the 13.0.0 milestone Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants