New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-9967: [Python] Add compute module docs + expose more option classes #8145
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
See also: |
edb57cc
to
3f99ae8
Compare
docs/source/cpp/compute.rst
Outdated
@@ -201,6 +201,11 @@ an ``Invalid`` :class:`Status` when overflow is detected. | |||
+--------------------------+------------+--------------------+---------------------+ | |||
| subtract_checked | Binary | Numeric | Numeric | | |||
+--------------------------+------------+--------------------+---------------------+ | |||
| divide | Binary | Numeric | Numeric | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed that this was missing on the c++ page
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thank you! However, can you leave the table alphabetically-sorted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually my bad, you already had it there. Reverted
docs/source/python/api/compute.rst
Outdated
|
||
ascii_lower | ||
ascii_upper | ||
binary_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not really a transform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fair. Would Structural Transforms
be a good place for it?
docs/source/python/compute.rst
Outdated
>>> a = pa.array([1, 1, 2, 3]) | ||
>>> pc.sum(a) | ||
<pyarrow.Int64Scalar: 7> | ||
>>> b = pa.array([4, 5, 8, 2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this array is not used? (maybe show pc.equals(a, b) ?)
e79559c
to
d117ed1
Compare
python/pyarrow/compute.py
Outdated
@@ -294,6 +294,46 @@ def mode(array): | |||
return call_function("mode", [array]) | |||
|
|||
|
|||
def and_(x1, x2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK we can't have methods named and
and or
in Python. For now I went with and_
and or_
here, for consistency with and_kleene
and or_kleene
, but for example numpy uses np.logical_and
and np.logical_or
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a function named "and":
>>> from pyarrow import compute
>>> getattr(compute, "and")
<function pyarrow.compute.and(left, right, *, memory_pool=None)>
So you should just fix compute.py
so that it's exported as and_
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Fixed now
mean | ||
min_max | ||
sum | ||
mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this render the docstrings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. It's what's done on the other pages:
https://github.com/apache/arrow/blob/master/docs/source/python/api/datatypes.rst
I'm having trouble building the docs locally (related to ARROW-10018 I think) so haven't been able to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't render the docstring here, but it will create a separate page for each of those functions and link to that from this table.
(the separate pages might be a bit overkill since the docstrings here are often not that informative yet, but it's indeed how we do it for other functions as well)
c2445c1
to
c1c5846
Compare
Question: in 952649d I'm attempting to expose This is needed to make the |
Cython generally doesn't convert implicitly between arbitrary Python objects and C/C++ types, you have to type your code explicitly. Also, it is better to use def __cinit__(self, int64_t pivot):
self.partition_nth_options.reset(new CPartitionNthOptions(pivot)) |
Thanks @pitrou! That was the problem |
This is ready for re-review. I believe that I've addressed the feedback from previous reviews. I've also now exposed all the option classes so that all the kernels listed in the docs are usable from Python. |
docs/source/python/compute.rst
Outdated
|
||
.. seealso:: | ||
|
||
:ref:`C++ compute functions documentation <cpp-compute>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche Will this work? I'm not 100% sure as I haven't yet compiled the docs to explicitly check
python/pyarrow/_compute.pyx
Outdated
|
||
cdef class VarianceOptions(FunctionOptions): | ||
cdef: | ||
unique_ptr[CVarianceOptions] variance_options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why unique_ptr
is used in some of the other classes, but it shouldn't be necessary. See MinMaxOptions
for an example without.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In PartitionNthOptions
we need it because it doesn't have a null constructor so it can't be allocated on the stack. That's not the case for VarianceOptions
, though, so I'll take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten VarianceOptions
without unique_ptr
.
I think that with the other three options classes exposed in this PR (SetLookupOptions
, PartitionNthOptions
and StrptimeOptions
) it's necessary to use unique_ptr
or, alternatively, we would need to define default constructors
@github-actions crossbow submit test-ubuntu-18.04-docs |
Revision: 73272bf Submitted crossbow builds: ursa-labs/crossbow @ actions-615
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Will merge after a couple changes. Thank you @arw2019 !
CI failures are unrelated. |
Thanks @pitrou @jorisvandenbossche for reviewing! |
…sses apache#8163 exposes `pyarrow.compute` kernels and generates their docstrings. This PR adds documentation for the module in the User Guide and the Python API reference. Closes apache#8145 from arw2019/ARROW-7871 Lead-authored-by: arw2019 <andrew.r.wieteska@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
#8163 exposes
pyarrow.compute
kernels and generates their docstrings. This PR adds documentation for the module in the User Guide and the Python API reference.