Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9164: [C++] Add embedded documentation to compute functions #8457

Closed
wants to merge 2 commits into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Oct 13, 2020

Also re-use the embedded documentations in Python to generate nicer signatures and docstrings.

@pitrou pitrou requested review from wesm and bkietz and removed request for wesm October 13, 2020 17:14
@github-actions
Copy link

@pitrou pitrou force-pushed the ARROW-9164-compute-func-docs branch 2 times, most recently from 2c75b60 to 8010c4e Compare October 13, 2020 18:10
def wrapper(left, right, *, memory_pool=None):
return func.call([left, right], None, memory_pool)
template = _wrapper_template
exec(template.format(func_name=name, args_sig=args_sig), globals(), ns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to set the __signature__ property instead?

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Altogether this looks like a great improvement in the usability of the compute module, thanks for doing this!

cpp/src/arrow/compute/function.h Outdated Show resolved Hide resolved
python/pyarrow/compute.py Show resolved Hide resolved
python/pyarrow/compute.py Show resolved Hide resolved
python/pyarrow/compute.py Outdated Show resolved Hide resolved
python/pyarrow/compute.py Show resolved Hide resolved
option_class = _get_options_class(func)
arg_names = _get_arg_names(func)
args_sig = ', '.join(arg_names)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if option_class is None:
def wrapper(*args, *, memory_pool=None):
return func.call(args, None, memory_pool)
wrapped_template = 'lambda {}, *, memory_pool=None: ()'
else:
def wrapper(*args, *, options=None, memory_pool=None, **kwargs):
options = _handle_options(name, option_class, options, kwargs)
return func.call(args, options, memory_pool)
wrapped_template = 'lambda {}, *, options=None, memory_pool=None, **kwargs: ()'
wrapper.__name__ = name
wrapper.__qualname__ = name
wrapper.__wrapped__ = eval(wrapped_template.format(args_sig))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I didn't know that __wrapped__ was taken up by inspect.signature. Interesting. I'd rather examine this later though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Also the python docstrings ;)

I mostly did a review of the docstrings' content now (and not yet the generation machinery)

cpp/src/arrow/compute/kernels/aggregate_var_std.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_boolean.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/scalar_validity.cc Outdated Show resolved Hide resolved
FunctionDoc list_parent_indices_doc(
"Compute parent indices of nested list values",
("`lists` must have a list-like type.\n"
"For each value in each list of `lists`, the top-level list index\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but I think that for conceptually more complex functions (like this one, for me), it might be useful to include a small example. But that might get verbose to write in this form ..

For example, for this case: [[1, 2], [3, 4, 5], [6]] returns [0, 0, 1, 1, 1, 2]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not obvious how to make this too unwiedly to write.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going further down the docstring rabbithole: we could introduce a doctest-like class (of which a FunctionDoc contains a vector), which could just be a set of lambdas which generate a set of inputs, options, and maybe the expected output. These can then be rendered lazily in python

cpp/src/arrow/compute/kernels/vector_sort.cc Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_sort.cc Outdated Show resolved Hide resolved
@pitrou pitrou force-pushed the ARROW-9164-compute-func-docs branch from 8010c4e to 2b0e393 Compare October 15, 2020 16:27
@pitrou
Copy link
Member Author

pitrou commented Oct 15, 2020

@nealrichardson @kou This may help for R and Ruby.

@kou
Copy link
Member

kou commented Oct 16, 2020

Wow! Great!
I want not only argument name but also acceptable argument types to generate methods automatically in Ruby. (It'll be a follow-up task.)
For example, if the first argument of function A (x of A(x, y, z)) accepts a table and a record batch, we can add an A method to table and record batch classes:

x = table
x.A(y, z)

x = record_batch
x.A(y, z)

@pitrou
Copy link
Member Author

pitrou commented Oct 19, 2020

Will rebase and merge.

Also re-use the embedded documentations in Python to generate nicer signatures and docstrings.
Expose CountOptions.

Improve introspectability of option classes.
@pitrou pitrou force-pushed the ARROW-9164-compute-func-docs branch from 2b0e393 to 61fc098 Compare October 19, 2020 12:00
@pitrou
Copy link
Member Author

pitrou commented Oct 19, 2020

Travis-CI build: https://travis-ci.org/github/pitrou/arrow/builds/737045923

CI failures are unrelated.

@pitrou pitrou closed this in 69b444b Oct 19, 2020
@pitrou pitrou deleted the ARROW-9164-compute-func-docs branch October 19, 2020 12:56
kszucs pushed a commit that referenced this pull request Oct 19, 2020
Also re-use the embedded documentations in Python to generate nicer signatures and docstrings.

Closes #8457 from pitrou/ARROW-9164-compute-func-docs

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
david1437 pushed a commit to david1437/arrow that referenced this pull request Nov 9, 2020
Also re-use the embedded documentations in Python to generate nicer signatures and docstrings.

Closes apache#8457 from pitrou/ARROW-9164-compute-func-docs

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Also re-use the embedded documentations in Python to generate nicer signatures and docstrings.

Closes apache#8457 from pitrou/ARROW-9164-compute-func-docs

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants