GH-34868: [Python] Share docstrings between classes #34894
I think this is a good utility for writing docstrings 👍
Do we know if there are other places in pyarrow that would benefit from it?
Do you think we should add those to this PR? I was thinking of doing it in separate PRs, but who knows when that will happen!
I am not sure we have many places that currently are not in cython, and have repetition in docstrings. Looking through our python files, we do already have some manual templating in parquet/core.py, ipc.py or orc.py (eg to share the explanation of certain parameters between multiple docstrings), but I don't know if they would fit this pattern for using
6faa4dc to ae963a9
python/pyarrow/types.py (outdated)
    return t.id in _UNION_TYPES


@doc(is_null, datatype="nested")
Do we like `nested` or `nested type`?
I chose `nested type` since it's describing where the type comes from.
python/pyarrow/types.py (outdated)
    return t.id == lib.Type_DATE64


@doc(is_null, datatype="logical map")
Do we want to keep "logical" in `logical map`?
The underlying implementation is @GARROW_TYPE_MAP: A repeated struct logical type.
I think we can remove that. We don't have a clear definition of "logical" types in the Arrow format spec, and the fact that a map type is basically a list of structs is an implementation detail (and adding "logical" here doesn't help make that clear).
    return t.id in _DECIMAL_TYPES


@doc(is_null, datatype="decimal128")
Added "128" to `decimal128` here.
    return t.id == lib.Type_DECIMAL128


@doc(is_null, datatype="decimal256")
Added "256" to `decimal256` here.
python/pyarrow/types.py (outdated)
    return t.id == lib.Type_INTERVAL_MONTH_DAY_NANO


@doc(is_null, datatype="primitive")
Do we like `primitive` or `primitive type` better?
I chose `primitive type` since it's describing a family of types.
I applied
One interesting limitation arises when a docstring contains curly braces, for example:
vs.
Thanks so much for the updates, the optimisation in the types.py looks great! 👏
Oh, this is not ideal.
If the example uses curly braces, pyarrow will throw a runtime error:
The fix would be to replace them like this:
This mostly shows up in Cython docstrings, which are not applicable at the moment!
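A minimal sketch of the limitation discussed here, assuming the decorator fills its placeholders with Python's built-in `str.format` (an assumption, not something confirmed in this thread). The template text and the example line are illustrative, not taken from pyarrow.

```python
# Illustrative template with one placeholder, plus a doctest-style
# example line that contains literal curly braces.
template = "Return True if value is an instance of type: {datatype}.\n"
example = ">>> d = {'a': 1}\n"

error = None
try:
    # str.format treats {'a': 1} as a replacement field named 'a',
    # so formatting fails before any docstring is produced.
    (template + example).format(datatype="null")
except KeyError as exc:
    error = exc

# Doubling the braces marks them as literal characters for str.format.
escaped = example.replace("{", "{{").replace("}", "}}")
rendered = (template + escaped).format(datatype="null")
```

After escaping, `rendered` contains the original `{'a': 1}` text with single braces, at the cost of the docstring source being harder to read.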
OK. Well, I do not have much against that (using
python/pyarrow/types.py (outdated)
    return t.id == lib.Type_BOOL


@doc(is_null, datatype="integer")
Suggested change:
- @doc(is_null, datatype="integer")
+ @doc(is_null, datatype="any integer")
I would keep the "any" to clearly indicate this is for any integer (i.e. all bitwidths and signed/unsigned)
Good call! Will update
I think also in a doctest you can use the (and to be clear, the current PR doesn't run into that, right? It's only for when we would want to use it more broadly for docstrings with examples?)
Correct! There are no uses of
All feedback is applied!
Benchmark runs are scheduled for baseline = f4bd43d and contender = cd14e20. cd14e20 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
### Rationale for this change

Python classes sometimes duplicate docstrings, but change one word such as class name. Add a decorator function as a utility to help deduplicate docstring descriptions. Only works in Python. Does not work in Cython due to this CPython issue python/cpython#91309.

### What changes are included in this PR?

Add a decorator `@doc` that can copy, concatenate, and/or format docstrings between classes.

### Are these changes tested?

Tests added.

```
>>> import pyarrow
>>> from pyarrow.filesystem import FileSystem, LocalFileSystem, DaskFileSystem, S3FSWrapper
>>> from pyarrow.hdfs import HadoopFileSystem
>>> for fs in [FileSystem, LocalFileSystem, DaskFileSystem, S3FSWrapper, HadoopFileSystem]:
...     print(fs.__name__)
...     print(fs.isdir.__doc__)
...
FileSystem
Return True if path is a directory.
Parameters
----------
path : str
Path to check.
LocalFileSystem
Return True if path is a directory.
Parameters
----------
path : str
Path to check.
DaskFileSystem
Return True if path is a directory.
Parameters
----------
path : str
Path to check.
S3FSWrapper
Return True if path is a directory.
Parameters
----------
path : str
Path to check.
HadoopFileSystem
Return True if path is a directory.
Parameters
----------
path : str
Path to check.
```

Note that `FileSystem.isdir.__doc__` is not dedented because it does not use the `@doc` decorator.

### Are there any user-facing changes?

No

* Closes: apache#34868

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Alenka Frim <frim.alenka@gmail.com>
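To make the description concrete, here is a minimal sketch of a decorator that copies, concatenates, and formats docstrings. It is an assumption-laden illustration, not the implementation added in this PR: the attribute name `_doc_templates` is hypothetical, and `is_null` and `is_integer` are stand-ins that compare plain strings rather than pyarrow types.

```python
from textwrap import dedent


def doc(*sources, **params):
    """Hypothetical sketch: build a docstring from templates and substitutions."""

    def decorator(func):
        templates = []
        # The decorated function's own docstring comes first, dedented.
        if func.__doc__:
            templates.append(dedent(func.__doc__))
        for source in sources:
            if hasattr(source, "_doc_templates"):
                # Reuse the unformatted templates of a decorated function.
                templates.extend(source._doc_templates)
            elif isinstance(source, str):
                templates.append(source)
            elif source.__doc__:
                templates.append(dedent(source.__doc__))
        # Keep the raw templates so other functions can format them again.
        func._doc_templates = templates
        func.__doc__ = "".join(
            t.format(**params) if params else t for t in templates
        )
        return func

    return decorator


@doc(datatype="null")
def is_null(t):
    """
    Return True if value is an instance of type: {datatype}.
    """
    return t == "null"


@doc(is_null, datatype="any integer")
def is_integer(t):
    return t in {"int8", "int64"}
```

Storing the unformatted templates lets `is_integer` substitute its own `datatype` value into the text written once on `is_null`, instead of inheriting the already formatted string.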