[Python] Sharing docstrings between classes #34868

Closed
danepitkin opened this issue Apr 3, 2023 · 1 comment · Fixed by #34894

@danepitkin
Member

Describe the enhancement requested

PyArrow duplicates a lot of documentation in order to provide explicit docstring examples. Let's reduce the duplication of docstrings by providing a way to share docstrings between classes. See the way pandas did this as an example: https://pandas.pydata.org/docs/development/contributing_docstring.html#sharing-docstrings

Good examples of duplication in PyArrow are the classes Table and RecordBatch. They provide similar, sometimes identical, top-level implementations and docstrings, and typically differ only in their low-level C++ implementation.

Here is an example of duplicated docstring descriptions:
class RecordBatch:

    @property
    def nbytes(self):
        """
        Total number of bytes consumed by the elements of the record batch.

        In other words, the sum of bytes from all buffer ranges referenced.

        Unlike `get_total_buffer_size` this method will account for array
        offsets.

        If buffers are shared between arrays then the shared
        portion will be counted multiple times.

        The dictionary of dictionary arrays will always be counted in their
        entirety even if the array only references a portion of the dictionary.

        Examples
        --------
        >>> import pyarrow as pa
        >>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
        >>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
        >>> batch = pa.RecordBatch.from_arrays([n_legs, animals],
        ...                                     names=["n_legs", "animals"])
        >>> batch.nbytes
        116
        """
        ...

class Table:

    @property
    def nbytes(self):
        """
        Total number of bytes consumed by the elements of the table.

        In other words, the sum of bytes from all buffer ranges referenced.

        Unlike `get_total_buffer_size` this method will account for array
        offsets.

        If buffers are shared between arrays then the shared
        portion will be counted multiple times.

        The dictionary of dictionary arrays will always be counted in their
        entirety even if the array only references a portion of the dictionary.

        Examples
        --------
        >>> import pyarrow as pa
        >>> import pandas as pd
        >>> df = pd.DataFrame({'n_legs': [None, 4, 5, None],
        ...                    'animals': ["Flamingo", "Horse", None, "Centipede"]})
        >>> table = pa.Table.from_pandas(df)
        >>> table.nbytes
        72
        """
        ...
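
One way the two descriptions above could share a single source is a template that is formatted once per class. This is only a sketch of the idea, not PyArrow code: `_NBYTES_DOC`, `with_doc`, and the `kind` parameter are hypothetical names, and the two classes are plain-Python stand-ins for the real ones.

```python
# Hypothetical shared template; {kind} is filled in per class.
_NBYTES_DOC = """
Total number of bytes consumed by the elements of the {kind}.

In other words, the sum of bytes from all buffer ranges referenced.
"""


def with_doc(template, **params):
    """Return a decorator that sets ``__doc__`` from a formatted template."""
    def decorator(func):
        func.__doc__ = template.format(**params)
        return func
    return decorator


class RecordBatch:  # stand-in, not pyarrow.RecordBatch
    @property
    @with_doc(_NBYTES_DOC, kind="record batch")
    def nbytes(self):
        ...


class Table:  # stand-in, not pyarrow.Table
    @property
    @with_doc(_NBYTES_DOC, kind="table")
    def nbytes(self):
        ...


print(RecordBatch.nbytes.__doc__)
print(Table.nbytes.__doc__)
```

The decorator is applied before `property`, which picks up the getter's docstring, so the formatted text is what `help()` would show for each class. The class-specific examples would still need to be written per class.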

Component(s)

Python

@danepitkin
Member Author

This won't work for Cython until this issue is fixed: python/cpython#91309

danepitkin added a commit to danepitkin/arrow that referenced this issue Apr 4, 2023
danepitkin added a commit to danepitkin/arrow that referenced this issue Apr 6, 2023
AlenkaF pushed a commit that referenced this issue Apr 24, 2023
### Rationale for this change

Python classes sometimes duplicate docstrings while changing only a word or two, such as the class name. This PR adds a decorator as a utility to help deduplicate docstring descriptions. It only works in pure Python; it does not work in Cython because of CPython issue python/cpython#91309.

### What changes are included in this PR?

Add a decorator `@doc` that can copy, concatenate, and/or format docstrings between classes.
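
For readers unfamiliar with the pattern, here is a minimal sketch of a decorator with those three capabilities, loosely modelled on the pandas approach linked in the issue. It is an illustration under assumptions, not the code merged in this PR: the signature `doc(*docstrings, **params)` and the stand-in classes are hypothetical.

```python
import os
import textwrap


def doc(*docstrings, **params):
    """Build a docstring from parts (illustrative sketch only).

    Each part is a string template or a callable whose docstring is reused.
    Parts are dedented, concatenated, and formatted with ``params``.
    """
    def decorator(decorated):
        components = []
        for part in docstrings:
            if part is None:
                continue
            if callable(part):
                # Copy the docstring of another function or method.
                part = part.__doc__ or ""
            components.append(textwrap.dedent(part))
        # The decorated object's own docstring, if any, goes last.
        if decorated.__doc__:
            components.append(textwrap.dedent(decorated.__doc__))
        text = "".join(components)
        # Only format when parameters are given, so copied docstrings that
        # contain literal braces are left untouched otherwise.
        decorated.__doc__ = text.format(**params) if params else text
        return decorated
    return decorator


class FileSystem:  # stand-in base class for the example
    def isdir(self, path):
        """
        Return True if path is a directory.

        Parameters
        ----------
        path : str
            Path to check.
        """
        raise NotImplementedError


class LocalFileSystem(FileSystem):
    @doc(FileSystem.isdir)  # copy the parent's docstring
    def isdir(self, path):
        return os.path.isdir(path)


@doc("Total bytes used by the {kind}.", kind="table")  # format a template
def nbytes():
    ...


print(LocalFileSystem.isdir.__doc__)
print(nbytes.__doc__)
```

One caveat of this sketch: when `params` are supplied, every part goes through `str.format`, so literal braces in a docstring would have to be doubled.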

### Are these changes tested?

Tests added. 

```
>>> import pyarrow
>>> from pyarrow.filesystem import FileSystem, LocalFileSystem, DaskFileSystem, S3FSWrapper
>>> from pyarrow.hdfs import HadoopFileSystem
>>> for fs in [FileSystem, LocalFileSystem, DaskFileSystem, S3FSWrapper, HadoopFileSystem]:
...     print(fs.__name__)
...     print(fs.isdir.__doc__)
... 
FileSystem

        Return True if path is a directory.

        Parameters
        ----------
        path : str
            Path to check.
        
LocalFileSystem

Return True if path is a directory.

Parameters
----------
path : str
    Path to check.

DaskFileSystem

Return True if path is a directory.

Parameters
----------
path : str
    Path to check.

S3FSWrapper

Return True if path is a directory.

Parameters
----------
path : str
    Path to check.

HadoopFileSystem

Return True if path is a directory.

Parameters
----------
path : str
    Path to check.

```
Note that `FileSystem.isdir.__doc__` is not dedented because it does not use the `@doc` decorator.
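
The indentation difference comes from how Python stores docstrings: the raw `__doc__` attribute keeps the source indentation unless something dedents it. The snippet below uses only the standard library to show this on a stand-in class (not the PyArrow one); whether the decorator itself uses `textwrap.dedent` internally is an assumption here.

```python
import inspect
import textwrap


class FileSystem:  # stand-in class for the example
    def isdir(self, path):
        """
        Return True if path is a directory.
        """


raw = FileSystem.isdir.__doc__
dedented = textwrap.dedent(raw)
cleaned = inspect.getdoc(FileSystem.isdir)

print(repr(raw))       # source indentation is kept
print(repr(dedented))  # common leading whitespace removed
print(repr(cleaned))   # dedented, with blank edge lines stripped too
```

Tools such as `help()` rely on `inspect.getdoc`, so both forms usually render the same there; the difference shows up mainly when printing `__doc__` directly, as in the session above.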

### Are there any user-facing changes?

No
* Closes: #34868

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Alenka Frim <frim.alenka@gmail.com>
AlenkaF added this to the 13.0.0 milestone Apr 24, 2023
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023