ARROW-7898: [Python] Reduce the number of docstring violations using numpydoc #6444
kszucs wants to merge 13 commits into apache:master
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename the pull request title in the following format? See also:
python/pyarrow/_fs.pyx
This type of change is pointless. Can't you disable this rule in Numpydoc?
I have actually disabled it, because cython's embedsignature directive prepends the docstring with the signature, which always violates the numpydoc rule.
I also tried to disable the embedsignature, but it is useful for identifying the object from the CLI output.
BTW I'm not sure what the right policy on this is, because the numpydoc rule explicitly states:
GL01: Docstring text (summary) should start in the line immediately after the opening quotes (not in the same line, or leaving a blank line in between)
But the examples from the guide violate that :)
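To make the conflict with the embedded signature concrete, here is a minimal sketch. It assumes the signature is prepended as its own first line of the docstring; `summary_starts_after_quotes` is a hypothetical, simplified stand-in for the GL01 check and not numpydoc's actual implementation, and the method name is only illustrative.

```python
def summary_starts_after_quotes(raw_doc):
    """
    Return True if the text starts on the line right after the opening quotes.

    Hypothetical, simplified stand-in for GL01 (not numpydoc's real code).
    """
    leading_blank = 0
    for line in raw_doc.split("\n"):
        if line.strip():
            break
        leading_blank += 1
    # Exactly one empty leading piece means: quotes, newline, then the summary.
    return leading_blank == 1


# Docstring as it would be written in the source file.
written = "\n    Get info for the given files.\n    "

# Assumed shape after signature embedding; the method name is illustrative.
embedded = "FileSystem.get_file_info(self, paths)\n" + written

print(summary_starts_after_quotes(written))
print(summary_starts_after_quotes(embedded))
```

Under these assumptions the docstring as written passes, while the same docstring with a signature line in front fails no matter how the source is formatted.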
> But the examples from the guide violate that :)
Historic background: the validation rules were copied more or less blindly from pandas to numpydoc, so there are indeed some rules that numpydoc itself doesn't necessarily follow (in pandas we decided to make a specific choice wherever numpydoc or PEP 257 is ambiguous or allows different options).
For that reason I also mentioned before we should probably pick a few rules that we think are important (like validating that all parameters have a description).
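As an illustration of that kind of targeted check, a rough sketch of "every parameter has a description" could look like the following. `undescribed_parameters` and the sample docstring are hypothetical, and the parsing is deliberately simplified; numpydoc's own validation is more thorough.

```python
import inspect


def undescribed_parameters(docstring):
    """
    List parameter names in a numpydoc-style docstring that lack a description.

    Hypothetical helper: only looks at the "Parameters" section and treats any
    indented line under an entry as its description.
    """
    lines = inspect.cleandoc(docstring).split("\n")

    def is_underline(text):
        return set(text.strip()) == {"-"}

    # Locate the "Parameters" header and its dashed underline.
    start = None
    for i in range(len(lines) - 1):
        if lines[i].strip() == "Parameters" and is_underline(lines[i + 1]):
            start = i + 2
            break
    if start is None:
        return []

    missing = []
    current = None
    has_description = False
    for idx in range(start, len(lines)):
        line = lines[idx]
        # Stop at the next section header (a line followed by dashes).
        if line.strip() and idx + 1 < len(lines) and is_underline(lines[idx + 1]):
            break
        if not line.strip():
            continue
        if line.startswith(" "):
            has_description = True
        else:
            # A new "name : type" entry starts; close out the previous one.
            if current is not None and not has_description:
                missing.append(current)
            current = line.split(" : ")[0].strip()
            has_description = False
    if current is not None and not has_description:
        missing.append(current)
    return missing


sample = """
    Open a file.

    Parameters
    ----------
    path : str
        Location of the file.
    mode : str
    buffer_size : int, default 0

    Returns
    -------
    NativeFile
    """

print(undescribed_parameters(sample))
```

A check like this flags only missing content, which is less subjective than the purely stylistic rules.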
What's the status here--are we moving forward with this?
wesm left a comment
The line break after the """ is fine with me; that's always been my preferred docstring style anyhow.
python/pyarrow/hdfs.py
This type of change seems overly pedantic, can we turn this warning off?
Although it may be less useful here, for longer docstrings I personally find "first line should be a short summary" a good general rule.
The first sentence is also used for autosummary tables (although in this case the first sentence is short).
I also like the short summary, but of course we can turn it off by default.
jorisvandenbossche left a comment
I am fine with going ahead with this PR (besides the formatting changes, there are also valuable docstring additions included).
Formatting will always be somewhat subjective. For example, I find the boolean -> bool changes overly pedantic, while I like the "first line summary" rule. So I think following a standard, or at least the subset of rules that has enough support among us, is useful.
python/pyarrow/jvm.py
It's indeed a numpydoc requirement (as far as I know, it will render incorrectly when not having the required spaces around it).
I also don't like this that much (although I am used to it now; it's in my muscle memory), but relaxing it is a discussion to have in the numpydoc project, I think.
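Assuming the spaces in question are the ones around the colon in a `name : type` parameter line, the following sketch shows why dropping them can matter. `split_param_header` is a hypothetical helper that mimics splitting on the literal " : " separator; it is a simplification, not numpydoc's actual parser.

```python
def split_param_header(header):
    """
    Split a parameter header line into (name, type).

    Hypothetical helper: the type is only recognised when the colon has a
    space on both sides; otherwise the whole line is treated as the name.
    """
    if " : " in header:
        name, type_ = header.split(" : ", 1)
    else:
        name, type_ = header, ""
    return name.strip(), type_.strip()


print(split_param_header("path : str"))
print(split_param_header("path: str"))
```

With the spaces, the name and type are separated; without them, the sketch treats the whole line as the parameter name, which is one plausible way the rendering could go wrong.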
We probably shouldn't wait too long with this (rebasing will get harder). OK with moving forward with this?
Co-Authored-By: Antoine Pitrou <pitrou@free.fr>
@jorisvandenbossche updated, ready for review again and/or for merge.
Thanks @kszucs !
Depends on #6420.
Reduces the number of docstring violations from 1335 to 793 (fixes 542).
This is going to require more patches, but we need to start somewhere.