Skip to content

ARROW-7898: [Python] Reduce the number docstring violations using numpydoc#6444

Closed
kszucs wants to merge 13 commits intoapache:masterfrom
kszucs:docstrings
Closed

ARROW-7898: [Python] Reduce the number docstring violations using numpydoc#6444
kszucs wants to merge 13 commits intoapache:masterfrom
kszucs:docstrings

Conversation

@kszucs
Copy link
Member

@kszucs kszucs commented Feb 18, 2020

Depends on #6420.

Reduces the number of docstring violations from 1335 to 793 (fixes 542).

This is going to require more patches, but we need to start somewhere.

@github-actions
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of change is pointless. Can't you disable this rule in Numpydoc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have actually disabled it, because cython's embedsignature directive prepends the docstring with the signature, which always violates the numpydoc rule.

I also tried to disable the embedsignature, but it is useful for identifying the object from the CLI output.

BTW I'm not sure what's the right policy on this, because numpydoc rule explicitly states that:

GL01: Docstring text (summary) should start in the line immediately after the opening quotes (not in the same line, or leaving a blank line in between)

But the examples from the guide violates that :)

cc @jorisvandenbossche

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the examples from the guide violates that :)

See numpy/numpydoc#241

Historic background: the validation rules were kind of blindly copied from pandas to numpydoc, and thus there are indeed some rules that numpydoc itself doesn't necessarily follow (because in pandas we decided to make a certain choice where the numpydoc or PEP257 is ambiguous or allows different options).

For that reason I also mentioned before we should probably pick a few rules that we think are important (like validating that all parameters have a description).

@kszucs kszucs changed the title [Python] Docstring format fixes ARROW-7898: [Python] Fix docstring formatting issues using numpydoc Feb 20, 2020
@kszucs kszucs marked this pull request as ready for review February 20, 2020 18:26
@github-actions
Copy link

@kszucs kszucs changed the title ARROW-7898: [Python] Fix docstring formatting issues using numpydoc ARROW-7898: [Python] Reduce the number docstring violations using numpydoc Feb 24, 2020
@nealrichardson
Copy link
Member

What's the status here--are we moving forward with this?

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line break after the """ is fine with me, that's always been my preferred docstring style anyhow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of change seems overly pedantic, can we turn this warning off?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although here it's maybe less useful, for longer docstrings I personally find the general rule of "first line should be a short summary" a good rule

The first sentence is also used for autosummary tables (although in this case the first sentence is short)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the short summary, but of course we can turn it off by default.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with going ahead with this PR (besides the formatting changes, there are also valuable docstring additions included).

Formatting will always be something subjective. For example, I find the boolean -> bool changes overly pedantic, while I like the "first line summary" rule. So I think following a standard, or at least those subset of rules that have enough support amongst us, is useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although here it's maybe less useful, for longer docstrings I personally find the general rule of "first line should be a short summary" a good rule

The first sentence is also used for autosummary tables (although in this case the first sentence is short)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's indeed a numpydoc requirement (as far as I know, it will render incorrectly when not having the required spaces around it).

I also don't like this that much (although I am used to it now, it's in my muscle memory ..), but to relax this, that's a discussion to have in the numpydoc project I think

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Mar 18, 2020

We probably shouldn't wait too long with this (rebasing will get harder). OK with moving forward with this?
If so, @kszucs do you have time to update this?

@kszucs
Copy link
Member Author

kszucs commented Mar 24, 2020

@jorisvandenbossche updated, ready for review again and/or for merge.

Copy link
Member Author

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, merging.

@kszucs kszucs closed this in b07c262 Mar 25, 2020
@jorisvandenbossche
Copy link
Member

Thanks @kszucs !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments