ARROW-7874: [Python][Archery] Validate docstrings with numpydoc #6420
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
See also: |
Cool! I wanted to try it out, but do you know if there is a way to only run this check with archery? Eg |
I think we should maybe start with a subset of the checks (an overview is here: https://github.com/numpy/numpydoc/blob/5e9a629c5882a32b5539345f807cd8fc8f0dc624/numpydoc/validate.py#L35). Maybe things like all parameters documented, no wrong section names, are most important to start with. |
@jorisvandenbossche SGTM |
Yeah, I'm running it via
|
…with the messages; blacklist a couple of rules
@jorisvandenbossche I've added a standalone archery command for running it, because it requires pyarrow to be available for import at runtime. You can also pass a list of rules to include or ignore:
archery numpydoc --whitelist ... --blacklist ...
archery numpydoc -w RULE1 -w RULE2 # enable two rules
archery numpydoc -b RULE1 -b RULE2 # disallow two rules
It will help with iterating over the high-priority violations. |
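For readers following along, here is a minimal sketch of how whitelist/blacklist rule selection could behave. It is not archery's actual implementation: `select_rules` is a hypothetical helper, and the semantics (a whitelist restricts the set of checks, a blacklist removes from it) are an assumption based on the flag descriptions above. The rule codes are examples of numpydoc validation codes.

```python
def select_rules(all_rules, whitelist=(), blacklist=()):
    """Return the sorted rule codes to check (hypothetical helper).

    Assumed semantics: an empty whitelist means "all rules";
    blacklisted rules are always dropped.
    """
    selected = set(whitelist) if whitelist else set(all_rules)
    return sorted(selected - set(blacklist))


# Example rule codes; the real list lives in numpydoc's validate module.
ALL_RULES = ["GL01", "PR01", "PR02", "SS03"]

only_two = select_rules(ALL_RULES, whitelist=["PR01", "PR02"])
without_two = select_rules(ALL_RULES, blacklist=["GL01", "SS03"])
```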
Looks good!
@jorisvandenbossche ready for review |
In the output, I see things like
So it seems that for cython methods the parameters are not recognized? (I thought this worked in pandas, though, I might need to check) |
Numpydoc looks for the signature, but:
>>> import pyarrow as pa
>>> import inspect
>>> inspect.signature(pa.table)
ValueError: no signature found for builtin <built-in function table> |
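A small sketch of how a tool could guard against this failure, using only the standard library. `try_signature` is a hypothetical helper name, not something numpydoc or archery is known to provide; it returns `None` when `inspect.signature` cannot introspect the object.

```python
import inspect


def try_signature(obj):
    """Return inspect.signature(obj), or None if it cannot be determined.

    inspect.signature raises ValueError when no signature is found
    (as in the traceback above) and TypeError for non-callable objects.
    """
    try:
        return inspect.signature(obj)
    except (ValueError, TypeError):
        return None


def plain_python_function(a, b=1):
    return a + b


sig = try_signature(plain_python_function)  # a regular Signature object
missing = try_signature(42)                 # None: 42 is not callable
```

A caller can then fall back to another source of signature information when the result is `None`.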
Pandas example:
where Timestamp is also defined in cython. |
Ah, but for a function, it also doesn't work in pandas (we don't have many functions from cython directly exposed publicly). And Timestamp is a |
Yeah, cython doesn't set |
What I find surprising is that eg |
Anyway, I think this can be merged? Or should we wait a bit more for someone else to review? |
@jorisvandenbossche I managed to add a workaround, so now we can parse the signature of
<Signature (data, names=None, schema=None, metadata=None)>
It also required some additional gymnastics to get rid of the Cython-specific type hints, so we convert
array(obj, type=None, mask=None, size=None, from_pandas=None,
bool safe=True, MemoryPool memory_pool=None)
to:
<Signature (obj, type=None, mask=None, size=None, from_pandas=None,
safe=True, memory_pool=None)> |
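The conversion described above can be sketched roughly as follows. This is not the code from the PR: `signature_from_cython_line` is a hypothetical helper, and it assumes the signature text is available as a string (for example, the first docstring line that Cython emits with its `embedsignature` directive). It only handles simple positional-or-keyword parameters whose defaults are Python literals and contain no commas.

```python
import ast
import inspect


def signature_from_cython_line(text):
    """Build an inspect.Signature from a Cython-style signature string.

    Type prefixes such as "bool" or "MemoryPool" are dropped by keeping
    only the last whitespace-separated token before the "=" sign.
    """
    start = text.index("(")
    end = text.rindex(")")
    params = []
    for raw in text[start + 1:end].split(","):
        raw = raw.strip()
        if not raw:
            continue
        decl, has_default, default_src = raw.partition("=")
        name = decl.split()[-1]  # "MemoryPool memory_pool" -> "memory_pool"
        if has_default:
            # Assumption: defaults are literals such as None, True, 0.
            default = ast.literal_eval(default_src.strip())
        else:
            default = inspect.Parameter.empty
        params.append(
            inspect.Parameter(
                name,
                inspect.Parameter.POSITIONAL_OR_KEYWORD,
                default=default,
            )
        )
    return inspect.Signature(params)


line = (
    "array(obj, type=None, mask=None, size=None, from_pandas=None,\n"
    "      bool safe=True, MemoryPool memory_pool=None)"
)
sig = signature_from_cython_line(line)
print(repr(sig))
# <Signature (obj, type=None, mask=None, size=None, from_pandas=None, safe=True, memory_pool=None)>
```

A more robust version would also need to handle `*args`, `**kwargs`, keyword-only markers, and non-literal defaults, which this sketch deliberately leaves out.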
You can test it with |
+1, merging on green
…pydoc Depends on #6420. Reduces the number of docstring violations from 1335 to 793 (fixes 542). This is going to require more patches, but we need to start somewhere. Closes #6444 from kszucs/docstrings Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>