New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve imperative mood check using stemmer/wordlist #235

Merged
merged 8 commits into from Mar 7, 2017

Conversation

Projects
2 participants
@lordmauve
Copy link
Contributor

lordmauve commented Feb 6, 2017

This is a refresh of PR #155, to bring it up to date. This addresses issue #68. See these issues for comprehensive details of what is included in this pull request, but to summarise:

Add an improved imperative mood heuristic based on word lists, matched using a Porter stemmer.

We add a list of common verbs in the imperative form. If the stemmed form of a word matches any
stemmed form in this list, then it's probably a verb. But if the actual form we found doesn't match the
imperative form, we detect an error.

Additionally, there is a list of words commonly found in docstrings which are not verbs.

PR #155 was out of date - it preceded the rename to pydocstyle, conversion to a package, and dropping support for Python 3.2. Git merging was practically impossible so it has been abandoned.

@Nurdok

This comment has been minimized.

Copy link
Member

Nurdok commented Feb 11, 2017

Thanks you for opening this PR! Please be patient as it sometimes take me a while to give a proper review.

@Nurdok

This comment has been minimized.

Copy link
Member

Nurdok commented Feb 25, 2017

On the whole, this looks good. One thing that bothers me a bit is the source of the word lists. Where did you get them from? Scraping GitHub for docstrings? If we were to make an attempt to expand the list, how would we go about that?

@lordmauve

This comment has been minimized.

Copy link
Contributor

lordmauve commented Feb 25, 2017

Yes, I did it by scraping, but in a very large, proprietary codebase (tens of millions of lines), and I carefully groomed the resulting list, going from most common words downwards.

As mentioned in #155, this covered about 80% of real docstrings and was well into the long tail of words that might appear.

@Nurdok
Copy link
Member

Nurdok left a comment

I have some nit-picks, but generally this is great. Thanks!

@@ -202,7 +202,9 @@ def to_rst(cls):
D400 = D4xx.create_error('D400', 'First line should end with a period',
'not {0!r}')
D401 = D4xx.create_error('D401', 'First line should be in imperative mood',
'{0!r}, not {1!r}')
"'{0}', not '{1}'")
D401b = D4xx.create_error('D401', 'First line should be in imperative mood; '

This comment has been minimized.

@Nurdok

Nurdok Feb 27, 2017

Member

I don't think this will go well when generating the docs from errors. Did you try to build the docs with this branch?

This comment has been minimized.

@lordmauve

lordmauve Feb 27, 2017

Contributor

I did not. The problem here is that on Python 2, unicode strings have a repr like u'foo' which is both unsightly and causes test failures.

This comment has been minimized.

@lordmauve

lordmauve Feb 27, 2017

Contributor

I checked, this makes no difference. Only the first and second arguments here are used in the docs.

"""
text = pkgutil.get_data('pydocstyle', 'data/' + name).decode('utf8')
for l in text.splitlines():

This comment has been minimized.

@Nurdok

Nurdok Feb 27, 2017

Member

Please don't use one-letter variables - line is good here.

This comment has been minimized.

@lordmauve

lordmauve Feb 27, 2017

Contributor

I prefer one-letter variables when they have very ephemeral scope! But I take your point, and l is specifically forbidden by PEP8.

:return:
:returns
:returns:
:rtype

This comment has been minimized.

@Nurdok

Nurdok Feb 27, 2017

Member

Is in necessary to put the colons here? We can just check for that when checking for D401.

This comment has been minimized.

@lordmauve

lordmauve Feb 27, 2017

Contributor

These slipped in from the automated scanning. Should probably remove them.

@Nurdok Nurdok added this to In Progress / Assigned in Pydocstyle 2.0.0 Mar 4, 2017

setup.py Outdated
@@ -31,6 +31,10 @@
keywords='pydocstyle, PEP 257, pep257, PEP 8, pep8, docstrings',
packages=('pydocstyle',),
package_dir={'': 'src'},
package_data={'pydocstyle': ['data/*.txt']},
install_requires=[
'snowballstemmer==1.2.1',

This comment has been minimized.

@Nurdok

Nurdok Mar 7, 2017

Member

It's okay to pin version in the development requirements, but the install requirements should be as loose as possible, so please remove the version number.

@lordmauve

This comment has been minimized.

Copy link
Contributor

lordmauve commented Mar 7, 2017

Thanks, I have done so.

@Nurdok

Nurdok approved these changes Mar 7, 2017

@Nurdok Nurdok merged commit 286d933 into PyCQA:master Mar 7, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@lordmauve

This comment has been minimized.

Copy link
Contributor

lordmauve commented Mar 7, 2017

Thank you!

@Nurdok Nurdok moved this from In Progress / Assigned to Done in Pydocstyle 2.0.0 Mar 10, 2017

@lordmauve lordmauve deleted the lordmauve:imperative-mood-heuristic-2 branch Dec 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment