pydocstyle 2.0.0 raises UnicodeDecodeError #258

Closed
kashewnuts opened this Issue May 8, 2017 · 8 comments

kashewnuts commented May 8, 2017

Problems summary

A UnicodeDecodeError occurs when running pydocstyle on a file whose docstring contains non-ASCII characters.

Expected

The command should complete without raising an error.

Environment Information

  • OS: CentOS release 6.5 (Final)
  • Python Versions: 2.7.7
  • requirements.txt
pydocstyle==2.0.0

Error Messages

[vagrant@localhost apps]$ pydocstyle
/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/basestemmer.py:122: UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if self.current[self.cursor:self.cursor + s_size] != s:
/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/basestemmer.py:130: UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if self.current[self.cursor - s_size:self.cursor] != s:
Traceback (most recent call last):
  File "/home/vagrant/.virtualenvs/spamspam/bin/pydocstyle", line 11, in <module>
    sys.exit(main())
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/pydocstyle/cli.py", line 68, in main
    sys.exit(run_pydocstyle())
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/pydocstyle/cli.py", line 45, in run_pydocstyle
    ignore_decorators=ignore_decorators))
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/pydocstyle/checker.py", line 695, in check
    ignore_decorators):
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/pydocstyle/checker.py", line 75, in check_source
    definition.docstring)
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/pydocstyle/checker.py", line 375, in check_imperative_mood
    correct_form = IMPERATIVE_VERBS.get(stem(check_word))
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/basestemmer.py", line 342, in stemWord
    result = self._stem_word(word)
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/basestemmer.py", line 326, in _stem_word
    self._stem()
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/english_stemmer.py", line 999, in _stem
    if not self.r_prelude():
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/english_stemmer.py", line 247, in r_prelude
    if not self.slice_from(u"Y"):
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/basestemmer.py", line 286, in slice_from
    self.replace_s(self.bra, self.ket, s)
  File "/home/vagrant/.virtualenvs/spamspam/lib/python2.7/site-packages/snowballstemmer/basestemmer.py", line 267, in replace_s
    self.current = self.current[0:c_bra] + s + self.current[c_ket:]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 8: ordinal not in range(128)
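
The last frame of the traceback concatenates self.current with the literal u"Y". In Python 2, mixing a byte string with a unicode object makes the interpreter decode the bytes with the default ASCII codec first, which is a plausible source of this error. The sketch below imitates that implicit step in Python 3 so it can be run anywhere; the helper name implicit_py2_concat is hypothetical and is not part of pydocstyle or snowballstemmer.

```python
# Python 3 sketch of the implicit ASCII decode that Python 2 performs
# when a byte string (str) is concatenated with a unicode object.

# The UTF-8 bytes Python 2 would hold for a docstring written without a u prefix.
raw_word = "あy".encode("utf-8")


def implicit_py2_concat(byte_part, text_part):
    """Mimic Python 2's `str + unicode`: decode the bytes as ASCII first."""
    return byte_part.decode("ascii") + text_part


try:
    implicit_py2_concat(raw_word, u"Y")
    error_message = None
except UnicodeDecodeError as exc:
    # The first byte of "あ" in UTF-8 is 0xe3, which is outside the ASCII range.
    error_message = str(exc)

print(error_message)
```

The reported byte, 0xe3, is the first byte of the UTF-8 encoding of "あ", the same byte value that appears in the error above.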

Nurdok commented May 8, 2017

Hi @kashewnuts. Thanks for reporting this. This is a serious bug and I will do my best to have a patch version available as soon as I can.

kashewnuts commented May 8, 2017

Hi @Nurdok. Thank you for the quick reply. I hope this problem can be solved.

Nurdok commented May 12, 2017

@kashewnuts I'm having trouble reproducing this on my machine. Can you share the file this happened with, or create a Short, Self Contained, Correct Example?

kashewnuts commented May 12, 2017

Sorry, I did not provide enough information.
It seems to happen when a non-ASCII character and 'y' appear on the same line of a docstring.

# coding: utf-8

# Raises UnicodeDecodeError
def sample():
    """あy
    """
    pass

# Does not raise UnicodeDecodeError
def sample2():
    """
    y
    """
    pass

So, if you write a docstring that contains words like "display" or "key", the UnicodeDecodeError is raised.

Nurdok commented May 12, 2017

Wow, that is... oddly specific. I'll try to take a look at it soon.

sigmavirus24 commented May 16, 2017

@kashewnuts can you also provide the output of running locale and what (if anything) you have PYTHONIOENCODING set to?

kashewnuts commented May 16, 2017

[vagrant@localhost ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

PYTHONIOENCODING is not set.
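
For anyone gathering the same details, the interpreter's own view of these settings can be collected from inside Python as well. This is a minimal sketch using only the standard library; the function name encoding_report is hypothetical.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings asked about in this thread."""
    return {
        # Encoding Python derives from the locale, without changing the locale.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Codec used for implicit str/unicode conversion ('ascii' on Python 2).
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # None means the variable is not set in the environment.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


for name, value in sorted(encoding_report().items()):
    print("{0}: {1!r}".format(name, value))
```

On Python 2 the default encoding stays 'ascii' regardless of a UTF-8 locale, so a UTF-8 LANG does not by itself rule out an implicit ASCII decode.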

Nurdok commented Jun 9, 2017

@kashewnuts sorry for the delay. I'm working on fixing this and will post a PR today. However, note that the docstring in your example is a byte string, not a Unicode string. That's why snowballstemmer fails. pydocstyle should still handle this case, but if this is still an issue for you, adding a u prefix to the docstring will fix it.
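
As an illustration of the two options mentioned here, the sketch below shows how a byte-string word could be converted to text before it is combined with a unicode literal. The helper to_text is hypothetical and is not taken from pydocstyle or snowballstemmer, and UTF-8 is assumed to match the file's coding declaration; the actual fix may look different.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings with `encoding`.

    Hypothetical helper for illustration only. On Python 2, `bytes` is an
    alias of `str`, so the same check covers plain (non-u) string literals.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as Python 2 would store a docstring word written without a u prefix.
byte_word = "あy".encode("utf-8")

# Decoding first means the later concatenation is text + text,
# so no implicit ASCII decode is attempted.
text_word = to_text(byte_word)
combined = text_word[:-1] + u"Y"

print(combined)
```

On the user side, the equivalent change is writing the docstring as u"""あy""" so that the word is already text when it is read.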
