UnicodeDecodeError: 'utf8' codec can't decode byte 0xf4 in position 13: invalid continuation byte #194

Closed
lmaroncelli opened this issue Jan 22, 2014 · 2 comments

Comments

@lmaroncelli

Hi all,
when I try to rebuild tags I get this error in console

Exception in thread Thread-25:
Traceback (most recent call last):
  File ".\threading.py", line 532, in __bootstrap_inner
  File ".\threading.py", line 484, in run
  File "./ctagsplugin.py", line 118, in run
    result = func(*args, **kwargs)
  File "./ctagsplugin.py", line 862, in build_ctags
    recursive=recursive, cmd=command)
  File "./ctags.py", line 315, in build_ctags
    resort_ctags(tag_file)
  File "./ctags.py", line 348, in resort_ctags
    for line in fh:
  File ".\codecs.py", line 679, in next
  File ".\codecs.py", line 610, in next
  File ".\codecs.py", line 525, in readline
  File ".\codecs.py", line 472, in read
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf4 in position 13: invalid continuation byte
@stephenfin
Contributor

This is a highly specific error, most likely caused by the contents of your files: the tags file probably contains bytes that are not valid UTF-8 (such as the 0xf4 in the traceback). I can't really test for this as I don't use such characters in my files :)

Try to figure out which character(s) are causing issues. If you can suggest a fix, please file a pull request. Otherwise, you should still be able to use the plugin; only the Show Symbols (Alt + S) command won't work.

PS: I'm going to close this issue for now, as it's not something that can be fixed yet. Feel free to reopen if you disagree.
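
To track down the offending bytes, one option is to read the tags file in binary mode and try to decode it line by line. The following is only a minimal sketch: `find_undecodable_lines` is a hypothetical helper, not part of the plugin, and it assumes the tags file is expected to be UTF-8.

```python
import io


def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, byte_offset, raw_line) for every line that
    cannot be decoded with the given encoding.

    Hypothetical helper for diagnosis only; it is not part of the plugin.
    """
    bad = []
    # Binary mode so that no decoding happens until we do it ourselves.
    with io.open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, 1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first bad byte in the line.
                bad.append((lineno, exc.start, raw))
    return bad
```

Calling `find_undecodable_lines(".tags")` (the file name here is an assumption) would list the lines to inspect; the raw bytes usually point back to the source file that contains the unexpected encoding.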

@wangzhen127

I encountered similar errors. Sometimes it is hard to track down where the bad character is. One suggestion is to modify two lines in the function resort_ctags() in ctags.py so that bad bytes are ignored or replaced. For example, to replace each undecodable byte with the Unicode replacement character (U+FFFD), add errors='replace' to the following two codecs.open calls in resort_ctags():

with codecs.open(tag_file, encoding='utf-8', errors='replace') as fh:

with codecs.open(tag_file+'_sorted_by_file', 'w', encoding='utf-8', errors='replace') as fw:

My UnicodeDecodeErrors are gone after manually making the changes above.

See also:
http://docs.python.org/2/library/codecs.html#codec-base-classes
The default value for errors is 'strict'.
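
The difference between the two error handlers can be seen on a small sample file. This is a sketch under stated assumptions: `read_tag_lines` and the sample line are made up for illustration and only mimic how resort_ctags() opens the tags file.

```python
import codecs
import os
import tempfile


def read_tag_lines(path, errors="strict"):
    """Read a file as UTF-8 with the given error handler.

    Hypothetical helper; it only mimics the codecs.open call quoted above.
    """
    with codecs.open(path, encoding="utf-8", errors=errors) as fh:
        return list(fh)


# Made-up sample line containing a byte (0xf4) that is not valid UTF-8 here.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"caf\xf4 bad\tfile.py\t2\n")

# Default handler ('strict'): decoding raises UnicodeDecodeError.
try:
    read_tag_lines(sample_path)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace': the bad byte becomes U+FFFD and reading continues.
replaced = read_tag_lines(sample_path, errors="replace")

os.remove(sample_path)
```

Note that 'replace' discards the original byte, so a tag whose name contained it will no longer match the text in the source file exactly.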
