Still getting UnicodeEncodeError #11

Closed
MisterGoodcat opened this Issue Jan 4, 2018 · 7 comments

MisterGoodcat commented Jan 4, 2018

Hi there. First of all thank you for this tool, which seems very handy.

I ran into errors with many of my repositories though, all related to Unicode. I've seen that you added fixes for this in the past, but I still get the following a lot (with both the stable and the GitHub version):

Blame: 100%|#############################################################################| 1/1 [00:00<00:00, 10.64it/s]
Total commits: 35279
Total files: 1
Total loc: 4
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "c:\p\hissrc\src\git-fame\gitfame\__main__.py", line 2, in <module>
    main()  # pragma: no cover
  File "c:\p\hissrc\src\git-fame\gitfame\_gitfame.py", line 271, in main
    run(args)
  File "c:\p\hissrc\src\git-fame\gitfame\_gitfame.py", line 263, in run
    print(tabulate(auth_stats, stats_tot, args["--sort"]))
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufffd' in position 1265: character maps to <undefined>
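
For readers following along, the last frame of the traceback can be reproduced in isolation. This is only a minimal sketch, assuming Python 3 and a made-up author name that contains U+FFFD (the replacement character named in the error); it does not call git-fame at all. Code page 437 has no slot for that character, so the charmap codec raises.

```python
# Hypothetical author name containing U+FFFD; not taken from any real repository.
author = u"M\ufffdller"

try:
    # cp437 is the console code page shown in the traceback.
    author.encode("cp437")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The exception records which character failed and where.
print(error)
```

The `start` attribute of the exception points at the offending character, which is how the "position 1265" in the traceback above should be read: it is an offset into the whole table being printed, not into one name.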

As you can see, I've narrowed it down and found one particular file with only one commit and one author that causes the error. The author is in the format "surname lastname <s.lastname@example.com>", and that seems to be the problem.

I tried using different code pages in the console, but they either don't change anything (850, 1252) or are not supported by Python 2.7 (65001).

Any ideas?

MisterGoodcat commented Jan 4, 2018

Also tried different hosts (PowerShell, Git Bash), but no luck.

MisterGoodcat commented Jan 4, 2018

After digging a bit further and replaying the git commands you're using, I realized that the output of these commands uses \r\n line endings on my machine. I have little knowledge of Python, but I've seen regex engines fail to match \r\n correctly with $. Might that be the case here?
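
The general behaviour asked about here is easy to check in Python's `re` module. The sketch below uses made-up sample lines and a made-up pattern, not git-fame's actual regexes, so it only shows how `$` treats a trailing `\r\n`; it says nothing about whether this is the cause of the error above.

```python
import re

# Hypothetical pattern and input lines, for illustration only.
pattern = re.compile(r"lastname$")
unix_line = "surname lastname\n"
windows_line = "surname lastname\r\n"

# `$` matches at the very end or just before a final "\n", but not before "\r\n".
matches_unix = pattern.search(unix_line) is not None        # True
matches_windows = pattern.search(windows_line) is not None  # False

# Allowing an optional "\r" before the end makes the pattern tolerant of both.
tolerant = re.compile(r"lastname\r?$")
matches_tolerant = tolerant.search(windows_line) is not None  # True

print(matches_unix, matches_windows, matches_tolerant)
```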

casperdcl (Owner) commented Jan 4, 2018

Have you tried exporting PYTHONIOENCODING=utf-8? I'm quite sure your issue is a Unicode character in an author's name.
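
To see what that variable changes, the encoding Python picks for standard output can be inspected from a child process. This is a sketch assuming Python 3; `child_stdout_encoding` is a hypothetical helper written for this illustration, not part of git-fame.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and report
    the encoding it chose for sys.stdout (hypothetical helper)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").lower()


print(child_stdout_encoding("utf-8"))
print(child_stdout_encoding("cp437"))
```

With the variable set to utf-8, every character can be encoded on output, which is why the exception goes away; whether the console then renders the bytes correctly is a separate question.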

MisterGoodcat commented Jan 4, 2018

Thanks for the quick response.

set PYTHONIOENCODING=utf-8

This fixes the error, but badly breaks tqdm's progress bar (every step is printed on a new line). Additionally:

chcp 65001

This somewhat fixes the progress bar, but it does show the Unicode replacement character from time to time. I can live with that :).

IMO this should be added to the docs somewhere?

MisterGoodcat commented Jan 4, 2018

Btw., you are correct: even filtering by a single file still lists all the authors of the repo, and one had a German umlaut in their name. That character is also displayed incorrectly by default, but chcp 65001 fixes that as well.

casperdcl (Owner) commented Jan 6, 2018

Ah, sorry, I wasn't reading carefully. You're using Windows, which is, yes, broken. chcp 65001 is a semi-fix; perhaps I should mention it in the documentation...

casperdcl added a commit that referenced this issue Jan 6, 2018

MisterGoodcat commented Jan 12, 2018

Thanks for the fix!

casperdcl added a commit that referenced this issue Jan 22, 2018

casperdcl added a commit that referenced this issue Jan 22, 2018

bump version, merge branch 'unicode'
closes #11 unicode -> '?' on error
argopt -> docopt
logging (--log)
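
The "unicode -> '?' on error" line in the commit message above describes a common fallback: encode with the `replace` error handler so that unencodable characters become `?` instead of raising. The sketch below shows that general technique in Python 3; `console_safe` is a hypothetical helper and is not claimed to be how git-fame implements it.

```python
def console_safe(text, encoding="cp437"):
    """Return `text` with every character the target encoding cannot
    represent swapped for '?' (hypothetical helper, illustration only)."""
    return text.encode(encoding, "replace").decode(encoding)


# U+FFFD has no cp437 equivalent, so it degrades to '?'.
print(console_safe(u"M\ufffdller"))  # M?ller
```

Characters the code page does support, such as a German umlaut in cp437, pass through unchanged, so only the genuinely unprintable ones are lost.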