'cp950' codec can't decode byte 0xf0 in position 8324 #30

Closed
tinlokkoo opened this issue Apr 21, 2021 · 4 comments

Comments

@tinlokkoo

      File "C:\Users\tinlok\AppData\Local\Temp\pip-req-build-hnmzzea8\setup.py", line 29, in <module>
        long_description = fh.read()
    UnicodeDecodeError: 'cp950' codec can't decode byte 0xf0 in position 8324: illegal multibyte sequence

Please change line 28 of setup.py to:

    with open("README.md", "r", encoding="utf-8-sig") as fh:
        long_description = fh.read()
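A quick, self-contained check of the suggested change. This is a sketch only: the README contents and the helper name read_readme are made up for illustration and are not taken from the project. It shows that encoding="utf-8-sig" reads a UTF-8 file and drops a leading BOM, while plain "utf-8" keeps the BOM as a U+FEFF character. utf-8-sig also reads files without a BOM unchanged.

```python
import os
import tempfile


def read_readme(path: str) -> str:
    """Hypothetical helper mirroring the suggested setup.py line."""
    # Passing the encoding explicitly means the result no longer depends on
    # the OS locale (cp950 on the reporter's machine).
    with open(path, "r", encoding="utf-8-sig") as fh:
        return fh.read()


# Made-up README contents; the emoji is encoded in UTF-8 as f0 9f 94 91.
text = "# Demo README \U0001F511\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    # The utf-8-sig encoder writes a BOM first, as some Windows editors do.
    with open(path, "w", encoding="utf-8-sig") as fh:
        fh.write(text)

    # utf-8-sig strips the BOM when reading ...
    bom_stripped = read_readme(path) == text
    # ... while plain utf-8 keeps it as a leading U+FEFF character.
    with open(path, "r", encoding="utf-8") as fh:
        plain_keeps_bom = fh.read().startswith("\ufeff")

print(bom_stripped, plain_keeps_bom)  # True True
```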
@MaartenGr
Owner

Could you provide the exact steps you took to reach this issue? That will help me find where it stems from.

@tinlokkoo
Author

Just running "pip install ." causes this error. On Windows, open() without an explicit encoding uses the system locale's encoding, which on my machine is cp950, not UTF-8. Some files also start with a byte order mark (BOM), which decodes cleanly with encoding='utf-8-sig'. It is good practice to declare the file encoding whenever you know it, so please change it.
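The failure can be reproduced without touching setup.py. This is a minimal sketch, assuming a made-up byte string rather than the real README: it decodes UTF-8 bytes containing an emoji (which starts with byte 0xf0) as cp950, the codec the locale default selected here. Because 0x9f is not a valid cp950 trail byte, decoding stops at the 0xf0 lead byte with the same kind of "illegal multibyte sequence" error. Whether the real README's byte at position 8324 is an emoji is not confirmed in this thread.

```python
import locale

# The encoding open() falls back to when none is given (before Python's
# UTF-8 mode is enabled); on a Traditional Chinese Windows install this is cp950.
print(locale.getpreferredencoding(False))

# Made-up example bytes: "README " followed by an emoji encoded as f0 9f 94 91.
data = "README \U0001F511".encode("utf-8")

err = None
try:
    data.decode("cp950")
except UnicodeDecodeError as exc:
    err = exc
    print(exc)

# Decoding with the file's real encoding works; utf-8-sig also accepts
# input that has no BOM.
print(data.decode("utf-8-sig"))
```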

@MaartenGr
Owner

I will look into this and see if this can be fixed in the next version.

@MaartenGr MaartenGr mentioned this issue May 10, 2021
Merged
MaartenGr added a commit that referenced this issue May 10, 2021
* Use candidate words instead of extracting those from the documents
* Spacy, Gensim, USE, and Custom Backends were added
* Improved imports
* Fix encoding error when locally installing KeyBERT #30
* Improved documentation (ReadMe & MKDocs)
* Add the main tutorial as a shield
* Typos #31, #35
@MaartenGr
Owner

This was fixed in the new v0.3 release (#32).
