Codec error when converting movie lens dataset #94

Open
guedes-joaofelipe opened this issue Jul 24, 2021 · 3 comments

guedes-joaofelipe commented Jul 24, 2021

I followed the instructions in Readme.md to download and convert the MovieLens dataset, but I got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

Adding an encoding argument to the pd.read_csv call in conversion_tools/src/extended_dataset.py (line 52) fixed the problem:

pd.read_csv(self.item_file, delimiter=self.item_sep, header=None, engine='python', encoding="ISO-8859-1")
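A minimal sketch of why passing an encoding fixes the first error. It uses the standard library rather than pandas so the decoding step is visible. It assumes the item file is Latin-1 encoded and `::`-separated (the MovieLens 1M layout; `self.item_sep` may differ for other datasets). The sample line and the helpers `decode_error_reason` and `read_delimited` are hypothetical:

```python
import os
import tempfile
from typing import List

# Hypothetical item line in the "::"-separated MovieLens style, stored as
# ISO-8859-1 (Latin-1). In Latin-1, "é" is the single byte 0xe9.
SAMPLE = "1::Café Example (1995)::Drama\n".encode("iso-8859-1")


def decode_error_reason(raw: bytes) -> str:
    """Return the reason UTF-8 decoding of raw fails, or '' if it succeeds."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.reason
    return ""


def read_delimited(path: str, sep: str = "::", encoding: str = "ISO-8859-1") -> List[List[str]]:
    """Split each line of a text file on sep, decoding with an explicit encoding."""
    with open(path, encoding=encoding) as fin:
        return [line.rstrip("\n").split(sep) for line in fin]


if __name__ == "__main__":
    # 0xe9 starts a 3-byte UTF-8 sequence, but the next byte is a space,
    # so UTF-8 decoding fails the same way as in the report.
    print(decode_error_reason(SAMPLE))  # invalid continuation byte

    with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as tmp:
        tmp.write(SAMPLE)
    try:
        # Latin-1 maps every byte to a character, so 0xe9 decodes to "é".
        print(read_delimited(tmp.name))  # [['1', 'Café Example (1995)', 'Drama']]
    finally:
        os.remove(tmp.name)
```

Note that Latin-1 never raises a decode error, because every byte maps to some character. If the guess is wrong, the symptom is garbled titles rather than an exception, so it is worth spot-checking a few non-ASCII rows after conversion.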

EliverQ (Collaborator) commented Jul 26, 2021

Hi, @guedes-joaofelipe! Thank you for reporting this, but we can't reproduce the problem on our side. Could you please check your dataset and your environment again?

@ZZZZZZZZeng

I had the same problem.

@ZZZZZZZZeng

@EliverQ I had the same problem when converting the Yelp dataset on Windows:

Traceback (most recent call last):
  File "run.py", line 40, in <module>
    datasets.convert_inter()
  File "D:\学业\研究生\数据集\数据集转换程序\RecSysDatasets-master\conversion_tools\src\extended_dataset.py", line 4581, in convert_inter
    for _ in fin:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1909: illegal multibyte sequence
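This second traceback fails under the 'gbk' codec while iterating `fin`, which suggests the file was opened without an explicit encoding, so Python fell back to the Windows locale default (GBK on a Simplified Chinese system). Below is a minimal sketch of the usual remedy, assuming the Yelp JSON files are UTF-8 encoded. `count_lines` and the sample data are hypothetical, not the code in `extended_dataset.py`:

```python
import locale
import os
import tempfile


def count_lines(path: str, encoding: str = "utf-8") -> int:
    """Count the lines of a text file, decoding it with an explicit encoding.

    Without an encoding argument, open() uses the locale's preferred encoding
    (unless Python's UTF-8 mode is on), which on Simplified Chinese Windows is
    GBK (code page 936).
    """
    with open(path, encoding=encoding) as fin:
        return sum(1 for _ in fin)


if __name__ == "__main__":
    # The encoding open() falls back to when none is given.
    print(locale.getpreferredencoding(False))

    # Hypothetical two-record JSON-lines file with non-ASCII text, saved as UTF-8.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".json", delete=False) as tmp:
        tmp.write('{"text": "Très bon café"}\n{"text": "很好吃"}\n')
    try:
        print(count_lines(tmp.name))  # 2
    finally:
        os.remove(tmp.name)
```

If editing the conversion script is not an option, running Python in UTF-8 mode (`python -X utf8 run.py`, or setting the environment variable `PYTHONUTF8=1`, available since Python 3.7) makes UTF-8 the default encoding for `open()`.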
