(Sample Data Broken for Some Users) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128) #21

Open
atc0005 opened this issue Aug 29, 2020 · 7 comments

atc0005 commented Aug 29, 2020

As noted on GlobalDataverseCommunityConsortium/dataverse-ansible#38, I encountered the following error first when running the Ansible playbook from that repo, then again when following the steps in this repo's README file.

Snippet of the output just before the error, followed by the error message:

Creating dataverse ubiquity-press.json in dataverse :root
{'name': 'Ubiquity Press Dataverse', 'alias': 'ubiquity-press', 'dataverseContacts': [{'contactEmail': 'ubiquity-press@mailinator.com'}], 'affiliation': '', 'description': 'Ubiquity Press is an open access publisher of peer-reviewed, academic journals. Our flexible publishing model makes journals affordable, and enables researchers around the world to find and access the information they need, without barriers. The following gives an overview of how we work. More information can be found in a recent interview with Chronicle of Higher Education: <a href="http://chronicle.com/blogs/profhacker/ubiquity/43312" rel="nofollow" target="_blank">"Open Access Ahoy: An Interview with Ubiquity Press"</a>.', 'dataverseType': 'JOURNALS'}
Dataverse ubiquity-press created.
<Response [201]>
Dataverse ubiquity-press published.
<Response [200]>
Creating dataverse jopd.json in dataverse ubiquity-press
{'name': 'Journal of Open Psychology Data (JOPD) Dataverse', 'alias': 'jopd', 'dataverseContacts': [{'contactEmail': 'jopd@mailinator.com'}], 'affiliation': 'Ubiquity Press', 'description': 'Datasets from data papers published in the Journal of Open Psychology Data (JOPD).', 'dataverseType': 'JOURNALS'}
Dataverse jopd created.
<Response [201]>
Dataverse jopd published.
<Response [200]>
Creating dataset flynn-effect-in-estonia.json in dataverse jopd
Traceback (most recent call last):
  File "create_sample_data.py", line 56, in <module>
    metadata = json.load(f)
  File "/usr/lib64/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)

The environment is a CentOS 7 x64 LXD container. I attempted to replicate within a local CentOS 7 x64 VM, but my (unfortunately remote) VMware Workstation environment is acting up. I'll attempt to further replicate in a non-LXD environment when I have more time.
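For readers following along, here is a minimal sketch of how this kind of error can arise. It assumes the sample JSON file is saved as UTF-8 and contains a non-ASCII character (byte 0xe2 is the first byte of three-byte UTF-8 sequences such as the right single quotation mark, U+2019), and that the file is being decoded as ASCII, which is what the encodings/ascii.py frame in the traceback suggests. The file name and contents below are made up for illustration and are not taken from the sample data.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a sample-data file: valid JSON saved as UTF-8
# that contains a right single quotation mark (U+2019, bytes e2 80 99).
payload = {"title": "Flynn\u2019s effect"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)

    # Forcing ASCII here mimics what open() does by default when the
    # process locale resolves to ASCII (for example LANG unset or "C").
    try:
        with open(path, encoding="ascii") as f:
            json.load(f)
        error = None
    except UnicodeDecodeError as exc:
        error = exc

# Prints the same kind of message as the traceback above.
print(error)
```

If this reading is right, the file itself is fine and the failure depends only on which encoding the reader picks.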

Member

qqmyers commented Aug 29, 2020

FWIW, I think adding encoding='utf-8' as an argument to the open() calls right before the json.load statements would work. That said, I just had a similar situation in dataverse-metrics where it turned out I was able to read the unicode in Python 3 but not Python 2, so there must also be some environment variable (or module?) that can be set, which would explain why this hasn't been seen by others.
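A minimal sketch of that suggestion, assuming the script opens each JSON file in text mode before calling json.load. The helper name load_json_utf8 and the temporary file are hypothetical, not part of create_sample_data.py. The sketch also prints the encoding that open() uses when none is given, which is what an ASCII locale would change.

```python
import json
import locale
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file as UTF-8 regardless of the process locale."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Encoding that open() falls back to when no encoding argument is passed.
print("default encoding:", locale.getpreferredencoding(False))

# Round-trip check with a hypothetical file containing non-ASCII text.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"description": "caf\u00e9 \u2013 data"}, f, ensure_ascii=False)
    loaded = load_json_utf8(path)

# ascii() keeps the print safe even when stdout itself is ASCII-only.
print(ascii(loaded["description"]))
```

On Python 3.6 the default comes from the locale, so exporting a UTF-8 locale (for example LANG=en_US.UTF-8, if that locale is installed) before running the script is probably the environment setting being guessed at here. Python 3.7 and later also accept PYTHONUTF8=1. Passing the encoding explicitly avoids depending on either.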

Member

pdurbin commented Aug 31, 2020

@qqmyers thanks for the tip about the Python version.

@atc0005 which version of Python was used above, please?

@donsizemore
Contributor

@pdurbin he first hit the bug using dataverse-ansible, which installs Python 3.6:
https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/master/tasks/sampledata.yml#L16

Author

atc0005 commented Aug 31, 2020

@pdurbin: which version of Python was used above, please?

What @donsizemore said. Please let me know if you need more info.

Contributor

djbrooke commented Dec 4, 2020

Thanks all for the details here. I'm going to get this into a sprint so that we can get it fixed.

djbrooke changed the title from "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)" to "(Sample Data Broken) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)" on Jan 6, 2021
djbrooke changed the title from "(Sample Data Broken) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)" to "(Sample Data Broken for Some Users) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)" on Jan 6, 2021
Contributor

djbrooke commented Jan 6, 2021

  • This could be a Python version mismatch - consider asking/telling people to use Python 3
  • This could be that there are some malformed(?) UTF-8 characters in the data itself
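One way to tell these two possibilities apart is to check whether the data files decode cleanly as UTF-8. The sketch below is only an illustration: check_utf8 is a hypothetical helper, and the two files are made-up examples (one valid, one with a deliberately truncated byte sequence), not the actual sample data.

```python
import os
import tempfile


def check_utf8(path):
    """Return (is_valid_utf8, non_ascii_chars) for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False, []
    # Collect the distinct non-ASCII characters, in order of first appearance.
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return True, seen


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.json")
    bad = os.path.join(tmp, "bad.json")
    with open(good, "wb") as f:
        f.write('{"note": "it\u2019s fine"}'.encode("utf-8"))
    with open(bad, "wb") as f:
        # 0xe2 0x80 without a third byte is an incomplete UTF-8 sequence.
        f.write(b'{"note": "truncated \xe2\x80"}')

    good_result = check_utf8(good)
    bad_result = check_utf8(bad)

print(good_result[0], [hex(ord(c)) for c in good_result[1]])
print(bad_result[0])
```

If every sample file comes back valid, the data is not malformed and the failure more likely comes from the encoding used to read it.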

djbrooke added the Medium label on Jan 6, 2021
Author

atc0005 commented Jan 7, 2021

  • This could be a Python version mismatch - consider asking/telling people to use Python 3

If it helps, I believe that I was using Python 3.6 at the time I encountered the issue. The error snippet in the OP suggests this, but it's been long enough since my attempt to load the sample data that I don't recall for sure.
