(Sample Data Broken for Some Users) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128) #21

atc0005 · 2020-08-29T11:56:08Z

As noted on GlobalDataverseCommunityConsortium/dataverse-ansible#38, I encountered the following error first when running the Ansible playbook from that repo, then again when following the steps in this repo's README file.

Snippet of the output just prior and then the error message:

```
Creating dataverse ubiquity-press.json in dataverse :root
{'name': 'Ubiquity Press Dataverse', 'alias': 'ubiquity-press', 'dataverseContacts': [{'contactEmail': 'ubiquity-press@mailinator.com'}], 'affiliation': '', 'description': 'Ubiquity Press is an open access publisher of peer-reviewed, academic journals. Our flexible publishing model makes journals affordable, and enables researchers around the world to find and access the information they need, without barriers. The following gives an overview of how we work. More information can be found in a recent interview with Chronicle of Higher Education: <a href="http://chronicle.com/blogs/profhacker/ubiquity/43312" rel="nofollow" target="_blank">"Open Access Ahoy: An Interview with Ubiquity Press"</a>.', 'dataverseType': 'JOURNALS'}
Dataverse ubiquity-press created.
<Response [201]>
Dataverse ubiquity-press published.
<Response [200]>
Creating dataverse jopd.json in dataverse ubiquity-press
{'name': 'Journal of Open Psychology Data (JOPD) Dataverse', 'alias': 'jopd', 'dataverseContacts': [{'contactEmail': 'jopd@mailinator.com'}], 'affiliation': 'Ubiquity Press', 'description': 'Datasets from data papers published in the Journal of Open Psychology Data (JOPD).', 'dataverseType': 'JOURNALS'}
Dataverse jopd created.
<Response [201]>
Dataverse jopd published.
<Response [200]>
Creating dataset flynn-effect-in-estonia.json in dataverse jopd
Traceback (most recent call last):
  File "create_sample_data.py", line 56, in <module>
    metadata = json.load(f)
  File "/usr/lib64/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)
```

The environment is a CentOS 7 x64 LXD container. I attempted to reproduce the error in a local CentOS 7 x64 VM, but my (unfortunately remote) VMware Workstation environment is acting up. I'll try to reproduce it in a non-LXD environment when I have more time.
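A minimal sketch of what the traceback points to, for anyone trying to reproduce this: `json.load(f)` at line 56 of `create_sample_data.py` decoded the file through `encodings/ascii.py`, so the file object was using ASCII. In Python 3, `open()` without an `encoding` argument falls back to `locale.getpreferredencoding(False)`, which is typically ASCII (often reported as `ANSI_X3.4-1968`) when no UTF-8 locale is set. Whether this LXD container lacked a UTF-8 locale is an assumption, not something confirmed in this thread. Byte `0xe2` is the first byte of a three-byte UTF-8 sequence such as U+2019 (right single quotation mark, `e2 80 99`). The helper names below are hypothetical.

```python
import locale


def default_text_encoding() -> str:
    # The encoding open() uses in Python 3 when no encoding= argument is passed.
    return locale.getpreferredencoding(False)


def ascii_decode_error(text: str) -> str:
    """Encode text as UTF-8, decode the bytes as ASCII, and return any error message."""
    data = text.encode("utf-8")
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return "decoded without error"


if __name__ == "__main__":
    print("default text encoding:", default_text_encoding())
    # The string contains U+2019, which UTF-8 encodes as b"\xe2\x80\x99".
    print(ascii_decode_error("it\u2019s"))
    # 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)
```

Running `python3 -c 'import locale; print(locale.getpreferredencoding(False))'` inside the container would show which default the sample-data script actually inherits there.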

qqmyers · 2020-08-29T12:09:25Z

FWIW, I think adding `encoding='utf-8'` to the `open()` calls right before the `json.load` statements would work. However, I just had a similar situation in dataverse-metrics where I could read the Unicode in Python 3 but not in Python 2, so there must also be some environment variable (or module?) that can be set, which would explain why others haven't seen this.
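A minimal sketch of the change suggested above, assuming each sample-data file is opened with a plain `open()` call before `json.load(f)` (the traceback shows `metadata = json.load(f)` at line 56 of `create_sample_data.py`, but not the `open()` call itself). `load_metadata` is a hypothetical helper name. Passing `encoding="utf-8"` makes decoding independent of the locale Python inherits from the environment.

```python
import json
from typing import Any


def load_metadata(path: str) -> Any:
    # Hypothetical helper: always read sample-data JSON as UTF-8,
    # regardless of locale.getpreferredencoding(False) in this environment.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

As for an environment variable: on Python 3.6 the locale-derived default can be switched by running under a UTF-8 locale (for example `LANG=en_US.UTF-8`, assuming that locale is installed in the container). Python 3.7 and later also offer UTF-8 mode via `PYTHONUTF8=1` (PEP 540). An explicit `encoding=` avoids depending on either.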

pdurbin · 2020-08-31T14:28:54Z

@qqmyers thanks for the tip about the Python version.

@atc0005 which version of Python was used above, please?

donsizemore · 2020-08-31T14:33:13Z

@pdurbin he first hit the bug using dataverse-ansible, which installs Python 3.6:
https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible/blob/master/tasks/sampledata.yml#L16

atc0005 · 2020-08-31T14:40:04Z

@pdurbin:

> which version of Python was used above, please?

What @donsizemore said. Please let me know if you need more info.

djbrooke · 2020-12-04T18:10:17Z

Thanks all for the details here. I'm going to get this into a sprint so that we can get it fixed.

djbrooke · 2021-01-06T19:43:10Z

- This could be a Python version mismatch; consider asking/telling people to use Python 3.
- This could be caused by some malformed(?) UTF-8 characters in the data itself.
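One way to test the second hypothesis, offered as an illustration rather than anything in the sample-data repo: the function name `find_non_utf8` is hypothetical, and it assumes the sample data is a directory tree of `.json` files. It reads each file as raw bytes and attempts a strict UTF-8 decode, so any file it reports really is malformed, while a clean result would point back at how the files are opened. (The traceback names the `ascii` codec, and `0xe2` is an ordinary UTF-8 lead byte, which leans toward the first hypothesis or the locale rather than bad data.)

```python
from pathlib import Path
from typing import List, Tuple


def find_non_utf8(root: str, pattern: str = "*.json") -> List[Tuple[str, int]]:
    """Return (path, byte offset) for every matching file that is not valid UTF-8."""
    bad: List[Tuple[str, int]] = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            # bytes.decode() uses errors="strict" by default.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded.
            bad.append((str(path), exc.start))
    return bad
```

For example, calling `find_non_utf8("path/to/sample-data")` (a placeholder path) returns an empty list when every file decodes cleanly as UTF-8.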

atc0005 · 2021-01-07T09:13:06Z

> This could be a Python version mismatch; consider asking/telling people to use Python 3

If it helps, I believe that I was using Python 3.6 at the time I encountered the issue. The error snippet in the OP suggests this, but it's been long enough since my attempt to load the sample data that I don't recall for sure.

atc0005 mentioned this issue Aug 29, 2020

TASK [dataverse : run sampledata] fails with "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)" GlobalDataverseCommunityConsortium/dataverse-ansible#38

Closed

djbrooke added this to Up Next 🛎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Dec 4, 2020

djbrooke changed the title ~~UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)~~ (Sample Data Broken) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128) Jan 6, 2021

djbrooke changed the title ~~(Sample Data Broken) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128)~~ (Sample Data Broken for Some Users) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 834: ordinal not in range(128) Jan 6, 2021

djbrooke added the Medium label Jan 6, 2021

jcohenadad mentioned this issue Apr 19, 2021

Interactive QC assessment: Add Pass/Fail/Artifact and download YAML file spinalcordtoolbox/spinalcordtoolbox#3253

Merged


djbrooke removed this from Up Next 🛎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Sep 29, 2021
