test_Phylo.py UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 #1320
Comments
See also https://github.com/biopython/biopython/blob/biopython-170/Tests/test_Phylo.py#L56
    def test_unicode_exception(self):
        """Read a Newick file with a unicode byte order mark (BOM)."""
        if sys.version_info[0] < 3:
            self.assertRaises(NewickIO.NewickError, Phylo.read, EX_NEWICK_BOM, "newick")
        else:
            # Must specify the encoding on Windows
            with open(EX_NEWICK_BOM, encoding="utf-8") as handle:
                tree = Phylo.read(handle, 'newick')
            self.assertEqual(len(tree.get_terminals()), 3)
From 10fadab |
This also breaks some of the Phylo examples in
|
And |
This seems to help, but would need work for output too...
|
As I mentioned briefly in PR #1808, I ran into a problem that is the same as (or closely related to) this issue and issues #1321 and #669. I had Changing
But I am not sure how best to fix this within Biopython. It is possible to inspect and modify the locale via the Python EDIT: my setup:
|
From #855, it seems the default encoding problem when loading the XML files can be side-stepped by opening them in binary mode (and letting the XML parser handle the encoding settings), which is what I tried in #1320 (comment).
@chris-rands If you'd like to explore this, I suggest using that as a starting point. |
I was just testing this again but with Python 3.7, and found all tests now pass on my system with LANG C, so I think this has been fixed for >=3.7. PEP 538 seems to explain the relevant changes. |
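For anyone wanting to check their own setup, here is a minimal sketch (standard library only, not taken from the Biopython tests) that prints the encoding Python will typically use for text-mode open() when no encoding= argument is given. Running it with LANG=C on Python 3.6 versus 3.7 should show whether the locale coercion described in PEP 538 is kicking in; the variable name default_text_encoding is just for illustration.

```python
import locale
import sys

# Encoding that text-mode open() falls back to when encoding= is omitted.
# Passing False avoids changing the process locale as a side effect.
default_text_encoding = locale.getpreferredencoding(False)

print("Python:", sys.version.split()[0])
print("open() default encoding:", default_text_encoding)
print("filesystem encoding:", sys.getfilesystemencoding())
```

If the first encoding reported is an ASCII one (for example ANSI_X3.4-1968 or US-ASCII), reading a UTF-8 file without an explicit encoding is likely to fail in the way described in this issue.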
If these problems will "go away" with Python 3.7 onwards, that is good news. If there are any simple changes we can make for Python 3.4, 3.5, 3.6, even better. |
It probably did not entirely go away with Python 3.7. See https://github.com/biopython/biopython/runs/6866566839?check_suite_focus=true |
Spin out from #855, which was specifically for test_NCBIXML.py but has the same root cause.
Some of the test XML files contain a non-ASCII accented character:
Note while PhyloXML/distribution.xml fails to do so, PhyloXML/phyloxml_examples.xml does define an encoding,
Testing with Biopython 1.70 with my default locale, everything is fine as a UTF8 encoding is the default. However, under some systems (including the multibuild systems for compiling wheels), you can get a default encoding of ascii.
The failure can be recreated under Python 3 as follows, here on Mac OS X using Python 3.6:
We can probably fix this by opening the XML files in binary mode; I have a pull request pending which already does this for the related test failures in other modules.
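To illustrate the idea, here is a rough sketch using only the standard library rather than Bio.Phylo. The temporary file and its contents are made up for the example (they are not one of the Biopython test files), and encoding="ascii" is passed explicitly to stand in for what an ASCII default locale would do. Text mode fails on the accented character, while binary mode leaves the decoding to the XML parser.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a test file: UTF-8 XML with an accented character.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n<name>\u00c9tienne</name>\n'
).encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

try:
    # Text mode with an ASCII codec, simulating an ASCII default encoding.
    try:
        with open(path, encoding="ascii") as handle:
            handle.read()
        text_mode_error = None
    except UnicodeDecodeError as err:
        text_mode_error = err

    # Binary mode: the parser reads the bytes and applies the XML encoding itself.
    with open(path, "rb") as handle:
        root = ET.parse(handle).getroot()
finally:
    os.remove(path)

print(type(text_mode_error).__name__, "-", text_mode_error)
print(ascii(root.text))  # ascii() keeps the output printable on ASCII-only terminals
```

The same approach should carry over to any parser that accepts a binary handle, although output would need separate handling as noted earlier in the thread.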
CC @etal