test_Phylo.py UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 #1320

Open
peterjc opened this issue Jul 13, 2017 · 9 comments
peterjc commented Jul 13, 2017

Spun out from #855, which was specifically about test_NCBIXML.py but has the same root cause.

Some of the test XML files contain a non-ASCII accented character:

$ grep $'\xc3\xbc' PhyloXML/*.xml
PhyloXML/distribution.xml:            <desc>ETH Zürich</desc>
PhyloXML/phyloxml_examples.xml:                  <desc>ETH Zürich</desc>

Note that PhyloXML/phyloxml_examples.xml declares an encoding, whereas PhyloXML/distribution.xml does not:

<?xml version="1.0" encoding="UTF-8"?>
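For illustration, here is a minimal standard-library sketch (not Biopython code) of how an XML parser treats the two cases when it is handed raw bytes. The sample documents are made up to mirror the `<desc>` element above; the point is only that an XML document without a declaration is read as UTF-8 by default, so the missing declaration by itself should not be a problem if the parser sees bytes.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element documents mirroring the <desc> line in the test files.
# "\u00fc" is the u-umlaut; in UTF-8 it is the two bytes 0xc3 0xbc.
with_decl = '<?xml version="1.0" encoding="UTF-8"?><desc>ETH Z\u00fcrich</desc>'.encode("utf-8")
without_decl = "<desc>ETH Z\u00fcrich</desc>".encode("utf-8")

# Given bytes, the parser works out the encoding itself:
# from the declaration if present, otherwise falling back to UTF-8.
text_with = ET.fromstring(with_decl).text
text_without = ET.fromstring(without_decl).text

print(text_with == text_without)
```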

Testing Biopython 1.70 under my default locale, everything is fine because the default encoding is UTF-8. However, on some systems (including the multibuild systems used for compiling wheels), the default encoding can be ASCII.
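The locale dependence can be sketched with the standard library alone. This is an assumption-laden illustration, not the Biopython code path: it writes a made-up one-element XML file to a temporary location, forces `encoding="ascii"` to imitate what a text-mode `open()` would do under an ASCII locale, and then reads the same file in binary mode so the XML parser handles the decoding. Opening in binary mode is shown as one possible workaround, not necessarily the fix this project should adopt.

```python
import locale
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample file; "\u00fc" encodes to the bytes 0xc3 0xbc in UTF-8.
xml_bytes = "<desc>ETH Z\u00fcrich</desc>".encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

try:
    # The encoding a text-mode open() uses when none is given; it varies with the locale.
    default_encoding = locale.getpreferredencoding(False)
    print("locale default encoding:", default_encoding)

    # Imitate an ASCII default by requesting it explicitly.
    try:
        with open(path, encoding="ascii") as handle:
            handle.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Binary mode sidesteps the locale: the XML parser decodes the bytes itself.
    with open(path, "rb") as handle:
        desc = ET.parse(handle).getroot().text
finally:
    os.remove(path)
```

Passing an explicit `encoding="utf-8"` to a text-mode `open()` would be another locale-independent option for files known to be UTF-8.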

The failure can be reproduced under Python 3 as follows, here on Mac OS X with Python 3.6:

$ LANG=C python3 test_Phylo.py
test_convert (__main__.IOTests)
Convert a tree between all supported formats. ... ok
test_convert_phyloxml_binary (__main__.IOTests)
Try writing phyloxml to a binary handle; fail on Py3. ... ERROR
test_convert_phyloxml_filename (__main__.IOTests)
Write phyloxml to a given filename. ... ERROR
test_convert_phyloxml_text (__main__.IOTests)
Write phyloxml to a text handle. ... ERROR
test_format_branch_length (__main__.IOTests)
Custom format string for Newick branch length serialization. ... ok
test_int_labels (__main__.IOTests)
Read newick formatted tree with numeric labels. ... ok
test_newick_read_multiple (__main__.IOTests)
Parse a Nexus file with multiple trees. ... ok
test_newick_read_scinot (__main__.IOTests)
Parse Newick branch lengths in scientific notation. ... ok
test_newick_read_single1 (__main__.IOTests)
Read first Newick file with one tree. ... ok
test_newick_read_single2 (__main__.IOTests)
Read second Newick file with one tree. ... ok
test_newick_read_single3 (__main__.IOTests)
Read Nexus file with one tree. ... ERROR
test_newick_write (__main__.IOTests)
Parse a Nexus file with multiple trees. ... ok
test_phylo_read_extra (__main__.IOTests)
Additional tests to check correct parsing ... ok
test_unicode_exception (__main__.IOTests)
Read a Newick file with a unicode byte order mark (BOM). ... ok
test_collapse (__main__.MixinTests)
TreeMixin: collapse() method. ... ERROR
test_collapse_all (__main__.MixinTests)
TreeMixin: collapse_all() method. ... ERROR
test_common_ancestor (__main__.MixinTests)
TreeMixin: common_ancestor() method. ... ERROR
test_depths (__main__.MixinTests)
TreeMixin: depths() method. ... ERROR
test_distance (__main__.MixinTests)
TreeMixin: distance() method. ... ERROR
test_find_clades (__main__.MixinTests)
TreeMixin: find_clades() method. ... ERROR
test_find_elements (__main__.MixinTests)
TreeMixin: find_elements() method. ... ERROR
test_find_terminal (__main__.MixinTests)
TreeMixin: find_elements() with terminal argument. ... ERROR
test_get_path (__main__.MixinTests)
TreeMixin: get_path() method. ... ERROR
test_is_bifurcating (__main__.MixinTests)
TreeMixin: is_bifurcating() method. ... ERROR
test_is_monophyletic (__main__.MixinTests)
TreeMixin: is_monophyletic() method. ... ERROR
test_ladderize (__main__.MixinTests)
TreeMixin: ladderize() method. ... ERROR
test_prune (__main__.MixinTests)
TreeMixin: prune() method. ... ERROR
test_split (__main__.MixinTests)
TreeMixin: split() method. ... ERROR
test_total_branch_length (__main__.MixinTests)
TreeMixin: total_branch_length() method. ... ERROR
test_trace (__main__.MixinTests)
TreeMixin: trace() method. ... ERROR
test_randomized (__main__.TreeTests)
Tree.randomized: generate a new randomized tree. ... ok
test_root_at_midpoint (__main__.TreeTests)
Tree.root_at_midpoint: reroot at the tree's midpoint. ... ok
test_root_with_outgroup (__main__.TreeTests)
Tree.root_with_outgroup: reroot at a given clade. ... ok
test_str (__main__.TreeTests)
Tree.__str__: pretty-print to a string. ... ERROR

======================================================================
ERROR: test_convert_phyloxml_binary (__main__.IOTests)
Try writing phyloxml to a binary handle; fail on Py3.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 157, in test_convert_phyloxml_binary
    trees, out_handle, "phyloxml")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/case.py", line 728, in assertRaises
    return context.handle('assertRaises', args, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/case.py", line 177, in handle
    callable_obj(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 81, in write
    n = getattr(supported_formats[format], 'write')(trees, fp, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 134, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 678, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 690, in phyloxml
    for tree in obj.phylogenies:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 130, in <genexpr>
    obj = PX.Phyloxml({}, phylogenies=(fix_single(t) for t in obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_convert_phyloxml_filename (__main__.IOTests)
Write phyloxml to a given filename.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 170, in test_convert_phyloxml_filename
    count = Phylo.write(trees, tmp_filename, "phyloxml")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 81, in write
    n = getattr(supported_formats[format], 'write')(trees, fp, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 134, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 678, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 690, in phyloxml
    for tree in obj.phylogenies:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 130, in <genexpr>
    obj = PX.Phyloxml({}, phylogenies=(fix_single(t) for t in obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_convert_phyloxml_text (__main__.IOTests)
Write phyloxml to a text handle.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 163, in test_convert_phyloxml_text
    count = Phylo.write(trees, out_handle, "phyloxml")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 81, in write
    n = getattr(supported_formats[format], 'write')(trees, fp, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 134, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 678, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 690, in phyloxml
    for tree in obj.phylogenies:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 130, in <genexpr>
    obj = PX.Phyloxml({}, phylogenies=(fix_single(t) for t in obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_newick_read_single3 (__main__.IOTests)
Read Nexus file with one tree.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 53, in test_newick_read_single3
    tree = Phylo.read(EX_NEXUS2, 'nexus')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 63, in read
    tree = next(tree_gen)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/NexusIO.py", line 38, in parse
    nex = Nexus.Nexus(handle)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Nexus/Nexus.py", line 614, in __init__
    self.read(input)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Nexus/Nexus.py", line 635, in read
    file_contents = fp.read()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 398: ordinal not in range(128)

======================================================================
ERROR: test_collapse (__main__.MixinTests)
TreeMixin: collapse() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_collapse_all (__main__.MixinTests)
TreeMixin: collapse_all() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_common_ancestor (__main__.MixinTests)
TreeMixin: common_ancestor() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_depths (__main__.MixinTests)
TreeMixin: depths() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_distance (__main__.MixinTests)
TreeMixin: distance() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_find_clades (__main__.MixinTests)
TreeMixin: find_clades() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_find_elements (__main__.MixinTests)
TreeMixin: find_elements() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_find_terminal (__main__.MixinTests)
TreeMixin: find_elements() with terminal argument.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_get_path (__main__.MixinTests)
TreeMixin: get_path() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_is_bifurcating (__main__.MixinTests)
TreeMixin: is_bifurcating() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_is_monophyletic (__main__.MixinTests)
TreeMixin: is_monophyletic() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_ladderize (__main__.MixinTests)
TreeMixin: ladderize() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_prune (__main__.MixinTests)
TreeMixin: prune() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_split (__main__.MixinTests)
TreeMixin: split() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_total_branch_length (__main__.MixinTests)
TreeMixin: total_branch_length() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_trace (__main__.MixinTests)
TreeMixin: trace() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_str (__main__.TreeTests)
Tree.__str__: pretty-print to a string.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 257, in test_str
    tree = Phylo.read(source, 'phyloxml')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 63, in read
    tree = next(tree_gen)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1041: ordinal not in range(128)

----------------------------------------------------------------------
Ran 34 tests in 0.211s

FAILED (errors=21)

We can probably fix this by opening the XML files in binary mode. I have a pull request pending which already does this for the related test failures in other modules.
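
A minimal sketch of why binary mode should help, assuming the standard-library `xml.etree.ElementTree` parser that appears in the tracebacks. The temporary file and its contents are made up for illustration and are not the real test data. A text-mode handle decodes with the locale's default encoding before the parser sees anything, whereas a binary handle lets the XML parser decode the bytes itself:

```python
import os
import tempfile
from xml.etree import ElementTree

# Hypothetical stand-in for the test files: UTF-8 bytes containing a
# non-ASCII character ("\u00fc" is the u-umlaut in "Zürich").
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<desc>ETH Z\u00fcrich</desc>\n"
).encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

try:
    # Text mode with an explicit ASCII codec mimics an ASCII default encoding.
    text_mode_error = None
    try:
        with open(path, encoding="ascii") as handle:
            ElementTree.parse(handle)
    except UnicodeDecodeError as err:
        text_mode_error = err

    # Binary mode: the parser reads raw bytes and applies the declared
    # encoding itself, independent of the locale.
    with open(path, "rb") as handle:
        desc = ElementTree.parse(handle).getroot().text
finally:
    os.remove(path)

print(type(text_mode_error).__name__)
print(ascii(desc))
```

XML without an encoding declaration or byte order mark is treated as UTF-8 by the parser, so the same approach should also cover a file like PhyloXML/distribution.xml.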

CC @etal

@peterjc
Member Author

peterjc commented Jul 13, 2017

See also https://github.com/biopython/biopython/blob/biopython-170/Tests/test_Phylo.py#L56

    def test_unicode_exception(self):
        """Read a Newick file with a unicode byte order mark (BOM)."""
        if sys.version_info[0] < 3:
            self.assertRaises(NewickIO.NewickError, Phylo.read, EX_NEWICK_BOM, "newick")
        else:
            # Must specify the encoding on Windows
            with open(EX_NEWICK_BOM, encoding="utf-8") as handle:
                tree = Phylo.read(handle, 'newick')
            self.assertEqual(len(tree.get_terminals()), 3)

From 10fadab

@peterjc
Member Author

peterjc commented Jul 13, 2017

This also breaks some of the Phylo examples in test_Tutorial.py:

$ LANG=C python3 test_Tutorial.py
Running Tutorial doctests...
**********************************************************************
File "test_Tutorial.py", line 254, in __main__.TutorialDocTestHolder.doctest_test_chapter_phylo_line_00074
Failed example:
    trees = list(Phylo.parse("../../Tests/PhyloXML/phyloxml_examples.xml", "phyloxml"))
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest __main__.TutorialDocTestHolder.doctest_test_chapter_phylo_line_00074[5]>", line 1, in <module>
        trees = list(Phylo.parse("../../Tests/PhyloXML/phyloxml_examples.xml", "phyloxml"))
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
        for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
        return Parser(file).parse()
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
        event, root = next(context)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
        data = source.read(16 * 1024)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)
**********************************************************************
...

@peterjc
Member Author

peterjc commented Jul 13, 2017

And test_PhyloXML.py too

@peterjc
Member Author

peterjc commented Jul 13, 2017

This seems to help, but would need work for output too...

$ git diff
diff --git a/Bio/Phylo/_io.py b/Bio/Phylo/_io.py
index def7060b4..3f1cfb679 100644
--- a/Bio/Phylo/_io.py
+++ b/Bio/Phylo/_io.py
@@ -32,6 +32,11 @@ try:
 except ImportError:
     pass
 
+# These should be opened in binary mode (e.g. XML encoding pain)
+_BINARY_FORMATS = (
+    'phyloxml',
+    'nexml',
+)
 
 def parse(file, format, **kwargs):
     """Iteratively parse a file and return each of the trees it contains.
@@ -47,7 +52,11 @@ def parse(file, format, **kwargs):
     ...     print(tree.rooted)
     True
     """
-    with File.as_handle(file, 'r') as fp:
+    if format in _BINARY_FORMATS:
+        mode = "rb"
+    else:
+        mode = "rt"
+    with File.as_handle(file, mode) as fp:
         for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
             yield tree
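
For the output side mentioned above, a rough sketch using only the standard library: `ElementTree.write` accepts a binary handle together with an explicit encoding, so the result does not depend on the locale. The element below is invented for illustration, and whether Bio.Phylo's writers can be switched over this simply is an assumption that has not been checked here.

```python
import io
from xml.etree import ElementTree

# Hypothetical element containing a non-ASCII character ("\u00fc").
root = ElementTree.Element("desc")
root.text = "ETH Z\u00fcrich"

# Write to a binary handle with an explicit encoding; the parser library
# encodes the text and emits a matching XML declaration.
buffer = io.BytesIO()
ElementTree.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
data = buffer.getvalue()

print(ascii(data))
```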
 

@chris-rands
Contributor

chris-rands commented Sep 28, 2018

As I mentioned briefly in PR #1808, I encountered a problem that is the same as (or closely related to) this issue, and also to issues #1321 and #669.

Running python3 setup.py test gave me test failures with UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128). The errors occurred in test_Nexus, test_Phylo, test_PhyloXML, and test_Tutorial, with most of them in test_Phylo. (I can provide full tracebacks if necessary.)

Changing LANG from C to en_GB.UTF-8 fixed all the errors.

$ echo $LANG
C
$ export LANG=en_GB.UTF-8
$ echo $LANG
en_GB.UTF-8

But I am not sure how best to fix this within Biopython. It is possible to inspect and modify the locale via the Python locale module, but I am no expert on this and it might have unintended side effects.
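
For the inspection half, a small read-only sketch using standard-library calls; it only reports the encoding that text-mode `open()` falls back to and changes nothing. The variable names are just for illustration.

```python
import codecs
import locale

# Encoding that open() uses in text mode when no encoding= argument is given.
# Passing False avoids temporarily calling setlocale() as a side effect.
default_text_encoding = locale.getpreferredencoding(False)

# Normalise aliases (for example the various names for ASCII) to the
# canonical codec name so they can be compared reliably.
canonical_name = codecs.lookup(default_text_encoding).name
defaults_to_ascii = canonical_name == "ascii"

print("open() default encoding:", default_text_encoding)
print("falls back to ASCII:", defaults_to_ascii)
```

If this reports an ASCII fallback, any text-mode `open()` of a UTF-8 file without an explicit `encoding=` argument can be expected to hit the error shown in this issue.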

EDIT: my setup:

>>> import sys; print(sys.version)
3.6.6 (default, Jun 27 2018, 13:11:40) 
[GCC 8.1.1 20180531]
>>> import platform; print(platform.python_implementation()); print(platform.platform())
CPython
Linux-4.17.9-1-ARCH-x86_64-with-arch-Arch-Linux
>>> import Bio; print(Bio.__version__)
1.73.dev0

@peterjc
Member Author

peterjc commented Sep 28, 2018

From #855, it seems that for the XML files, the default-encoding problem when loading can be side-stepped by opening the files in binary mode (and letting the XML parser handle the encoding settings), which is what I tried in #1320 (comment).

@chris-rands If you'd like to explore this, I suggest using that as a starting point.

@chris-rands
Contributor

I was just testing this again but with Python 3.7, and found that all tests now pass on my system with LANG=C, so I think this has been fixed for >=3.7. PEP 538 seems to explain the relevant changes.
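
One way to check whether a given interpreter benefits from the 3.7 changes, sketched with the standard library only. Besides the locale coercion in PEP 538, Python 3.7 also added a UTF-8 mode (PEP 540) that is switched on automatically under the C or POSIX locale; mentioning it here is my addition, not something established in this thread.

```python
import sys

# sys.flags.utf8_mode was added in Python 3.7; treat it as off on older versions.
utf8_mode = bool(getattr(sys.flags, "utf8_mode", 0))

version = ".".join(str(part) for part in sys.version_info[:3])
print("Python", version)
print("UTF-8 mode enabled:", utf8_mode)
```

Running this with `LANG=C` on both 3.6 and 3.7 should show the difference between the two versions.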

@peterjc
Member Author

peterjc commented Dec 18, 2018

If these problems will "go away" with Python 3.7 onwards, that is good news.

If there are any simple changes we can make for Python 3.4, 3.5, 3.6, even better.

@fabianegli
Contributor

It probably did not entirely go away with Python 3.7. See https://github.com/biopython/biopython/runs/6866566839?check_suite_focus=true
