test_Phylo.py UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 #1320

peterjc · 2017-07-13T14:17:51Z

Spun out from #855, which was specifically about test_NCBIXML.py but has the same root cause.

Some of the test XML files contain a non-ASCII accented character:

$ grep $'\xc3\xbc' PhyloXML/*.xml
PhyloXML/distribution.xml:            <desc>ETH Zürich</desc>
PhyloXML/phyloxml_examples.xml:                  <desc>ETH Zürich</desc>
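
For reference, the two bytes passed to grep above are the UTF-8 encoding of "ü", and the first of them (0xc3) is the byte the ASCII codec rejects. A minimal sketch using only the standard library; the variable names are just for illustration:

```python
# "ü" is U+00FC; written with an escape so this snippet is pure ASCII.
text = "ETH Z\u00fcrich"

# In UTF-8 the character becomes the two bytes 0xC3 0xBC.
raw = text.encode("utf-8")
has_utf8_pair = b"\xc3\xbc" in raw

# Decoding the same bytes as ASCII fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    message = ""
except UnicodeDecodeError as err:
    message = str(err)

print(has_utf8_pair)
print(message)
```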

Note that PhyloXML/phyloxml_examples.xml declares an encoding, while PhyloXML/distribution.xml does not:

<?xml version="1.0" encoding="UTF-8"?>

Testing Biopython 1.70 under my default locale, everything is fine because the default encoding is UTF-8. However, on some systems (including the multibuild systems used for compiling wheels), the default encoding can be ASCII.
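
To illustrate the mechanism, here is a rough sketch (not Biopython code) of how the locale-dependent default affects text-mode reads, and how a binary handle avoids it by letting the XML parser choose the encoding. The temporary file is a hypothetical stand-in for a test file with no encoding declaration, and passing encoding="ascii" explicitly is only an assumption used to mimic what LANG=C does on the affected systems:

```python
import locale
import os
import tempfile
from xml.etree import ElementTree

# The encoding open() uses in text mode when none is given explicitly.
print(locale.getpreferredencoding(False))

# Hypothetical stand-in for an XML test file without an encoding declaration.
xml_bytes = "<desc>ETH Z\u00fcrich</desc>".encode("utf-8")
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_bytes)
    path = tmp.name

try:
    # Forcing ASCII here imitates a text-mode open under an ASCII locale.
    try:
        with open(path, encoding="ascii") as handle:
            handle.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # A binary handle leaves decoding to the XML parser, which assumes
    # UTF-8 when the document declares no encoding.
    with open(path, "rb") as handle:
        desc = ElementTree.parse(handle).getroot().text
finally:
    os.remove(path)

print(ascii_failed, desc == "ETH Z\u00fcrich")
```

Opening the XML in binary mode (or passing an explicit encoding="utf-8" for text handles) is one possible way to make the result independent of the locale; whether that is the right fix inside Bio.Phylo is a separate question.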

The failure can be reproduced under Python 3 as follows, here on Mac OS X using Python 3.6:

$ LANG=C python3 test_Phylo.py
test_convert (__main__.IOTests)
Convert a tree between all supported formats. ... ok
test_convert_phyloxml_binary (__main__.IOTests)
Try writing phyloxml to a binary handle; fail on Py3. ... ERROR
test_convert_phyloxml_filename (__main__.IOTests)
Write phyloxml to a given filename. ... ERROR
test_convert_phyloxml_text (__main__.IOTests)
Write phyloxml to a text handle. ... ERROR
test_format_branch_length (__main__.IOTests)
Custom format string for Newick branch length serialization. ... ok
test_int_labels (__main__.IOTests)
Read newick formatted tree with numeric labels. ... ok
test_newick_read_multiple (__main__.IOTests)
Parse a Nexus file with multiple trees. ... ok
test_newick_read_scinot (__main__.IOTests)
Parse Newick branch lengths in scientific notation. ... ok
test_newick_read_single1 (__main__.IOTests)
Read first Newick file with one tree. ... ok
test_newick_read_single2 (__main__.IOTests)
Read second Newick file with one tree. ... ok
test_newick_read_single3 (__main__.IOTests)
Read Nexus file with one tree. ... ERROR
test_newick_write (__main__.IOTests)
Parse a Nexus file with multiple trees. ... ok
test_phylo_read_extra (__main__.IOTests)
Additional tests to check correct parsing ... ok
test_unicode_exception (__main__.IOTests)
Read a Newick file with a unicode byte order mark (BOM). ... ok
test_collapse (__main__.MixinTests)
TreeMixin: collapse() method. ... ERROR
test_collapse_all (__main__.MixinTests)
TreeMixin: collapse_all() method. ... ERROR
test_common_ancestor (__main__.MixinTests)
TreeMixin: common_ancestor() method. ... ERROR
test_depths (__main__.MixinTests)
TreeMixin: depths() method. ... ERROR
test_distance (__main__.MixinTests)
TreeMixin: distance() method. ... ERROR
test_find_clades (__main__.MixinTests)
TreeMixin: find_clades() method. ... ERROR
test_find_elements (__main__.MixinTests)
TreeMixin: find_elements() method. ... ERROR
test_find_terminal (__main__.MixinTests)
TreeMixin: find_elements() with terminal argument. ... ERROR
test_get_path (__main__.MixinTests)
TreeMixin: get_path() method. ... ERROR
test_is_bifurcating (__main__.MixinTests)
TreeMixin: is_bifurcating() method. ... ERROR
test_is_monophyletic (__main__.MixinTests)
TreeMixin: is_monophyletic() method. ... ERROR
test_ladderize (__main__.MixinTests)
TreeMixin: ladderize() method. ... ERROR
test_prune (__main__.MixinTests)
TreeMixin: prune() method. ... ERROR
test_split (__main__.MixinTests)
TreeMixin: split() method. ... ERROR
test_total_branch_length (__main__.MixinTests)
TreeMixin: total_branch_length() method. ... ERROR
test_trace (__main__.MixinTests)
TreeMixin: trace() method. ... ERROR
test_randomized (__main__.TreeTests)
Tree.randomized: generate a new randomized tree. ... ok
test_root_at_midpoint (__main__.TreeTests)
Tree.root_at_midpoint: reroot at the tree's midpoint. ... ok
test_root_with_outgroup (__main__.TreeTests)
Tree.root_with_outgroup: reroot at a given clade. ... ok
test_str (__main__.TreeTests)
Tree.__str__: pretty-print to a string. ... ERROR

======================================================================
ERROR: test_convert_phyloxml_binary (__main__.IOTests)
Try writing phyloxml to a binary handle; fail on Py3.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 157, in test_convert_phyloxml_binary
    trees, out_handle, "phyloxml")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/case.py", line 728, in assertRaises
    return context.handle('assertRaises', args, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/unittest/case.py", line 177, in handle
    callable_obj(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 81, in write
    n = getattr(supported_formats[format], 'write')(trees, fp, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 134, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 678, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 690, in phyloxml
    for tree in obj.phylogenies:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 130, in <genexpr>
    obj = PX.Phyloxml({}, phylogenies=(fix_single(t) for t in obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_convert_phyloxml_filename (__main__.IOTests)
Write phyloxml to a given filename.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 170, in test_convert_phyloxml_filename
    count = Phylo.write(trees, tmp_filename, "phyloxml")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 81, in write
    n = getattr(supported_formats[format], 'write')(trees, fp, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 134, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 678, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 690, in phyloxml
    for tree in obj.phylogenies:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 130, in <genexpr>
    obj = PX.Phyloxml({}, phylogenies=(fix_single(t) for t in obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_convert_phyloxml_text (__main__.IOTests)
Write phyloxml to a text handle.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 163, in test_convert_phyloxml_text
    count = Phylo.write(trees, out_handle, "phyloxml")
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 81, in write
    n = getattr(supported_formats[format], 'write')(trees, fp, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 134, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 678, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 690, in phyloxml
    for tree in obj.phylogenies:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 130, in <genexpr>
    obj = PX.Phyloxml({}, phylogenies=(fix_single(t) for t in obj))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_newick_read_single3 (__main__.IOTests)
Read Nexus file with one tree.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 53, in test_newick_read_single3
    tree = Phylo.read(EX_NEXUS2, 'nexus')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 63, in read
    tree = next(tree_gen)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/NexusIO.py", line 38, in parse
    nex = Nexus.Nexus(handle)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Nexus/Nexus.py", line 614, in __init__
    self.read(input)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Nexus/Nexus.py", line 635, in read
    file_contents = fp.read()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 398: ordinal not in range(128)

======================================================================
ERROR: test_collapse (__main__.MixinTests)
TreeMixin: collapse() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_collapse_all (__main__.MixinTests)
TreeMixin: collapse_all() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_common_ancestor (__main__.MixinTests)
TreeMixin: common_ancestor() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_depths (__main__.MixinTests)
TreeMixin: depths() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_distance (__main__.MixinTests)
TreeMixin: distance() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_find_clades (__main__.MixinTests)
TreeMixin: find_clades() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_find_elements (__main__.MixinTests)
TreeMixin: find_elements() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_find_terminal (__main__.MixinTests)
TreeMixin: find_elements() with terminal argument.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_get_path (__main__.MixinTests)
TreeMixin: get_path() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_is_bifurcating (__main__.MixinTests)
TreeMixin: is_bifurcating() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_is_monophyletic (__main__.MixinTests)
TreeMixin: is_monophyletic() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_ladderize (__main__.MixinTests)
TreeMixin: ladderize() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_prune (__main__.MixinTests)
TreeMixin: prune() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_split (__main__.MixinTests)
TreeMixin: split() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_total_branch_length (__main__.MixinTests)
TreeMixin: total_branch_length() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_trace (__main__.MixinTests)
TreeMixin: trace() method.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 266, in setUp
    self.phylogenies = list(Phylo.parse(EX_PHYLO, 'phyloxml'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)

======================================================================
ERROR: test_str (__main__.TreeTests)
Tree.__str__: pretty-print to a string.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo.py", line 257, in test_str
    tree = Phylo.read(source, 'phyloxml')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 63, in read
    tree = next(tree_gen)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
    return Parser(file).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
    event, root = next(context)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1041: ordinal not in range(128)

----------------------------------------------------------------------
Ran 34 tests in 0.211s

FAILED (errors=21)

We can probably fix this by opening the XML files in binary mode. I have a pull request pending which already does this for the related test failures in other modules.
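
A minimal sketch of why binary mode should sidestep the error, using only the standard library rather than Biopython itself. The temporary file, its one-element content, and the explicit `encoding="ascii"` (standing in for the ASCII default seen under `LANG=C`) are assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Illustrative XML holding one non-ASCII character (u-umlaut), stored as UTF-8.
xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n<desc>ETH Z\u00fcrich</desc>\n'

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(xml_text.encode("utf-8"))
    path = tmp.name

try:
    # Text mode with an ASCII codec (simulating the LANG=C default) fails
    # when the handle tries to decode byte 0xc3.
    try:
        with open(path, "r", encoding="ascii") as handle:
            ET.parse(handle)
        text_mode_error = None
    except UnicodeDecodeError as err:
        text_mode_error = err

    # Binary mode passes raw bytes to the XML parser, which decodes them
    # itself using the XML declaration (UTF-8 here).
    with open(path, "rb") as handle:
        desc = ET.parse(handle).getroot().text
finally:
    os.remove(path)

print("text mode error:", text_mode_error)
print("binary mode text:", desc)
```

With the handle opened in binary mode, the locale's default encoding never enters the picture, which is the point of the proposed fix.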

CC @etal


peterjc · 2017-07-13T14:29:31Z

See also https://github.com/biopython/biopython/blob/biopython-170/Tests/test_Phylo.py#L56

    def test_unicode_exception(self):
        """Read a Newick file with a unicode byte order mark (BOM)."""
        if sys.version_info[0] < 3:
            self.assertRaises(NewickIO.NewickError, Phylo.read, EX_NEWICK_BOM, "newick")
        else:
            # Must specify the encoding on Windows
            with open(EX_NEWICK_BOM, encoding="utf-8") as handle:
                tree = Phylo.read(handle, 'newick')
            self.assertEqual(len(tree.get_terminals()), 3)

From 10fadab

peterjc · 2017-07-13T16:45:16Z

This also breaks some of the Phylo examples in test_Tutorial.py:

$ LANG=C python3 test_Tutorial.py
Running Tutorial doctests...
**********************************************************************
File "test_Tutorial.py", line 254, in __main__.TutorialDocTestHolder.doctest_test_chapter_phylo_line_00074
Failed example:
    trees = list(Phylo.parse("../../Tests/PhyloXML/phyloxml_examples.xml", "phyloxml"))
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest __main__.TutorialDocTestHolder.doctest_test_chapter_phylo_line_00074[5]>", line 1, in <module>
        trees = list(Phylo.parse("../../Tests/PhyloXML/phyloxml_examples.xml", "phyloxml"))
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/_io.py", line 51, in parse
        for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 98, in parse
        return Parser(file).parse()
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 296, in __init__
        event, root = next(context)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
        data = source.read(16 * 1024)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128)
**********************************************************************
...

peterjc · 2017-07-13T16:50:52Z

And test_PhyloXML.py too

peterjc · 2017-07-13T16:52:49Z

This seems to help, but would need work for output too...

$ git diff
diff --git a/Bio/Phylo/_io.py b/Bio/Phylo/_io.py
index def7060b4..3f1cfb679 100644
--- a/Bio/Phylo/_io.py
+++ b/Bio/Phylo/_io.py
@@ -32,6 +32,11 @@ try:
 except ImportError:
     pass
 
+# These should be opened in binary mode (e.g. XML encoding pain)
+_BINARY_FORMATS = (
+    'phyloxml',
+    'nexml',
+)
 
 def parse(file, format, **kwargs):
     """Iteratively parse a file and return each of the trees it contains.
@@ -47,7 +52,11 @@ def parse(file, format, **kwargs):
     ...     print(tree.rooted)
     True
     """
-    with File.as_handle(file, 'r') as fp:
+    if format in _BINARY_FORMATS:
+        mode = "rb"
+    else:
+        mode = "rt"
+    with File.as_handle(file, mode) as fp:
         for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
             yield tree
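
For the output side mentioned above, one possible direction is sketched here with the standard library only. This is not the Biopython writer code; the element name and the in-memory buffer are illustrative assumptions. The idea is the same as for parsing: write to a binary handle and let the XML serializer choose the encoding.

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny tree containing a non-ASCII character.
root = ET.Element("desc")
root.text = "ETH Z\u00fcrich"
tree = ET.ElementTree(root)

# Writing to a binary handle with an explicit encoding means the locale's
# default text encoding is never consulted.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue()

# Round trip: parsing the bytes back recovers the original text.
round_trip = ET.fromstring(output).text
print(output)
print(round_trip)
```

A real file opened with mode `"wb"` could be used in place of the `io.BytesIO` buffer.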

chris-rands · 2018-09-28T14:40:38Z

As I mentioned briefly in PR #1808, I encountered the same problem as (or one closely related to) this issue and issues #1321 and #669.

Running python3 setup.py test gave UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13090: ordinal not in range(128) test failures, with the ERRORs occurring in test_Nexus, test_Phylo, test_PhyloXML, and test_Tutorial. Most errors were in test_Phylo. (I can provide full tracebacks if necessary.)

Changing LANG from C to en_GB.UTF-8 fixed all the errors.

$ echo $LANG
C
$ export LANG=en_GB.UTF-8
$ echo $LANG
en_GB.UTF-8

But I am not sure how best to fix this within Biopython. It is possible to inspect and modify the locale via the Python locale module, but I am no expert on this and it might have unintended side-effects.
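
As a rough illustration of the inspection side only (not a proposed fix), the snippet below reports the encoding that text-mode `open()` uses when none is given. It only reads the settings and does not modify the locale; what it prints depends on the Python version, platform, and environment.

```python
import locale
import sys

# Encoding that open() falls back to in text mode when no encoding is given.
preferred = locale.getpreferredencoding(False)

# Encoding used for file names, reported for comparison.
fs_encoding = sys.getfilesystemencoding()

print("preferred encoding:", preferred)
print("filesystem encoding:", fs_encoding)
```

Running this once with `LANG=C` and once with a UTF-8 locale should show whether the interpreter is picking up an ASCII default on a given system.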

EDIT: my setup:

>>> import sys; print(sys.version)
3.6.6 (default, Jun 27 2018, 13:11:40) 
[GCC 8.1.1 20180531]
>>> import platform; print(platform.python_implementation()); print(platform.platform())
CPython
Linux-4.17.9-1-ARCH-x86_64-with-arch-Arch-Linux
>>> import Bio; print(Bio.__version__)
1.73.dev0

peterjc · 2018-09-28T14:50:00Z

From #855, it seems that for the XML files the default encoding problem on loading can be side-stepped by opening the files in binary mode (and letting the XML parser handle the encoding settings), which is what I tried in #1320 (comment).

@chris-rands If you'd like to explore this, I suggest using that as a starting point.

chris-rands · 2018-12-17T09:58:34Z

I was just testing this again but with Python 3.7, and found all tests now pass on my system with LANG=C, so I think this has been fixed for >=3.7. PEP 538 seems to explain the relevant changes.

peterjc · 2018-12-18T11:08:37Z

If these problems will "go away" with Python 3.7 onwards, that is good news.

If there are any simple changes we can make for Python 3.4, 3.5, 3.6, even better.

fabianegli · 2022-06-13T20:50:32Z

It probably did not entirely go away with Python 3.7. See https://github.com/biopython/biopython/runs/6866566839?check_suite_focus=true

peterjc mentioned this issue Jul 13, 2017

Parse XML files in bytes mode to avoid encoding error #1322

Merged

peterjc assigned etal Jul 17, 2017

peterjc added Bug Testing labels Jul 17, 2017

fabianegli mentioned this issue Jun 13, 2022

CI issues @github-actions CI / test_windows (3.9) (pull_request) Failing after 5m — test_windows (3.9) #3944

Open
