It looks like this test is working on Python 3 but failing on Python 2.7. We don't run it on AppVeyor or TravisCI because it is an online test: we expect some intermittent failures and don't want to burden the online servers.
64bit Linux, Python 2.7.15 via conda
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ python --version
Python 2.7.15
$ python test_Entrez_online.py
test_ecitmatch (test_Entrez_online.EntrezOnlineCase)
Test Entrez.ecitmatch to search for a citation ... ok
test_efetch_biosystems_xml (test_Entrez_online.EntrezOnlineCase)
Test Entrez parser with XML from biosystems ... ok
test_efetch_gds_utf8 (test_Entrez_online.EntrezOnlineCase)
Test correct handling of encodings in Entrez.efetch ... /mnt/.../conda/lib/python2.7/unittest/case.py:503: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if not first == second:
FAIL
...
======================================================================
FAIL: test_efetch_gds_utf8 (test_Entrez_online.EntrezOnlineCase)
Test correct handling of encodings in Entrez.efetch
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/.../repositories/biopython/Tests/test_Entrez_online.py", line 261, in test_efetch_gds_utf8
    self.assertEqual(result[342:359], expected_result)
AssertionError: '\xe2\x80\x9cfield of injur' != u'\u201cfield of injury\u201d'
(It also fails on #1848 but I have omitted that here)
Same issue on macOS using Apple provided Python 2.7.10
These are the left and right double quote characters, U+201C and U+201D. Quoting the file:
expected_result = u'“field of injury”' # Use of Unicode double qoutation marks U+201C and U+201D
self.assertEqual(result[342:359], expected_result)
Does test_Entrez_online.py need to declare an encoding, since it contains non-ASCII characters?
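On the encoding question: Python 2 rejects a source file that contains non-ASCII bytes without a coding declaration (PEP 263) with a SyntaxError at compile time, so a file that runs far enough to fail an assertion presumably already has one. A minimal sketch of the two usual options follows; the variable names are illustrative and not taken from the test file.

```python
# -*- coding: utf-8 -*-
# Sketch only: two ways to put curly quotes into Python 2/3 compatible source.
# The coding line must be the first or second line of the file to take effect.

# 1. With a coding declaration, the literal characters may appear directly.
literal = u"“field of injury”"

# 2. Escape sequences keep the source pure ASCII, so no declaration is needed.
escaped = u"\u201cfield of injury\u201d"

# Both spellings produce the same 17-character text string.
same_text = literal == escaped
```

Either way the expected value is a text string, so the declaration alone would not change how it compares against bytes returned by the handle.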
It does not seem to be a locale issue as the machines are both UTF8 based, and this fails the same way:
$ LANG=C python test_Entrez_online.py
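The mismatch in the AssertionError can be reproduced offline, which also suggests the locale is not involved. This is a sketch that assumes handle.read() returns UTF-8 encoded bytes on Python 2, as the '\xe2\x80\x9c' prefix in the failure message indicates; it slices from offset 0 rather than 342 for brevity.

```python
# Sketch: reproduce the bytes-vs-text mismatch without contacting NCBI.
expected = u"\u201cfield of injury\u201d"  # text, as in the test
raw = expected.encode("utf-8")  # bytes, as read() would return on Python 2

# Each curly quote takes three bytes in UTF-8, so the byte string is longer
# than the text string.
n_chars = len(expected)
n_bytes = len(raw)

# A slice as wide as the text, taken from the bytes, stops short of the
# closing quote and even drops the final "y".
truncated = raw[:n_chars]

# Decoding first makes the same slice line up with the expected text.
decoded_slice = raw.decode("utf-8")[:n_chars]
```

The truncated value matches the left-hand side of the reported failure, so the slice bounds were written for text characters rather than bytes.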
I think the problem is likely an oversight in the encoding/decoding. This seems to fix it, but it needs reviewing, especially given the call to _binary_to_string_handle inside _open inside efetch:
$ git diff
diff --git a/Tests/test_Entrez_online.py b/Tests/test_Entrez_online.py
index 12c345ba5..d2f00a480 100644
--- a/Tests/test_Entrez_online.py
+++ b/Tests/test_Entrez_online.py
@@ -257,7 +257,10 @@ class EntrezOnlineCase(unittest.TestCase):
             self.assertIn(URL_API_KEY, handle.url)
             self.assertIn("id=200079209", handle.url)
             result = handle.read()
-            expected_result = u'“field of injury”' # Use of Unicode double qoutation marks U+201C and U+201D
+            if sys.version_info[0] < 3:
+                result = result.decode("UTF8")
+            # Use of Unicode double quotation marks U+201C and U+201D
+            expected_result = u'“field of injury”'
             self.assertEqual(result[342:359], expected_result)
             handle.close()
         finally:
This error is probably because the Python 2 unicode type was renamed to str in Python 3. As a result, plain strings are Unicode text by default in Python 3 but byte strings in Python 2. See here.
I can't think of a better solution than version checks, as you have suggested.
Using _as_unicode takes the default encoding and fails - using UTF8 explicitly seems to be required here, so I've applied the if-statement shown earlier.
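For comparison, a type check can stand in for the version check. The sketch below is a hypothetical helper (to_text is not part of Biopython) that decodes with an explicit encoding only when it is handed bytes; it relies on bytes being an alias of str on Python 2.

```python
def to_text(data, encoding="utf-8"):
    """Return data as text, decoding it first if it is a byte string.

    Hypothetical helper for illustration; not a Biopython function.
    """
    # On Python 2, bytes is an alias of str, so this branch catches the
    # byte strings returned by read(); unicode objects pass through.
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Bytes as the Python 2 handle delivers them, and the text the test expects.
raw = b"\xe2\x80\x9cfield of injury\xe2\x80\x9d"
expected = u"\u201cfield of injury\u201d"

from_bytes = to_text(raw)
from_text = to_text(expected)
```

In the test this would read result = to_text(handle.read()). Whether that is preferable to the explicit version check is a judgment call; the behaviour should be the same in both cases.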
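Character references for the two quotation marks:
U+201C (left double quotation mark): https://www.fileformat.info/info/unicode/char/201c/index.htm
UTF-8 (hex): 0xE2 0x80 0x9C
UTF-16 (hex): 0x201C
U+201D (right double quotation mark): https://www.fileformat.info/info/unicode/char/201d/index.htm
UTF-8 (hex): 0xE2 0x80 0x9D
UTF-16 (hex): 0x201D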