RuntimeError: XML queue overflow #5824
Comments
Please attach the actual table file and also confirm that it actually follows VO standards. |
Here's the table file
Well, I know there is also something not right with the arraysize parameter... |
Not sure what
>>> import astropy
>>> astropy.__version__
'2.0.dev17886'
>>> from astropy.table import Table
>>> tab = Table.read('bugreport.xml', format='votable')
WARNING: W35: bugreport.xml:6:2: W35: 'value' attribute required for INFO elements [astropy.io.votable.tree]
WARNING: W27: bugreport.xml:11:2: W27: COOSYS deprecated in VOTable 1.2 [astropy.io.votable.tree]
>>> tab
<Table masked=True length=2538>
RAJ2000 [1] DEJ2000 [1] W1mag [1] W2mag [1] W3mag [1] W4mag [1]
deg deg mag mag mag mag
float64 float64 float32 float32 float32 float32
----------- ----------- --------- --------- --------- ---------
10.0500458 19.757055 16.848 16.346 12.221 8.998
10.0415944 19.7670545 17.392 16.644 12.406 8.571
10.0108147 19.755775 17.256 17.096 12.559 8.948
9.9922353 19.7511416 17.648 16.553 12.17 8.218
9.9816437 19.7507113 16.164 16.015 12.39 9.044
9.9866186 19.7557493 16.118 16.394 12.597 8.919
... ... ... ... ... ...
9.9856502 20.2367117 17.02 17.337 11.966 8.859
9.9935504 20.2470238 16.491 16.713 12.547 8.966
9.9855978 20.2464858 16.885 16.466 12.585 8.992
9.9689587 20.2405518 16.836 16.37 12.616 8.839
9.9558873 20.2455505 17.222 17.25 12.724 9.119
9.9633682 20.2460298 15.724 15.676 12.602 8.868 |
Reading from a local file works fine indeed. PyVO is doing some complicated stuff involving a custom config and calling astropy internal methods manually, so I reduced that part to
from astropy.io.votable import parse
return parse(source)
where Parsing from the local file-obj works fine, so I wrote the return value of both c1.xml: local
--- c1.xml 2017-02-20 10:53:57.640852489 +0100
+++ c2.xml 2017-02-20 10:53:57.640852489 +0100
@@ -1,19 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<VOTABLE version="1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.2" xmlns="http://www.ivoa.net/xml/VOTable/v1.2">
<RESOURCE type="results">
- <INFO name="QUERY_STATUS" value="OK"/>
+ <INFO name="QUERY_STATUS" value="OK" />
<INFO name="PROVIDER" value="TAPVizieR">VizieR TAP service.</INFO>
<INFO name="QUERY"><![CDATA[SELECT raj2000, dej2000, w1mag, w2mag, w3mag, w4mag
- FROM "II/328/allwise"
- WHERE 1=CONTAINS(
- POINT('ICRS', raj2000, dej2000),
- CIRCLE('ICRS', 10, 20, 0.25))]]></INFO>
- <COOSYS ID="coosys_FK5" system="eq_FK5" equinox="2000.0"/>
+ FROM "II/328/allwise"
+ WHERE 1=CONTAINS(
+ POINT('ICRS', raj2000, dej2000),
+ CIRCLE('ICRS', 10, 20, 0.25))]]></INFO>
+ <COOSYS ID='coosys_FK5' system='eq_FK5' equinox='2000.0'/>
<TABLE>
- <FIELD ID="RAJ2000" name="RAJ2000" datatype="double" arraysize="1" ucd="pos.eq.ra;meta.main" unit="deg" ref="coosys_FK5">
+ <FIELD ID="RAJ2000" name="RAJ2000" datatype="double" arraysize="1" ucd="pos.eq.ra;meta.main" unit="deg" ref='coosys_FK5'>
<DESCRIPTION>Right ascension (J2000)</DESCRIPTION>
</FIELD>
- <FIELD ID="DEJ2000" name="DEJ2000" datatype="double" arraysize="1" ucd="pos.eq.dec;meta.main" unit="deg" ref="coosys_FK5">
+ <FIELD ID="DEJ2000" name="DEJ2000" datatype="double" arraysize="1" ucd="pos.eq.dec;meta.main" unit="deg" ref='coosys_FK5'>
<DESCRIPTION>Declination (J2000)</DESCRIPTION>
</FIELD>
<FIELD ID="W1mag" name="W1mag" datatype="float" arraysize="1" ucd="phot.mag;em.IR.3-4um" unit="mag">
@@ -1963,7 +1963,7 @@
<TR><TD>10.0950335</TD><TD>20.1427539</TD><TD>15.965</TD><TD>16.082</TD><TD>12.216</TD><TD>8.443</TD></TR>
<TR><TD>10.0992395</TD><TD>20.1429400</TD><TD>17.393</TD><TD>16.631</TD><TD>12.238</TD><TD>8.979</TD></TR>
<TR><TD>10.0881509</TD><TD>20.1455762</TD><TD>17.568</TD><TD>17.097</TD><TD>12.246</TD><TD>8.542</TD></TR>
-<TR><TD>10.0989969</TD><TD>20.1507476</TD><TD>17.024</TD><TD>16.382</TD><TD>12.019</TD><TD/></TR>
+<TR><TD>10.0989969</TD><TD>20.1507476</TD><TD>17.024</TD><TD>16.382</TD><TD>12.019</TD><TD></TD></TR>
<TR><TD>10.0790436</TD><TD>20.1435568</TD><TD>17.399</TD><TD>16.651</TD><TD>12.041</TD><TD>8.678</TD></TR>
<TR><TD>10.0688502</TD><TD>20.1414011</TD><TD>16.059</TD><TD>15.948</TD><TD>12.554</TD><TD>8.711</TD></TR>
<TR><TD>10.0898150</TD><TD>20.1589132</TD><TD>17.083</TD><TD>17.048</TD><TD>12.432</TD><TD>8.542</TD></TR>
@@ -2573,4 +2573,4 @@
</DATA>
</TABLE>
</RESOURCE>
-</VOTABLE>
\ No newline at end of file
+</VOTABLE>
If I had to guess, I'd say it has something to do with |
I am not sure what can be fixed on Astropy side in this case. Any suggestion? |
The circumstances under which the bug occurs are quite specific: >=816 rows and >=4 columns are required (the contents of the columns do not matter), e.g.:
--- cached_result.votable.xml.circle.815.good 2017-02-24 17:10:35.881331402 +0100
+++ cached_result.votable.xml.circle.816.bad 2017-02-24 17:10:20.821371217 +0100
@@ -4,6 +4,6 @@
SELECT
- TOP 815
+ TOP 816
raj2000, dej2000, w1mag, w2mag
FROM "II/328/allwise"
WHERE 1=CONTAINS(POINT('ICRS', raj2000, dej2000), CIRCLE('ICRS', 0, 0, 0.149))
]]></INFO>
@@ -840,2 +840,3 @@
<TR><TD>359.9752481</TD><TD>0.1319479</TD><TD>17.120</TD><TD>17.008</TD></TR>
+<TR><TD>359.9583354</TD><TD>0.1323975</TD><TD>16.736</TD><TD>16.755</TD></TR>
</TABLEDATA>
Core of debug harness:
adql_query = """
SELECT
TOP {:d}
raj2000, dej2000, w1mag, w2mag
FROM "II/328/allwise"
WHERE 1=CONTAINS(POINT('ICRS', raj2000, dej2000), CIRCLE('ICRS', 0, 0, 0.149))
""".format((815, 816)[not use_safe_number_of_results])
…
def main():
    …
    if use_local_file:
        # always works
        f = open(local_filename, 'r')
    else:
        # doesn't work when using _fast_iterparse
        keep_gzip = dump_votable_xml_gzip
        f = functools.partial(response.raw.read, decode_content=not keep_gzip)
    if use_slow_and_reliable:
        # always works
        iterable = iter(astropy.utils.xml.iterparser._slow_iterparse(f))
    else:
        # doesn't work when using 'response.raw.read'
        iterable = iter(astropy.utils.xml.iterparser._fast_iterparse(f))
The following tarball contains a test harness ( |
Thank you for the detailed info. So, looks like a bug in the C parser. |
The underlying trigger is that this TAP provider returns Gzip-compressed data (a good thing). From this response, the requested number of bytes (e.g. 8192) are ingested by
There are three lengths involved:
The presumptions are that:
There are several possible solutions; the most ideal would probably be to have the handlers
There are also some kludges in the short term, to work around the immediate symptom without curing/fixing the underlying problems: one is to pass a larger
Are there any preferences/thoughts about which style of fix would be most appropriate for the upstream codebase? |
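As a rough illustration of the over-read described above (the later test case in this thread calls it "overread on compressed data"), here is a minimal sketch using only the standard library. `DecodingReader` is a hypothetical stand-in, not the actual requests/urllib3 object; it assumes the requested size limits the compressed bytes consumed rather than the decompressed bytes handed back.

```python
import io
import zlib


class DecodingReader:
    """Hypothetical stand-in for an HTTP body reader that decompresses on the fly."""

    def __init__(self, compressed):
        self._raw = io.BytesIO(compressed)
        self._decoder = zlib.decompressobj()

    def read(self, amt):
        # Assumption: `amt` bounds the compressed bytes consumed,
        # not the number of bytes returned to the caller.
        chunk = self._raw.read(amt)
        return self._decoder.decompress(chunk)


# Repetitive rows compress very well, much like a VOTable <TABLEDATA> body.
payload = b"<TR><TD>0.0</TD><TD>0.0</TD></TR>\n" * 2000
compressed = zlib.compress(payload)

requested = 8192
returned = DecodingReader(compressed).read(requested)

# A consumer that sized its buffer for `requested` bytes would be overrun here.
overread = len(returned) > requested
print(requested, len(returned), overread)
```

A parser that copies the returned data into a fixed buffer sized from the request would run past the end of that buffer in this situation, which is consistent with the symptom only appearing once the table grows beyond a certain size.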
I am going to c/c the following maintainers for |
Following tarball contains (This is to enable patch validation). |
Initial patch that consolidates the logic and extends the (This does not fix the EOF detection on non-regular files that was discovered in the process). |
@sladen , do you intend to submit an official PR? |
@pllim, done! |
relloc() its buffers instead of overflowing them. [astropy#5824, astropy#5869]
Squashed commit of the following:
commit 5eaed5f Merge: 1c15acd ae97dea Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 21 11:17:04 2017 +0100 Fix to allow the C-based _fast_iterparse() VOTable XML parser to relloc() its buffers instead of overflowing them. [astropy#5824, astropy#5869]
commit 1c15acd Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 21 11:06:34 2017 +0100 iterparse.c: short-circuit quicker to avoid realloc() nop when n == self->queue_size
commit c111ba9 Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 21 11:03:14 2017 +0100 test_iterparse.py: spelling/grammar corrections (no code changes)
commit 8270fac Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 12:39:34 2017 +0100 CHANGES.rst: drop second sentence/contents of parenthesis
commit dcc81af Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 11:46:37 2017 +0100 CHANGES.rst: drop parenthesis '()', keep square brackets '[]', (hopefully) per feedback
commit 54d0dd8 Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 10:26:08 2017 +0100 test_iterparse.py: enumerate iterator using list() (written blind)
commit c26b94c Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 10:01:07 2017 +0100 test_iterparse.py: pacify pep8online.com linter, per feedback
commit 6e33447 Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 09:35:19 2017 +0100 test_iterparse.py: comment out debugging print()
commit 191c2bd Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 09:20:13 2017 +0100 test_iterparse.py: call iterator directly instead of with next() (written blind)
commit b6704c8 Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 09:19:08 2017 +0100 test_iterparse.py: use six.BytesIO per feedback (written blind)
commit d3cf1be Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 09:16:58 2017 +0100 test_iterparse.py: relocate 'from __future__ import' from __init__.py to test per feedback
commit 1f4c1c8 Author: Paul Sladen <github@paul.sladen.org> Date: Sat Mar 18 09:15:46 2017 +0100 CHANGES.rst: tweak [bug #, bug #], per feedback
commit 3da2fae Author: Paul Sladen <github@paul.sladen.org> Date: Fri Mar 17 22:39:12 2017 +0100 test_iterparse.py: try to call iterator directly, per feedback (written blind)
commit a440901 Author: Paul Sladen <github@paul.sladen.org> Date: Fri Mar 17 22:37:15 2017 +0100 test_iterparse.py: drop unused raises import
commit de239ae Author: Paul Sladen <github@paul.sladen.org> Date: Fri Mar 17 22:35:24 2017 +0100 test_iterparse.py: drop if __name__==__main__, per feedback
commit 48fd4c6 Author: Paul Sladen <github@paul.sladen.org> Date: Fri Mar 17 22:33:59 2017 +0100 CHANGES.rst: tweak [bug #] position, (hopefully) per feedback
commit a36ada2 Author: Paul Sladen <github@paul.sladen.org> Date: Fri Mar 17 22:31:08 2017 +0100 xml/tests/__init__.py: empty __init__ per feedback
commit 9208ce5 Author: Paul Sladen <github@paul.sladen.org> Date: Wed Mar 15 15:09:34 2017 +0100 Changelog v1.3.1: utils.xml: "Fix to allow the C-based _fast_iterparse()...
commit 183d4ca Author: Paul Sladen <github@paul.sladen.org> Date: Wed Mar 15 14:42:24 2017 +0100 test_iterparse: additional comments; trim unused
commit c69987e Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 14 18:08:47 2017 +0100 test_iterparse: Python-2/3isms, try to use StringIO/BytesIOs depending
commit d4a9193 Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 14 17:12:04 2017 +0100 test_iterparse: add .encode() to testcase for Python 3.5
commit eda776b Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 14 15:49:59 2017 +0100 test_iterparse(): remove even more dependencies
commit fa49f80 Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 14 14:51:27 2017 +0100 test_iterparse.py - simplify emulation to avoid importing requests/pyvo
commit 78e7a21 Author: Paul Sladen <github@paul.sladen.org> Date: Tue Mar 14 12:32:32 2017 +0100 test_iterparse.py - test case for bug 5824 overread on compressed data
commit 05c9098 Author: Paul Sladen <github@paul.sladen.org> Date: Thu Mar 9 19:34:37 2017 +0100 queue_realloc() tmp not needed
commit 6ee0287 Author: Paul Sladen <github@paul.sladen.org> Date: Thu Mar 9 18:54:05 2017 +0100 iterparse.c: add queue_realloc() + move 'buffersize / 2' logic there [bug astropy#5824]
iterparse.c: add queue_realloc() + move 'buffersize / 2' logic there [bug #5824]
#5869 is merged, so grabbing the latest dev should fix your problem here. Otherwise, it will be in 1.3.2 release. Thank you! |
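For anyone who cannot move to a fixed version yet, one possible stop-gap (a sketch, not something proposed in this thread) is to wrap the read callable so it never hands back more than was asked for, holding any excess for the next call. `BoundedReader` and `greedy_read` are hypothetical names used only for this illustration.

```python
class BoundedReader:
    """Wrap a read(amt) callable so it never returns more than `amt` bytes."""

    def __init__(self, read):
        self._read = read
        self._pending = b""

    def __call__(self, amt):
        # Serve leftovers from an earlier over-long read before reading again.
        if not self._pending:
            # An empty result is passed through unchanged, so the caller
            # still sees it as end-of-data.
            self._pending = self._read(amt)
        chunk, self._pending = self._pending[:amt], self._pending[amt:]
        return chunk


# Demo with a source that ignores `amt`, like a reader returning
# everything it happened to decompress.
chunks = iter([b"a" * 20, b"b" * 5])


def greedy_read(amt):
    return next(chunks, b"")


bounded = BoundedReader(greedy_read)
pieces = []
while True:
    piece = bounded(8)
    if not piece:
        break
    pieces.append(piece)

print([len(p) for p in pieces])
```

In the debug harness quoted earlier, the wrapper would presumably go around the `functools.partial(response.raw.read, ...)` callable before it is handed to the parser; whether that is sufficient for a given setup is untested here.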
I'm trying to run the following program, using the pyvo library:
which results in the following error:
Versions:
astropy: 1.3
numpy: 1.12.0
python: 2.7.12 (same with 3.5.2)