Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SVO votable fails to parse, but looks valid? #10572

Closed
keflavich opened this issue Jul 15, 2020 · 12 comments · Fixed by #10583
Closed

SVO votable fails to parse, but looks valid? #10572

keflavich opened this issue Jul 15, 2020 · 12 comments · Fixed by #10583
Assignees

Comments

@keflavich
Copy link
Contributor

This VOTABLE:
http://svo2.cab.inta-csic.es/theory/fps/fps.php?FORMAT=metadata

will not load because of the following error:

>>> import requests, io
>>> response = requests.get('http://svo2.cab.inta-csic.es/theory/fps/fps.php?FORMAT=metadata')
>>> votable = io.BytesIO(response.content)
>>> from astropy.io.votable import  parse
>>> parse(votable, invalid='mask', verify='ignore')
Traceback (most recent call last):
  File "<ipython-input-32-f5eb581b75ff>", line 1, in <module>
    parse(votable, invalid='mask', verify='ignore')
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/utils/decorators.py", line 521, in wrapper
    return function(*args, **kwargs)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/table.py", line 167, in parse
    config=config, pos=(1, 1)).parse(iterator, config)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py", line 3568, in parse
    iterator, tag, data, config, pos)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py", line 3478, in _add_resource
    resource.parse(self, iterator, config)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py", line 3280, in parse
    iterator, tag, data, config, pos)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py", line 3240, in _add_param
    param.parse(iterator, config)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py", line 1472, in parse
    self.values.parse(iterator, config)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py", line 1034, in parse
    vo_raise(E09, 'OPTION', config, pos)
  File "/home/adam/anaconda3/lib/python3.7/site-packages/astropy/io/votable/exceptions.py", line 101, in vo_raise
    raise exception_class(args, config, pos)
E09: None:92:8: E09: 'OPTION' must have a value attribute

The line it's failing on is <OPTION name="All" value=""/>, which has a value specified, it's just empty.

Is this an invalid table, or is this a real bug in the reader?

@pllim
Copy link
Member

pllim commented Jul 15, 2020

Did you try run volint on it?

@keflavich
Copy link
Contributor Author

No, that would have been too clever. It reports the same error. Is there any way to work around this error, then? noncompliant votables are more common than compliant...

2: W42: No XML namespace specified
<VOTABLE version="1.1" xsi:schemaLocation="http://www.ivoa.net/xml/...
^

5: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthMean_min' -> 'INPUT_WavelengthMean_min'
    <PARAM name="INPUT:WavelengthMean_min" ucd="em.wl" utype="photd...
    ^

12: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthMean_max' -> 'INPUT_WavelengthMean_max'
    <PARAM name="INPUT:WavelengthMean_max" ucd="em.wl" utype="photd...
    ^

19: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthEff_min' -> 'INPUT_WavelengthEff_min'
    <PARAM name="INPUT:WavelengthEff_min" ucd="em.wl.effective" uty...
    ^

26: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthEff_max' -> 'INPUT_WavelengthEff_max'
    <PARAM name="INPUT:WavelengthEff_max" ucd="em.wl.effective" uty...
    ^

33: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthMin_min' -> 'INPUT_WavelengthMin_min'
    <PARAM name="INPUT:WavelengthMin_min" ucd="em.wl;stat.min" utyp...
    ^

40: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthMin_max' -> 'INPUT_WavelengthMin_max'
    <PARAM name="INPUT:WavelengthMin_max" ucd="em.wl;stat.min" utyp...
    ^

47: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthMax_min' -> 'INPUT_WavelengthMax_min'
    <PARAM name="INPUT:WavelengthMax_min" ucd="em.wl;stat.max" utyp...
    ^

54: W03: Implicitly generating an ID from a name
  'INPUT:WavelengthMax_max' -> 'INPUT_WavelengthMax_max'
    <PARAM name="INPUT:WavelengthMax_max" ucd="em.wl;stat.max" utyp...
    ^

61: W03: Implicitly generating an ID from a name 'INPUT:WidthEff_min'
  -> 'INPUT_WidthEff_min'
    <PARAM name="INPUT:WidthEff_min" ucd="instr.bandwidth" utype="p...
    ^

68: W03: Implicitly generating an ID from a name 'INPUT:WidthEff_max'
  -> 'INPUT_WidthEff_max' (suppressing further warnings of this
  type...)
    <PARAM name="INPUT:WidthEff_max" ucd="instr.bandwidth" utype="p...
    ^

92: E09: 'OPTION' must have a value attribute
        <OPTION name="All" value=""/>
        ^

@pllim
Copy link
Member

pllim commented Jul 16, 2020

I think this is a question of either XML or VOTABLE standards and I have to defer to @tomdonaldson . As far as I can understand the IVOA standards on OPTION and setting null value, you cannot just set empty string like that. The parser does not recognize it and reports that "option has no value" because the value is empty; i.e., the iterator only sees {'name': 'All'} when <OPTION name="All" value=""/> is parsed.

If I change the data to the following, it works:

      <VALUES null="999">
        <OPTION value="999" name="All"/>

Is this a bug or feature, I don't know. But regardless, you might want to tell the service provider so they can work on fixing their VO tables.

@pllim
Copy link
Member

pllim commented Jul 16, 2020

Relevant code:

elif tag == 'OPTION':
if 'value' not in data:
vo_raise(E09, 'OPTION', config, pos)

@keflavich
Copy link
Contributor Author

Reported to the SVO. Thanks @pllim! Let's see if they fix it.

@tomdonaldson
Copy link
Contributor

Ugh. The distinction between a null valued string and an intentionally empty string has a been a low-level persistent ambiguity in VOTables. (Also, this example is VOTable version 1.1, which is not directly relevant, but predates some of the clarity that came about later for null values, mostly for non-strings.)

For this case, I think the value="" seems reasonable based on the spec. Indeed for strings, whether literal attribute values or values in the table itself, I think blank values should be interpreted as empty strings instead of nulls where possible.

On a whim I compared the default behavior of the "fast" parser with that with of the "slow" parser as below:

import requests, io
response = requests.get('http://svo2.cab.inta-csic.es/theory/fps/fps.php?FORMAT=metadata')
votable = io.BytesIO(response.content)
from astropy.io.votable import  parse
parse(votable, invalid='mask', verify='ignore', _debug_python_based_parser=True)

The code above is happy with the OPTION because it includes value='' in data in

elif tag == 'OPTION':
if 'value' not in data:
vo_raise(E09, 'OPTION', config, pos)

@keflavich If the slow parser doesn't have important known differences (@pllim?), the _debug_python_based_parser=True may be a reasonable workaround.

@tomdonaldson
Copy link
Contributor

Ironically, the parser doesn't complain at all that the PARAM elements don't have a value attribute even though that is required by the schema. (I'm not sure why the schema requires PARAM to have a value, but that's a different question.)

@carlosrodrigoblanco
Copy link

carlosrodrigoblanco commented Jul 17, 2020

Hi all

My name is Carlos and I'm the developer of this SVO service . Thanks for making us aware of this problem.

Actually I don't see any important reason to keep this empty VALUE in this VOTable, so I have just deleted it. The idea was saying that if you leave "Instrument" empty in a query all "instrument" values will be included. But I don't think that it is necessary.

But I would prefer to get more opinions about this (maybe try to discuss this at the IVOA level), because I don't think that there is a reason why having a value="" for an OPTION in a string PARAM should be incorrect and it would be good to clarify this as much as possible.

The old version of the service can be seen (by now, just in case further discussion is needed) at
http://svo2.cab.inta-csic.es/theory/fps/fpsold.php?FORMAT=metadata

@pllim
Copy link
Member

pllim commented Jul 17, 2020

Re: #10572 (comment)

Theoretically, the behavior with or without _debug_python_based_parser=True should be exactly the same. The fact that they are different means one of the parser has a bug. But which one?

@tomdonaldson
Copy link
Contributor

Re: #10572 (comment)

Theoretically, the behavior with or without _debug_python_based_parser=True should be exactly the same. The fact that they are different means one of the parser has a bug. But which one?

The value="" should be allowed, so I'm thinking the C version is wrong here. At first glance that code looks pretty generic. If true, then all attributes with blank string values are being treated as if they weren't present at all. I suspect that validating whether a required attribute is present is one of the very few cases where it makes any difference.

@pllim pllim added Bug and removed question labels Jul 17, 2020
@pllim
Copy link
Member

pllim commented Jul 17, 2020

The naming of _debug_python_based_parser is unfortunate too, as it kinda disallows us from promoting its use, being a "hidden" keyword. I guess it was only meant for debugging the C-extension? 🤷‍♀️

@tomdonaldson
Copy link
Contributor

Since the service referenced in the original report has changed to prevent this problem, here is a self-contained code block that demonstrates the issue on a tiny, but mostly legal VOTable. Note that the PARAM is actually missing a value attribute, but that neither parser complains.

import io
from astropy.io.votable import  parse

vottext = b'''<?xml version="1.0"?>
<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/v1.1" >
  <RESOURCE type="meta">
    <PARAM name="INPUT:Instrument" ucd="instr" utype="" unit="" datatype="char" arraysize="*">
      <DESCRIPTION>Instrument</DESCRIPTION>
      <VALUES>
        <OPTION name="All" value=""/>
        <OPTION value="90prime"/>
      </VALUES>
    </PARAM>
  </RESOURCE>
</VOTABLE>
'''

# Succeeds using the slow parser.
vot = parse(io.BytesIO(vottext), invalid='mask', verify='ignore', _debug_python_based_parser=True)
print(vot)

# Fails for the fast parser, complaining about missing OPTION value.
vot = parse(io.BytesIO(vottext), invalid='mask', verify='ignore')
print(vot)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants