Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit parser only issues warning rather than throwing an exception for completely invalid string when using "vounit" format #16199

Open
JeremyMcCormick opened this issue Mar 14, 2024 · 15 comments

Comments

@JeremyMcCormick
Copy link

JeremyMcCormick commented Mar 14, 2024

Description

The following warning prints when I define a unit in "vounit" format with a non-sensical string:

WARNING: UnitsWarning: Unit 'bad' not supported by the VOUnit standard. Did you mean Ba (deprecated), aD, ad or g.cm**-1.s**-2? [astropy.units.format.vounit]

This should be throwing an exception, because "bad" is a completely invalid string for a units definition.

Expected behavior

The parser for "vounit" should be throwing an exception in this case and not just issuing a warning when parse_strict="raise" is set. Otherwise, it cannot be used programmatically to check VOUnit and raise errors when invalid ones are used.

The regular expression matching used here represents undesirable behavior to me. The parser should not be doing fuzzy matching on strings which do not represent valid units and then issuing warnings instead of exceptions when it finds a string that kind of appears similar.

How to Reproduce

from astropy.unit import Unit
Unit("bad", format="vounit", parse_strict="raise")  # This should raise an exception but does not.

Versions

In [20]: astropy.__version__
Out[20]: '5.3.4'
Copy link

Welcome to Astropy 👋 and thank you for your first issue!

A project member will respond to you as soon as possible; in the meantime, please double-check the guidelines for submitting issues and make sure you've provided the requested details.

GitHub issues in the Astropy repository are used to track bug reports and feature requests; If your issue poses a question about how to use Astropy, please instead raise your question in the Astropy Discourse user forum and close this issue.

If you feel that this issue has not been responded to in a timely manner, please send a message directly to the development mailing list. If the issue is urgent or sensitive in nature (e.g., a security vulnerability) please send an e-mail directly to the private e-mail feedback@astropy.org.

@pllim
Copy link
Member

pllim commented Mar 14, 2024

Still a problem in astropy v6.1.dev . Looks like this call went here:

return cls._parser.parse(s, lexer=cls._lexer, debug=debug)

That invokes parse in here:

class ThreadSafeParser:

This line calls an external library (astropy.extern.ply.yacc.LRParser) that somehow bypasses Unit error handling:

return self.parser.parse(*args, **kwargs)

You can get this same problematic behavior with this call:

from astropy.units.format import vounit
vounit.VOUnit._parser.parser.parse("bad")

This very low-level parse function does not understand parse_strict:

>>> vounit.VOUnit._parser.parser.parse?
Signature:
vounit.VOUnit._parser.parser.parse(
    input=None,
    lexer=None,
    debug=False,
    tracking=False,
    tokenfunc=None,
)
Docstring: <no docstring>
File:      astropy/extern/ply/yacc.py
Type:      method

If the except is not catching ValueError here:

except ValueError as e:

I think it would have properly respected parse_strict. So there is a bug but not clear to me what is a proper way to fix this.

vounit.VOUnit._parse_unit("bad")  # ValueError

Thanks for the report!

@pllim
Copy link
Member

pllim commented Mar 14, 2024

That said, this throws an exception even though theoretically it went though similar logic route. 🤯

from astropy.units.format.cds import CDS
CDS._parser.parser.parse("bad")

@pllim
Copy link
Member

pllim commented Mar 14, 2024

FWIW if I comment out this line, it would respect parse_strict but I don't know why this line was added back in #11258 by @aragilar (and approved by @adrn).

--- a/astropy/units/format/vounit.py
+++ b/astropy/units/format/vounit.py
@@ -111,7 +111,7 @@ class VOUnit(generic.Generic):
                     core.UnitsWarning,
                 )

-                return cls._def_custom_unit(t.value)
+                #return cls._def_custom_unit(t.value)

             raise

@gpdf
Copy link
Contributor

gpdf commented Mar 15, 2024

There's clearly tension between

  • flexibly dealing with units that users find in datasets in the wild (as well as avoiding avalanches of nuisance warning messages); and
  • assisting data publishers in rigorously verifying that their published data follows applicable standards.

The former is of course of interest to way more users of Astropy. But I think it's important for data publishers (@JeremyMcCormick and I are with the Rubin Observatory) to have the tools to do this properly. Our intent is to validate our published units specifically against VOUnit. From what I understand, the parse_strict option is the tool we should be using to do this.

@aragilar
Copy link
Contributor

Looking at git blame, that was part of the changes @mhvk made to the PR. @mhvk do you recall why the VOUnit changes were made?

@mhvk
Copy link
Contributor

mhvk commented Mar 15, 2024

I don't recall in detail, but recall the vounit specification has a very liberal idea about custom units. I do agree it would be nice to allow checking things without having to turn warnings into errors, and parse_strict seems logical for it... I think passing that down would make sense!

@neutrinoceros
Copy link
Contributor

I think this is somewhat up my alley so if no one else is already working on a fix I'm happy to give it a go !

@eerovaher
Copy link
Member

From what I can tell this is a duplicate of #6302 and #13017. #13042 was an attempt to fix the problem.

I am not very familiar with the VOUnit standard, but from what I remember from participating in the discussions the last time this was brought up is that apparently "bad" is a valid unit as far as VOUnit is concerned. You can look up the details and references in the issues and pull request I mentioned.

@pllim
Copy link
Member

pllim commented Mar 15, 2024

I totally forgot I tried to fix it before, and from a different angle... 😆

@tomdonaldson , your input would be appreciated here. Thanks!

@gpdf
Copy link
Contributor

gpdf commented Mar 15, 2024

@eerovaher I think you are in essence referring to the following text from the standard?

In general, this Recommendation is concerned almost exclusively with the syntactic question of what is and is not a valid unit string, leaving most questions of interpretation or enforcement to a higher layer in an application stack. Specifically:

  • The specification does not forbid `unknown' units. An implementation of this specification should be able to recognise, and communicate, that a unit is unknown, but it is not required to reject a unit string on the grounds that it is unrecognised.

(I.e., there's nothing special about bad, and, for example, bad is not like furlong with its famous prefix interpretation.)

Let's be clear what this says: "an implementation ... is not required to reject [an unrecognized] unit string". But it is not forbidden from doing so. This goes to my previous comment.

If we expect Astropy to be used by people who are creating data, not just reading existing data in a permissive, annoyance-reducing way, it absolutely should have a way to validate units for being recognized VOUnit strings (the issue in this ticket) as well as a simple way to verify that in-memory Unit objects can be written out as a recognized VOUnit expression (not exactly the subject of this ticket).

The more we support writing valid VOUnit strings, the less, over time, the leniencies at read time may be needed.

@mhvk
Copy link
Contributor

mhvk commented Mar 15, 2024

Thanks, @eerovaher, for finding those older issues - I knew there was something but didn't find them quickly enough! And thanks, @gpdf, for the good summary.

Looking at the older issues, there seemed a general consensus on the outcome parse_strict='raise' should, in fact, raise on an unrecognized unit.

I'd love to see a new PR, though clearly the travails of #13042 shows that it is not as trivial as one might think.

@JeremyMcCormick
Copy link
Author

JeremyMcCormick commented Mar 18, 2024

I would like to point out if it is not clear from the discussion that the unit validation in vounit format just prints warnings for other non-standard unit strings and not just in my "bad" example. :)

WARNING: UnitsWarning: Unit 'day' not supported by the VOUnit standard. Did you mean daJy, daRy or dayr? [astropy.units.format.vounit]
WARNING:astropy:UnitsWarning: Unit 'day' not supported by the VOUnit standard. Did you mean daJy, daRy or dayr?
WARNING: UnitsWarning: Unit 'degree' not supported by the VOUnit standard.  [astropy.units.format.vounit]
WARNING:astropy:UnitsWarning: Unit 'degree' not supported by the VOUnit standard. 
WARNING: UnitsWarning: Unit 'DN' not supported by the VOUnit standard. Did you mean dN or daN? [astropy.units.format.vounit]
WARNING:astropy:UnitsWarning: Unit 'DN' not supported by the VOUnit standard. Did you mean dN or daN?
WARNING: UnitsWarning: Unit 'second' not supported by the VOUnit standard.  [astropy.units.format.vounit]
WARNING:astropy:UnitsWarning: Unit 'second' not supported by the VOUnit standard. 
WARNING: UnitsWarning: Unit 'nmgy' not supported by the VOUnit standard.  [astropy.units.format.vounit]
WARNING:astropy:UnitsWarning: Unit 'nmgy' not supported by the VOUnit standard. 

When using vounit format, deviations from the VOUnit standard should result in an exception when parse_strict='raise' is set. If the user wants less strict parsing of their unit strings with exceptions still enabled, then the format argument can simply be left to the default, which will accept these units without complaining. In the case that they only want a warning message with vounit enabled, then parse_strict can be set accordingly. I imagine the latter might be the case when reading existing data with invalid units that cannot be changed.

I would really like to be able to get proper exceptions thrown here, because we are using a programmatic framework based on Pydantic where errors with unit specifications should percolate up as validation errors.

Additional minor issue: Warnings messages are being printed twice.

@JeremyMcCormick
Copy link
Author

JeremyMcCormick commented Apr 17, 2024

@neutrinoceros et al.

Any progress on this or plans to include a fix in an upcoming release?

We would very much like to be able to validate against the IVOA units specification and get proper exceptions when errors occur rather than just warning messages.

@neutrinoceros
Copy link
Contributor

It's definitely non trivial to fix, and there were already multiple failed attempts, so I don't want to make any promises here. However, I am still up for it if others think it's worth prioritizing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants