New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser only partially accounts for case-insensitive nature of ISO8601 datestamps #820

Closed
cjgibson opened this Issue Sep 20, 2018 · 3 comments

Comments

Projects
None yet
3 participants
@cjgibson

cjgibson commented Sep 20, 2018

$ python -c "from dateutil.parser import parse
> print parse('1970-01-01T00:00:00Z')
> print parse('1970-01-01t00:00:00Z')
> print parse('1970-01-01t00:00:00')
> print parse('1970-01-01t00:00:00z')"
1970-01-01 00:00:00+00:00
1970-01-01 00:00:00+00:00
1970-01-01 00:00:00
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "/usr/local/lib/python2.7/dist-packages/dateutil/parser.py", line 1182, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/dateutil/parser.py", line 559, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format

It looks like this library properly supports both the use of t and T as legal time separators for ISO8601 datestamps, but only supports the uppercased shorthand for UTC (Z), and not z. From what I can find online (namely, this RFC: https://tools.ietf.org/html/rfc3339#section-5.6), it seems like the latter is also legal syntax.

@pganssle

This comment has been minimized.

Member

pganssle commented Sep 21, 2018

If true that the ISO 8601 format allows for z, then we should at the very least update dateutil.parser.isoparse.

I don't know if we need to or want to add support for this in dateutil.parser.parse. The obvious way to do so would be to change parserinfo.UTCZONE to include 'z', but I suspect a lower-case z is more often there for some reason other than because the datetime is supposed to be in UTC. Anyone who knows they have a z can always create a modified parserinfo if they run into this problem.

That said, one thing I was considering for the long term is refactoring dateutil.parser.parse to use dateutil.parser.isoparse first, and fall back to the regular parser if the parse fails. In that case, YYYY-MM-DDtHH:MM:SSz would be accepted, but Dec 19, 2018 10:30z would not be, which I'm kinda OK with.

@jbrockmendel Thoughts? Am I overthinking this? Should we just add 'z' to the default UTCZONE?

@jbrockmendel

This comment has been minimized.

Contributor

jbrockmendel commented Sep 21, 2018

Should we just add 'z' to the default UTCZONE?

It seems pretty unambiguous in the example Dec 19, 2018 10:30z. Off the top of my head I can't think of a counter-example where recognizing it as UTC would be incorrect.

@pganssle

This comment has been minimized.

Member

pganssle commented Sep 23, 2018

@jbrockmendel OK, I agree.

Take-away for anyone who wants to make a PR:

  1. Change the Z detection to detect z as well in dateutil.parser.isoparse
  2. Add z to UTCZONE.

The only remaining uncertainty I have is whether we should make UTCZONE case-insensitive. I'm thinking maybe in a future version we can have that, but with a switch to turn it off.

@pganssle pganssle closed this in #822 Oct 6, 2018

pganssle added a commit that referenced this issue Oct 6, 2018

Merge pull request #822 from Cheukting/isoparse
BUG: Closes #820 Accepting 'z' as valid UTC in isoparser
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment