Make accepted UTCTime formats more lenient #197

Closed
nurpax opened this issue Apr 28, 2014 · 13 comments

Comments

@nurpax
Contributor

nurpax commented Apr 28, 2014

I noticed that Aeson only accepts UTCTime values with the following syntax: %FT%T%QZ

Unfortunately, it's quite common to see variations of this syntax where the middle 'T' and the trailing 'Z' have been dropped. I think many Aeson users would appreciate it if the UTCTime parser were a bit more lenient here; these parse failures can be hard to track down in a Haskell program when all you get is a Nothing from decode.

Here are a couple of examples of where timestamps in the troublesome format could originate:

(.hs code & cabal that reproduce the parse error here: https://gist.github.com/nurpax/11369860)

Python's default __str__() conversion for datetime:

>>> from datetime import datetime
>>> print datetime.now()
2014-04-28 13:13:13.112233

Similarly, SQLite yields:

sqlite> select datetime('now');
2014-04-28 12:35:12

Data.Time's read for UTCTime also accepts both of these formats.
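
For reference, here's a minimal sketch of the failure mode along the lines of the linked gist (not the gist itself; the array wrapper is only there to give decode a JSON document to work on):

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (decode)
import Data.Time (UTCTime)

main :: IO ()
main = do
  -- SQLite/Python style: no 'T' separator, no 'Z' -> Nothing
  print (decode "[\"2014-04-28 12:35:12\"]"  :: Maybe [UTCTime])
  -- Strict %FT%T%QZ form -> Just [2014-04-28 12:35:12 UTC]
  print (decode "[\"2014-04-28T12:35:12Z\"]" :: Maybe [UTCTime])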

I have an attoparsec-based parser adapted from postgresql-simple that's pretty well tested. I'd be happy to send a PR for relaxing the UTCTime parser rules if you agree with the general idea of allowing some leeway in timestamp syntax.

I am aware of #109. Also, I do not propose any change to the UTCTime encoding format.

@nurpax
Contributor Author

nurpax commented Apr 28, 2014

It's of course possible to work around this in client code with something like this:

import Control.Applicative ((<$>))
import Data.Aeson
import Data.Time.Clock (UTCTime)
import Data.Time.Format (parseTime)
import System.Locale (defaultTimeLocale)  -- from old-locale (time < 1.5)

data Foo = Foo UTCTime

-- Try the lenient "%F %T" format first, then fall back to the
-- strict "%FT%T%QZ" format that aeson itself uses.
kludgeParseTime :: String -> UTCTime
kludgeParseTime t =
  case parseTime defaultTimeLocale "%F %T" t of
    Just d  -> d
    Nothing ->
      case parseTime defaultTimeLocale "%FT%T%QZ" t of
        Just d  -> d
        Nothing -> error "could not parse date"

instance FromJSON Foo where
  parseJSON (Object v) = Foo <$> (kludgeParseTime <$> v .: "timestamp")
  parseJSON _          = fail "expected an object"

@cmsaperstein

+1. This would be really convenient!

nurpax added a commit to nurpax/aeson that referenced this issue May 8, 2014
It's quite common to encounter timestamp strings of the format:

2014-04-28 13:13:13.112233
2014-04-28 13:13:13

(I.e., without the 'T' in the middle and without the trailing 'Z')

An example source of these strings is SQLite:

sqlite> select datetime('now');
2014-04-28 12:35:12

Rather than fail to parse these strings, allow them in JSON -> UTCTime
conversion.

See haskell#197 for more examples and a test case that reproduces the failure.
@ghost

ghost commented May 25, 2014

At the very least, all ISO 8601 formats should be accepted, for instance a time zone indicator without seconds or fractions of a second.

@hvr
Member

hvr commented May 25, 2014

Unfortunately, it's quite common to see some variations of this syntax where the middle 'T' and the trailing 'Z' have been dropped.

Personally, I'm strongly -1 on dropping the Z part, as that's been a major source of annoying timezone bugs. If the data source left off the Z, it was very often a sign that someone didn't pay attention to timezone handling and passed a local time (lacking any explicit TZ info) where a UTC time was expected.

@nurpax
Contributor Author

nurpax commented May 25, 2014

@hvr has a point. For example, my Python example above uses local time, not UTC. Perhaps it'd be better not to support this by default but to add a passage to the docs on how to deal with non-standard UTC strings.

@rbtcollins

Launchpad's API outputs timestamps like this: 2014-04-03T15:58:03.572018+00:00, i.e., UTC but with an explicit offset rather than Z. These are still UTC times, just with an explicit zero offset, so they should be safe to handle.

@rbtcollins

FWIW the format I needed is handled by the ZonedTime type (Data.Time.ZonedTime rather than Clock.UTCTime in my data structure). Perhaps this is now just a docs issue?
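
For illustration, a small hypothetical snippet (not from this thread): the Stamp wrapper and the "timestamp" key are made up, but a ZonedTime field picks up the explicit offset directly via aeson's FromJSON ZonedTime instance:

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<$>))
import Data.Aeson
import Data.Time (ZonedTime)

newtype Stamp = Stamp ZonedTime deriving Show

instance FromJSON Stamp where
  parseJSON (Object v) = Stamp <$> v .: "timestamp"
  parseJSON _          = fail "expected an object"

main :: IO ()
main = print (decode "{\"timestamp\":\"2014-04-03T15:58:03.572018+00:00\"}" :: Maybe Stamp)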

@bos
Collaborator

bos commented Jul 11, 2015

@nurpax I'd be quite interested in that pull request, especially if it's faster than the current horrible ZonedTime mess.

@nurpax
Contributor Author

nurpax commented Jul 12, 2015

@bos Sure. I put together a UTCTime parser with attoparsec. It beats the performance of the existing UTCTime parser by 6x (30 usec vs 5 usec per datetime string).

The new UTCTime parser still accepts many inputs that the earlier parser didn't. I'll clean that up so that the UTCTime parser only accepts what it used to (parseTime defaultTimeLocale "%FT%T%QZ").

If the tests for ZonedTime are comprehensive, I guess it's possible to turn that into an attoparsec-based parser too. It looks like a lot more work though. I'll put together a PR for UTCTime first and maybe follow up with ZonedTime later.
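
For a sense of what that looks like, here's a rough sketch of an attoparsec-based parser for the strict shape (this is not the PR's code; it assumes Data.Attoparsec.Text and only handles a trailing 'Z', whereas the real parser also needs to deal with numeric offsets):

import Control.Applicative ((<|>), (<*))
import Data.Attoparsec.Text
import Data.Time (UTCTime(..), fromGregorian)

utcTimeP :: Parser UTCTime
utcTimeP = do
  y  <- decimal <* char '-'
  mo <- decimal <* char '-'
  d  <- decimal
  _  <- char 'T' <|> char ' '   -- accept either date/time separator
  h  <- decimal <* char ':'
  mi <- decimal <* char ':'
  s  <- rational                -- seconds, possibly fractional
  _  <- char 'Z'                -- insist on an explicit UTC marker
  let day  = fromGregorian y mo d
      secs = fromIntegral (h * 3600 + mi * 60 :: Int) + realToFrac (s :: Double)
  return (UTCTime day secs)

-- e.g. parseOnly utcTimeP "2014-04-28T13:13:13.112233Z"

Presumably most of the speedup comes from parsing Text directly instead of going through format strings and String.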

@hvr
Member

hvr commented Jul 12, 2015

@nurpax I think it's OK if it accepts more inputs than the current parser does (which ought to be clearly documented), as long as it doesn't accept inputs lacking any timezone indicator.

nurpax added a commit to nurpax/aeson that referenced this issue Jul 12, 2015
Replace the old 'parseTime'-based UTCTime parser with an attoparsec-based
parser.  This is roughly 5x faster than the old one.

It also allows decoding some timestamp formats that the old version
rejected.  The new version still requires all timestamps to specify a
timezone when decoding to a UTCTime type.

Also add a few unit tests both for what should be accepted and for
what should be rejected.

A lot of this code was bummed from the postgresql-simple library -
thus credits to @lpsmith too!
@nurpax
Contributor Author

nurpax commented Jul 12, 2015

@hvr, @bos, I think the referenced PR captures the requirements expressed in this thread. I didn't touch ZonedTime because the new UTCTime parser should accept the commonly occurring ISO 8601 formats while still always requiring a timezone qualifier to be present. It departs from the original in that previously only UTC (i.e., Zulu timezone) timestamps were accepted when decoding to the UTCTime type. Now timestamps that specify a timezone offset are implicitly converted to UTCTime by adjusting for that offset.
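
To illustrate that last point with a hypothetical example: with the relaxed parser, a timestamp carrying a +02:00 offset should decode to the corresponding UTC instant rather than being rejected.

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (decode)
import Data.Time (UTCTime)

main :: IO ()
main =
  -- Expected: Just [2015-07-12 10:00:00 UTC], i.e. the +02:00 offset applied
  print (decode "[\"2015-07-12T12:00:00+02:00\"]" :: Maybe [UTCTime])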

@bos
Collaborator

bos commented Jul 13, 2015

@nurpax Thanks for putting in the work to improve this. I'll review the pull request as soon as I can.

bos pushed a commit that referenced this issue Jul 18, 2015
@bos
Collaborator

bos commented Jul 21, 2015

I believe this should now be all fixed up.
