Accurate handling of parsing errors #91

vrutsky · 2014-01-28T10:46:35Z

Arrow can silently fail to parse a complete date string and return an invalid result when the string only partially matches one of its formats.

Consider parsing an SQLite date-time string:

>>> arrow.get('2014-01-25 01:22:58')
<Arrow [2014-01-25T00:00:00+00:00]>

Note that the time part of the date was not parsed and was silently lost.

Or consider parsing a string that is not a date-time at all:

>>> arrow.get("Happy 2014!")
<Arrow [2014-01-01T00:00:00+00:00]>

sochoa · 2014-02-08T20:36:57Z

The challenge with this issue is that, based on how the parser works, it's supposed to behave this way. It found a year, and returned a date/time that has the right year and everything else zeroed out. The issue here is that the input needs to be sanitized, and that (IMO) would be the job of the application prior to calling arrow.get().

I would suggest closing this issue as a non-issue.

rutsky · 2014-02-10T20:34:47Z

I don't think it's supposed to behave this way.

The arrow documentation says that in this context it tries to parse an ISO-8601-formatted str. The parsing method in the source code is called parse_iso.

momentjs (on which the arrow interface is based) parses, in the corresponding case, either a browser-dependent date string or an ISO-8601 date string. If momentjs fails to parse a date, it returns a special "invalid date" value.

As I see it, ISO-8601 describes a wide range of date string formats that can be interpreted unambiguously. Here is the list of ISO-8601 formats supported by momentjs:

"2013-02-08"
"2013-02-08T09"
"2013-02-08 09"
"2013-02-08T09:30"
"2013-02-08 09:30"
"2013-02-08T09:30:26"
"2013-02-08 09:30:26"
"2013-02-08T09:30:26.123"
"2013-02-08 09:30:26.123"
"2013-02-08T09:30:26 Z"
"2013-02-08 09:30:26 Z"
"2013-W06-5"
"2013-W06-5T09"
"2013-W06-5 09"
"2013-W06-5T09:30"
"2013-W06-5 09:30"
"2013-W06-5T09:30:26"
"2013-W06-5 09:30:26"
"2013-W06-5T09:30:26.123"
"2013-W06-5 09:30:26.123"
"2013-W06-5T09:30:26 Z"
"2013-W06-5 09:30:26 Z"
"2013-039"
"2013-039T09"
"2013-039 09"
"2013-039T09:30"
"2013-039 09:30"
"2013-039T09:30:26"
"2013-039 09:30:26"
"2013-039T09:30:26.123"
"2013-039 09:30:26.123"
"2013-039T09:30:26 Z"
"2013-039 09:30:26 Z"

In my opinion arrow.get should work like momentjs's moment() function and strictly parse ISO-8601-formatted dates (in addition to parsing timestamps, tzinfo and the other fairly strict formats that it already supports, of course).
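
To illustrate what "strictly parse" could mean here, below is a minimal sketch that uses only the Python standard library, not arrow itself. The helper name parse_iso_strict is hypothetical, and treating naive input as UTC is an assumption based on the +00:00 offsets shown in the examples above. datetime.fromisoformat accepts only a subset of ISO-8601, and that subset depends on the Python version, so this shows the fail-fast behaviour rather than full coverage of the list above.

```python
from datetime import datetime, timezone


def parse_iso_strict(text: str) -> datetime:
    """Parse an ISO-8601-style string or raise ValueError; never guess.

    Hypothetical helper for illustration only, not part of arrow.
    """
    # fromisoformat rejects input it cannot consume completely.
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: treat naive input as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


if __name__ == "__main__":
    print(parse_iso_strict("2014-01-25 01:22:58"))
    for bad in ("Happy 2014!", "02/01/2004", "blabla102015"):
        try:
            parse_iso_strict(bad)
        except ValueError as exc:
            print(f"rejected {bad!r}: {exc}")
```

With this approach a partially matching string is an error instead of a date with the unmatched parts zeroed out.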

honzajavorek · 2014-03-30T12:54:33Z

There could be a way to switch the parser to a strict mode, if one needs it. Sometimes it's useful to fail fast and be strict about inputs. Also, I think ParserError should be a subclass of ValueError.
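
A rough sketch of how such an opt-in strict mode might look, assuming a much simplified regex-based parser (this is not arrow's actual implementation). The function parse_date and its strict flag are hypothetical, and the ParserError class below is a stand-in for arrow's own exception, declared as a ValueError subclass as suggested.

```python
import re
from datetime import datetime


class ParserError(ValueError):
    """Stand-in error type; subclassing ValueError lets `except ValueError` catch it."""


# Simplified pattern: a 4-digit year, optionally followed by -MM and -DD.
_DATE = re.compile(r"(?P<year>\d{4})(?:-(?P<month>\d{2})(?:-(?P<day>\d{2}))?)?")


def parse_date(text: str, strict: bool = False) -> datetime:
    """Lenient mode searches anywhere in the string; strict mode must match all of it."""
    match = _DATE.fullmatch(text) if strict else _DATE.search(text)
    if match is None:
        raise ParserError(f"could not parse {text!r}")
    year = int(match["year"])
    month = int(match["month"] or 1)  # missing parts default to 1
    day = int(match["day"] or 1)
    try:
        return datetime(year, month, day)
    except ValueError as exc:  # e.g. month 13
        raise ParserError(f"invalid date in {text!r}") from exc


if __name__ == "__main__":
    print(parse_date("Happy 2014!"))  # lenient: picks up the year only
    try:
        parse_date("Happy 2014!", strict=True)
    except ValueError as exc:  # ParserError is caught here too
        print("strict mode rejected it:", exc)
```

Keeping strict=False as the default preserves the lenient behaviour for existing callers, while new code can opt in to failing fast.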

keynmol · 2015-05-19T10:27:37Z

Hi, is there any movement on this? Or a workaround? We are parsing OG tags on pages and some of them have some really malformed shit there. And Arrow is not helping:

In [52]: arrow_get('blabla102015').isoformat()
Out[52]: '1020-01-01T00:00:00+00:00'
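
One possible workaround until the parser itself is stricter is to validate the string against an explicit whitelist of formats before any lenient parsing happens. The sketch below uses only the standard library; parse_known and KNOWN_FORMATS are hypothetical names, and the formats listed are assumptions that would need to match the real data sources.

```python
from datetime import datetime
from typing import Optional

# Hypothetical whitelist; list only the formats your sources really use.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
)


def parse_known(text: str) -> Optional[datetime]:
    """Return a datetime if the whole string matches a known format, else None."""
    candidate = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            # strptime raises ValueError on unmatched or leftover input.
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    for raw in ("2015-05-19 10:27:37", "blabla102015", "Happy 2014!"):
        print(repr(raw), "->", parse_known(raw))
```

Malformed values come back as None and can be skipped or logged. Note that strptime tolerates fields without zero padding, so a string like '2016-1-10' would still be accepted by the '%Y-%m-%d' format.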

jacobsvante · 2015-09-10T11:22:10Z

Really annoying that it parses so relaxedly. Almost worse than php date parsing.

laruellef · 2016-04-01T22:31:02Z

Ditto,
I would support bombing on incomplete parsing such as:
arrow.get("02/01/2004")
Arrow [2004-01-01T00:00:00+00:00]

I deal with various data sources and date formats on a daily basis and arrow has been very valuable in handling parsing automagically.
However, after months of use, I was unaware that the use case above wasn't working until now.
This has resulted in service-affecting problems...

It would also be useful to document the complete list of supported formats.
Is that available anywhere?

andrewelkins · 2016-04-02T01:51:13Z

@laruellef I don't think it's documented anywhere.

@rutsky I agree, but in order not to break backwards compatibility it would need to be a flag that has to be set.

andrewelkins · 2016-12-31T22:55:58Z

Related to #292, #267 and #399

I'd like to handle two situations which might be fixed with the same code:

10/10/2016 # Non-ISO format, which currently returns 2016-01-01 - at minimum it should return an error
2016-1-10 # Non-padded month, which currently returns 2016-01-01 - at minimum it should return an error

Marco-Sulla · 2017-01-01T18:44:01Z

I don't think JS and JS libraries are good examples of robust design in general. moment.js is a good library, but good Python practice favors the use of exceptions (or at least returning None, if you want a C-style speedup in your code).

andrewelkins · 2017-01-01T22:39:53Z

@MarcoSulla Agreed, it's just a basis, but that doesn't mean Arrow has to implement it verbatim.

systemcatch · 2019-02-15T00:05:45Z

Here is the current situation for all these examples:

>>> arrow.get('2014-01-25 01:22:58')
<Arrow [2014-01-25T01:22:58+00:00]>
>>> arrow.get('blabla102015').isoformat()
'1020-01-01T00:00:00+00:00'
>>> arrow.get("Happy 2014!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chris/arrow/arrow/api.py", line 22, in get
    return _factory.get(*args, **kwargs)
  File "/home/chris/arrow/arrow/factory.py", line 174, in get
    dt = parser.DateTimeParser(locale).parse_iso(arg)
  File "/home/chris/arrow/arrow/parser.py", line 119, in parse_iso
    return self._parse_multiformat(string, formats)
  File "/home/chris/arrow/arrow/parser.py", line 286, in _parse_multiformat
    raise ParserError('Could not match input to any of {} on \'{}\''.format(formats, string))
arrow.parser.ParserError: Could not match input to any of ['YYYY-MM-DD HH:mm'] on 'Happy 2014!'
>>> arrow.get("02/01/2004")
<Arrow [2004-01-01T00:00:00+00:00]>
>>> arrow.get("10/10/2016")
<Arrow [2016-01-01T00:00:00+00:00]>
>>> arrow.get("2016-1-10")
<Arrow [2016-01-01T00:00:00+00:00]>

crsmithdev added bug labels Jul 28, 2014

This was referenced Dec 31, 2016

incorrect date without leading zero in .get('2016-1-17') #292

Closed

arrow.get("invalid date") does not raise an error #267

Closed

Validater for if arrow.get() properly parsed the date. #399

Closed

Confused parsing #268

Closed

systemcatch mentioned this issue Nov 29, 2018

Allow fetching of historical data for CL-SIC electricitymaps/electricitymaps-contrib#1692

Merged

systemcatch mentioned this issue Dec 19, 2018

Address Pendulum Retort #428

Closed

jadchaar mentioned this issue Jul 14, 2019

v0.15.0 Changes❗ #612

Closed

jadchaar mentioned this issue Sep 8, 2019

v0.15.0: More robust parsing and support for ordinal dates and 8601 basic format #655

Merged

jadchaar closed this as completed in #655 Sep 8, 2019
