Parse python logger timestamps #106

Merged
merged 1 commit into from Aug 24, 2015

@ghost commented Aug 13, 2015

By default, the Python logging module uses a format that includes %(asctime)s. The Python docs define this variable as:

Human-readable time when the LogRecord was created. By default this is of the form ‘2003-07-08 16:49:45,896’ (the numbers after the comma are millisecond portion of the time).

Example: 2003-09-25 10:49:41,502
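As a quick illustration of where these strings come from, the following sketch asks the standard library's `logging.Formatter` for its default `asctime` rendering and checks it against the shape described above. The `LOGGER_TIMESTAMP` pattern is an illustrative name, not something defined by `logging` or dateutil.

```python
import logging
import re

# Build a throwaway LogRecord; only its creation time matters here.
record = logging.makeLogRecord({"msg": "example"})

# With no datefmt given, Formatter.formatTime() produces the default
# "YYYY-MM-DD HH:MM:SS,mmm" form, with a comma before the milliseconds.
asctime = logging.Formatter().formatTime(record)
print(asctime)

# Illustrative pattern (an assumption of this sketch) for that default form.
LOGGER_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
print(bool(LOGGER_TIMESTAMP.fullmatch(asctime)))
```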

Currently, dateutil fails to parse a string like this:

>>> from dateutil.parser import parse
>>> parse("2003-09-25 10:49:41,502")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/dateutil/parser.py", line 1008, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/dateutil/parser.py", line 395, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format

This patch enables dateutil to parse these timestamps, which are extremely common in Python log files.

>>> from dateutil.parser import parse
>>> parse("2003-09-25 10:49:41,502")
datetime.datetime(2003, 9, 25, 10, 49, 41, 502000)
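For comparison, when the format is known in advance, the same string can be parsed without dateutil using `datetime.strptime` from the standard library. This is only a sketch of an alternative for fixed-format input, not what the patch does; `LOGGER_FORMAT` is an illustrative name.

```python
from datetime import datetime

# Explicit format for the default logging timestamp; %f accepts the
# three millisecond digits and scales them to microseconds.
LOGGER_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

parsed = datetime.strptime("2003-09-25 10:49:41,502", LOGGER_FORMAT)
print(repr(parsed))
```

Unlike `dateutil.parser.parse`, this approach rejects any string that does not match the format exactly.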

@pganssle
Member

Thanks @ryanss. I haven't given this a thorough look-over yet, but I believe this solves Issue #28. I remember when I looked at this some months ago I found it a bit difficult to handle all the edge cases, so I'll have to see what I was thinking about at the time.

@pganssle pganssle added this to the 2.5.0 milestone Aug 14, 2015
@pganssle
Member

@ryanss I think my previous concern was that there could be some weird edge cases where this approach is problematic. Since there isn't actually a spec saying which formats (and which variations on them) we support, it's hard to say whether this could break any of them.

That said, I think the best thing to do is to take our unit tests as a de facto spec for what formats are supported, and since this doesn't break the unit tests, it's a 👍 from me.

I did make a PR (ryanss/dateutil#1) against your branch to pre-compile the regular expression. Since this tokenizer is called in the innermost loop of the parser, I think it's best to move the overhead of creating the regular expression into a static member. Let me know if you disagree.
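A minimal sketch of the idea, assuming a hypothetical tokenizer helper: the class name `DecimalTokenizer` and its methods are illustrative and are not taken from dateutil's source. The point is only that the pattern is compiled once, when the class body runs, instead of on every call.

```python
import re


class DecimalTokenizer:
    """Hypothetical helper; names here are illustrative, not dateutil's."""

    # Compiled once, when the class body is executed, and shared by
    # every instance and every call.
    _split_decimal = re.compile(r"([.,])")

    def split(self, token):
        # The capture group keeps the separator in the result list;
        # empty strings from leading/trailing separators are dropped.
        return [part for part in self._split_decimal.split(token) if part]

    def seconds(self, token):
        # Treat "41,502" like "41.502"; return None for anything else.
        parts = self.split(token)
        if len(parts) == 3 and parts[0].isdigit() and parts[2].isdigit():
            return float(parts[0] + "." + parts[2])
        return None


tokenizer = DecimalTokenizer()
print(tokenizer.split("41,502"))
print(tokenizer.seconds("41,502"))
```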

ex. 2003-09-25 10:49:41,502
@ghost
Author

ghost commented Aug 24, 2015

Completely agree, this is much more efficient. I've updated my pull request to reflect the compiled regular expression changes.

@pganssle
Member

Looks good, merging.

@pganssle pganssle closed this Aug 24, 2015
@pganssle pganssle reopened this Aug 24, 2015
pganssle added a commit that referenced this pull request Aug 24, 2015
Allow comma separator for fractional seconds in parser. (Fixes #28 and lp:974463)
@pganssle pganssle merged commit ca8e98b into dateutil:master Aug 24, 2015
aschatten pushed a commit to aschatten/dateutil that referenced this pull request Nov 16, 2015
Allow comma separator for fractional seconds in parser. (Fixes dateutil#28 and lp:974463)
@pganssle pganssle mentioned this pull request Feb 25, 2016
@wimglenn

@pganssle

It seems to ignore the year in the string and use the current year instead (or the year of default, if that was specified):

>>> import dateutil.parser
>>> dateutil.parser.parse("May 20,2019")
datetime.datetime(2020, 5, 20, 0, 0)

I git-bisected it to this logger change, specifically bc69c3f.

In python-dateutil<2.5.0 the result was parsed correctly.

Any way to make dateutil throw an exception here instead?
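One way to get an exception for unexpected input, sketched here with the standard library rather than dateutil, is to parse against an explicit format. The helper name `parse_month_day_year` is hypothetical, and `%b` assumes an English (C) locale for month names.

```python
from datetime import datetime


def parse_month_day_year(text):
    """Hypothetical strict parser for strings such as "May 20,2019".

    Raises ValueError if the text does not match the format exactly,
    so a missing or misread year cannot be silently defaulted.
    """
    return datetime.strptime(text, "%b %d,%Y")


print(repr(parse_month_day_year("May 20,2019")))

try:
    parse_month_day_year("May 20")
except ValueError as exc:
    print("rejected:", exc)
```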
