Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support syslog datetime format in parseDateTimeBestEffort() #45306

Closed
qoega opened this issue Jan 16, 2023 · 0 comments · Fixed by #50925
Closed

Support syslog datetime format in parseDateTimeBestEffort() #45306

qoega opened this issue Jan 16, 2023 · 0 comments · Fixed by #50925
Labels

Comments

@qoega
Copy link
Member

qoega commented Jan 16, 2023

Use case

Syslog format is quite common and all the common linux logs use this format as they are written by syslog (/var/log/auth.log, /var/log/syslog, /var/log/kern.log, etc.)

Currently we parse it completely wrong

SELECT parseDateTimeBestEffort('Jan 10 06:07:06')

┌─parseDateTimeBestEffort('Jan 10 06:07:06')─┐
│                        2000-01-10 06:07:06 │
└────────────────────────────────────────────┘

If we append current year it is parsed correctly, but it is inconvenient and there is a high risk that user will not process Dec->Jan correctly and will get incorrect year.

SELECT parseDateTimeBestEffort(concat(CAST(toYear(today()), 'String'), ' Jan 10 06:07:06'))

┌─parseDateTimeBestEffort(concat(CAST(toYear(today()), 'String'), ' Jan 10 06:07:06'))─┐
│                                                                  2023-01-10 06:07:06 │
└──────────────────────────────────────────────────────────────────────────────────────┘

Additional context

It is important that you may parse December logs in January so we should have some safeguard that year is detected as a year corresponding to month in previous 11 months(or less) and not just current year.

Syslog format is
https://www.rfc-editor.org/rfc/rfc3164
https://www.rfc-editor.org/rfc/rfc5424

The TIMESTAMP field is the local time and is in the format of "Mmm dd
   hh:mm:ss" (without the quote marks) where:

         Mmm is the English language abbreviation for the month of the
         year with the first character in uppercase and the other two
         characters in lowercase.  The following are the only acceptable
         values:

         Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec

         dd is the day of the month.  If the day of the month is less
         than 10, then it MUST be represented as a space and then the
         number.  For example, the 7th day of August would be
         represented as "Aug  7", with two spaces between the "g" and
         the "7".

         hh:mm:ss is the local time.  The hour (hh) is represented in a
         24-hour format.  Valid entries are between 00 and 23,
         inclusive.  The minute (mm) and second (ss) entries are between
         00 and 59 inclusive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant