
Presets

sat edited this page Jun 26, 2026 · 2 revisions


Presets are ready-made LogParsers for common formats. Call one to start parsing right away, and read its definition as a worked example of the Header Rules and Statement Rules it is built from.

from log2seq import init_parser, preset

parser = init_parser()                    # same as preset.default()
parser = preset.default()
parser = preset.apache_errorlog_parser()  # Apache error_log, 2.2 and 2.4 layouts

You can also reuse a preset's pieces: preset.default_header_parsers() and preset.default_statement_parser() return the two halves, so you can keep one and replace the other.

The default parser (init_parser / preset.default)

The default parser targets syslog and ISO-style lines. Its header stage consists of two rules tried in order; the first rule that matches wins:

  • Rule 1 — syslog. Digit("year", optional=True), MonthAbbreviation(), Digit("day"), Time(), Hostname("host"), Statement(). The year is optional; when the line has none, it comes from the current year.
  • Rule 2 — ISO date. Date(), Time(), Hostname("host"), Statement() (for 2024-03-05 06:07:08 host …).
p = preset.default()
p.process_line("Jun 30 11:11:11.012345+09:00 host app[7]: started ok")["timestamp"]
# datetime.datetime(2026, 6, 30, 11, 11, 11, 12345, tzinfo=+09:00)   (no year in the line, so the current year is used)

p.process_line("2024 Mar  5 06:07:08 host msg")["timestamp"]   # 2024-03-05 ... (rule 1, explicit year)
p.process_line("2024-03-05 06:07:08 host msg")["timestamp"]    # 2024-03-05 ... (rule 2)
p.process_line("Mar  5 06:07:08 host msg")["timestamp"]        # <current year>-03-05 ... (rule 1, no year)
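To make the first-match-wins behaviour and the optional year concrete, here is a standalone sketch in plain Python that imitates the two header rules with regular expressions. It does not use log2seq at all: RULES, parse_header and the now_year parameter are hypothetical names, and the patterns are simplified assumptions (for example, only a numeric +HH:MM or -HH:MM offset is recognised as a timezone).

```python
import datetime
import re

# Hypothetical stand-ins for the default header rules; this is NOT log2seq's API.
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Clock time with optional fraction and optional numeric offset, e.g. 11:11:11.012345+09:00
TIME = r"(?P<time>\d{2}:\d{2}:\d{2}(?:\.\d+)?)(?P<tz>[+-]\d{2}:\d{2})?"
TAIL = r"\s+(?P<host>\S+)\s+(?P<statement>.*)"

# Rule 1 (syslog): optional year, month abbreviation, day, time, host, statement.
SYSLOG = re.compile(
    r"(?:(?P<year>\d{4})\s+)?(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+" + TIME + TAIL)
# Rule 2 (ISO date): YYYY-MM-DD, time, host, statement.
ISO = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\s+" + TIME + TAIL)

# Each rule pairs a regex with a function that turns its "month" group into 1..12 (or None).
RULES = [
    (SYSLOG, lambda m: MONTHS.get(m.group("month"))),
    (ISO, lambda m: int(m.group("month"))),
]


def _timestamp(m, month, now_year):
    """Build a datetime from a match; a missing year falls back to now_year."""
    year = int(m.group("year")) if m.group("year") else now_year
    hh, mm, rest = m.group("time").split(":")
    ss, _, frac = rest.partition(".")
    micro = int(frac[:6].ljust(6, "0")) if frac else 0
    tz = None
    if m.group("tz"):
        sign = -1 if m.group("tz")[0] == "-" else 1
        oh, om = m.group("tz")[1:].split(":")
        tz = datetime.timezone(sign * datetime.timedelta(hours=int(oh), minutes=int(om)))
    return datetime.datetime(year, month, int(m.group("day")),
                             int(hh), int(mm), int(ss), micro, tzinfo=tz)


def parse_header(line, now_year=None):
    """Try the rules in order and return the fields of the first one that matches."""
    if now_year is None:
        now_year = datetime.date.today().year
    for regex, month_of in RULES:
        m = regex.fullmatch(line)
        if m and month_of(m):
            return {"timestamp": _timestamp(m, month_of(m), now_year),
                    "host": m.group("host"),
                    "statement": m.group("statement")}
    return None  # no rule matched
```

Because the syslog rule is tried first, a line such as 2024-03-05 06:07:08 host msg reaches the ISO rule only after rule 1 has failed to match it.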

The default parser's statement stage is a four-step pipeline (preset.default_statement_parser):

  1. Split on space and on the standard symbols ()[]{}|+',=><;\# and " (double quote), but not on the colon (:).
  2. FixIP() — protect IPv4/IPv6 addresses (including network addresses).
  3. Fix([pattern_time, pattern_macaddr]) — protect clock times and MAC addresses.
  4. Split(":") — split the remaining :.

Deferring the : split to the end is exactly the ordering lesson from Statement Rules: IPv6 addresses, 12:34:56 times and MAC addresses are fixed first, so the final : split cannot break them. You can watch this pipeline rewrite a line step by step with process_line(..., verbose=True) — see Watching the pipeline in Statement Rules.
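The ordering argument can be checked with a small, self-contained imitation of the four steps. This is a sketch under stated assumptions, not log2seq's implementation: split, fix, is_ip, is_time_or_mac, pipeline and colon_first are hypothetical helpers, TIME_RE and MAC_RE are simplified guesses at what pattern_time and pattern_macaddr cover, separators are dropped rather than kept as symbols, and the symbol set is copied from step 1.

```python
import ipaddress
import re

# Hypothetical imitation of the default statement pipeline; NOT log2seq's classes.
# Symbol set copied from step 1 (plus space); ":" is deliberately left out.
SYMBOLS = "\"()[]{}|+',=><;\\# "
# Simplified guesses at what pattern_time / pattern_macaddr match.
TIME_RE = re.compile(r"\d{1,2}:\d{2}:\d{2}(?:\.\d+)?")
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def split(tokens, chars):
    """Split every unfixed token on any character in `chars`; fixed tokens pass through."""
    pattern = "[" + re.escape(chars) + "]+"
    out = []
    for text, fixed in tokens:
        if fixed:
            out.append((text, True))
        else:
            out.extend((part, False) for part in re.split(pattern, text) if part)
    return out


def fix(tokens, predicate):
    """Mark tokens that satisfy `predicate` as fixed so later splits leave them alone."""
    return [(text, fixed or predicate(text)) for text, fixed in tokens]


def is_ip(text):
    """True for IPv4/IPv6 addresses and networks such as 10.0.0.0/8 or fe80::1."""
    try:
        ipaddress.ip_network(text, strict=False)
        return True
    except ValueError:
        return False


def is_time_or_mac(text):
    return bool(TIME_RE.fullmatch(text) or MAC_RE.fullmatch(text))


def pipeline(statement, first_split=SYMBOLS):
    tokens = [(statement, False)]
    tokens = split(tokens, first_split)   # 1. standard symbols and space
    tokens = fix(tokens, is_ip)           # 2. protect IP addresses
    tokens = fix(tokens, is_time_or_mac)  # 3. protect clock times and MAC addresses
    tokens = split(tokens, ":")           # 4. split whatever ':' is left
    return [text for text, _ in tokens]


def colon_first(statement):
    """Wrong order: ':' is split in step 1, before anything has been fixed."""
    return pipeline(statement, first_split=SYMBOLS + ":")
```

With the default ordering, "user:alice" is still split in step 4 while fe80::1, 12:34:56 and the MAC address survive intact; colon_first cuts all three into pieces before the Fix steps ever see them.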

The Apache error-log parser (preset.apache_errorlog_parser)

It handles both the Apache 2.2 and 2.4 error_log layouts. The module name is matched generically (not only core) and exposed as modulename.

ap = preset.apache_errorlog_parser()

ap.process_line("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] denied")
# severityname='error', host='127.0.0.1', message='denied'   (2.2: no module/pid)

ap.process_line("[Fri Sep 09 10:42:29.902022 2011] [core:error] "
                "[pid 35:tid 9] [client 1.2.3.4] no file")
# modulename='core', severityname='error', processid=35, threadid=9,
# host='1.2.3.4', message='no file'

ap.process_line("[Mon Dec 05 08:10:12.123456 2016] [authz_core:error] "
                "[pid 1:tid 2] AH01630 denied")
# modulename='authz_core', message='AH01630 denied'   (no client -> host omitted)

The [client <ip>] field is optional; when it is absent the host key is simply omitted (an absent optional item leaves no key — see Header Rules).
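To show how one pattern can cover both layouts, the following sketch approximates them with a single regular expression in which the module prefix, the pid/tid block and the client block are all optional. It is an illustration under assumptions, not the preset's actual rule set: APACHE_ERR and parse_apache_error are hypothetical names, the timestamp formats assume the default C (English) locale, and a client field that carries a port would be returned unchanged in host.

```python
import datetime
import re

# Hypothetical regex approximation of the 2.2 and 2.4 error_log layouts;
# NOT the header rules used by preset.apache_errorlog_parser().
APACHE_ERR = re.compile(
    r"\[(?P<date>[^\]]+)\]\s+"                                  # [Wed Oct 11 14:32:52 2000]
    r"\[(?:(?P<modulename>[^\]:]+):)?(?P<severityname>[^\]:]+)\]\s+"  # [error] or [core:error]
    r"(?:\[pid (?P<processid>\d+)(?::tid (?P<threadid>\d+))?\]\s+)?"  # optional [pid N:tid M]
    r"(?:\[client (?P<host>[^\]\s]+)\]\s+)?"                    # optional [client <ip>]
    r"(?P<message>.*)"
)


def _parse_date(text):
    """2.4 timestamps carry microseconds, 2.2 timestamps do not."""
    for fmt in ("%a %b %d %H:%M:%S.%f %Y", "%a %b %d %H:%M:%S %Y"):
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def parse_apache_error(line):
    m = APACHE_ERR.fullmatch(line)
    if m is None:
        return None
    # Drop groups that did not take part: an absent optional field leaves no key.
    fields = {k: v for k, v in m.groupdict().items() if v is not None}
    fields["timestamp"] = _parse_date(fields.pop("date"))
    for key in ("processid", "threadid"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields
```

The dictionary comprehension that discards None groups mirrors the behaviour described above: a missing optional field produces no key at all, rather than a key set to None.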

More worked examples

The repository's example/loghub_*/parser.py files are fuller, real-world parsers for the loghub datasets (Apache, Linux, Mac, Thunderbird, Windows, …). They are the best reference for the patterns covered in Practical Patterns: separator vs full_format, anchoring free-form fields, and modelling tag-less lines with a second rule.

