Presets
Presets are ready-made LogParsers for common formats. Call one and start
parsing — and read it as a worked example of the Header Rules and
Statement Rules it is built from.
from log2seq import init_parser, preset
parser = init_parser() # same as preset.default()
parser = preset.default()
parser = preset.apache_errorlog_parser()

You can also reuse a preset's pieces: preset.default_header_parsers() and
preset.default_statement_parser() return the two halves so you can keep one and
replace the other.
The default parser targets syslog and ISO-style lines. Its header stage is two rules tried in order (first match wins):
- Rule 1 (syslog): Digit("year", optional=True), MonthAbbreviation(), Digit("day"), Time(), Hostname("host"), Statement(). The year is optional; when the line has none, it comes from the current year.
- Rule 2 (ISO date): Date(), Time(), Hostname("host"), Statement() (for 2024-03-05 06:07:08 host …).
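The "first match wins" ordering and the current-year fallback can be illustrated outside log2seq. The following is a minimal standard-library sketch, not the library's implementation: the regular expressions, the `parse_header` function and its `now` parameter are all hypothetical names made up for this illustration, and the time pattern only covers plain HH:MM:SS.

```python
import re
from datetime import datetime

# Month abbreviations mapped to month numbers (Jan=1 ... Dec=12).
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Hypothetical rule 1: syslog style, with an optional leading year.
SYSLOG = re.compile(
    r"^(?:(?P<year>\d{4})\s+)?(?P<mon>" + "|".join(MONTHS) + r")\s+(?P<day>\d{1,2})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\s+(?P<host>\S+)\s+(?P<msg>.*)$")

# Hypothetical rule 2: ISO style date.
ISO = re.compile(
    r"^(?P<year>\d{4})-(?P<mon>\d{2})-(?P<day>\d{2})\s+"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\s+(?P<host>\S+)\s+(?P<msg>.*)$")


def parse_header(line, now=None):
    """Try each rule in order and return the result of the first that matches."""
    now = now or datetime.now()
    for name, rule in (("syslog", SYSLOG), ("iso", ISO)):
        match = rule.match(line)
        if match is None:
            continue  # this rule does not apply, fall through to the next one
        g = match.groupdict()
        # An absent optional year falls back to the current year.
        year = int(g["year"]) if g["year"] else now.year
        month = MONTHS[g["mon"]] if name == "syslog" else int(g["mon"])
        timestamp = datetime(year, month, int(g["day"]),
                             int(g["h"]), int(g["m"]), int(g["s"]))
        return {"rule": name, "timestamp": timestamp,
                "host": g["host"], "message": g["msg"]}
    return None  # no rule matched
```

Passing `now` explicitly makes the year fallback deterministic, which is convenient when replaying old logs or writing tests.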
p = preset.default()
p.process_line("Jun 30 11:11:11.012345+09:00 host app[7]: started ok")["timestamp"]
# datetime.datetime(2026, 6, 30, 11, 11, 11, 12345, tzinfo=+09:00)
p.process_line("2024 Mar 5 06:07:08 host msg")["timestamp"] # 2024-03-05 ... (rule 1, explicit year)
p.process_line("2024-03-05 06:07:08 host msg")["timestamp"] # 2024-03-05 ... (rule 2)
p.process_line("Mar 5 06:07:08 host msg")["timestamp"] # current-year-03-05 (rule 1)

Its statement stage is a four-step pipeline (preset.default_statement_parser):
- Split on the standard symbols "()[]{}|+',=><;\# and space, but **not** ":".
- FixIP(): protect IPv4/IPv6 addresses (including network addresses).
- Fix([pattern_time, pattern_macaddr]): protect clock times and MAC addresses.
- Split(":"): split on the remaining ":".
Deferring the : split to the end is exactly the ordering lesson from
Statement Rules: IPv6 addresses, 12:34:56 times and MAC
addresses are fixed first, so the final : split cannot break them. You can
watch this pipeline rewrite a line step by step with process_line(..., verbose=True) — see Watching the pipeline in Statement Rules.
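The protect-then-split ordering can be sketched with the standard library alone. This is an illustration of the idea under stated assumptions, not log2seq's code: `SPLIT_CHARS` is a hypothetical symbol set modelled loosely on the list above, the `TIME` and `MAC` patterns are simplified, and `split_tokens`, `fix_tokens` and `tokenize` are names invented for this example.

```python
import ipaddress
import re

# Hypothetical first-pass separators: common symbols and space, but no ":".
SPLIT_CHARS = " ()[]{}|+',=><;#\""
# Simplified patterns for clock times and MAC addresses (assumptions).
TIME = re.compile(r"^\d{2}:\d{2}:\d{2}(\.\d+)?$")
MAC = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def is_ip(token):
    """True for IPv4/IPv6 addresses, with or without a /prefix."""
    try:
        ipaddress.ip_interface(token)
        return True
    except ValueError:
        return False


def split_tokens(tokens, chars):
    """Split every unprotected token on any of `chars`, dropping empty pieces."""
    pattern = re.compile("[" + re.escape(chars) + "]+")
    out = []
    for text, fixed in tokens:
        if fixed:
            out.append((text, True))  # protected tokens pass through untouched
        else:
            out.extend((piece, False) for piece in pattern.split(text) if piece)
    return out


def fix_tokens(tokens, predicate):
    """Mark tokens accepted by `predicate` as protected from later splits."""
    return [(text, fixed or predicate(text)) for text, fixed in tokens]


def tokenize(statement):
    tokens = [(statement, False)]
    tokens = split_tokens(tokens, SPLIT_CHARS)                 # step 1
    tokens = fix_tokens(tokens, is_ip)                         # step 2
    tokens = fix_tokens(
        tokens, lambda t: bool(TIME.match(t) or MAC.match(t)))  # step 3
    tokens = split_tokens(tokens, ":")                         # step 4
    return [text for text, _ in tokens]


words = tokenize("sshd[7]: from fe80::1 at 12:34:56 mac 00:1a:2b:3c:4d:5e port:22")
```

Had ":" been part of the first split, the IPv6 address, the time and the MAC address would each have been cut into fragments before any rule had a chance to recognise them.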
preset.apache_errorlog_parser() handles both the 2.2 and 2.4 error_log layouts. The module name is matched
generically (any module, not just core) and exposed as modulename.
ap = preset.apache_errorlog_parser()
ap.process_line("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] denied")
# severityname='error', host='127.0.0.1', message='denied' (2.2: no module/pid)
ap.process_line("[Fri Sep 09 10:42:29.902022 2011] [core:error] "
"[pid 35:tid 9] [client 1.2.3.4] no file")
# modulename='core', severityname='error', processid=35, threadid=9,
# host='1.2.3.4', message='no file'
ap.process_line("[Mon Dec 05 08:10:12.123456 2016] [authz_core:error] "
"[pid 1:tid 2] AH01630 denied")
# modulename='authz_core', message='AH01630 denied' (no client -> host omitted)

The [client <ip>] field is optional; when it is absent the host key is simply
omitted (an absent optional item leaves no key — see Header Rules).
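The "absent optional item leaves no key" behaviour can be mimicked with a plain regular expression. The sketch below is a hypothetical stand-in, not the preset's implementation: the `ERRORLOG` pattern and the `parse_errorlog` function are assumptions made for illustration, and unlike the preset output shown above it leaves every value as a string.

```python
import re

# Hypothetical pattern covering both layouts: the module name, the pid/tid
# block and the client block are each optional.
ERRORLOG = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"\[(?:(?P<modulename>\w+):)?(?P<severityname>\w+)\] "
    r"(?:\[pid (?P<processid>\d+)(?::tid (?P<threadid>\d+))?\] )?"
    r"(?:\[client (?P<host>[^\]]+)\] )?"
    r"(?P<message>.*)$")


def parse_errorlog(line):
    """Return a dict of the fields present in the line, or None if no match."""
    match = ERRORLOG.match(line)
    if match is None:
        return None
    # Groups that did not take part in the match are None; dropping them
    # means an absent optional field leaves no key at all.
    return {key: value for key, value in match.groupdict().items()
            if value is not None}


old_style = parse_errorlog(
    "[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] denied")
no_client = parse_errorlog(
    "[Mon Dec 05 08:10:12.123456 2016] [authz_core:error] "
    "[pid 1:tid 2] AH01630 denied")
```

Callers can then test `"host" in result` instead of comparing against None, which matches how the preset's output is consumed above.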
The repository's example/loghub_*/parser.py files contain fuller, real-world parsers for
the loghub datasets (Apache, Linux, Mac,
Thunderbird, Windows, …). They are the best reference for the patterns covered in
Practical Patterns: choosing between separator and full_format, anchoring
free-form fields, and modelling tag-less lines with a second rule.
- Building a Parser — reuse a preset, or replace one stage.
- Header Rules / Statement Rules — the parts.
- Practical Patterns — writing your own parser well.