Building a Parser
The whole job of log2seq is to turn a raw log line into structured fields. This page is the spine of the guide: it shows the shape of a parser, walks through assembling one from parts, and lists the three ways to drive it. The catalogs of parts live in Header Rules and Statement Rules; the internals are in Architecture Overview.
A LogParser runs every line through two stages:
- Header: split off the front matter (timestamp, host, …) and leave the free-format body as the message.
- Statement: tokenize that body into words and the symbols between them.
import log2seq
mes = ("Jan 1 12:34:56 host-device1 system[12345]: "
"host 2001:0db8:1234::1 (interface:eth0) disconnected")
parser = log2seq.init_parser() # the default parser
d = parser.process_line(mes)

d is a plain dict:
{
'timestamp': datetime.datetime(2026, 1, 1, 12, 34, 56),
'host': 'host-device1',
'message': 'system[12345]: host 2001:0db8:1234::1 (interface:eth0) disconnected',
'words': ['system', '12345', 'host', '2001:0db8:1234::1',
'interface', 'eth0', 'disconnected'],
'symbols': ['', '[', ']: ', ' ', ' (', ':', ') ', ''],
}

A few facts worth knowing up front (see Python API for the full contract):
- The header keys are not fixed: host is present because a default rule names an item host. Each item's value lands under its own name, so the available header keys depend on the rule.
- A syslog line carries no year, so the default parser fills in the current year. Provide one explicitly (a <year> item, or defaults={"year": ...}) when you need a fixed value.
- symbols is always one longer than words (len(symbols) == len(words) + 1): there is a separator before the first word and after the last; either end may be empty.
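
The words/symbols relationship can be checked directly on the result shown above. The sketch below uses only the literal values from that example (it does not import log2seq) and assumes that separators are kept verbatim. That holds for this line, but it may not hold for a statement parser that discards characters.

```python
from itertools import zip_longest

# Values copied from the example result above.
message = "system[12345]: host 2001:0db8:1234::1 (interface:eth0) disconnected"
words = ["system", "12345", "host", "2001:0db8:1234::1",
         "interface", "eth0", "disconnected"]
symbols = ["", "[", "]: ", " ", " (", ":", ") ", ""]

# One separator before each word, plus one after the last word.
assert len(symbols) == len(words) + 1

# Interleave symbols[0], words[0], symbols[1], ... and compare with the message.
# zip_longest pads the shorter list (words) so the trailing symbol is kept.
rebuilt = "".join(s + w for s, w in zip_longest(symbols, words, fillvalue=""))
assert rebuilt == message
```

Under that assumption, words and symbols together carry the whole message, so a line can be reassembled after individual words have been replaced or masked.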
A header rule is a list of Items; a statement parser is a list of
Actions. You build each stage, then bind them with LogParser.
from log2seq import LogParser
from log2seq.header import (HeaderParser, MonthAbbreviation, Digit, Time,
Hostname, UserItem, Statement)
from log2seq.statement import StatementParser, Split, FixIP
# Stage 1: a header rule, placed with full_format (fixed "[pid]: " delimiter)
header_rule = [
MonthAbbreviation(), Digit("day"), Time(), Hostname("host"),
UserItem("program", r"[a-zA-Z0-9._-]+"), Digit("pid", optional=True),
Statement(),
]
hp = HeaderParser(header_rule,
full_format=r"<0> <1> <2> <3> <4>(\[<5>\])?: <6>",
defaults={"year": 2024})
# Stage 2: split on spaces, keep IP addresses whole, then split on ":"
sp = StatementParser([Split(" "), FixIP(), Split(":")])
parser = LogParser(hp, sp)
d = parser.process_line("Aug 9 11:22:33 web01 nginx[4521]: connect from 10.0.0.5:443 ok")

d['timestamp'] # datetime.datetime(2024, 8, 9, 11, 22, 33)
d['host'] # 'web01'
d['program'] # 'nginx'
d['pid'] # 4521
d['words'] # ['connect', 'from', '10.0.0.5', '443', 'ok']
d['symbols'] # ['', ' ', ' ', ':', ' ', '']

Two ideas in that example carry most of the power of log2seq:
- Placement. full_format pins the literal [, ] and : so they are not mistaken for content; the alternative, separator=..., is simpler when fields are just whitespace-delimited. See Header Rules.
- Order matters in the statement stage. FixIP() runs before the ":" split, so 10.0.0.5 is marked as one word and the later split leaves it alone (without it, the address would break into 10, 0, 0, 5). See Statement Rules.
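
Why ordering matters is easier to see in a toy model. The sketch below is not log2seq's implementation: it is a minimal stand-in, assuming each action maps a list of (text, fixed) pairs to a new list and that a split skips parts already marked fixed. The helper names (split, fix_ip, words) are hypothetical, separators are dropped rather than recorded as symbols, and the input uses an IPv6 address because it contains the ":" separator itself.

```python
import ipaddress

def split(parts, sep):
    """Split every unfixed part on sep; fixed parts pass through untouched."""
    out = []
    for text, fixed in parts:
        if fixed:
            out.append((text, True))
        else:
            out.extend((piece, False) for piece in text.split(sep) if piece)
    return out

def fix_ip(parts):
    """Mark parts that parse as an IP address as fixed."""
    out = []
    for text, fixed in parts:
        try:
            ipaddress.ip_address(text)
            fixed = True
        except ValueError:
            pass  # not an address: keep the existing flag
        out.append((text, fixed))
    return out

def words(parts):
    """Drop the flags and return only the text of each part."""
    return [text for text, _ in parts]

line = [("host 2001:db8::1 down", False)]

# Fix first, then split on ":": the address survives as one word.
fixed_first = words(split(fix_ip(split(line, " ")), ":"))

# Split on ":" before fixing: the address is already in pieces.
split_first = words(fix_ip(split(split(line, " "), ":")))

print(fixed_first)  # ['host', '2001:db8::1', 'down']
print(split_first)  # ['host', '2001', 'db8', '1', 'down']
```

The same actions in a different order give different words, which is why a fixing step has to come before any split that could cut through what it protects.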
An optional item that does not match is simply absent from the result (the
key is omitted, not set to None), so "pid" in d tells you whether a pid was
present.
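
Because a missing optional item leaves no key behind, downstream code can branch on membership instead of comparing against None. A minimal sketch, using hand-written dicts as stand-ins for process_line() results; the function name program_tag and the second record are hypothetical.

```python
def program_tag(d):
    """Format 'program[pid]' when a pid was parsed, else just 'program'."""
    if "pid" in d:
        return f"{d['program']}[{d['pid']}]"
    return d["program"]

# Hand-written stand-ins for parser results (assumed shapes, not real output).
with_pid = {"program": "nginx", "pid": 4521}
without_pid = {"program": "cron"}  # optional pid did not match: key is absent

tags = [program_tag(with_pid), program_tag(without_pid)]
print(tags)  # ['nginx[4521]', 'cron']
```

Using d.get("pid") works as well when a default value is more convenient than a branch.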
There are three ways to drive a parser:
- In code: build a LogParser yourself, or start from init_parser() and replace only one stage. You can reuse a preset's statement parser while swapping the header rules, and vice versa:
    from log2seq import init_parser, preset
    parser = init_parser(header_parsers=[hp],
                         statement_parser=preset.default_statement_parser())
- A bundled preset: call a ready-made parser for a common format.
    from log2seq import preset
    parser = preset.default()                  # syslog / ISO-style date
    parser = preset.apache_errorlog_parser()   # Apache error_log
  See Presets.
- An external parser script + the CLI: put a LogParser in a .py file as a module-level variable named parser, then point the command line at it:
    # myparser.py
    from log2seq import LogParser
    from log2seq.header import *
    from log2seq.statement import *
    # ... build hp and sp ...
    parser = LogParser(hp, sp)

    $ log2seq --parser myparser.py app.log
    $ python -m log2seq -p myparser.py app.log
  The CLI imports the script with load_parser_script and runs every line through its parser. See Practical Patterns for using the CLI to debug a parser against sample data.
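
The "module-level variable named parser" convention is a common way to load user configuration from a Python file. The sketch below shows the general technique with the standard library's importlib. It is an illustration only, not the code behind load_parser_script; the function name load_parser_from_script and the stand-in script contents are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path

def load_parser_from_script(path):
    """Import a .py file by path and return its module-level `parser`."""
    spec = importlib.util.spec_from_file_location("user_parser_script", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    try:
        return module.parser
    except AttributeError:
        raise ValueError(f"{path} does not define a variable named 'parser'") from None

# Demo with a stand-in script; a real one would build a LogParser instead.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "myparser.py"
    script.write_text("parser = 'stand-in for LogParser(hp, sp)'\n")
    loaded = load_parser_from_script(script)

print(loaded)  # stand-in for LogParser(hp, sp)
```

Loading by file path executes the script's top-level code, so a parser script should only build objects and should not have side effects at import time.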
Related pages:
- Header Rules: the Item catalog and how to place them.
- Statement Rules: the Action catalog and the (part, flag) model that makes ordering matter.
- Presets: ready-made parsers, also readable as worked examples.
- Practical Patterns: authoring real parsers and debugging them with the CLI.