Header Rules

The header stage extracts the structured front matter of a line — at least a message, and usually a timestamp and host. You describe it with a HeaderParser: an ordered list of Items that log2seq compiles into one regular expression.

from log2seq.header import HeaderParser, MonthAbbreviation, Digit, Time, Hostname, Statement

rule = [MonthAbbreviation(), Digit("day"), Time(), Hostname("host"), Statement()]
hp = HeaderParser(rule, separator=" ", defaults={"year": 2024})
hp.process_line("Mar  5 06:07:08 db1 disk full")
# {'host': 'db1', 'message': 'disk full', 'timestamp': datetime.datetime(2024, 3, 5, 6, 7, 8)}

Exactly one Statement() is mandatory in every rule; it captures the body under message.
Each item's value lands in the result under its value name (see the catalog). Timestamp-related items are reassembled into a single timestamp.
Missing timestamp fields (a syslog line has no year) come from defaults.
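
To make "compiles into one regular expression" concrete, here is a minimal standard-library sketch of the idea. It is not log2seq's implementation: the (name, pattern) pairs, the simplified per-item patterns, and the compile_rule helper are all hypothetical stand-ins for Items.

```python
import re

# Hypothetical (name, pattern) pairs standing in for Items; not log2seq classes.
ITEMS = [
    ("month", r"[A-Z][a-z]{2}"),
    ("day", r"\d+"),
    ("time", r"\d{2}:\d{2}:\d{2}"),
    ("host", r"\S+"),
    ("message", r".*"),
]

def compile_rule(items, separator=" "):
    """Wrap each pattern in a named group and join them with separator runs."""
    sep = "[" + re.escape(separator) + "]+"          # one or more separator chars
    body = sep.join(f"(?P<{name}>{pattern})" for name, pattern in items)
    return re.compile("^" + body + "$")

RULE = compile_rule(ITEMS)
fields = RULE.match("Mar  5 06:07:08 db1 disk full").groupdict()
print(fields["host"], "|", fields["message"])        # db1 | disk full
```

The double space in "Mar  5" is absorbed because the separator is matched as a run, which is why the order of the items matters but the exact spacing does not.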

Placing items: `separator` vs `full_format`

A HeaderParser needs to know where one item ends and the next begins. Two ways:

separator (simple, recommended): a set of characters that separate items. separator=" :[]" means any run of spaces, colons, or square brackets divides the fields. Good when the layout is purely whitespace- or punctuation-delimited.
full_format: a template where <i> is replaced by item i's pattern (counting from 0) and everything else is literal. Use it to pin fixed delimiters so they are not read as content. Runs of spaces become \s+; wrap optional items by hand with (...)?.

from log2seq.header import UserItem   # not in the earlier import line

rule = [MonthAbbreviation(), Digit("day"), Time(), Hostname("host"),
        UserItem("comp", r".+?"), Digit("pid"), Statement()]
hp = HeaderParser(rule, full_format=r"<0> <1> <2> <3> <4>\[<5>\]: <6>",
                  defaults={"year": 2024})
hp.process_line("Mar  5 06:07:08 db1 sshd[42]: accepted")
# {'host': 'db1', 'comp': 'sshd', 'pid': 42, 'message': 'accepted',
#  'timestamp': datetime.datetime(2024, 3, 5, 6, 7, 8)}

Here the [, ] and : are literal in the template, so comp (.+?) and pid are cleanly delimited. With a plain separator that included []:, those brackets would be consumed as separators. Choosing between the two is the subject of Practical Patterns.
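
The template expansion described above can be sketched with the standard library alone. This is an illustration of the substitution rules stated in this section (<i> becomes item i's pattern, runs of spaces become \s+), not log2seq's own code; expand_full_format and the simplified patterns are hypothetical.

```python
import re

def expand_full_format(template, names, patterns):
    """Turn a full_format-style template into one anchored regex (sketch)."""
    # Runs of literal spaces in the template become \s+.
    regex = re.sub(r" +", lambda mo: r"\s+", template)
    # <i> becomes a named group holding item i's pattern.
    regex = re.sub(
        r"<(\d+)>",
        lambda mo: f"(?P<{names[int(mo.group(1))]}>{patterns[int(mo.group(1))]})",
        regex,
    )
    return re.compile("^" + regex + "$")

NAMES = ["month", "day", "time", "host", "comp", "pid", "message"]
PATTERNS = [r"[A-Z][a-z]{2}", r"\d+", r"\d{2}:\d{2}:\d{2}", r"\S+", r".+?", r"\d+", r".*"]

RULE = expand_full_format(r"<0> <1> <2> <3> <4>\[<5>\]: <6>", NAMES, PATTERNS)
found = RULE.match("Mar  5 06:07:08 db1 sshd[42]: accepted").groupdict()
print(found["comp"], found["pid"], found["message"])   # sshd 42 accepted
```

Because \[ and \]: stay literal in the compiled regex, the lazy .+? for comp stops exactly at the opening bracket.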

Several formats in one parser (first match wins)

A LogParser can hold a list of HeaderParsers, tried from the front; the first that matches is used. Put the more specific rule first.

from log2seq import LogParser
from log2seq.statement import StatementParser, Split
from log2seq.header import Date

iso = HeaderParser([Date(), Time(), Hostname("host"), Statement()], separator=" ")
parser = LogParser([hp, iso], StatementParser([Split(" ")]))   # hp from above, then iso

parser.process_line("Mar  5 06:07:08 db1 sshd[42]: accepted")["message"]   # 'accepted'  (rule 1)
parser.process_line("2024-03-05 06:07:08 db1 plain message")["timestamp"]  # 2024-03-05 06:07:08 (rule 2)

If no rule matches, process_line raises LogParseFailure (unless the LogParser was built with ignore_failure=True).
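
Why "put the more specific rule first" matters is easiest to see in a small standalone sketch. The code below only mimics first-match-wins dispatch with plain regexes; NoRuleMatched, parse, and the three patterns are hypothetical, and returning None when ignore_failure is set is an assumption about the behaviour, not something this page states.

```python
import re

class NoRuleMatched(Exception):
    """Hypothetical stand-in for a parse-failure error."""

SYSLOG = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<message>.*)$")
ISO = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<message>.*)$")
CATCH_ALL = re.compile(r"^(?P<message>.+)$")

def parse(line, rules, ignore_failure=False):
    """Try rules from the front; return (rule index, fields) for the first match."""
    for index, rule in enumerate(rules):
        mo = rule.match(line)
        if mo:
            return index, mo.groupdict()
    if ignore_failure:
        return None                      # assumed behaviour, see lead-in
    raise NoRuleMatched(line)

line = "2024-03-05 06:07:08 db1 plain message"
good_order = parse(line, [SYSLOG, ISO, CATCH_ALL])   # ISO (index 1) wins
bad_order = parse(line, [CATCH_ALL, SYSLOG, ISO])    # catch-all shadows the rest
print(good_order[0], bad_order[0])                   # 1 0
```

With the catch-all first, the line still parses, but the timestamp and host are never extracted, which is the failure mode the ordering advice guards against.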

The Item catalog

Items split into timestamp components (reassembled into timestamp) and plain fields.

Timestamp components

Item	matches	value (name)
`Date()`	`2024-03-05`	`datetime.date` (`date`)
`Time()`	`06:07:08`, `06:07:08.012345+09:00`	`datetime.time` (`time`)
`DatetimeISOFormat()`	`2024-03-05T06:07:08+09:00`	`datetime.datetime` (`timestamp`)
`MonthAbbreviation()`	`Jan`…`Dec`	month int (`month`)
`Digit("year"/"month"/"day"/"hour"/…)`	digits	int, under the given name
`YearWithoutCentury(century=20)`	`24`	year int (`year`) — `century*100 + nn`
`DateConcat(no_century=False, century=20)`	`20240305` / `240305`	`datetime.date` (`date`)
`TimeConcat()`	`060708`	`datetime.time` (`time`)
`DemicalSecond()`	fractional digits	microseconds int (`microsecond`)
`UnixTime(tz=timezone.utc)`	`1551024123`	`datetime.datetime` (`timestamp`)
`TimeZone()`	`Z`, `+0900`, `+09:00`	`datetime.tzinfo` (`tzinfo`)

from log2seq import header as h
def val(item, s):
    # test() matches one item in isolation; pick_value() converts the match
    return item.pick_value(item.test(s))

val(h.MonthAbbreviation(), "Mar")              # 3
val(h.YearWithoutCentury(), "24")              # 2024   (default century 20)
val(h.YearWithoutCentury(century=19), "98")    # 1998
val(h.UnixTime(), "1551024123")                # datetime(2019, 2, 24, 16, 2, 3, tzinfo=utc)

Notes on determinism (see also Practical Patterns):

YearWithoutCentury / DateConcat(no_century=True) complete the century from the century argument (default 20 = 2000-2099), not from the wall clock.
UnixTime resolves the epoch in UTC by default; pass tz= for another zone.
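
Both notes can be checked with the standard library alone. The sketch below reproduces the stated arithmetic (century*100 + nn) and the UTC epoch conversion; complete_year is a hypothetical helper, not a log2seq function.

```python
from datetime import datetime, timedelta, timezone

def complete_year(two_digits, century=20):
    """Century completion as described above: century*100 + nn."""
    return century * 100 + int(two_digits)

year_default = complete_year("24")              # 2024
year_1900s = complete_year("98", century=19)    # 1998

# The same epoch value rendered in UTC and in a fixed +09:00 zone.
utc = datetime.fromtimestamp(1551024123, tz=timezone.utc)
jst = datetime.fromtimestamp(1551024123, tz=timezone(timedelta(hours=9)))

print(year_default, year_1900s)   # 2024 1998
print(utc)                        # 2019-02-24 16:02:03+00:00
print(jst)                        # 2019-02-25 01:02:03+09:00
```

Neither result depends on the machine's clock or local timezone, so the same input always yields the same value.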

The reassembly works on value names: an item named year/month/day/hour/minute/second/microsecond/tzinfo feeds the timestamp; or use the aggregate items (Date → date, Time → time, DatetimeISOFormat/UnixTime → the whole timestamp). Supply any missing piece through defaults.
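
As a rough mental model of that reassembly, the sketch below merges defaults with extracted values and builds a datetime from them. It is not log2seq's code: assemble_timestamp is hypothetical, and the precedence it uses (a whole timestamp wins, extracted values override defaults) is an assumption.

```python
from datetime import datetime, time

def assemble_timestamp(values, defaults=None):
    """Hypothetical helper: merge defaults with parsed values, build a datetime."""
    merged = dict(defaults or {})
    merged.update(values)                 # parsed values override defaults
    if "timestamp" in merged:             # a whole-timestamp item wins outright
        return merged["timestamp"]
    if "date" in merged:                  # aggregate date -> year/month/day
        d = merged.pop("date")
        merged.update(year=d.year, month=d.month, day=d.day)
    if "time" in merged:                  # aggregate time -> hour/minute/...
        t = merged.pop("time")
        merged.update(hour=t.hour, minute=t.minute, second=t.second,
                      microsecond=t.microsecond, tzinfo=t.tzinfo)
    return datetime(
        merged["year"], merged["month"], merged["day"],
        merged.get("hour", 0), merged.get("minute", 0), merged.get("second", 0),
        merged.get("microsecond", 0), tzinfo=merged.get("tzinfo"),
    )

# A syslog-style line supplies month, day and time; the year comes from defaults.
parsed = {"month": 3, "day": 5, "time": time(6, 7, 8)}
stamp = assemble_timestamp(parsed, defaults={"year": 2024})
print(stamp)   # 2024-03-05 06:07:08
```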

Plain fields

Item	matches	notes
`Hostname("host")`	hostnames / IPv4 / IPv6	e.g. `2001:db8::1` ✓, but a token with a space ✗
`String("name", symbols="_-")`	`[A-Za-z0-9]+` plus any `symbols`	letters/digits + the extra chars
`Digit("name")`	`\d+`	returns an int
`UserItem("name", r"…", strip=None)`	your own regex	the most flexible item; `strip` trims the value
`ItemGroup([...], separator=…, optional=…)`	a sub-group	a cluster sharing a local separator, optionally absent
`Statement()`	the rest (`.*`)	the message body — exactly one per rule

val(h.String("s", symbols="_-"), "a_b-c")          # 'a_b-c'
val(h.UserItem("u", r".+", strip=" "), " x ")      # 'x'
bool(h.Hostname("h").test("2001:db8::1"))           # True

Item flags

optional=True: the item may be absent. An absent optional item is omitted from the result (the key is not added), so "pid" in d tells you whether it matched. Example: Digit("pid", optional=True).
dummy=True: match but extract nothing. Use it for a fixed marker, or to avoid a duplicate value name when the same field appears twice.
strip="…" (UserItem only): apply str.strip() with the given characters to the extracted value.

UserItem patterns must not contain ^, $, or optional groups (?); make a group optional through ItemGroup(optional=True) or the (...)? wrapper in full_format instead.
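
The "absent optional item is omitted" behaviour can be pictured with a plain regex. The sketch below is only an analogy built on the standard library: the pattern and the parse helper are hypothetical, and the optional part is written as a (...)? wrapper around the whole bracketed pid.

```python
import re

# The bracketed pid is wrapped in an optional non-capturing group.
RULE = re.compile(r"^(?P<comp>[A-Za-z0-9_-]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$")

def parse(line):
    """Return matched fields, dropping groups that did not take part in the match."""
    mo = RULE.match(line)
    result = {key: value for key, value in mo.groupdict().items() if value is not None}
    if "pid" in result:
        result["pid"] = int(result["pid"])   # convert only when present
    return result

with_pid = parse("sshd[42]: accepted")
without_pid = parse("cron: job done")
print(with_pid)             # {'comp': 'sshd', 'pid': 42, 'message': 'accepted'}
print("pid" in without_pid) # False
```

Omitting the key, rather than storing None, keeps a membership test such as "pid" in d meaningful.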

Writing a custom Item

Subclass Item (or NamedItem for a named one), give it a pattern, and — if the matched text needs converting — a pick_value:

from log2seq.header import NamedItem

class HexId(NamedItem):
    @property
    def pattern(self):
        return r"0x[0-9a-fA-F]+"

    def pick_value(self, mo):
        return int(mo[self.match_name], 16)   # mo[self.match_name] is the matched text

HexId("id").pick_value(HexId("id").test("0x1f"))   # 31

The contract: pattern returns the item's regex (no capture group — log2seq adds the named group itself); pick_value(mo) reads mo[self.match_name] and returns the final value (any type), or mo[self.match_name] unchanged if you don't override it. For a named item, the name doubles as both the regex group name and the result-dict key. Use Item.test(s) to probe one item in isolation (it compiles a throwaway anchored pattern — for debugging only).
