
Statement Rules

sat edited this page Jun 26, 2026 · 2 revisions


The statement stage takes the message body and tokenizes it into words and the symbols that separate them. You configure it with a StatementParser built from an ordered list of Actions, which are applied in sequence.

from log2seq.statement import StatementParser, Split, FixIP

sp = StatementParser([Split(" "), FixIP(), Split(":")])
words, symbols = sp.process_line("ping 2001:db8::1 from 10.0.0.5:80")

words     # ['ping', '2001:db8::1', 'from', '10.0.0.5', '80']
symbols   # ['', ' ', ' ', ' ', ':', '']

symbols is always one longer than words (len(symbols) == len(words) + 1): there is a separator before the first word and after the last. Either end may be the empty string.
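Because of that invariant, the two lists interleave cleanly. The helper below is hypothetical (it is not part of log2seq). It puts the example output back together and, for this particular input, recovers the original line exactly. This page does not claim that the round trip is exact for every rule set.

```python
def rebuild(words, symbols):
    """Interleave symbols and words: symbols[0] words[0] symbols[1] ... symbols[-1]."""
    if len(symbols) != len(words) + 1:
        raise ValueError("expected len(symbols) == len(words) + 1")
    out = [symbols[0]]
    for word, sep in zip(words, symbols[1:]):
        out.append(word)
        out.append(sep)
    return "".join(out)


# Output of the process_line example above.
words = ['ping', '2001:db8::1', 'from', '10.0.0.5', '80']
symbols = ['', ' ', ' ', ' ', ':', '']
print(rebuild(words, symbols))  # ping 2001:db8::1 from 10.0.0.5:80
```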

The (part, flag) model

Internally the statement is carried as a list of (substring, flag) tuples, and each Action rewrites that list. A flag is one of three states:

  • UNKNOWN — not yet decided; still a candidate for later actions to split or protect.
  • FIXED — confirmed as a single word; later actions leave it alone.
  • SEPARATORS — a delimiter; it becomes a symbol, never a word.

Every action's do(parts) takes the current sequence of parts and produces a new one, re-flagging substrings as it goes. A part that an earlier action has already marked FIXED or SEPARATORS is passed over by every later action, so order is significant, and it is the main lever you have.
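The model is small enough to mimic in plain Python. The sketch below is a toy, not log2seq's implementation: toy_split, toy_fix_ip, toy_process and the string flag values are all hypothetical, toy_fix_ip relies on the standard-library ipaddress module, and merging a run of consecutive separator characters into one symbol is an assumption. It reproduces the first example on this page and shows what happens when the fix step is dropped.

```python
import ipaddress
import re

# Toy flag values standing in for the three states above (not log2seq's constants).
UNKNOWN, FIXED, SEPARATORS = "UNKNOWN", "FIXED", "SEPARATORS"


def toy_split(chars):
    """Action factory: split UNKNOWN parts on runs of any character in `chars`."""
    pattern = re.compile("([" + re.escape(chars) + "]+)")

    def do(parts):
        for s, flag in parts:
            if flag != UNKNOWN:
                yield s, flag                      # FIXED / SEPARATORS pass through
                continue
            # With a capture group, re.split alternates text (even) and separator (odd).
            for i, piece in enumerate(pattern.split(s)):
                if piece:
                    yield piece, (SEPARATORS if i % 2 else UNKNOWN)
    return do


def toy_fix_ip():
    """Action factory: mark UNKNOWN parts that parse as an IP address as FIXED."""
    def do(parts):
        for s, flag in parts:
            if flag == UNKNOWN:
                try:
                    ipaddress.ip_address(s)
                    flag = FIXED
                except ValueError:
                    pass                           # not an address; leave UNKNOWN
            yield s, flag
    return do


def toy_process(actions, line):
    """Run the actions in order, then collect words and the symbols between them."""
    parts = [(line, UNKNOWN)]
    for action in actions:
        parts = action(parts)                      # chain the generators in order
    words, symbols, sep = [], [], ""
    for s, flag in parts:
        if flag == SEPARATORS:
            sep += s                               # accumulate the current separator
        else:
            symbols.append(sep)
            words.append(s)
            sep = ""
    symbols.append(sep)                            # trailing separator (may be "")
    return words, symbols


line = "ping 2001:db8::1 from 10.0.0.5:80"
print(toy_process([toy_split(" "), toy_fix_ip(), toy_split(":")], line))
# (['ping', '2001:db8::1', 'from', '10.0.0.5', '80'], ['', ' ', ' ', ' ', ':', ''])

# Same actions without the fix step: the IPv6 address is torn apart.
print(toy_process([toy_split(" "), toy_split(":")], "a 2001:db8::1"))
# (['a', '2001', 'db8', '1'], ['', ' ', ':', '::', ''])
```

Each toy action wraps the previous generator, so parts flow through the whole chain lazily, and a part flagged FIXED by an earlier step is never inspected by a later split.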

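# FixIP runs BEFORE the ":" split, so the IPv6 address is FIXED first and the
# later split leaves it whole. (The IPv4 address contains no ":", so only its
# ":80" port suffix is split off, with or without FixIP.)
StatementParser([Split(" "), FixIP(), Split(":")]).process_line("a 2001:db8::1 10.0.0.5:80")[0]
# ['a', '2001:db8::1', '10.0.0.5', '80']

# Without FixIP, the ":" split would break the IPv6 address apart at every ":".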

This is why the default parser defers its ":" split to the very end, after fixing IP addresses, clock times and MAC addresses (see Presets).

The Action catalog

| Action | What it does |
| --- | --- |
| Split("…") | split UNKNOWN parts on any of the given separator characters |
| Fix(pattern or [patterns]) | mark substrings matching a pattern as FIXED (one word) |
| FixIP() | mark IPv4 / IPv6 addresses as FIXED |
| FixParenthesis([open, close]) | mark a bracketed/quoted span (e.g. ["\"", "\""]) as FIXED |
| FixPartial(pattern, fix_groups=[…]) | within a match, FIX the named groups and split the rest |
| Remove(pattern or [patterns]) | mark matches as SEPARATORS (dropped from words) |
| RemovePartial(pattern, remove_groups=[…]) | within a match, drop the named groups, keep the rest |
| ConditionalSplit(pattern, separators) | split a part by separators, but only if it matches pattern |

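from log2seq.statement import (StatementParser, Split, Fix, Remove,
                               FixParenthesis, FixPartial, ConditionalSplit)

def run(rules, s):
    """Apply the actions in order and return only the words."""
    return StatementParser(rules).process_line(s)[0]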

run([Split(" "), Fix([r"\d+\.\d+"]), Split(".")], "v 1.2 build a.b")
# ['v', '1.2', 'build', 'a', 'b']         # 1.2 protected, a.b split

run([Split(" "), Remove(r"->")], "a -> b")
# ['a', 'b']                              # "->" becomes a separator

run([FixParenthesis(['"', '"']), Split(' ')], 'say "hello world" now')
# ['say', 'hello world', 'now']           # the quoted span stays whole

run([Split(" "),
     FixPartial(r'^(?P<ip>(\d{1,3}\.){3}\d{1,3})\.(?P<port>\d+)$', fix_groups=["ip", "port"]),
     Split(".")],
    "src 192.0.2.1.8080 ok")
# ['src', '192.0.2.1', '8080', 'ok']      # ip and port kept, the "." between them split

run([Split(" "), ConditionalSplit(r'^\w+=\w+$', '=')], "user=bob says hello")
# ['user', 'bob', 'says', 'hello']        # only the part matching key=value is split

Fix and Remove accept a single pattern or a list of patterns, while Split takes a string of separator characters. The Partial actions address sub-spans of a match through named groups: FixPartial keeps the listed groups whole, and RemovePartial drops them. ConditionalSplit is for tokens that need their own splitting only when they match a given shape (e.g. a Cisco-style %KERNEL-4-EVENT-7 mnemonic), leaving every other part alone.
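The named-group and conditional mechanics can be sketched without log2seq. The two functions below are hypothetical toys that operate on a single part. toy_fix_partial assumes the text outside the fixed groups is left UNKNOWN for later actions, which matches the FixPartial example above, where a following Split(".") removes the dot. toy_conditional_split assumes the pattern must match the whole part. The mnemonic regex is an illustrative guess at the Cisco-style shape, not something taken from log2seq.

```python
import re

# Toy flag values standing in for log2seq's private _FLAG_* constants.
UNKNOWN, FIXED = "UNKNOWN", "FIXED"


def toy_fix_partial(pattern, fix_groups, part):
    """FIX the named groups of a match and leave the text around them UNKNOWN."""
    m = re.search(pattern, part)
    if m is None:
        return [(part, UNKNOWN)]                    # no match: part is untouched
    spans = sorted(m.span(g) for g in fix_groups if m.group(g) is not None)
    out, pos = [], 0
    for start, end in spans:
        if start > pos:
            out.append((part[pos:start], UNKNOWN))  # gap, left for later actions
        out.append((part[start:end], FIXED))
        pos = end
    if pos < len(part):
        out.append((part[pos:], UNKNOWN))
    return out


def toy_conditional_split(pattern, separators, part):
    """Split `part` on any of `separators`, but only if the whole part matches."""
    if re.fullmatch(pattern, part) is None:
        return [part]
    return [p for p in re.split("[" + re.escape(separators) + "]", part) if p]


ip_port = r'^(?P<ip>(\d{1,3}\.){3}\d{1,3})\.(?P<port>\d+)$'
print(toy_fix_partial(ip_port, ["ip", "port"], "192.0.2.1.8080"))
# [('192.0.2.1', 'FIXED'), ('.', 'UNKNOWN'), ('8080', 'FIXED')]

mnemonic = r"%[A-Z]+-\d-[A-Z0-9-]+"   # assumed shape, for illustration only
print(toy_conditional_split(mnemonic, "-", "%KERNEL-4-EVENT-7"))
# ['%KERNEL', '4', 'EVENT', '7']
print(toy_conditional_split(mnemonic, "-", "user-name"))
# ['user-name']
```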

Writing a custom Action

An Action is any object with a do(parts) method, where parts is an iterable of (substring, flag) tuples and the result is an iterable of the same shape (the example below yields it lazily). Pass parts that are not UNKNOWN through unchanged, and re-flag the rest as needed:

from log2seq.statement import StatementParser, Split, _ActionBase, _FLAG_SEPARATORS

class DropExactly(_ActionBase):
    """Mark parts equal to a given token as separators (so they leave words)."""
    def __init__(self, token):
        self._token = token

    def do(self, iterable_parts):
        for s, flag in iterable_parts:
            # _is_active_part skips parts that are already FIXED or SEPARATORS.
            if self._is_active_part(s, flag) and s == self._token:
                yield s, _FLAG_SEPARATORS
            else:
                yield s, flag

StatementParser([Split(" "), DropExactly("--")]).process_line("a -- b")[0]
# ['a', 'b']

In practice the built-in catalog plus careful ordering covers most needs; subclass _ActionBase (and reuse _is_active_part and the _FLAG_* constants) only when a token needs handling the catalog can't express. The internals are documented in Architecture Overview.

