Release v0.4.4 · air-gapped/lessence

Highlights

Smarter Grouping

The grouping engine compared lines byte-by-byte by position, so a single token inserted at the front of an otherwise identical line — a node name, a severity prefix — cascaded into "no match" and split what should have been one group. Grouping is now token-based (longest common subsequence over the normalized line), so insertions no longer break folds. On a real 70k-line kubelet log the same input compresses to 380 lines instead of 941, with distinct messages preserved — for example, 32 near-identical "Operation for volume X failed" groups that differed only in volume IDs collapse into 10 honest ones. --threshold keeps its meaning: the percentage of tokens two lines must share. One consequence: distinct-but-similar messages that clear the bar now genuinely fold — raise --threshold (e.g. 85) if you want per-message granularity in dense groups.

Pipeline-Clean stdout

lessence app.log | grep ERROR used to receive the statistics footer mixed into the log output. The footer now goes to stderr — stdout carries only log lines, no -q required for piping. In the same spirit: a misspelled input file now exits 1 (like cat and grep) instead of silently succeeding, so scripts notice.

Fewer False Detections

Five detectors learned to leave ordinary text alone:

the word "request" in plain prose is no longer rewritten as request_id=<UUID>
parenthesized counts like (3) or (137) are no longer rewritten as PIDs — a process name must be attached, sshd(1234)-style
epoch timestamps and hex-looking words ("defaced") are no longer eaten as hashes
dotted code identifiers like hibernate.SQL or scope.go are no longer detected as hostnames — and real hostnames now get their own FQDN category instead of masquerading as IPv4 addresses
--disable-patterns brackets/json/key-value now actually disables every matching detector (two ran unconditionally before)

Accurate Statistics

--stats-json and the JSON summary lumped ports into "ips", JSON tokens into "paths", and six unrelated categories into "percentages". Every pattern category now has its own counter — the numbers finally mean what they say.

Bounded Work on Hostile Lines

A single long line of repeated key=value tokens triggered quadratic work in the key-value detector — a crafted 1 MB line could stall for minutes. The work is now linear: a 200 KB reproduction drops from 0.30s to 0.02s.

Ten of these eleven fixes came from a single fresh-eyes audit of the codebase by Claude Fable 5; the key=value stall had been flagged earlier by an automated threat-model/vuln-scan pass and was fixed in the same sweep.

0.4.4 (2026-06-09)

Bug Fixes

--disable-patterns brackets/json/key-value now disables all matching detectors (0b87792)
--stats-json and JSON summary report accurate per-category pattern counts (23aea53)
dotted code identifiers like hibernate.SQL are no longer detected as hostnames (13e530a)
epoch timestamps and hex-looking words are no longer detected as hashes (a585a63)
exit with code 1 when an input file cannot be opened (b10701f)
log lines containing the word "request" are no longer rewritten as request IDs (1446f48)
parenthesized counts like "(3)" are no longer rewritten as PIDs (a67adf6)
similarity grouping now tolerates inserted tokens instead of splitting groups (349493a)
statistics footer now goes to stderr, keeping stdout clean for pipelines (7893fcd)

Performance

flush remaining groups in O(n) instead of O(n^2) (2fcb135)
long key=value lines no longer stall the key-value detector (1485e11)

Full changelog: v0.4.3...v0.4.4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.4.4

Choose a tag to compare

Sorry, something went wrong.