Skip to content

v0.4.4

Choose a tag to compare

@air-gapped-release-bot air-gapped-release-bot released this 09 Jun 22:17

Highlights

Smarter Grouping

The grouping engine compared lines byte-by-byte by position, so a single token inserted at the front of an otherwise identical line — a node name, a severity prefix — cascaded into "no match" and split what should have been one group. Grouping is now token-based (longest common subsequence over the normalized line), so insertions no longer break folds. On a real 70k-line kubelet log the same input compresses to 380 lines instead of 941, with distinct messages preserved — for example, 32 near-identical "Operation for volume X failed" groups that differed only in volume IDs collapse into 10 honest ones. --threshold keeps its meaning: the percentage of tokens two lines must share. One consequence: distinct-but-similar messages that clear the bar now genuinely fold — raise --threshold (e.g. 85) if you want per-message granularity in dense groups.

Pipeline-Clean stdout

lessence app.log | grep ERROR used to receive the statistics footer mixed into the log output. The footer now goes to stderr — stdout carries only log lines, no -q required for piping. In the same spirit: a misspelled input file now exits 1 (like cat and grep) instead of silently succeeding, so scripts notice.

Fewer False Detections

Five detectors learned to leave ordinary text alone:

  • the word "request" in plain prose is no longer rewritten as request_id=<UUID>
  • parenthesized counts like (3) or (137) are no longer rewritten as PIDs — a process name must be attached, sshd(1234)-style
  • epoch timestamps and hex-looking words ("defaced") are no longer eaten as hashes
  • dotted code identifiers like hibernate.SQL or scope.go are no longer detected as hostnames — and real hostnames now get their own FQDN category instead of masquerading as IPv4 addresses
  • --disable-patterns brackets/json/key-value now actually disables every matching detector (two ran unconditionally before)

Accurate Statistics

--stats-json and the JSON summary lumped ports into "ips", JSON tokens into "paths", and six unrelated categories into "percentages". Every pattern category now has its own counter — the numbers finally mean what they say.

Bounded Work on Hostile Lines

A single long line of repeated key=value tokens triggered quadratic work in the key-value detector — a crafted 1 MB line could stall for minutes. The work is now linear: a 200 KB reproduction drops from 0.30s to 0.02s.

Ten of these eleven fixes came from a single fresh-eyes audit of the codebase by Claude Fable 5; the key=value stall had been flagged earlier by an automated threat-model/vuln-scan pass and was fixed in the same sweep.


0.4.4 (2026-06-09)

Bug Fixes

  • --disable-patterns brackets/json/key-value now disables all matching detectors (0b87792)
  • --stats-json and JSON summary report accurate per-category pattern counts (23aea53)
  • dotted code identifiers like hibernate.SQL are no longer detected as hostnames (13e530a)
  • epoch timestamps and hex-looking words are no longer detected as hashes (a585a63)
  • exit with code 1 when an input file cannot be opened (b10701f)
  • log lines containing the word "request" are no longer rewritten as request IDs (1446f48)
  • parenthesized counts like "(3)" are no longer rewritten as PIDs (a67adf6)
  • similarity grouping now tolerates inserted tokens instead of splitting groups (349493a)
  • statistics footer now goes to stderr, keeping stdout clean for pipelines (7893fcd)

Performance

  • flush remaining groups in O(n) instead of O(n^2) (2fcb135)
  • long key=value lines no longer stall the key-value detector (1485e11)

Full changelog: v0.4.3...v0.4.4