
v0.7.0 — column roles


copyleftdev released this 01 Jun 17:36 · commit 80eaeab

Attack noise at the detection layer — by knowing what each column is.

Added

  • Column roles. Every scanned column is classified into one of five roles (measurement, identifier, categorical, sequence, constant), and the full column-to-role map ships in the new roles array of the envelope. The point detector skips identifier and sequence columns, since a large process ID or a counter's endpoint is not an anomaly.
  • --no-column-roles disables role-based skipping; roles are still reported. The setting is part of the config_version fingerprint (cr=).
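As a rough illustration of the skip rule, here is a minimal Rust sketch. ColumnRole and runs_point_detection are hypothetical names, not the ax_core::roles API, and the boolean parameter merely stands in for the effect of --no-column-roles.

```rust
/// Hypothetical sketch: the five roles named in these notes.
/// This is NOT ax_core's actual type; the names are illustrative only.
#[allow(dead_code)] // Categorical and Constant are not constructed in this demo
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnRole {
    Measurement,
    Identifier,
    Categorical,
    Sequence,
    Constant,
}

/// Returns true if the point detector should scan a column with this role.
/// `roles_enabled = false` models `--no-column-roles` (assumption): no column
/// is skipped because of its role, although roles are still reported.
fn runs_point_detection(role: ColumnRole, roles_enabled: bool) -> bool {
    if !roles_enabled {
        return true;
    }
    // Per the release notes, only identifier and sequence columns are skipped.
    !matches!(role, ColumnRole::Identifier | ColumnRole::Sequence)
}

fn main() {
    let columns = [
        ("_PID", ColumnRole::Identifier),
        ("__SEQNUM", ColumnRole::Sequence),
        ("DAYS_LOST", ColumnRole::Measurement),
    ];
    for (name, role) in columns {
        println!(
            "{name}: {role:?}, scanned = {}, scanned with --no-column-roles = {}",
            runs_point_detection(role, true),
            runs_point_detection(role, false)
        );
    }
}
```

With roles enabled, _PID and __SEQNUM are skipped while DAYS_LOST is scanned; with the flag, all three are scanned.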

Impact

On a real 20k-entry journald capture, point findings drop from ~12,500 to ~240, because _PID, _UID, _GID, JOB_ID, TID, timestamps and __SEQNUM are now skipped. On a 127k-row parquet file, the heavily skewed DAYS_LOST column stays a measurement, so all 32,893 genuine findings are preserved. Noise eliminated, signal untouched.

Design — honoring "never guess"

Identifiers are recognized by name (*_id, uid, gid, pid, tid, session, uuid, …). The name is the only reliable signal, since a process-ID column is statistically indistinguishable from a discrete measurement. Cardinality is deliberately not used to classify a numeric column as categorical: a near-constant column with a few outliers has low cardinality, yet it is exactly what point detection should catch. The classification is heuristic but never silent: every role is reported in the envelope, and skipping is one flag away from off.
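A minimal sketch of name-first classification for a numeric column, under stated assumptions: the pattern list holds only the names given above (the real list continues past the "…"), leading underscores are stripped so journald fields such as _PID match, and "constant" is taken to mean every value is identical. Sequence detection is not shown because these notes do not describe it. NumericRole, looks_like_identifier and classify_numeric are hypothetical names.

```rust
/// Hypothetical subset of roles relevant to this sketch (not ax_core's type).
#[derive(Debug, PartialEq, Eq)]
enum NumericRole {
    Identifier,
    Constant,
    Measurement,
}

/// Name-based identifier check. Only the patterns listed in the notes are
/// included; stripping leading underscores is an assumption for journald fields.
fn looks_like_identifier(column_name: &str) -> bool {
    let name = column_name.trim_start_matches('_').to_ascii_lowercase();
    const EXACT: [&str; 6] = ["uid", "gid", "pid", "tid", "session", "uuid"];
    EXACT.contains(&name.as_str()) || name.ends_with("_id")
}

fn classify_numeric(column_name: &str, values: &[f64]) -> NumericRole {
    // The name decides first: a PID column looks statistically like a measurement.
    if looks_like_identifier(column_name) {
        return NumericRole::Identifier;
    }
    // Assumed meaning of "constant": every value identical.
    if !values.is_empty() && values.iter().all(|v| *v == values[0]) {
        return NumericRole::Constant;
    }
    // Cardinality is deliberately NOT consulted: a near-constant column with a
    // few outliers has few distinct values but must stay a measurement so
    // point detection can flag the outliers.
    NumericRole::Measurement
}

fn main() {
    let pids = [812.0, 1_433.0, 98_211.0];
    let near_constant = [0.0, 0.0, 0.0, 0.0, 250.0];
    println!("{:?}", classify_numeric("_PID", &pids)); // Identifier
    println!("{:?}", classify_numeric("JOB_ID", &pids)); // Identifier
    println!("{:?}", classify_numeric("DAYS_LOST", &near_constant)); // Measurement
}
```

The near_constant column has only two distinct values, yet it stays a measurement, which is the case the cardinality rule above is meant to protect.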

Contract

Additive: a new ax_core::roles module, plus a roles array in the envelope and its schema. PROTOCOL is unchanged.

Gate

proptest and cargo-mutants: 0 missed mutants across all changed files.

Install: cargo install anomalyx

Full changelog: v0.6.0...v0.7.0