v0.7.0 — column roles
Attack noise at the detection layer — by knowing what each column is.
Added
- Column roles. Every scanned column is classified into a role: measurement/identifier/categorical/sequence/constant. The full map ships in the envelope's new roles array. The point detector skips identifier and sequence columns (a large process-id or a counter's endpoint is not an anomaly). --no-column-roles disables role-based skipping (roles are still reported). Part of the config_version fingerprint (cr=).
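The skipping rule above can be sketched in a few lines of Rust. This is a minimal illustration, not the code in ax_core::roles: the `Role` enum, `point_detector_scans`, `columns_to_scan`, and the `roles_enabled` parameter (standing in for the absence of --no-column-roles) are all hypothetical names chosen for the example.

```rust
/// Hypothetical mirror of the five roles named in these notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Measurement,
    Identifier,
    Categorical,
    Sequence,
    Constant,
}

/// Returns true when the point detector should scan a column with this role.
/// `roles_enabled` is an assumed stand-in for "--no-column-roles was not passed".
pub fn point_detector_scans(role: Role, roles_enabled: bool) -> bool {
    if !roles_enabled {
        // Role-based skipping is off: every column is scanned.
        // Roles would still be reported; they are just not acted on.
        return true;
    }
    // Only identifier and sequence columns are skipped.
    !matches!(role, Role::Identifier | Role::Sequence)
}

/// Filters (name, role) pairs down to the columns the point detector would scan.
pub fn columns_to_scan<'a>(columns: &[(&'a str, Role)], roles_enabled: bool) -> Vec<&'a str> {
    columns
        .iter()
        .filter(|(_, role)| point_detector_scans(*role, roles_enabled))
        .map(|(name, _)| *name)
        .collect()
}
```

In this sketch, a column set of _PID (identifier), DAYS_LOST (measurement), and __SEQNUM (sequence) reduces to DAYS_LOST alone when skipping is enabled, and stays intact when it is disabled.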
Impact
On a real 20k journald capture, point findings drop from ~12,500 to ~240 (skipping _PID/_UID/_GID/JOB_ID/TID/timestamps/__SEQNUM). On a 127k-row parquet, the heavily-skewed DAYS_LOST stays measurement — all 32,893 genuine findings preserved. Noise eliminated, signal untouched.
Design — honoring "never guess"
Identifiers are recognized by name (*_id, uid, gid, pid, tid, session, uuid, …) — the only reliable signal, since a process-id column is statistically indistinguishable from a discrete measurement. Cardinality is deliberately not used to call a numeric column categorical (a near-constant column with a few outliers has low cardinality yet is exactly what point detection should catch). Heuristic, but never silent: every role is in the envelope, and the skipping is one flag from off.
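As a rough illustration of name-based recognition, the sketch below checks a column name against the patterns listed above. It is an assumption-laden example, not the shipped matcher: the function name `looks_like_identifier` is hypothetical, the name list covers only the entries spelled out in these notes (the real list continues past the "…"), and the case-insensitive comparison and stripping of leading underscores are guesses based on the journald fields mentioned under Impact (_PID, _UID, JOB_ID, TID).

```rust
/// Names spelled out in the release notes. The real list is longer;
/// this sketch does not try to guess the remaining entries.
const IDENTIFIER_NAMES: [&str; 6] = ["uid", "gid", "pid", "tid", "session", "uuid"];

/// Hypothetical name test: case-insensitive, leading underscores ignored.
/// Both normalizations are assumptions made for this example.
pub fn looks_like_identifier(column: &str) -> bool {
    let name = column.trim_start_matches('_').to_ascii_lowercase();
    // `*_id` pattern, or an exact match against the known names.
    name.ends_with("_id") || IDENTIFIER_NAMES.iter().any(|known| *known == name.as_str())
}
```

Under these assumptions, _PID and JOB_ID match while DAYS_LOST does not. Note that the check never looks at the column's values, which is the point of the design: the statistics of a process-id column alone would not separate it from a discrete measurement.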
Contract
Additive: new ax_core::roles module + roles array in the envelope and schema. PROTOCOL unchanged.
Gate
proptest passes; cargo-mutants reports 0 missed mutants across all changed files.
Install: cargo install anomalyx
Full changelog: v0.6.0...v0.7.0