Releases: copyleftdev/anomalyx
v1.1.2 — docs/examples; crates.io README wiring
No library or contract changes — the tq1 envelope, exit codes, and config_version are byte-identical to 1.1.1. A docs/examples release.
Examples
examples/synergy_market.py— anomalyx paired withagent-calc: anomalyx finds the anomalous days and the regime shift, the exact-math kernel proves what they mean (Gaussian tail probability, a two-sample t-test across the CUSUM break, exact correlations).examples/polymarket_anomalies.py— information shocks and odds regime shifts in a Polymarket prediction market.
Documentation
- README lists all four worked examples (stock, journal, polymarket, synergy).
- New mdbook page "Worked examples".
- The
anomalyxcrate now setsreadme = "../../README.md", so the crates.io page renders the project README (it had none before).
cargo install anomalyx · published to crates.io in dependency order.
v1.1.1 — journal example; timestamp columns skipped
Fixed
- Timestamp columns are recognized as sequences and skipped by the value detectors.
Role::Sequencerequired strict monotonicity, but real clock columns (journald__REALTIME_TIMESTAMP/__MONOTONIC_TIMESTAMP, a pcaptimestamp) tie/regress, so they were treated as measurements —coll.cusumflagged their "level shift" (time advancing) andpointtheir jumps. Atimestamp/tsname token now classifies a column assequence, kept narrow soresponse_time-style measurements are unaffected.cad.regularitystays role-agnostic, so--cadence timestampstill works. Noconfig_versionchange.
Examples
examples/journal_anomalies.py— find anomalies in the systemd journal (point/structural/collective within a window, e.g. per-unit CPU-usage spikes; or_SYSTEMD_UNIT/PRIORITYdistributional drift between windows via--baseline-since). Pipes journald JSON on stdin and maps findings back to timestamp/unit/message.examples/stock_anomalies.py— (from the previous drop) Yahoo-Finance stock anomalies / drift.
Gate
cargo-mutants 0 missed on roles.rs; CI green.
Install: cargo install anomalyx
Full changelog: v1.1.0...v1.1.1
v1.1.0 — column roles across all detectors
Changed
- Column roles now gate every value-distribution detector, not just
point.ctx.seasonal,coll.cusum,dist.ks/dist.psi/dist.chi2, andmv.mahalanobisnow skipidentifierandsequencecolumns (and exclude them from the Mahalanobis feature space). A seasonal subseries, level shift, drift test, or joint distance over arbitrary ids or a monotonic ramp is noise — closing the gap wherecoll.cusumstill flagged a syslogprocidshift.struct.schemastays role-agnostic;cad.regularityuses only the explicit--cadencecolumn. A sharedRole::skips_value_detection()centralizes the rule. - Output changes when
column_roles = true, so theconfig_versionfingerprint is bumped (anomalyx-cfg/9). Envelope shape andPROTOCOLunchanged;--no-column-rolesrestores pre-roles behavior everywhere.
Testing
- Scoped the parser-robustness fuzz harness's magic-prefixed test to formats whose decode allocation anomalyx bounds (
sqlite). Binary container decoders (parquet/arrow/avro/orc/evtx/pcap) trust internal length fields and can over-allocate on adversarial input — a dependency property, now documented rather than asserted (it had surfaced as an intermittent OOM).
Gate
proptest + cargo-mutants 0 missed across all six changed files; goldens re-blessed; CI green.
Install: cargo install anomalyx
Full changelog: v1.0.1...v1.1.0
v1.0.1 — real /var/log/syslog parses
Two dogfood-found fixes from running 1.0.0 on a real /var/log/syslog.
Fixed
- Syslog: the PRI-less file format now parses. rsyslog/syslog-ng write the file without the
<PRI>wire header (ISO-8601 or BSD timestamp, then host + tag). The sniff required<PRI>, so a real/var/log/syslogwas misdetected asiniand collapsed to one garbage row. Now recognized (timestamp + host + app);facility/severityappear only when a<PRI>is present. 50k real lines:ini/1 row →syslog/50k rows. procidis recognized as an identifier. The syslog process-id column was classed ameasurement, so PIDs were flagged as point outliers (~18.5k noise on the 50k syslog). It now joins the identifier name set → skipped → 1 finding.
Contract
Unchanged — tq1 / PROTOCOL / envelope shape stable (1.0.1 patch).
Gate
proptest + cargo-mutants 0 missed on both touched files.
Install: cargo install anomalyx
Full changelog: v1.0.0...v1.0.1
v1.0.0 — stable: the tq1 contract is committed
anomalyx is 1.0. Contract-first anomaly detection over arbitrary corpora — the executable is the contract, and that contract is now stable.
No code changes from 0.9.0; this commits the wire contract.
Stable as of 1.0
Breaking any of these requires a major bump + a PROTOCOL change — never quiet under 1.x:
- protocol id
anomalyx/tq1; exit codes0/1/2; - the dense finding-row layout
[detector, class, handle, confidence, severity, score, reason]+ dict string table; - handle forms
column:/cell:/row:/range:/dist:; - required envelope fields and the severity ladder.
What 1.0 includes
- ~30 input formats across tabular, columnar/data-lake/db, logs & observability, security telemetry, and network — one engine-independent record model.
- 9 detectors / 7-class taxonomy, all deterministic.
- FDR control (
--fdr), output scoping (--top/--min-severity, honest truncation), column roles (identifier/sequence skipping), unified confidence calibration (comparable severity across detectors), and--setto tune any config field — all fingerprinted intoconfig_version. - Determinism: same input + same
config_version⇒ byte-identical output, validated against NIST StRD and at 1,000,000 rows. - Gates: property tests +
cargo-mutants0-survivors, a fuzz-style parser-robustness harness, and golden-envelope snapshot tests guarding the contract.
Install: cargo install anomalyx
Full changelog: v0.9.0...v1.0.0
v0.9.0 — tunable detector config (--set)
Tune any detector threshold from the CLI — reproducibly.
Added
scan/explaingain--set KEY=VALUE(repeatable) — override anyDetectConfigfield by name (--set point_threshold=4.0,--set dist_alpha=0.01,--set column_roles=false, …). The settable keys and their defaults are exactly whatdescribe'sconfigobject reports. An unknown key, or a value that doesn't fit the field's type, is a hard error (exit2).
Overrides flow into config_version (e.g. pt=3.5000 → pt=2.0000), so a tuned run is exactly as reproducible and self-describing as a default one — tuning is never silent. The common knobs keep their dedicated flags (--fdr, --cad-max-cv, --period, --cadence); --set is the general escape hatch for the rest. Implemented as a JSON round-trip over the serialized config, so every field is settable with no per-field code. No envelope/PROTOCOL change.
Gate
proptest + cargo-mutants 0 missed on main.rs.
Install: cargo install anomalyx
Full changelog: v0.8.0...v0.9.0
v0.8.0 — unified confidence calibration
A 0.9 now means the same thing on every detector — so severity ranks across detectors.
Changed
- Unified confidence calibration. Confidence was computed three incompatible ways (
1−pfor distributional/multivariate; logistic-over-threshold for point/contextual/collective/PSI; linear for cadence). Now every detector routes through one sharedax_detect::calibrate: confidence is a logistic of how far the detector's statistic sits past its firing threshold, measured relatively so units cancel —0.5at the threshold, rising toward1.0. A finding "2× past threshold" earns the same confidence whether it came from a modified z-score, a KS p-value, a PSI, or a cadence CV. - This makes
severity(and--top/--min-severity) rank findings from different detectors on one scale — and replaces the old1−pthat saturated everything to "critical" with a real gradient.
Impact / contract
Recalibrates every published confidence and severity. config_version bumped (anomalyx-cfg/8) so the change is visible. Envelope shape and PROTOCOL unchanged. score remains the detector's raw statistic for drill-down.
Also
- Parser robustness harness (fuzz-style property tests: no parser panics/hangs on arbitrary/magic-prefixed/truncated bytes; normalize is deterministic over fuzz inputs).
Gate
proptest + cargo-mutants 0 missed on calibrate.rs (32/32) and every touched detector file.
Install: cargo install anomalyx
Full changelog: v0.7.0...v0.8.0
v0.7.0 — column roles
Attack noise at the detection layer — by knowing what each column is.
Added
- Column roles. Every scanned column is classified into a role —
measurement/identifier/categorical/sequence/constant— and the full map ships in the envelope's newrolesarray. The point detector skipsidentifierandsequencecolumns (a large process-id or a counter's endpoint is not an anomaly). --no-column-rolesdisables role-based skipping (roles still reported). Part of theconfig_versionfingerprint (cr=).
Impact
On a real 20k journald capture, point findings drop from ~12,500 to ~240 (skipping _PID/_UID/_GID/JOB_ID/TID/timestamps/__SEQNUM). On a 127k-row parquet, the heavily-skewed DAYS_LOST stays measurement — all 32,893 genuine findings preserved. Noise eliminated, signal untouched.
Design — honoring "never guess"
Identifiers are recognized by name (*_id, uid, gid, pid, tid, session, uuid, …) — the only reliable signal, since a process-id column is statistically indistinguishable from a discrete measurement. Cardinality is deliberately not used to call a numeric column categorical (a near-constant column with a few outliers has low cardinality yet is exactly what point detection should catch). Heuristic, but never silent: every role is in the envelope, and the skipping is one flag from off.
Contract
Additive: new ax_core::roles module + roles array in the envelope and schema. PROTOCOL unchanged.
Gate
proptest + cargo-mutants 0 missed across all changed files.
Install: cargo install anomalyx
Full changelog: v0.6.0...v0.7.0
v0.6.0 — output scoping (--top / --min-severity)
Make the findings list consumable — without ever hiding what was found.
Added
scan --top N— emit only the N most severe findings.scan --min-severity S— emit only findings ≥S(info/low/medium/high/critical).
On a real 127k-row parquet, --top 25 shrinks the envelope from ~3 MB to ~5.6 KB.
Honest truncation
summary (total, by_class, max_severity) and the exit code always describe everything detected, never the scoped view — so filtering can't make anomalies look absent or flip exit 1→0. When findings are withheld, the envelope gains a scope block recording the filter and detected / emitted / dropped counts; rows carries only the emitted subset. Absent when no scoping is applied (default output unchanged). A test pins that scoping to zero rows still exits 1.
The pairing
This is the volume complement to 0.5.0's --fdr (correctness): FDR makes findings statistically defensible; output scoping makes the list consumable. Compose them: --fdr 0.01 --min-severity high --top 25 = "the 25 most severe high+ findings, among the FDR-significant set."
Contract
Additive — new optional scope field + schema update; PROTOCOL unchanged. summary.total now reports the detected count (== emitted when unscoped).
Gate
proptest + cargo-mutants 0 missed on envelope.rs and main.rs+schema.rs.
Install: cargo install anomalyx
Full changelog: v0.5.0...v0.6.0
v0.5.0 — false-discovery-rate control (--fdr)
A principled significance basis for the point detector — a controlled error rate instead of a magic threshold.
Added
scan/explaingain--fdr Q— per-column Benjamini–Hochberg false-discovery-rate control forpoint.modz. Each cell's modified z-score becomes a two-sided p-value, and the fixedpoint_thresholdis replaced by a multiplicity-aware step-up cutoff bounding the expected proportion of false flags atQ(e.g.--fdr 0.05). Opt-in — omitted, behavior is unchanged. The level is part of theconfig_versionfingerprint (pfdr=).- New
ax_detect::fdrmodule:two_sided_p(normal tail viaerfc) +benjamini_hochberg(deterministic step-up).
Honest scope — correctness, not volume
FDR replaces an arbitrary cutoff with a principled error-rate guarantee and adapts to how many cells were tested (a noise column stops contributing chance flags; the same outlier can be significant in a small column yet not a large one). On genuinely heavy-tailed data it can flag more cells, not fewer — those cells really are significant at Q; the old fixed cutoff was stringent in an uncalibrated way. On the real 127k-row MSHA parquet: 32,893 → 40,079 point findings at q=0.05. Capping output volume is a separate lever (column scoping today; severity / top-N next) — they compose: "top-N by score, among the FDR-significant set."
Calibration
The p-value uses the consistent-σ standardized deviation (x − center)/scale (≈ N(0,1)), not the display-scaled modified z-score.
Gate
proptest + cargo-mutants 0 missed across all four changed files (BH step-up, the (x−center) sign, and multiplicity adaptation are all pinned).
Install: cargo install anomalyx
Full changelog: v0.4.1...v0.5.0