Releases: copyleftdev/anomalyx

v1.1.2 — docs/examples; crates.io README wiring

02 Jun 02:46
7ce61ce

No library or contract changes — the tq1 envelope, exit codes, and config_version are byte-identical to 1.1.1. A docs/examples release.

Examples

  • examples/synergy_market.py — anomalyx paired with agent-calc: anomalyx finds the anomalous days and the regime shift, the exact-math kernel proves what they mean (Gaussian tail probability, a two-sample t-test across the CUSUM break, exact correlations).
  • examples/polymarket_anomalies.py — information shocks and odds regime shifts in a Polymarket prediction market.

Documentation

  • README lists all four worked examples (stock, journal, polymarket, synergy).
  • New mdbook page "Worked examples".
  • The anomalyx crate now sets readme = "../../README.md", so the crates.io page renders the project README (it had none before).

cargo install anomalyx · published to crates.io in dependency order.

v1.1.1 — journal example; timestamp columns skipped

02 Jun 01:18
ff93a4f

Fixed

  • Timestamp columns are recognized as sequences and skipped by the value detectors. Role::Sequence required strict monotonicity, but real clock columns (journald __REALTIME_TIMESTAMP/__MONOTONIC_TIMESTAMP, a pcap timestamp) tie or regress, so they were treated as measurements: coll.cusum flagged their "level shift" (time advancing) and point flagged their jumps. A timestamp or ts name token now classifies a column as sequence; the match is kept narrow so response_time-style measurements are unaffected. cad.regularity stays role-agnostic, so --cadence timestamp still works. No config_version change.
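
The failure mode and the fix described above can be sketched in a few lines. This is an illustration only, not the code in roles.rs: the function names are hypothetical, and the tokenization rule (split on non-alphanumeric characters, compare lowercase tokens) is an assumption about how a "name token" might be matched.

```python
import re

def is_strictly_increasing(values):
    """The old Sequence test as described: every value must exceed the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))

def has_timestamp_token(column_name):
    """Hypothetical name-token check: split on non-alphanumerics, match whole tokens."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", column_name.lower()) if t]
    return any(t in {"timestamp", "ts"} for t in tokens)

# A clock column with one tie: strict monotonicity rejects it.
clock = [1000, 1001, 1001, 1003]
print(is_strictly_increasing(clock))               # False
print(has_timestamp_token("__REALTIME_TIMESTAMP")) # True
print(has_timestamp_token("response_time"))        # False
```

Matching whole tokens rather than substrings is what keeps a name like response_time out of the sequence role in this sketch.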

Examples

  • examples/journal_anomalies.py — find anomalies in the systemd journal (point/structural/collective within a window, e.g. per-unit CPU-usage spikes; or _SYSTEMD_UNIT/PRIORITY distributional drift between windows via --baseline-since). Pipes journald JSON on stdin and maps findings back to timestamp/unit/message.
  • examples/stock_anomalies.py — (from the previous drop) Yahoo-Finance stock anomalies / drift.

Gate

cargo-mutants 0 missed on roles.rs; CI green.

Install: cargo install anomalyx

Full changelog: v1.1.0...v1.1.1

v1.1.0 — column roles across all detectors

01 Jun 23:37
b173338

Changed

  • Column roles now gate every value-distribution detector, not just point. ctx.seasonal, coll.cusum, dist.ks / dist.psi / dist.chi2, and mv.mahalanobis now skip identifier and sequence columns (and exclude them from the Mahalanobis feature space). A seasonal subseries, level shift, drift test, or joint distance over arbitrary ids or a monotonic ramp is noise — closing the gap where coll.cusum still flagged a syslog procid shift. struct.schema stays role-agnostic; cad.regularity uses only the explicit --cadence column. A shared Role::skips_value_detection() centralizes the rule.
  • Output changes when column_roles = true, so the config_version fingerprint is bumped (anomalyx-cfg/9). Envelope shape and PROTOCOL unchanged; --no-column-roles restores pre-roles behavior everywhere.
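
A rough Python analogue of the shared rule, for readers who want to see the gating logic in one place. The five role names come from the 0.7.0 notes; the enum layout and the helper columns_for_value_detectors are hypothetical and do not mirror the Rust source.

```python
from enum import Enum

class Role(Enum):
    MEASUREMENT = "measurement"
    IDENTIFIER = "identifier"
    CATEGORICAL = "categorical"
    SEQUENCE = "sequence"
    CONSTANT = "constant"

    def skips_value_detection(self):
        # Identifier and sequence columns carry no value distribution worth testing.
        return self in (Role.IDENTIFIER, Role.SEQUENCE)

def columns_for_value_detectors(roles, column_roles=True):
    """Hypothetical helper: the columns a value-distribution detector would scan.

    With column_roles=False (the --no-column-roles behavior) nothing is skipped.
    """
    if not column_roles:
        return list(roles)
    return [name for name, role in roles.items() if not role.skips_value_detection()]

roles = {"procid": Role.IDENTIFIER, "seq": Role.SEQUENCE, "latency_ms": Role.MEASUREMENT}
print(columns_for_value_detectors(roles))         # ['latency_ms']
print(columns_for_value_detectors(roles, False))  # ['procid', 'seq', 'latency_ms']
```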

Testing

  • Scoped the parser-robustness fuzz harness's magic-prefixed test to formats whose decode allocation anomalyx bounds (sqlite). Binary container decoders (parquet/arrow/avro/orc/evtx/pcap) trust internal length fields and can over-allocate on adversarial input — a dependency property, now documented rather than asserted (it had surfaced as an intermittent OOM).

Gate

proptest + cargo-mutants 0 missed across all six changed files; goldens re-blessed; CI green.

Install: cargo install anomalyx

Full changelog: v1.0.1...v1.1.0

v1.0.1 — real /var/log/syslog parses

01 Jun 22:35
77e37bb

Two dogfood-found fixes from running 1.0.0 on a real /var/log/syslog.

Fixed

  • Syslog: the PRI-less file format now parses. rsyslog/syslog-ng write the file without the <PRI> wire header (ISO-8601 or BSD timestamp, then host + tag). The sniff required <PRI>, so a real /var/log/syslog was misdetected as ini and collapsed to one garbage row. Now recognized (timestamp + host + app); facility/severity appear only when a <PRI> is present. 50k real lines: ini/1 row → syslog/50k rows.
  • procid is recognized as an identifier. The syslog process-id column was classed as a measurement, so PIDs were flagged as point outliers (~18.5k noise findings on the 50k syslog). It now joins the identifier name set → skipped → 1 finding.

Contract

Unchanged — tq1 / PROTOCOL / envelope shape stable (1.0.1 patch).

Gate

proptest + cargo-mutants 0 missed on both touched files.

Install: cargo install anomalyx

Full changelog: v1.0.0...v1.0.1

v1.0.0 — stable: the tq1 contract is committed

01 Jun 21:54
67821d0

anomalyx is 1.0. Contract-first anomaly detection over arbitrary corpora — the executable is the contract, and that contract is now stable.

No code changes from 0.9.0; this commits the wire contract.

Stable as of 1.0

Breaking any of these requires a major bump + a PROTOCOL change, and never happens silently under 1.x:

  • protocol id anomalyx/tq1; exit codes 0/1/2;
  • the dense finding-row layout [detector, class, handle, confidence, severity, score, reason] + dict string table;
  • handle forms column: / cell: / row: / range: / dist:;
  • required envelope fields and the severity ladder.

What 1.0 includes

  • ~30 input formats across tabular, columnar/data-lake/db, logs & observability, security telemetry, and network — one engine-independent record model.
  • 9 detectors / 7-class taxonomy, all deterministic.
  • FDR control (--fdr), output scoping (--top/--min-severity, honest truncation), column roles (identifier/sequence skipping), unified confidence calibration (comparable severity across detectors), and --set to tune any config field — all fingerprinted into config_version.
  • Determinism: same input + same config_version ⇒ byte-identical output, validated against NIST StRD and at 1,000,000 rows.
  • Gates: property tests + cargo-mutants 0-survivors, a fuzz-style parser-robustness harness, and golden-envelope snapshot tests guarding the contract.

Install: cargo install anomalyx

Full changelog: v0.9.0...v1.0.0

v0.9.0 — tunable detector config (--set)

01 Jun 19:42
d4a12f2

Tune any detector threshold from the CLI — reproducibly.

Added

  • scan / explain gain --set KEY=VALUE (repeatable) — override any DetectConfig field by name (--set point_threshold=4.0, --set dist_alpha=0.01, --set column_roles=false, …). The settable keys and their defaults are exactly what describe's config object reports. An unknown key, or a value that doesn't fit the field's type, is a hard error (exit 2).

Overrides flow into config_version (e.g. pt=3.5000 → pt=2.0000), so a tuned run is exactly as reproducible and self-describing as a default one — tuning is never silent. The common knobs keep their dedicated flags (--fdr, --cad-max-cv, --period, --cadence); --set is the general escape hatch for the rest. Implemented as a JSON round-trip over the serialized config, so every field is settable with no per-field code. No envelope/PROTOCOL change.
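
The JSON round-trip idea can be sketched as follows. This is a minimal Python illustration of the technique, not the Rust implementation: the three field names appear in the notes above, but the defaults for dist_alpha and column_roles, the type-check rule, and the function apply_overrides are assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DetectConfig:
    # Field names from the release note; dist_alpha and column_roles defaults are assumed.
    point_threshold: float = 3.5
    dist_alpha: float = 0.05
    column_roles: bool = True

def _fits(current, value):
    """True if an override value is compatible with the field's existing type."""
    if isinstance(current, bool):
        return isinstance(value, bool)
    if isinstance(current, (int, float)):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, type(current))

def apply_overrides(config, overrides):
    """Serialize the config to JSON, patch named keys, and rebuild it."""
    doc = json.loads(json.dumps(asdict(config)))
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in doc:
            raise ValueError(f"unknown or malformed override: {item!r}")
        try:
            value = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"unparseable value in {item!r}") from exc
        if not _fits(doc[key], value):
            raise ValueError(f"value in {item!r} does not fit the field's type")
        # Keep numeric fields in their declared type (e.g. "4" becomes 4.0).
        doc[key] = value if isinstance(value, bool) else type(doc[key])(value)
    return DetectConfig(**doc)

tuned = apply_overrides(DetectConfig(), ["point_threshold=4.0", "column_roles=false"])
print(tuned)
```

Because the override loop works on the serialized document rather than on named struct fields, adding a field to the config makes it settable with no further code, which is the property the note describes. A real CLI would map the ValueError to exit code 2.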

Gate

proptest + cargo-mutants 0 missed on main.rs.

Install: cargo install anomalyx

Full changelog: v0.8.0...v0.9.0

v0.8.0 — unified confidence calibration

01 Jun 19:03
add599d

A 0.9 now means the same thing on every detector — so severity ranks across detectors.

Changed

  • Unified confidence calibration. Confidence was computed three incompatible ways (1−p for distributional/multivariate; logistic-over-threshold for point/contextual/collective/PSI; linear for cadence). Now every detector routes through one shared ax_detect::calibrate: confidence is a logistic of how far the detector's statistic sits past its firing threshold, measured relatively so units cancel — 0.5 at the threshold, rising toward 1.0. A finding "2× past threshold" earns the same confidence whether it came from a modified z-score, a KS p-value, a PSI, or a cadence CV.
  • This makes severity (and --top / --min-severity) rank findings from different detectors on one scale — and replaces the old 1−p that saturated everything to "critical" with a real gradient.
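
One way to picture the calibration: a logistic applied to the relative excess over the threshold. The sketch below is an assumption-laden illustration, not ax_detect::calibrate itself. The steepness constant is invented, the 0.25 PSI threshold is only an example value, and it assumes a "larger is more extreme" statistic (a p-value would first need to be mapped onto such a scale).

```python
import math

def calibrate(statistic, threshold, steepness=4.0):
    """Hypothetical calibration: logistic of (statistic - threshold) / threshold.

    Dividing by the threshold makes the excess unitless, so detectors with
    different statistics land on the same confidence scale.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    excess = (statistic - threshold) / threshold
    return 1.0 / (1.0 + math.exp(-steepness * excess))

print(calibrate(3.5, 3.5))    # 0.5 exactly at the firing threshold
print(calibrate(7.0, 3.5))    # a z-like statistic at 2x its threshold
print(calibrate(0.50, 0.25))  # a PSI-like statistic at 2x its threshold: same value
```

The last two calls return the same confidence because both statistics sit at twice their thresholds, which is the cross-detector comparability the release describes.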

Impact / contract

Recalibrates every published confidence and severity. config_version bumped (anomalyx-cfg/8) so the change is visible. Envelope shape and PROTOCOL unchanged. score remains the detector's raw statistic for drill-down.

Also

  • Parser robustness harness (fuzz-style property tests: no parser panics/hangs on arbitrary/magic-prefixed/truncated bytes; normalize is deterministic over fuzz inputs).

Gate

proptest + cargo-mutants 0 missed on calibrate.rs (32/32) and every touched detector file.

Install: cargo install anomalyx

Full changelog: v0.7.0...v0.8.0

v0.7.0 — column roles

01 Jun 17:36
80eaeab

Attack noise at the detection layer — by knowing what each column is.

Added

  • Column roles. Every scanned column is classified into a role — measurement / identifier / categorical / sequence / constant — and the full map ships in the envelope's new roles array. The point detector skips identifier and sequence columns (a large process-id or a counter's endpoint is not an anomaly).
  • --no-column-roles disables role-based skipping (roles still reported). Part of the config_version fingerprint (cr=).

Impact

On a real 20k journald capture, point findings drop from ~12,500 to ~240 (skipping _PID/_UID/_GID/JOB_ID/TID/timestamps/__SEQNUM). On a 127k-row parquet, the heavily-skewed DAYS_LOST stays measurement — all 32,893 genuine findings preserved. Noise eliminated, signal untouched.

Design — honoring "never guess"

Identifiers are recognized by name (*_id, uid, gid, pid, tid, session, uuid, …) — the only reliable signal, since a process-id column is statistically indistinguishable from a discrete measurement. Cardinality is deliberately not used to call a numeric column categorical (a near-constant column with a few outliers has low cardinality yet is exactly what point detection should catch). Heuristic, but never silent: every role is in the envelope, and the skipping is one flag from off.
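
A minimal sketch of name-based identifier recognition, assuming whole-token matching on a lowercased, delimiter-split column name. Only the names listed above are used; the real name set is longer (the list above is elided), and the function looks_like_identifier is hypothetical.

```python
import re

# Only the names spelled out in the note; the actual set contains more entries.
IDENTIFIER_TOKENS = {"id", "uid", "gid", "pid", "tid", "session", "uuid"}

def looks_like_identifier(column_name):
    """Hypothetical check: any whole token of the name is a known identifier word."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", column_name.lower()) if t]
    return any(t in IDENTIFIER_TOKENS for t in tokens)

for name in ("_PID", "JOB_ID", "user_uuid", "DAYS_LOST", "latency_ms"):
    print(name, looks_like_identifier(name))
```

Note that no statistic of the column's values is consulted, in line with the point above that cardinality is deliberately not used.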

Contract

Additive: new ax_core::roles module + roles array in the envelope and schema. PROTOCOL unchanged.

Gate

proptest + cargo-mutants 0 missed across all changed files.

Install: cargo install anomalyx

Full changelog: v0.6.0...v0.7.0

v0.6.0 — output scoping (--top / --min-severity)

01 Jun 14:55
cc9affd

Make the findings list consumable — without ever hiding what was found.

Added

  • scan --top N — emit only the N most severe findings.
  • scan --min-severity S — emit only findings ≥ S (info/low/medium/high/critical).

On a real 127k-row parquet, --top 25 shrinks the envelope from ~3 MB to ~5.6 KB.

Honest truncation

summary (total, by_class, max_severity) and the exit code always describe everything detected, never the scoped view — so filtering can't make anomalies look absent or flip the exit code from 1 to 0. When findings are withheld, the envelope gains a scope block recording the filter and detected / emitted / dropped counts; rows carries only the emitted subset. The scope block is absent when no scoping is applied (default output unchanged). A test pins that scoping to zero rows still exits 1.
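
The invariant can be sketched with plain dictionaries. The field names (summary, rows, scope, detected, emitted, dropped) follow the description above; the finding shape, the tie-break by confidence, and the function scope_findings are assumptions made for illustration, and the real envelope uses a dense row layout rather than dicts.

```python
SEVERITY_LADDER = ["info", "low", "medium", "high", "critical"]

def scope_findings(findings, top=None, min_severity=None):
    """Return (envelope, exit_code); summary and exit code ignore the filters."""
    rank = {name: i for i, name in enumerate(SEVERITY_LADDER)}

    # Summary is computed over everything detected, before any scoping.
    by_class = {}
    for f in findings:
        by_class[f["class"]] = by_class.get(f["class"], 0) + 1
    max_sev = max((f["severity"] for f in findings), key=rank.__getitem__, default=None)
    summary = {"total": len(findings), "by_class": by_class, "max_severity": max_sev}

    emitted = list(findings)
    if min_severity is not None:
        emitted = [f for f in emitted if rank[f["severity"]] >= rank[min_severity]]
    if top is not None:
        emitted.sort(key=lambda f: (-rank[f["severity"]], -f["confidence"]))
        emitted = emitted[:top]

    envelope = {"summary": summary, "rows": emitted}
    if len(emitted) < len(findings):
        # Recorded only when something was withheld.
        envelope["scope"] = {
            "top": top,
            "min_severity": min_severity,
            "detected": len(findings),
            "emitted": len(emitted),
            "dropped": len(findings) - len(emitted),
        }
    exit_code = 1 if findings else 0  # based on detections, not on emitted rows
    return envelope, exit_code

findings = [
    {"class": "point", "severity": "high", "confidence": 0.97},
    {"class": "point", "severity": "low", "confidence": 0.61},
    {"class": "collective", "severity": "medium", "confidence": 0.80},
]
envelope, code = scope_findings(findings, min_severity="critical")
print(envelope["rows"], envelope["summary"]["total"], code)  # [] 3 1
```

The final call filters every row away yet still reports three detections and exit code 1, mirroring the pinned test mentioned above.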

The pairing

This is the volume complement to 0.5.0's --fdr (correctness): FDR makes findings statistically defensible; output scoping makes the list consumable. Compose them: --fdr 0.01 --min-severity high --top 25 = "the 25 most severe high+ findings, among the FDR-significant set."

Contract

Additive — new optional scope field + schema update; PROTOCOL unchanged. summary.total now reports the detected count (== emitted when unscoped).

Gate

proptest + cargo-mutants 0 missed on envelope.rs and main.rs+schema.rs.

Install: cargo install anomalyx

Full changelog: v0.5.0...v0.6.0

v0.5.0 — false-discovery-rate control (--fdr)

01 Jun 13:28
e6f231b

A principled significance basis for the point detector — a controlled error rate instead of a magic threshold.

Added

  • scan / explain gain --fdr Q — per-column Benjamini–Hochberg false-discovery-rate control for point.modz. Each cell's modified z-score becomes a two-sided p-value, and the fixed point_threshold is replaced by a multiplicity-aware step-up cutoff bounding the expected proportion of false flags at Q (e.g. --fdr 0.05). Opt-in — omitted, behavior is unchanged. The level is part of the config_version fingerprint (pfdr=).
  • New ax_detect::fdr module: two_sided_p (normal tail via erfc) + benjamini_hochberg (deterministic step-up).
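
For reference, the two pieces named above can be written in a few lines of Python. The function names are borrowed from the note, but the bodies are a textbook sketch of the standard formulas (normal two-sided tail via erfc, Benjamini–Hochberg step-up), not a port of ax_detect::fdr; the index tie-break is an assumption added to keep the ordering deterministic.

```python
import math

def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(p_values, q):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= q * k / m. Returns one flag per input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: (p_values[i], i))  # ties broken by index
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# The same |z| = 3 outlier in a 10-cell column and in a 100-cell column.
small = [two_sided_p(z) for z in [3.0] + [0.5] * 9]
large = [two_sided_p(z) for z in [3.0] + [0.5] * 99]
print(benjamini_hochberg(small, 0.05)[0])  # True
print(benjamini_hochberg(large, 0.05)[0])  # False
```

The usage lines show the multiplicity adaptation: the identical deviation clears the cutoff among 10 cells but not among 100, because the rank-1 cutoff q/m shrinks as more cells are tested.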

Honest scope — correctness, not volume

FDR replaces an arbitrary cutoff with a principled error-rate guarantee and adapts to how many cells were tested (a noise column stops contributing chance flags; the same outlier can be significant in a small column yet not a large one). On genuinely heavy-tailed data it can flag more cells, not fewer — those cells really are significant at Q; the old fixed cutoff was stringent in an uncalibrated way. On the real 127k-row MSHA parquet: 32,893 → 40,079 point findings at q=0.05. Capping output volume is a separate lever (column scoping today; severity / top-N next) — they compose: "top-N by score, among the FDR-significant set."

Calibration

The p-value uses the consistent-σ standardized deviation (x − center)/scale (≈ N(0,1)), not the display-scaled modified z-score.

Gate

proptest + cargo-mutants 0 missed across all four changed files (BH step-up, the (x−center) sign, and multiplicity adaptation are all pinned).

Install: cargo install anomalyx

Full changelog: v0.4.1...v0.5.0