Releases · PattersonResearch/Penrose

28 Jun 13:57

v0.3.0

a9c0362

Penrose v0.3.0 Latest

[0.3.0] — 2026-06-27

A robustness, agent-surface, and data release. Post-0.2.0 work, much of it surfaced by a fresh-clone
audit and by refereeing an external code-complete framework, then adversarially swarm-audited.

Added

Tail-risk / widow-maker gate (default-off). Every backtest now reports tail diagnostics (skew,
CVaR-5/95, tail ratio, max loss vs gain, worst-vs-typical). An opt-in TAIL_RISK_GATE kills (or caps
at watch) a stable, well-deflated strategy whose payoff is bounded-up / unbounded-down (negative
skew, fat left tail): the short-vol / positive-carry blind spot the other gates miss. Because the gate
is off by default, no existing verdict moves; a tail_asymmetric kill counts as a structural kill for principle formation.
Contrastive principles. A second distiller learns from the survivor-vs-kill boundary: when a
structural failure mode recurs in one domain but other domains yield survivors, it proposes an advisory
contrastive principle (e.g. "regime_fragile is specific to trend-following; carry survives it").
Additive (recurrence principles unchanged); surfaced via views.principles() and the read-only MCP.
Point-in-time futures data adapter (pysystemtrade). A fail-open BYO local vendor that reads
pysystemtrade adjusted-price CSVs, always resamples intraday→daily through the granularity gate before
the data can reach verdict logic, and tags provenance back-adjusted + resampled. Instrument names are
restricted to safe characters (no path traversal). Inactive/harmless when no futures dir is configured.
Agent-readable principle surface. views.principles() and views.proposals() expose the
distilled principle candidates and the propose-only store as structured read-only data, so an agent
can pull and discuss "what candidates exist" without the dashboard. The read-only MCP routes its
penrose_principles / penrose_proposals tools through these accessors (one read path, no drift);
promotion to the approved brain still requires human P9.
Trend-following domain in cross-run principle inference. Trend / EWMAC claims now cluster as
trend-following instead of falling through to the "other" domain.
Data-granularity verification (penrose.data.granularity). Infers a series' empirical sampling
frequency from its index and flags a mismatch with the expected frequency (e.g. intraday bars where a
rule assumes daily, which silently corrupts every downstream statistic). The input-side analogue of
the existing output bars_per_year-vs-span check. DataBundle.granularity_warnings() surfaces it;
advisory and fail-open by default (no verdict change).
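The tail diagnostics listed under the tail-risk gate can be illustrated with a short sketch. It is not Penrose's implementation: the definitions (population skewness, CVaR as the mean of the worst or best 5% of periods, tail ratio as |p95| / |p05|) are common conventions assumed here, and the names `tail_diagnostics` and `tail_risk_gate`, as well as the gate thresholds, are hypothetical.

```python
import math
import statistics


def tail_diagnostics(returns, tail=0.05):
    """Tail statistics for a per-period return series (hypothetical definitions)."""
    xs = sorted(returns)
    n = len(xs)
    k = max(1, round(tail * n))  # number of observations in each tail
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    # Population skewness: third central moment divided by sd cubed.
    skew = (sum((x - mean) ** 3 for x in xs) / n) / sd**3 if sd > 0 else 0.0
    p05, p95 = xs[k - 1], xs[-k]  # empirical 5th / 95th percentiles
    worst, best = xs[0], xs[-1]
    typical = statistics.median([abs(x) for x in xs])
    return {
        "skew": skew,
        "cvar_05": statistics.fmean(xs[:k]),   # mean of the worst k periods
        "cvar_95": statistics.fmean(xs[-k:]),  # mean of the best k periods
        "tail_ratio": abs(p95) / abs(p05) if p05 else math.inf,
        "max_loss_vs_gain": abs(worst) / best if best > 0 else math.inf,
        "worst_vs_typical": abs(worst) / typical if typical > 0 else math.inf,
    }


def tail_risk_gate(diag, skew_floor=-1.0, asymmetry=3.0):
    """Hypothetical gate: flag negative skew plus a left tail several times
    heavier than the right (bounded-up / unbounded-down payoff).
    Thresholds are illustrative, not Penrose's."""
    left_heavy = abs(diag["cvar_05"]) > asymmetry * abs(diag["cvar_95"])
    return "tail_asymmetric" if diag["skew"] < skew_floor and left_heavy else None
```

A short-vol-shaped series such as `[0.01] * 95 + [-0.30] * 5` has skew well below -1 and a CVaR-5 thirty times the size of its CVaR-95, so the sketch flags it; a symmetric series passes untouched.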

Fixed

Trusted operator modules now ship in every clone. The public .gitignore (modules/*) was
dropping the reviewed crypto_funding_carry and macro_vol_btc modules from published clones, so a
fresh clone failed the PROVENANCE-SHELF eval invariant (92/93) even though the README documents both.
The two trusted modules now ship (generated _auto modules stay ignored); a cold clone passes 93/93.
Graceful capacity on low-turnover strategies. capacity_ci raised OverflowError converting an
infinite modeled capacity (a strategy that barely trades drives turnover toward zero) into an integer,
crashing the entire backtest. It now drops non-finite resamples and reports capacity as undefined,
consistent with the fail-visibly contract. Regression-tested.
Public test bar. A test_cli check read Makefile.public, which the public build renames to
Makefile, so the test failed in the very distribution it ships with. It now reads whichever file exists;
the public pytest bar is a clean 137 passed / 2 skipped.
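The capacity fix can be sketched as follows. `capacity_ci` is named in the note above, but its signature, the percentile interval, and the use of `None` to mean "undefined" are assumptions made for illustration.

```python
import math


def capacity_ci(resampled_capacities, alpha=0.10):
    """Percentile interval over bootstrap capacity resamples (hypothetical helper).

    A strategy that barely trades can model infinite capacity, and int(inf)
    raises OverflowError, so non-finite resamples are dropped before any
    conversion. Returns None ("capacity undefined") when too few remain.
    """
    finite = sorted(c for c in resampled_capacities if math.isfinite(c))
    if len(finite) < 2:
        return None  # fail visibly: report undefined instead of crashing the backtest
    lo = finite[int((alpha / 2) * (len(finite) - 1))]
    hi = finite[int((1 - alpha / 2) * (len(finite) - 1))]
    return int(lo), int(hi)  # safe: both values are finite
```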

Docs

Quickstart uses the real clone URL and surfaces the process-conditional worked example as the
recommended first reproduction; eval count corrected to 93/93 in AGENTS.md / CLAUDE.md.
Companion-paper bibliographies verified against publisher / arXiv records.

26 Jun 04:21

CharlesSPatterson

v0.2.0

2135d18

Penrose v0.2.0

[0.2.0] — 2026-06-25

A correctness-and-coverage release. Every change was implemented and adversarially swarm-audited;
the evaluation-invariant suite, the calibration battery (null + placebo + injection), and the unit
tests are green. The headline is two verdict-lane correctness fixes plus a real data unblock.

Verdict integrity

Order-independent deflation denominator (5c). The Deflated Sharpe multiple-testing count is now
pre-registered as a per-family cohort before evaluation, instead of a running tally read at backtest
time. Previously the same strategy could get a different verdict depending on whether it ran 1st or
8th in its family (early members were under-deflated). Now every member deflates by the full family
size, uniformly and race-free. This can only tighten verdicts (it closes a selection-bias hole); no
existing eval outcome moved.
Module generation learns to be faithful (6c). Claims are routed by type
(descriptive-statistical / trading-strategy / structural-proposition) so a descriptive claim (e.g. an
unconditional mean) is implemented as a statistical test, not a trading backtest. A pre-backtest
fidelity gate flags unfaithful specs before the expensive run, and a fidelity-rejection memory feeds
past divergences back into generation. Fidelity only ever demotes or blocks, never promotes.
Regime-scope declaration. A claim can pre-register a declared regime and be tested fairly within
it (adherence-gated), instead of being falsely killed as regime-fragile for concentrating where it
intends to trade.
CPCV / overfitting kill-lens. Combinatorial purged cross-validation (López de Prado) added as an
independent robustness axis next to the bootstrap, permutation, and walk-forward gates.
Actionable underpowered verdicts. A verdict that can't resolve a realistic edge now reports how
much more data would resolve it: the marginal OOS trades still needed (or the cross-sectional breadth
alternative), turning a dead-end label into a sequential next step.
Independent fidelity verifier (optional). The fidelity refuter can route to a genuinely
independent second LLM provider (configurable via PENROSE_LLM_VERIFIER_*), reducing the correlated
blind spots of a model checking its own work; it falls back to the same provider by default, and each
result records whether the check was independent.
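To see why a running tally under-deflates early family members, consider the benchmark Sharpe ratio that a Deflated Sharpe Ratio compares against: the expected maximum Sharpe of N skill-less trials (Bailey and López de Prado's False Strategy Theorem). The sketch assumes that standard approximation; Penrose's exact DSR computation may differ, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected best Sharpe among n_trials skill-less strategies, where
    sharpe_variance is the (assumed known) variance of Sharpe estimates
    across the trials. With a single trial there is no selection to deflate."""
    if n_trials <= 1:
        return 0.0
    z = NormalDist().inv_cdf
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * math.e))
    )


def benchmarks_running_tally(family_size, var):
    # Old behaviour (hypothetical reconstruction): member i deflates by the
    # i trials seen so far, so its verdict depends on evaluation order.
    return [expected_max_sharpe(i, var) for i in range(1, family_size + 1)]


def benchmarks_preregistered(family_size, var):
    # New behaviour: the cohort size is fixed before evaluation, so every
    # member deflates by the full family size.
    return [expected_max_sharpe(family_size, var)] * family_size
```

Under the running tally the first of eight members faces a benchmark of 0 while the eighth faces the full eight-trial benchmark; pre-registration gives all eight the same, stricter bar.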

Data ("works out of the box" for more than crypto)

Catalog-derived domain awareness. The relevance gate and spec generator read the data catalog at
runtime, so adding a new-domain series (equities, rates, inflation, commodities) makes those theses
testable and lets the generator request real series names instead of inventing them. Fail-open to the
built-in behavior when no catalog is present.
Keyless long-history adapter (Stooq). A sixth out-of-the-box data adapter: decades of daily
equity/index data with no API key, filling the gap where the free Alpha Vantage tier (~100 bars)
flipped equity theses to insufficient_data.
Conservative name-resolution. Near-miss series names resolve only on a unique high-confidence
match; ambiguous names miss (never a wrong-series resolution).
Auto-fetch the needs_data loop. When a claim needs a series an enabled vendor can supply
unambiguously, Penrose fetches it once and re-tests, instead of only logging the request. Bounded and
conservative (never supplies a wrong/ambiguous series).
Panel adapter. A panel catalog adapter type for resolution-outcome / microstructure data
(daily event-date aggregation), the framework for the largest class of data-blocked theses.
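Conservative name-resolution can be sketched with the standard library's `difflib`. The similarity measure (`SequenceMatcher.ratio`), the 0.9 cutoff, the runner-up margin, and the `resolve_series` name are assumptions; what the sketch preserves is the rule stated above: resolve only a unique, high-confidence match, and otherwise miss.

```python
import difflib


def resolve_series(name, catalog, cutoff=0.9, margin=0.05):
    """Hypothetical resolver: map a near-miss series name to a catalog entry
    only when the match is high-confidence AND unambiguous; else return None."""
    if name in catalog:
        return name
    scored = sorted(
        ((difflib.SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
         for c in catalog),
        reverse=True,
    )
    confident = [(s, c) for s, c in scored if s >= cutoff]
    if len(confident) != 1:
        return None  # no confident candidate, or several: a miss, never a guess
    if len(scored) > 1 and confident[0][0] - scored[1][0] < margin:
        return None  # runner-up too close to call
    return confident[0][1]
```

With a catalog of `rate_10y` and `rate_20y`, the query `rate_0y` scores equally against both and misses, which is the intended failure mode: a miss triggers a data request instead of silently testing the wrong series.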

Learning surface (P9 firewall intact)

Cross-run principle distillation. Structural-kill principles are now distilled across the whole
decision corpus, not just within a single run, so recurring failure modes actually surface.
Propose-only read store. An agent-readable record of "what Penrose has learned" (status: proposed), strictly separate from the approved brain; promotion still requires human P9 approval.
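A minimal sketch of cross-run distillation, assuming a corpus of per-claim decisions tagged with a run id and a structural failure mode. The `Decision` record, the `min_runs` threshold, and the output shape are hypothetical; the sketch shows recurrence counted across distinct runs, with output that only ever carries `status: proposed`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    run_id: str
    verdict: str                 # e.g. "kill", "underpowered", "watch"
    failure_mode: Optional[str]  # structural kill reason, e.g. "regime_fragile"


def distill_cross_run(decisions, min_runs=2):
    """Hypothetical rule: propose a principle when a structural failure mode
    recurs across distinct runs of the whole decision corpus. Propose-only:
    nothing here writes to an approved store."""
    runs_by_mode = {}
    for d in decisions:
        if d.verdict == "kill" and d.failure_mode:
            runs_by_mode.setdefault(d.failure_mode, set()).add(d.run_id)
    return [
        {"failure_mode": mode, "runs": len(runs), "status": "proposed"}
        for mode, runs in sorted(runs_by_mode.items())
        if len(runs) >= min_runs
    ]
```

Two kills for the same reason inside one run do not qualify; the same reason in two separate runs does.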

Robustness & honesty

Output directories are created on startup (fresh/CI/sandboxed clones no longer fail on a missing
reports/).
A fidelity-refuter network timeout degrades to "fidelity unknown" and continues, instead of killing
the run. LLM timeouts are now configurable per role, and runs accept an optional --max-claims flag.
Re-running an unchanged source is idempotent (atomic supersede by source identity), instead of
appending duplicate decisions.
Strategy-class alias collisions no longer log spurious warnings.
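Idempotent re-runs can be sketched as a store keyed by source identity and rewritten atomically. This assumes identity is a SHA-256 of the source bytes and that decisions live in one JSON file; both are illustrative choices, and `record_decisions` is a hypothetical name.

```python
import hashlib
import json
import os
import tempfile


def source_identity(source_bytes):
    # Hypothetical identity: content hash of the ingested source.
    return hashlib.sha256(source_bytes).hexdigest()


def record_decisions(store_path, source_bytes, decisions):
    """Supersede (never append) the decisions for a source, then swap the
    store file into place atomically."""
    store = {}
    if os.path.exists(store_path):
        with open(store_path) as fh:
            store = json.load(fh)
    store[source_identity(source_bytes)] = decisions  # re-run replaces the entry
    directory = os.path.dirname(os.path.abspath(store_path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(store, fh)
    os.replace(tmp, store_path)  # atomic rename on POSIX (same filesystem)
```

Writing to a temporary file in the same directory and renaming it means a crash mid-write leaves the previous store intact rather than a half-written one.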

Agentic surface & tooling

Read-only MCP server (pip install penrose[mcp], penrose-mcp). Five read-only tools let an
agent query verdicts, proposed principles, open data-requests, and pipeline status over the Model
Context Protocol. It exposes operations, not escape hatches: nothing over MCP can approve a verdict
(P9 stays human), write the corpus, or run anything. mcp is optional; the core never requires it.
penrose run --json emits a single machine-readable result object (verdicts + principle), so callers
no longer have to tail a log; penrose run --claims <claims.json> injects pre-built structured claims
and bypasses the lossy P2 re-extraction round-trip.
Make targets honor a PY override (e.g. make eval PY=./.venv/bin/python); .PHONY completed;
stress-testing docs linked.
The public-build pipeline is hardened: tracked-files-only staging (gitignored operator artifacts can
never ship), fund-specific leak markers, a symlink guard, and a dry-run-by-default sync.
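The staging rules of the public-build pipeline might look like the sketch below. `git ls-files -z` is a real git command; everything else (`sync_public`, the marker list, the rejection labels) is a hypothetical illustration of tracked-files-only staging, a symlink guard, leak-marker scanning, and a dry-run default, not the project's actual script.

```python
import shutil
import subprocess
from pathlib import Path


def tracked_files(repo):
    """Paths git tracks in `repo`; gitignored operator artifacts never appear."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], cwd=repo, capture_output=True, check=True
    )
    return [p for p in out.stdout.decode().split("\0") if p]


def sync_public(repo, dest, tracked, leak_markers, dry_run=True):
    """Stage only tracked files into `dest`, refusing symlinks, paths that
    resolve outside the repo, and files containing a leak marker.
    Copies nothing unless dry_run=False."""
    repo = Path(repo).resolve()
    ship, rejected = [], {}
    for rel in tracked:
        src = repo / rel
        if src.is_symlink():
            rejected[rel] = "symlink"
            continue
        if not src.resolve().is_relative_to(repo):  # Python 3.9+
            rejected[rel] = "outside repo"
            continue
        text = src.read_text(errors="ignore")
        if any(marker in text for marker in leak_markers):
            rejected[rel] = "leak marker"
            continue
        ship.append(rel)
    if not dry_run:
        for rel in ship:
            target = Path(dest) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(repo / rel, target)
    return ship, rejected
```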

24 Jun 14:19

CharlesSPatterson

v0.1.0

58c5f91

Penrose v0.1.0

Penrose is an independent, power-aware falsification referee for quantitative trading claims. This is the first public release, a research prototype rather than a finished product.

What's in v0.1.0

The full falsification pipeline: ingestion, grounded claim extraction, sandboxed reconstruction, a robustness and power gate battery, and a calibrated verdict (kill / underpowered / watch / research-supported).
The statistical core: a Deflated Sharpe Ratio scoped to the size of the search seen, three-fold sign stability, a regime kill-lens, a bootstrap edge interval, a permutation test, walk-forward consistency, cost and capacity modeling, and a single-use per-claim locked holdout gated on significance.
A self-calibrated detector: placebo, injected-edge, native-breadth, dead-state, and persistence-matched controls plus a multi-null battery (eval 82/82; clone-and-go reproduces it with no key and no network).
Five working data adapters out of the box: Coinbase, Kraken, and Deribit (keyless live), plus FRED and Alpha Vantage (free key), over a bring-your-own data contract.
Pennie, the corpus-grounded chat assistant, and the corpus of invalidations.
Plain-language gate documentation (docs/GATES.md), agent onboarding (AGENTS.md), and two companion papers (systems + evidence standard).
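One plausible reading of three-fold sign stability: split the return series into three contiguous folds and require each fold's Sharpe ratio to share the sign of the full-sample Sharpe. The fold scheme and the per-period (unannualized) Sharpe are assumptions for illustration, and `three_fold_sign_stable` is a hypothetical name.

```python
import statistics


def sharpe(xs):
    # Per-period Sharpe ratio (no annualization, zero risk-free rate assumed).
    sd = statistics.stdev(xs)
    return statistics.fmean(xs) / sd if sd > 0 else 0.0


def three_fold_sign_stable(returns):
    """Hypothetical check: every contiguous third of the sample must have a
    Sharpe ratio with the same (nonzero) sign as the full sample."""
    n = len(returns)
    folds = [returns[i * n // 3:(i + 1) * n // 3] for i in range(3)]
    full = sharpe(returns)
    return all(sharpe(fold) * full > 0 for fold in folds)
```

A series whose full-sample edge is positive but whose last third is uniformly negative fails the check, which is the decay pattern the gate is meant to catch.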

Pre-1.0 expectations

This is a 0.x release: interfaces may change, costs and capacity are modeled rather than measured, and independent replication remains future work. The headline guarantees are described honestly: deflation scales with the search seen, and the holdout is single-use and conservative. Please try to break it and open an issue.
