73 changes: 45 additions & 28 deletions README.md
@@ -31,29 +31,35 @@ them. Full write-up in

## How it works

Four files that matter:
Four things that matter:

- **`config.json`** — FreqTrade config, fixed. Pairs, timeframe, fees, dry-run
wallet, timerange. The agent does not touch this.
- **`prepare.py`** — one-time data download from Binance via FreqTrade's Python
API. The agent does not touch this.
- **`run.py`** — in-process backtest. Calls FreqTrade's `Backtesting` class
directly (no CLI), extracts key metrics, prints a parseable `---` summary
block to stdout. The agent does not touch this.
- **`user_data/strategies/AutoResearch.py`** — **the only file the agent edits**.
Contains the full strategy: indicators, entry/exit logic, ROI/stoploss. This
is the `train.py` equivalent of Karpathy's setup.
- **`run.py`** — in-process **batch backtest**. Discovers every `.py` under
`user_data/strategies/` (skipping files prefixed `_`), runs FreqTrade's
`Backtesting` for each, and prints one `---` summary block per strategy.
The agent does not touch this.
- **`user_data/strategies/`** — **the directory the agent owns**. Each `.py`
is one strategy; up to 3 active at a time. Agent creates / evolves / forks
/ kills strategies here. `_template.py.example` is the skeleton reference.
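
The discovery rule described above (every `.py` under `user_data/strategies/`, skipping `_`-prefixed files such as the template) can be sketched as follows. This is an illustrative sketch, not `run.py`'s actual code; the function name is hypothetical.

```python
from pathlib import Path

def discover_strategies(strategies_dir: str = "user_data/strategies") -> list[Path]:
    """Sketch of the discovery convention: every .py file in the
    directory counts as one strategy, except files whose name starts
    with "_" (so _template.py.example is ignored twice over — by
    prefix and by extension)."""
    return sorted(
        p for p in Path(strategies_dir).glob("*.py")
        if not p.name.startswith("_")
    )
```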

Plus:

- **`program.md`** — the autonomous-research instructions the human points the
LLM agent at. Direct analog of Karpathy's `program.md`.
- **`results.tsv`** — the journal. `commit | sharpe | max_dd | status | description`.
Git-ignored so it survives `git reset --hard` when the agent rolls back a
failed experiment — past lessons stay available even when experimental
LLM agent at.
- **`results.tsv`** — event log. Schema: `commit | event | strategy_name | sharpe | max_dd | note`.
Events: `create | evolve | stable | fork | kill`. Gitignored so it survives
`git reset --hard` — past lessons stay available even when experimental
commits get thrown away.
- **`analysis.ipynb`** — post-hoc read of `results.tsv` once the loop has
collected some data.
- **`analysis.ipynb`** — post-hoc read: per-strategy trajectories, cap
utilization, event distribution, note word frequency.
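
Appending one round's row in the `results.tsv` schema above can be sketched like this. The column order comes from the bullet; the function name is illustrative, not part of the repo.

```python
import csv
import os

COLUMNS = ["commit", "event", "strategy_name", "sharpe", "max_dd", "note"]

def append_event(path, commit, event, strategy_name, sharpe, max_dd, note):
    """Append one event row to the gitignored TSV journal,
    writing the header first if the file does not exist yet."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        w = csv.writer(f, delimiter="\t")
        if write_header:
            w.writerow(COLUMNS)
        w.writerow([commit, event, strategy_name, sharpe, max_dd, note])
```

Because rows are only ever appended, the file doubles as a chronological event log — the row order is the event order, which is what `analysis.ipynb` relies on.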

*(v0.1.0 used a single `AutoResearch.py` file that the agent mutated in place.
That mode anchored the agent on one paradigm for all 99 rounds. v0.2.0
switched to multi-strategy; v0.1.0 is archived under [`versions/0.1.0/`](versions/0.1.0/)
with a full [retrospective](versions/0.1.0/retrospective.md).)*

## Requirements

@@ -79,11 +85,15 @@ uv sync
# 4. One-time data download (~a few minutes)
uv run prepare.py

# 5. Sanity check — run the baseline backtest
uv run run.py > run.log 2>&1 && grep "^---" -A 12 run.log
# 5. Sanity check — with no strategies yet, run.py should report
# "no strategies found" and exit. That's expected — the agent creates
# 1-3 starting strategies during setup before the first real backtest.
uv run run.py > run.log 2>&1; echo "exit=$?"
```

If step 5 prints a `---` block ending with a `pairs:` line, you're ready.
If step 5 prints `no strategies found...` and `exit=2`, you're ready. (An
actual backtest run only starts once the agent has created at least one
strategy file.)
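
Once strategies exist, `run.py` is described as printing one `---`-delimited summary block per strategy. A minimal sketch of pulling those blocks out of `run.log`, assuming `key: value` lines between `---` delimiters (the exact keys are assumptions, not the real output format):

```python
def parse_summary_blocks(log_text: str) -> list[dict]:
    """Collect ----delimited summary blocks from backtest output.
    Each block becomes a dict of its "key: value" lines."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if line.strip() == "---":
            if current is None:
                current = {}            # opening delimiter
            else:
                blocks.append(current)  # closing delimiter
                current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return blocks
```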

## Running the agent

@@ -131,27 +141,34 @@ Auto-Quant/
├── analysis.ipynb # post-hoc analysis
├── user_data/
│ ├── strategies/
│ │ └── AutoResearch.py # THE one file the agent edits
│ │ ├── _template.py.example # skeleton the agent copies from
│ │ └── <agent-created files>.py # up to 3 active at a time
│ ├── data/ # gitignored — downloaded OHLCV
│ └── backtest_results/ # gitignored — FreqTrade outputs
└── results.tsv # gitignored — agent's journal
├── versions/ # frozen snapshots of past runs
└── results.tsv # gitignored — agent's event log
```

## Design notes

- **Agent only modifies one file.** All other files are the evaluation
contract. This is the single biggest design decision; it keeps diffs
reviewable and prevents Goodharting the metric.
- **Agent owns one directory, not one file.** `user_data/strategies/` is its
  workspace; everything else is the evaluation contract. Up to 3 strategies
simultaneously, hard cap. Multi-strategy exists specifically to fight
the single-paradigm anchoring that v0.1.0 exhibited.
- **No CLI indirection.** The agent only runs `uv run prepare.py` and
`uv run run.py`. `run.py` uses FreqTrade's `Backtesting` class in-process,
so startup is fast and errors surface as real Python stack traces.
- **`results.tsv` is gitignored.** When the agent reverts a failed
experiment with `git reset --hard`, the journal of what was tried
survives. Essential to avoid re-trying the same bad ideas.
- **LLM decides keep/discard, not a scalar rule.** Backtest Sharpe on a
finite window is noisy. Rather than `if new_sharpe > old_sharpe: keep`,
the agent reads the full summary block and decides based on sharpe +
drawdown + trade count + its own read on the asset.
- **`results.tsv` is a gitignored event log.** Each round, the agent appends
rows (one per strategy touched, with event type: create/evolve/stable/fork/kill).
It survives `git reset --hard` so past lessons stay available even when
experimental commits get thrown away.
- **LLM decides keep/kill, not a scalar rule.** Sharpe on a finite window
is noisy and gameable. Agent reads the full per-strategy summary blocks
and decides inline which strategies to evolve, fork, or kill — the
program.md rules force action but not which action.
- **Stagnation rule.** A strategy can't sit idle for more than 3 consecutive
stable rounds — agent must evolve, fork, or kill it. With only 3 slots,
dead weight is expensive.
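
The stagnation rule can be checked mechanically from the event log. A sketch, assuming the `results.tsv` schema above with rows in chronological order (the function name is illustrative):

```python
def stagnant_strategies(events, max_stable=3):
    """Given (event, strategy_name) pairs in chronological order,
    return the strategies that have sat through max_stable consecutive
    'stable' rounds — the ones now due to be evolved, forked, or killed."""
    streaks = {}
    for event, name in events:
        if event == "stable":
            streaks[name] = streaks.get(name, 0) + 1
        elif event == "kill":
            streaks.pop(name, None)  # dead strategies can't stagnate
        else:
            streaks[name] = 0        # create/evolve/fork reset the streak
    return [name for name, n in streaks.items() if n >= max_stable]
```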

## License

94 changes: 10 additions & 84 deletions analysis.ipynb
@@ -3,138 +3,64 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Auto-Quant Experiment Analysis\n",
"\n",
"Analysis of autonomous strategy-iteration results from `results.tsv`.\n",
"\n",
"Run this after the agent has collected some experiments."
]
"source": "# Auto-Quant Experiment Analysis — v0.2.0 multi-strategy event log\n\nReads `results.tsv` (event log schema: `commit | event | strategy_name | sharpe | max_dd | note`) and produces per-strategy timelines, active-slot utilization, event distributions, and note word frequency.\n\nRun this after the agent has accumulated some rounds of events."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"df = pd.read_csv(\"results.tsv\", sep=\"\\t\")\n",
"df[\"sharpe\"] = pd.to_numeric(df[\"sharpe\"], errors=\"coerce\")\n",
"df[\"max_dd\"] = pd.to_numeric(df[\"max_dd\"], errors=\"coerce\")\n",
"df[\"status\"] = df[\"status\"].str.strip().str.lower()\n",
"\n",
"print(f\"Total experiments: {len(df)}\")\n",
"print(f\"Columns: {list(df.columns)}\")\n",
"df.head(10)"
]
"source": "import pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\n\ndf = pd.read_csv(\"results.tsv\", sep=\"\\t\")\ndf[\"sharpe\"] = pd.to_numeric(df[\"sharpe\"], errors=\"coerce\")\ndf[\"max_dd\"] = pd.to_numeric(df[\"max_dd\"], errors=\"coerce\")\ndf[\"event\"] = df[\"event\"].str.strip().str.lower()\n\n# For fork events, strategy_name is \"parent→child\" — split into two columns\n# so we can treat the child as the new strategy moving forward.\ndef _canonical(name: str) -> str:\n if isinstance(name, str) and \"→\" in name:\n return name.split(\"→\", 1)[1]\n return name\n\ndf[\"parent\"] = df[\"strategy_name\"].map(\n lambda s: s.split(\"→\", 1)[0] if isinstance(s, str) and \"→\" in s else None\n)\ndf[\"strategy\"] = df[\"strategy_name\"].map(_canonical)\n\n# Give each row a time-ordered index (the row order IS the event order\n# because we append as we go). Keep \"commit\" for reference.\ndf = df.reset_index(drop=True)\ndf[\"round_idx\"] = df.index\n\nprint(f\"Total events: {len(df)}\")\nprint(f\"Unique strategies ever seen: {df['strategy'].nunique()}\")\nprint(f\"Columns: {list(df.columns)}\")\ndf.head(10)"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"counts = df[\"status\"].value_counts()\n",
"print(\"Experiment outcomes:\")\n",
"print(counts.to_string())\n",
"\n",
"n_keep = counts.get(\"keep\", 0)\n",
"n_discard = counts.get(\"discard\", 0)\n",
"n_crash = counts.get(\"crash\", 0)\n",
"n_decided = n_keep + n_discard\n",
"if n_decided > 0:\n",
" print(f\"\\nKeep rate: {n_keep}/{n_decided} = {n_keep / n_decided:.1%}\")\n",
" print(f\"Crash rate: {n_crash}/{len(df)} = {n_crash / len(df):.1%}\")"
]
"source": "# Event type distribution\nevent_counts = df[\"event\"].value_counts()\nprint(\"Events:\")\nprint(event_counts.to_string())\n\nprint(f\"\\nTotal strategies created: {event_counts.get('create', 0) + event_counts.get('fork', 0)}\")\nprint(f\"Total strategies killed: {event_counts.get('kill', 0)}\")\nprint(f\"Evolve moves: {event_counts.get('evolve', 0)}\")\nprint(f\"Stable rounds: {event_counts.get('stable', 0)}\")\n\n# Which strategies are still alive at the end?\n# A strategy is alive iff its most recent event is NOT 'kill'\nlast_event_per = df.groupby(\"strategy\").tail(1).set_index(\"strategy\")\nalive = last_event_per[last_event_per[\"event\"] != \"kill\"].index.tolist()\ndead = last_event_per[last_event_per[\"event\"] == \"kill\"].index.tolist()\nprint(f\"\\nStill alive at end ({len(alive)}): {alive}\")\nprint(f\"Killed during run ({len(dead)}): {dead}\")"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# All KEPT experiments — the ones that stuck\n",
"kept = df[df[\"status\"] == \"keep\"].copy().reset_index(drop=True)\n",
"print(f\"KEPT experiments ({len(kept)}):\\n\")\n",
"for i, row in kept.iterrows():\n",
" print(f\" #{i:3d} sharpe={row['sharpe']:.4f} dd={row['max_dd']:.1f}% {row['description']}\")"
]
"source": "# Peak sharpe per strategy (across its lifetime)\npeak = df.dropna(subset=[\"sharpe\"]).groupby(\"strategy\")[\"sharpe\"].max().sort_values(ascending=False)\nprint(\"Peak Sharpe per strategy:\\n\")\nfor name, s in peak.items():\n last_evt = last_event_per.loc[name, \"event\"]\n status = \"☠ killed\" if last_evt == \"kill\" else \"✓ alive\"\n print(f\" {status} {name:30s} peak_sharpe={s:.4f}\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sharpe Frontier Over Time\n",
"\n",
"Running max of sharpe as experiments accumulate. The plateau shows where\n",
"the agent stopped finding improvements."
]
"source": "## Per-strategy Sharpe trajectories\n\nEach active line = one strategy's sharpe over its lifetime. Gaps indicate\n`stable` rounds where the strategy wasn't modified. A line ending in a dot\nmeans the strategy was killed at that point. Scatter markers by event type\nlet you see where the agent chose to evolve / fork / kill each strategy."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "df_ordered = df.reset_index(drop=True)\n\n# Running best must only consider KEEP experiments. Discarded runs\n# (including retroactive discards — e.g. agent rolling back Goodhart\n# wins) were explicitly rejected and should not move the frontier.\nkept_sharpe = df_ordered[\"sharpe\"].where(df_ordered[\"status\"] == \"keep\")\ndf_ordered[\"running_max_sharpe\"] = kept_sharpe.cummax()\n\nstatus_color = {\"keep\": \"tab:green\", \"discard\": \"tab:gray\", \"crash\": \"tab:red\"}\n\nfig, ax = plt.subplots(figsize=(10, 5))\nfor status, color in status_color.items():\n mask = df_ordered[\"status\"] == status\n ax.scatter(df_ordered.index[mask], df_ordered[\"sharpe\"][mask],\n alpha=0.5, c=color, label=status)\nax.plot(df_ordered.index, df_ordered[\"running_max_sharpe\"],\n color=\"red\", label=\"running best (keep-only)\")\nax.set_xlabel(\"experiment #\")\nax.set_ylabel(\"sharpe\")\nax.set_title(\"Sharpe frontier\")\nax.legend()\nax.grid(alpha=0.3)\nplt.show()"
"source": "fig, ax = plt.subplots(figsize=(12, 6))\n\nevent_marker = {\"create\": \"o\", \"evolve\": \".\", \"fork\": \"^\", \"stable\": \",\", \"kill\": \"x\"}\n\nfor name, g in df.dropna(subset=[\"sharpe\"]).groupby(\"strategy\"):\n g = g.sort_values(\"round_idx\")\n ax.plot(g[\"round_idx\"], g[\"sharpe\"], alpha=0.5, label=name)\n for evt, marker in event_marker.items():\n sub = g[g[\"event\"] == evt]\n if len(sub) and evt != \"stable\": # don't clutter with stable markers\n ax.scatter(sub[\"round_idx\"], sub[\"sharpe\"], marker=marker, s=50, alpha=0.9)\n\n# Mark kills explicitly (last-event-of-strategy with no sharpe)\nfor name, g in df.groupby(\"strategy\"):\n last = g.iloc[-1]\n if last[\"event\"] == \"kill\":\n # put an x at y=0 at the kill position to signal strategy ended\n ax.scatter([last[\"round_idx\"]], [0], marker=\"x\", c=\"red\", s=80)\n\nax.axhline(0, color=\"gray\", linewidth=0.5, alpha=0.5)\nax.set_xlabel(\"event # (chronological)\")\nax.set_ylabel(\"sharpe\")\nax.set_title(\"Per-strategy Sharpe trajectories\")\nax.legend(loc=\"best\", fontsize=8)\nax.grid(alpha=0.3)\nplt.show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sharpe vs Drawdown Scatter\n",
"\n",
"Any kept experiment with drawdown way worse than baseline is probably\n",
"over-fitting to a specific regime."
]
"source": "## Active strategy count over time\n\nWith a hard cap of 3, the count should mostly sit at 3 (agent keeps slots full)\nand occasionally dip to 2 when a strategy is killed without immediate replacement.\nSustained dips below 3 mean the agent isn't filling the cap — either exploring\ncautiously or running out of ideas."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(8, 6))\n",
"colors = {\"keep\": \"green\", \"discard\": \"gray\", \"crash\": \"red\"}\n",
"for status, g in df.groupby(\"status\"):\n",
" ax.scatter(g[\"max_dd\"], g[\"sharpe\"], alpha=0.6, label=status, c=colors.get(status, \"black\"))\n",
"ax.set_xlabel(\"max drawdown %\")\n",
"ax.set_ylabel(\"sharpe\")\n",
"ax.set_title(\"Sharpe vs drawdown, colored by outcome\")\n",
"ax.legend()\n",
"ax.grid(alpha=0.3)\n",
"plt.show()"
]
"source": "# Walk the event log, maintain a set of active strategies, record count per event.\n# `fork` is counted as a create for the child (+1), `kill` as -1, create as +1.\nactive = set()\ncounts = []\nfor _, row in df.iterrows():\n name = row[\"strategy\"]\n evt = row[\"event\"]\n if evt == \"create\" or evt == \"fork\":\n active.add(name)\n elif evt == \"kill\":\n active.discard(name)\n # evolve / stable don't change membership\n counts.append(len(active))\n\ndf[\"active_count\"] = counts\n\nfig, ax = plt.subplots(figsize=(12, 3))\nax.plot(df[\"round_idx\"], df[\"active_count\"], drawstyle=\"steps-post\")\nax.axhline(3, color=\"red\", linestyle=\"--\", alpha=0.3, label=\"cap (3)\")\nax.set_xlabel(\"event #\")\nax.set_ylabel(\"active strategies\")\nax.set_title(\"Cap utilization over time\")\nax.set_ylim(0, 4)\nax.legend()\nax.grid(alpha=0.3)\nplt.show()"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Description Word Frequency\n",
"\n",
"What themes show up in the agent's own descriptions of what it tried?\n",
"A rough proxy for which directions it explored."
]
"source": "## Note word frequency\n\nRough proxy for what paradigms, indicators, and failure modes the agent\nthought about during this run. Skim the top 30 — if you see only one\nparadigm family dominating, anchoring is creeping in."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"import re\n",
"\n",
"text = \" \".join(df[\"description\"].dropna().astype(str).str.lower().tolist())\n",
"words = re.findall(r\"[a-z]{3,}\", text)\n",
"stop = {\"the\", \"and\", \"for\", \"with\", \"this\", \"that\", \"from\", \"was\", \"too\", \"add\", \"added\", \"use\", \"using\"}\n",
"top = Counter(w for w in words if w not in stop).most_common(25)\n",
"print(\"Top words across all descriptions:\")\n",
"for w, c in top:\n",
" print(f\" {c:4d} {w}\")"
]
"source": "from collections import Counter\nimport re\n\ntext = \" \".join(df[\"note\"].dropna().astype(str).str.lower().tolist())\nwords = re.findall(r\"[a-z]{3,}\", text)\nstop = {\n \"the\", \"and\", \"for\", \"with\", \"this\", \"that\", \"from\", \"was\", \"too\",\n \"add\", \"added\", \"use\", \"using\", \"but\", \"all\", \"not\", \"has\", \"have\",\n \"trade\", \"trades\", \"run\", \"ran\", \"same\", \"than\", \"more\", \"less\",\n \"still\", \"then\", \"one\", \"two\", \"new\", \"old\", \"now\",\n}\ntop = Counter(w for w in words if w not in stop).most_common(30)\nprint(\"Top 30 words across all notes:\")\nfor w, c in top:\n print(f\" {c:4d} {w}\")"
}
],
"metadata": {