EntityProcess · christso · May 15, 2026 · May 14, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -154,17 +154,28 @@ cd ../agentv.worktrees/<type>-<short-desc>
 - Prefer named exports
 - Keep modules cohesive
 
+## Naming Convention: "Project" vs "Benchmark"
+
+These two words have distinct, non-interchangeable meanings in this codebase. Get them right when adding new symbols, docs, or example dirs:
+
+- **Project** — the top-level container Studio organises around: a registered workspace directory (`.agentv/` + run artifacts + traces + experiments). Lives in `~/.agentv/projects.yaml`. Modelled by `ProjectEntry` / `ProjectRegistry` in `packages/core/src/projects.ts`. Matches the terminology used by Phoenix, Langfuse, Braintrust, W&B Weave, and LangSmith.
+- **Benchmark** — a curated *eval suite* designed to measure something specific (academic ML sense: MMLU, HumanEval, SWE-bench). Example dirs use this sense: `examples/showcase/multi-model-benchmark/`, `examples/showcase/offline-grader-benchmark/`, `examples/features/benchmark-tooling/`. Do not rename these — they are correctly named.
+
+The legacy registry file `~/.agentv/benchmarks.yaml` is auto-migrated to `projects.yaml` on first load by `migrateLegacyBenchmarksFile()`. The unrelated per-run `benchmark.json` artifact (Agent Skills compatibility output) is a third, separate concept — also keep that name.
+
+When in doubt: if the thing holds runs / traces / experiments, it's a **project**. If it's a curated set of eval cases meant to measure capability, it's a **benchmark**.
+
 ## Wire Format Convention
 
 **Everything that crosses a process boundary uses `snake_case` keys. Internal TypeScript uses `camelCase`. Translate at the boundary — never in the middle.**
 
 The rule is blanket: if the key is going to disk, to a user's editor, into a JSON response, or onto a CLI, it's snake_case. There is no "well this file is internal-ish" carve-out. If in doubt, snake_case.
 
 ### snake_case surfaces
-- All YAML files on disk: `*.eval.yaml`, `agentv.config.yaml`, `benchmarks.yaml`, `studio/config.yaml`, any future YAML we add.
+- All YAML files on disk: `*.eval.yaml`, `agentv.config.yaml`, `projects.yaml`, `studio/config.yaml`, any future YAML we add.
 - JSONL result files (`test_id`, `token_usage`, `duration_ms`).
 - Artifact-writer output (`pass_rate`, `tests_run`, `total_tool_calls`).
-- HTTP response bodies from `agentv serve` / Studio (`added_at`, `pass_rate`, `benchmark_id`).
+- HTTP response bodies from `agentv serve` / Studio (`added_at`, `pass_rate`, `project_id`).
 - CLI JSON output (`agentv results summary`, `results failures`, `results show`).
 - Anything consumed by non-TS tooling (Python, jq pipelines, external dashboards).
 
@@ -177,7 +188,7 @@ Define a second interface for the wire shape and convert in one place — don't
 
 ```typescript
 // Wire shape — snake_case, matches what hits disk / the network
-interface BenchmarkEntryYaml {
+interface ProjectEntryYaml {
   id: string;
   name: string;
   path: string;
@@ -186,19 +197,19 @@ interface BenchmarkEntryYaml {
 }
 
 // Internal shape — camelCase, what every TS call site sees
-interface BenchmarkEntry {
+interface ProjectEntry {
   id: string;
   name: string;
   path: string;
   addedAt: string;
   lastOpenedAt: string;
 }
 
-function fromYaml(e: BenchmarkEntryYaml): BenchmarkEntry {
+function fromYaml(e: ProjectEntryYaml): ProjectEntry {
   return { id: e.id, name: e.name, path: e.path, addedAt: e.added_at, lastOpenedAt: e.last_opened_at };
 }
 
-function toYaml(e: BenchmarkEntry): BenchmarkEntryYaml {
+function toYaml(e: ProjectEntry): ProjectEntryYaml {
   return { id: e.id, name: e.name, path: e.path, added_at: e.addedAt, last_opened_at: e.lastOpenedAt };
 }
 ```
@@ -213,7 +224,7 @@ Yes, this is two interfaces and two functions per entity. That's the price of ke
 ### Existing divergences
 If you spot a camelCase key already on disk or in a response (e.g. a legacy endpoint), treat it as a bug: migrate it to snake_case in the same PR where you touch that code path. Don't grandfather it in.
 
-**Reading back:** `parseJsonlResults()` in `artifact-writer.ts` converts snake_case → camelCase when reading JSONL into TypeScript. `fromYaml` / `toYaml` in `packages/core/src/benchmarks.ts` is the model for YAML boundaries.
+**Reading back:** `parseJsonlResults()` in `artifact-writer.ts` converts snake_case → camelCase when reading JSONL into TypeScript. `fromYaml` / `toYaml` in `packages/core/src/projects.ts` are the model for YAML boundaries.
 
 **Why:** Aligns with skill-creator (claude-plugins-official) and broader Python/JSON ecosystem conventions where snake_case is the standard wire format.
 

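Reviewer sketch for the AGENTS.md change above: a minimal round trip through the renamed `fromYaml` / `toYaml` pair, to show that snake_case stays on the wire side and camelCase stays on the TypeScript side. The interfaces repeat the shapes from the diff; the `string` types on `added_at` / `last_opened_at` are inferred from how `fromYaml` uses them, and the sample record (`onDisk`, its id, path, and timestamps) is hypothetical.

```typescript
// Wire shape: snake_case keys, as they would appear in projects.yaml.
interface ProjectEntryYaml {
  id: string;
  name: string;
  path: string;
  added_at: string;       // type assumed from the fromYaml mapping
  last_opened_at: string; // type assumed from the fromYaml mapping
}

// Internal shape: camelCase keys, what TypeScript call sites use.
interface ProjectEntry {
  id: string;
  name: string;
  path: string;
  addedAt: string;
  lastOpenedAt: string;
}

function fromYaml(e: ProjectEntryYaml): ProjectEntry {
  return { id: e.id, name: e.name, path: e.path, addedAt: e.added_at, lastOpenedAt: e.last_opened_at };
}

function toYaml(e: ProjectEntry): ProjectEntryYaml {
  return { id: e.id, name: e.name, path: e.path, added_at: e.addedAt, last_opened_at: e.lastOpenedAt };
}

// Hypothetical record, standing in for one parsed entry of projects.yaml.
const onDisk: ProjectEntryYaml = {
  id: "my-evals",
  name: "My Evals",
  path: "/srv/agentv/my-evals",
  added_at: "2026-05-14T00:00:00.000Z",
  last_opened_at: "2026-05-15T00:00:00.000Z",
};

// Translate once on the way in, work in camelCase, translate once on the way out.
const entry = fromYaml(onDisk);
const touched: ProjectEntry = { ...entry, lastOpenedAt: "2026-05-16T00:00:00.000Z" };
const backToDisk = toYaml(touched);

console.log(Object.keys(backToDisk).join(","));
```

Only `fromYaml` and `toYaml` mention both casings, so a later key rename touches one place per direction.
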
diff --git a/apps/web/src/content/docs/docs/evaluation/running-evals.mdx b/apps/web/src/content/docs/docs/evaluation/running-evals.mdx
@@ -429,7 +429,7 @@ The `{timestamp}` placeholder is replaced with an ISO-like timestamp (e.g., `202
 
 ### AGENTV_HOME
 
-Override the data directory for heavy runtime artifacts — workspaces, workspace pool, subagents, trace state, git cache, and downloaded dependencies. Lightweight config and cache files (`version-check.json`, `last-config.json`, `benchmarks.yaml`) always stay in `~/.agentv` regardless of this setting.
+Override the data directory for heavy runtime artifacts — workspaces, workspace pool, subagents, trace state, git cache, and downloaded dependencies. Lightweight config and cache files (`version-check.json`, `last-config.json`, `projects.yaml`) always stay in `~/.agentv` regardless of this setting.
 
 ```bash
 # Linux/macOS

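Reviewer sketch for the `AGENTV_HOME` paragraph above: one way the split between the fixed config directory and the overridable data directory could be expressed. This is not taken from the AgentV source. `resolveAgentvDirs` and `AgentvDirs` are hypothetical names, and falling back to `~/.agentv` when `AGENTV_HOME` is unset is an assumption based on the word "override".

```typescript
import * as path from "node:path";

// Hypothetical result type: where light config lives vs. where heavy artifacts go.
interface AgentvDirs {
  configDir: string; // version-check.json, last-config.json, projects.yaml
  dataDir: string;   // workspaces, workspace pool, trace state, git cache, ...
}

// Hypothetical helper. The home directory and environment are passed in
// explicitly so the function has no hidden dependency on the process.
function resolveAgentvDirs(
  homeDir: string,
  env: Record<string, string | undefined>,
): AgentvDirs {
  // Lightweight config always stays under ~/.agentv, regardless of AGENTV_HOME.
  const configDir = path.join(homeDir, ".agentv");

  // Heavy runtime artifacts follow AGENTV_HOME when it is set and non-empty.
  // Assumption: they default to the same ~/.agentv directory otherwise.
  const override = env.AGENTV_HOME?.trim();
  const dataDir = override ? override : configDir;

  return { configDir, dataDir };
}

const withOverride = resolveAgentvDirs("/home/me", { AGENTV_HOME: "/mnt/big/agentv" });
const withoutOverride = resolveAgentvDirs("/home/me", {});

console.log(path.join(withOverride.configDir, "projects.yaml"));
console.log(withOverride.dataDir, withoutOverride.dataDir);
```

In this sketch `projects.yaml` is always joined onto `configDir`, so moving the data directory never moves the registry file.
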
diff --git a/apps/web/src/content/docs/docs/tools/studio.mdx b/apps/web/src/content/docs/docs/tools/studio.mdx
@@ -45,10 +45,10 @@ agentv studio .agentv/results/runs/2026-03-30T11-45-56-989Z
 |--------|-------------|
 | `--port`, `-p` | Port to listen on (flag > `PORT` env var > 3117) |
 | `--dir`, `-d` | Working directory (default: current directory) |
-| `--multi` | Launch in multi-benchmark dashboard mode (deprecated; use auto-detect or `--single`) |
-| `--single` | Force single-benchmark dashboard mode |
-| `--add <path>` | Register a benchmark by path |
-| `--remove <id>` | Unregister a benchmark by ID |
+| `--multi` | Launch in multi-project dashboard mode (deprecated; use auto-detect or `--single`) |
+| `--single` | Force single-project dashboard mode |
+| `--add <path>` | Register a project by path |
+| `--remove <id>` | Unregister a project by ID |
 
 ## Features
 
@@ -138,25 +138,25 @@ The section includes the following visualizations:
 
 The baseline comparison is also available via the API: `GET /api/compare?baseline=<target>` adds `delta` and `normalized_gain` fields to each non-baseline cell in the response.
 
-## Benchmarks Dashboard
+## Projects Dashboard
 
-By default, Studio shows results for the current directory. Register multiple benchmark repos to view them from a single dashboard.
+By default, Studio shows results for the current directory. Register multiple project repos to view them from a single dashboard.
 
-### Registering Benchmarks
+### Registering Projects
 
-Register benchmark repos one at a time:
+Register project repos one at a time:
 
 ```bash
 agentv studio --add /path/to/my-evals
 agentv studio --add /path/to/other-evals
 ```
 
-Each path must contain a `.agentv/` directory. Registered benchmarks are stored in `~/.agentv/benchmarks.yaml`.
+Each path must contain a `.agentv/` directory. Registered projects are stored in `~/.agentv/projects.yaml`.
 
-To register a remote repo and keep it synced automatically, add a `source` block to the entry in `~/.agentv/benchmarks.yaml`:
+To register a remote repo and keep it synced automatically, add a `source` block to the entry in `~/.agentv/projects.yaml`:
 
 ```yaml
-benchmarks:
+projects:
   - id: my-evals
     name: My Evals
     path: /srv/agentv/my-evals
@@ -169,32 +169,32 @@ On each Studio startup, AgentV clones the repo if the path is empty (`git clone
 
 ### Runtime behavior: no restart needed
 
-`benchmarks.yaml` is the single source of truth. Studio re-reads it on every `/api/benchmarks` request (which the UI polls every ~10 s), so any of these changes appear live without restarting `agentv serve`:
+`projects.yaml` is the single source of truth. Studio re-reads it on every `/api/projects` request (which the UI polls every ~10 s), so any of these changes appear live without restarting `agentv serve`:
 
-- Adding via the UI's **Add Benchmark** form or `POST /api/benchmarks`.
-- Removing via the UI's **Remove** button or `DELETE /api/benchmarks/:id`.
-- Editing `~/.agentv/benchmarks.yaml` directly.
+- Adding via the UI's **Add Project** form or `POST /api/projects`.
+- Removing via the UI's **Remove** button or `DELETE /api/projects/:id`.
+- Editing `~/.agentv/projects.yaml` directly.
 - Mounting the file via a Kubernetes ConfigMap — GitOps the ConfigMap and Studio reflects it within the next poll.
 
-This satisfies the 24/7-Studio use case: the server stays up; benchmarks come and go through config edits or API calls.
+This satisfies the 24/7-Studio use case: the server stays up; projects come and go through config edits or API calls.
 
 ### Launching the Dashboard
 
-Studio auto-detects the mode based on how many benchmarks are registered:
+Studio auto-detects the mode based on how many projects are registered:
 
-- `0` or `1` registered: single-benchmark view
-- `2+` registered: Benchmarks dashboard
+- `0` or `1` registered: single-project view
+- `2+` registered: Projects dashboard
 
 ```bash
 agentv studio          # auto-detects
-agentv studio --single # force single-benchmark view
+agentv studio --single # force single-project view
 ```
 
-The landing page shows a card for each benchmark with run count, pass rate, and last run time.
+The landing page shows a card for each project with run count, pass rate, and last run time.
 
-<Image src={studioProjectsMulti} alt="AgentV Studio benchmarks dashboard showing benchmark cards with pass rates" />
+<Image src={studioProjectsMulti} alt="AgentV Studio projects dashboard showing project cards with pass rates" />
 
-### Removing a Benchmark
+### Removing a Project
 
 Unregister by its ID: