feat: raki cohort command — date-based split and diff within a single report

## Parent

Child of #257 (cohort comparison). Depends on #259 (reaggregation helper).

## Problem

After shipping pipeline improvements, there is no way to answer "did things get better?" from a single raki report. Sessions from before and after a change are aggregated together, diluting improvements (see #257 for the SODA v0.5.0 case: 67% first-pass improvement masked by 70 older sessions).

## Solution

Add a `raki cohort` subcommand that splits a saved JSON report by date and produces a diff using the existing diff infrastructure.

### How it works

1. Load saved JSON report via `load_json_report()`
2. Split `report.sample_results` by `sample.session.started_at` into "before" and "after" lists
3. Call `reaggregate_scores()` (#259) on each list to get per-cohort aggregate scores
4. Feed both aggregate dicts into existing `compute_deltas()` from `report/diff.py`
5. Build a `DiffReport` with cohort labels instead of run IDs
6. Render via existing `print_diff_summary()` (CLI) and `write_diff_html_report()` (HTML)

### CLI design

```
raki cohort REPORT_JSON --since DATE [--until DATE] [--html PATH] [--fail-on-regression] [--json] [-q]
```

### Cohort labeling

- `--since 2026-05-12` → "Before 2026-05-12" vs "Since 2026-05-12"
- `--since 2026-05-01 --until 2026-05-15` → "Before 2026-05-01" vs "2026-05-01 to 2026-05-15"
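
The labeling rule can be sketched as a small pure function. This is a minimal illustration only: `cohort_labels` is a hypothetical name, not part of the existing code, and it assumes both dates arrive as `datetime` objects.

```python
from datetime import datetime
from typing import Optional


def cohort_labels(since: datetime, until: Optional[datetime] = None) -> tuple[str, str]:
    """Return (before_label, after_label) for a date split.

    Hypothetical helper: mirrors the two labeling cases listed above.
    """
    since_text = since.strftime("%Y-%m-%d")
    before_label = f"Before {since_text}"
    if until is None:
        # Open-ended "after" cohort.
        return before_label, f"Since {since_text}"
    # Bounded "after" cohort.
    return before_label, f"{since_text} to {until.strftime('%Y-%m-%d')}"
```

Keeping label generation separate from the split itself would make the label tests independent of session data.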


## Acceptance criteria

- [ ] `raki cohort` subcommand registered in `cli.py` under `main`
- [ ] `--since` (required) splits sessions by `started_at` date
- [ ] `--until` (optional) caps the "after" cohort
- [ ] CLI output reuses `print_diff_summary()` with cohort labels instead of run IDs
- [ ] `--html` produces diff HTML report using `write_diff_html_report()`
- [ ] `--fail-on-regression` exits non-zero on regression (reuses `gates/regression.py`)
- [ ] `--json` outputs machine-readable diff data
- [ ] `-q` quiet mode for CI
- [ ] Small-n warning when either cohort has fewer than 10 sessions
- [ ] Error when either cohort is empty ("No sessions found in the 'after' cohort")
- [ ] Error when report has no `sample_results` (stripped or empty)
- [ ] Tests with synthetic report data covering: normal split, empty cohort, single-session cohort, small-n warning
- [ ] Towncrier fragment
- [ ] Update `docs/comparing-runs.md` with cohort command section



## Implementation Plan

### Task 1: Create cohort splitting helper

**File**: `src/raki/report/cohort.py` (new)

Write failing tests first in `tests/test_cohort.py`:
- `test_split_by_date_divides_sessions` — 5 sessions, split at midpoint, verify 2 groups
- `test_split_by_date_empty_before` — all sessions after the date → error
- `test_split_by_date_empty_after` — all sessions before the date → error
- `test_split_by_date_with_until` — sessions that start after `until` go to "before"
- `test_split_with_no_sample_results` — empty list → error

Implement:
```python
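from dataclasses import dataclass
from datetime import datetime

# SampleResult comes from raki's existing models.


@dataclass
class CohortSplit:
    before: list[SampleResult]
    after: list[SampleResult]
    before_label: str
    after_label: str


def split_by_date(
    sample_results: list[SampleResult],
    since: datetime,
    until: datetime | None = None,
) -> CohortSplit:
    """Split results into before/after cohorts by session start time."""
    ...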
```

Logic:
1. For each `SampleResult`, check `sample.session.started_at`
2. Sessions with `started_at >= since` (and `started_at <= until` if `until` is set) go to `after`
3. Everything else goes to `before`, including sessions that start after `until`
4. Raise `click.UsageError` if either cohort is empty
5. Generate labels: "Before {since}" / "Since {since}" (or "{since} to {until}" when `until` is set)
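
A minimal sketch of the partitioning in steps 1-4, under stated assumptions: the `Session`, `Sample`, and `SampleResult` classes below are stand-ins for raki's real models, the attribute path `result.sample.session.started_at` is a guess at how the models nest, and `ValueError` replaces `click.UsageError` so the sketch stays stdlib-only. Labels are left out. One thing worth deciding during implementation: a date-only `--until` parses to midnight, so `started_at <= until` as written excludes sessions that start later on the `until` day.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# Hypothetical stand-ins for raki's models; only `started_at` matters here.
@dataclass
class Session:
    started_at: datetime


@dataclass
class Sample:
    session: Session


@dataclass
class SampleResult:
    sample: Sample


def partition_by_date(
    sample_results: list[SampleResult],
    since: datetime,
    until: Optional[datetime] = None,
) -> tuple[list[SampleResult], list[SampleResult]]:
    """Partition results into (before, after) by session start time."""
    before: list[SampleResult] = []
    after: list[SampleResult] = []
    for result in sample_results:
        started_at = result.sample.session.started_at
        in_after = started_at >= since and (until is None or started_at <= until)
        (after if in_after else before).append(result)
    # The plan raises click.UsageError; ValueError is swapped in for this sketch.
    if not before:
        raise ValueError("No sessions found in the 'before' cohort")
    if not after:
        raise ValueError("No sessions found in the 'after' cohort")
    return before, after


def make(day: int, hour: int = 0) -> SampleResult:
    """Build a synthetic result that started on 2026-05-<day> at <hour>:00."""
    return SampleResult(Sample(Session(datetime(2026, 5, day, hour))))


results = [make(day) for day in (10, 11, 12, 13, 14)]
before, after = partition_by_date(results, since=datetime(2026, 5, 12))
print(len(before), len(after))  # 2 3
```

The comparison also assumes `started_at` and `since` are both naive or both timezone-aware; Python raises `TypeError` when the two kinds are mixed.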

### Task 2: Create cohort diff builder

**File**: `src/raki/report/cohort.py`

Write failing tests:
- `test_build_cohort_diff_produces_diff_report` — verify output is a `DiffReport`
- `test_build_cohort_diff_uses_reaggregate` — verify scores come from `reaggregate_scores()`
- `test_build_cohort_diff_labels` — verify `baseline_run_id` and `compare_run_id` are the cohort labels

Implement:
```python
def build_cohort_diff(split: CohortSplit) -> DiffReport:
    ...
```

Logic:
1. Call `reaggregate_scores(split.before)` and `reaggregate_scores(split.after)`
2. Call `compute_deltas(before_scores, after_scores)`
3. Build `DiffReport` with:
   - `baseline_run_id = split.before_label`
   - `compare_run_id = split.after_label`
   - `match_result` with `baseline_total=len(split.before)`, `compare_total=len(split.after)`, and no session matching (the cohorts contain different sessions)
   - `has_session_data = False` (no per-session transitions for cohort comparison)
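
The data flow above can be sketched with stand-ins. Everything here is hypothetical: `reaggregate_stub` and `deltas_stub` only imitate the roles of `reaggregate_scores()` and `compute_deltas()` (their real signatures and aggregation rules are not shown in this issue), per-session scores are assumed to be plain `dict[str, float]`, and a dict stands in for `DiffReport`.

```python
from statistics import mean


def reaggregate_stub(per_session_scores: list[dict[str, float]]) -> dict[str, float]:
    """Stand-in for reaggregate_scores(): the mean of each metric across sessions."""
    metrics = sorted({name for scores in per_session_scores for name in scores})
    return {
        name: mean(s[name] for s in per_session_scores if name in s)
        for name in metrics
    }


def deltas_stub(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Stand-in for compute_deltas(): after minus before, for shared metrics only."""
    return {name: after[name] - before[name] for name in before if name in after}


def cohort_diff_sketch(
    before_scores: list[dict[str, float]],
    after_scores: list[dict[str, float]],
    before_label: str,
    after_label: str,
) -> dict:
    """Assemble the fields the plan sets on DiffReport, as a plain dict."""
    return {
        "baseline_run_id": before_label,
        "compare_run_id": after_label,
        "baseline_total": len(before_scores),
        "compare_total": len(after_scores),
        "has_session_data": False,  # no per-session transitions between cohorts
        "deltas": deltas_stub(
            reaggregate_stub(before_scores), reaggregate_stub(after_scores)
        ),
    }


diff = cohort_diff_sketch(
    [{"first_pass": 0.25}, {"first_pass": 0.75}],
    [{"first_pass": 1.0}],
    "Before 2026-05-12",
    "Since 2026-05-12",
)
print(diff["deltas"])  # {'first_pass': 0.5}
```

The point of the sketch is the ordering: aggregate each cohort separately first, then diff the two aggregates, so older sessions never dilute the "after" numbers.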

### Task 3: Add `raki cohort` CLI command

**File**: `src/raki/cli.py`

Add a new Click command under `main`:

```python
@main.command()
@click.argument("input_path")
@click.option("--since", required=True, type=click.DateTime(formats=["%Y-%m-%d"]))
@click.option("--until", type=click.DateTime(formats=["%Y-%m-%d"]), default=None)
@click.option("--html", "html_path", default=None)
@click.option("--fail-on-regression", is_flag=True)
@click.option("--json", "json_output", is_flag=True)
@click.option("-q", "--quiet", is_flag=True)
def cohort(input_path, since, until, html_path, fail_on_regression, json_output, quiet):
    """Split a saved report by date and diff the two cohorts."""
```

Logic:
1. Load report via `load_json_report()`
2. Validate `sample_results` is not empty
3. Call `split_by_date(report.sample_results, since, until)`
4. Print small-n warning if either cohort has < 10 sessions
5. Call `build_cohort_diff(split)`
6. If not quiet: `print_diff_summary(diff_report)`
7. If html_path: `write_diff_html_report(diff_report, html_path)`
8. If json_output: serialize diff to stdout
9. If fail_on_regression: reuse `_handle_diff` regression logic
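
Step 4 could be factored into a small helper so the threshold is testable without invoking the CLI. A minimal sketch, assuming a hypothetical `small_n_warnings` function and message wording; only the threshold of 10 comes from the acceptance criteria.

```python
SMALL_N_THRESHOLD = 10  # from the acceptance criteria: warn below 10 sessions


def small_n_warnings(
    cohort_sizes: dict[str, int], threshold: int = SMALL_N_THRESHOLD
) -> list[str]:
    """Return one warning per cohort whose session count is below the threshold."""
    return [
        f"Warning: cohort '{label}' has only {size} session(s); "
        f"deltas may be noisy (n < {threshold})"
        for label, size in cohort_sizes.items()
        if size < threshold
    ]


warnings = small_n_warnings({"Before 2026-05-12": 70, "Since 2026-05-12": 4})
print(len(warnings))  # 1
```

In the command, each returned string would presumably go to stderr (for example via `click.echo(..., err=True)`) so that `--json` output on stdout stays machine-readable.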

### Task 4: Update comparing-runs docs

**File**: `docs/comparing-runs.md`

Add a "Cohort comparison" section after the existing diff workflow:

```markdown
## Cohort comparison within a single report

Split sessions by date to compare before/after a pipeline change:

    raki cohort results/report.json --since 2026-05-12

This splits sessions into "Before 2026-05-12" and "Since 2026-05-12"
cohorts and produces the same diff output as `raki report --diff`.
```

### Task 5: Towncrier fragment

`changes/260.feature`:
```
Add ``raki cohort`` command for date-based before/after comparison within a
single report. Splits sessions by ``--since`` date, reaggregates metrics per
cohort, and produces the same diff output as ``raki report --diff``.
```

### Verification

```bash
uv run pytest tests/test_cohort.py -v
uv run pytest tests/ -v -m "not slow"
uv run ruff check src/ tests/
uv run ty check src/raki/
```
