Scout: Statistical Evaluation Framework (from eval-robuste) #44

@bacoco

Description

Statistical Evaluation Framework

Source: Alexmacapple/alex-claude-skill
Score: 4.3/5.0 (Impact: 4, Novelty: 5, Applicability: 4, Effort: 4)

Three techniques from eval-robuste

1. Prompt Hash Chaining for Baseline Integrity
Hash the full evaluation chain (SKILL.md, zone definitions, agent prompts, and the git SHA) and store the hash alongside the results. Reject any comparison when the hash differs; this prevents mistaking prompt changes for code improvements.
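A minimal sketch of what prompt_hash.py could look like, using only the standard library. The function names and the exact set of hashed inputs are assumptions, not the eval-robuste implementation:

```python
# Hypothetical sketch of prompt_hash.py (names are assumptions).
import hashlib
import subprocess
from pathlib import Path


def evaluation_chain_hash(paths, repo_dir="."):
    """Hash the concatenated evaluation inputs plus the current git SHA."""
    h = hashlib.sha256()
    for p in sorted(paths):  # sort so ordering cannot change the hash
        h.update(Path(p).read_bytes())
    try:
        sha = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            stderr=subprocess.DEVNULL,
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        sha = b"no-git"  # still hash something stable outside a repo
    h.update(sha)
    return h.hexdigest()


def comparable(baseline_hash, current_hash):
    """Refuse to compare runs produced by different evaluation chains."""
    return baseline_hash == current_hash
```

Stored in each result file, the hash makes stale baselines self-invalidating: a changed SKILL.md or prompt yields a different hash, and `comparable()` returns False.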

2. Deterministic Stats Delegation
All math (mean, stdev, confidence intervals, verdict) is delegated to a Python script; the LLM never computes numbers. Any arithmetic performed by the LLM is a protocol violation. Add score_audit.py to ShipGuard.
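A possible shape for the deterministic aggregation in score_audit.py, stdlib only. The function name and the normal-approximation 95% CI are assumptions; the point is that every number comes out of this script, never out of the LLM:

```python
# Hypothetical sketch of score_audit.py aggregation (names are assumptions).
import math
import statistics


def aggregate(scores, z=1.96):
    """Deterministically compute mean, sample stdev, and a normal-approx 95% CI."""
    n = len(scores)
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores) if n > 1 else 0.0
    margin = z * stdev / math.sqrt(n) if n > 1 else 0.0
    return {
        "n": n,
        "mean": mean,
        "stdev": stdev,
        "ci95": (mean - margin, mean + margin),
    }
```

The LLM's only job is to produce the raw scores; it hands them to this script and reports the returned dict verbatim.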

3. Intrinsic Stability Verdict (NOISE/REGRESSION/IMPROVEMENT)
Run the audit N times and compute the stdev of the finding counts; report STABLE if stdev < threshold. With a baseline, classify the delta as NOISE, REGRESSION, or IMPROVEMENT depending on whether |delta| > sigma * ref_stdev and on its sign. This prevents chasing phantom regressions caused by LLM stochasticity.
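The verdict logic above could be sketched as follows. The default thresholds, the sigma value, and the sign convention (more findings = worse) are assumptions for illustration:

```python
# Hypothetical sketch of the stability verdict (thresholds are assumptions).
import statistics


def verdict(run_counts, baseline_mean=None, ref_stdev=None,
            sigma=2.0, stable_threshold=1.0):
    """Classify N-run finding counts: STABLE/UNSTABLE intrinsically,
    or NOISE/REGRESSION/IMPROVEMENT against a baseline."""
    mean = statistics.mean(run_counts)
    stdev = statistics.stdev(run_counts) if len(run_counts) > 1 else 0.0
    if baseline_mean is None:
        return "STABLE" if stdev < stable_threshold else "UNSTABLE"
    delta = mean - baseline_mean
    if abs(delta) <= sigma * ref_stdev:
        return "NOISE"  # within expected run-to-run variation
    # Assumption: a higher finding count means the audit got worse.
    return "REGRESSION" if delta > 0 else "IMPROVEMENT"
```

With sigma = 2, a delta only counts as real when it exceeds two reference standard deviations; anything smaller is reported as NOISE instead of triggering an investigation.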

What ShipGuard should do

Build a sg-eval skill or extend sg-improve with:

  • prompt_hash.py — hash evaluation chain, store in results, reject invalid comparisons
  • score_audit.py — deterministic aggregation (no LLM math)
  • N-run stability detection with NOISE/REGRESSION/IMPROVEMENT verdicts

Affected skill: sg-improve
Mutation type: add_constraint
Scouted: 2026-04-16

Metadata

Assignees: no one assigned
Labels: enhancement (New feature or request)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
