Status
Medium implementation issue. Start after the RunContract projection (#98) and evidence/completion primitives (#99).
Summary
Add advisory ScoringHint / quality-signal primitives to RunContract status so supervisors can see what needs attention without treating the score as final authority or replacing existing validation gates.
Problem
Kapi has mode-specific qualityProbe readiness for Ralph and Autoresearch, but RunContract needs a generic advisory signal over evidence, artifacts, completion, freshness, and risk. If this is modeled as a hard gate or opaque numeric score too early, it can conflict with existing validation and human/workflow authority.
Scope
- Introduce generic advisory quality hint primitives.
- Compute simple transparent hints from generic RunContract state, evidence, artifacts, completion criteria, and risks.
- Include reason strings beside each hint so the result is inspectable.
- Keep score/hint metadata fail-open where possible: bad or missing scoring metadata should not break the underlying run.
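The scope above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Kapi implementation: the fields of `ScoringHint`, the `RunContractState` shape, and the `computeHints` and `computeHintsFailOpen` helpers are hypothetical names invented for this example. Only the status vocabulary (ok, attention, blocked, unknown) comes from the design notes in this issue.

```typescript
// Hypothetical shapes for illustration only; real RunContract types may differ.
type HintStatus = "ok" | "attention" | "blocked" | "unknown";

interface ScoringHint {
  dimension: string; // e.g. "evidence", "completion", "risk"
  status: HintStatus;
  reasons: string[]; // human-readable reason strings, kept stable for snapshots
}

// Assumed minimal view of generic run state.
interface RunContractState {
  evidenceCount: number;
  completionCriteria: { id: string; met: boolean }[];
  openRisks: string[];
}

// Derive one transparent hint per dimension, each with its reasons.
function computeHints(state: RunContractState): ScoringHint[] {
  const hints: ScoringHint[] = [];

  hints.push(
    state.evidenceCount > 0
      ? { dimension: "evidence", status: "ok", reasons: ["evidence recorded"] }
      : { dimension: "evidence", status: "attention", reasons: ["no evidence recorded"] },
  );

  const unmet = state.completionCriteria.filter((c) => !c.met);
  if (state.completionCriteria.length === 0) {
    hints.push({ dimension: "completion", status: "unknown", reasons: ["no completion criteria declared"] });
  } else if (unmet.length > 0) {
    hints.push({
      dimension: "completion",
      status: "attention",
      reasons: unmet.map((c) => `criterion not met: ${c.id}`),
    });
  } else {
    hints.push({ dimension: "completion", status: "ok", reasons: ["all completion criteria met"] });
  }

  hints.push(
    state.openRisks.length > 0
      ? { dimension: "risk", status: "attention", reasons: state.openRisks.map((r) => `open risk: ${r}`) }
      : { dimension: "risk", status: "ok", reasons: ["no open risks"] },
  );

  return hints;
}

// Fail-open wrapper: bad or missing input degrades to a single "unknown" hint
// instead of throwing, so the underlying run is never broken by scoring.
function computeHintsFailOpen(state: unknown): ScoringHint[] {
  try {
    return computeHints(state as RunContractState);
  } catch {
    return [{ dimension: "scoring", status: "unknown", reasons: ["hint computation failed"] }];
  }
}
```

In this sketch a caller would always go through `computeHintsFailOpen`, so malformed scoring input surfaces as an inspectable unknown hint rather than an error.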
Non-goals
- No hard approval gate for merge, PR review, tracker closure, or workflow completion.
- No replacement for kapi-agent review or existing mode-specific validation.
- No opaque ML/ranking system.
- No GitHub/PR/kapi-agent-specific gate names in core scoring.
Design notes
- Prefer reason-first hints such as ok, attention, blocked, or unknown before adding numeric scores.
- Keep existing qualityProbe behavior intact; RunContract quality hints should be generic and advisory.
- If reusing existing quality-probe concepts, avoid making Ralph/Autoresearch dimensions mandatory for all runs.
- Quality hints should not become Goodhart targets; they should explain evidence-backed readiness rather than optimize for superficial score passing.
- Hints are advisory evaluator outputs, not completion authority.
- Simplicity/risk hints should flag abstraction bloat, unresolved decisions, stale context, and missing evidence where applicable.
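One way to read these notes is a reason-first summary that reports the most severe status together with every non-ok reason, and deliberately returns no numeric score and no completion decision. The sketch below is standalone and hypothetical: `AdvisoryHint`, `SEVERITY`, `freshnessHint`, and `summarize` are names made up for this example, and the severity ordering is an assumption rather than something this issue specifies.

```typescript
// Standalone hypothetical types for this example.
type AdvisoryStatus = "ok" | "attention" | "blocked" | "unknown";

interface AdvisoryHint {
  dimension: string;
  status: AdvisoryStatus;
  reasons: string[];
}

// Assumed ordering: which status "wins" when hints are summarized.
const SEVERITY: Record<AdvisoryStatus, number> = { ok: 0, unknown: 1, attention: 2, blocked: 3 };

// Hypothetical stale-context check: context older than maxAgeMs needs attention.
function freshnessHint(lastUpdatedMs: number, nowMs: number, maxAgeMs: number): AdvisoryHint {
  if (!Number.isFinite(lastUpdatedMs)) {
    return { dimension: "freshness", status: "unknown", reasons: ["no last-updated timestamp"] };
  }
  return nowMs - lastUpdatedMs > maxAgeMs
    ? { dimension: "freshness", status: "attention", reasons: ["context is stale"] }
    : { dimension: "freshness", status: "ok", reasons: ["context is fresh"] };
}

// Summarize hints as the most severe status plus all non-ok reasons.
// The result is advisory: it carries no score and no completion verdict.
function summarize(hints: AdvisoryHint[]): { status: AdvisoryStatus; reasons: string[] } {
  let status: AdvisoryStatus = hints.length === 0 ? "unknown" : "ok";
  const reasons: string[] = [];
  for (const h of hints) {
    if (SEVERITY[h.status] > SEVERITY[status]) status = h.status;
    if (h.status !== "ok") {
      for (const r of h.reasons) reasons.push(`${h.dimension}: ${r}`);
    }
  }
  return { status, reasons };
}
```

Because the summary is a status plus reasons rather than a number, a supervisor sees why a run needs attention, which leaves less room to optimize for a passing score.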
Acceptance criteria
Verification
- Run targeted quality hint/scoring tests.
- Run affected quality-probe/domain tests.
- Inspect JSON fixtures or snapshots for generic wording and stable reason strings.
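The fixture inspection could be partly automated with a small check like the one below. It is only a sketch: `SPECIFIC_TERMS` and `findNonGenericReasons` are hypothetical, and the deny-list of terms is an assumption for illustration, since the actual set of integration- or mode-specific names is a project decision.

```typescript
// Hypothetical deny-list of integration- or mode-specific terms (assumption).
const SPECIFIC_TERMS = /\b(github|pull request|pr|kapi-agent|ralph|autoresearch)\b/i;

// Return the reason strings that mention a non-generic term.
function findNonGenericReasons(reasons: string[]): string[] {
  return reasons.filter((r) => SPECIFIC_TERMS.test(r));
}
```

A test could load the reason strings from a snapshot and assert that `findNonGenericReasons` returns an empty array.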
References
- src/domain/quality-probe.ts
- test/quality-probe-matrix.test.ts
- test/quality-integration.test.ts