feat: add soft quality scoring hints for RunContract status #100

@devkade

Status

Medium implementation issue. Start after the RunContract projection (#98) and evidence/completion primitives (#99).

Summary

Add advisory ScoringHint / quality-signal primitives to RunContract status so supervisors can see what needs attention without treating the score as final authority or replacing existing validation gates.

Problem

Kapi has mode-specific qualityProbe readiness for Ralph and Autoresearch, but RunContract needs a generic advisory signal over evidence, artifacts, completion, freshness, and risk. If that signal is modeled as a hard gate or an opaque numeric score too early, it can conflict with existing validation and with human/workflow authority.

Scope

  • Introduce generic advisory quality hint primitives.
  • Compute simple transparent hints from generic RunContract state, evidence, artifacts, completion criteria, and risks.
  • Include reason strings beside each hint so the result is inspectable.
  • Keep score/hint metadata fail-open where possible: bad or missing scoring metadata should not break the underlying run.
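
A minimal sketch of what the scope items could look like together, assuming Python and entirely hypothetical names (`HintLevel`, `ScoringHint`, `evidence_hint`, `compute_hints`, and the `required_evidence` / `evidence` state keys are illustrative, not Kapi's actual API). Each hint carries a reason string, and a rule that fails on bad or missing metadata degrades to `unknown` instead of raising into the run:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Mapping


class HintLevel(str, Enum):
    OK = "ok"
    ATTENTION = "attention"
    BLOCKED = "blocked"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ScoringHint:
    dimension: str      # generic dimension name, e.g. "evidence"
    level: HintLevel    # advisory level only, never a gate
    reason: str         # short, stable, human-readable explanation


def evidence_hint(state: Mapping[str, Any]) -> ScoringHint:
    """Hypothetical rule: compare required evidence with what is present."""
    required = state["required_evidence"]  # raises KeyError if metadata is missing
    present = set(state.get("evidence", []))
    missing = [name for name in required if name not in present]
    if missing:
        return ScoringHint("evidence", HintLevel.ATTENTION,
                           "missing evidence: " + ", ".join(missing))
    return ScoringHint("evidence", HintLevel.OK, "all required evidence present")


Rule = Callable[[Mapping[str, Any]], ScoringHint]


def compute_hints(state: Mapping[str, Any], rules: Dict[str, Rule]) -> List[ScoringHint]:
    """Run every rule; a failing rule yields an 'unknown' hint (fail-open)."""
    hints: List[ScoringHint] = []
    for dimension, rule in rules.items():
        try:
            hints.append(rule(state))
        except Exception as exc:
            # Fail-open: report that the hint is unavailable, do not propagate.
            hints.append(ScoringHint(dimension, HintLevel.UNKNOWN,
                                     "hint unavailable: " + type(exc).__name__))
    return hints
```

Calling `compute_hints({}, {"evidence": evidence_hint})` in this sketch returns a single `unknown` hint rather than raising, which is the fail-open behavior the last scope item asks for.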

Non-goals

  • No hard approval gate for merge, PR review, tracker closure, or workflow completion.
  • No replacement for kapi-agent review or existing mode-specific validation.
  • No opaque ML/ranking system.
  • No GitHub/PR/kapi-agent-specific gate names in core scoring.

Design notes

  • Prefer reason-first hints such as ok, attention, blocked, or unknown before adding numeric scores.
  • Keep existing qualityProbe behavior intact; RunContract quality hints should be generic and advisory.
  • If reusing existing quality-probe concepts, avoid making Ralph/Autoresearch dimensions mandatory for all runs.
  • Quality hints should not become Goodhart targets; they should explain evidence-backed readiness rather than optimize for superficial score passing.
  • Hints are advisory evaluator outputs, not completion authority.
  • Simplicity/risk hints should flag abstraction bloat, unresolved decisions, stale context, and missing evidence where applicable.
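
The stale-context and unresolved-decision notes above might be sketched as two small reason-first rules. Everything here is an assumption for illustration: the `Hint` shape, the function names, and the 24-hour freshness threshold are hypothetical, and the current time is passed in explicitly so the result stays deterministic in tests:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Hint:
    dimension: str
    level: str   # "ok" | "attention" | "blocked" | "unknown"
    reason: str


def freshness_hint(last_evidence_at: Optional[datetime],
                   now: datetime,
                   max_age: timedelta = timedelta(hours=24)) -> Hint:
    """Flag stale context; the 24h default is an illustrative assumption."""
    if last_evidence_at is None:
        return Hint("freshness", "unknown", "no evidence timestamp recorded")
    age = now - last_evidence_at
    if age > max_age:
        hours = int(age.total_seconds() // 3600)
        return Hint("freshness", "attention",
                    f"latest evidence is {hours}h old; re-verify before relying on it")
    return Hint("freshness", "ok", "evidence is recent")


def decisions_hint(unresolved: List[str]) -> Hint:
    """Flag unresolved decisions as a generic risk signal."""
    if unresolved:
        return Hint("risk", "attention",
                    f"{len(unresolved)} unresolved decision(s): " + ", ".join(unresolved))
    return Hint("risk", "ok", "no unresolved decisions")
```

Both rules return a level plus a reason and no number, so there is nothing to optimize against except the underlying evidence itself.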

Acceptance criteria

  • A RunContract status can include advisory quality hints with stable concise reasons.
  • Missing or incomplete evidence lowers or flags readiness, and the hint describes it in generic wording.
  • Tests cover complete, incomplete, stale/missing-evidence, and unknown states.
  • Existing completion semantics remain authoritative; scoring is advisory unless an adapter explicitly interprets it later.
  • Core scoring does not encode GitHub, PR, kapi-agent, Discord, or Ragna-specific gate names.
  • Quality hints include reason text that discourages superficial score-passing and preserves evidence-backed completion as the real target.
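
One way to keep completion authoritative while still surfacing hints is to derive completion from the criteria alone and attach hints as a separate field. The sketch below assumes hypothetical names (`RunContractStatus`, `criteria_met`, and the `qualityHints` JSON key are illustrative) and a simple "all criteria met" completion rule that stands in for whatever the existing semantics are:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Hint:
    dimension: str
    level: str   # "ok" | "attention" | "blocked" | "unknown"
    reason: str


@dataclass
class RunContractStatus:
    criteria_met: Dict[str, bool]
    hints: List[Hint] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        # Completion depends on the criteria only; hints are never consulted.
        # Treating "no criteria" as incomplete is an assumption of this sketch.
        return bool(self.criteria_met) and all(self.criteria_met.values())

    def to_json(self) -> str:
        # Hints travel beside the completion flag as advisory data.
        return json.dumps(
            {"complete": self.complete,
             "qualityHints": [asdict(h) for h in self.hints]},
            sort_keys=True,
        )
```

In this shape a `blocked` hint cannot flip `complete` to false, and an `ok` hint cannot flip it to true; an adapter that wants to act on hints has to do so explicitly outside the core status.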

Verification

  • Run targeted quality hint/scoring tests.
  • Run affected quality-probe/domain tests.
  • Inspect JSON fixtures or snapshots for generic wording and stable reason strings.
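
The fixture inspection could be partly automated with a small check that scans reason strings for adapter-specific terms. This is a sketch under stated assumptions: the term list is taken from the acceptance criteria, while the `qualityHints` key, the fixture layout, and the helper name are hypothetical:

```python
import json
import re
from typing import List, Tuple

# Terms that core scoring should not mention, per the acceptance criteria.
FORBIDDEN = re.compile(r"\b(github|pr|kapi-agent|discord|ragna)\b", re.IGNORECASE)


def find_specific_wording(fixture_json: str) -> List[Tuple[str, str]]:
    """Return (reason, matched term) pairs for every adapter-specific term found."""
    status = json.loads(fixture_json)
    violations: List[Tuple[str, str]] = []
    for hint in status.get("qualityHints", []):
        reason = hint.get("reason", "")
        for match in FORBIDDEN.finditer(reason):
            violations.append((reason, match.group(0).lower()))
    return violations
```

A test can then assert that `find_specific_wording(fixture)` is empty for every core fixture. The word-boundary match keeps ordinary words such as "approved" from tripping the `pr` term.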

Labels

enhancement: New feature or request
