Design: define reward calculation, penalties, PolicyHint, and calibration #189

@devkade

Parent: #167
Related: #172, #120

Summary

Define how Ilchul converts EvaluationResult and runtime outcomes into RewardRecord records, penalties, PolicyHint values, and simulator calibration inputs.

Scope

Define:

  • reward calculation formula;
  • metric-to-reward mapping;
  • penalty taxonomy;
  • PolicyHint schema;
  • prediction-vs-actual comparison;
  • calibration data model;
  • anti-Goodhart checks;
  • human-approved objective-weight calibration flow.
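One possible shape for the reward calculation and metric-to-reward mapping, as a minimal sketch only: a weighted sum of metrics minus penalties. The field names on `RewardRecord` and `Penalty`, the metric names, the weights, and the `"timeout"` penalty kind are all hypothetical placeholders; the actual formula and taxonomy are what this issue is meant to define.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence, Tuple


@dataclass(frozen=True)
class Penalty:
    kind: str      # hypothetical taxonomy label, e.g. "timeout"
    amount: float  # non-negative value subtracted from the reward


@dataclass(frozen=True)
class RewardRecord:
    # Hypothetical fields; the real schema is not yet specified.
    run_id: str
    base: float
    penalties: Tuple[Penalty, ...]
    total: float


def compute_reward(
    run_id: str,
    metrics: Mapping[str, float],
    weights: Mapping[str, float],
    penalties: Sequence[Penalty],
) -> RewardRecord:
    # Weighted sum over the metrics that have a configured objective weight;
    # metrics without a weight are ignored rather than guessed at.
    base = sum(weights[name] * value
               for name, value in metrics.items() if name in weights)
    # Penalties are kept on the record so the total stays explainable.
    total = base - sum(p.amount for p in penalties)
    return RewardRecord(run_id, base, tuple(penalties), total)


record = compute_reward(
    "run-1",
    metrics={"correctness": 1.0, "latency_score": 0.5},
    weights={"correctness": 0.5, "latency_score": 0.5},
    penalties=[Penalty("timeout", 0.25)],
)
print(record.base, record.total)  # 0.75 0.5
```

Keeping `base` and the individual penalties on the record, rather than only the total, is one way to make later prediction-vs-actual comparison and anti-Goodhart review easier.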

Non-goals

  • No automatic objective weight mutation.
  • No runtime plugin/module retirement behavior.
  • No hidden hard-blocking based on reward alone.
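The non-goals above could be reflected directly in the data model. The following is a sketch under stated assumptions, not the intended design: `PolicyHint` carries no blocking capability, and objective weights change only through an explicitly approved proposal. `WeightProposal`, `apply_proposal`, and every field name here are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class PolicyHint:
    # Hypothetical schema: a hint is advisory and has no "block" field,
    # so reward alone cannot hard-block anything.
    target: str
    suggestion: str
    reward: float


@dataclass(frozen=True)
class WeightProposal:
    proposed: Mapping[str, float]
    approved_by: Optional[str] = None  # set only by a human reviewer


def apply_proposal(
    current: Mapping[str, float], proposal: WeightProposal
) -> Mapping[str, float]:
    # Refuse any change that lacks an explicit human approver.
    if not proposal.approved_by:
        raise PermissionError("objective weights change only with human approval")
    # Return a new read-only mapping; the current weights are never mutated.
    return MappingProxyType({**current, **proposal.proposed})


weights = {"correctness": 0.5, "latency_score": 0.5}
approved = WeightProposal({"correctness": 0.75}, approved_by="reviewer")
new_weights = apply_proposal(weights, approved)
```

Here an unapproved `WeightProposal` raises `PermissionError`, and the original `weights` dictionary is left untouched even when a proposal is approved.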

Acceptance criteria

  • Reward formula is documented.
  • Penalty taxonomy is documented.
  • PolicyHint schema is documented.
  • Calibration flow from prediction-vs-actual is described.
  • Anti-Goodhart checks are tied to docs/runcontract-harness-evaluator.md.
  • Human-approved objective calibration is explicitly required.
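For the prediction-vs-actual criterion, a calibration record might pair each predicted reward with the observed one and summarize the error. This is an illustrative sketch only: `CalibrationSample`, `summarize`, and the choice of bias and mean absolute error as summary statistics are assumptions, not part of the issue.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence


@dataclass(frozen=True)
class CalibrationSample:
    # Hypothetical calibration data model: one row per evaluated run.
    run_id: str
    predicted: float  # reward the simulator predicted
    actual: float     # reward computed from the real outcome

    @property
    def error(self) -> float:
        return self.predicted - self.actual


def summarize(samples: Sequence[CalibrationSample]) -> Dict[str, float]:
    errors = [s.error for s in samples]
    return {
        "bias": mean(errors),                  # signed: over- vs under-prediction
        "mae": mean([abs(e) for e in errors]),  # magnitude of miscalibration
    }


summary = summarize([
    CalibrationSample("run-1", predicted=1.0, actual=0.5),
    CalibrationSample("run-2", predicted=0.5, actual=1.0),
])
print(summary["bias"], summary["mae"])  # 0.0 0.5
```

A summary like this would be an input to a human reviewer, consistent with the requirement that objective calibration is human-approved rather than applied automatically.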

Verification

Metadata


Labels: enhancement (New feature or request)
