Design: define reward calculation, penalties, PolicyHint, and calibration
Parent: #167
Related: #172, #120
Summary
Define how Ilchul converts EvaluationResult and runtime outcomes into RewardRecord entries, penalties, PolicyHint values, and simulator calibration inputs.
Scope
Define:
- reward calculation formula;
- metric-to-reward mapping;
- penalty taxonomy;
- PolicyHint schema;
- prediction-vs-actual comparison;
- calibration data model;
- anti-Goodhart checks;
- human-approved objective-weight calibration flow.
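To make the scope items concrete, here is a minimal sketch of one possible shape for the reward calculation, assuming a weighted sum of normalized metrics minus summed penalties. Only the name RewardRecord comes from this issue; every field, the `Penalty` type, the example labels, and the helper functions are hypothetical placeholders for what the design still has to define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Penalty:
    # Hypothetical penalty entry; the real taxonomy is a deliverable of this issue.
    kind: str      # illustrative label such as "timeout"
    amount: float  # non-negative value subtracted from the reward


@dataclass(frozen=True)
class RewardRecord:
    # Assumed fields; the issue names RewardRecord but does not fix its shape.
    metric_scores: dict
    penalties: tuple
    reward: float


def compute_reward(metrics, weights, penalties):
    """Assumed formula: weighted sum of metrics minus the total penalty."""
    missing = set(weights) - set(metrics)
    if missing:
        raise ValueError(f"no metric value for weighted objectives: {sorted(missing)}")
    weighted = sum(weights[name] * metrics[name] for name in weights)
    total_penalty = sum(p.amount for p in penalties)
    return RewardRecord(dict(metrics), tuple(penalties), weighted - total_penalty)


def prediction_error(predicted, actual):
    """Signed prediction-vs-actual gap, usable as a calibration input."""
    return actual - predicted


def goodhart_suspect(reward_delta, holdout_delta):
    """One possible anti-Goodhart heuristic: the reward improved while a
    metric that is not part of the reward got worse."""
    return reward_delta > 0 and holdout_delta < 0


record = compute_reward(
    metrics={"correctness": 1.0, "latency": 0.5},
    weights={"correctness": 0.75, "latency": 0.25},
    penalties=[Penalty("timeout", 0.25)],
)
print(record.reward)
```

Keeping the per-metric scores and the individual penalties on the record, rather than only the final number, is a design choice that would let the prediction-vs-actual comparison and any later weight calibration be recomputed from stored records.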
Non-goals
- No automatic objective weight mutation.
- No runtime plugin/module retirement behavior.
- No hidden hard-blocking based on reward alone.
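The non-goals above can be read as constraints on the data model. The sketch below illustrates one way to encode them, assuming PolicyHint is purely advisory and that weight changes travel as proposals that stay inert until a human signs off. All field names, `WeightProposal`, `apply_proposal`, `hint_from_reward`, and the "deprioritize" label are hypothetical and not taken from the issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyHint:
    # Assumed shape: a visible, advisory suggestion with its rationale attached.
    target: str
    suggestion: str
    reason: str
    blocking: bool = False  # stays False: reward alone must not hard-block


@dataclass(frozen=True)
class WeightProposal:
    # Hypothetical container for a proposed objective-weight change.
    proposed: dict
    approved_by: Optional[str] = None  # filled in only by a human reviewer


def apply_proposal(current, proposal):
    """Return updated weights only when a human has approved the proposal."""
    if proposal.approved_by is None:
        return dict(current)  # unapproved proposals leave weights unchanged
    return {**current, **proposal.proposed}


def hint_from_reward(target, reward, threshold):
    """Emit an advisory hint for a low reward; never a blocking decision."""
    if reward >= threshold:
        return None
    reason = f"reward {reward:.3f} is below threshold {threshold:.3f}"
    return PolicyHint(target, "deprioritize", reason)


weights = {"correctness": 0.75, "latency": 0.25}
pending = WeightProposal({"latency": 0.5})
approved = WeightProposal({"latency": 0.5}, approved_by="reviewer")
print(apply_proposal(weights, pending))
print(apply_proposal(weights, approved))
```

Because `apply_proposal` returns a new dictionary instead of mutating its input, the current weights cannot change as a side effect of calibration code merely constructing a proposal.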
Acceptance criteria
docs/runcontract-harness-evaluator.md.
Verification