Feature: add objective evaluation and reward-ledger learning loop

# Feature: add objective evaluation and reward-ledger learning loop

Parent: #167
Related: #114, #120

## Summary

Add ObjectiveFunction, EvaluationResult, and RewardLedger primitives so Ilchul can evaluate execution outcomes against explicit goals and record prediction-vs-actual data for future policy improvement.

This is the core learning loop:

```text
Policy chooses action
Runtime executes action
Evaluation scores result
Reward ledger records outcome
Simulator calibrates predictions
Policy hints improve future selection
```
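The loop above can be sketched as a single step function. This is only an illustration: `run_learning_step`, `calibration_bias`, the dict-based ledger entries, and the choice to use the evaluation score directly as the reward are all assumptions, not names or behavior defined by this issue.

```python
from statistics import mean
from typing import Any, Callable


def run_learning_step(
    policy: Callable[[list[dict]], str],
    predict: Callable[[str], float],
    execute: Callable[[str], Any],
    evaluate: Callable[[Any], float],
    ledger: list[dict],
) -> dict:
    """One pass through the loop; every callable is a hypothetical stand-in."""
    action = policy(ledger)        # Policy chooses action (may read past entries)
    predicted = predict(action)    # Simulator prediction made before execution
    outcome = execute(action)      # Runtime executes action
    actual = evaluate(outcome)     # Evaluation scores result
    entry = {
        "action": action,
        "predicted": predicted,
        "actual": actual,
        "reward": actual,          # assumption: reward equals the evaluation score
    }
    ledger.append(entry)           # Reward ledger records outcome (append-only)
    return entry


def calibration_bias(ledger: list[dict]) -> float:
    """Mean (actual - predicted); a negative value means the simulator is optimistic."""
    if not ledger:
        return 0.0
    return mean(e["actual"] - e["predicted"] for e in ledger)
```

In this sketch the bias value is advisory input for simulator calibration and policy hints; nothing here changes runtime behavior on its own.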

## ObjectiveFunction

An ObjectiveFunction should include:

- goal;
- acceptance criteria;
- metrics;
- weights;
- guardrails;
- stop conditions;
- pass threshold;
- repair threshold.
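A minimal sketch of how these fields could be modeled, assuming Python dataclasses. The field names mirror the list above; the `[0, 1]` score scale, the default thresholds, and the `weighted_score` helper are illustrative assumptions rather than part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectiveFunction:
    """Hypothetical shape; field names follow the list in this issue."""

    goal: str
    acceptance_criteria: tuple[str, ...]
    metrics: tuple[str, ...]
    weights: dict[str, float]              # metric name -> relative weight
    guardrails: tuple[str, ...] = ()
    stop_conditions: tuple[str, ...] = ()
    pass_threshold: float = 0.8            # assumed scale: scores in [0, 1]
    repair_threshold: float = 0.5          # assumed default, not from the issue

    def __post_init__(self) -> None:
        # Every declared metric needs a weight, and the weights must be usable.
        missing = set(self.metrics) - set(self.weights)
        if missing:
            raise ValueError(f"metrics without weights: {sorted(missing)}")
        if sum(self.weights[m] for m in self.metrics) <= 0:
            raise ValueError("metric weights must sum to a positive value")
        if not 0.0 <= self.repair_threshold <= self.pass_threshold <= 1.0:
            raise ValueError("expected 0 <= repair_threshold <= pass_threshold <= 1")

    def weighted_score(self, metric_scores: dict[str, float]) -> float:
        """Weight-normalized average of the per-metric scores."""
        total = sum(self.weights[m] for m in self.metrics)
        return sum(self.weights[m] * metric_scores[m] for m in self.metrics) / total
```

Making the dataclass frozen is one way to reflect the non-goal that weights are not mutated automatically: changing them means constructing a new, reviewed objective.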

## EvaluationResult

An EvaluationResult should include:

- target type: task, run, integration, or worker;
- score;
- verdict: pass, revise, fail, blocked;
- metric scores;
- guardrail violations;
- reasons;
- required repairs;
- evidence refs.
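One possible shape for this model, again assuming Python dataclasses. The enum values come from the list above; the class names `TargetType` and `Verdict`, the snake_case field names, and the `to_dict` helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class TargetType(str, Enum):
    TASK = "task"
    RUN = "run"
    INTEGRATION = "integration"
    WORKER = "worker"


class Verdict(str, Enum):
    PASS = "pass"
    REVISE = "revise"
    FAIL = "fail"
    BLOCKED = "blocked"


@dataclass
class EvaluationResult:
    """Hypothetical record of one evaluation; names are illustrative."""

    target_type: TargetType
    score: float
    verdict: Verdict
    metric_scores: dict[str, float] = field(default_factory=dict)
    guardrail_violations: list[str] = field(default_factory=list)
    reasons: list[str] = field(default_factory=list)
    required_repairs: list[str] = field(default_factory=list)
    evidence_refs: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Plain-dict form with enum members replaced by their string values."""
        data = asdict(self)
        data["target_type"] = self.target_type.value
        data["verdict"] = self.verdict.value
        return data
```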

## RewardLedger

An append-only JSONL ledger in which each record contains:

- run/task/worker context;
- selected policy;
- simulation prediction;
- actual outcome;
- reward;
- penalties;
- generated policy hints.
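A sketch of what appending to and reading from such a ledger might look like, using only the Python standard library. The `RewardEvent` field names, the use of a single float for prediction and outcome, and the derived `prediction_error` key are assumptions; the actual schema is still to be defined under the acceptance criteria.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RewardEvent:
    """Hypothetical ledger record; one JSON object per line."""

    run_id: str
    task_id: str
    worker_id: str
    selected_policy: str
    predicted_score: float                 # simulation prediction
    actual_score: float                    # actual outcome
    reward: float
    penalties: dict[str, float] = field(default_factory=dict)
    policy_hints: list[str] = field(default_factory=list)  # advisory only

    @property
    def prediction_error(self) -> float:
        """Actual minus predicted; negative means the simulator was optimistic."""
        return self.actual_score - self.predicted_score


def append_event(ledger_path: Path, event: RewardEvent) -> None:
    """Append one record; the file is opened in append mode and never rewritten."""
    record = asdict(event)
    record["prediction_error"] = event.prediction_error
    with ledger_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_events(ledger_path: Path) -> list[dict]:
    """Load all records, skipping blank lines."""
    with ledger_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Storing the prediction and the outcome in the same record keeps the prediction-vs-actual comparison available without joining against other data.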

## Governance connection

Evaluation results and reward-ledger records should feed #120's metadata-only harness module registry as evidence. #120 itself remains docs-only and must not become a runtime plugin authority.

## Non-goals

- No automatic objective-weight mutation without human-approved calibration.
- No hidden hard-blocking score authority.
- No runtime plugin/module retirement behavior.
- No broad workflow API rename.

## Acceptance criteria

- [ ] ObjectiveFunction model is documented or implemented.
- [ ] EvaluationResult model supports metric scores, guardrails, reasons, repairs, and evidence refs.
- [ ] RewardLedger JSONL schema is defined.
- [ ] Prediction-vs-actual comparison is represented.
- [ ] Policy hints are recorded separately from authoritative runtime behavior.

## Verification

- Schema/unit tests for objective/evaluation/reward events.
- Tests that guardrail violations prevent a pass verdict even when the score is high.
- Tests that reward events can represent both success and repair/failure outcomes.
- Documentation states the advisory-vs-authority boundary.
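The guardrail check could be exercised along these lines. `decide_verdict` is a hypothetical helper, and two of its behaviors are assumptions: a high score with a guardrail violation is downgraded to `revise`, and the thresholds default to 0.8 and 0.5. The issue only requires that such a result is not `pass`.

```python
def decide_verdict(
    score: float,
    guardrail_violations: list[str],
    pass_threshold: float = 0.8,
    repair_threshold: float = 0.5,
    blocked: bool = False,
) -> str:
    """Hypothetical verdict rule; the result is advisory, not a hard block."""
    if blocked:
        return "blocked"
    # A pass needs both a high enough score and a clean guardrail check.
    if score >= pass_threshold and not guardrail_violations:
        return "pass"
    if score >= repair_threshold:
        return "revise"
    return "fail"


def test_guardrail_violation_prevents_pass() -> None:
    assert decide_verdict(0.95, ["touched protected file"]) != "pass"


def test_clean_high_score_passes() -> None:
    assert decide_verdict(0.95, []) == "pass"


def test_repair_and_failure_outcomes_are_representable() -> None:
    assert decide_verdict(0.6, []) == "revise"
    assert decide_verdict(0.2, []) == "fail"
```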

