Feature: add objective evaluation and reward-ledger learning loop #172

@devkade

Parent: #167
Related: #114, #120

Summary

Add ObjectiveFunction, EvaluationResult, and RewardLedger primitives so Ilchul can evaluate execution outcomes against explicit goals and record prediction-vs-actual data for future policy improvement.

This is the core learning loop:

  • the policy chooses an action;
  • the runtime executes the action;
  • evaluation scores the result;
  • the reward ledger records the outcome;
  • the simulator calibrates its predictions;
  • policy hints improve future selection.
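One pass through this loop can be sketched as follows. This is a minimal illustration, assuming the policy, simulator, runtime, and evaluator can be modeled as plain callables. `LoopState`, `run_learning_step`, and the "deprioritize" hint rule are hypothetical names, not existing Ilchul APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    """Hypothetical container for what the loop carries between iterations."""
    ledger: list[dict] = field(default_factory=list)       # stands in for the JSONL reward ledger
    policy_hints: list[str] = field(default_factory=list)  # advisory only, never runtime authority


def run_learning_step(
    state: LoopState,
    choose_action: Callable[[list[str]], str],
    predict: Callable[[str], float],
    execute: Callable[[str], dict],
    evaluate: Callable[[dict], float],
) -> dict:
    """One iteration: choose, predict, execute, evaluate, record, derive hints."""
    action = choose_action(state.policy_hints)  # policy chooses action; hints are input, not commands
    predicted = predict(action)                 # simulator's predicted score for that action
    outcome = execute(action)                   # runtime executes the action
    actual = evaluate(outcome)                  # evaluation scores the result
    entry = {
        "action": action,
        "predicted_score": predicted,
        "actual_score": actual,
        "prediction_error": actual - predicted,  # raw input for simulator calibration
    }
    state.ledger.append(entry)                   # reward ledger records the outcome
    if entry["prediction_error"] < 0:
        # Hypothetical hint rule: flag actions that under-delivered vs. the prediction.
        state.policy_hints.append(f"deprioritize:{action}")
    return entry
```

Keeping hints in a separate list that `choose_action` merely reads mirrors the issue's requirement that hints stay advisory.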

ObjectiveFunction

Should include:

  • goal;
  • acceptance criteria;
  • metrics;
  • weights;
  • guardrails;
  • stop conditions;
  • pass threshold;
  • repair threshold.
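A minimal sketch of the ObjectiveFunction fields above as a frozen Python dataclass. The field names, the assumption that metric scores and thresholds live in [0, 1], and the weighted-mean scoring rule are all illustrative choices, not a settled design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectiveFunction:
    """Hypothetical shape for the ObjectiveFunction fields listed in the issue."""
    goal: str
    acceptance_criteria: tuple[str, ...]
    metrics: tuple[str, ...]
    weights: dict[str, float]         # metric name -> relative weight
    guardrails: tuple[str, ...]       # constraints that must never be violated
    stop_conditions: tuple[str, ...]
    pass_threshold: float             # weighted score at or above this may pass
    repair_threshold: float           # at or above this (but below pass) suggests repair

    def __post_init__(self) -> None:
        missing = set(self.metrics) - set(self.weights)
        if missing:
            raise ValueError(f"metrics without weights: {sorted(missing)}")
        if not 0.0 <= self.repair_threshold <= self.pass_threshold <= 1.0:
            raise ValueError("expected 0 <= repair_threshold <= pass_threshold <= 1")

    def weighted_score(self, metric_scores: dict[str, float]) -> float:
        """Weighted mean of per-metric scores in [0, 1]; a missing metric counts as 0."""
        total_weight = sum(self.weights[m] for m in self.metrics)
        if total_weight <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(self.weights[m] * metric_scores.get(m, 0.0) for m in self.metrics) / total_weight
```

Validating in `__post_init__` makes a malformed objective fail at construction time rather than during scoring.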

EvaluationResult

Should include:

  • target type: task, run, integration, or worker;
  • score;
  • verdict: pass, revise, fail, or blocked;
  • metric scores;
  • guardrail violations;
  • reasons;
  • required repairs;
  • evidence refs.
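A sketch of EvaluationResult plus one possible verdict rule, assuming guardrail violations cap the verdict at "revise" (or "fail" below the repair threshold) and that "blocked" is set by the caller when evaluation cannot proceed. `Verdict`, `target_id`, and `decide_verdict` are hypothetical names; the capping policy is an assumption chosen to satisfy the "guardrails prevent pass" requirement.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Literal

TargetType = Literal["task", "run", "integration", "worker"]


class Verdict(str, Enum):
    PASS = "pass"
    REVISE = "revise"
    FAIL = "fail"
    BLOCKED = "blocked"


@dataclass
class EvaluationResult:
    target_type: TargetType
    target_id: str  # hypothetical: identifies which task/run/integration/worker was scored
    score: float
    verdict: Verdict
    metric_scores: dict[str, float] = field(default_factory=dict)
    guardrail_violations: list[str] = field(default_factory=list)
    reasons: list[str] = field(default_factory=list)
    required_repairs: list[str] = field(default_factory=list)
    evidence_refs: list[str] = field(default_factory=list)


def decide_verdict(
    score: float,
    pass_threshold: float,
    repair_threshold: float,
    guardrail_violations: list[str],
    blocked: bool = False,
) -> Verdict:
    if blocked:
        return Verdict.BLOCKED
    if guardrail_violations:
        # Assumed policy: any violation rules out PASS, however high the score is.
        return Verdict.REVISE if score >= repair_threshold else Verdict.FAIL
    if score >= pass_threshold:
        return Verdict.PASS
    if score >= repair_threshold:
        return Verdict.REVISE
    return Verdict.FAIL
```

Checking guardrails before the pass threshold is what keeps a high score from overriding a violation.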

RewardLedger

Append-only JSONL file; each record contains:

  • run/task/worker context;
  • selected policy;
  • simulation prediction;
  • actual outcome;
  • reward;
  • penalties;
  • generated policy hints.
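A minimal sketch of an append-only JSONL ledger, assuming one JSON object per line and a local file path. The `RewardLedger` method names, `schema_version`, and `recorded_at` fields are hypothetical additions for illustration.

```python
import json
import time
from pathlib import Path
from typing import Any, Iterator


class RewardLedger:
    """Hypothetical append-only JSONL reward ledger: one JSON object per line."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def append(self, *, context: dict[str, str], selected_policy: str,
               prediction: dict[str, Any], actual: dict[str, Any],
               reward: float, penalties: list[dict[str, Any]],
               policy_hints: list[str]) -> dict[str, Any]:
        record = {
            "schema_version": 1,           # hypothetical versioning field
            "recorded_at": time.time(),
            "context": context,            # run/task/worker ids
            "selected_policy": selected_policy,
            "prediction": prediction,      # what the simulator expected
            "actual": actual,              # what the runtime/evaluation observed
            "reward": reward,
            "penalties": penalties,
            "policy_hints": policy_hints,  # advisory only; not runtime configuration
        }
        # Mode "a" only adds to the end of the file; earlier lines are never rewritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, sort_keys=True) + "\n")
        return record

    def records(self) -> Iterator[dict[str, Any]]:
        """Yield every record in write order; an absent file yields nothing."""
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)
```

Because a negative `reward` and non-empty `penalties` are ordinary values here, the same record shape covers both successful and repair/failure outcomes.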

Governance connection

Evaluation results and reward-ledger records should feed #120's metadata-only harness module registry as evidence. However, #120 remains docs-only and must not become a runtime plugin authority.

Non-goals

  • No automatic objective-weight mutation without human-approved calibration.
  • No hidden hard-blocking score authority.
  • No runtime plugin/module retirement behavior.
  • No broad workflow API rename.
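The first non-goal implies that weight changes are proposals until a human approves them. One way to enforce that in code is sketched below; `WeightProposal`, `approve`, and `apply_weights` are hypothetical names, and raising `PermissionError` is an illustrative choice.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class WeightProposal:
    """A suggested change to objective weights; inert until a human approves it."""
    current: dict[str, float]
    proposed: dict[str, float]
    rationale: str
    approved_by: Optional[str] = None


def approve(proposal: WeightProposal, reviewer: str) -> WeightProposal:
    """Return an approved copy; the original proposal is left unchanged."""
    if not reviewer:
        raise ValueError("reviewer is required")
    return replace(proposal, approved_by=reviewer)


def apply_weights(proposal: WeightProposal) -> dict[str, float]:
    """Return the new weights only if a human signed off; never mutates anything in place."""
    if proposal.approved_by is None:
        raise PermissionError("weight changes require human-approved calibration")
    return dict(proposal.proposed)
```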

Acceptance criteria

  • ObjectiveFunction model is documented or implemented.
  • EvaluationResult model supports metric scores, guardrails, reasons, repairs, and evidence refs.
  • RewardLedger JSONL schema is defined.
  • Prediction-vs-actual comparison is represented.
  • Policy hints are recorded separately from authoritative runtime behavior.
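Prediction-vs-actual comparison could be represented as shown below. This minimal sketch assumes predictions and outcomes are both reduced to a single score. `PredictionComparison` and `calibration_summary` are hypothetical names, and bias plus mean absolute error are just two common calibration signals, not a prescribed metric.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PredictionComparison:
    predicted_score: float
    actual_score: float

    @property
    def error(self) -> float:
        """Positive when the outcome beat the prediction, negative when it fell short."""
        return self.actual_score - self.predicted_score


def calibration_summary(comparisons: list[PredictionComparison]) -> dict[str, float]:
    """Bias and mean absolute error across recorded comparisons (inputs for calibration)."""
    if not comparisons:
        raise ValueError("need at least one comparison")
    errors = [c.error for c in comparisons]
    return {
        "count": float(len(errors)),
        "bias": mean(errors),                       # systematic over/under-prediction
        "mean_abs_error": mean(abs(e) for e in errors),
    }
```

A bias near zero with a large mean absolute error would indicate noisy predictions rather than consistently optimistic or pessimistic ones.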

Verification

  • Schema/unit tests for objective/evaluation/reward events.
  • Tests that guardrail violations prevent a pass verdict even when the score is high.
  • Tests that reward events can represent both success and repair/failure outcomes.
  • Documentation states the advisory-vs-authority boundary.
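A pytest-style sketch of the reward-event schema tests, assuming a reward event is a plain dict with the ledger fields listed earlier and an `actual.verdict` value. `validate_reward_event`, `make_event`, and the key names are hypothetical; a real schema might use JSON Schema or a typed model instead.

```python
REQUIRED_KEYS = {"context", "selected_policy", "prediction", "actual", "reward", "penalties", "policy_hints"}
OUTCOMES = {"pass", "revise", "fail", "blocked"}


def validate_reward_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event fits this sketch's schema."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(event))]
    if event.get("actual", {}).get("verdict") not in OUTCOMES:
        problems.append("actual.verdict must be one of pass/revise/fail/blocked")
    if not isinstance(event.get("reward"), (int, float)):
        problems.append("reward must be a number")
    if not isinstance(event.get("policy_hints", []), list):
        problems.append("policy_hints must be a list")
    return problems


def make_event(verdict: str, reward: float) -> dict:
    """Build a minimal reward event for the given verdict (hypothetical field names)."""
    return {
        "context": {"run_id": "r1", "task_id": "t1", "worker_id": "w1"},
        "selected_policy": "default",
        "prediction": {"score": 0.8},
        "actual": {"score": 0.9 if verdict == "pass" else 0.4, "verdict": verdict},
        "reward": reward,
        "penalties": [] if reward >= 0 else [{"reason": "repair needed", "amount": -reward}],
        "policy_hints": [],
    }


def test_success_event_is_valid():
    assert validate_reward_event(make_event("pass", 1.0)) == []


def test_repair_and_failure_events_are_valid():
    for verdict in ("revise", "fail"):
        assert validate_reward_event(make_event(verdict, -0.5)) == []


def test_missing_keys_are_reported():
    assert "missing key: reward" in validate_reward_event({"actual": {"verdict": "pass"}})
```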

Metadata

  • Assignees: no one assigned
  • Labels: enhancement (New feature or request), needs-design (Multi-PR epic or architectural change, needs human planning)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests
