Design: define policy simulator features, estimators, and exploration safety #187

@devkade

Parent: #167
Related: #171, #172

Summary

Define how the PolicySimulator estimates the impact of candidate execution policies before dispatch and how it safely chooses exploratory policies for learning.

Scope

Define simulator inputs:

  • objective metrics and weights;
  • task complexity signals;
  • expected touched files/modules;
  • dependency depth;
  • worker count;
  • adapter mix;
  • isolation mode;
  • verification depth;
  • historical strategy stats;
  • recent RewardRecord calibration.
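To make the input list concrete, one possible shape for a simulator input record is sketched below. This is only an illustration: the class name `SimulatorInput`, the field names and types, the `IsolationMode` values, and the validation rules are all assumptions, since the issue does not define them yet.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class IsolationMode(Enum):
    # Hypothetical values; the issue does not enumerate isolation modes.
    SHARED = "shared"
    WORKTREE = "worktree"
    SANDBOX = "sandbox"


@dataclass(frozen=True)
class SimulatorInput:
    """One candidate policy plus task context (all field names are assumed)."""
    objective_weights: dict[str, float]      # metric name -> weight
    task_complexity: float                   # assumed normalized to [0, 1]
    expected_touched_files: tuple[str, ...]  # files/modules likely to change
    dependency_depth: int
    worker_count: int
    adapter_mix: dict[str, int]              # adapter name -> number of workers
    isolation_mode: IsolationMode
    verification_depth: int
    historical_strategy_stats: dict[str, float] = field(default_factory=dict)
    recent_calibration_error: Optional[float] = None  # from RewardRecord, if any

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; empty means usable."""
        problems = []
        if not self.objective_weights or sum(self.objective_weights.values()) <= 0:
            problems.append("objective_weights must have a positive total")
        if not 0.0 <= self.task_complexity <= 1.0:
            problems.append("task_complexity must be in [0, 1]")
        if self.dependency_depth < 0:
            problems.append("dependency_depth must be >= 0")
        if self.worker_count < 1:
            problems.append("worker_count must be >= 1")
        if sum(self.adapter_mix.values()) != self.worker_count:
            problems.append("adapter_mix counts must sum to worker_count")
        return problems
```

Returning a list of problems instead of raising keeps validation advisory, which seems in line with the non-goal that simulation alone should not hard-block anything.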

Define estimators:

  • conflict risk;
  • regression risk;
  • repair likelihood;
  • elapsed/tool cost;
  • review burden;
  • learning value;
  • confidence.
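As a starting point for defining estimator outputs and ranges, the sketch below assumes every risk, likelihood, burden, value, and confidence output is a float in [0, 1], and that elapsed/tool cost is a non-negative number of seconds. These ranges, the `PolicyEstimate` name, and the `brier_score` helper are assumptions for discussion, not decisions made in this issue. The `observed_conflicts` mapping stands in for whatever a RewardRecord ends up storing.

```python
from dataclasses import dataclass
from typing import Optional

# Outputs assumed to live in the unit interval [0, 1].
_UNIT_INTERVAL_FIELDS = (
    "conflict_risk",
    "regression_risk",
    "repair_likelihood",
    "review_burden",
    "learning_value",
    "confidence",
)


@dataclass(frozen=True)
class PolicyEstimate:
    prediction_id: str       # key used later to join with a RewardRecord
    conflict_risk: float
    regression_risk: float
    repair_likelihood: float
    elapsed_cost_s: float    # assumed unit: seconds, non-negative
    review_burden: float
    learning_value: float
    confidence: float

    def __post_init__(self) -> None:
        # Reject out-of-range values at construction time.
        for name in _UNIT_INTERVAL_FIELDS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        if self.elapsed_cost_s < 0:
            raise ValueError("elapsed_cost_s must be >= 0")


def brier_score(
    estimates: list[PolicyEstimate],
    observed_conflicts: dict[str, bool],
) -> Optional[float]:
    """Mean squared error between predicted conflict risk and what happened.

    Only estimates whose prediction_id has an observed outcome are scored.
    Returns None when nothing can be scored yet.
    """
    pairs = [
        (e.conflict_risk, float(observed_conflicts[e.prediction_id]))
        for e in estimates
        if e.prediction_id in observed_conflicts
    ]
    if not pairs:
        return None
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)
```

A lower Brier score means the predicted risks tracked outcomes more closely. The same join on `prediction_id` could be repeated per estimator once the observed signals are defined.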

Non-goals

  • No neural model requirement.
  • No agent launch during simulation.
  • No hard-blocking score authority from simulation alone.

Acceptance criteria

  • Simulator input feature set is documented.
  • Estimator outputs and ranges are defined.
  • Exploration policy and safety caps are defined.
  • Human override representation is defined.
  • Prediction ids are linked to RewardRecord calibration.
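One way the exploration policy, safety caps, and human override could fit together is sketched below. It uses a capped epsilon-greedy rule, which is an assumed technique and not something this issue prescribes. The names `Candidate`, `SafetyCaps`, `Decision`, and `choose_policy`, and the default cap values, are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    policy_id: str
    score: float            # weighted objective score, higher is better
    conflict_risk: float    # assumed range [0, 1]
    regression_risk: float  # assumed range [0, 1]
    learning_value: float   # assumed range [0, 1]


@dataclass(frozen=True)
class SafetyCaps:
    # Placeholder defaults; real values would need to be agreed on.
    max_exploration_rate: float = 0.1
    max_conflict_risk: float = 0.3
    max_regression_risk: float = 0.2


@dataclass(frozen=True)
class Decision:
    policy_id: str
    reason: str  # "exploit", "explore", or "human_override"


def choose_policy(
    candidates: list[Candidate],
    caps: SafetyCaps,
    rng: random.Random,
    human_override: Optional[str] = None,
) -> Decision:
    """Pick a policy: override first, then capped exploration, else the best score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    # A human choice always wins and is recorded as such.
    if human_override is not None:
        return Decision(human_override, "human_override")
    best = max(candidates, key=lambda c: c.score)
    # Only non-best candidates under both risk caps are eligible for exploration.
    safe = [
        c for c in candidates
        if c is not best
        and c.conflict_risk <= caps.max_conflict_risk
        and c.regression_risk <= caps.max_regression_risk
    ]
    if safe and rng.random() < caps.max_exploration_rate:
        pick = max(safe, key=lambda c: c.learning_value)
        return Decision(pick.policy_id, "explore")
    return Decision(best.policy_id, "exploit")
```

Passing the random generator in explicitly keeps the choice reproducible in tests. Recording the `reason` alongside the chosen policy is one candidate for the human override representation, since it distinguishes overridden decisions from simulator-driven ones.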

Verification

Metadata

Labels: enhancement (New feature or request)