Design: define policy simulator features, estimators, and exploration safety
Parent: #167
Related: #171, #172
Summary
Define how the PolicySimulator estimates the impact of candidate execution policies before dispatch and how it safely chooses exploratory policies for learning.
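One possible shape for the "safely chooses exploratory policies" part, as a minimal sketch only. It uses a capped epsilon-greedy rule, which is an assumption and not something this issue prescribes. `Candidate`, `choose_policy`, `epsilon`, and `risk_ceiling` are hypothetical names and defaults.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical summary of one simulated policy.
    name: str
    score: float           # advisory score from the simulator
    worst_risk: float      # highest estimated risk, in [0, 1]
    learning_value: float  # estimated value of trying this policy


def choose_policy(
    candidates: list[Candidate],
    epsilon: float = 0.1,       # assumed exploration rate
    risk_ceiling: float = 0.3,  # assumed cap on risk for exploratory picks
    rng: Optional[random.Random] = None,
) -> tuple[Candidate, bool]:
    """Return (chosen candidate, whether the choice was exploratory)."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if rng is None:
        rng = random.Random()

    best = max(candidates, key=lambda c: c.score)
    # Only candidates under the risk ceiling are eligible for exploration.
    safe = [c for c in candidates if c is not best and c.worst_risk <= risk_ceiling]
    if safe and rng.random() < epsilon:
        return max(safe, key=lambda c: c.learning_value), True
    return best, False
```

The result is a recommendation only; the caller decides whether to dispatch it.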
Scope
Define simulator inputs:
- objective metrics and weights;
- task complexity signals;
- expected touched files/modules;
- dependency depth;
- worker count;
- adapter mix;
- isolation mode;
- verification depth;
- historical strategy stats;
- recent RewardRecord calibration.
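The inputs above could be carried in a single immutable record. This is a sketch under assumptions: every field name, type, and enum value below is hypothetical, and only `RewardRecord` is a name taken from this issue.

```python
from dataclasses import dataclass, field
from enum import Enum


class IsolationMode(Enum):
    # Hypothetical values: the issue does not enumerate isolation modes.
    SHARED_WORKTREE = "shared_worktree"
    PER_WORKER = "per_worker"


@dataclass(frozen=True)
class SimulatorInputs:
    """Hypothetical container for the simulator inputs listed above."""

    objective_weights: dict[str, float]      # objective metric name -> weight
    task_complexity: float                   # aggregated complexity signal
    expected_touched_paths: tuple[str, ...]  # expected touched files/modules
    dependency_depth: int
    worker_count: int
    adapter_mix: dict[str, int]              # adapter name -> number of workers
    isolation_mode: IsolationMode
    verification_depth: int
    historical_strategy_stats: dict[str, float] = field(default_factory=dict)
    reward_calibration: float = 0.0          # derived from recent RewardRecord entries

    def __post_init__(self) -> None:
        # Reject values that cannot describe a real dispatch.
        if self.worker_count < 1:
            raise ValueError("worker_count must be at least 1")
        if self.dependency_depth < 0 or self.verification_depth < 0:
            raise ValueError("depths must be non-negative")
```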
Define estimators:
- conflict risk;
- regression risk;
- repair likelihood;
- elapsed time and tool cost;
- review burden;
- learning value;
- confidence.
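A sketch of how the estimator outputs above might be grouped and folded into one advisory number. `PolicyEstimate` and `advisory_score` are hypothetical names. The signed weights and the shrinking by confidence are assumptions, not a decided formula. Signed weights leave it to the caller whether an estimator counts for or against a policy.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PolicyEstimate:
    """Hypothetical bundle of estimator outputs for one candidate policy."""

    conflict_risk: float
    regression_risk: float
    repair_likelihood: float
    elapsed_cost: float
    tool_cost: float
    review_burden: float
    learning_value: float
    confidence: float  # in [0, 1]; how much the other estimates are trusted


def advisory_score(estimate: PolicyEstimate, weights: dict[str, float]) -> float:
    """Weighted sum of the estimates, shrunk toward zero by confidence.

    Weights are signed: negative for costs and risks, positive for benefits.
    Estimators without a weight contribute nothing.
    """
    values = asdict(estimate)
    confidence = values.pop("confidence")
    raw = sum(weights.get(name, 0.0) * value for name, value in values.items())
    return confidence * raw
```

Consistent with the non-goals, the score is advisory only and carries no hard-blocking authority on its own.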
Non-goals
- No neural model requirement.
- No agent launch during simulation.
- No hard-blocking score authority from simulation alone.
Acceptance criteria
Verification