Gate evolution: detect and flag overly restrictive scoring gates #4

@poofeth

Description

Problem

Hard gates in the judge are fixed at lab initialization and never questioned by the framework. In practice, a gate can systematically block profitable experiments for many cycles until a human manually intervenes. The framework should detect when a gate is the dominant rejection cause and surface it for review.

In a real deployment, a single gate accounted for >30% of all rejections and blocked a strategy with +13.6% return for 6 cycles. The human eventually removed it, which was the single most impactful scoring change of the entire session.

Proposal

Track gate failure frequency in branch_beliefs.json at the lab level:

{
  "gate_failure_counts": {
    "G1_min_entries": 3,
    "G4_custom_gate": 8,
    "G8_walk_forward": 12
  },
  "gate_binding_rate": {
    "G4_custom_gate": 0.38
  }
}
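The bookkeeping above can be sketched as a small helper that folds each cycle's rejections into the lab-level stats. This is a minimal sketch, not the framework's actual API: the function name `update_gate_stats` and the shape of the `rejections` argument are assumptions; only the `branch_beliefs.json` keys come from the proposal.

```python
import json
from collections import Counter

def update_gate_stats(beliefs_path, rejections):
    """Fold one cycle's rejections into gate_failure_counts and
    gate_binding_rate in branch_beliefs.json.

    `rejections` maps a rejected experiment id to the gate that blocked
    it, e.g. {"exp_17": "G4_custom_gate"} (hypothetical format).
    """
    with open(beliefs_path) as f:
        beliefs = json.load(f)

    # Accumulate new failures on top of the persisted counts.
    counts = Counter(beliefs.get("gate_failure_counts", {}))
    counts.update(rejections.values())
    total = sum(counts.values())

    beliefs["gate_failure_counts"] = dict(counts)
    # Binding rate = fraction of all rejections attributable to each gate.
    beliefs["gate_binding_rate"] = {
        gate: round(n / total, 2) for gate, n in counts.items()
    }

    with open(beliefs_path, "w") as f:
        json.dump(beliefs, f, indent=2)
    return beliefs
```

Storing the rate alongside the raw counts keeps the handoff message cheap to generate: no recomputation over experiment history is needed.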

Automatic detection

After each scoring cycle, update gate failure counts. When a single gate accounts for >40% of rejections across the lab (minimum 10 rejections):

  1. Flag in handoff: "Gate {G} is the binding constraint on {N}/{M} rejections ({pct}%). Experiments blocked by this gate had average metric={X}. Consider whether this gate is appropriate."

  2. Propose diagnostic: "Run the top 3 G-rejected experiments with the gate disabled. If aggregate PnL/metric is positive, the gate is too restrictive."

  3. On human checkpoint cycles: Explicitly surface: "Gate {G} has blocked {N} experiments. Recommendation: {relax/remove/keep with justification}."

Optional auto-relaxation

For labs that opt in:

gate_evolution:
  auto_relax: true
  binding_threshold: 0.40  # fraction of rejections from one gate
  min_rejections: 10
  relaxation_factor: 0.80  # multiply threshold by 0.8
  max_relaxations: 2  # per gate
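Wired together, opt-in relaxation might look like the following. This is a sketch under assumptions: the gate record shape (`threshold` plus a `relaxations` counter) and the function name `maybe_relax_gate` are hypothetical, and the min_rejections guard is assumed to have been applied upstream during detection. Only the config keys mirror the YAML above.

```python
def maybe_relax_gate(gates, gate_name, binding_rate, cfg):
    """Relax one gate's threshold in place, if the lab opted in.

    `gates` maps gate name -> {"threshold": float, "relaxations": int}
    (a hypothetical storage shape); `cfg` mirrors the gate_evolution
    YAML block. Returns True if a relaxation was applied.
    """
    gate = gates[gate_name]
    if not cfg["auto_relax"]:
        return False
    if binding_rate <= cfg["binding_threshold"]:
        return False  # gate is not the binding constraint
    if gate.get("relaxations", 0) >= cfg["max_relaxations"]:
        return False  # cap reached; escalate to the human instead
    gate["threshold"] *= cfg["relaxation_factor"]
    gate["relaxations"] = gate.get("relaxations", 0) + 1
    return True
```

Capping relaxations per gate (max_relaxations: 2) matters: a gate that keeps binding after two relaxations is probably wrong in kind, not in degree, and should go to the human checkpoint rather than be eroded to nothing.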

Why this matters

Gates should protect against bad strategies, not block good ones. A gate designed for one domain or phase of research may become the bottleneck as the lab evolves. The framework should surface this automatically rather than requiring human intuition to diagnose why nothing is getting promoted.

Relationship to existing features

  • Fits naturally into Step 7 (Update State) as an additional check
  • Complements the frame_challenge meta-branch, which asks "are we measuring the right thing?"
  • Gate relaxation proposals are a type of meta experiment (zero budget cost)
