
Add modellock, semantic complexity, confidence-based escalation deferral, escalation feedback loop, cost tracking, and LLM-assisted complexity assessment #123

Merged
nmaguiar merged 3 commits into main from copilot/add-model-lock-parameter on Mar 23, 2026
Conversation

Contributor

Copilot AI commented Mar 21, 2026

This PR delivers six targeted enhancements to the dual-model escalation system in mini-a.js, making model selection smarter, more transparent, and user-controllable.

Issue 1 — modellock parameter

  • New arg modellock ("main" | "lc" | "auto") hard-pins the model tier for the entire session, bypassing all escalation/de-escalation logic
  • Logs a one-time [info] at startup when a lock is active
  • Documented in USAGE.md, CHEATSHEET.md; added to mini-a.yaml args and check validation
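
The short-circuit described above can be sketched as follows. This is an illustrative re-implementation, not the actual mini-a.js internals; the function name `selectModelTier` and its signature are assumptions.

```javascript
// Hypothetical sketch of the modellock short-circuit.
// A hard lock ("main" or "lc") bypasses all escalation/de-escalation logic;
// "auto" (or unset) falls through to the dynamic decision.
function selectModelTier(modellock, escalationWanted) {
  if (modellock === "main") return "main";
  if (modellock === "lc") return "lc";
  return escalationWanted ? "main" : "lc";
}
```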

Issue 2 — Enhanced _assessGoalComplexity

Return type expanded to { level, score, signals }. New signal sources:

  • Domain keywords: refactor, optimize, migrate, debug, security, etc. (+1 each)
  • Negation/scope modifiers: without, except, only if, unless, etc. (+1 each)
  • Entity count: numeric quantities ("all 50 files"), multiple file paths, multiple URLs
  • Signals logged in verbose mode; tests updated with assertions on score and signals
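
The signal accumulation can be sketched like this. The keyword lists, weights, and level thresholds below are assumptions for illustration, not the shipped heuristics in `_assessGoalComplexity`.

```javascript
// Illustrative complexity scoring: each matched signal adds weight,
// and the collected signals are returned for transparency in verbose mode.
const DOMAIN_KEYWORDS = ["refactor", "optimize", "migrate", "debug", "security"];
const MODIFIERS = ["without", "except", "only if", "unless"];

function assessGoalComplexity(goal) {
  const text = goal.toLowerCase();
  const signals = [];
  let score = 0;
  for (const kw of DOMAIN_KEYWORDS) {
    if (text.includes(kw)) { score += 1; signals.push("domain:" + kw); }
  }
  for (const mod of MODIFIERS) {
    if (text.includes(mod)) { score += 1; signals.push("modifier:" + mod); }
  }
  // Numeric quantities such as "all 50 files" count as entity signals.
  const numbers = text.match(/\b\d+\b/g) || [];
  if (numbers.length > 0) { score += 1; signals.push("entities:" + numbers.length); }
  const level = score >= 3 ? "complex" : score >= 1 ? "medium" : "simple";
  return { level, score, signals };
}
```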

Issue 3 — Confidence scoring before escalation

  • New _scoreLCResponse(response, recentThoughts) returns a score in [0, 1] based on JSON validity, completeness (thought + action/final_answer), a repetition penalty (>80% token overlap), and action specificity
  • When escalation triggers but LC confidence ≥ 0.7, defers by one step; a second consecutive trigger escalates immediately
  • Controlled by lcescalatedefer (boolean, default true)
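
A minimal sketch of such a confidence score, assuming a fixed 0.25 weight per criterion and a `params` field on actions (both assumptions, not the shipped scoring):

```javascript
// Hypothetical confidence score for the last LC response, in [0, 1].
function scoreLCResponse(responseText, recentThoughts) {
  let score = 0;
  let parsed;
  try { parsed = JSON.parse(responseText); score += 0.25; } catch (e) { return 0; }
  // Completeness: a thought plus either an action or a final_answer.
  if (parsed.thought && (parsed.action || parsed.final_answer)) score += 0.25;
  // Repetition penalty: >80% token overlap with a recent thought.
  const tokens = String(parsed.thought || "").toLowerCase().split(/\s+/);
  const repeated = (recentThoughts || []).some(prev => {
    const prevTokens = new Set(prev.toLowerCase().split(/\s+/));
    const overlap = tokens.filter(t => prevTokens.has(t)).length;
    return tokens.length > 0 && overlap / tokens.length > 0.8;
  });
  if (!repeated) score += 0.25;
  // Action specificity: non-empty, non-trivially short parameters
  // (a final_answer with no action counts as specific).
  if (!parsed.action || JSON.stringify(parsed.params || {}).length > 10) score += 0.25;
  return score;
}
```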

Issue 4 — Outcome-based escalation feedback loop

  • _escalationHistory: array of { step, reason, resolved, stepsToResolve } entries, marked resolved on de-escalation
  • Adaptive threshold adjustment: ≥3 same-reason escalations all resolved in ≤1 step → raise threshold; resolve rate < 50% → lower it
  • getEscalationStats() returns history + current adaptive thresholds
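
The adaptive adjustment rule can be sketched as below; the threshold delta of ±1 and the function shape are assumptions mirroring the bullets above.

```javascript
// Hypothetical adaptive threshold adjustment from _escalationHistory entries
// of shape { step, reason, resolved, stepsToResolve }.
function adjustThreshold(history, reason, threshold) {
  const same = history.filter(h => h.reason === reason);
  if (same.length >= 3) {
    const resolved = same.filter(h => h.resolved);
    // All resolved within one step: escalation was barely needed, so raise
    // the threshold (escalate later next time).
    if (resolved.length === same.length && same.every(h => h.stepsToResolve <= 1)) {
      return threshold + 1;
    }
    // Resolve rate under 50%: escalate sooner next time.
    if (resolved.length / same.length < 0.5) return threshold - 1;
  }
  return threshold;
}
```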

Issue 5 — Per-step cost tracking and LC budget cap

  • _costTracker accumulates { calls, totalTokens } for lc and main per session
  • lcbudget (default 0 = unlimited): when LC tokens exceed the cap, permanently locks to main with a [warn] log
  • getCostStats() exposes the tracker; verbose mode prints a cost summary at run end
  • Documented in USAGE.md and CHEATSHEET.md
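
The tracker and budget cap might look like the following sketch; the factory shape and method names are illustrative, not the mini-a.js API.

```javascript
// Hypothetical per-session cost tracker with the LC budget cap.
// lcbudget = 0 means unlimited; exceeding the cap permanently locks to main.
function makeCostTracker(lcbudget) {
  const usage = { lc: { calls: 0, totalTokens: 0 }, main: { calls: 0, totalTokens: 0 } };
  let locked = false;
  return {
    record(tier, tokens) {
      usage[tier].calls += 1;
      usage[tier].totalTokens += tokens;
      if (tier === "lc" && lcbudget > 0 && usage.lc.totalTokens > lcbudget) locked = true;
    },
    tier() { return locked ? "main" : "auto"; },
    stats() { return usage; }
  };
}
```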

Issue 6 — LLM-assisted complexity assessment (gated)

  • llmcomplexity=true: fires a single LC model call to validate "medium" heuristic results before selecting escalation thresholds
  • Goal text is backslash-escaped before interpolation to reduce prompt injection exposure
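
The escaping step could be sketched as below; the prompt wording and the helper name are assumptions, only the escape-before-interpolation idea comes from the PR.

```javascript
// Hypothetical prompt builder for the gated LC complexity check.
// Backslashes are escaped first, then quotes, so injected quotes cannot
// terminate the quoted goal inside the prompt.
function buildComplexityPrompt(goal) {
  const safe = goal.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return 'Classify the complexity of this goal as simple, medium or complex: "' + safe + '"';
}
```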
Usage examples:

# Always use LC model, no escalation
mini-a goal="summarize README" modellock=lc

# Cap LC at 50k tokens, then switch to main
mini-a goal="..." lcbudget=50000

# Disable the one-step deferral (escalate immediately when triggers fire)
mini-a goal="..." lcescalatedefer=false
Original prompt

Overview

This PR improves the dynamic model escalation system in mini-a.js (and supporting files) with six targeted enhancements. The goal is to make model selection smarter, more transparent, and more user-controllable.


Issue 1 — Model Lock Flag (modellock)

Requirement: Add a new parameter modellock (string, one of "main", "lc", or unset/"auto") that forces Mini-A to always use a specific model, bypassing all dynamic escalation and de-escalation logic entirely.

  • When modellock=main → always use the main model (OAF_MODEL / model), never escalate down to LC.
  • When modellock=lc → always use the LC model (OAF_LC_MODEL / modellc), never escalate up to main (even on errors or loops).
  • When modellock=auto or unset → current dynamic escalation behaviour (default).

Implementation requirements:

  • Read and validate modellock in the agent initialization (in mini-a.js, where other args like deescalate and lccontextlimit are processed).
  • In every place where the code decides which model to use (i.e. the escalation check, the de-escalation logic, and the initial step-0 model selection), check modellock first and short-circuit if it is set to "main" or "lc".
  • Log a one-time [info] message at startup when modellock is active, e.g. [info] Model lock active: always using lc model.
  • Document modellock in USAGE.md under the Dual-Model Controls section and in CHEATSHEET.md under the Model Parameter table.
  • Add modellock to mini-a-modes.yaml and mini-a.yaml (args/params lists) so it is accepted without warning.

Issue 2 — Semantic / Embedding-Based Complexity Assessment

Requirement: Enhance _assessGoalComplexity (in mini-a.js) with additional semantic signals on top of the existing keyword/token heuristics.

Current heuristics (from docs/OPTIMIZATIONS.md):

Complex: token > 200 OR (multi-step AND conditions) OR (tasks AND token > 150)
Medium:  token > 100 OR multi-step OR multiple tasks
Simple:  Everything else

Improvements to implement:

  1. Domain-complexity keywords: Add a weighted keyword list for domains that inherently require more reasoning (e.g., "refactor", "architect", "migrate", "debug", "security", "optimize", "integrate", "deploy", "test", "validate", "analyze"). Each match adds weight to the complexity score.

  2. Negation & scope modifiers: Detect phrases like "do not", "without", "except", "only if", "unless" — these add conditional complexity.

  3. Entity/file count signals: If the goal mentions multiple file paths, URLs, or numeric ranges (e.g., "all 50 files", "3 services"), treat it as higher complexity.

  4. Clarify the returned object: _assessGoalComplexity should return { level: "simple"|"medium"|"complex", score: <number>, signals: [<string>] } where signals lists the matched heuristics (for transparency in verbose mode).

  5. Verbose output: When verbose=true, log each matched signal, e.g. [info] Complexity signals: multi-step, domain:refactor, conditions.

Update the tests in tests/advancedPlanning.js to assert on the new score and signals fields.
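
A shape check like the following could back those assertions; the helper name and the validation rules are assumptions about how tests/advancedPlanning.js might verify the expanded return object, not the shipped test code.

```javascript
// Hypothetical validator for the expanded { level, score, signals } result.
function checkComplexityResult(result) {
  const levels = ["simple", "medium", "complex"];
  if (!levels.includes(result.level)) throw new Error("bad level: " + result.level);
  if (typeof result.score !== "number" || result.score < 0) throw new Error("bad score");
  if (!Array.isArray(result.signals)) throw new Error("signals must be an array");
  return true;
}
```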


Issue 3 — Confidence Scoring Before Escalation

Requirement: Before escalating from LC to main model, compute a lightweight "confidence score" for the last LC model response and only escalate if the score is below a threshold.

Implementation:

  1. Add a helper _scoreLCResponse(response, context) in mini-a.js that evaluates:

    • JSON validity: Is the response valid JSON? (already triggers fallback — keep, but also feed into score)
    • Completeness: Does the response contain a thought and at least one action or a final_answer?
    • Repetition: Is the thought text nearly identical to a recent previous thought (>80% token overlap)? If so, penalize.
    • Action specificity: Are action parameters non-empty and non-trivially short?
    • Returns a score in [0, 1].
  2. When the escalation trigger fires (consecutive errors, thought loops, etc.), call _scoreLCResponse. If score >= 0.7, defer escalation by one additional step (give the LC model one more chance). Log: [info] LC response confidence 0.82 — deferring escalation by 1 step.

  3. If the deferred step also triggers escalation, escalate immediately regardless of score.

  4. Add a parameter lcescalatedefer (boolean, default true) to enable/disable this one-step deferral.


Issue 4 — Outcome-Based Escalation Feedback Loop

Requirement: Track whether escalation to the main model actually resolved the issue that triggered it, and use this to adaptively tune escalation thresholds over the session.

Implementation:

  1. Maintain a session-level _escalationHistory array on the agent instance. Each entry: { step, reason, resolved: bool, stepsToResolve: number }.

  2. When de-escalating back to LC (after deescalate clean steps), mark the last escalation...

This pull request was created from Copilot chat.



…ing, escalation feedback, cost tracking, LLM-assisted complexity

Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>
Agent-Logs-Url: https://github.com/OpenAF/mini-a/sessions/6301a2f7-8341-4a79-b964-a9a95eb4c6d1
Copilot AI changed the title [WIP] Add model lock flag for dynamic model escalation Add modellock, semantic complexity, confidence-based escalation deferral, escalation feedback loop, cost tracking, and LLM-assisted complexity assessment Mar 21, 2026
Copilot AI requested a review from nmaguiar March 21, 2026 22:43
The modellock parameter description contains quoted values that need proper YAML escaping to avoid parsing issues.
@nmaguiar nmaguiar marked this pull request as ready for review March 23, 2026 04:40
@nmaguiar nmaguiar added the enhancement New feature or request label Mar 23, 2026
@nmaguiar nmaguiar merged commit 9502649 into main Mar 23, 2026
@nmaguiar nmaguiar deleted the copilot/add-model-lock-parameter branch March 23, 2026 18:51
@github-project-automation github-project-automation bot moved this from Backlog to Done in Continuous Enhancement Mar 23, 2026

Labels

enhancement New feature or request

2 participants