Conversation
…ing, escalation feedback, cost tracking, LLM-assisted complexity

Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>
Agent-Logs-Url: https://github.com/OpenAF/mini-a/sessions/6301a2f7-8341-4a79-b964-a9a95eb4c6d1
Copilot (AI) changed the title from "[WIP] Add model lock flag for dynamic model escalation" to "Add modellock, semantic complexity, confidence-based escalation deferral, escalation feedback loop, cost tracking, and LLM-assisted complexity assessment" on Mar 21, 2026.
The `modellock` parameter description contains quoted values that need proper YAML escaping to avoid parsing issues.
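For example, single-quoting the description keeps the embedded double quotes out of the YAML parser's way. A hypothetical `mini-a.yaml` fragment (the key layout here is illustrative, not the actual schema of the file):

```yaml
# Hypothetical fragment — key layout is illustrative, not the actual mini-a.yaml schema.
args:
  modellock:
    type: string
    # Single quotes protect the embedded double-quoted values:
    description: 'Lock the model tier to "main", "lc" or "auto" (default), bypassing dynamic escalation'
```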
nmaguiar approved these changes on Mar 23, 2026.
Six targeted enhancements to the dual-model escalation system in `mini-a.js` to make model selection smarter, more transparent, and user-controllable.

### Issue 1 — `modellock` parameter
- `modellock` (`"main"` | `"lc"` | `"auto"`) hard-pins the model tier for the entire session, bypassing all escalation/de-escalation logic
- Logs an `[info]` message at startup when a lock is active
- Documented in `USAGE.md` and `CHEATSHEET.md`; added to `mini-a.yaml` args and check validation

### Issue 2 — Enhanced `_assessGoalComplexity`
- Return type expanded to `{ level, score, signals }`
- New signal sources: domain keywords (`refactor`, `optimize`, `migrate`, `debug`, `security`, etc., +1 each); negation/scope modifiers (`without`, `except`, `only if`, `unless`, etc., +1 each); numeric ranges ("all 50 files"), multiple file paths, multiple URLs
- Tests updated to assert on the new `score` and `signals` fields

### Issue 3 — Confidence scoring before escalation
- `_scoreLCResponse(response, recentThoughts)` → `[0, 1]`, based on JSON validity, completeness (thought + action/final_answer), repetition penalty (>80% token overlap), and action specificity
- New parameter `lcescalatedefer` (boolean, default `true`)

### Issue 4 — Outcome-based escalation feedback loop
- `_escalationHistory`: array of `{ step, reason, resolved, stepsToResolve }` entries, marked resolved on de-escalation
- `getEscalationStats()` returns history + current adaptive thresholds

### Issue 5 — Per-step cost tracking and LC budget cap
- `_costTracker` accumulates `{ calls, totalTokens }` for `lc` and `main` per session
- `lcbudget` (default `0` = unlimited): when LC tokens exceed the cap, permanently locks to main with a `[warn]` log
- `getCostStats()` exposes the tracker; verbose mode prints a cost summary at run end
- Documented in `USAGE.md` and `CHEATSHEET.md`

### Issue 6 — LLM-assisted complexity assessment (gated)
- `llmcomplexity=true`: fires a single LC model call to validate `"medium"` heuristic results before selecting escalation thresholds

## Original prompt
### Overview

This PR improves the dynamic model escalation system in `mini-a.js` (and supporting files) with six targeted enhancements. The goal is to make model selection smarter, more transparent, and more user-controllable.

### Issue 1 — Model Lock Flag (`modellock`)

Requirement: Add a new parameter `modellock` (string, one of `"main"`, `"lc"`, or unset/`"auto"`) that forces Mini-A to always use a specific model, bypassing all dynamic escalation and de-escalation logic entirely.

- `modellock=main` → always use the main model (`OAF_MODEL`/`model`), never escalate down to LC.
- `modellock=lc` → always use the LC model (`OAF_LC_MODEL`/`modellc`), never escalate up to main (even on errors or loops).
- `modellock=auto` or unset → current dynamic escalation behaviour (default).

Implementation requirements:
- Parse `modellock` in the agent initialization (in `mini-a.js`, where other args like `deescalate` and `lccontextlimit` are processed).
- Check `modellock` first in the model-selection logic and short-circuit if it is set to `"main"` or `"lc"`.
- Emit an `[info]` message at startup when `modellock` is active, e.g. `[info] Model lock active: always using lc model`.
- Document `modellock` in `USAGE.md` under the Dual-Model Controls section and in `CHEATSHEET.md` under the Model Parameter table.
- Add `modellock` to `mini-a-modes.yaml` and `mini-a.yaml` (args/params lists) so it is accepted without warning.

### Issue 2 — Semantic / Embedding-Based Complexity Assessment
Requirement: Enhance `_assessGoalComplexity` (in `mini-a.js`) with additional semantic signals on top of the existing keyword/token heuristics.
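As a sketch, the enhanced scorer could accumulate weighted signals along these lines (the keyword lists, weights, and level thresholds are illustrative, not the actual `mini-a.js` values):

```javascript
// Illustrative signal-based complexity scorer, per Issue 2.
// Keyword lists, weights, and thresholds are examples only.
function assessGoalComplexity(goal) {
  var score = 0, signals = [];
  var text = String(goal).toLowerCase();

  // Domain-complexity keywords (+1 each)
  ["refactor", "optimize", "migrate", "debug", "security"].forEach(function (k) {
    if (text.indexOf(k) >= 0) { score += 1; signals.push("domain:" + k); }
  });

  // Negation & scope modifiers add conditional complexity (+1 each)
  ["do not", "without", "except", "only if", "unless"].forEach(function (m) {
    if (text.indexOf(m) >= 0) { score += 1; signals.push("condition:" + m); }
  });

  // Entity/count signals: big numbers ("all 50 files") or several paths/URLs
  if (/\b\d{2,}\b/.test(text)) { score += 1; signals.push("numeric-range"); }
  if ((text.match(/\//g) || []).length > 2) { score += 1; signals.push("multi-path"); }

  var level = score >= 4 ? "complex" : score >= 2 ? "medium" : "simple";
  return { level: level, score: score, signals: signals };
}
```

With a goal like "refactor all 50 files without breaking the security tests" this returns `level: "complex"` along with the matched signals, which is what verbose mode would print.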
docs/OPTIMIZATIONS.md):Improvements to implement:
Domain-complexity keywords: Add a weighted keyword list for domains that inherently require more reasoning (e.g.,
"refactor","architect","migrate","debug","security","optimize","integrate","deploy","test","validate","analyze"). Each match adds weight to the complexity score.Negation & scope modifiers: Detect phrases like
"do not","without","except","only if","unless"— these add conditional complexity.Entity/file count signals: If the goal mentions multiple file paths, URLs, or numeric ranges (e.g., "all 50 files", "3 services"), treat it as higher complexity.
Clarify the returned object:
_assessGoalComplexityshould return{ level: "simple"|"medium"|"complex", score: <number>, signals: [<string>] }wheresignalslists the matched heuristics (for transparency in verbose mode).Verbose output: When
verbose=true, log each matched signal, e.g.[info] Complexity signals: multi-step, domain:refactor, conditions.Update the tests in
tests/advancedPlanning.jsto assert on the newscoreandsignalsfields.Issue 3 — Confidence Scoring Before Escalation
Requirement: Before escalating from LC to main model, compute a lightweight "confidence score" for the last LC model response and only escalate if the score is below a threshold.
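A minimal sketch of such a scorer, assuming the response is the LC model's raw JSON string (the 0.4/0.4/0.2 weights are invented for the example; the PR's `_scoreLCResponse` may weight differently):

```javascript
// Illustrative confidence scorer for an LC response, per Issue 3.
// Weights are examples; the PR's _scoreLCResponse may differ.
function scoreLCResponse(responseText, recentThoughts) {
  var parsed;
  try { parsed = JSON.parse(responseText); } // valid JSON is the baseline signal
  catch (e) { return 0; }                    // unparseable response: no confidence
  var score = 0.4;

  // Completeness: a thought plus at least one action or a final_answer
  if (parsed.thought && (parsed.action || parsed.final_answer)) score += 0.4;

  // Repetition penalty: >80% token overlap with a recent thought
  var tokens = String(parsed.thought || "").toLowerCase().split(/\s+/);
  var repeated = (recentThoughts || []).some(function (prev) {
    var prevTokens = String(prev).toLowerCase().split(/\s+/);
    var overlap = tokens.filter(function (t) { return prevTokens.indexOf(t) >= 0; }).length;
    return tokens.length > 0 && overlap / tokens.length > 0.8;
  });
  if (!repeated) score += 0.2;

  return Math.min(1, score);
}
```

The escalation trigger could then defer one step whenever this returns at or above the 0.7 threshold the prompt specifies.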
Implementation:
- Add a helper `_scoreLCResponse(response, context)` in `mini-a.js` that evaluates:
  - Is the response valid JSON?
  - Does it contain a `thought` and at least one `action` or a `final_answer`?
  - Is the `thought` text nearly identical to a recent previous thought (>80% token overlap)? If so, penalize.
  - Normalize the result to `[0, 1]`.
- When the escalation trigger fires (consecutive errors, thought loops, etc.), call `_scoreLCResponse`. If `score >= 0.7`, defer escalation by one additional step (give the LC model one more chance). Log: `[info] LC response confidence 0.82 — deferring escalation by 1 step`.
- If the deferred step also triggers escalation, escalate immediately regardless of score.
- Add a parameter `lcescalatedefer` (boolean, default `true`) to enable/disable this one-step deferral.

### Issue 4 — Outcome-Based Escalation Feedback Loop
Requirement: Track whether escalation to the main model actually resolved the issue that triggered it, and use this to adaptively tune escalation thresholds over the session.
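The bookkeeping might be sketched as follows (the entry fields follow the PR description, but the constructor and method names here are hypothetical — in the real code the history lives directly on the agent instance):

```javascript
// Illustrative escalation ledger, per Issue 4. Entry shape follows the PR
// description; how it hooks into the agent loop is omitted.
function EscalationTracker() {
  this._escalationHistory = [];
}
// Record an escalation to the main model and why it fired.
EscalationTracker.prototype.recordEscalation = function (step, reason) {
  this._escalationHistory.push({ step: step, reason: reason, resolved: false, stepsToResolve: 0 });
};
// On de-escalation back to LC, mark the last escalation as resolved.
EscalationTracker.prototype.recordDeescalation = function (step) {
  var last = this._escalationHistory[this._escalationHistory.length - 1];
  if (last && !last.resolved) {
    last.resolved = true;
    last.stepsToResolve = step - last.step;
  }
};
// Expose history so adaptive thresholds (and users) can inspect outcomes.
EscalationTracker.prototype.getEscalationStats = function () {
  return {
    history: this._escalationHistory,
    resolvedCount: this._escalationHistory.filter(function (e) { return e.resolved; }).length
  };
};
```

A session that escalates at step 3 and de-escalates at step 5 would record one resolved entry with `stepsToResolve: 2`.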
Implementation:
- Maintain a session-level `_escalationHistory` array on the agent instance. Each entry: `{ step, reason, resolved: bool, stepsToResolve: number }`.
- When de-escalating back to LC (after `deescalate` clean steps), mark the last escalation…

This pull request was created from Copilot chat.