
Add modellock, semantic complexity, confidence-based escalation deferral, escalation feedback loop, cost tracking, and LLM-assisted complexity assessment #123

Merged
nmaguiar merged 3 commits into main from copilot/add-model-lock-parameter on Mar 23, 2026
Conversation

Contributor

Copilot AI commented Mar 21, 2026

This PR delivers six targeted enhancements to the dual-model escalation system in mini-a.js, making model selection smarter, more transparent, and user-controllable.

Issue 1 — modellock parameter

  • New arg modellock ("main" | "lc" | "auto") hard-pins the model tier for the entire session, bypassing all escalation/de-escalation logic
  • Logs a one-time [info] at startup when a lock is active
  • Documented in USAGE.md, CHEATSHEET.md; added to mini-a.yaml args and check validation
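
The short-circuit described above can be sketched as follows. This is an illustrative re-implementation, not the actual mini-a.js internals; the function name `selectModelTier` and its signature are assumptions.

```javascript
// Hypothetical sketch of the modellock short-circuit.
// A hard lock ("main" or "lc") bypasses all escalation/de-escalation logic;
// "auto" (or unset) falls through to the dynamic decision.
function selectModelTier(modellock, escalationWanted) {
  if (modellock === "main") return "main";
  if (modellock === "lc") return "lc";
  return escalationWanted ? "main" : "lc";
}
```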

Issue 2 — Enhanced _assessGoalComplexity

Return type expanded to { level, score, signals }. New signal sources:

  • Domain keywords: refactor, optimize, migrate, debug, security, etc. (+1 each)
  • Negation/scope modifiers: without, except, only if, unless, etc. (+1 each)
  • Entity count: numeric quantities ("all 50 files"), multiple file paths, multiple URLs
  • Signals logged in verbose mode; tests updated with assertions on score and signals
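
The signal accumulation can be sketched like this. The keyword lists, weights, and level thresholds below are assumptions for illustration, not the shipped heuristics in `_assessGoalComplexity`.

```javascript
// Illustrative complexity scoring: each matched signal adds weight,
// and the collected signals are returned for transparency in verbose mode.
const DOMAIN_KEYWORDS = ["refactor", "optimize", "migrate", "debug", "security"];
const MODIFIERS = ["without", "except", "only if", "unless"];

function assessGoalComplexity(goal) {
  const text = goal.toLowerCase();
  const signals = [];
  let score = 0;
  for (const kw of DOMAIN_KEYWORDS) {
    if (text.includes(kw)) { score += 1; signals.push("domain:" + kw); }
  }
  for (const mod of MODIFIERS) {
    if (text.includes(mod)) { score += 1; signals.push("modifier:" + mod); }
  }
  // Numeric quantities such as "all 50 files" count as entity signals.
  const numbers = text.match(/\b\d+\b/g) || [];
  if (numbers.length > 0) { score += 1; signals.push("entities:" + numbers.length); }
  const level = score >= 3 ? "complex" : score >= 1 ? "medium" : "simple";
  return { level, score, signals };
}
```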

Issue 3 — Confidence scoring before escalation

  • New _scoreLCResponse(response, recentThoughts) returns a score in [0, 1] based on JSON validity, completeness (thought + action/final_answer), a repetition penalty (>80% token overlap), and action specificity
  • When escalation triggers but LC confidence ≥ 0.7, defers by one step; a second consecutive trigger escalates immediately
  • Controlled by lcescalatedefer (boolean, default true)
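
A minimal sketch of such a confidence score, assuming a fixed 0.25 weight per criterion and a `params` field on actions (both assumptions, not the shipped scoring):

```javascript
// Hypothetical confidence score for the last LC response, in [0, 1].
function scoreLCResponse(responseText, recentThoughts) {
  let score = 0;
  let parsed;
  try { parsed = JSON.parse(responseText); score += 0.25; } catch (e) { return 0; }
  // Completeness: a thought plus either an action or a final_answer.
  if (parsed.thought && (parsed.action || parsed.final_answer)) score += 0.25;
  // Repetition penalty: >80% token overlap with a recent thought.
  const tokens = String(parsed.thought || "").toLowerCase().split(/\s+/);
  const repeated = (recentThoughts || []).some(prev => {
    const prevTokens = new Set(prev.toLowerCase().split(/\s+/));
    const overlap = tokens.filter(t => prevTokens.has(t)).length;
    return tokens.length > 0 && overlap / tokens.length > 0.8;
  });
  if (!repeated) score += 0.25;
  // Action specificity: non-empty, non-trivially short parameters
  // (a final_answer with no action counts as specific).
  if (!parsed.action || JSON.stringify(parsed.params || {}).length > 10) score += 0.25;
  return score;
}
```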

Issue 4 — Outcome-based escalation feedback loop

  • _escalationHistory: array of { step, reason, resolved, stepsToResolve } entries, marked resolved on de-escalation
  • Adaptive threshold adjustment: ≥3 same-reason escalations all resolved in ≤1 step → raise threshold; resolve rate < 50% → lower it
  • getEscalationStats() returns history + current adaptive thresholds
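
The adaptive adjustment rule can be sketched as below; the threshold delta of ±1 and the function shape are assumptions mirroring the bullets above.

```javascript
// Hypothetical adaptive threshold adjustment from _escalationHistory entries
// of shape { step, reason, resolved, stepsToResolve }.
function adjustThreshold(history, reason, threshold) {
  const same = history.filter(h => h.reason === reason);
  if (same.length >= 3) {
    const resolved = same.filter(h => h.resolved);
    // All resolved within one step: escalation was barely needed, so raise
    // the threshold (escalate later next time).
    if (resolved.length === same.length && same.every(h => h.stepsToResolve <= 1)) {
      return threshold + 1;
    }
    // Resolve rate under 50%: escalate sooner next time.
    if (resolved.length / same.length < 0.5) return threshold - 1;
  }
  return threshold;
}
```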

Issue 5 — Per-step cost tracking and LC budget cap

  • _costTracker accumulates { calls, totalTokens } for lc and main per session
  • lcbudget (default 0 = unlimited): when LC tokens exceed the cap, permanently locks to main with a [warn] log
  • getCostStats() exposes the tracker; verbose mode prints a cost summary at run end
  • Documented in USAGE.md and CHEATSHEET.md
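
The tracker and budget cap might look like the following sketch; the factory shape and method names are illustrative, not the mini-a.js API.

```javascript
// Hypothetical per-session cost tracker with the LC budget cap.
// lcbudget = 0 means unlimited; exceeding the cap permanently locks to main.
function makeCostTracker(lcbudget) {
  const usage = { lc: { calls: 0, totalTokens: 0 }, main: { calls: 0, totalTokens: 0 } };
  let locked = false;
  return {
    record(tier, tokens) {
      usage[tier].calls += 1;
      usage[tier].totalTokens += tokens;
      if (tier === "lc" && lcbudget > 0 && usage.lc.totalTokens > lcbudget) locked = true;
    },
    tier() { return locked ? "main" : "auto"; },
    stats() { return usage; }
  };
}
```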

Issue 6 — LLM-assisted complexity assessment (gated)

  • llmcomplexity=true: fires a single LC model call to validate "medium" heuristic results before selecting escalation thresholds
  • Goal text is backslash-escaped before interpolation to reduce prompt injection exposure
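
The escaping step could be sketched as below; the prompt wording and the helper name are assumptions, only the escape-before-interpolation idea comes from the PR.

```javascript
// Hypothetical prompt builder for the gated LC complexity check.
// Backslashes are escaped first, then quotes, so injected quotes cannot
// terminate the quoted goal inside the prompt.
function buildComplexityPrompt(goal) {
  const safe = goal.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return 'Classify the complexity of this goal as simple, medium or complex: "' + safe + '"';
}
```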
Usage examples:

# Always use LC model, no escalation
mini-a goal="summarize README" modellock=lc

# Cap LC at 50k tokens, then switch to main
mini-a goal="..." lcbudget=50000

# Disable the one-step deferral (escalate immediately when triggers fire)
mini-a goal="..." lcescalatedefer=false
Original prompt

Overview

This PR improves the dynamic model escalation system in mini-a.js (and supporting files) with six targeted enhancements. The goal is to make model selection smarter, more transparent, and more user-controllable.


Issue 1 — Model Lock Flag (modellock)

Requirement: Add a new parameter modellock (string, one of "main", "lc", or unset/"auto") that forces Mini-A to always use a specific model, bypassing all dynamic escalation and de-escalation logic entirely.

  • When modellock=main → always use the main model (OAF_MODEL / model), never escalate down to LC.
  • When modellock=lc → always use the LC model (OAF_LC_MODEL / modellc), never escalate up to main (even on errors or loops).
  • When modellock=auto or unset → current dynamic escalation behaviour (default).

Implementation requirements:

  • Read and validate modellock in the agent initialization (in mini-a.js, where other args like deescalate and lccontextlimit are processed).
  • In every place where the code decides which model to use (i.e. the escalation check, the de-escalation logic, and the initial step-0 model selection), check modellock first and short-circuit if it is set to "main" or "lc".
  • Log a one-time [info] message at startup when modellock is active, e.g. [info] Model lock active: always using lc model.
  • Document modellock in USAGE.md under the Dual-Model Controls section and in CHEATSHEET.md under the Model Parameter table.
  • Add modellock to mini-a-modes.yaml and mini-a.yaml (args/params lists) so it is accepted without warning.

Issue 2 — Semantic / Embedding-Based Complexity Assessment

Requirement: Enhance _assessGoalComplexity (in mini-a.js) with additional semantic signals on top of the existing keyword/token heuristics.

Current heuristics (from docs/OPTIMIZATIONS.md):

Complex: token > 200 OR (multi-step AND conditions) OR (tasks AND token > 150)
Medium:  token > 100 OR multi-step OR multiple tasks
Simple:  Everything else

Improvements to implement:

  1. Domain-complexity keywords: Add a weighted keyword list for domains that inherently require more reasoning (e.g., "refactor", "architect", "migrate", "debug", "security", "optimize", "integrate", "deploy", "test", "validate", "analyze"). Each match adds weight to the complexity score.

  2. Negation & scope modifiers: Detect phrases like "do not", "without", "except", "only if", "unless" — these add conditional complexity.

  3. Entity/file count signals: If the goal mentions multiple file paths, URLs, or numeric ranges (e.g., "all 50 files", "3 services"), treat it as higher complexity.

  4. Clarify the returned object: _assessGoalComplexity should return { level: "simple"|"medium"|"complex", score: <number>, signals: [<string>] } where signals lists the matched heuristics (for transparency in verbose mode).

  5. Verbose output: When verbose=true, log each matched signal, e.g. [info] Complexity signals: multi-step, domain:refactor, conditions.

Update the tests in tests/advancedPlanning.js to assert on the new score and signals fields.
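
A shape check like the following could back those assertions; the helper name and the validation rules are assumptions about how tests/advancedPlanning.js might verify the expanded return object, not the shipped test code.

```javascript
// Hypothetical validator for the expanded { level, score, signals } result.
function checkComplexityResult(result) {
  const levels = ["simple", "medium", "complex"];
  if (!levels.includes(result.level)) throw new Error("bad level: " + result.level);
  if (typeof result.score !== "number" || result.score < 0) throw new Error("bad score");
  if (!Array.isArray(result.signals)) throw new Error("signals must be an array");
  return true;
}
```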


Issue 3 — Confidence Scoring Before Escalation

Requirement: Before escalating from LC to main model, compute a lightweight "confidence score" for the last LC model response and only escalate if the score is below a threshold.

Implementation:

  1. Add a helper _scoreLCResponse(response, context) in mini-a.js that evaluates:

    • JSON validity: Is the response valid JSON? (already triggers fallback — keep, but also feed into score)
    • Completeness: Does the response contain a thought and at least one action or a final_answer?
    • Repetition: Is the thought text nearly identical to a recent previous thought (>80% token overlap)? If so, penalize.
    • Action specificity: Are action parameters non-empty and non-trivially short?
    • Returns a score in [0, 1].
  2. When the escalation trigger fires (consecutive errors, thought loops, etc.), call _scoreLCResponse. If score >= 0.7, defer escalation by one additional step (give the LC model one more chance). Log: [info] LC response confidence 0.82 — deferring escalation by 1 step.

  3. If the deferred step also triggers escalation, escalate immediately regardless of score.

  4. Add a parameter lcescalatedefer (boolean, default true) to enable/disable this one-step deferral.


Issue 4 — Outcome-Based Escalation Feedback Loop

Requirement: Track whether escalation to the main model actually resolved the issue that triggered it, and use this to adaptively tune escalation thresholds over the session.

Implementation:

  1. Maintain a session-level _escalationHistory array on the agent instance. Each entry: { step, reason, resolved: bool, stepsToResolve: number }.

  2. When de-escalating back to LC (after deescalate clean steps), mark the last escalation...

This pull request was created from Copilot chat.



…ing, escalation feedback, cost tracking, LLM-assisted complexity

Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>
Agent-Logs-Url: https://github.com/OpenAF/mini-a/sessions/6301a2f7-8341-4a79-b964-a9a95eb4c6d1
Copilot AI changed the title [WIP] Add model lock flag for dynamic model escalation Add modellock, semantic complexity, confidence-based escalation deferral, escalation feedback loop, cost tracking, and LLM-assisted complexity assessment Mar 21, 2026
Copilot AI requested a review from nmaguiar March 21, 2026 22:43
The modellock parameter description contains quoted values that need proper YAML escaping to avoid parsing issues.
@nmaguiar nmaguiar marked this pull request as ready for review March 23, 2026 04:40
@nmaguiar nmaguiar added the enhancement New feature or request label Mar 23, 2026
@nmaguiar nmaguiar merged commit 9502649 into main Mar 23, 2026
@nmaguiar nmaguiar deleted the copilot/add-model-lock-parameter branch March 23, 2026 18:51
@github-project-automation github-project-automation bot moved this from Backlog to Done in Continuous Enhancement Mar 23, 2026

Labels

enhancement New feature or request

2 participants