What problem does this proposed feature solve?
Switching between models that have different context lengths, e.g. Sonnet 3.7 (200K tokens) vs. Gemini 2.5 Pro (1M tokens). When a mode backed by a high-capacity model hands off to a mode backed by a lower-capacity model, the API call can fail because the accumulated context exceeds the target model's limit.
Describe the proposed solution in detail
When there is a mode switch (e.g., Roo switching modes under Auto-approve) and the user has assigned different models to different modes, Roo should check whether the current context fits the target model's context window. For example, if Architect mode runs on Gemini 2.5 Pro and has accumulated 400K tokens of context, switching to Code mode running Sonnet 3.7 (200K maximum) will make the API call fail. In this scenario, Architect mode should compress the context before handing off to Code mode for implementation.
Conversely, if Code mode finishes and passes back 150K tokens of context, no compression is needed, because Architect mode's model has enough capacity for that API call.
This check should run as a layer before the switch: measure the current context length, including the handoff message, and decide whether to compress.
When there is a mode switch (e.g., under Auto-approve), Roo should:
- Measure the token length of the outgoing handoff context.
- Look up the maximum context size of the target mode's model.
- Compare the two values.
- If context > capacity, compress the context to fit within the limit (Roo already has an experimental condensing feature for this).
- Otherwise, hand off the context unchanged.
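The decision in the steps above can be sketched as a single pure function. This is a minimal illustration, not Roo's actual code: `decideHandoff` and `HandoffDecision` are hypothetical names, and token counts are assumed to be already measured.

```typescript
// Hypothetical types; Roo Code's real internals may differ.
type HandoffDecision =
  | { action: "handoff" }
  | { action: "compress"; targetTokens: number };

// Compare the outgoing context size with the target model's context window.
function decideHandoff(contextTokens: number, targetMaxTokens: number): HandoffDecision {
  if (!Number.isFinite(contextTokens) || contextTokens < 0) {
    throw new RangeError("contextTokens must be a non-negative number");
  }
  if (contextTokens > targetMaxTokens) {
    // Context does not fit: condense it to at most the target limit.
    return { action: "compress", targetTokens: targetMaxTokens };
  }
  // Context fits: pass it through unchanged.
  return { action: "handoff" };
}

console.log(decideHandoff(400_000, 200_000).action); // compress
console.log(decideHandoff(150_000, 1_000_000).action); // handoff
```

Keeping the check pure (numbers in, decision out) would make it easy to unit-test separately from the mode switcher and the condensing step.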
For example:
- Architect mode (Gemini 2.5 Pro, 1M limit) accumulates 400K tokens.
- Switch to Code mode (Sonnet 3.7, 200K limit) → detect overflow → compress to ≤200K before switching.
- Switch back to Architect mode with 150K tokens → no compression needed.
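The example above can be checked mechanically with a per-mode model table. In this sketch the context-window sizes are the ones stated in this proposal, while the model keys, the mode-to-model mapping, and `needsCompression` are illustrative assumptions rather than real provider IDs or Roo settings.

```typescript
// Context-window sizes as stated in this proposal; keys are illustrative labels.
const contextWindow: Record<string, number> = {
  "gemini-2.5-pro": 1_000_000,
  "sonnet-3.7": 200_000,
};

// Hypothetical per-mode model assignment from the example.
const modeModel: Record<string, string> = {
  architect: "gemini-2.5-pro",
  code: "sonnet-3.7",
};

interface Handoff {
  from: string;
  to: string;
  contextTokens: number;
}

// True when the handoff context exceeds the target mode's model limit.
function needsCompression(h: Handoff): boolean {
  const model = modeModel[h.to];
  const limit = model === undefined ? undefined : contextWindow[model];
  if (limit === undefined) {
    throw new Error(`No context window known for mode "${h.to}"`);
  }
  return h.contextTokens > limit;
}

const scenario: Handoff[] = [
  { from: "architect", to: "code", contextTokens: 400_000 },
  { from: "code", to: "architect", contextTokens: 150_000 },
];

for (const h of scenario) {
  const verdict = needsCompression(h) ? "compress first" : "hand off unchanged";
  console.log(`${h.from} -> ${h.to} (${h.contextTokens} tokens): ${verdict}`);
}
```

This prints `architect -> code (400000 tokens): compress first` followed by `code -> architect (150000 tokens): hand off unchanged`, matching the two handoffs described above.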
Technical considerations or implementation details (optional)
- Add a hook in the mode switcher that retrieves the target model's metadata (maximum context tokens).
- Integrate Roo's experimental Intelligent Context Condensing feature to perform the compression.
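A pre-switch hook along these lines might look like the sketch below. Every name here (`ModeSwitchDeps`, `getModelInfo`, `countTokens`, `condense`, `prepareHandoff`) is hypothetical and does not reflect Roo Code's actual API; the `condense` dependency stands in for the experimental condensing feature, and the optional `reserveTokens` headroom for the model's reply is an added assumption (default 0, matching the proposal).

```typescript
// All names are hypothetical; they do not reflect Roo Code's actual internals.
interface Message { role: "user" | "assistant"; content: string; }
interface ModelInfo { id: string; contextWindow: number; }

interface ModeSwitchDeps {
  getModelInfo(mode: string): ModelInfo; // per-mode model metadata
  countTokens(messages: Message[]): number; // token estimate for the history
  condense(messages: Message[], maxTokens: number): Promise<Message[]>; // stand-in for context condensing
}

// Runs before the mode switch; returns the history to hand to the target mode.
async function prepareHandoff(
  deps: ModeSwitchDeps,
  targetMode: string,
  history: Message[],
  reserveTokens = 0, // optional headroom for the target model's reply (assumption)
): Promise<Message[]> {
  const budget = deps.getModelInfo(targetMode).contextWindow - reserveTokens;
  if (deps.countTokens(history) <= budget) return history; // fits: hand off unchanged

  const condensed = await deps.condense(history, budget);
  const after = deps.countTokens(condensed);
  if (after > budget) {
    // Condensing is approximate; fail early instead of sending a request that will be rejected.
    throw new Error(`Context still too large for ${targetMode}: ${after} > ${budget} tokens`);
  }
  return condensed;
}

// Usage with stubs: 1 "token" per character, limits scaled down from 200K/1M,
// and a naive condenser that drops the oldest messages instead of summarizing.
const stubDeps: ModeSwitchDeps = {
  getModelInfo: (mode) =>
    mode === "code"
      ? { id: "sonnet-3.7", contextWindow: 200 }
      : { id: "gemini-2.5-pro", contextWindow: 1000 },
  countTokens: (msgs) => msgs.reduce((n, m) => n + m.content.length, 0),
  condense: async (msgs, maxTokens) => {
    const kept: Message[] = [];
    let total = 0;
    for (let i = msgs.length - 1; i >= 0; i--) {
      const len = msgs[i].content.length;
      if (total + len > maxTokens) break;
      kept.unshift(msgs[i]);
      total += len;
    }
    return kept;
  },
};

const history: Message[] = [
  { role: "user", content: "a".repeat(100) },
  { role: "assistant", content: "b".repeat(100) },
  { role: "user", content: "c".repeat(100) },
  { role: "assistant", content: "d".repeat(100) },
];

prepareHandoff(stubDeps, "code", history).then((msgs) =>
  console.log(`handing ${stubDeps.countTokens(msgs)} tokens to code mode`),
);
```

Re-counting after condensing matters because summarization output size is not guaranteed; checking again keeps an oversized handoff from reaching the API.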
Describe alternatives considered (if any)
If using Boomerang mode (Orchestrator), you could enforce only one active mode per task to guarantee safe handoffs, but this reduces concurrency and flexibility, so it is suboptimal compared to dynamic compression within the same chat/task window.
Additional Context & Mockups
No response
Proposal Checklist
Are you interested in implementing this feature if approved?