Context length verification and compression when switching modes with different models #4022

### What problem does this proposed feature solve?

Modes can be assigned models with different context lengths, e.g. Sonnet 3.7 (200k tokens) vs. Gemini 2.5 Pro (1M tokens). When a mode running on a high-capacity model hands off to a mode running on a lower-capacity model, the next API call can fail because the accumulated context exceeds the smaller model's limit.

### Describe the proposed solution in detail

When a mode switch happens (e.g., an auto-approved switch in Roo) and the user has configured different models for different modes, Roo should check that the current context fits within the target model's context length. For example, if Architect mode uses Gemini 2.5 Pro and has accumulated 400k tokens of context, switching to Code mode with Sonnet 3.7 (200k maximum) will make the next API call fail. In this scenario, Architect mode should compress the context before switching to Code mode for implementation.
Conversely, if Code mode finishes and passes back 150k tokens of context, no compression is needed, because Architect mode's model has enough capacity for that API call.
This check should be a layer that runs before the switch: get the current context length, check the handoff message, and decide whether to compress.

On each mode switch, Roo should:

1. **Fetch** the length of the outgoing handoff context.  
2. **Look up** the target mode’s model max-token capacity.  
3. **Compare** the two values.  
4. **If** context > capacity → **compress** the context to fit the limit (Roo already has an experimental feature for this).  
5. **Otherwise**, hand off unchanged.
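The five steps above could be sketched as a pure decision function that runs before the switch. This is a minimal TypeScript sketch, not Roo's implementation: `ModeModel`, `HandoffDecision`, `planHandoff`, and the `modeModels` table are hypothetical names, and the context-window values are simply the ones quoted in this issue.

```typescript
// Hypothetical shape: which model (and context window) each mode uses.
interface ModeModel {
  model: string;
  contextWindow: number; // maximum context length in tokens
}

type HandoffDecision =
  | { action: "handoff"; tokens: number }
  | { action: "compress"; tokens: number; maxTokens: number };

// Illustrative configuration using the limits quoted in this issue.
const modeModels: Record<string, ModeModel> = {
  architect: { model: "Gemini 2.5 Pro", contextWindow: 1_000_000 },
  code: { model: "Sonnet 3.7", contextWindow: 200_000 },
};

function planHandoff(
  contextTokens: number, // step 1: length of the outgoing handoff context
  targetMode: string,
  config: Record<string, ModeModel>,
): HandoffDecision {
  // Step 2: look up the target mode's model capacity.
  const target = config[targetMode];
  if (target === undefined) {
    throw new Error(`No model configured for mode "${targetMode}"`);
  }
  // Steps 3-4: compare, and request compression if the context does not fit.
  if (contextTokens > target.contextWindow) {
    return { action: "compress", tokens: contextTokens, maxTokens: target.contextWindow };
  }
  // Step 5: the context fits, so hand off unchanged.
  return { action: "handoff", tokens: contextTokens };
}
```

Keeping the decision separate from the actual compression call makes the check easy to test. Because step 4 uses a strict "greater than", a context exactly at the limit is treated as fitting.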

For example:
- **Architect mode** (Gemini 2.5 Pro, 1M limit) accumulates 400k tokens.
- **Switch to Code mode** (Sonnet 3.7, 200k limit) → detect overflow → compress to ≤200k before switching.
- **Switch back** to Architect mode with 150k tokens → no compression needed.
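The example above reduces to simple arithmetic: when the context overflows, it must shrink by at least 1 − limit/context. The snippet below is illustrative only; `requiredReduction` is a hypothetical helper and the numbers are the ones from this example.

```typescript
// Minimum fraction of the context that must be removed so it fits the target window.
function requiredReduction(contextTokens: number, targetWindow: number): number {
  if (contextTokens <= targetWindow) return 0; // already fits
  return 1 - targetWindow / contextTokens;
}

const scenarios = [
  { from: "Architect (Gemini 2.5 Pro)", to: "Code (Sonnet 3.7)", tokens: 400_000, window: 200_000 },
  { from: "Code (Sonnet 3.7)", to: "Architect (Gemini 2.5 Pro)", tokens: 150_000, window: 1_000_000 },
];

for (const s of scenarios) {
  const r = requiredReduction(s.tokens, s.window);
  console.log(
    `${s.from} -> ${s.to}: ${s.tokens} tokens vs ${s.window} limit, ` +
      (r === 0 ? "hand off unchanged" : `compress by at least ${(r * 100).toFixed(0)}%`),
  );
}
```

For the Architect → Code handoff this reports that the context must be compressed by at least 50% (400k → 200k); the return trip with 150k tokens needs no compression.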

### Technical considerations or implementation details (optional)

- Add a hook in the mode-switcher to retrieve model metadata (max tokens).  
- Integrate Roo's experimental Intelligent Context Condensing feature to perform the compression.
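For the mode-switcher hook, one possible shape is a `beforeSwitch` step that retrieves the target model's metadata and only calls the condenser when the context overflows. Everything here is hypothetical: `ModeSwitcher`, `ChatMessage`, `ModelInfo`, and the `Condenser` callback stand in for Roo's real types and for its experimental context condensing, which summarizes with an LLM and runs asynchronously rather than synchronously as shown. The `keepRecent` condenser is a toy that just drops the oldest messages.

```typescript
// Hypothetical types: Roo's real message and model metadata types differ.
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
  tokens: number; // precomputed token count for this message
}

interface ModelInfo {
  name: string;
  contextWindow: number; // maximum context length in tokens
}

// Stand-in for Roo's experimental context condensing. The real feature
// summarizes with an LLM and is asynchronous; this sketch is synchronous.
type Condenser = (messages: ChatMessage[], maxTokens: number) => ChatMessage[];

function totalTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + m.tokens, 0);
}

// Toy condenser: keeps the newest messages that fit and drops older ones.
const keepRecent: Condenser = (messages, maxTokens) => {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].tokens > maxTokens) break;
    kept.unshift(messages[i]);
    used += messages[i].tokens;
  }
  return kept;
};

class ModeSwitcher {
  private readonly modelForMode: (mode: string) => ModelInfo;
  private readonly condense: Condenser;

  constructor(modelForMode: (mode: string) => ModelInfo, condense: Condenser) {
    this.modelForMode = modelForMode; // hook that retrieves model metadata
    this.condense = condense;
  }

  // Runs before the switch and returns the context the target mode receives.
  beforeSwitch(targetMode: string, messages: ChatMessage[]): ChatMessage[] {
    const { contextWindow } = this.modelForMode(targetMode);
    if (totalTokens(messages) <= contextWindow) {
      return messages; // fits: hand off unchanged
    }
    const condensed = this.condense(messages, contextWindow);
    if (totalTokens(condensed) > contextWindow) {
      throw new Error(
        `Condensed context still exceeds ${contextWindow} tokens for mode "${targetMode}"`,
      );
    }
    return condensed;
  }
}
```

Re-checking the condensed result before handing off guards against a condenser that cannot shrink the context enough; failing at the switch is easier to diagnose than a rejected API call in the target mode.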

### Describe alternatives considered (if any)

When using Boomerang mode (Orchestrator), you could enforce **only one mode** active per task to guarantee safe handoffs, but this reduces concurrency and flexibility, so it is suboptimal compared to dynamic compression within the same chat/task window.

### Additional Context & Mockups

_No response_

### Proposal Checklist

- [x] I have searched existing Issues and Discussions to ensure this proposal is not a duplicate.
- [x] This proposal is for a specific, actionable change intended for implementation (not a general idea).
- [x] I understand that this proposal requires review and approval before any development work begins.

### Are you interested in implementing this feature if approved?

- [ ] Yes, I would like to contribute to implementing this feature.
