
Ollama models take +50% memory when called by Continue #7583

@ShaunaGordon


Relevant environment info

- OS: Arch Linux (EndeavourOS)
- GPU: Radeon 7900 XTX, 24 GB VRAM
- System: Ryzen 9 9950X3D, 64 GB system RAM
- Continue version: 1.2.1
- IDE version: VSCode 1.103.2
- Ollama version: 0.11.8
- Model: Any
- config:

%YAML 1.1
---
name: Notes Assistant
version: 1.0.0
schema: v1

context:
  - provider: code
  - provider: codebase
  - provider: currentFile
  - provider: diff
  - provider: docs
  - provider: folder
  - provider: open
  - provider: problems
  - provider: search
  - provider: terminal

rules:
  - You are a friendly assistant whose purpose is to analyze a Zettelkasten style "second brain" notes collection to do things like find patterns, surface related notes, organize and normalize tags, and other tasks related to taking and curating notes.
  - Files should default to markdown, unless otherwise specified
  - When asked to suggest tags, do not suggest existing items in the `tags` front matter, unless suggesting changes to any of them.

prompts:
  - name: Tag This File
    description: Suggest tags for the current file.
    prompt: |
      @codebase @currentFile Analyze the current file and suggest tags for it.
      Look for tags in other files that potentially match and suggest them, so that files can be linked together.

docs:
  - name: Foam
    startUrl: https://foambubble.github.io/foam/


%YAML 1.1
---
name: Qwen3 Large Context
version: 1.0.0
schema: v1

ollama_provider: &ollama_provider
  provider: ollama
  capabilities:
    - tool_use

models:
  - name: Qwen 3 30b
    <<: *ollama_provider
    model: qwen3:30b
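
The second config uses a YAML 1.1 anchor (`&ollama_provider`) plus the merge key (`<<:`), so the model entry is equivalent to writing the provider settings out longhand:

```yaml
models:
  - name: Qwen 3 30b
    provider: ollama
    capabilities:
      - tool_use
    model: qwen3:30b
```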

Description

For some reason, whenever I run any model through Continue, it uses roughly 50% more VRAM than it should. This happens regardless of whether I'm in Chat or Agent mode, and regardless of which context items or rules are added.

  • cogito:4b: 5.1 GB -> 7.5 GB
  • qwen3:30b: 19 GB -> 26 GB
  • command-r:latest: 20 GB -> 30 GB

On the larger models, this pushes memory use beyond my available VRAM, forcing Ollama to offload some of the work onto the CPU and drastically slowing down processing.
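
The overhead implied by the numbers above can be checked with a quick calculation (sizes in GB as listed; the exact percentage varies by model but hovers around +50%):

```python
# Reported memory use in GB: Ollama CLI alone vs. the same model via Continue.
# Numbers are the ones listed above.
sizes = {
    "cogito:4b": (5.1, 7.5),
    "qwen3:30b": (19.0, 26.0),
    "command-r:latest": (20.0, 30.0),
}

for model, (alone, via_continue) in sizes.items():
    extra_pct = (via_continue - alone) / alone * 100
    print(f"{model}: +{extra_pct:.0f}%")
# -> cogito:4b: +47%, qwen3:30b: +37%, command-r:latest: +50%
```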

To reproduce

  1. Install Ollama with ROCm (or possibly CUDA) support
  2. Set up any model
  3. Run any command on that model in Ollama's internal client
  4. Run `ollama ps` -- Size should match the listing on https://ollama.com/models
  5. Stop the client and run `ollama ps` again to confirm the model has been unloaded
  6. Run any command in Continue using the same model
  7. Run `ollama ps` -- Size now reports roughly +50% for every model. This is particularly obvious with models that fit entirely on the graphics card under other clients but get split between GPU and CPU when run through Continue.
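
The before/after comparison in steps 4-7 can be scripted; a rough sketch, assuming `ollama ps` prints a header row followed by NAME/ID/SIZE columns (the sample lines below are illustrative, not captured output):

```python
import re

def parse_sizes(ps_output: str) -> dict[str, float]:
    """Map model name -> reported size in GB from `ollama ps`-style output."""
    sizes = {}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        match = re.match(r"(\S+)\s+\S+\s+([\d.]+)\s*GB", line)
        if match:
            sizes[match.group(1)] = float(match.group(2))
    return sizes

# Illustrative output only; real IDs and extra columns will differ.
standalone = parse_sizes("NAME ID SIZE\nqwen3:30b a1b2c3 19 GB")
via_continue = parse_sizes("NAME ID SIZE\nqwen3:30b a1b2c3 26 GB")

for name, base in standalone.items():
    seen = via_continue[name]
    print(f"{name}: {base} GB -> {seen} GB (+{(seen - base) / base:.0%})")
```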

Log output

Metadata

Labels

  • area:configuration (Relates to configuration options)
  • ide:vscode (Relates specifically to VS Code extension)
  • kind:bug (Indicates an unexpected problem or unintended behavior)
  • os:linux (Happening specifically on Linux)
  • stale

Status

Done

