
Ollama models take +50% memory when called by Continue #7583

@ShaunaGordon


Relevant environment info

- OS: Arch Linux (EndeavourOS)
- GPU: Radeon 7900 XTX, 24 GB VRAM
- System: Ryzen 9 9950X3D, 64 GB system RAM
- Continue version: 1.2.1
- IDE version: VSCode 1.103.2
- Ollama version: 0.11.8
- Model: Any
- config:

%YAML 1.1
---
name: Notes Assistant
version: 1.0.0
schema: v1

context:
  - provider: code
  - provider: codebase
  - provider: currentFile
  - provider: diff
  - provider: docs
  - provider: folder
  - provider: open
  - provider: problems
  - provider: search
  - provider: terminal

rules:
  - You are a friendly assistant whose purpose is to analyze a Zettelkasten style "second brain" notes collection to do things like find patterns, surface related notes, organize and normalize tags, and other tasks related to taking and curating notes.
  - Files should default to markdown, unless otherwise specified
  - When asked to suggest tags, do not suggest existing items in the `tags` front matter, unless suggesting changes to any of them.

prompts:
  - name: Tag This File
    description: Suggest tags for the current file.
    prompt: |
      @codebase @currentFile Analyze the current file and suggest tags for it.
      Look for tags in other files that potentially match and suggest them, so that files can be linked together.

docs:
  - name: Foam
    startUrl: https://foambubble.github.io/foam/


%YAML 1.1
---
name: Qwen3 Large Context
version: 1.0.0
schema: v1

ollama_provider: &ollama_provider
  provider: ollama
  capabilities:
    - tool_use

models:
  - name: Qwen 3 30b
    <<: *ollama_provider
    model: qwen3:30b
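
The second config uses a YAML 1.1 anchor (`&ollama_provider`) plus the merge key (`<<:`), so the model entry is equivalent to writing the provider settings out longhand:

```yaml
models:
  - name: Qwen 3 30b
    provider: ollama
    capabilities:
      - tool_use
    model: qwen3:30b
```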

Description

For some reason, whenever I run any model through Continue, it uses roughly 50% more VRAM than it should. This happens regardless of whether I'm in Chat or Agent mode, and regardless of which context items or rules are added.

  • cogito:4b: 5.1 GB -> 7.5 GB
  • qwen3:30b: 19 GB -> 26 GB
  • command-r:latest: 20 GB -> 30 GB

On the larger models, this pushes memory use beyond my available VRAM, forcing Ollama to offload some of the work onto the CPU and drastically slowing down processing.
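
The overhead implied by the numbers above can be checked with a quick calculation (sizes in GB as listed; the exact percentage varies by model but hovers around +50%):

```python
# Reported memory use in GB: Ollama CLI alone vs. the same model via Continue.
# Numbers are the ones listed above.
sizes = {
    "cogito:4b": (5.1, 7.5),
    "qwen3:30b": (19.0, 26.0),
    "command-r:latest": (20.0, 30.0),
}

for model, (alone, via_continue) in sizes.items():
    extra_pct = (via_continue - alone) / alone * 100
    print(f"{model}: +{extra_pct:.0f}%")
# -> cogito:4b: +47%, qwen3:30b: +37%, command-r:latest: +50%
```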

To reproduce

  1. Install Ollama with ROCm (or possibly CUDA) support
  2. Set up any model
  3. Run any command on that model in Ollama's internal client
  4. Run `ollama ps` -- Size should match the listing on https://ollama.com/models
  5. Stop the client and run `ollama ps` again to confirm the model has been unloaded
  6. Run any command in Continue using the same model
  7. Run `ollama ps` -- Size now reports roughly +50% for every model. This is particularly obvious with models that fit entirely on the graphics card under other clients but get split between GPU and CPU when run through Continue.
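
The before/after comparison in steps 4-7 can be scripted; a rough sketch, assuming `ollama ps` prints a header row followed by NAME/ID/SIZE columns (the sample lines below are illustrative, not captured output):

```python
import re

def parse_sizes(ps_output: str) -> dict[str, float]:
    """Map model name -> reported size in GB from `ollama ps`-style output."""
    sizes = {}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        match = re.match(r"(\S+)\s+\S+\s+([\d.]+)\s*GB", line)
        if match:
            sizes[match.group(1)] = float(match.group(2))
    return sizes

# Illustrative output only; real IDs and extra columns will differ.
standalone = parse_sizes("NAME ID SIZE\nqwen3:30b a1b2c3 19 GB")
via_continue = parse_sizes("NAME ID SIZE\nqwen3:30b a1b2c3 26 GB")

for name, base in standalone.items():
    seen = via_continue[name]
    print(f"{name}: {base} GB -> {seen} GB (+{(seen - base) / base:.0%})")
```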

Log output

Metadata

Labels

  • area:configuration (Relates to configuration options)
  • ide:vscode (Relates specifically to VS Code extension)
  • kind:bug (Indicates an unexpected problem or unintended behavior)
  • os:linux (Happening specifically on Linux)
  • stale

Status

Done

