Ollama models take +50% memory when called by Continue #7583
Closed as not planned
Labels
area:configuration (Relates to configuration options) · ide:vscode (Relates specifically to VS Code extension) · kind:bug (Indicates an unexpected problem or unintended behavior) · os:linux (Happening specifically on Linux) · stale
Description
Before submitting your bug report
- I've tried using the "Ask AI" feature on the Continue docs site to see if the docs have an answer
- I believe this is a bug. I'll try to join the Continue Discord for questions
- I'm not able to find an open issue that reports the same bug
- I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: Arch Linux (EndeavourOS)
- GPU: Radeon 7900 XTX, 24 GB VRAM
- System: Ryzen 9 9950X3D, 64 GB system RAM
- Continue version: 1.2.1
- IDE version: VSCode 1.103.2
- Ollama version: 0.11.8
- Model: Any
- config:

%YAML 1.1
---
name: Notes Assistant
version: 1.0.0
schema: v1
context:
  - provider: code
  - provider: codebase
  - provider: currentFile
  - provider: diff
  - provider: docs
  - provider: folder
  - provider: open
  - provider: problems
  - provider: search
  - provider: terminal
rules:
  - You are a friendly assistant whose purpose is to analyze a Zettelkasten-style "second brain" notes collection to do things like find patterns, surface related notes, organize and normalize tags, and other tasks related to taking and curating notes.
  - Files should default to markdown, unless otherwise specified
  - When asked to suggest tags, do not suggest existing items in the `tags` front matter, unless suggesting changes to any of them.
prompts:
  - name: Tag This File
    description: Suggest tags for the current file.
    prompt: |
      @codebase @currentFile Analyze the current file and suggest tags for it.
      Look for tags in other files that potentially match and suggest them, so that files can be linked together.
docs:
  - name: Foam
    startUrl: https://foambubble.github.io/foam/
%YAML 1.1
---
name: Qwen3 Large Context
version: 1.0.0
schema: v1
ollama_provider: &ollama_provider
  provider: ollama
  capabilities:
    - tool_use
models:
  - name: Qwen 3 30b
    <<: *ollama_provider
    model: qwen3:30b

Description
For some reason, whenever I run any model through Continue, it uses roughly half again the VRAM it is supposed to. This happens regardless of whether I'm in Chat or Agent mode, and regardless of the context items or rules added.
- cogito:4b: 5.1 GB -> 7.5 GB
- qwen3:30b: 19 GB -> 26 GB
- command-r:latest: 20 GB -> 30 GB
For the larger models, this pushes them beyond my maximum VRAM, forcing Ollama to offload some of the work onto the CPU and drastically slowing down processing.
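For reference, the overhead implied by the sizes above can be checked with a quick awk one-liner (numbers copied from the list; nothing here is Ollama-specific):

```shell
# Percentage increase per model, using the sizes reported above
# (standalone Ollama -> via Continue), in GB.
awk 'BEGIN {
  printf "cogito:4b        %+.0f%%\n", (7.5 / 5.1 - 1) * 100
  printf "qwen3:30b        %+.0f%%\n", (26  / 19  - 1) * 100
  printf "command-r:latest %+.0f%%\n", (30  / 20  - 1) * 100
}'
```

which works out to roughly +37% to +50% per model.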
To reproduce
- Install Ollama with ROCm (or possibly CUDA) support
- Set up any model
- Run any command on that model in Ollama's internal client
- Run `ollama ps` -- Size should report the same as the listing on https://ollama.com/models
- Stop the client and run `ollama ps` to ensure the model is unloaded
- Run any command in Continue using the same model
- Run `ollama ps` -- Size reports roughly +50% for all models. This is particularly obvious with models that fit on the graphics card when using other clients, but causes Ollama to split the model between GPU and CPU when using Continue.
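The steps above can be sketched as a shell session. This is a repro sketch, not part of the original report; the model name is a placeholder, so substitute any locally pulled model:

```shell
#!/bin/sh
# Repro sketch for the steps above. MODEL is a placeholder --
# any model pulled into the local Ollama instance works.
MODEL="qwen3:30b"

# Load the model once via Ollama's own CLI.
ollama run "$MODEL" "hello" >/dev/null

# SIZE here should match the listing on https://ollama.com/models
ollama ps

# Unload the model so the next load starts fresh.
ollama stop "$MODEL"
ollama ps    # the model should no longer be listed

# Now trigger the same model from Continue in VS Code, then check again:
ollama ps    # SIZE now reports roughly +50% more for the same model
```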
Log output