What is the issue?
I downloaded the wrong model, ran it, realized my mistake, then deleted it, and noticed it was still listed as being present in VRAM according to ollama ps.
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL
yi:9b-v1.5-q8_0    6ea05582d5ca    10 GB    100% GPU     4 minutes from now
$ ollama rm yi:9b-v1.5-q8_0
deleted 'yi:9b-v1.5-q8_0'
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL
yi:9b-v1.5-q8_0    6ea05582d5ca    10 GB    100% GPU     4 minutes from now
$ nvidia-smi
Wed May 15 02:48:11 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   50C    P8              24W / 420W |   9588MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     63185      C   ...unners/cuda_v11/ollama_llama_server     9582MiB |
+---------------------------------------------------------------------------------------+
OS
Linux
GPU
Nvidia
CPU
AMD
Ollama version
0.1.38
As noted in the output of ps, the model will unload after 5 minutes by default (it looks like you had about 4 minutes remaining). We'll also automatically unload an idle model if we need the VRAM to load other models, so you can safely pull and run the model you actually wanted.
If you upgrade to the latest version, you can use ollama.exe run yi:9b-v1.5-q8_0 --keepalive 0 "" to quickly trigger an unload.
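For what it's worth, roughly the same thing can be done through the REST API (a sketch, assuming the default localhost:11434 server and the documented keep_alive request parameter): sending a generate request with no prompt and keep_alive set to 0 asks the server to unload that model immediately.

$ curl http://localhost:11434/api/generate -d '{"model": "yi:9b-v1.5-q8_0", "keep_alive": 0}'

On recent versions you can also change the default 5-minute idle timeout globally by setting OLLAMA_KEEP_ALIVE in the environment of ollama serve (for example OLLAMA_KEEP_ALIVE=1m).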
I agree with what you said in general, but it is still surprising behavior, and if you're using the GPU for other things besides ollama, there is a window of opportunity for it to cause an OOM for whatever other application is trying to load a model into VRAM.
There is no valid use case for a deleted model to remain in VRAM, since you cannot use the deleted model. (ollama will either complain or start downloading it again, rather than just using it from VRAM.)
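A quick way to confirm the VRAM actually gets released once the keepalive expires (or after forcing an unload as above) is to check whether ollama ps lists no models and whether the GPU's used-memory counter drops back down; nvidia-smi's query flags make the latter easy to script:

$ ollama ps
$ nvidia-smi --query-gpu=memory.used --format=csv,noheader

If the runner has unloaded, ollama ps shows an empty table and memory.used should fall back to roughly what it was before the model was loaded.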