What is the issue?
I downloaded the wrong model, ran it, realized my mistake, then deleted it, and noticed it was still listed as being present in VRAM according to ollama ps.
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL
yi:9b-v1.5-q8_0    6ea05582d5ca    10 GB    100% GPU     4 minutes from now
$ ollama rm yi:9b-v1.5-q8_0
deleted 'yi:9b-v1.5-q8_0'
$ ollama ps
NAME               ID              SIZE     PROCESSOR    UNTIL
yi:9b-v1.5-q8_0    6ea05582d5ca    10 GB    100% GPU     4 minutes from now
$ nvidia-smi
Wed May 15 02:48:11 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   50C    P8              24W / 420W |   9588MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     63185      C   ...unners/cuda_v11/ollama_llama_server     9582MiB |
+---------------------------------------------------------------------------------------+
OS
Linux
GPU
Nvidia
CPU
AMD
Ollama version
0.1.38
As noted in the output of ps, the model will unload after 5 minutes by default (it looks like you had about 4 minutes remaining). We'll also automatically unload an idle model if we need the VRAM to load other models, so you can safely pull and run the model you actually wanted.
If you upgrade to the latest version, you can use ollama.exe run yi:9b-v1.5-q8_0 --keepalive 0 "" to quickly trigger an unload.
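For what it's worth, roughly the same thing can be done through the REST API (a sketch, assuming the default localhost:11434 server and the documented keep_alive request parameter): sending a generate request with no prompt and keep_alive set to 0 asks the server to unload that model immediately.

$ curl http://localhost:11434/api/generate -d '{"model": "yi:9b-v1.5-q8_0", "keep_alive": 0}'

On recent versions you can also change the default 5-minute idle timeout globally by setting OLLAMA_KEEP_ALIVE in the environment of ollama serve (for example OLLAMA_KEEP_ALIVE=1m).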
I agree with what you said in general, but it is still surprising behavior, and if you're using the GPU for other things besides ollama, there is a window of opportunity for it to cause an OOM for whatever other application is trying to load a model into VRAM.
There is no valid use case for a deleted model to remain in VRAM, since you cannot use the deleted model. (ollama will either complain or start downloading it again, rather than just using it from VRAM.)
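A quick way to confirm the VRAM actually gets released once the keepalive expires (or after forcing an unload as above) is to check whether ollama ps lists no models and whether the GPU's used-memory counter drops back down; nvidia-smi's query flags make the latter easy to script:

$ ollama ps
$ nvidia-smi --query-gpu=memory.used --format=csv,noheader

If the runner has unloaded, ollama ps shows an empty table and memory.used should fall back to roughly what it was before the model was loaded.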