What is the issue?
I have already downloaded qwen:7b, but when I run `ollama run qwen:7b`, I get this error: `Error: timed out waiting for llama runner to start`. The server.log contains this message: `gpu VRAM usage didn't recover within timeout`.
OS: Windows
GPU: Nvidia
CPU: Intel
Ollama version: 0.1.37
I also get this (with smaller models) when running an RTX 2080 Ti or a GTX 1060 with codeqwen:chat and codegemma:instruct on Win10.
The model stays in system RAM, but it has to be copied back into GPU VRAM after every chat POST.
GPU VRAM is not being exceeded on these models, so I'm not sure why it times out every time.
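If the problem is the model being evicted from VRAM between requests, the API's `keep_alive` parameter controls how long the model stays loaded after a request completes. A minimal sketch, assuming the default local endpoint; the model name and durations here are just illustrations, not values from this thread:

```python
import requests

# Minimal sketch: ask Ollama to keep the model resident after the request
# completes, so it is not reloaded into VRAM on every chat POST.
# Assumes the default local endpoint; adjust host/port as needed.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codeqwen:chat",
        "prompt": "Say hello.",
        "stream": False,
        "keep_alive": "30m",  # keep the model loaded for 30 idle minutes
    },
    timeout=600,  # allow time for the initial model load
)
resp.raise_for_status()
print(resp.json()["response"])
```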
@pamanseau from the logs you shared, it looks like the client gave up before the model finished loading, and since the client request was canceled, we canceled the loading of the model. Are you using our CLI, or are you calling the API? If you're calling the API, what timeout are you setting in your client?
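To illustrate that point: the first request after a cold start blocks until the whole model has loaded into VRAM, so a client timeout shorter than the load time cancels the request and, with it, the load. A minimal sketch of a generously long read timeout against the default local endpoint; the values are illustrative, not Ollama defaults:

```python
import requests

# Minimal sketch: split the timeout into (connect, read) so connection
# failures surface quickly while the model is still allowed a long load.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen:7b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=(5, 600),  # 5 s to connect, up to 10 min to read the response
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```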