Configurable model loading timeout #4350
I can try to add this feature.
/assign
@ProjectMoon To clarify: is your problem that when ollama runs a model that does not exist locally, pulling it from the registry times out?
No, it's the loading of the model from disk. There's a hard-coded timeout of 10 minutes. I forget the exact file where this is, but the comment above the line says something like "be generous, as long models can take a long time to load."
An example of why this would be useful (to me): I can load Mixtral 8x7b using the Q2_K quant (smallest available file). It loads in 565 seconds, just under the 10 minute timeout limit. But once it's loaded, it generates at 16 tokens/second. It would be lovely if I could try the higher quants and see what happens.
The model loading timeout, i.e. the time to wait for the llama runner to come up, is hard-coded. It would be nice to be able to configure this to increase or decrease it (for me, mostly increase). This would allow experimenting with big models that take forever to load but might run fine once loaded.