Configurable model loading timeout #4350

Closed
ProjectMoon opened this issue May 11, 2024 · 5 comments · Fixed by #4547
Labels
feature request New feature or request

Comments

@ProjectMoon

The model loading timeout, i.e. the time to wait for the llama runner, is hard-coded. It would be nice to be able to configure it up or down (for me, mostly up). That would allow experimenting with big models that take a long time to load but might run fine once loaded.

ProjectMoon added the feature request label on May 11, 2024
@lengrongfu

I can try to add this feature.

@lengrongfu

/assign

@lengrongfu

@ProjectMoon To clarify: is your problem that when ollama runs a model that doesn't exist locally, pulling it from the registry times out?

@ProjectMoon

No, it's the loading of the model from disk. There's a hard-coded timeout of 10 minutes. I forget the exact file, but the comment above the line says something like "be generous, as long models can take a long time to load."
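
A minimal sketch of what a configurable timeout could look like, assuming it's wired up through an environment variable. The variable name `OLLAMA_LOAD_TIMEOUT` and the helper below are illustrative, not the actual patch merged in #4547:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// loadTimeout returns the model loading timeout. It keeps the current
// hard-coded 10 minutes as the default and lets an environment variable
// override it. OLLAMA_LOAD_TIMEOUT is an assumed name for illustration;
// this is a sketch of the approach, not the merged code.
func loadTimeout() time.Duration {
	const defaultTimeout = 10 * time.Minute
	v := os.Getenv("OLLAMA_LOAD_TIMEOUT")
	if v == "" {
		return defaultTimeout
	}
	d, err := time.ParseDuration(v) // accepts values like "15m" or "1h"
	if err != nil || d <= 0 {
		return defaultTimeout
	}
	return d
}

func main() {
	fmt.Println("model load timeout:", loadTimeout())
}
```

With something along these lines in place, starting the server with `OLLAMA_LOAD_TIMEOUT=30m` set would raise the limit without rebuilding.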

@ProjectMoon

An example of why this would be useful (to me): I can load Mixtral 8x7b using the Q2_K quant (the smallest available file). It loads in 565 seconds, just under the 10-minute timeout. But once loaded, it generates at 16 tokens/second. It would be lovely to try the higher quants and see what happens.
