GPU layer control / prioritisation #4433

Open
AncientMystic opened this issue May 14, 2024 · 0 comments
Labels
feature request: New feature or request

Comments

@AncientMystic

Would it be possible to add a configuration option to Ollama, similar to the one in LM Studio, to control GPU utilisation?
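To make the request concrete, here is a rough sketch of what a per-request layer cap could look like from the client side. As far as I know, Ollama's API accepts an `options` object and `num_gpu` is the parameter for how many layers are sent to the GPU, but treat that as an assumption to check against the current docs. The helper name `build_generate_request`, the model name and the layer count are placeholders of mine.

```python
import json


def build_generate_request(model: str, prompt: str, gpu_layers: int) -> dict:
    """Build a JSON body for POST /api/generate that caps GPU-offloaded layers.

    Assumption: the server honours options.num_gpu as the number of layers
    to place on the GPU. The helper itself is hypothetical, not part of Ollama.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be >= 0")
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": gpu_layers},
    }


if __name__ == "__main__":
    # Placeholder model name and layer count, for illustration only.
    body = build_generate_request("llama3", "Hello", gpu_layers=20)
    print(json.dumps(body, indent=2))
```

What I am asking for is something like this exposed as a simple, persistent setting (a slider or config entry, as in LM Studio) rather than something set per request.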

Also, would it be possible to tune Ollama so that it loads only certain layers onto the GPU, similar to Unsloth?

Possibly a way to load only the layers being accessed plus their adjacent layers, with a setting for how many adjacent layers (or how much of the model) to load at once. Unused layers could either be offloaded to RAM or not loaded at all, with layers swapped in when needed instead of loading the entire model every time.
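The idea above can be sketched as simple bookkeeping. This is only an illustration of the scheme I have in mind, not how Ollama or its backend works today: the `LayerWindow` class and its `adjacent` setting are hypothetical, and the code only tracks which layer indices would be resident on the GPU without touching any real memory.

```python
class LayerWindow:
    """Track which layers would sit on the GPU around the layer in use.

    Hypothetical sketch: `adjacent` is the number of neighbouring layers
    kept resident on each side of the most recently accessed layer.
    """

    def __init__(self, total_layers: int, adjacent: int):
        if total_layers <= 0 or adjacent < 0:
            raise ValueError("total_layers must be > 0 and adjacent >= 0")
        self.total_layers = total_layers
        self.adjacent = adjacent
        self.resident = set()  # layer indices currently "on the GPU"

    def access(self, layer: int):
        """Return (layers to load, layers to evict) for an access to `layer`."""
        if not 0 <= layer < self.total_layers:
            raise IndexError("layer index out of range")
        lo = max(0, layer - self.adjacent)
        hi = min(self.total_layers - 1, layer + self.adjacent)
        wanted = set(range(lo, hi + 1))
        to_load = sorted(wanted - self.resident)
        to_evict = sorted(self.resident - wanted)
        self.resident = wanted
        return to_load, to_evict


if __name__ == "__main__":
    window = LayerWindow(total_layers=32, adjacent=2)
    for layer in (0, 1, 10):
        load, evict = window.access(layer)
        print(f"layer {layer}: load {load}, evict {evict}")
```

Every swap would cost a transfer between RAM and VRAM, so whether this beats a fixed split would presumably depend on how fast that transfer is on a given machine.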

This could perhaps be added as lazy loading or something similar, to enable running larger models at higher performance.

This approach seems to offer a significant performance advantage, especially on lower-end hardware, for those of us without extreme setups, if it is possible within Ollama at least.

AncientMystic added the "feature request" label (New feature or request) on May 14, 2024