Would it be possible to add a configuration option to Ollama, similar to the one in LM Studio, for controlling GPU utilisation?
Would it also be possible to set up Ollama so that it loads only certain layers onto the GPU, similar to Unsloth?
One possibility: load only the layers currently being accessed plus their adjacent layers, with a setting for how many adjacent layers (or how much of the model) to load at once. Unused layers would either be offloaded to RAM or not loaded at all, and layers would be swapped in when needed instead of loading the entire model every time.
This could be added as a lazy-loading option, so that larger models can run at higher performance.
This approach seems to give a significant performance advantage, especially on lower-end hardware, for those of us without extreme setups, if it is possible within Ollama at least.
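To make the layer-swapping idea concrete, here is a rough sketch of the behaviour I have in mind. It is only a toy model in plain Python, not Ollama code: `LayerWindow`, `adjacent` and `forward_pass` are names I made up for illustration, and "loading" a layer is simulated by tracking indices in a set.

```python
from dataclasses import dataclass, field


@dataclass
class LayerWindow:
    """Toy model of the proposal: keep only the accessed layer and its
    neighbours resident on the GPU. All names here are hypothetical."""
    num_layers: int                              # total layers in the model
    adjacent: int = 1                            # neighbours kept on each side
    resident: set = field(default_factory=set)   # layer indices "on the GPU"
    loads: int = 0                               # simulated layer uploads so far

    def access(self, layer: int) -> None:
        if not 0 <= layer < self.num_layers:
            raise IndexError(layer)
        lo = max(0, layer - self.adjacent)
        hi = min(self.num_layers - 1, layer + self.adjacent)
        wanted = set(range(lo, hi + 1))
        # Upload only the layers that are not already resident.
        self.loads += len(wanted - self.resident)
        # Everything outside the window is offloaded (to RAM, or dropped).
        self.resident = wanted


def forward_pass(window: LayerWindow) -> None:
    """Touch every layer once, in order, as a single forward pass would."""
    for layer in range(window.num_layers):
        window.access(layer)


window = LayerWindow(num_layers=8, adjacent=1)
forward_pass(window)
print(sorted(window.resident), window.loads)
```

With `adjacent=1`, at most three layers are resident at any time. The sketch only counts uploads and does not model transfer time, so it says nothing about how fast this would actually be in practice.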