I recommend using a q8_0 model, as it's the most optimized and best-tested quantization format.
(But I'm a bit afraid there's a chance we already support Mistral, since it's also treated as a LLaMA-family model; that needs confirmation though.)
Sorry, I grabbed this issue in #155. Mistral actually turned out to behave the same as LLaMA, but there was a bug in the GQA implementation; it works well now that the bug is resolved.
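For context on why GQA was the tripping point: in grouped-query attention, several query heads share one key/value head, and a common bug is mapping query heads to KV heads incorrectly when `n_heads != n_kv_heads`. A minimal sketch of the mapping (illustrative names, not crabml's actual API):

```rust
/// Map a query-head index to the KV head it reads from under GQA.
/// Assumes n_heads is a multiple of n_kv_heads, as in LLaMA/Mistral configs.
fn kv_head_for_query_head(query_head: usize, n_heads: usize, n_kv_heads: usize) -> usize {
    assert!(
        n_heads % n_kv_heads == 0,
        "n_heads must be divisible by n_kv_heads"
    );
    let group_size = n_heads / n_kv_heads;
    query_head / group_size
}

fn main() {
    // Mistral 7B uses 32 query heads and 8 KV heads, so 4 query heads
    // share each KV head; LLaMA-1 7B uses 32/32 (plain multi-head attention).
    assert_eq!(kv_head_for_query_head(0, 32, 8), 0);
    assert_eq!(kv_head_for_query_head(5, 32, 8), 1);
    assert_eq!(kv_head_for_query_head(31, 32, 8), 7);
    // With n_heads == n_kv_heads, GQA degenerates to standard attention.
    assert_eq!(kv_head_for_query_head(13, 32, 32), 13);
    println!("ok");
}
```

If the divisor is dropped (using `n_kv_heads` instead of `group_size`, say), LLaMA-style 32/32 models still work by accident while Mistral's 32/8 layout breaks, which matches the symptom described above.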
I've split out more model-support tasks in #157; we can investigate together how to make these models work with crabml.