-
llama.cpp also runs on Android and iPhone, and from what I've seen, ggml is being moved into Android's standard ML libraries.
-
It would be great if the original authors could chip in to help speed up local deployments.
bnb 4bit is certainly a good start, but to make this available to the masses we would need it in a faster quant format (llama.cpp / GPTQ / AWQ; any of them will do) and build the foundation there, so we could then port it to the other quant formats. For reference, a sketch of the current bnb 4bit path follows below.
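This is roughly what the bnb 4bit starting point looks like today through transformers + bitsandbytes. A minimal sketch, not an official recipe: the hub ids (`THUDM/cogvlm-chat-hf`, `lmsys/vicuna-7b-v1.5`) and the tokenizer pairing are taken from the model card and should be treated as assumptions, not something confirmed in this thread.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

# NF4 4-bit weight quantization via bitsandbytes; compute still runs in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Assumption: the CogVLM model card pairs the chat checkpoint with the
# Vicuna tokenizer and ships custom modeling code on the hub.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",
    quantization_config=bnb_config,
    trust_remote_code=True,   # CogVLM's architecture lives in remote code
    low_cpu_mem_usage=True,
).eval()
```

This works, but it keeps the full PyTorch/transformers stack in the loop, which is exactly why a native llama.cpp / GPTQ / AWQ port would matter for end-user hardware.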
I've spoken with a few guys (casper from AutoAWQ, turboderp from exllama/v2), and they say implementing it is a huge effort, 50h+.
I'm not that deep into the architecture side of vision LLMs, so I can't really judge that, but if we could get some hints or some help from the authors, that would certainly help a lot.
CogVLM is the best vision model we have so far; it's just very restricted in the current quant formats for the normal end user (no access to A100s/H100s).
ggerganov/llama.cpp#4387
There is some demand from the community, but we really need some help here.