-
llama.cpp also runs on Android and iPhone, and from what I've seen, ggml is being moved into Android's standard ML libraries.
-
It would be great if the original authors could chip in to help speed up local deployments.
bnb 4bit is certainly a good start, but to make this available to the masses we would need it in a faster quant format (llama.cpp / GPTQ / AWQ; any of them will do) and build the foundation there, so we could then port it to the other quant formats. For reference, a sketch of the current bnb 4bit path follows below.
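This is roughly what the bnb 4bit starting point looks like today through transformers + bitsandbytes. A minimal sketch, not an official recipe: the hub ids (`THUDM/cogvlm-chat-hf`, `lmsys/vicuna-7b-v1.5`) and the tokenizer pairing are taken from the model card and should be treated as assumptions, not something confirmed in this thread.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

# NF4 4-bit weight quantization via bitsandbytes; compute still runs in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Assumption: the CogVLM model card pairs the chat checkpoint with the
# Vicuna tokenizer and ships custom modeling code on the hub.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",
    quantization_config=bnb_config,
    trust_remote_code=True,   # CogVLM's architecture lives in remote code
    low_cpu_mem_usage=True,
).eval()
```

This works, but it keeps the full PyTorch/transformers stack in the loop, which is exactly why a native llama.cpp / GPTQ / AWQ port would matter for end-user hardware.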
I've spoken with a few guys (casper from AutoAWQ, turboderp from exllama/v2), and they say implementing it is a huge effort, 50h+.
I'm not that deep into the architecture side of vision LLMs, so I can't really judge that, but if we could get some hints or some help from the authors, that would certainly help a lot.
CogVLM is the best vision model we have so far; it's just very restricted in the current quant formats for the normal end user (no access to A100s/H100s).
ggerganov/llama.cpp#4387
There is some demand from the community, but we really need some help here.