"Do you use GPTQ/RPTQ": no; maybe they are experimenting with it in upstream ggml, but currently tensors are just split into fixed-size blocks of size 32 and then quantized block-wise.
"Do you use int8 @ int8 -> int32 cublas": don't know... You may check out ggml CUDA code.
"Do you use int8 @ int8 -> int32 cublas": I don't know; you may want to check out the ggml CUDA code.
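To clarify what the question is asking: "int8 @ int8 -> int32 cublas" refers to cuBLAS's integer GEMM path, where both input matrices are int8 and accumulation happens in int32. A minimal standalone sketch of that path using `cublasGemmEx` (cuBLAS 11+) follows; this says nothing about what ggml actually does on CUDA, and exact alignment/transpose restrictions for the int8 path vary by cuBLAS version, so check the returned status:

```c
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Small column-major matrices; the int8 path wants dimensions and
       leading dimensions that are multiples of 4 and 4-byte-aligned
       pointers (cudaMalloc allocations satisfy the alignment). */
    const int m = 4, n = 4, k = 4;

    int8_t  hA[16], hB[16];
    int32_t hC[16];
    for (int i = 0; i < 16; i++) { hA[i] = (int8_t)(i % 7); hB[i] = 1; }

    int8_t *dA, *dB; int32_t *dC;
    cudaMalloc((void**)&dA, sizeof hA);
    cudaMalloc((void**)&dB, sizeof hB);
    cudaMalloc((void**)&dC, sizeof hC);
    cudaMemcpy(dA, hA, sizeof hA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof hB, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* alpha/beta are int32 when the compute type is CUBLAS_COMPUTE_32I. */
    const int32_t alpha = 1, beta = 0;
    cublasStatus_t st = cublasGemmEx(
        handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
        &alpha,
        dA, CUDA_R_8I, m,    /* int8 input A  */
        dB, CUDA_R_8I, k,    /* int8 input B  */
        &beta,
        dC, CUDA_R_32I, m,   /* int32 output C */
        CUBLAS_COMPUTE_32I, CUBLAS_GEMM_DEFAULT);

    cudaMemcpy(hC, dC, sizeof hC, cudaMemcpyDeviceToHost);
    printf("status = %d, C[0] = %d\n", (int)st, hC[0]);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Build with something like `nvcc -x cu igemm_sketch.c -lcublas`.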