
question about the quantization #81

Closed
irasin opened this issue May 31, 2023 · 1 comment

Comments


irasin commented May 31, 2023

How do you generate the quantized INT4, INT5, and INT8 models?

Do you use GPTQ/RPTQ, or normal per-tensor/per-channel PTQ? For the quantized int8 model, do you use int8 @ int8 -> int32 cuBLAS?

@saharNooby
Collaborator

  1. How to quantize: follow README.md.

  2. "Do you use GPTQ/RPTQ": no. They may be experimenting with it in upstream ggml, but currently tensors are simply split into fixed-size blocks of 32 values and quantized block-wise (see the sketch after this list).

  3. "Do you use int8 @ int8 -> int32 cuBLAS": I don't know; you may check the ggml CUDA code.
