[quantization] Force tie embedding in example script #695
This commit forces tied embeddings in the example script.
TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>
@mhs4670go Thank you!
In my personal and humble opinion, we don't need GPTQ for
@mhs4670go |
I thought you introduced
I'll make applying
Ah, okay. Let's go with 4 bits as the default.
@mhs4670go |
mse with embedding_lm_head_weight_bits 4-bit: 11.96
I'm running the evaluation for smse right now and will update the result soon.
@mhs4670go /cc @Torrero
Because we'll tie the weights in the final use case. The Llama-3.2 model already comes with tied weights. Until export with shared weights is supported, we can simulate the tied embedding with this approach. I've posted a PR for Qwen, too.
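To make "simulating the tied embedding" concrete, here is a minimal pure-Python sketch. It assumes symmetric per-row uniform quantization, which may not be the scheme the example script actually uses. One vocab × hidden matrix is fake-quantized once at a single bit width and then reused for both the input lookup and the output logits, which is what sharing one tensor between the embedding and lm_head implies. The names `fake_quantize_rows` and `TiedHead` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def fake_quantize_rows(weight: Matrix, bits: int) -> Matrix:
    """Symmetric per-row quantize-dequantize (assumed scheme, for illustration)."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 127 for 8-bit
    result = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # guard all-zero rows
        # Round to the nearest integer level, clamp, then map back to floats.
        result.append([max(-qmax, min(qmax, round(v / scale))) * scale for v in row])
    return result


class TiedHead:
    """Hypothetical stand-in for an embedding and lm_head sharing one tensor."""

    def __init__(self, weight: Matrix, bits: int) -> None:
        # Quantize once; both directions below read this same object.
        self.weight = fake_quantize_rows(weight, bits)

    def embed(self, token_id: int) -> List[float]:
        # Input side: the embedding is a row lookup.
        return self.weight[token_id]

    def logits(self, hidden: List[float]) -> List[float]:
        # Output side: lm_head computes W @ h with the same rows.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.weight]


head = TiedHead([[0.3, -1.0], [0.2, 0.8]], bits=4)
print(head.embed(0))
print(head.logits([1.0, 0.0]))
```

Because both sides read the same quantized rows, the embedding and lm_head cannot end up with different bit widths in this setup.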
@mhs4670go |
IIUC,
@mhs4670go |
@mhs4670go
IMHO
I see your point. I think we should skip GPTQ on the lm_head when it's tied; I missed that. Thank you for the input! Eventually, we will use a model whose embeddings are tied, so we might as well make the current GPTQ on the lm_head optional. What's your take on this?
I'm fine with it.
@mhs4670go |
Ah, as you said, I think this PR should be reconsidered later, once we fully support tied embeddings. For now, I'll just post a PR that makes lm_head GPTQ optional.
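A minimal sketch of what making lm_head GPTQ optional could look like, with the lm_head always skipped when it is tied. The helper `select_gptq_targets` and its `gptq_lm_head` flag are hypothetical, not the script's actual options; module names follow Hugging Face Llama naming. The rationale encoded here is an assumption drawn from the discussion: GPTQ rewrites a layer's weights to compensate for quantization error on that layer's inputs, so if lm_head shares its tensor with the embedding, those compensating updates would also alter the input embedding table.

```python
from typing import Iterable, List


def select_gptq_targets(
    module_names: Iterable[str], lm_head_tied: bool, gptq_lm_head: bool = False
) -> List[str]:
    """Return the module names GPTQ should process (hypothetical helper)."""
    targets = []
    for name in module_names:
        if name == "lm_head":
            # Off by default (an assumption of this sketch) and always skipped
            # when tied, so GPTQ's weight updates never leak into the shared
            # input embedding table.
            if lm_head_tied or not gptq_lm_head:
                continue
        targets.append(name)
    return targets


names = ["model.layers.0.self_attn.q_proj", "model.layers.0.mlp.down_proj", "lm_head"]
print(select_gptq_targets(names, lm_head_tied=True, gptq_lm_head=True))
print(select_gptq_targets(names, lm_head_tied=False, gptq_lm_head=True))
```

Keeping the tied check separate from the opt-in flag means a user cannot accidentally run GPTQ on a shared tensor just by enabling the option.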
@mhs4670go and the resulting circle had shared weights for both input
@mhs4670go for forced weights equalization.
For
@stamalakhov Thank you for the checks! I wonder whether it's natural to tie them even though they are not meant to be tied. So I'll just post a PR that validates that the model is untied when different bit widths are given.
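The validation described above could look roughly like the following pure-Python sketch, assuming "different bits" means the embedding and lm_head are assigned different weight bit widths. Tying is detected by object identity, mirroring how tied parameters are typically a single shared object in PyTorch models (in Hugging Face models, `get_input_embeddings().weight` and `get_output_embeddings().weight` are the same Parameter when tied); plain lists stand in for tensors here. Both function names are hypothetical.

```python
from typing import Any


def weights_are_tied(embed_weight: Any, lm_head_weight: Any) -> bool:
    # Tied weights are one shared object, not two equal-valued copies.
    return embed_weight is lm_head_weight


def validate_embedding_bits(
    embed_weight: Any, lm_head_weight: Any, embedding_bits: int, lm_head_bits: int
) -> None:
    """Reject different bit widths for a tied embedding/lm_head (hypothetical check)."""
    if embedding_bits != lm_head_bits and weights_are_tied(embed_weight, lm_head_weight):
        raise ValueError(
            f"embedding ({embedding_bits}-bit) and lm_head ({lm_head_bits}-bit) "
            "share one weight tensor; untie them or use the same bit width"
        )


shared = [[0.1, -0.2]]
validate_embedding_bits(shared, shared, 4, 4)          # tied, same bits: accepted
validate_embedding_bits(shared, [[0.1, -0.2]], 4, 8)   # untied copy: accepted
```

Checking identity rather than equality matters: two untied copies can hold identical values after loading, yet they can still be quantized independently.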
I've run PPL evaluations with the command below:
```
python tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py \
  --model /group-volume/Llama-3.2-3B-Instruct --max_seq_len 2048 \
  --linear_weight_bits 4 --verbose --nsamples_for_qcalibration 128 \
  --decode_calibration_steps 8 --gptq_mse smse \
  --embedding_lm_head_weight_bits 4
```

- mse with embedding_lm_head_weight_bits 8-bit: 11.89
- smse with embedding_lm_head_weight_bits 8-bit: 12.05
- mse with embedding_lm_head_weight_bits 4-bit: 11.96

Related: #624
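For reference, the PPL values here are perplexities: the exponential of the average per-token negative log-likelihood on the evaluation text, so lower is better. The sketch below shows only that definition; how the example script batches or windows the text (for instance with `--max_seq_len 2048`) is not described in this PR and is not modeled.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy NLLs for four tokens: a model that assigns probability 1/12 to every
# correct token has a perplexity of 12.
print(perplexity([math.log(12.0)] * 4))
```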