[quantization] Force tie embedding in example script by mhs4670go · Pull Request #695 · Samsung/TICO

mhs4670go · 2026-05-12T04:53:30Z

This commit forces tie embedding in example script.

I've run PPL evaluations with below command.

python tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py \
  --model /group-volume/Llama-3.2-3B-Instruct --max_seq_len 2048 \
  --linear_weight_bits 4 --verbose --nsamples_for_qcalibration 128 \
  --decode_calibration_steps 8 --gptq_mse smse \
  --embedding_lm_head_weight_bits 4

mse with embedding_lm_head_weight_bits 8-bit : 11.89
smse with embedding_lm_head_weight_bits 8-bit: 12.05
mse with embedding_lm_head_weight_bits 4-bit: 11.96

Related: #624
TICO-DCO-1.0-Signed-off-by: seongwoo mhs4670go@naver.com

This commit forces tie embedding in example script. TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>

stamalakhov · 2026-05-12T05:02:46Z

@mhs4670go Thank you!
Have you tried to save full circle model with this PR?

stamalakhov · 2026-05-12T05:04:33Z

In my personal and humble opinion we don't need GPTQ for lm_head.

stamalakhov · 2026-05-12T05:30:39Z

@mhs4670go
8 bits of unshared weights for lm_head/mebeddings can't be saved to circle because of 2Gb restriction.
May be it would be better to have an option to tie weights?

mhs4670go · 2026-05-12T06:01:49Z

In my personal and humble opinion we don't need GPTQ for lm_head.

I thought you introduced lm_head support in GPTQ because it had better accuracy than the one that doesn't apply GPTQ to lm_head.

I'll make applying lm_head optional in next PR then.

mhs4670go · 2026-05-12T06:02:20Z

@mhs4670go 8 bits of unshared weights for lm_head/mebeddings can't be saved to circle because of 2Gb restriction. May be it would be better to have an option to tie weights?

Ah, okay. Let's go with 4-bits as a default one.

stamalakhov · 2026-05-12T06:06:04Z

@mhs4670go 8 bits of unshared weights for lm_head/mebeddings can't be saved to circle because of 2Gb restriction. May be it would be better to have an option to tie weights?

Ah, okay. Let's go with 4-bits as a default one.

@mhs4670go
but what about ppl of 4 bits quantized model?

mhs4670go · 2026-05-12T06:12:40Z

@mhs4670go 8 bits of unshared weights for lm_head/mebeddings can't be saved to circle because of 2Gb restriction. May be it would be better to have an option to tie weights?

Ah, okay. Let's go with 4-bits as a default one.

@mhs4670go but what about ppl of 4 bits quantized model?

mse with embedding_lm_head_weight_bits 4-bit: 11.96

I'm running the evaluation for smse right now. That'll be updated soon.

stamalakhov · 2026-05-12T06:21:48Z

@mhs4670go
But why do we need to tie weights for quantized model?
They can not be shared right now.
Am i missing something?

/cc @Torrero

mhs4670go · 2026-05-12T06:25:02Z

But why do we need to tie weights for quantized model?

Because we'll tie weights in final use case. Llama-3.2 model already comes with tied weights. Before supporting the export with shared weight, we can simulate the tied embedding with this approach.

I've posted a PR for qwen, too.

mse with embedding_lm_head_weight_bits 4-bit: 11.96
smse with embedding_lm_head_weight_bits 4-bit: 12.41

stamalakhov · 2026-05-12T06:32:35Z

Before supporting the export with shared weight, we can simulate the tied embedding with this approach.

@mhs4670go
but this PR disables all other supported options. It's enforcement. Is not it?

mhs4670go · 2026-05-12T06:36:23Z

but this PR disables all other supported options. It's enforcement. Is not it?

IIUC, embedding_weight_bits and lm_head_weight_bits options will be disabled with this PR. Do you think it would be better not to disable them and support tied embedding optionally?

stamalakhov · 2026-05-12T06:41:28Z

Do you think it would be better not to disable them and support tied embedding optionally?

@mhs4670go
Yes.
I believe right now it would be better to have option to tie them. Later when weights sharing will be supported in circle, or in case the task with full model export will cancelled. The option to tie weights will be enforced.
But it's just my opinion.
/cc @Torrero

stamalakhov · 2026-05-12T07:04:43Z

I thought you introduced lm_head support in GPTQ because it had better accuracy than the one that doesn't apply GPTQ to lm_head.

@mhs4670go
Yes.

In case they are untied, lm_head can be quantized using ordinary GPTQ.
In case they are tied, GPTQ for embedding/lm_head with calibrating Hessians for lm_head is wrong. I mean using GPTQ for tied weights is more tricky. and right now it's not supported in TICO.

IMHO

mhs4670go · 2026-05-12T07:15:59Z

In case they are tied, GPTQ for embedding/lm_head with calibrating Hessians for lm_head is wrong. I mean using GPTQ for tied weights is more tricky. and right now it's not supported in TICO.

I got your point. I think we should skip GPTQ on lm head when it's tied. I missed this point. Thank you for the opinion!

Eventually, we will use a model whose embeddings are tied. Therefore, we might as well make current GPTQ on lm head optional.

What's your take on this?

stamalakhov · 2026-05-12T07:30:49Z

What's your take on this?

I'm fine with it.
My concern is that this PR disables fully supported (for now) untied weights and forces partially supported tied weights.
So seems like i'm missing something.
Please proceed without my approval.

stamalakhov · 2026-05-12T07:32:02Z

@mhs4670go
I've changed reviewers, please proceed without my approval.

mhs4670go · 2026-05-12T07:34:12Z

Ah, I think this PR should be reconsidered later when we fully supports tie embedding as you said.

I'll just post a PR for making lm_head GPTQ optional.

stamalakhov · 2026-05-20T08:15:33Z

@mhs4670go
Thank you very much for your effort.
may be this PR could be reopened? I've just tried it for Maykeye/TinyLLama-v0 with explicit weights equalization:

    model.lm_head.weight = model.model.embed_tokens.weight

and the resulting circle had shared weights for both input embedding and lm_head. So that we can use 8 bits for lm_head/embed_tokens quantization safely.
Just a small note:
Some llama models don't use tie_word_embeddings (e.g. Maykeye/TinyLLama-v0), so may be the option to tie them would be helpful for debugging purposes at least?

stamalakhov · 2026-05-20T08:19:28Z

@mhs4670go
I mean #701 is merged, so this PR can be landed as it is, with m.b.

model.lm_head.weight = model.model.embed_tokens.weight

for forced weights equalization.

stamalakhov · 2026-05-20T10:39:52Z

For HuggingFaceTB/SmolLM2-135M it also produced nice result with shared weights.

mhs4670go · 2026-05-21T00:14:22Z

@stamalakhov Thank you for the checks! I wonder it's natural to set them tied even though they are not meant to be tied. Therefore, I'll just a PR that just validates if the model is untied when different bits are given.

[quantization] Force tie embedding in example script

7e2b1fd

This commit forces tie embedding in example script. TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>

mhs4670go requested a review from stamalakhov May 12, 2026 04:53

change default number.

23abb3f

mhs4670go mentioned this pull request May 12, 2026

[quantization] Support tied embedding in Qwen3-VL GPTQ #694

Closed

stamalakhov requested review from a team and removed request for stamalakhov May 12, 2026 07:31

mhs4670go closed this May 12, 2026

stamalakhov mentioned this pull request May 12, 2026

Weights sharing #696

Closed

mhs4670go deleted the pro2 branch May 13, 2026 04:01

mhs4670go mentioned this pull request May 21, 2026

[quantization] Add tied embedding bit-width validation #720

Merged

Conversation

mhs4670go commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

mhs4670go commented May 12, 2026

Uh oh!

mhs4670go commented May 12, 2026

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

mhs4670go commented May 12, 2026

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

mhs4670go commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

mhs4670go commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

mhs4670go commented May 12, 2026

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

stamalakhov commented May 12, 2026

Uh oh!

mhs4670go commented May 12, 2026

Uh oh!

stamalakhov commented May 20, 2026

Uh oh!

stamalakhov commented May 20, 2026

Uh oh!

stamalakhov commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhs4670go commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mhs4670go commented May 12, 2026 •

edited

Loading

mhs4670go commented May 12, 2026 •

edited

Loading

mhs4670go commented May 12, 2026 •

edited

Loading

stamalakhov commented May 20, 2026 •

edited

Loading