Single precision inference support for the gemma-2B model #75

chenghuaWang · 2024-04-09T07:16:43Z

What's new?

Single precision inference support for the gemma-2B model.

Op Changed

Split

A new SplitOp constructor with each_dims option.

Split(const std::vector<int> &each_dims, Chl split_dim, const std::string &name)

This supports operations such as the following (in the Python API):

qkv.split([q_size, kv_size, kv_size], dim=-1)
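To make the semantics of each_dims concrete, here is a minimal pure-Python sketch of an uneven split along the last dimension. It is not the mllm implementation: the helper name `split_last_dim` and the toy sizes (2 query heads, 1 shared key/value head, head dimension 2) are assumptions chosen only for illustration.

```python
from itertools import accumulate


def split_last_dim(rows, each_dims):
    """Split every row into consecutive chunks whose widths are given by each_dims.

    Hypothetical helper for illustration; it mirrors qkv.split(sizes, dim=-1)
    for a 2-D input stored as a list of equal-length lists.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    if any(d < 0 for d in each_dims) or sum(each_dims) != width:
        raise ValueError("each_dims must be non-negative and sum to the row width")
    ends = list(accumulate(each_dims))   # exclusive end offset of each chunk
    starts = [0] + ends[:-1]             # start offset of each chunk
    return [[row[s:e] for row in rows] for s, e in zip(starts, ends)]


# Toy sizes (assumed): with a single shared key/value head, kv_size is smaller than q_size.
num_heads, num_kv_heads, head_dim = 2, 1, 2
q_size = num_heads * head_dim       # 4
kv_size = num_kv_heads * head_dim   # 2

qkv = [list(range(8)), list(range(8, 16))]  # two tokens, fused q/k/v projection
q, k, v = split_last_dim(qkv, [q_size, kv_size, kv_size])
```

An equal-size split cannot express this case, because the query chunk is wider than the key and value chunks.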

RMSNorm

A new RMSNorm constructor with add_unit_offset flag.

RMSNorm(int norm_size, float epsilon, bool add_unit_offset, std::string name)

If the add_unit_offset flag is set, RMSNorm computes $output = output \times (1.f + weight)$. RMSNorm in Llama has no unit offset; it only computes $output = output \times weight$.
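The effect of the flag can be sketched in a few lines of pure Python. This is an illustrative reference, not the mllm kernel: the function name `rms_norm`, the default epsilon, and the sample values are assumptions.

```python
import math


def rms_norm(x, weight, epsilon=1e-6, add_unit_offset=False):
    """RMS-normalize a vector, then scale it per element.

    With add_unit_offset=True the scale is (1 + weight), as described for Gemma;
    otherwise the scale is weight alone, as described for Llama.
    The default epsilon here is an assumed value.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + epsilon)
    scale = [1.0 + w for w in weight] if add_unit_offset else list(weight)
    return [v * inv_rms * s for v, s in zip(x, scale)]


x = [3.0, 4.0]
weight = [0.0, 0.0]  # a zero weight makes the difference between the two variants obvious

llama_out = rms_norm(x, weight, epsilon=0.0)                        # scaled by weight: all zeros
gemma_out = rms_norm(x, weight, epsilon=0.0, add_unit_offset=True)  # scaled by 1 + weight: x / rms(x)
```

With a zero weight, the Llama-style variant zeroes the output while the Gemma-style variant returns the plain normalized vector, which is why loading Gemma weights into an RMSNorm without the offset gives wrong results.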

The differences between Gemma and Llama

Gemma multiplies the input embeddings by $\sqrt{\text{hidden size}}$, which Llama does not do. Gemma calls this normalization and applies it to all inputs (whether looked up from the vocabulary or passed in directly).
Gemma adds 1 to the RMSNorm weights: its RMSNorm returns $output \times (1.f + weight)$, whereas LlamaRMSLayerNorm returns $output \times weight$.
The token embedding layer's weight is tied with lm_head.
Gemma-2B uses MQA (multi-query attention) instead of MHA (multi-head attention).
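The embedding scaling and the tied lm_head can be sketched together in pure Python. The function names `embed` and `lm_head_logits` and the tiny 3-token, 4-dimensional table are assumptions for illustration only, not mllm code.

```python
import math


def embed(token_ids, embed_tokens, scale_by_sqrt_hidden):
    """Look up token embeddings; optionally multiply by sqrt(hidden_size) as Gemma does."""
    hidden_size = len(embed_tokens[0])
    scale = math.sqrt(hidden_size) if scale_by_sqrt_hidden else 1.0
    return [[v * scale for v in embed_tokens[t]] for t in token_ids]


def lm_head_logits(hidden, embed_tokens):
    """Tied lm_head: the logit for token t is dot(hidden, embed_tokens[t]).

    No separate output matrix is stored; the embedding table is reused,
    which is equivalent to multiplying by its transpose.
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in embed_tokens]


# Toy embedding table (assumed values): vocabulary of 3 tokens, hidden size 4.
embed_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
]

llama_inputs = embed([1], embed_tokens, scale_by_sqrt_hidden=False)
gemma_inputs = embed([1], embed_tokens, scale_by_sqrt_hidden=True)  # scaled by sqrt(4) = 2
logits = lm_head_logits([1.0, 1.0, 1.0, 1.0], embed_tokens)
```

Because the same table serves both the input lookup and the output projection, any precision change to the embedding weights also affects the logits.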

chenghuaWang added 9 commits April 3, 2024 17:20

feat: init gemma model

a3cea46

feat: modeling gemma

d4ec7ff

feat: Split any length Support / Gemma modeling

413bd62

feat: gemma model done.

ed2ef6e

feat: gemma bug. token repeat

e079c3f

fix: gemma tokenizer bug

4ea6cc0

fix: rms norm with add_unit_offset

c3142c4

feat: gemma-f32 done

d4adb0c

add: gemma_vocab

140ccb1

lx200916 requested a review from yirongjie April 9, 2024 07:22

yirongjie added 2 commits April 9, 2024 08:47

typo:modle->model

5ce8bb7

fix: Due to the fact that the lm_head of gemma is a transpose of embd_tokens, currently when quantizing to Q4, embd_tokens needs to stay in FP32

4812e74

yirongjie approved these changes Apr 9, 2024

yirongjie merged commit 55d21fb into UbiquitousLearning:main Apr 9, 2024
1 check passed

chenghuaWang mentioned this pull request May 4, 2024

Support for the QWen1.5-0.5B model #79

Merged
