Use GemmaAttention for Gemma #72
Conversation
Force-pushed from ab0c882 to 5db75d6
This way it produces more accurate results
convert_checkpoints.py (outdated diff excerpt)

    if new_key != key:
        state_dict[new_key] = state_dict.pop(key)
    output_ckpt_dir.mkdir(parents=True, exist_ok=True)
This shouldn't be needed. The output folder will be created in _export_to_local: https://github.com/google/jetstream-pytorch/blob/811d718c1f93e5ce37182e2c1ec54d3dc0b4aed7/convert_checkpoints.py#L355
done
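For illustration, a minimal sketch of why the extra mkdir is redundant, assuming the export helper creates its destination itself; the function body below is hypothetical, not the repository's actual _export_to_local:

    from pathlib import Path

    def _export_to_local(output_dir: Path, state_dict: dict) -> None:
        # Hypothetical sketch: the export helper creates its own destination folder,
        # so callers do not need a separate output_ckpt_dir.mkdir(...) beforehand.
        output_dir.mkdir(parents=True, exist_ok=True)
        # ... write the converted checkpoint files into output_dir ...

    # Caller side: no mkdir needed before the call.
    _export_to_local(Path("converted/gemma-2b"), {})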
convert_checkpoints.py (outdated diff excerpt)

    ]
    model_config = json.loads((input_ckpt_dir / "config.json").read_text())
    for key in list(state_dict.keys()):
        print(key)
remove?
done.
default_shardings/gemma.yaml (outdated diff excerpt)

    freqs_cis : -1 # torch.complex64 (16384, 128)
    freqs_cis : null # torch.complex64 (16384, 128)
    layers.*.self_attn.qkv_proj.weight: 0
Is this one only for test purposes? In the Gemma model, I saw the code directly reads wq, wk, and wv.
done.
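For context on the qkv_proj entry, here is a hedged sketch of how a fused Gemma checkpoint weight could be split into the separate wq, wk, and wv tensors that the model code reads; the helper name and the (n_heads + 2 * n_kv_heads) * head_dim layout are assumptions about the fused format, not code from this repository:

    import torch

    def split_fused_qkv(qkv_weight: torch.Tensor, n_heads: int, n_kv_heads: int, head_dim: int):
        # Assumed fused layout: q, k and v stacked along dim 0,
        # shape = ((n_heads + 2 * n_kv_heads) * head_dim, hidden_size).
        q_size = n_heads * head_dim
        kv_size = n_kv_heads * head_dim
        wq, wk, wv = torch.split(qkv_weight, [q_size, kv_size, kv_size], dim=0)
        return wq, wk, wv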
Diff excerpt:

    self.env.apply_sharding(output, axis=2)
    return self.wo(output)
    output = self.attention_kernel(xq, xk, xv, mask, cache)
    output = output.transpose(-3, -2).contiguous().view(bsz, seqlen, -1)
Nice, the code is cleaner with the attention kernel refactored out.
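Based only on the lines in the excerpt above, the refactored flow roughly looks like the sketch below; the class name, method signature, and the rotary-embedding step are assumptions, not the repository's exact code:

    import torch.nn as nn

    class SketchAttention(nn.Module):  # illustrative name only
        def forward(self, x, mask, cache):
            bsz, seqlen, _ = x.shape
            xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)  # per-model projections
            # ... reshape into heads and apply rotary embeddings ...
            output = self.attention_kernel(xq, xk, xv, mask, cache)  # shared attention math
            output = output.transpose(-3, -2).contiguous().view(bsz, seqlen, -1)
            return self.wo(output)  # final output projection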
Diff excerpt:

    return x_out


    class GemmaAttention(nn.Module):
In the long term, we might need to extend from the Attention class; a large percentage of the code is similar.
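A hedged sketch of that longer-term direction, in which GemmaAttention inherits the shared plumbing from Attention and overrides only the Gemma-specific pieces; the class layout and attribute names below are assumptions for illustration:

    import torch.nn as nn

    class Attention(nn.Module):
        """Shared plumbing: kv-cache handling, attention_kernel, sharding hooks."""
        def attention_kernel(self, xq, xk, xv, mask, cache):
            ...  # common scaled-dot-product / cache-update path

    class GemmaAttention(Attention):
        """Overrides only what Gemma does differently, e.g. the fused qkv projection."""
        def __init__(self, hidden_size, n_heads, n_kv_heads, head_dim):
            super().__init__()
            self.qkv_proj = nn.Linear(hidden_size, (n_heads + 2 * n_kv_heads) * head_dim, bias=False)
            self.o_proj = nn.Linear(n_heads * head_dim, hidden_size, bias=False)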
This way it produces more accurate results (with EOS)
{'rouge1': 36.9881, 'rouge2': 13.3464, 'rougeL': 21.7437, 'rougeLsum': 35.1489, 'gen_len': 1295948, 'gen_num': 1000}