update hybird window full attention #8467
base: develop
Conversation
Thanks for your contribution!
@@ -60,6 +61,33 @@ def docstring_decorator(fn):
    return docstring_decorator


def tokenizer_fn_dev_redpajama(example, tokenizer, inference_length):
We suggest moving the data-processing logic into data.py.
    return model_input


def tokenizer_fn_train_redpajama(example, tokenizer, scaled_max_position_embeddings, model_max_position_embeddings):
model_max_position_embeddings does not appear to be used.
        add_special_tokens=True,
    )
    ids = tokenized_source["input_ids"]
    features = {"input_ids": ids, "labels": ids, "position_ids": list(range(len(ids)))}
Shouldn't input_ids and labels be offset by one position here?
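If the shift is indeed needed (some causal-LM losses already shift internally, in which case identical input_ids and labels are fine), a minimal sketch of the offset construction, reusing the variable names from the diff:

```python
# Hypothetical sketch, only needed if the model's loss does not already shift
# labels internally: token t is trained to predict token t+1.
ids = tokenized_source["input_ids"]
features = {
    "input_ids": ids[:-1],
    "labels": ids[1:],
    "position_ids": list(range(len(ids) - 1)),
}
```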
@@ -71,6 +99,14 @@ class FinetuneArguments(TrainingArguments):
        default=False,
        metadata={"help": "whether to output logits in distributed status"},
    )
    use_ssa: Optional[bool] = field(
This should go into the TrainingArguments in argument.py.
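A minimal sketch of moving the flags, assuming argument.py defines (or would define) a TrainingArguments subclass; the base-class import, field names, and defaults below are assumptions mirroring the PR:

```python
from dataclasses import dataclass, field
from typing import Optional

from paddlenlp.trainer import TrainingArguments as BaseTrainingArguments


@dataclass
class TrainingArguments(BaseTrainingArguments):
    # Assumed fields mirroring the PR's flags; defaults are guesses.
    use_ssa: Optional[bool] = field(
        default=False,
        metadata={"help": "Whether to replace LlamaAttention forward with shift sparse attention."},
    )
    use_hybird_window_full_attention: Optional[bool] = field(
        default=False,
        metadata={"help": "Whether to keep full attention in the last layer only."},
    )
```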
    origin_forward = paddlenlp.transformers.llama.modeling.LlamaAttention.forward

    # replace llama attention with shift sparse attention
    if "llama" in model_args.model_name_or_path and training_args.use_ssa:
We suggest checking the loaded model with isinstance(model, LlamaForCausalLM) instead of matching a substring of the model path.
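A sketch of the suggested check, assuming the model has already been loaded at this point and that ssa_forward is the PR's patched forward:

```python
import paddlenlp
from paddlenlp.transformers import LlamaForCausalLM

# Gate the patch on the loaded model's class rather than on the path string.
if isinstance(model, LlamaForCausalLM) and training_args.use_ssa:
    paddlenlp.transformers.llama.modeling.LlamaAttention.forward = ssa_forward
```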
@@ -198,6 +259,11 @@ def main():
    else:
        # NOTE(gongenlei): new add autotuner_benchmark
        model = AutoModelForCausalLM.from_config(model_config, dtype=dtype)

    # set the last layer with full attention
    if training_args.use_hybird_window_full_attention:
We suggest consolidating this replacement logic into one larger function, gated by `if isinstance(model, LlamaForCausalLM) and training_args.use_ssa:`, rather than spreading it across several places.
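A rough sketch of that consolidation, assuming the PaddleNLP attribute path model.llama.layers[-1].self_attn and reusing the ssa_forward name from the diff:

```python
import paddlenlp
from paddlenlp.transformers import LlamaForCausalLM


def maybe_replace_llama_attention(model, training_args):
    """Apply all attention patching in one place instead of scattering it through main()."""
    if not (isinstance(model, LlamaForCausalLM) and training_args.use_ssa):
        return
    origin_forward = paddlenlp.transformers.llama.modeling.LlamaAttention.forward
    # Every layer uses shift sparse attention...
    paddlenlp.transformers.llama.modeling.LlamaAttention.forward = ssa_forward
    if training_args.use_hybird_window_full_attention:
        # ...except the last decoder layer, which keeps the original full attention.
        last_attn = model.llama.layers[-1].self_attn
        last_attn.forward = origin_forward.__get__(last_attn)
```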
        if train_ds is not None
        else None
    )
    if training_args.use_hybird_window_full_attention:
You could use data_args to mark whether the dataset is "pretrain" or "instruct_tuning". The current default type is "instruct_tuning"; long-text data should count as "pretrain".
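A minimal sketch of tagging the data style in data_args (the field name and allowed values below are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class DataArguments:
    # Hypothetical field: "instruct_tuning" stays the default; long-text data is "pretrain".
    dataset_type: str = field(
        default="instruct_tuning",
        metadata={"help": "One of 'instruct_tuning' (default) or 'pretrain' for long-text data."},
    )
```

The tokenization branches here would then test `data_args.dataset_type == "pretrain"` instead of `training_args.use_hybird_window_full_attention`.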
        if dev_ds is not None
        else None
    )
    if training_args.use_hybird_window_full_attention:
Same as above.
@@ -520,6 +597,17 @@ def compute_metrics_do_generation(eval_preds):
        metrics = compute_metrics_do_generation
    else:
        metrics = compute_metrics
    if training_args.use_hybird_window_full_attention:
        data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
Why is a separate DataCollatorForSupervisedDataset needed? Which part of DataCollatorForSeq2Seq does not meet the requirement?
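For comparison, a hedged sketch of using the stock collator (parameters assumed from its Hugging-Face-style interface); if the custom collator only adds padding for extra keys such as position_ids, that would be the gap worth calling out:

```python
from paddlenlp.data import DataCollatorForSeq2Seq

# Sketch: the stock seq2seq collator already pads input_ids and labels.
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True, label_pad_token_id=-100)
```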
import paddle.nn.functional as F
group_size_ratio = 1/4

def ssa_forward(
We suggest building this forward on top of the existing LlamaAttention forward. The current implementation is unlikely to generalize to other model configurations, e.g. when the QKV projections are fused (self.fuse_attention_qkv is True), and it also cannot use FA2, which saves memory and speeds up training. A simpler option would be to modify LlamaAttention's scaled_dot_product_attention function.
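A rough, generic sketch of the idea (not the PR's code): leave LlamaAttention.forward and its fused-QKV / FA2 paths untouched, and apply the group shift only around the attention kernel, which is roughly what hooking scaled_dot_product_attention would amount to. The [batch, seq_len, num_heads, head_dim] layout and the half-group head shift follow the S2-Attn scheme; the wrapper names and signature here are assumptions.

```python
import paddle

group_size_ratio = 1 / 4


def shift_heads(x, group_size, num_heads, reverse=False):
    """Shift the second half of the heads by half a group along the sequence axis."""
    direction = group_size // 2 if reverse else -(group_size // 2)
    return paddle.concat(
        [x[:, :, : num_heads // 2], paddle.roll(x[:, :, num_heads // 2 :], direction, axis=1)],
        axis=2,
    )


def grouped_attention(query, key, value, attention_fn):
    """Run an unchanged attention kernel group-by-group, S2-Attn style.

    query/key/value: [bsz, q_len, num_heads, head_dim]; attention_fn is the
    existing kernel (eager or FlashAttention-2) operating on the same layout.
    """
    bsz, q_len, num_heads, head_dim = query.shape
    group_size = int(q_len * group_size_ratio)
    num_groups = q_len // group_size

    def to_groups(x):
        x = shift_heads(x, group_size, num_heads)
        return x.reshape([bsz * num_groups, group_size, num_heads, head_dim])

    out = attention_fn(to_groups(query), to_groups(key), to_groups(value))
    out = out.reshape([bsz, q_len, num_heads, head_dim])
    return shift_heads(out, group_size, num_heads, reverse=True)
```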
PR types
PR changes
Description