inference support llama3(wint8|4/a8w8) #8630

yuanlehome · 2024-06-19T13:33:19Z

PR types

New features

PR changes

Others

Description

inference support llama3(wint8|4/a8w8)

paddle-bot · 2024-06-19T13:33:24Z

Thanks for your contribution!

codecov · 2024-06-19T14:04:26Z

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 55.80%. Comparing base (65e721e) to head (4821cd6).
Report is 6 commits behind head on develop.

Files	Patch %	Lines
...dlenlp/experimental/transformers/llama/modeling.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8630   +/-   ##
========================================
  Coverage    55.80%   55.80%           
========================================
  Files          620      620           
  Lines        96642    96642           
========================================
  Hits         53928    53928           
  Misses       42714    42714


DesmonDay · 2024-06-25T07:36:55Z

llm/predict/predictor.py

@@ -1213,8 +1214,8 @@ def create_predictor(
    init_chat_template(tokenizer, predictor_args.model_name_or_path, predictor_args.chat_template)

    # TODO(wj-Mcat): fix llama tokenzier pad_token bug
-    if isinstance(tokenizer, LlamaTokenizer) and not tokenizer.pad_token:
-        tokenizer.pad_token = tokenizer.unk_token
+    if (isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:


This can be simplified to: `if isinstance(tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:`. isinstance accepts a tuple of types.
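
A minimal sketch of the suggested simplification. The two tokenizer classes here are hypothetical stand-ins defined only for this example, not the PaddleNLP classes, and `needs_default_pad_token` is an illustrative helper name. It shows that passing a tuple to `isinstance` gives the same result as chaining two calls with `or`.

```python
# Hypothetical stand-ins for illustration only; the real classes live in PaddleNLP.
class LlamaTokenizer:
    def __init__(self, pad_token=None, unk_token="<unk>"):
        self.pad_token = pad_token
        self.unk_token = unk_token


class Llama3Tokenizer:
    def __init__(self, pad_token=None, unk_token="<unk>"):
        self.pad_token = pad_token
        self.unk_token = unk_token


class OtherTokenizer:
    def __init__(self, pad_token=None):
        self.pad_token = pad_token


def needs_default_pad_token(tokenizer):
    # isinstance accepts a tuple of types and returns True if any of them matches.
    return isinstance(tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not tokenizer.pad_token


def needs_default_pad_token_verbose(tokenizer):
    # The chained form from the diff, kept here only to compare results.
    return (
        isinstance(tokenizer, LlamaTokenizer) or isinstance(tokenizer, Llama3Tokenizer)
    ) and not tokenizer.pad_token


candidates = [
    LlamaTokenizer(),
    Llama3Tokenizer(),
    Llama3Tokenizer(pad_token="<pad>"),
    OtherTokenizer(),
]
results = [needs_default_pad_token(t) for t in candidates]
```

Both helpers agree on every candidate, so the tuple form is a drop-in replacement for the chained check.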

DesmonDay · 2024-06-25T07:49:19Z

paddlenlp/experimental/transformers/fused_transformer_layers.py

@@ -549,7 +549,7 @@ def init_weight_shape(self, config):
        self.qkv_weight_shape = (
            [(self.num_heads + 2 * self.kv_num_heads) * self.head_dim, self.embed_dim]
            if config.trans_qkvw
-            else [(self.num_heads + 2 * self.kv_num_heads) * self.head_dim, self.embed_dim]
+            else [self.embed_dim, (self.num_heads + 2 * self.kv_num_heads) * self.head_dim]


Why was this shape changed?

Because the previous one was wrong.
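
A small sketch of what the fix changes, assuming the expression from the diff is lifted into a standalone function. `qkv_weight_shape` as a free function and the numeric values below are illustrative assumptions, not taken from the PR. Before the change both branches returned the same list, so `trans_qkvw` had no effect on the shape; after it, the non-transposed branch swaps the two dimensions.

```python
def qkv_weight_shape(num_heads, kv_num_heads, head_dim, embed_dim, trans_qkvw):
    # Output width of the fused QKV projection: query heads plus key and value heads.
    qkv_out = (num_heads + 2 * kv_num_heads) * head_dim
    # Transposed layout is [out, in]; the corrected non-transposed layout is [in, out].
    return [qkv_out, embed_dim] if trans_qkvw else [embed_dim, qkv_out]


# Illustrative grouped-query-attention sizes (assumed values for the example).
transposed = qkv_weight_shape(32, 8, 128, 4096, trans_qkvw=True)
plain = qkv_weight_shape(32, 8, 128, 4096, trans_qkvw=False)
```

With these sizes the fused output width is (32 + 2 * 8) * 128 = 6144, so the two layouts are `[6144, 4096]` and `[4096, 6144]`. The old code would have only gone unnoticed where the two dimensions happen to be equal.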

DesmonDay · 2024-06-26T09:13:23Z

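As discussed, the llama3 model currently runs inference correctly in dynamic-graph mode without fusion, but hits a multi-process problem in the fused path, which will be investigated later. In addition, src_length cannot be set for inference after dynamic-to-static conversion, and high-performance inference does not stop correctly at EOS. @yuanlehome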

DesmonDay

LGTM

DrownFish19 mentioned this pull request Jun 20, 2024

[Bug]: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token #8603

Open

1 task

yuanlehome force-pushed the support_llama3_inferernce branch from e40bdf1 to a81c16d Compare June 21, 2024 04:55

DesmonDay reviewed Jun 25, 2024

yuanlehome closed this Jun 25, 2024

yuanlehome force-pushed the support_llama3_inferernce branch from 833a8a7 to 7130c18 Compare June 25, 2024 14:08

inference support llama3

d7c2040

yuanlehome reopened this Jun 25, 2024

fix

4821cd6

DesmonDay approved these changes Jun 26, 2024

sijunhe merged commit faabf87 into PaddlePaddle:develop Jun 27, 2024
8 of 11 checks passed
