[Infra] support dynamic_insert in dynamic run #10387

ckl117 · 2025-04-10T14:34:27Z

Before submitting

Lint code. If there are lint issues, please format the code first.

# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py

Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

APIs

Description

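Dynamic-graph mode now supports dynamic insertion and streaming output.
In dynamic-graph mode, some custom operators are called through pybind. This is controlled by a context manager and a decorator, because some unit tests run both dynamic-graph and dynamic-to-static modes. The corresponding sm90 custom operators will be supported later.
Message queues are isolated across processes: the save_output operator gains a new input, queue_id, which receives the pid.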
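As a rough illustration of what dynamic insertion means for scheduling, the sketch below simulates a fixed number of batch slots that are refilled from a pending queue as soon as a request finishes. It is not the predictor's implementation: `run_with_dynamic_insert`, `task_lengths`, and the slot layout are hypothetical names chosen for this example, and each request is reduced to a count of tokens still to generate.

```python
from collections import deque


def run_with_dynamic_insert(task_lengths, max_batch_size):
    """Simulate slot-based scheduling (hypothetical helper, not PaddleNLP code).

    Each step decodes one token per active slot; a freed slot is refilled
    from the pending queue before the next step.
    """
    pending = deque(enumerate(task_lengths))  # (task_id, tokens to generate)
    slots = [None] * max_batch_size           # each slot: [task_id, remaining] or None
    finished_order = []
    steps = 0

    while pending or any(slot is not None for slot in slots):
        # Insert waiting tasks into any free slot before decoding.
        for pos in range(max_batch_size):
            if slots[pos] is None and pending:
                task_id, length = pending.popleft()
                slots[pos] = [task_id, length]

        # One decode step: every active slot produces one token.
        for pos, slot in enumerate(slots):
            if slot is None:
                continue
            slot[1] -= 1
            if slot[1] <= 0:
                finished_order.append(slot[0])
                slots[pos] = None  # slot is free for the next insertion
        steps += 1

    return finished_order, steps
```

With two slots and three requests of lengths 3, 1, and 2, the third request is inserted into the slot freed by the second one instead of waiting for the whole batch to drain.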

Example

device=4,5,6,7
model_path=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

src_length=1024
min_length=1
max_length=1024
total_max_length=$((src_length + max_length))
max_batch_size=24
total_request_num=32

python -m paddle.distributed.launch \
    --log_dir dy_insert_logs \
    --gpus ${device} \
    llm/predict/predictor.py \
    --model_name_or_path ${model_path} \
    --mode dynamic \
    --dtype bfloat16 \
    --inference_model \
    --append_attn \
    --src_length ${src_length} \
    --min_length ${min_length} \
    --max_length ${max_length} \
    --total_max_length ${total_max_length} \
    --top_p 0.95 \
    --temperature 0.7 \
    --repetition_penalty 1.0 \
    --batch_size ${max_batch_size} \
    --total_request_num ${total_request_num} \
    --dynamic_insert

paddle-bot · 2025-04-10T14:34:32Z

Thanks for your contribution!

codecov · 2025-04-10T15:10:24Z

Codecov Report

Attention: Patch coverage is 13.95349% with 74 lines in your changes missing coverage. Please review.

Project coverage is 48.99%. Comparing base (8521f02) to head (b7748be).
Report is 186 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/utils/import_utils.py	25.53%	35 Missing ⚠️
paddlenlp/trl/llm_utils.py	0.00%	32 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py	0.00%	6 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #10387      +/-   ##
===========================================
- Coverage    49.01%   48.99%   -0.03%     
===========================================
  Files          765      765              
  Lines       125879   125971      +92     
===========================================
+ Hits         61699    61716      +17     
- Misses       64180    64255      +75


DrownFish19 · 2025-04-11T08:27:17Z

csrc/gpu/ops_pybind.cu

+  m.def("f_multi_head_latent_attention", &MultiHeadLatentAttention, "MultiHeadLatentAttention");
+  m.def("f_noaux_tc", &NoauxTc, "NoauxTc");
+  m.def("f_get_block_shape_and_split_kv_block", &GetBlockShapeAndSplitKVBlock, "GetBlockShapeAndSplitKVBlock");
+  m.def("f_prefill_mla_write_cache"å, &PrefillMLAWriteCacheKernel, "PrefillMLAWriteCacheKernel");


There is a stray special character here; it needs to be removed.

DrownFish19

LGTM

vivienfanghuagood · 2025-04-15T03:17:29Z

llm/predict/predictor.py

+_original_import = builtins.__import__
+
+
+def custom_import(name, *args, **kwargs):


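This custom_import needs to be made compatible with both dynamic-graph and static-graph modes.

Added a context manager and a decorator; pybind is now used only during dynamic-graph inference.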

…nto dy_insert

yuanlehome · 2025-04-17T08:34:48Z

llm/predict/predictor.py

+_original_import = builtins.__import__
+_imported_modules = {}
+_paddlenlp_ops_updated = False
+_original_attributes = {}
+pybind_ops_list = [
+    "update_inputs_v2",
+    "save_output",
+    "set_preids_token_penalty_multi_scores",
+    "rebuild_padding_v2",
+    "append_attention",
+    "save_output_dygraph",
+]
+
+
+def custom_import(name, *args, **kwargs):
+    global _paddlenlp_ops_updated, _imported_modules, _original_attributes
+    global pybind_ops_list
+
+    if _paddlenlp_ops_updated:
+        if name in _imported_modules:
+            return _imported_modules[name]
+
+    module = _original_import(name, *args, **kwargs)
+
+    if not _paddlenlp_ops_updated and os.getenv("USE_PYBIND", "1").lower() in ["1", "true", "t", "yes", "y"]:
+        if name == "paddlenlp_ops":
+            logger.info("Using Pybind paddlenlp_ops!")
+
+            if name not in _original_attributes:
+                bak_dict = {}
+                for ops_name in pybind_ops_list:
+                    bak_dict[ops_name] = getattr(module, ops_name, None)
+                _original_attributes[name] = bak_dict
+
+            for ops_name in pybind_ops_list:
+                pybind_ops_name = f"f_{ops_name}"
+                if hasattr(module, pybind_ops_name):
+                    setattr(module, ops_name, getattr(module, pybind_ops_name))
+
+            _paddlenlp_ops_updated = True
+
+    _imported_modules[name] = module
+    return module
+
+
+@contextmanager
+def dynamic_graph_pybind_context():
+    global _original_import, _paddlenlp_ops_updated
+    original_import = builtins.__import__
+
+    try:
+        builtins.__import__ = custom_import
+        yield
+    finally:
+        builtins.__import__ = original_import
+
+        if "paddlenlp_ops" in _original_attributes:
+            paddlenlp_ops_module = sys.modules.get("paddlenlp_ops")
+            if paddlenlp_ops_module:
+                for attr, value in _original_attributes["paddlenlp_ops"].items():
+                    setattr(paddlenlp_ops_module, attr, value)
+                _paddlenlp_ops_updated = False
+
+
+def auto_dynamic_graph_pybind(func):
+    @functools.wraps(func)
+    def wrapper(self, *args, **kwargs):
+        with dynamic_graph_pybind_context():
+            return func(self, *args, **kwargs)
+
+    return wrapper
+


Is this file really the right place for this code?

Moved to paddlenlp/utils/import_utils.py.
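The swap-and-restore idea in the quoted diff can be shown in isolation. The sketch below is a simplified variant, not the code that was merged: it patches attributes on a module object that is passed in, rather than hooking `builtins.__import__`, and it uses a stand-in `SimpleNamespace` instead of the real `paddlenlp_ops`. The names `prefer_prefixed_ops`, `with_prefixed_ops`, and `fake_ops` are hypothetical.

```python
import functools
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def prefer_prefixed_ops(module, op_names, prefix="f_"):
    """Inside the context, module.<name> points at module.<prefix><name>."""
    saved = {}
    try:
        for name in op_names:
            replacement = getattr(module, prefix + name, None)
            if replacement is not None:
                saved[name] = getattr(module, name, None)
                setattr(module, name, replacement)
        yield module
    finally:
        # Restore the original attributes even if the body raised.
        for name, original in saved.items():
            setattr(module, name, original)


def with_prefixed_ops(module, op_names):
    """Decorator form: run the wrapped function inside the context."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with prefer_prefixed_ops(module, op_names):
                return func(*args, **kwargs)
        return wrapper
    return decorator


# Stand-in for an ops module that exposes both variants of one operator.
fake_ops = SimpleNamespace(
    save_output=lambda: "static op",
    f_save_output=lambda: "pybind op",
)
```

Because the restore happens in `finally`, code that runs after the context (for example a dynamic-to-static test in the same process) sees the original operators again.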

yuanlehome · 2025-04-17T08:35:18Z

llm/predict/predictor.py

+
+    module = _original_import(name, *args, **kwargs)
+
+    if not _paddlenlp_ops_updated and os.getenv("USE_PYBIND", "1").lower() in ["1", "true", "t", "yes", "y"]:


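Isn't the environment variable name USE_PYBIND a bit too casual?

Renamed to USE_PYBIND_CUSTOM_OPS.

Let's change it to DYNAMIC_INFERENCE_MODE. Dynamic-graph inference will probably need more special-case logic later, so use this one name for all of it.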
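For reference, a boolean environment switch of this kind can be read as in the sketch below. The set of truthy strings is taken from the quoted diff; the helper name `dynamic_inference_mode_enabled` and the default of "0" are assumptions made for this example, since the thread does not state the final default.

```python
import os

# Truthy spellings as they appear in the quoted diff.
_TRUTHY = {"1", "true", "t", "yes", "y"}


def dynamic_inference_mode_enabled(default="0"):
    """Return True if DYNAMIC_INFERENCE_MODE is set to a truthy string.

    The default value is an assumption for this sketch.
    """
    value = os.getenv("DYNAMIC_INFERENCE_MODE", default)
    return value.strip().lower() in _TRUTHY
```

Reading the variable through one helper keeps any later dynamic-inference special cases behind the same switch.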

llm/predict/predictor.py

yuanlehome · 2025-04-17T08:40:20Z

llm/predict/predictor.py

+            self.update_model_inputs("temperature", old_config.temperature)
+        self.config = old_config
+
+    def insert(self, pos, task_id):


The member function name insert is also too casual.

Renamed to insert_task.

yuanlehome · 2025-04-17T08:41:05Z

llm/predict/predictor.py

+    dynamic_insert: bool = field(default=False, metadata={"help": "whether use dynamic insert"})
+    total_request_num: int = field(default=None, metadata={"help": "The total number of request data"})
+    init_cache_kvs: bool = field(default=True, metadata={"help": "whether init cache_kvs"})


Is the new init_cache_kvs option really necessary?

yuanlehome · 2025-04-17T08:45:47Z

csrc/gpu/cpp_extensions.cu

+  m.def("f_save_output_dygraph", &SaveOutputDygraph, "SaveOutputDygraph");
+}
+
+PYBIND11_MODULE(paddlenlp_ops_90, m) {


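Where is the choice made between paddlenlp_ops_90/80/xx at call time?

Building the dist wheel produces modules such as paddlenlp_ops_90; if that specific module has no pybind bindings of its own, it cannot be called.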

yuanlehome · 2025-04-17T11:12:23Z

The new environment variable and script arguments need to be documented in docs.

…nto dy_insert

JunnYu · 2025-04-18T02:22:49Z

llm/predict/predictor.py

+        result_queue = mp.Queue()
+        task_queue = mp.Queue()
+        done_event = mp.Event()
+        read_res_func = llm_utils.read_res_dynamic_insert


These should only be needed under the condition `if self.config.output_via_mq:`, right?

Done, the code logic has been adjusted.

JunnYu · 2025-04-18T02:23:28Z

llm/predict/predictor.py

+                    task_token = self.model_inputs["all_token_ids"][task_id : task_id + 1, :].cpu().numpy()
+                    task_queue.put([task_id, task_token])
+
+        logger.info(f"running spend {time.time() - s_time}")


Please change these to debug level.

JunnYu · 2025-04-18T02:24:27Z

llm/predict/predictor.py

+    def insert_task(self, pos, task_id):
+        query_id = task_id
+        length = len(self.input_ids[query_id])
+        logger.info(f"Insert task {task_id} while query id is {query_id} inserting pos {pos}")


Please change this to debug level.

JunnYu · 2025-04-18T06:12:37Z

llm/predict/predictor.py

+        ] = np.array(self.decoder_blocks[pos])
+
+    @paddle.no_grad()
+    def predict_dy_insert(self, input_texts: list[str], return_tokens=False, **kwargs):


It would also be good to support running inference directly when the user passes input_ids, skipping the step that tokenizes input_texts.
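One possible shape for that request is sketched below. Nothing in it comes from the PR: `resolve_input_ids` and its `tokenize` callback are hypothetical, and the sketch only shows the dispatch between pre-tokenized input and raw text.

```python
def resolve_input_ids(input_texts=None, input_ids=None, tokenize=None):
    """Return token-id lists, preferring caller-supplied input_ids.

    Hypothetical helper: `tokenize` is any callable mapping one string
    to a list of token ids.
    """
    if input_ids is not None:
        # Already tokenized: skip the tokenizer entirely.
        return [list(ids) for ids in input_ids]
    if input_texts is None or tokenize is None:
        raise ValueError("pass input_ids, or input_texts together with tokenize")
    return [tokenize(text) for text in input_texts]
```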

support dynamic_insert in dynamic run

8db5188

ckl117 force-pushed the dy_insert branch from 5d70100 to 8db5188 Compare April 10, 2025 14:36

remove duplicate code

d832de7

ckl117 added 3 commits April 11, 2025 11:46

add save_output_dygraph op

f211ef3

check pybind op

4f8a81c

note off

03da2cc

DrownFish19 reviewed Apr 11, 2025


code check

b815455

DrownFish19 changed the title ~~support dynamic_insert in dynamic run~~ [Infra] support dynamic_insert in dynamic run Apr 11, 2025

check mla pybind function signature

51d0372

DrownFish19 previously approved these changes Apr 15, 2025


vivienfanghuagood reviewed Apr 15, 2025


check pybind and batch_size

0791231

ckl117 dismissed DrownFish19’s stale review via 0791231 April 15, 2025 04:09

ckl117 added 9 commits April 15, 2025 15:06

support multi-proc save_output and get_output

5ec1d92

check pybind block gemm

69d1915

support pybind custom_import in dynamic run

3b8ac81

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

8efc08a

…nto dy_insert

support non-msg queue stream output in dynamic insert

158f29a

check queue_id in d2s, and add sm80 sm90 for pybind

1f09cb0

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

b309df0

…nto dy_insert

check pybind for build_wheel.sh

f27b732

USE_PYBIND=False

2345320

ckl117 force-pushed the dy_insert branch from d371d37 to 2345320 Compare April 17, 2025 05:24

ckl117 added 2 commits April 17, 2025 16:08

implement custom_import as context management and decorator

59c7eb3

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

1954191

…nto dy_insert

vivienfanghuagood approved these changes Apr 17, 2025


yuanlehome reviewed Apr 17, 2025


yuanlehome reviewed Apr 17, 2025


yuanlehome reviewed Apr 17, 2025


code check

5c6a14b

add doc for pybind and dynamic_insert

f252148

ckl117 mentioned this pull request Apr 17, 2025

【Infer Doc】add doc for pybind and dynamic_insert #10437

Closed

2 tasks

ckl117 added 5 commits April 17, 2025 19:58

check

59b5560

change USE_PYBIND_CUSTOM_OPS to DYNAMIC_INFERENCE_MODE

6ebfad5

Merge branch 'dy_insert_doc' of https://github.com/ckl117/PaddleNLP i…

dbe46ca

…nto dy_insert

check doc

ebecd3c

code check

fb5e0ee

JunnYu reviewed Apr 18, 2025


ckl117 added 2 commits April 18, 2025 11:07

check dynamic_insert debug

c42801f

check task_queue is empty when dynamic insert exit

2211167

JunnYu previously approved these changes Apr 18, 2025


ckl117 dismissed JunnYu’s stale review via f8f1266 April 18, 2025 05:46

ckl117 force-pushed the dy_insert branch 2 times, most recently from 10e06a9 to 6dd8e3c Compare April 18, 2025 05:50

fix blocking when output_via_mq=1 in dynamic_insert

b7748be

ckl117 force-pushed the dy_insert branch from 6dd8e3c to b7748be Compare April 18, 2025 05:51

JunnYu reviewed Apr 18, 2025


yuanlehome approved these changes Apr 18, 2025


JunnYu approved these changes Apr 18, 2025


wawltor merged commit f889167 into PaddlePaddle:develop Apr 18, 2025
8 of 12 checks passed

6 participants