Contributor

@ckl117 ckl117 commented Apr 10, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

APIs

Description

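  • Dynamic graph mode now supports dynamic insertion and streaming output.
  • Some custom operators in dynamic graph mode are called through pybind. A context manager and a decorator control this, because some unit tests run both dynamic graph mode and dynamic-to-static conversion. The corresponding sm90 custom operators will be supported later.
  • Multi-process message queues are isolated: the save_output operator gains a queue_id input that receives the pid.

Example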

device=4,5,6,7
model_path=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

src_length=1024
min_length=1
max_length=1024
total_max_length=$((src_length + max_length))
max_batch_size=24
total_request_num=32

python -m paddle.distributed.launch \
    --log_dir dy_insert_logs \
    --gpus ${device} \
    llm/predict/predictor.py \
    --model_name_or_path ${model_path} \
    --mode dynamic \
    --dtype bfloat16 \
    --inference_model \
    --append_attn \
    --src_length ${src_length} \
    --min_length ${min_length} \
    --max_length ${max_length} \
    --total_max_length ${total_max_length} \
    --top_p 0.95 \
    --temperature 0.7 \
    --repetition_penalty 1.0 \
    --batch_size ${max_batch_size} \
    --total_request_num ${total_request_num} \
    --dynamic_insert 

paddle-bot bot commented Apr 10, 2025

Thanks for your contribution!

codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 13.95349% with 74 lines in your changes missing coverage. Please review.

Project coverage is 48.99%. Comparing base (8521f02) to head (b7748be).
Report is 186 commits behind head on develop.

Files with missing lines Patch % Lines
paddlenlp/utils/import_utils.py 25.53% 35 Missing ⚠️
paddlenlp/trl/llm_utils.py 0.00% 32 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py 0.00% 6 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10387      +/-   ##
===========================================
- Coverage    49.01%   48.99%   -0.03%     
===========================================
  Files          765      765              
  Lines       125879   125971      +92     
===========================================
+ Hits         61699    61716      +17     
- Misses       64180    64255      +75     

m.def("f_multi_head_latent_attention", &MultiHeadLatentAttention, "MultiHeadLatentAttention");
m.def("f_noaux_tc", &NoauxTc, "NoauxTc");
m.def("f_get_block_shape_and_split_kv_block", &GetBlockShapeAndSplitKVBlock, "GetBlockShapeAndSplitKVBlock");
m.def("f_prefill_mla_write_cache"å, &PrefillMLAWriteCacheKernel, "PrefillMLAWriteCacheKernel");
Collaborator

@DrownFish19 DrownFish19 Apr 11, 2025

There is a stray special character here that needs to be removed.

@DrownFish19 DrownFish19 changed the title support dynamic_insert in dynamic run [Infra] support dynamic_insert in dynamic run Apr 11, 2025
DrownFish19 previously approved these changes Apr 15, 2025
Collaborator

@DrownFish19 DrownFish19 left a comment

LGTM

_original_import = builtins.__import__


def custom_import(name, *args, **kwargs):
Contributor

This custom_import needs to handle both dynamic-graph and static-graph modes.

Contributor Author

Added a context manager and a decorator so that pybind is used only during dynamic-graph inference.

Comment on lines 72 to 143
_original_import = builtins.__import__
_imported_modules = {}
_paddlenlp_ops_updated = False
_original_attributes = {}
pybind_ops_list = [
    "update_inputs_v2",
    "save_output",
    "set_preids_token_penalty_multi_scores",
    "rebuild_padding_v2",
    "append_attention",
    "save_output_dygraph",
]


def custom_import(name, *args, **kwargs):
    global _paddlenlp_ops_updated, _imported_modules, _original_attributes
    global pybind_ops_list

    if _paddlenlp_ops_updated:
        if name in _imported_modules:
            return _imported_modules[name]

    module = _original_import(name, *args, **kwargs)

    if not _paddlenlp_ops_updated and os.getenv("USE_PYBIND", "1").lower() in ["1", "true", "t", "yes", "y"]:
        if name == "paddlenlp_ops":
            logger.info("Using Pybind paddlenlp_ops!")

            if name not in _original_attributes:
                bak_dict = {}
                for ops_name in pybind_ops_list:
                    bak_dict[ops_name] = getattr(module, ops_name, None)
                _original_attributes[name] = bak_dict

            for ops_name in pybind_ops_list:
                pybind_ops_name = f"f_{ops_name}"
                if hasattr(module, pybind_ops_name):
                    setattr(module, ops_name, getattr(module, pybind_ops_name))

            _paddlenlp_ops_updated = True

    _imported_modules[name] = module
    return module


@contextmanager
def dynamic_graph_pybind_context():
    global _original_import, _paddlenlp_ops_updated
    original_import = builtins.__import__

    try:
        builtins.__import__ = custom_import
        yield
    finally:
        builtins.__import__ = original_import

        if "paddlenlp_ops" in _original_attributes:
            paddlenlp_ops_module = sys.modules.get("paddlenlp_ops")
            if paddlenlp_ops_module:
                for attr, value in _original_attributes["paddlenlp_ops"].items():
                    setattr(paddlenlp_ops_module, attr, value)
        _paddlenlp_ops_updated = False


def auto_dynamic_graph_pybind(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        with dynamic_graph_pybind_context():
            return func(self, *args, **kwargs)

    return wrapper
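
For readers following the thread, the pattern in the snippet above (patch builtins.__import__ inside a context manager, swap each op for its f_-prefixed pybind variant, then restore everything on exit) can be tried in isolation. The following is a minimal sketch, not the PR's code: fake_ops, pybind_ops, with_pybind_ops, run_dynamic and run_static are hypothetical names, and a plain in-memory module stands in for the compiled paddlenlp_ops extension.

```python
import builtins
import functools
import sys
import types
from contextlib import contextmanager

# Hypothetical stand-in for the compiled extension: `save_output` plays the
# static-graph op and `f_save_output` its pybind counterpart.
fake_ops = types.ModuleType("fake_ops")
fake_ops.save_output = lambda: "static"
fake_ops.f_save_output = lambda: "pybind"
sys.modules["fake_ops"] = fake_ops


@contextmanager
def pybind_ops(module_name, op_names):
    """Swap `op` for `f_op` when `module_name` is imported; undo on exit."""
    real_import = builtins.__import__
    saved = {}

    def patched_import(name, *args, **kwargs):
        module = real_import(name, *args, **kwargs)
        if name == module_name and not saved:
            for op in op_names:
                # Back up the original attribute before overwriting it.
                saved[op] = getattr(module, op, None)
                if hasattr(module, f"f_{op}"):
                    setattr(module, op, getattr(module, f"f_{op}"))
        return module

    builtins.__import__ = patched_import
    try:
        yield
    finally:
        # Always restore the import hook and the original attributes.
        builtins.__import__ = real_import
        module = sys.modules.get(module_name)
        if module is not None:
            for op, value in saved.items():
                setattr(module, op, value)


def with_pybind_ops(func):
    """Decorator form: run `func` with the pybind variants active."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with pybind_ops("fake_ops", ["save_output"]):
            return func(*args, **kwargs)

    return wrapper


@with_pybind_ops
def run_dynamic():
    import fake_ops  # resolved through the patched import hook

    return fake_ops.save_output()


def run_static():
    import fake_ops  # resolved through the normal import machinery

    return fake_ops.save_output()
```

Calling run_dynamic() goes through the f_-prefixed function, while a later run_static() sees the original attribute again, which is the compatibility the reviewer asked for. Keeping the backup in a local dict rather than module-level globals is a simplification of this sketch, not something the PR does.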

Collaborator

Is this file really the right place for this code?

Contributor Author

Moved to paddlenlp/utils/import_utils.py


module = _original_import(name, *args, **kwargs)

if not _paddlenlp_ops_updated and os.getenv("USE_PYBIND", "1").lower() in ["1", "true", "t", "yes", "y"]:
Collaborator

Isn't USE_PYBIND too casual a name for this environment variable?

Contributor Author

Renamed to USE_PYBIND_CUSTOM_OPS

Collaborator

Let's use DYNAMIC_INFERENCE_MODE instead. Dynamic-graph inference will probably need more special-case logic later, and all of it can be gated on this one variable.
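
As an illustration of the suggestion, the truthiness check from the snippet under review could be factored into one helper so that later dynamic-inference logic shares a single switch. This is only a sketch: dynamic_inference_enabled and TRUTHY are hypothetical names, and the thread does not show which variable name the merged code ended up using.

```python
import os

# Accepted "on" spellings, copied from the check in the snippet under review.
TRUTHY = {"1", "true", "t", "yes", "y"}


def dynamic_inference_enabled(env=os.environ):
    """Return True when DYNAMIC_INFERENCE_MODE is unset or set to a truthy value."""
    # The default "1" mirrors os.getenv("USE_PYBIND", "1") in the reviewed code.
    return env.get("DYNAMIC_INFERENCE_MODE", "1").lower() in TRUTHY
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.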

self.update_model_inputs("temperature", old_config.temperature)
self.config = old_config

def insert(self, pos, task_id):
Collaborator

The member function name insert is also too casual.

Contributor Author

Renamed to insert_task

Comment on lines 268 to 270
dynamic_insert: bool = field(default=False, metadata={"help": "whether use dynamic insert"})
total_request_num: int = field(default=None, metadata={"help": "The total number of request data"})
init_cache_kvs: bool = field(default=True, metadata={"help": "whether init cache_kvs"})
Collaborator

Is the new init_cache_kvs option really necessary?

Contributor Author

Removed.

m.def("f_save_output_dygraph", &SaveOutputDygraph, "SaveOutputDygraph");
}

PYBIND11_MODULE(paddlenlp_ops_90, m) {
Collaborator

Where does the code choose between paddlenlp_ops_90/80/xx when calling them?

Member

Building the dist wheel produces modules such as paddlenlp_ops_90. Unless each of them is bound through pybind specifically, it cannot be called.

@yuanlehome
Collaborator

The newly added environment variables and script arguments need to be described in the docs.

Comment on lines 1421 to 1424
result_queue = mp.Queue()
task_queue = mp.Queue()
done_event = mp.Event()
read_res_func = llm_utils.read_res_dynamic_insert
Member

if self.config.output_via_mq: these should only be needed under this condition, right?

Contributor Author

Done, adjusted the code logic.
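
The thread does not show the adjusted code. One way the guard could look is sketched below, assuming the queues and the event are used only for message-queue output; Config and make_output_channels are hypothetical names, and only output_via_mq, result_queue, task_queue and done_event come from the snippet under review.

```python
import multiprocessing as mp
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the predictor config."""

    output_via_mq: bool = False


def make_output_channels(config):
    """Create the IPC objects only when output goes through the message queue."""
    if not config.output_via_mq:
        return None
    return {
        "result_queue": mp.Queue(),
        "task_queue": mp.Queue(),
        "done_event": mp.Event(),
    }
```

With output_via_mq left at False no queues or events are allocated, so the plain dynamic-graph path pays nothing for the streaming machinery.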

task_token = self.model_inputs["all_token_ids"][task_id : task_id + 1, :].cpu().numpy()
task_queue.put([task_id, task_token])

logger.info(f"running spend {time.time() - s_time}")
Member

Change these to debug level.

Contributor Author

Done

def insert_task(self, pos, task_id):
    query_id = task_id
    length = len(self.input_ids[query_id])
    logger.info(f"Insert task {task_id} while query id is {query_id} inserting pos {pos}")
Member

Change this to debug level.

Contributor Author

Done

JunnYu previously approved these changes Apr 18, 2025
@ckl117 ckl117 force-pushed the dy_insert branch 2 times, most recently from 10e06a9 to 6dd8e3c Compare April 18, 2025 05:50
] = np.array(self.decoder_blocks[pos])

@paddle.no_grad()
def predict_dy_insert(self, input_texts: list[str], return_tokens=False, **kwargs):
Member

It would also be good to support running inference directly when the user passes input_ids, skipping the tokenizer step applied to input_texts.
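
The PR was merged without a reply to this request, so the following is only a sketch of what such support might look like, not code from the repository. prepare_input_ids and toy_tokenize are hypothetical names; a real implementation would call the model's tokenizer instead of the toy one.

```python
def prepare_input_ids(inputs, tokenize):
    """Return one token-id list per request, tokenizing only the text entries."""
    ids = []
    for item in inputs:
        if isinstance(item, str):
            # Text request: run it through the tokenizer.
            ids.append(tokenize(item))
        else:
            # Already token ids: use them as-is and skip tokenization.
            ids.append(list(item))
    return ids


def toy_tokenize(text):
    """Toy tokenizer for illustration only: one id per whitespace-separated word."""
    return [len(word) for word in text.split()]
```

A caller could then mix both forms in one batch, for example prepare_input_ids(["hello world", [5, 6, 7]], toy_tokenize).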

@wawltor wawltor merged commit f889167 into PaddlePaddle:develop Apr 18, 2025
8 of 12 checks passed

6 participants