[Infra] support dynamic_insert in dynamic run #10387
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #10387 +/- ##
===========================================
- Coverage 49.01% 48.99% -0.03%
===========================================
Files 765 765
Lines 125879 125971 +92
===========================================
+ Hits 61699 61716 +17
- Misses 64180 64255 +75
csrc/gpu/ops_pybind.cu
Outdated
  m.def("f_multi_head_latent_attention", &MultiHeadLatentAttention, "MultiHeadLatentAttention");
  m.def("f_noaux_tc", &NoauxTc, "NoauxTc");
  m.def("f_get_block_shape_and_split_kv_block", &GetBlockShapeAndSplitKVBlock, "GetBlockShapeAndSplitKVBlock");
  m.def("f_prefill_mla_write_cache"å, &PrefillMLAWriteCacheKernel, "PrefillMLAWriteCacheKernel");
There is a stray special character here; it needs to be removed.
DrownFish19
left a comment
LGTM
llm/predict/predictor.py
Outdated
_original_import = builtins.__import__


def custom_import(name, *args, **kwargs):
This custom_import needs to be made compatible with both dynamic-graph and static-graph modes.
Added a context manager and a decorator; pybind is now used only during dynamic-graph inference.
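The pattern this reply describes (swap in the pybind variants for the duration of a call, then restore the originals) can be sketched in isolation. This is a minimal illustration, not the PR's code: fake_ops is a hypothetical stand-in module, and pybind_ops and with_pybind_ops are made-up names. It rebinds attributes directly instead of hooking builtins.__import__.

```python
import functools
import types
from contextlib import contextmanager

# Hypothetical stand-in for a compiled ops module (not the real paddlenlp_ops).
fake_ops = types.ModuleType("fake_ops")
fake_ops.save_output = lambda: "static"
fake_ops.f_save_output = lambda: "pybind"

OPS = ["save_output"]  # ops that have an f_-prefixed pybind variant


@contextmanager
def pybind_ops(module, names):
    """Temporarily rebind module.<name> to module.f_<name>, then restore."""
    saved = {}
    try:
        for name in names:
            variant = getattr(module, f"f_{name}", None)
            if variant is not None:
                saved[name] = getattr(module, name, None)
                setattr(module, name, variant)
        yield module
    finally:
        # Runs even if the wrapped call raises, so the swap never leaks.
        for name, original in saved.items():
            setattr(module, name, original)


def with_pybind_ops(func):
    """Decorator form: run func with the pybind variants active."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with pybind_ops(fake_ops, OPS):
            return func(*args, **kwargs)
    return wrapper


@with_pybind_ops
def predict():
    return fake_ops.save_output()
```

Inside predict() the module resolves save_output to the f_ variant; after the call returns, the original binding is back, so static-graph code paths that read the same module are unaffected.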
llm/predict/predictor.py
Outdated
_original_import = builtins.__import__
_imported_modules = {}
_paddlenlp_ops_updated = False
_original_attributes = {}
pybind_ops_list = [
    "update_inputs_v2",
    "save_output",
    "set_preids_token_penalty_multi_scores",
    "rebuild_padding_v2",
    "append_attention",
    "save_output_dygraph",
]


def custom_import(name, *args, **kwargs):
    global _paddlenlp_ops_updated, _imported_modules, _original_attributes
    global pybind_ops_list

    if _paddlenlp_ops_updated:
        if name in _imported_modules:
            return _imported_modules[name]

    module = _original_import(name, *args, **kwargs)

    if not _paddlenlp_ops_updated and os.getenv("USE_PYBIND", "1").lower() in ["1", "true", "t", "yes", "y"]:
        if name == "paddlenlp_ops":
            logger.info("Using Pybind paddlenlp_ops!")

            if name not in _original_attributes:
                bak_dict = {}
                for ops_name in pybind_ops_list:
                    bak_dict[ops_name] = getattr(module, ops_name, None)
                _original_attributes[name] = bak_dict

            for ops_name in pybind_ops_list:
                pybind_ops_name = f"f_{ops_name}"
                if hasattr(module, pybind_ops_name):
                    setattr(module, ops_name, getattr(module, pybind_ops_name))

            _paddlenlp_ops_updated = True

    _imported_modules[name] = module
    return module


@contextmanager
def dynamic_graph_pybind_context():
    global _original_import, _paddlenlp_ops_updated
    original_import = builtins.__import__

    try:
        builtins.__import__ = custom_import
        yield
    finally:
        builtins.__import__ = original_import

        if "paddlenlp_ops" in _original_attributes:
            paddlenlp_ops_module = sys.modules.get("paddlenlp_ops")
            if paddlenlp_ops_module:
                for attr, value in _original_attributes["paddlenlp_ops"].items():
                    setattr(paddlenlp_ops_module, attr, value)
        _paddlenlp_ops_updated = False


def auto_dynamic_graph_pybind(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        with dynamic_graph_pybind_context():
            return func(self, *args, **kwargs)

    return wrapper
Is this file really the right place for this code?
Moved to paddlenlp/utils/import_utils.py.
llm/predict/predictor.py
Outdated
    module = _original_import(name, *args, **kwargs)

    if not _paddlenlp_ops_updated and os.getenv("USE_PYBIND", "1").lower() in ["1", "true", "t", "yes", "y"]:
Isn't the environment variable name USE_PYBIND too casual?
Changed to USE_PYBIND_CUSTOM_OPS.
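Let's change it to DYNAMIC_INFERENCE_MODE. Dynamic-graph inference will probably need other special-case logic later, so use this one name for all of it.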
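One way to keep every future dynamic-inference switch behind the single variable is to read it through one helper. The sketch below is hypothetical: the helper name is made up, the accepted truthy strings are copied from the USE_PYBIND check quoted above, and the default of "1" is an assumption carried over from that check rather than something this thread confirms for the renamed variable.

```python
import os

# Truthy spellings copied from the USE_PYBIND check in the diff above.
_TRUTHY = {"1", "true", "t", "yes", "y"}


def dynamic_inference_enabled(environ=None, default="1"):
    """Return True when DYNAMIC_INFERENCE_MODE is set to a truthy value.

    environ lets callers pass a plain dict instead of os.environ;
    default="1" is an assumption, not confirmed by the PR.
    """
    env = os.environ if environ is None else environ
    value = env.get("DYNAMIC_INFERENCE_MODE", default)
    return value.strip().lower() in _TRUTHY
```

Call sites then ask dynamic_inference_enabled() instead of repeating the os.getenv expression, so a later change to the parsing rule touches one place.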
llm/predict/predictor.py
Outdated
        self.update_model_inputs("temperature", old_config.temperature)
        self.config = old_config

    def insert(self, pos, task_id):
The name of this member function, insert, is also too casual.
Renamed to insert_task.
llm/predict/predictor.py
Outdated
    dynamic_insert: bool = field(default=False, metadata={"help": "whether use dynamic insert"})
    total_request_num: int = field(default=None, metadata={"help": "The total number of request data"})
    init_cache_kvs: bool = field(default=True, metadata={"help": "whether init cache_kvs"})
Is the newly added init_cache_kvs option really necessary?
Removed.
  m.def("f_save_output_dygraph", &SaveOutputDygraph, "SaveOutputDygraph");
}

PYBIND11_MODULE(paddlenlp_ops_90, m) {
Where is the choice between paddlenlp_ops_90/80/xx made when they are called?
Building the dist wheel produces modules such as paddlenlp_ops_90; if such a module does not get its own pybind binding, it cannot be called.
The newly added environment variables and script arguments need to be described in the docs.
llm/predict/predictor.py
Outdated
result_queue = mp.Queue()
task_queue = mp.Queue()
done_event = mp.Event()
read_res_func = llm_utils.read_res_dynamic_insert
if self.config.output_via_mq: these should only be needed when this condition holds, right?
Done, the code logic has been adjusted.
llm/predict/predictor.py
Outdated
task_token = self.model_inputs["all_token_ids"][task_id : task_id + 1, :].cpu().numpy()
task_queue.put([task_id, task_token])

logger.info(f"running spend {time.time() - s_time}")
Change these to debug level.
Done
llm/predict/predictor.py
Outdated
    def insert_task(self, pos, task_id):
        query_id = task_id
        length = len(self.input_ids[query_id])
        logger.info(f"Insert task {task_id} while query id is {query_id} inserting pos {pos}")
Change this to debug level.
Done
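For reference, the effect of moving a per-task message from info to debug can be shown with the standard library alone. This is a sketch using Python's logging module, not PaddleNLP's own logger; the logger name, the ListHandler class, and the log_insert_task helper are all hypothetical.

```python
import logging

logger = logging.getLogger("predictor_sketch")  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collects formatted messages in memory so the effect is observable."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def log_insert_task(task_id, pos):
    # %-style arguments defer string formatting until a handler needs it,
    # so a suppressed debug record costs almost nothing per task.
    logger.debug("Insert task %s at pos %s", task_id, pos)
```

With the logger at INFO the per-task line is dropped; switching the logger to DEBUG brings it back without touching the call site.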
10e06a9 to 6dd8e3c
        ] = np.array(self.decoder_blocks[pos])

    @paddle.no_grad()
    def predict_dy_insert(self, input_texts: list[str], return_tokens=False, **kwargs):
It would also be good to support running inference directly when the user passes input_ids, skipping the step of tokenizing input_texts.
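The request above could be met by resolving the inputs in one small step before inference. The following is a sketch under stated assumptions, not the predictor's actual API: resolve_input_ids, its parameters, and the tokenize callable are all hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional, Sequence


def resolve_input_ids(
    input_texts: Optional[Sequence[str]] = None,
    input_ids: Optional[Sequence[Sequence[int]]] = None,
    tokenize: Optional[Callable[[str], List[int]]] = None,
) -> List[List[int]]:
    """Prefer caller-supplied token ids; fall back to tokenizing the texts."""
    if input_ids is not None:
        # Ids were given directly, so the tokenizer is never invoked.
        return [list(ids) for ids in input_ids]
    if input_texts is None or tokenize is None:
        raise ValueError("pass input_ids, or input_texts together with tokenize")
    return [tokenize(text) for text in input_texts]
```

A method such as predict_dy_insert could call a helper like this first and then work only with token ids, so both entry paths share the rest of the logic.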
Before submitting
tests folder. If there are codecov issues, please add test cases first.
PR types
New features
PR changes
APIs
Description
Example