[benchmark] update benchmark tools#6991
Thanks for your contribution!
/skip-ci ci_iluvatar
Pull request overview
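This PR adds a multi-turn "token-in / token-out" request mode to the benchmarks load-testing tool: in multi-turn scenarios, requests can be sent via prompt_token_ids, and completion_token_ids are collected from the streamed response so that token IDs can be spliced together for subsequent rounds.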
Changes:
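- Adds a tokenizer dependency list and a local tokenizer implementation (used for token_ids splicing and chat templating).
- benchmark_serving adds the --tokenizer-model/--tokenizer-path arguments and passes them through to the request side.
- backend_request_func supports prompt_token_ids requests, collects completion_token_ids, and introduces token_ids splicing logic for multi-turn.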
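The token_ids splicing summarized above can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `splice_round` and `toy_encode` are hypothetical names, and the sketch assumes that each later round reuses the previous prompt IDs and the returned completion_token_ids unchanged, tokenizing only the new turn.

```python
from typing import Callable, List


def splice_round(
    input_ids_all: List[int],
    completion_token_ids: List[int],
    next_turn_text: str,
    encode: Callable[[str], List[int]],
) -> List[int]:
    """Return the prompt token IDs for the next round.

    The previous prompt and the model's completion are reused as-is,
    so only the new turn has to be tokenized.
    """
    return input_ids_all + completion_token_ids + encode(next_turn_text)


def toy_encode(text: str) -> List[int]:
    # Hypothetical stand-in for a real tokenizer: one ID per character.
    return [ord(ch) for ch in text]


first_prompt = toy_encode("hi")
# [1, 2] plays the role of completion_token_ids from the first response.
next_prompt = splice_round(first_prompt, [1, 2], "ok", toy_encode)
```

Reusing the server's own completion IDs avoids re-tokenizing generated text, which may not round-trip to the same IDs.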
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| benchmarks/requirements_tokenizer.txt | Adds the dependency list required for the tokenizer / multi-turn token_ids mode |
| benchmarks/ernie_tokenizer.py | Adds an Ernie-family tokenizer implementation for the benchmark side to load |
| benchmarks/benchmark_serving.py | Passes the tokenizer arguments through; adjusts the initial test run logic |
| benchmarks/backend_request_func.py | Adds prompt_token_ids/return_token_ids support and multi-turn token splicing |
| benchmarks/README.md | Documents the token_ids mode arguments for multi-turn |
| @@ -504,10 +584,50 @@ async def async_request_eb_openai_chat_completions_multi_turn( | |||
| ) as session: | |||
| for i, message in enumerate(ori_history): | |||
| if message["role"] == "user" or message["role"] == "tool": | |||
| if i == 15: | |||
| break | |||
| history.append(message) | |||
| round_input = copy.deepcopy(request_func_input) | |||
| round_input.history_QA = history | |||
| round_input.no = f"{round_input.no}_{prompt_no}" | |||
| if use_token_ids: | |||
| if len(input_ids_all) == 0: | |||
| # token_ids splicing mode: token_ids for the first round | |||
| spliced_text = tokenizer.apply_chat_template( | |||
| history, | |||
| tokenize=False, | |||
| split_special_tokens=False, | |||
| add_special_tokens=False, | |||
| ) | |||
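| # Convert to token ids | |||
In token_ids splicing mode, tokenizer = load_tokenizer(...) may return None (the current implementation catches all exceptions and only logs a warning), but tokenizer.apply_chat_template(...) is called immediately afterwards and will raise AttributeError. Suggestions: 1) when load_tokenizer fails, raise an exception that includes the path/model type, or at least check explicitly before returning; 2) before entering token_ids mode, verify that tokenizer_path exists and that tokenizer is not None.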
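A minimal sketch of the pre-flight check suggested in this comment. `require_tokenizer` is a hypothetical helper, not part of the PR, and the exception types are assumptions chosen for illustration.

```python
import os
from typing import Any, Optional


def require_tokenizer(tokenizer: Optional[Any], tokenizer_path: Optional[str]) -> Any:
    """Validate token_ids-mode prerequisites and return the tokenizer.

    Raises early with the offending path instead of failing later with
    an AttributeError on a None tokenizer.
    """
    if not tokenizer_path or not os.path.exists(tokenizer_path):
        raise FileNotFoundError(f"tokenizer path not found: {tokenizer_path!r}")
    if tokenizer is None:
        raise RuntimeError(f"failed to load tokenizer from {tokenizer_path!r}")
    return tokenizer
```

Calling such a check once before the multi-turn loop would surface a bad path or a failed load at startup rather than in the middle of a request.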
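| Requesting in prompt_token_ids mode for multi-turn conversations | ||
| ```bash | ||
| Enable --multi-turn | ||
| --tokenizer-model: specify when requesting with prompt_token_ids; the tokenizer model type for multi-turn conversations. Options: "eb": ErnieBotTokenizer, "eb5": Ernie5Tokenizer, "eb_mm": Ernie4_5Tokenizer | ||
| --tokenizer-path: specify when requesting with prompt_token_ids; path to the model tokenizer | ||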
| ``` |
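The PR title tag does not appear to follow the repository's [CLASS]Title convention: it is currently [benchmark] ... (lowercase, and not in the checklist's tag list). Consider changing it to [Benchmark] Update benchmark tools or another standard tag that matches the PR's intent. Also, the Modifications / Usage sections of the PR description are still placeholders; please fill in the specific changes and how to run and verify them.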
| prompt_token_ids: Optional[list] = None | ||
| tokenizer_model: str = None | ||
| tokenizer_path: str = None |
In RequestFuncInput, tokenizer_model/tokenizer_path are declared as str but default to None; prompt_token_ids is also closer to Optional[list[int]]. Consider changing the annotations to Optional[str] / Optional[list[int]] so that type checkers and IDE inference are not misled.
| prompt_token_ids: Optional[list] = None | |
| tokenizer_model: str = None | |
| tokenizer_path: str = None | |
| prompt_token_ids: Optional[list[int]] = None | |
| tokenizer_model: Optional[str] = None | |
| tokenizer_path: Optional[str] = None |
| return encoded_inputs | ||
| hack_uft16_ascii = True |
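The uft in the variable name hack_uft16_ascii looks like a typo (should be utf), and later code uses this name as a switch. Consider renaming it to hack_utf16_ascii (keeping the old name as an alias if compatibility is needed) to avoid confusion.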
| hack_uft16_ascii = True | |
| # NOTE: `hack_uft16_ascii` is kept as backward-compatible alias; please use `hack_utf16_ascii` instead. | |
| hack_utf16_ascii = True | |
| hack_uft16_ascii = hack_utf16_ascii |
| self.use_oov_uft_16_be = True # True # whether OOV uses uft_16_be encoding | ||
| logger.info(f">>> UTF_16_BE: self.use_oov_uft_16_be:{self.use_oov_uft_16_be}") | ||
| def set_oov_utf_16_be(self, use_oov_uft_16_be=True): | ||
| """ | ||
| use_oov_uft_16_be switch | ||
| """ | ||
| self.use_oov_uft_16_be = use_oov_uft_16_be | ||
| print(f"use_oov_uft_16_be:{self.use_oov_uft_16_be}") |
The uft in the attribute name use_oov_uft_16_be looks like a typo (should be utf). Consider renaming it consistently to use_oov_utf_16_be so that later calls and searches are not confusing.
| self.use_oov_uft_16_be = True # True # whether OOV uses uft_16_be encoding | |
| logger.info(f">>> UTF_16_BE: self.use_oov_uft_16_be:{self.use_oov_uft_16_be}") | |
| def set_oov_utf_16_be(self, use_oov_uft_16_be=True): | |
| """ | |
| use_oov_uft_16_be switch | |
| """ | |
| self.use_oov_uft_16_be = use_oov_uft_16_be | |
| print(f"use_oov_uft_16_be:{self.use_oov_uft_16_be}") | |
| self.use_oov_utf_16_be = True # True # whether to use UTF_16_BE encoding for OOV | |
| logger.info(f">>> UTF_16_BE: self.use_oov_utf_16_be:{self.use_oov_utf_16_be}") | |
| def set_oov_utf_16_be(self, use_oov_utf_16_be: bool = True): | |
| """ | |
| Toggle use_oov_utf_16_be switch. | |
| """ | |
| self.use_oov_utf_16_be = use_oov_utf_16_be | |
| print(f"use_oov_utf_16_be:{self.use_oov_utf_16_be}") |
| if not debug: | ||
| print("test_input:", test_input) | ||
| test_output = await request_func(request_func_input=test_input) | ||
| test_output = await request_func(request_func_input=test_input) | ||
| if args.multi_turn: | ||
| out_list, metrics = test_output | ||
| test_output = out_list[0] | ||
| if args.multi_turn: | ||
| out_list, metrics = test_output | ||
| test_output = out_list[0] | ||
| if not test_output.success: | ||
| print("test_output:", test_output, flush=True) | ||
| raise ValueError( | ||
| f"Initial test run failed - Please make sure that 1. benchmark arguments are correctly specified and 2. the http_proxy and https_proxy are turned off. Error: {test_output.error}" | ||
| ) | ||
| else: | ||
| print("Initial test run completed. Starting main benchmark run...") | ||
| if not test_output.success: | ||
| print("test_output:", test_output, flush=True) | ||
| raise ValueError( | ||
| f"Initial test run failed - Please make sure that 1. benchmark arguments are correctly specified and 2. the http_proxy and https_proxy are turned off. Error: {test_output.error}" | ||
| ) | ||
| else: | ||
| print("Initial test run completed. Starting main benchmark run...") |
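The benchmark's initial single prompt test run is now placed inside the if not debug: branch: with --debug enabled, the initial connectivity/argument check is not executed and test_input is not printed, so the run goes straight into the main benchmark and errors surface later. Consider always running the initial test run (or adding an explicit --skip-initial-test switch) and decoupling the debug printing from whether the test is executed.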
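One way to decouple the two concerns, as a rough sketch. `run_initial_test`, `FakeOutput`, and `fake_request` are hypothetical names, `skip_initial_test` mirrors the suggested `--skip-initial-test` switch (which does not exist in the PR), and printing only when `debug` is set is an assumption about the intended behaviour.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeOutput:
    # Hypothetical stand-in for the benchmark's request output object.
    success: bool
    error: str = ""


async def run_initial_test(request_func, test_input, debug=False, skip_initial_test=False):
    """Run the single-prompt smoke test unless explicitly skipped."""
    if debug:
        # Debug output no longer decides whether the test runs.
        print("test_input:", test_input)
    if skip_initial_test:
        return None
    test_output = await request_func(request_func_input=test_input)
    if not test_output.success:
        raise ValueError(f"Initial test run failed. Error: {test_output.error}")
    return test_output


async def fake_request(request_func_input):
    # Always succeeds; a real request function would call the server.
    return FakeOutput(success=True)


result = asyncio.run(run_initial_test(fake_request, {"prompt": "hi"}, debug=True))
```

With this shape, skipping the smoke test becomes an explicit opt-out instead of a side effect of the debug flag.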
| Ernie5Tokenizer.resource_files_names["vocab_file"] = vocab_file_names[i] | ||
| break | ||
| tokenizer = Ernie5Tokenizer.from_pretrained(actor_tokenizer_path) | ||
| else: | ||
| print("tokenizer: AUTO") | ||
| tokenizer = AutoTokenizer.from_pretrained(actor_tokenizer_path, padding_side="left", use_fast=True) | ||
| except Exception as e: | ||
| tokenizer = None | ||
| logging.warning(f"Load tokenizer error: {e}") |
load_tokenizer() uses the resource_files_names attribute on Ernie5Tokenizer (e.g. Ernie5Tokenizer.resource_files_names[...]), but the Ernie5Tokenizer class added in this PR actually defines vocab_files_names (benchmarks/ernie_tokenizer.py:927). This makes the eb5 branch raise AttributeError, which is swallowed by the except clause and yields tokenizer=None; token_ids mode then crashes at tokenizer.apply_chat_template. Consider unifying the attribute name (either rename it to resource_files_names on the Ernie5Tokenizer side or use vocab_files_names here), and do not silently swallow this error.
| Ernie5Tokenizer.resource_files_names["vocab_file"] = vocab_file_names[i] | |
| break | |
| tokenizer = Ernie5Tokenizer.from_pretrained(actor_tokenizer_path) | |
| else: | |
| print("tokenizer: AUTO") | |
| tokenizer = AutoTokenizer.from_pretrained(actor_tokenizer_path, padding_side="left", use_fast=True) | |
| except Exception as e: | |
| tokenizer = None | |
| logging.warning(f"Load tokenizer error: {e}") | |
| Ernie5Tokenizer.vocab_files_names["vocab_file"] = vocab_file_names[i] | |
| break | |
| tokenizer = Ernie5Tokenizer.from_pretrained(actor_tokenizer_path) | |
| else: | |
| print("tokenizer: AUTO") | |
| tokenizer = AutoTokenizer.from_pretrained(actor_tokenizer_path, padding_side="left", use_fast=True) | |
| except Exception as e: | |
| logging.error("Load tokenizer error", exc_info=True) | |
| raise |
* [benchmark] update benchmark tools * [benchmark] update benchmark tools
Motivation
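The benchmark tool now supports multi-turn token-in / token-out.
Enable --multi-turn
--tokenizer-model: specify when requesting with prompt_token_ids; the tokenizer model type for multi-turn conversations. Options: "eb": ErnieBotTokenizer, "eb5": Ernie5Tokenizer, "eb_mm": Ernie4_5Tokenizer
--tokenizer-path: specify when requesting with prompt_token_ids; path to the model tokenizer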
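As a rough illustration of how these three flags fit together, the sketch below declares them with argparse. The flag names come from the description above; `build_parser`, `parse_args`, the placeholder path, and the rule that --tokenizer-model requires --tokenizer-path are assumptions, not the PR's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="multi-turn token_ids flags (sketch)")
    parser.add_argument("--multi-turn", action="store_true")
    parser.add_argument(
        "--tokenizer-model",
        default=None,
        help='tokenizer type, e.g. "eb", "eb5", or "eb_mm"',
    )
    parser.add_argument("--tokenizer-path", default=None, help="path to the model tokenizer")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: a tokenizer type is only useful together with a path.
    if args.tokenizer_model and not args.tokenizer_path:
        parser.error("--tokenizer-path is required with --tokenizer-model")
    return args


# "/path/to/tokenizer" is a placeholder, not a real location.
args = parse_args(
    ["--multi-turn", "--tokenizer-model", "eb5", "--tokenizer-path", "/path/to/tokenizer"]
)
```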
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tag list: [FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.