[benchmark] update benchmark tools#6991

Merged
ZhangYulongg merged 2 commits into PaddlePaddle:develop from ZhangYulongg:update_benchmark_0323
Mar 24, 2026

Conversation

@ZhangYulongg

Collaborator

Motivation

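The benchmark tool now supports token-in/token-out requests for multi-turn conversations.
Enable it with --multi-turn.
--tokenizer-model: specify when sending requests with prompt_token_ids; selects the tokenizer type used for multi-turn conversations. Options: "eb": ErnieBotTokenizer, "eb5": Ernie5Tokenizer, "eb_mm": Ernie4_5Tokenizer
--tokenizer-path: specify when sending requests with prompt_token_ids; path to the model's tokenizer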
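
For readers who want to picture how these three flags fit together, here is a minimal argparse sketch. The flag names come from this description; the defaults, the help strings, and the rule used to derive `use_token_ids` are assumptions made for illustration and may differ from what benchmark_serving.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative subset of the benchmark flags (not the real parser)."""
    parser = argparse.ArgumentParser(description="multi-turn token_ids flags (sketch)")
    parser.add_argument("--multi-turn", action="store_true",
                        help="send multi-turn conversation requests")
    parser.add_argument("--tokenizer-model", type=str, default=None,
                        help='tokenizer type, e.g. "eb", "eb5" or "eb_mm"')
    parser.add_argument("--tokenizer-path", type=str, default=None,
                        help="path to the model tokenizer")
    return parser


# Parse an example command line; the path is a placeholder.
args = build_parser().parse_args(
    ["--multi-turn", "--tokenizer-model", "eb5", "--tokenizer-path", "/path/to/tokenizer"]
)

# Assumption: token_ids mode is active only when multi-turn is on AND a tokenizer path is given.
use_token_ids = args.multi_turn and args.tokenizer_path is not None
print(args.tokenizer_model, use_token_ids)
```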

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented Mar 24, 2026


Thanks for your contribution!

@paddle-bot paddle-bot Bot added the contributor External developers label Mar 24, 2026
@EmmonsCurse

Collaborator

/skip-ci ci_iluvatar
/skip-ci ci_hpu
/skip-ci build_gpu
/skip-ci build_xpu

EmmonsCurse
EmmonsCurse previously approved these changes Mar 24, 2026

Copilot AI left a comment

Contributor

Pull request overview

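This PR adds a multi-turn "token-in / token-out" request mode to the benchmarks load-testing tool: in multi-turn scenarios, requests can be sent via prompt_token_ids, and completion_token_ids are collected from the streamed response so that tokens can be spliced together for subsequent rounds.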

Changes:

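  • Adds a tokenizer dependency list and a local tokenizer implementation (used for token_ids splicing and chat templating).
  • benchmark_serving gains the --tokenizer-model/--tokenizer-path arguments and passes them through to the request side.
  • backend_request_func supports prompt_token_ids requests, collects completion_token_ids, and introduces token_ids splicing logic for multi-turn.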
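
The token splicing mentioned in the last bullet can be pictured roughly as follows. This is only a sketch under stated assumptions: `splice_round` and `fake_encode` are hypothetical names, and the toy one-ID-per-character encoder stands in for the real tokenizer and chat template used by the PR.

```python
from typing import List


def splice_round(
    input_ids_all: List[int],
    completion_token_ids: List[int],
    next_user_ids: List[int],
) -> List[int]:
    """Build the next round's prompt_token_ids.

    The previous prompt, the tokens the server generated for it, and the
    newly tokenized user turn are concatenated, so earlier text is never
    re-tokenized.
    """
    return input_ids_all + completion_token_ids + next_user_ids


def fake_encode(text: str) -> List[int]:
    # Hypothetical toy encoder: one ID per character (stand-in for a tokenizer).
    return [ord(ch) for ch in text]


# Round 1: the whole prompt is tokenized once.
prompt_ids = fake_encode("hi")
# Pretend the server streamed back these completion_token_ids.
completion_ids = [1001, 1002]
# Round 2: only the new user turn is tokenized, then spliced on.
round2_ids = splice_round(prompt_ids, completion_ids, fake_encode("ok"))
print(round2_ids)
```

The point of working on IDs rather than text is that the second-round prompt contains exactly the tokens the server produced, which decoding and re-encoding the text would not guarantee.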

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
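| File | Description |
| --- | --- |
| benchmarks/requirements_tokenizer.txt | Adds the dependency list required for the tokenizer / multi-turn token_ids mode |
| benchmarks/ernie_tokenizer.py | Adds the Ernie-family tokenizer implementations, loaded on the benchmark side |
| benchmarks/benchmark_serving.py | Passes the tokenizer arguments through; adjusts the initial test run logic |
| benchmarks/backend_request_func.py | Adds prompt_token_ids/return_token_ids support and multi-turn token splicing |
| benchmarks/README.md | Documents the token_ids mode arguments for multi-turn |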

Comment on lines 559 to +602
@@ -504,10 +584,50 @@ async def async_request_eb_openai_chat_completions_multi_turn(
    ) as session:
        for i, message in enumerate(ori_history):
            if message["role"] == "user" or message["role"] == "tool":
                if i == 15:
                    break
                history.append(message)
                round_input = copy.deepcopy(request_func_input)
                round_input.history_QA = history
                round_input.no = f"{round_input.no}_{prompt_no}"
                if use_token_ids:
                    if len(input_ids_all) == 0:
                        # token_ids splicing mode: token_ids for the first round
                        spliced_text = tokenizer.apply_chat_template(
                            history,
                            tokenize=False,
                            split_special_tokens=False,
                            add_special_tokens=False,
                        )
                        # Convert to token ids

Copilot AI Mar 24, 2026

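In token_ids splicing mode, tokenizer = load_tokenizer(...) may return None (the current implementation catches every exception and only logs a warning), yet tokenizer.apply_chat_template(...) is called immediately afterwards, which raises AttributeError. Suggestions: 1) when load_tokenizer fails, raise an exception that includes the path and model type, or at least check explicitly before returning; 2) before entering token_ids mode, verify that tokenizer_path exists and that tokenizer is not None.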
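
One way to express the second suggestion is a small guard that runs before token_ids mode is entered. The helper name `require_tokenizer` and the choice of exception types are assumptions made for this sketch; the PR does not define such a function.

```python
import os
from typing import Any, Optional


def require_tokenizer(tokenizer: Optional[Any], tokenizer_path: Optional[str]) -> Any:
    """Fail fast before entering token_ids mode (hypothetical helper)."""
    # Check the path first so the error names the real cause.
    if not tokenizer_path or not os.path.exists(tokenizer_path):
        raise FileNotFoundError(f"tokenizer path does not exist: {tokenizer_path!r}")
    # A loader that swallowed its exception would hand back None here.
    if tokenizer is None:
        raise RuntimeError(f"tokenizer failed to load from {tokenizer_path!r}")
    return tokenizer


# Example: a loader returned None for an existing directory.
try:
    require_tokenizer(None, ".")
except RuntimeError as exc:
    print("caught:", exc)
```

With a guard like this, the failure is reported where the tokenizer is loaded instead of surfacing later as an AttributeError on None.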

Copilot uses AI. Check for mistakes.
Comment thread benchmarks/README.md
Comment on lines +52 to +57
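Sending multi-turn conversation requests in prompt_token_ids mode
```bash
Enable --multi-turn
--tokenizer-model: specify when sending requests with prompt_token_ids; selects the tokenizer type used for multi-turn conversations. Options: "eb": ErnieBotTokenizer, "eb5": Ernie5Tokenizer, "eb_mm": Ernie4_5Tokenizer
--tokenizer-path: specify when sending requests with prompt_token_ids; path to the model's tokenizer
```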

Copilot AI Mar 24, 2026

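The PR title tag does not appear to follow the repository's [CLASS]Title convention: it is currently [benchmark] ... (lowercase, and not in the checklist's tag list). Suggest changing it to [Benchmark] Update benchmark tools, or to another standard tag that matches this PR's semantics. Also, Modifications / Usage in the PR description are still placeholders; suggest adding the concrete changes and how to run and verify them.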

Comment on lines +60 to +62
prompt_token_ids: Optional[list] = None
tokenizer_model: str = None
tokenizer_path: str = None

Copilot AI Mar 24, 2026

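In RequestFuncInput, tokenizer_model/tokenizer_path are declared as str but default to None; prompt_token_ids is also closer to Optional[list[int]]. Suggest changing the annotations to Optional[str] / Optional[list[int]] so that type checkers and IDE inference are not misled.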

Suggested change
- prompt_token_ids: Optional[list] = None
- tokenizer_model: str = None
- tokenizer_path: str = None
+ prompt_token_ids: Optional[list[int]] = None
+ tokenizer_model: Optional[str] = None
+ tokenizer_path: Optional[str] = None
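
As a self-contained illustration of the suggested annotations, the sketch below uses a trimmed-down, hypothetical dataclass. `RequestFuncInputSketch` and its `prompt` field are assumptions; the real RequestFuncInput has more fields. `typing.List` is used instead of `list[int]` only so the sketch also runs on older Python versions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestFuncInputSketch:
    """Hypothetical, trimmed-down stand-in for RequestFuncInput."""

    prompt: str
    # Optional[...] makes the None default explicit to type checkers and IDEs.
    prompt_token_ids: Optional[List[int]] = None
    tokenizer_model: Optional[str] = None
    tokenizer_path: Optional[str] = None


# Text-mode request: the tokenizer fields stay None.
text_req = RequestFuncInputSketch(prompt="hello")
# token_ids-mode request: all three optional fields are filled in.
ids_req = RequestFuncInputSketch(
    prompt="",
    prompt_token_ids=[1, 2, 3],
    tokenizer_model="eb5",
    tokenizer_path="/path/to/tokenizer",
)
print(text_req.tokenizer_model, ids_req.prompt_token_ids)
```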

return encoded_inputs


hack_uft16_ascii = True

Copilot AI Mar 24, 2026

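In the variable name hack_uft16_ascii, uft looks like a typo (it should be utf), and later code uses it as a switch. Suggest renaming it to hack_utf16_ascii (keeping the old name as an alias if compatibility is needed) to avoid confusion.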

Suggested change
- hack_uft16_ascii = True
+ # NOTE: `hack_uft16_ascii` is kept as backward-compatible alias; please use `hack_utf16_ascii` instead.
+ hack_utf16_ascii = True
+ hack_uft16_ascii = hack_utf16_ascii

Comment on lines +996 to +1004
        self.use_oov_uft_16_be = True  # True # whether OOV uses uft_16_be encoding
        logger.info(f">>> UTF_16_BE: self.use_oov_uft_16_be:{self.use_oov_uft_16_be}")

    def set_oov_utf_16_be(self, use_oov_uft_16_be=True):
        """
        Switch for use_oov_uft_16_be
        """
        self.use_oov_uft_16_be = use_oov_uft_16_be
        print(f"use_oov_uft_16_be:{self.use_oov_uft_16_be}")

Copilot AI Mar 24, 2026

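In the attribute name use_oov_uft_16_be, uft looks like a typo (it should be utf). Suggest renaming it consistently to use_oov_utf_16_be to avoid confusion in later calls and searches.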

Suggested change
-         self.use_oov_uft_16_be = True  # True # whether OOV uses uft_16_be encoding
-         logger.info(f">>> UTF_16_BE: self.use_oov_uft_16_be:{self.use_oov_uft_16_be}")
-     def set_oov_utf_16_be(self, use_oov_uft_16_be=True):
-         """
-         Switch for use_oov_uft_16_be
-         """
-         self.use_oov_uft_16_be = use_oov_uft_16_be
-         print(f"use_oov_uft_16_be:{self.use_oov_uft_16_be}")
+         self.use_oov_utf_16_be = True  # True # whether to use UTF_16_BE encoding for OOV
+         logger.info(f">>> UTF_16_BE: self.use_oov_utf_16_be:{self.use_oov_utf_16_be}")
+     def set_oov_utf_16_be(self, use_oov_utf_16_be: bool = True):
+         """
+         Toggle use_oov_utf_16_be switch.
+         """
+         self.use_oov_utf_16_be = use_oov_utf_16_be
+         print(f"use_oov_utf_16_be:{self.use_oov_utf_16_be}")
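
If some callers still rely on the misspelled attribute, one possible transition path is to keep the old name as a deprecated property that forwards to the corrected one. This is a sketch of that idea, not something the review asks for: `OovConfigSketch` is a hypothetical class, and whether a compatibility alias is wanted at all is an open assumption.

```python
import warnings


class OovConfigSketch:
    """Hypothetical class showing a rename with a deprecated alias."""

    _MSG = "use_oov_uft_16_be is deprecated; use use_oov_utf_16_be instead"

    def __init__(self) -> None:
        # Corrected spelling is the single source of truth.
        self.use_oov_utf_16_be = True

    @property
    def use_oov_uft_16_be(self) -> bool:
        # Old (misspelled) name: read through to the new attribute.
        warnings.warn(self._MSG, DeprecationWarning, stacklevel=2)
        return self.use_oov_utf_16_be

    @use_oov_uft_16_be.setter
    def use_oov_uft_16_be(self, value: bool) -> None:
        # Old (misspelled) name: write through to the new attribute.
        warnings.warn(self._MSG, DeprecationWarning, stacklevel=2)
        self.use_oov_utf_16_be = value


cfg = OovConfigSketch()
cfg.use_oov_utf_16_be = False  # new spelling, no warning
print(cfg.use_oov_utf_16_be)
```

Unlike a plain module-level alias, a forwarding property keeps both names in sync when either one is reassigned.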

Comment on lines +390 to +405
+     if not debug:
+         print("test_input:", test_input)

-     test_output = await request_func(request_func_input=test_input)
+         test_output = await request_func(request_func_input=test_input)

-     if args.multi_turn:
-         out_list, metrics = test_output
-         test_output = out_list[0]
+         if args.multi_turn:
+             out_list, metrics = test_output
+             test_output = out_list[0]

-     if not test_output.success:
-         print("test_output:", test_output, flush=True)
-         raise ValueError(
-             f"Initial test run failed - Please make sure that 1. benchmark arguments are correctly specified and 2. the http_proxy and https_proxy are turned off. Error: {test_output.error}"
-         )
-     else:
-         print("Initial test run completed. Starting main benchmark run...")
+         if not test_output.success:
+             print("test_output:", test_output, flush=True)
+             raise ValueError(
+                 f"Initial test run failed - Please make sure that 1. benchmark arguments are correctly specified and 2. the http_proxy and https_proxy are turned off. Error: {test_output.error}"
+             )
+         else:
+             print("Initial test run completed. Starting main benchmark run...")

Copilot AI Mar 24, 2026

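The benchmark's initial single-prompt test run now sits inside the if not debug: branch: with --debug enabled, the initial connectivity/argument check is not executed and test_input is not printed either, so the tool goes straight into the main benchmark and errors surface later. Suggest always running the initial test run (or adding an explicit --skip-initial-test switch), and decoupling the debug print logic from whether the test is executed.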
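
The decoupling suggested here could look roughly like the sketch below. Everything in it is illustrative: `FakeOutput` and `fake_request_func` are hypothetical stand-ins for the real output type and request function, `skip_initial_test` mirrors the proposed (not yet existing) --skip-initial-test switch, and printing only when debug is on is an assumption about the intended behavior.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeOutput:
    success: bool
    error: str = ""


async def fake_request_func(request_func_input: str) -> FakeOutput:
    # Hypothetical stand-in for the real request function.
    return FakeOutput(success=True)


async def initial_test_run(test_input: str, debug: bool, skip_initial_test: bool) -> bool:
    """Return True if the probe request ran, False if it was skipped."""
    # Debug printing no longer decides whether the probe request is sent.
    if debug:
        print("test_input:", test_input)
    # Skipping is an explicit, separate choice.
    if skip_initial_test:
        return False
    test_output = await fake_request_func(request_func_input=test_input)
    if not test_output.success:
        raise ValueError(f"Initial test run failed. Error: {test_output.error}")
    print("Initial test run completed. Starting main benchmark run...")
    return True


ran_with_debug = asyncio.run(initial_test_run("ping", debug=True, skip_initial_test=False))
skipped = asyncio.run(initial_test_run("ping", debug=False, skip_initial_test=True))
print(ran_with_debug, skipped)
```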

Comment on lines +216 to +224
Ernie5Tokenizer.resource_files_names["vocab_file"] = vocab_file_names[i]
break
tokenizer = Ernie5Tokenizer.from_pretrained(actor_tokenizer_path)
else:
print("tokenizer: AUTO")
tokenizer = AutoTokenizer.from_pretrained(actor_tokenizer_path, padding_side="left", use_fast=True)
except Exception as e:
tokenizer = None
logging.warning(f"Load tokenizer error: {e}")

Copilot AI Mar 24, 2026

load_tokenizer() uses the resource_files_names attribute on Ernie5Tokenizer (e.g. Ernie5Tokenizer.resource_files_names[...]), but the Ernie5Tokenizer class added in this PR actually defines vocab_files_names (benchmarks/ernie_tokenizer.py:927). As a result the eb5 branch triggers an AttributeError, which is swallowed by the except and returns tokenizer=None; token_ids mode then crashes at tokenizer.apply_chat_template. Suggest unifying the attribute name (change Ernie5Tokenizer to resource_files_names, or use vocab_files_names here), and not silently swallowing this error.

Suggested change
-                 Ernie5Tokenizer.resource_files_names["vocab_file"] = vocab_file_names[i]
-                 break
-         tokenizer = Ernie5Tokenizer.from_pretrained(actor_tokenizer_path)
-     else:
-         print("tokenizer: AUTO")
-         tokenizer = AutoTokenizer.from_pretrained(actor_tokenizer_path, padding_side="left", use_fast=True)
- except Exception as e:
-     tokenizer = None
-     logging.warning(f"Load tokenizer error: {e}")
+                 Ernie5Tokenizer.vocab_files_names["vocab_file"] = vocab_file_names[i]
+                 break
+         tokenizer = Ernie5Tokenizer.from_pretrained(actor_tokenizer_path)
+     else:
+         print("tokenizer: AUTO")
+         tokenizer = AutoTokenizer.from_pretrained(actor_tokenizer_path, padding_side="left", use_fast=True)
+ except Exception as e:
+     logging.error("Load tokenizer error", exc_info=True)
+     raise
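
To make the difference between the two error-handling styles concrete, the sketch below reproduces the failure mode with a hypothetical `FakeTokenizer` that, like the class described above, defines vocab_files_names but not resource_files_names. The two loader functions are illustrative only and do not mirror the real load_tokenizer() signature.

```python
import logging

logger = logging.getLogger("tokenizer_loader_sketch")


class FakeTokenizer:
    """Hypothetical class that defines vocab_files_names only."""

    vocab_files_names = {"vocab_file": "vocab.txt"}


def load_swallowing():
    """Old style: the AttributeError is hidden and None is returned."""
    try:
        FakeTokenizer.resource_files_names["vocab_file"] = "vocab.txt"
        return FakeTokenizer()
    except Exception as e:
        logger.warning("Load tokenizer error: %s", e)
        return None


def load_reraising():
    """Suggested style: log the traceback, then let the error propagate."""
    try:
        FakeTokenizer.resource_files_names["vocab_file"] = "vocab.txt"
        return FakeTokenizer()
    except Exception:
        logger.error("Load tokenizer error", exc_info=True)
        raise


print(load_swallowing())  # the caller only sees None
try:
    load_reraising()
except AttributeError as exc:
    print("root cause visible:", exc)
```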

Comment thread benchmarks/backend_request_func.py Outdated
@ZhangYulongg ZhangYulongg merged commit 6f5aa88 into PaddlePaddle:develop Mar 24, 2026
10 of 14 checks passed
mattheliu pushed a commit to mattheliu/FastDeploy that referenced this pull request Apr 1, 2026
* [benchmark] update benchmark tools

* [benchmark] update benchmark tools
xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026
* [benchmark] update benchmark tools

* [benchmark] update benchmark tools

Labels

contributor External developers

3 participants