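Operating system and version
openEuler 24.03
Python environment where the tool is installed
A Python virtual environment created with anaconda/miniconda
Python version
3.10
AISBench tool version
3.1.20260415
AISBench command executed
ais_bench --models vllm_api_stream_chat --datasets math500_gen_0_shot_cot_chat_prompt
Model configuration file or custom configuration file contents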
from ais_bench.benchmark.models import VLLMCustomAPIChatStream
from ais_bench.benchmark.utils.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChatStream,
        abbr='vllm-api-stream-chat',
        path="/home/data3/weights/Moonlight-16B-A3B",
        model="",
        request_rate=0,
        retry=2,
        host_ip="localhost",
        host_port=8006,
        max_out_len=10240,
        batch_size=8,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0,
            seed=1234,
        ),
    )
]
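Expected behavior
No response
Actual behavior
The evaluation score is lower than expected. The origin_prompt recorded in the predictions directory looks like this:
"origin_prompt": [{"role": "HUMAN", "prompt": "What is the smallest positive perfect cube that can be written as the sum of three consecutive integers?\nPlease reason step by step, and put your final answer within \boxed{}."}],
The role HUMAN here does not look right.
However, a search on deepwiki says this is only how the prompt is printed, and that the HUMAN -> user conversion has already been applied to the actual request.
Sending a single curl request with the role set to HUMAN gives a result consistent with the evaluation result, so it seems the role is not actually being converted to user?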
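To make the comparison above reproducible without curl, here is a minimal sketch that sends the same prompt twice, once with role HUMAN and once with role user. It is not AISBench code. It assumes the service on localhost:8006 is vLLM's OpenAI-compatible server, so the /v1/chat/completions path is an assumption, as are the BOT and SYSTEM entries in the role map (only HUMAN appears in the predictions file). The helper names are hypothetical, and the model argument must be the served model name, which the config above leaves empty.

```python
import json
import urllib.request

# Assumed mapping from the roles seen in origin_prompt to OpenAI-style chat roles.
# Only "HUMAN" appears in this report; "BOT" and "SYSTEM" are assumptions.
ROLE_MAP = {"HUMAN": "user", "BOT": "assistant", "SYSTEM": "system"}


def to_chat_messages(origin_prompt):
    """Convert origin_prompt-style entries into OpenAI-style chat messages."""
    messages = []
    for item in origin_prompt:
        # Unknown roles are passed through unchanged.
        role = ROLE_MAP.get(item["role"], item["role"])
        messages.append({"role": role, "content": item["prompt"]})
    return messages


def build_payload(messages, model, max_tokens=10240):
    """Build a non-streaming request body with the sampling settings from the config."""
    return {
        "model": model,
        "messages": messages,
        "temperature": 0,
        "seed": 1234,
        "max_tokens": max_tokens,
        "stream": False,
    }


def post_chat(payload, url="http://localhost:8006/v1/chat/completions", timeout=600):
    """POST the payload and return the generated text (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def compare_roles(prompt, model):
    """Send the same prompt as HUMAN and as user; return both outputs."""
    raw = [{"role": "HUMAN", "content": prompt}]
    converted = to_chat_messages([{"role": "HUMAN", "prompt": prompt}])
    return {
        "HUMAN": post_chat(build_payload(raw, model)),
        "user": post_chat(build_payload(converted, model)),
    }
```

Calling compare_roles(prompt, model) with the math500 question above returns both completions side by side. If the HUMAN output matches what is stored in the predictions directory and the user output does not, that would support the suspicion that the role is sent unconverted.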
Pre-submission checks