Description
Operating system and version
Ubuntu 22.04.5 LTS
Python environment where the tool is installed
Python virtual environment created with anaconda/miniconda
Python version
3.11
AISBench tool version
3.1.20260119
AISBench command executed
ais_bench --models vllm_api_general_chat --datasets humaneval_gen_0_shot --merge-ds --num-prompts 5 --debug
Contents of the model config file or custom config file
from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="",
        model="deepseek",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        api_key="",
        host_ip="100.100.1**.***",
        host_port=8077,
        url="",
        max_out_len=8000,
        batch_size=16,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=1,
            top_p=0.95,
            ignore_eos=False,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]
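Expected behavior
Running with a limited number of prompts (--num-prompts) should work, so that the configuration and environment can be checked quickly.
Actual behavior
The test cannot complete. Inference finishes for all 5 prompts, but evaluation fails with the following error: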
root@accuracy:/home/benchmark/ais_bench/datasets# ais_bench --models vllm_api_general_chat --datasets humaneval_gen_0_shot --merge-ds --num-prompts 5 --debug
[2026-02-05 09:40:04,972] [ais_bench] [INFO] Loading vllm_api_general_chat: /home/benchmark/ais_bench/benchmark/configs/./models/vllm_api/vllm_api_general_chat.py
[2026-02-05 09:40:04,978] [ais_bench] [INFO] Loading humaneval_gen_0_shot: /home/benchmark/ais_bench/benchmark/configs/./datasets/humaneval/humaneval_gen_0_shot.py
[2026-02-05 09:40:04,981] [ais_bench] [INFO] Loading example: /home/benchmark/ais_bench/benchmark/configs/./summarizers/example.py
[2026-02-05 09:40:05,019] [ais_bench] [INFO] Current exp folder: outputs/default/20260205_093956
[2026-02-05 09:40:05,019] [ais_bench] [INFO] Keeping the first 5 prompts for dataset [openai_humaneval]
[2026-02-05 09:40:05,091] [ais_bench] [INFO] Starting inference tasks...
[2026-02-05 09:40:05,095] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-02-05 09:40:05,095] [ais_bench] [INFO] Merging datasets with the same model and inferencer...
[2026-02-05 09:40:05,136] [ais_bench] [INFO] Launch TasksMonitor, PID: 57321, Refresh interval: 0.5, Run in background: True
[2026-02-05 09:40:12,965] [ais_bench] [INFO] Debug mode, print progress directly
[2026-02-05 09:40:12,967] [ais_bench] [INFO] Task [vllm-api-general-chat/openai_humaneval]
[2026-02-05 09:40:13,024] [ais_bench] [INFO] Zero Retriever initialized, returning empty shot case for all queries
[2026-02-05 09:40:13,025] [ais_bench] [INFO] Apply ice template finished
[2026-02-05 09:40:13,027] [ais_bench] [INFO] Start warmup, run with concurrency: 16
Warmup: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:06<00:00, 6.92s/case]
[2026-02-05 09:40:19,951] [ais_bench] [INFO] Warmup finished Total Count: 1 Success Count: 1 Failed Count: 0
[2026-02-05 09:40:19,952] [ais_bench] [INFO] Dataset needed memory size: 0.00319767 MB
[2026-02-05 09:40:19,952] [ais_bench] [INFO] Memory usage check passed: 1.44% < 80% (Available: 985.75 GB)
[2026-02-05 09:40:19,954] [ais_bench] [INFO] Traffic request rate: 0 RPS with burstiness 1.0.
[2026-02-05 09:40:19,954] [ais_bench] [INFO] Request rate (0.0) or ramp end rps (None) < 0.1, sending all requests simultaneously
[2026-02-05 09:40:19,955] [ais_bench] [INFO] Debug mode, run with concurrency: 16
[2026-02-05 09:40:20,056] [ais_bench] [INFO] All subprocesses have finished deserializing the first batch of data
[2026-02-05 09:40:20,155] [ais_bench] [INFO] Starting progress bar Total data num: 5 Finished data num: 0 Left data num: 5
Progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:13<00:00, 2.60s/case]
POST=5 (0.0/s) RECV=5 (1.0/s) FAIL=0 (0.0/s) FINISH=5 (1.0/s)
[2026-02-05 09:40:33,176] [ais_bench] [INFO] Api infer task time elapsed: 20.21s
[2026-02-05 09:40:34,227] [ais_bench] [INFO] Inference tasks completed.
[2026-02-05 09:40:34,228] [ais_bench] [INFO] Starting evaluation tasks...
[2026-02-05 09:40:34,232] [ais_bench] [INFO] Partitioned into 1 tasks.
[2026-02-05 09:40:34,248] [ais_bench] [INFO] Launch TasksMonitor, PID: 57589, Refresh interval: 0.5, Run in background: True
[2026-02-05 09:40:41,998] [ais_bench] [INFO] Debug mode, print progress directly
[2026-02-05 09:40:42,052] [ais_bench] [INFO] Running 1-th replica of evaluation
Reading samples...
5it [00:00, 1029.98it/s]
Traceback (most recent call last):
File "/home/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 521, in
raise e
File "/home/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 518, in
evaluator.run()
File "/home/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 98, in run
self._score()
File "/home/benchmark/ais_bench/benchmark/tasks/openicl_eval.py", line 283, in _score
result = icl_evaluator.evaluate(k, n, copy.deepcopy(test_set), **preds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/benchmark/ais_bench/benchmark/openicl/icl_evaluator/icl_base_evaluator.py", line 284, in evaluate
results = self.score(**current_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/benchmark/ais_bench/benchmark/datasets/humaneval.py", line 106, in score
score = evaluate_functional_correctness(out_dir, self.k, n_workers=4, timeout=3.0, problem_file=HUMAN_EVAL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/human_eval/evaluation.py", line 73, in evaluate_functional_correctness
assert len(completion_id) == len(problems), "Some problems are not attempted."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Some problems are not attempted.
[2026-02-05 09:40:43,378] [ais_bench] [INFO] Evaluation tasks completed.
[2026-02-05 09:40:43,378] [ais_bench] [INFO] Summarizing evaluation results...
dataset version metric mode vllm-api-general-chat
openai_humaneval -
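A possible reading of the traceback, not confirmed against the AISBench source: `evaluate_functional_correctness` asserts that the number of attempted task IDs equals the number of problems in `problem_file`. With `--num-prompts 5` only 5 samples are written, while the full HumanEval problem file contains 164 problems, so the assertion fails. The sketch below reproduces that mismatch with toy data and shows one conceivable workaround: scoring against a problem set restricted to the attempted tasks. The helper names `attempted_task_ids` and `subset_problems` are hypothetical and are not part of AISBench or human_eval.

```python
from typing import Dict, Iterable, List, Set


def attempted_task_ids(samples: Iterable[dict]) -> Set[str]:
    """Collect the task IDs that actually have a completion."""
    return {sample["task_id"] for sample in samples}


def subset_problems(problems: Dict[str, dict], samples: List[dict]) -> Dict[str, dict]:
    """Keep only the problems that were attempted (hypothetical helper)."""
    wanted = attempted_task_ids(samples)
    return {task_id: p for task_id, p in problems.items() if task_id in wanted}


# Toy stand-ins for the real data: HumanEval has 164 problems,
# and --num-prompts 5 keeps only the first 5 of them.
problems = {
    f"HumanEval/{i}": {"task_id": f"HumanEval/{i}", "prompt": "..."}
    for i in range(164)
}
samples = [
    {"task_id": f"HumanEval/{i}", "completion": "    pass\n"}
    for i in range(5)
]

# This is the condition that the failing assertion checks.
mismatch = len(attempted_task_ids(samples)) != len(problems)

# Restricting the problems to the attempted ones removes the mismatch.
subset = subset_problems(problems, samples)
consistent = len(attempted_task_ids(samples)) == len(subset)

print(f"mismatch with full problem set: {mismatch}")
print(f"consistent with subset: {consistent}")
```

If this reading is right, the evaluator would need to pass a problem file filtered in this way (or otherwise skip the length check) whenever `--num-prompts` truncates the dataset.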
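Pre-submission checklist
- I have read the Quick Start guide in the main documentation and could not resolve the issue
- I have searched the FAQ and found no duplicate
- I have searched existing issues and found no duplicate
- I have updated to the latest version and the problem persists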