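[Bug] GPQA answers are parsed as null, causing abnormal accuracy (the quantized MiniMax-M2.7 model returns **Answer:**A on the GPQA dataset, which cannot be parsed) #245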

### Operating system and version

Linux w24 5.10.0-216.0.0.115.oe2203sp4.aarch64

### Python environment in which the tool is installed

A Python virtual environment created with anaconda/miniconda

### Python version

3.10

### AISBench tool version

3.1.20260330

### AISBench command

ais_bench --models vllm_api_general_chat --datasets gpqa_gen_0_shot_cot_chat_prompt.py --mode all --dump-eval-details --merge-ds --debug

### Model configuration file or custom configuration file contents

from ais_bench.benchmark.models import VLLMCustomAPIChat
from ais_bench.benchmark.utils.postprocess.model_postprocessors import extract_non_reasoning_content

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChat,
        abbr="vllm-api-general-chat",
        path="path/MiniMax-M2.7-w8a8",
        model="minimax27",
        stream=False,
        request_rate=0,
        use_timestamp=False,
        retry=2,
        host_ip="localhost",
        host_port=8015,
        max_out_len=65536,
        batch_size=32,
        trust_remote_code=False,
        generation_kwargs=dict(
            seed=None,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]

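### Expected behavior

Outputs such as "**Answer:** C" and "**Answer:** **C**" should be parsed successfully.

### Actual behavior

"**Answer:** C" and "**Answer:** **C**" are parsed as null, so the question is scored as incorrect, which lowers the reported accuracy: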

<img width="489" height="104" alt="Image" src="https://github.com/user-attachments/assets/083780fc-e151-4df0-b5b0-a8579dc12d09" />

<img width="285" height="381" alt="Image" src="https://github.com/user-attachments/assets/c403aa6f-c68a-4bff-8f83-efa251d11d12" />
The results of one complete test run are attached:

[GPQA_diamond.json](https://github.com/user-attachments/files/26674928/GPQA_diamond.json)
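
For illustration, here is a minimal sketch of how this failure can arise and how a more tolerant pattern would avoid it. This is not AISBench's actual implementation: `STRICT_PATTERN` is an assumed stand-in for an extractor that expects the option letter directly after `Answer:`, and `TOLERANT_PATTERN` and `extract_choice` are hypothetical names introduced only for this example.

```python
import re
from typing import Optional

# Assumption: a strict extractor of this kind only allows whitespace
# between "Answer:" and the option letter.
STRICT_PATTERN = re.compile(r"(?i)answer\s*:\s*([A-D])")

# Hypothetical tolerant variant: also skips Markdown emphasis markers
# (* and _) around the colon and the letter, accepts optional parentheses,
# and rejects a letter that is merely the start of a longer word.
TOLERANT_PATTERN = re.compile(
    r"(?i)answer[\s*_]*:[\s*_]*\(?([A-D])\)?(?![A-Za-z])"
)


def extract_choice(text: str, pattern: re.Pattern = TOLERANT_PATTERN) -> Optional[str]:
    """Return the last matched option letter in upper case, or None."""
    matches = pattern.findall(text)
    return matches[-1].upper() if matches else None


if __name__ == "__main__":
    for sample in ["**Answer:** C", "**Answer:** **C**", "**Answer:**A"]:
        print(repr(sample),
              "strict:", extract_choice(sample, STRICT_PATTERN),
              "tolerant:", extract_choice(sample))
```

Under these assumptions the strict pattern returns `None` for all three samples, because `**` sits between the colon and the letter, while the tolerant pattern recovers the letter. Taking the last match is a design choice for chain-of-thought outputs, where the final answer usually appears at the end.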

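### Pre-submission checks

- [x] I have read the Quick Start in the main documentation and it did not resolve the problem
- [x] I have searched the FAQ and found no duplicate
- [x] I have searched existing issues and found no duplicate
- [x] I have updated to the latest version and the problem persists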
