Skip to content

[Bug] SWEBench Verified mini测试时发现模型未运行在正确的环境中 #304

@SJTUyh

Description

@SJTUyh

操作系统及版本

ubuntu 20.04

安装工具的python环境

在anaconda/miniconda创建的python虚拟环境

python版本

3.10

AISBench工具版本

3.1.20260415

AISBench执行命令

ais_bench mini_swe_agent_swe_bench_verified_mini.py

模型配置文件或自定义配置文件内容

与配置文件内容无关

预期行为

mini-swe-agent在容器执行过程中执行在指定的conda环境下。

实际行为

mini-swe-agent在容器执行过程中似乎不是执行在容器中指定的conda环境下。
证据如下:

  1. 执行环境中缺失docutils库,但是这个库在 容器中指定的conda环境内是有的
    {
      "content": "<returncode>1</returncode>\n<output>\nTraceback (most recent call last):\n  File \"<string>\", line 5, in <module>\n  File \"/testbed/sphinx/ext/autodoc/__init__.py\", line 20, in <module>\n    from docutils.statemachine import StringList\nModuleNotFoundError: No module named 'docutils'\n</output>",
      "extra": {
        "raw_output": "Traceback (most recent call last):\n  File \"<string>\", line 5, in <module>\n  File \"/testbed/sphinx/ext/autodoc/__init__.py\", line 20, in <module>\n    from docutils.statemachine import StringList\nModuleNotFoundError: No module named 'docutils'\n",
        "returncode": 1,
        "timestamp": 1777302400.3996165,
        "exception_info": ""
      },
      "tool_call_id": "chatcmpl-tool-a6b00b6448b119a1",
      "role": "tool"
    },
    {
      "content": "Let me install the dependencies.</think>",
      "role": "assistant",
      "tool_calls": [
        {
          "function": {
            "arguments": "{\"command\": \"pip install docutils -q\"}",
            "name": "bash"
          },
          "id": "chatcmpl-tool-bf8350a013476e7f",
          "type": "function"
        }
      ],
  1. 部分用例中模型有完整执行pip3 list的记录,这些记录显示的pip3 package与容器中非conda环境下一致:
{
      "content": "<returncode>0</returncode>\n<output>\nPackage                 Version\n----------------------- ----------\narchspec                0.2.1\nboltons                 23.0.0\nBrotli                  1.0.9\ncertifi                 2023.11.17\ncffi                    1.16.0\ncharset-normalizer      2.0.4\nconda                   23.11.0\nconda-content-trust     0.2.0\nconda-libmamba-solver   23.12.0\nconda-package-handling  2.2.0\nconda_package_streaming 0.9.0\ncryptography            41.0.7\ndistro                  1.8.0\nidna                    3.4\njsonpatch               1.32\njsonpointer             2.1\nlibmambapy              1.5.3\nmenuinst                2.0.1\npackaging               23.1\npip                     23.3.1\nplatformdirs            3.10.0\npluggy                  1.0.0\npycosat                 0.6.6\npycparser               2.21\npyOpenSSL               23.2.0\nPySocks                 1.7.1\nrequests                2.31.0\nruamel.yaml             0.17.21\nsetuptools              68.2.2\ntqdm                    4.65.0\ntruststore              0.8.0\nurllib3                 1.26.18\nwheel                   0.41.2\nzstandard               0.19.0\n</output>",
      "extra": {
        "raw_output": "Package                 Version\n----------------------- ----------\narchspec                0.2.1\nboltons                 23.0.0\nBrotli                  1.0.9\ncertifi                 2023.11.17\ncffi                    1.16.0\ncharset-normalizer      2.0.4\nconda                   23.11.0\nconda-content-trust     0.2.0\nconda-libmamba-solver   23.12.0\nconda-package-handling  2.2.0\nconda_package_streaming 0.9.0\ncryptography            41.0.7\ndistro                  1.8.0\nidna                    3.4\njsonpatch               1.32\njsonpointer             2.1\nlibmambapy              1.5.3\nmenuinst                2.0.1\npackaging               23.1\npip                     23.3.1\nplatformdirs            3.10.0\npluggy                  1.0.0\npycosat                 0.6.6\npycparser               2.21\npyOpenSSL               23.2.0\nPySocks                 1.7.1\nrequests                2.31.0\nruamel.yaml             0.17.21\nsetuptools              68.2.2\ntqdm                    4.65.0\ntruststore              0.8.0\nurllib3                 1.26.18\nwheel                   0.41.2\nzstandard               0.19.0\n",
        "returncode": 0,
        "timestamp": 1775764867.9952104,
        "exception_info": ""
      },
      "tool_call_id": "chatcmpl-tool-97be5297181e9e9f",
      "role": "tool"
    },

影响面机会所有的case都会受到影响,被迫让被测模型在一个不恰当的执行环境中解决问题,会影响精度

前置检查

  • 我已读懂主页文档的快速入门,无法解决问题
  • 我已检索过FAQ,无重复问题
  • 我已搜索过现有Issue,无重复问题
  • 我已更新到最新版本,问题仍存在

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingcontent_check_passedissue content check passed

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions