feat(config): add temperature/top_k/top_p generation_kwargs to SWE-bench examples #377

Merged
SJTUyh merged 3 commits into AISBench:master from zhongzhouTan-coder:zhongzhouTan-coder/add-generation-kwargs-swebench-examples on Jun 30, 2026
@zhongzhouTan-coder

Collaborator

Summary

Add commented-out temperature, top_p, top_k defaults to generation_kwargs in all 8 SWE-bench and SWE-bench Pro example config files. This shows users where to configure inference randomness for reproducible results.

Problem

All 8 SWE-bench example configs had generation_kwargs=dict() (empty), while other benchmark examples (e.g., api_examples/infer_vllm_api_general_chat.py) properly demonstrate temperature, top_k, top_p. Users were unaware they could/should configure these generation parameters for SWE-bench.

Analysis

The merge pipeline is already correct:

  • _get_minisweagent_config() converts generation_kwargs → model.model_kwargs
  • recursive_merge (swebench) and merge_nested_dicts (swebench_pro) both do proper deep merge of model_kwargs into the mini-swe-agent YAML config
  • No code changes needed — just the config examples
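The deep-merge behavior described above can be illustrated with a minimal sketch. This is not the project's recursive_merge or merge_nested_dicts; deep_merge, the sample YAML-like dict, and the keys model_name and drop_params are hypothetical stand-ins chosen only to show why a nested merge preserves existing model_kwargs entries.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` merged into `base` recursively.

    Hypothetical illustration only; not the AISBench implementation.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Assumed stand-in for a parsed mini-swe-agent YAML config.
yaml_config = {
    "model": {
        "model_name": "example-model",
        "model_kwargs": {"drop_params": True},
    }
}

# Assumed shape of the override built from generation_kwargs.
override = {"model": {"model_kwargs": {"temperature": 0.0, "top_p": 1.0}}}

merged = deep_merge(yaml_config, override)
print(merged)
```

A shallow dict.update() on the top-level "model" key would instead replace the whole nested dict and drop the pre-existing entries, which is why a deep merge matters here.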

Changes

Updated all 8 files with commented-out defaults:

generation_kwargs=dict(
    # temperature=0.0,   # Set 0 for deterministic output; omit or set >0 for diversity
    # top_p=1.0,
    # top_k=-1,
),
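As a quick check of what the commented-out block means in practice, the sketch below compares it with an uncommented variant. The variable names are illustrative, and whether a given backend accepts top_k=-1 (commonly used to disable top-k filtering in vLLM-style servers) is an assumption to verify against your deployment.

```python
# With every entry commented out, the call is still just dict(),
# so runtime behavior is unchanged from the previous empty config.
default_kwargs = dict(
    # temperature=0.0,
    # top_p=1.0,
    # top_k=-1,
)

# Uncommenting the entries pins the sampling settings explicitly.
deterministic_kwargs = dict(
    temperature=0.0,
    top_p=1.0,
    top_k=-1,
)

print(default_kwargs)
print(deterministic_kwargs)
```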

Files changed:

  • ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified.py
  • ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_full.py
  • ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_lite.py
  • ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified_mini.py
  • ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual.py
  • ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual_mini.py
  • ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_full.py
  • ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_mini.py

Closes #376

…nch examples

Add commented-out temperature, top_p, top_k defaults to generation_kwargs
in all 8 SWE-bench and SWE-bench Pro example config files. This shows
users where to configure inference randomness for reproducible results.

The _get_minisweagent_config() function already forwards these kwargs
to mini-swe-agent's LiteLLM model config via model_kwargs, and both
recursive_merge (swebench) and merge_nested_dicts (swebench_pro)
correctly deep-merge them into the YAML config.

Closes AISBench#376
Copilot AI review requested due to automatic review settings June 27, 2026 08:50

@gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request updates several SWE-bench configuration files to include commented-out generation parameters (temperature, top_p, and top_k) inside generation_kwargs to serve as configuration examples. There are no review comments, so I have no feedback to provide.

@zhongzhouTan-coder self-assigned this on Jun 27, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the SWE-bench and SWE-bench Pro example configuration files to explicitly show where to set generation sampling controls (temperature, top_p, top_k) via generation_kwargs, improving discoverability and reproducibility guidance without changing runtime behavior (the dict remains effectively empty unless users uncomment values).

Changes:

  • Replaced generation_kwargs=dict() with a multi-line generation_kwargs=dict(...) block containing commented-out defaults for temperature, top_p, and top_k across 8 example configs.
  • Added an inline comment clarifying how temperature affects determinism vs diversity.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified.py | Adds commented-out sampling params under generation_kwargs for the verified SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_full.py | Adds commented-out sampling params under generation_kwargs for the full SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_lite.py | Adds commented-out sampling params under generation_kwargs for the lite SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified_mini.py | Adds commented-out sampling params under generation_kwargs for the verified-mini SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual.py | Adds commented-out sampling params under generation_kwargs for the multilingual SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual_mini.py | Adds commented-out sampling params under generation_kwargs for the multilingual-mini SWE-bench example. |
| ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_full.py | Adds commented-out sampling params under generation_kwargs for the SWE-bench Pro full example. |
| ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_mini.py | Adds commented-out sampling params under generation_kwargs for the SWE-bench Pro mini example. |

…arbitrary params

Add timeout=200 to generation_kwargs in all 8 SWE-bench/SWE-bench Pro
example configs. Also add a comment explaining that generation_kwargs
supports arbitrary generation parameters, consistent with regular model
tasks — addressing reviewer feedback from PR AISBench#377.
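To make the "arbitrary generation parameters" point concrete, here is a minimal sketch of how any key placed in generation_kwargs could travel into model.model_kwargs. The helper to_model_override is hypothetical and is not the project's _get_minisweagent_config(); it only assumes that the kwargs are nested unchanged under model.model_kwargs, as the PR description states.

```python
def to_model_override(generation_kwargs: dict) -> dict:
    """Hypothetical helper: nest generation_kwargs under model.model_kwargs.

    No key is filtered or renamed, so arbitrary parameters pass through.
    """
    return {"model": {"model_kwargs": dict(generation_kwargs)}}


# Mirrors the example configs after this commit: timeout is active,
# the sampling parameters stay commented out.
generation_kwargs = dict(
    timeout=200,
    # temperature=0.0,
    # top_p=1.0,
    # top_k=-1,
)

override = to_model_override(generation_kwargs)
print(override)
```

Only the keys that are actually set reach the model config; how each one is interpreted is left to the backend.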
Copilot AI review requested due to automatic review settings June 30, 2026 02:55

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

@SJTUyh merged commit 37829a6 into AISBench:master on Jun 30, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

[Feature request] SWE-bench example configs lack a demonstration of generation_kwargs such as temperature/top_k/top_p

3 participants