feat(config): add temperature/top_k/top_p generation_kwargs to SWE-bench examples #377
…nch examples Add commented-out temperature, top_p, top_k defaults to generation_kwargs in all 8 SWE-bench and SWE-bench Pro example config files. This shows users where to configure inference randomness for reproducible results. The _get_minisweagent_config() function already forwards these kwargs to mini-swe-agent's LiteLLM model config via model_kwargs, and both recursive_merge (swebench) and merge_nested_dicts (swebench_pro) correctly deep-merge them into the YAML config. Closes AISBench#376
Code Review
This pull request updates several SWE-bench configuration files to include commented-out generation parameters (temperature, top_p, and top_k) inside generation_kwargs to serve as configuration examples. There are no review comments, so I have no feedback to provide.
Pull request overview
This PR updates the SWE-bench and SWE-bench Pro example configuration files to explicitly show where to set generation sampling controls (temperature, top_p, top_k) via generation_kwargs, improving discoverability and reproducibility guidance without changing runtime behavior (the dict remains effectively empty unless users uncomment values).
Changes:
- Replaced `generation_kwargs=dict()` with a multi-line `generation_kwargs=dict(...)` block containing commented-out defaults for `temperature`, `top_p`, and `top_k` across 8 example configs.
- Added an inline comment clarifying how `temperature` affects determinism vs diversity.
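As a rough illustration of the change listed above, the following sketch shows what a commented-out `generation_kwargs` block might look like in one of the example configs. The numeric values and the comment wording are assumptions for illustration only; they are not copied from the PR's diff.

```python
# Hypothetical excerpt from an example config; the values below are placeholders.
generation_kwargs = dict(
    # temperature=0.0,  # lower values are more deterministic, higher values more diverse
    # top_p=1.0,
    # top_k=50,
)

# With every entry commented out, the dict is still empty,
# so runtime behavior matches the previous `generation_kwargs=dict()`.
print(generation_kwargs)  # {}
```

Uncommenting a line turns it into a regular keyword argument of `dict(...)`, which is why the block can ship in the examples without changing behavior.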
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified.py | Adds commented-out sampling params under generation_kwargs for the verified SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_full.py | Adds commented-out sampling params under generation_kwargs for the full SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_lite.py | Adds commented-out sampling params under generation_kwargs for the lite SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified_mini.py | Adds commented-out sampling params under generation_kwargs for the verified-mini SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual.py | Adds commented-out sampling params under generation_kwargs for the multilingual SWE-bench example. |
| ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual_mini.py | Adds commented-out sampling params under generation_kwargs for the multilingual-mini SWE-bench example. |
| ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_full.py | Adds commented-out sampling params under generation_kwargs for the SWE-bench Pro full example. |
| ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_mini.py | Adds commented-out sampling params under generation_kwargs for the SWE-bench Pro mini example. |
…arbitrary params Add timeout=200 to generation_kwargs in all 8 SWE-bench/SWE-bench Pro example configs. Also add a comment explaining that generation_kwargs supports arbitrary generation parameters, consistent with regular model tasks — addressing reviewer feedback from PR AISBench#377.
Summary
Add commented-out `temperature`, `top_p`, `top_k` defaults to `generation_kwargs` in all 8 SWE-bench and SWE-bench Pro example config files. This shows users where to configure inference randomness for reproducible results.
Problem
All 8 SWE-bench example configs had `generation_kwargs=dict()` (empty), while other benchmark examples (e.g., `api_examples/infer_vllm_api_general_chat.py`) properly demonstrate `temperature`, `top_k`, `top_p`. Users were unaware they could/should configure these generation parameters for SWE-bench.
Analysis
The merge pipeline is already correct:
- `_get_minisweagent_config()` converts `generation_kwargs` → `model.model_kwargs`
- `recursive_merge` (swebench) and `merge_nested_dicts` (swebench_pro) both do proper deep merge of `model_kwargs` into the mini-swe-agent YAML config
Changes
Updated all 8 files with commented-out defaults:
Files changed:
- `ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified.py`
- `ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_full.py`
- `ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_lite.py`
- `ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_verified_mini.py`
- `ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual.py`
- `ais_bench/configs/swe_bench_examples/mini_swe_agent_swe_bench_multilingual_mini.py`
- `ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_full.py`
- `ais_bench/configs/swe_bench_pro_examples/mini_swe_agent_swe_bench_pro_mini.py`
Closes #376
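The merge behavior described in the Analysis can be sketched roughly as follows. This is a minimal stand-in, not the repository's code: `deep_merge` and `to_model_override` are hypothetical helpers written for this example, and the keys in `yaml_config` are illustrative assumptions about what a parsed mini-swe-agent YAML config might contain.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def to_model_override(generation_kwargs: dict) -> dict:
    """Nest generation_kwargs under model.model_kwargs (assumed layout)."""
    return {"model": {"model_kwargs": dict(generation_kwargs)}}


# Stand-in for the parsed YAML config; these keys and values are illustrative.
yaml_config = {
    "model": {"model_name": "example-model", "model_kwargs": {"drop_params": True}},
    "agent": {"step_limit": 250},
}

generation_kwargs = dict(temperature=0.0, timeout=200)
merged = deep_merge(yaml_config, to_model_override(generation_kwargs))
print(merged["model"]["model_kwargs"])
```

The point of a deep merge here is that existing entries under `model_kwargs` survive while the user's sampling parameters are added alongside them; a shallow `dict.update` on the top-level `model` key would instead discard the sibling settings.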