Skip to content

Conversation

@simonrosenberg
Copy link
Collaborator

@simonrosenberg simonrosenberg commented Jan 27, 2026

Summary

This PR creates a single source of truth for all SWTBench hyperparameters and constant values by introducing benchmarks/swtbench/constants.py.

Fixes #364

Changes

1. Created benchmarks/swtbench/constants.py

This new file contains all constant values organized into logical categories:

  • Type Definitions: TargetType literal type alias for build targets
  • Docker/Image Related Constants: Docker image prefixes, agent server image base, prebaked registry, build targets, image tags
  • Build Mode: BuildMode enum with API and CLI options
  • Dataset Related Constants: Default dataset, split, model name
  • Environment Variable Names: All environment variable names used in the module
  • Default Values: Default values for various parameters (skip build, runtime API URL, startup timeout, workers, etc.)
  • File/Directory Paths: Repository directory, evaluation results directory, report filenames
  • Git/Repository Related Constants: SWT-bench repo URL, git user configuration
  • Patch Processing Constants: Files to remove from patches (as immutable tuple)
  • Environment Setup Commands: Default environment setup commands (as immutable tuple)

2. Updated all swtbench modules to import from constants.py

  • run_infer.py
  • eval_infer.py
  • build_eval_env_images.py
  • image_utils.py

3. Type Safety Improvements

  • All constants use typing.Final annotations for immutability
  • Mutable lists converted to immutable tuples:
    • SETUP_FILES_TO_REMOVE
    • BUILD_MODE_CHOICES
    • DEFAULT_ENV_SETUP_COMMANDS
  • BuildMode enum for type-safe build mode selection
  • TargetType literal type for build targets (matches openhands.sdk.workspace.TargetType)
  • Numeric constants use proper int types instead of strings:
    • DEFAULT_STARTUP_TIMEOUT: 600 (int)
    • DEFAULT_EVAL_WORKERS: 12 (int)

Benefits

  • Single Source of Truth: All hyperparameters are now in one place, making it easy to review and modify them
  • Better Maintainability: Changes to constants only need to be made in one location
  • Improved Discoverability: Developers can easily find all configurable values
  • Type Safety: Constants are properly typed with Final annotations and appropriate types
  • Immutability: Collection constants use tuples to prevent accidental mutation
  • Pyright Compliance: All type checks pass with strict mode

This commit creates a single source of truth for all SWTBench
hyperparameters and constant values by:

1. Creating benchmarks/swtbench/constants.py with all constants:
   - Docker/Image related constants (prefixes, registries, build targets)
   - Dataset related constants (default dataset, split, model name)
   - Environment variable names
   - Default values for various parameters
   - File/directory paths (repo dir, evaluation results dir, report filename)
   - Git/repository related constants
   - Patch processing constants
   - Environment setup commands

2. Updating all swtbench modules to import from constants.py:
   - run_infer.py
   - eval_infer.py
   - build_eval_env_images.py
   - image_utils.py

This makes it easier to check and modify hyperparameters for the
benchmark as they are now centralized in one location.

Fixes #364

Co-authored-by: openhands <openhands@all-hands.dev>
@simonrosenberg simonrosenberg force-pushed the openhands/swtbench-constants-refactor branch from e275ab3 to 328490e Compare January 28, 2026 10:16
…ility

- Add typing.Final annotations to all constants for type safety
- Convert mutable lists to immutable tuples:
  - SETUP_FILES_TO_REMOVE
  - BUILD_MODE_CHOICES
  - DEFAULT_ENV_SETUP_COMMANDS
- Add BuildMode enum for type-safe build mode selection
- Convert string numeric constants to proper int types:
  - DEFAULT_STARTUP_TIMEOUT: '600' -> 600
  - DEFAULT_EVAL_WORKERS: '12' -> 12
- Update callers to handle type changes:
  - eval_infer.py: workers parameter now int, add type=int to argparse
  - run_infer.py: convert tuple to list for env_setup_commands,
    convert int to str for os.getenv default

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Jan 28, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #372 at branch `openhands/swtbench-constants-refactor`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

- Add TargetType alias to constants.py matching openhands.sdk.workspace.TargetType
- Update DEFAULT_BUILD_TARGET to use TargetType instead of str
- Update run_infer.py to use TargetType for build_target variable and function parameter

Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 28, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 28, 2026
@simonrosenberg
Copy link
Collaborator Author

@OpenHands
/codereview-roasted
Make sure this PR doesn't alter any constant or behavior
Suggest next step refactoring for cleaner code

@openhands-ai
Copy link

openhands-ai bot commented Jan 28, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

@openhands-ai
Copy link

openhands-ai bot commented Jan 28, 2026

🔴 Code Review: PR #372 - SWTBench Constants Refactor

Taste Rating: 🟡 Acceptable - Works but has one behavioral change that needs attention


[CRITICAL ISSUES] (Must fix - these break fundamental principles)

[benchmarks/swtbench/eval_infer.py, Lines 246-248] ⚠️ Breaking Change: The workers parameter type changed from str to int:

# Original:
def run_swtbench_evaluation(..., workers: str = "12") -> None:

# New:
def run_swtbench_evaluation(..., workers: int = DEFAULT_EVAL_WORKERS) -> None:

While this works at runtime (Python is dynamically typed), it's a function signature change that could break type-checked callers. The argparse also changed from accepting a string to type=int.

Recommendation: If this is intentional (improving type correctness), document it in the PR description as a breaking change. Otherwise, keep workers: str and use DEFAULT_EVAL_WORKERS_STR: Final[str] = "12".


[IMPROVEMENT OPPORTUNITIES] (Should fix - violates good taste)

[benchmarks/swtbench/constants.py, Line 35] 🔍 Magic Number: IMAGE_NAME_SEPARATOR = "1776" - This cryptic value needs a comment explaining its origin. Is it a date? A convention from SWE-bench? Future maintainers will be confused.

[benchmarks/swtbench/constants.py, Lines 38-49] 🔧 Inconsistent Enum Usage: You define BuildMode enum but then use .value everywhere:

DEFAULT_BUILD_MODE: Final[str] = BuildMode.CLI.value  # Why not just BuildMode.CLI?
BUILD_MODE_CHOICES: Final[Tuple[str, ...]] = tuple(m.value for m in BuildMode)

Consider using the enum directly and converting to string only at the argparse boundary.

[benchmarks/swtbench/constants.py, Line 13] 🔧 Inconsistent Type Definition: TargetType is a Literal while BuildMode is an Enum. For consistency, consider making TargetType an enum too, or document why they differ.


[STYLE NOTES] (Minor observations)

[benchmarks/swtbench/constants.py] 📝 The heavy use of section separators (# ====) is verbose. Consider using a single blank line between sections - the code is already well-organized.

[benchmarks/swtbench/eval_infer.py, Line 214]SETUP_FILES_TO_REMOVE changed from list to tuple - this is fine since remove_files_from_patch accepts any iterable.


VERDICT:

Needs minor rework: The workers type change from str to int is a behavioral/API change that should be explicitly acknowledged or reverted.


KEY INSIGHT:

The refactoring successfully centralizes constants without changing their values, but inadvertently "improves" the workers parameter type from str to int - which, while arguably better, is a breaking change that violates the "don't alter behavior" requirement.


Suggested Next Steps for Cleaner Code:

  1. Fix the workers type issue - either revert to str or document as intentional breaking change
  2. Add docstring to IMAGE_NAME_SEPARATOR explaining the "1776" convention
  3. Consider grouping related constants into @dataclass(frozen=True) for better organization:
    @dataclass(frozen=True)
    class DockerConfig:
        swebench_prefix: str = "docker.io/swebench/"
        swtbench_prefix: str = "docker.io/swtbench/"
        agent_server_base: str = "ghcr.io/all-hands-ai/agent-server"
  4. Unify type definitions - either all Literals or all Enums for consistency
  5. Consider adding a __all__ export list to make the public API explicit

View full conversation

Remove the BuildMode enum class and replace with simple string constants.
The enum was only used to generate choices and default values, which can
be done more simply with a tuple and string constant.

Co-authored-by: openhands <openhands@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regroup all swtbench hyper parameters in a single source of truth benchmarks/swtbench/constants.py

3 participants