Skip to content

Add V1 harness type aliases#1425

Open
xeophon wants to merge 1 commit into
mainfrom
harness-type-aliases
Open

Add V1 harness type aliases#1425
xeophon wants to merge 1 commit into
mainfrom
harness-type-aliases

Conversation

@xeophon
Copy link
Copy Markdown
Member

@xeophon xeophon commented May 20, 2026

Summary

  • add a first-class V1 harness type alias registry so generic harness: vf.HarnessConfig fields can resolve TOML aliases like harness = "pi" and [eval.harness] type = "terminus2"
  • register aliases for existing packaged harness configs so reusable taskset environments can select OpenCode, mini-swe-agent, Pi, RLM, or Terminus 2 by name
  • document the TOML forms and add regression coverage for eval config normalization, alias coercion, command harness construction, and generated environment guides

Testing

  • uv run pytest tests/test_v1_config_extension.py tests/test_v1_harbor_cli.py tests/test_eval_cli.py -q
  • uv run pre-commit run --all-files
  • git diff --check
  • live 1x1 smoke: harness = "pi" with prime eval run completed with reward 1.0
  • live 1x1 smoke: [eval.harness] type = "pi" with prime eval run completed with reward 1.0

Note

Medium Risk
Adds new alias-based coercion paths for v1 harness config parsing from TOML/CLI, which can affect environment loading and validation behavior for existing configs. Also tweaks generated shell snippets for packaged harnesses, with low runtime risk but potential for regressions in command execution if misquoted.

Overview
Enables selecting packaged v1 harness configs by name when an environment keeps EnvConfig.harness: vf.HarnessConfig generic, supporting both harness = "pi" and [eval.harness] type = "terminus2" style TOML.

Implements a first-class config type-alias registry (config_type_alias) and extends EnvConfig validation to resolve these aliases (including collision checks and type validation). Updates eval TOML normalization to pass string harness selections through, adds regression tests for alias selection, and adjusts OpenCode/Pi harness shell guards to be POSIX-compatible ([ -z ... ]).

Reviewed by Cursor Bugbot for commit b3f0633. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Add V1 harness type alias resolution to EnvConfig

  • EnvConfig.validate_child_config now resolves harness from a bare alias string (e.g., "pi") or a mapping with a type key (e.g., {type = "terminus2"}) when harness is annotated as the base HarnessConfig.
  • config_utils.py adds _config_aliases, a per-class alias registry, and a new config_type_alias() lookup function; alias collisions raise TypeError at registration time.
  • env_config_utils.py is updated to pass a bare string harness value through to env_args.config.harness without table normalization, supporting TOML shorthand like harness = "pi".
  • Shell setup scripts in the OpenCode and Pi harnesses are fixed to use POSIX-compatible [ -z ] instead of [[ -z ]].

Macroscope summarized b3f0633.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 063a28f. Configure here.

Comment thread docs/evaluation.md
@macroscopeapp
Copy link
Copy Markdown

macroscopeapp Bot commented May 20, 2026

Approvability

Verdict: Needs human review

This PR introduces new runtime configuration resolution logic for harness type aliases, which affects how TOML configs are parsed and instantiated. While the changes are well-tested and the review comments' concerns appear addressed in code, the new config coercion logic and alias registration system warrant human review to verify the behavior matches expectations.

You can customize Macroscope's approvability policy. Learn more.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 063a28f5f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread verifiers/v1/utils/config_utils.py Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 163d2de10f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +72 to +73
key = "".join(char for char in alias.lower() if char.isalnum())
_CONFIG_TYPE_ALIASES[(base_cls, key)] = config_cls
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject alias collisions instead of overwriting registrations

register_config_owner now writes every normalized alias directly into the process-global _CONFIG_TYPE_ALIASES map, so a later class with the same alias silently replaces the earlier mapping. In practice, if any custom harness normalizes to an existing packaged name (for example, another pi-like alias), harness = "pi" will instantiate whichever class was imported last, making config resolution import-order dependent across environments.

Useful? React with 👍 / 👎.

Comment thread verifiers/v1/config.py Outdated
Comment on lines +345 to +346
return config_type_alias(
cast(str, data.pop("type")), HarnessConfig
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate harness.type is a string before alias lookup

The type field is cast to str and passed to config_type_alias without a runtime check. If TOML supplies a non-string value (e.g. type = 123), alias normalization calls .lower() on a non-string and raises an internal exception instead of a clear config validation error, which makes malformed eval configs fail with an opaque traceback.

Useful? React with 👍 / 👎.

mikasenghaas
mikasenghaas previously approved these changes May 20, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3f0633ea7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 65 to +68
_CONFIG_OWNERS[key] = owner_cls
for alias in (owner_cls.__name__, config_cls.__name__.removesuffix("Config")):
key = "".join(char for char in alias.lower() if char.isalnum())
_CONFIG_TYPE_ALIASES.setdefault((base_cls, key), config_cls)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Defer registry writes until alias validation succeeds

register_config_owner mutates global registries before collision checks complete: it stores _CONFIG_OWNERS and default type aliases first, then may raise on an explicit alias conflict. In a class definition like _config_aliases = ("pi",), __init_subclass__ raises, but the earlier entries remain, so the rejected harness can still be resolved by its class-name alias and even instantiated later via Env(config=...). This leaves process-wide config resolution in a partially committed state after a failed registration and makes subsequent harness lookup behavior inconsistent.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants