Simplify evaluation workflow by removing benchmarks build polling #1267
Merged
23 commits
aff4343 Simplify run-eval workflow by removing polling logic (openhands-agent)
edeab93 Add eval_branch parameter for testing feature branches (openhands-agent)
12b9613 Move model configs to SDK and pass full configs to evaluation (openhands-agent)
7b1bc9a Add authorization validation for workflow_dispatch (openhands-agent)
fd7cc3d Add trigger_reason propagation to evaluation workflow (openhands-agent)
c64080a Remove model stubs JSON and use models.json as single source of truth (openhands-agent)
e7c1787 Address code review comments (openhands-agent)
8ecec55 Add tests for find_models_by_id() function (openhands-agent)
a00a3ed change file name (simonrosenberg)
1eb3c5c Merge branch 'main' into openhands/orchestration-refactor (simonrosenberg)
d17b85d Implement evaluation workflow improvements (openhands-agent)
644805b Add benchmarks_branch parameter to support feature branch testing (openhands-agent)
7782d48 Fix module name in workflow from resolve_model_configs to resolve_mod… (openhands-agent)
fc143ed Rename resolve_model_configs.py to resolve_model_config.py for consis… (openhands-agent)
36755b7 update step name for clarity (simonrosenberg)
df3d693 Merge branch 'main' into openhands/orchestration-refactor (simonrosenberg)
45afbc3 Add temporary pr_number input to workflow_dispatch for testing PR com… (simonrosenberg)
31a5464 Rename test file to match module name (simonrosenberg)
a64cf88 Simplify SDK SHA resolution and remove temporary parameters (simonrosenberg)
87d29d5 Fix initial checkout to handle short SHA references (simonrosenberg)
e5bbafe Add temporary pr_number input for testing clickable links (simonrosenberg)
506abb6 Revert "Add temporary pr_number input for testing clickable links" (simonrosenberg)
9898814 Merge branch 'main' into openhands/orchestration-refactor (simonrosenberg)
@@ -0,0 +1,112 @@

    #!/usr/bin/env python3
    """
    Resolve model IDs to full model configurations.

    Reads:
        - MODEL_IDS: comma-separated model IDs

    Outputs to GITHUB_OUTPUT:
        - models_json: JSON array of full model configs with display names
    """

    import json
    import os
    import sys


    # Model configurations dictionary
    MODELS = {
        "claude-sonnet-4-5-20250929": {
            "id": "claude-sonnet-4-5-20250929",
            "display_name": "Claude Sonnet 4.5",
            "llm_config": {
                "model": "litellm_proxy/claude-sonnet-4-5-20250929",
                "temperature": 0.0,
            },
        },
        "claude-haiku-4-5-20251001": {
            "id": "claude-haiku-4-5-20251001",
            "display_name": "Claude Haiku 4.5",
            "llm_config": {
                "model": "litellm_proxy/claude-haiku-4-5-20251001",
                "temperature": 0.0,
            },
        },
        "gpt-5-mini-2025-08-07": {
            "id": "gpt-5-mini-2025-08-07",
            "display_name": "GPT-5 Mini",
            "llm_config": {
                "model": "litellm_proxy/gpt-5-mini-2025-08-07",
                "temperature": 1.0,
            },
        },
        "deepseek-chat": {
            "id": "deepseek-chat",
            "display_name": "DeepSeek Chat",
            "llm_config": {"model": "litellm_proxy/deepseek/deepseek-chat"},
        },
        "kimi-k2-thinking": {
            "id": "kimi-k2-thinking",
            "display_name": "Kimi K2 Thinking",
            "llm_config": {"model": "litellm_proxy/moonshot/kimi-k2-thinking"},
        },
    }


    def error_exit(msg: str, exit_code: int = 1) -> None:
        """Print error message and exit."""
        print(f"ERROR: {msg}", file=sys.stderr)
        sys.exit(exit_code)


    def get_required_env(key: str) -> str:
        """Get required environment variable or exit with error."""
        value = os.environ.get(key)
        if not value:
            error_exit(f"{key} not set")
        return value


    def find_models_by_id(model_ids: list[str]) -> list[dict]:
        """Find models by ID. Fails fast on missing ID.

        Args:
            model_ids: List of model IDs to find

        Returns:
            List of model dictionaries matching the IDs

        Raises:
            SystemExit: If any model ID is not found
        """
        resolved = []
        for model_id in model_ids:
            if model_id not in MODELS:
                available = ", ".join(sorted(MODELS.keys()))
                error_exit(
                    f"Model ID '{model_id}' not found. Available models: {available}"
                )
            resolved.append(MODELS[model_id])
        return resolved


    def main() -> None:
        model_ids_str = get_required_env("MODEL_IDS")
        github_output = get_required_env("GITHUB_OUTPUT")

        # Parse requested model IDs
        model_ids = [mid.strip() for mid in model_ids_str.split(",") if mid.strip()]

        # Resolve model configs
        resolved = find_models_by_id(model_ids)

        # Output as JSON
        models_json = json.dumps(resolved, separators=(",", ":"))
        with open(github_output, "a", encoding="utf-8") as f:
            f.write(f"models_json={models_json}\n")

        print(f"Resolved {len(resolved)} model(s): {', '.join(model_ids)}")


    if __name__ == "__main__":
        main()
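
As a rough illustration of how the script above behaves, the following sketch reproduces its three steps (parsing the comma-separated IDs, the fail-fast lookup, and building the single-line `models_json=` entry) in a form that can be exercised without a GitHub Actions runner. It is not the test file added in this PR. `parse_model_ids`, `format_output_line`, and `exits_on` are hypothetical helper names introduced only for this example, and the `MODELS` dictionary is trimmed to two of the entries from the diff.

```python
import json
import sys

# Trimmed stand-in for the MODELS dictionary in the diff (two entries only).
MODELS = {
    "deepseek-chat": {
        "id": "deepseek-chat",
        "display_name": "DeepSeek Chat",
        "llm_config": {"model": "litellm_proxy/deepseek/deepseek-chat"},
    },
    "kimi-k2-thinking": {
        "id": "kimi-k2-thinking",
        "display_name": "Kimi K2 Thinking",
        "llm_config": {"model": "litellm_proxy/moonshot/kimi-k2-thinking"},
    },
}


def parse_model_ids(raw: str) -> list[str]:
    """Hypothetical helper: the same comprehension main() uses, pulled out."""
    return [mid.strip() for mid in raw.split(",") if mid.strip()]


def find_models_by_id(model_ids: list[str]) -> list[dict]:
    """Same fail-fast lookup as in the diff, with error_exit() inlined."""
    resolved = []
    for model_id in model_ids:
        if model_id not in MODELS:
            available = ", ".join(sorted(MODELS))
            print(
                f"ERROR: Model ID '{model_id}' not found. "
                f"Available models: {available}",
                file=sys.stderr,
            )
            sys.exit(1)
        resolved.append(MODELS[model_id])
    return resolved


def format_output_line(configs: list[dict]) -> str:
    """Hypothetical helper: the name=value line appended to GITHUB_OUTPUT."""
    # json.dumps escapes newlines inside strings, so the value stays on one line.
    return f"models_json={json.dumps(configs, separators=(',', ':'))}\n"


def exits_on(model_ids: list[str]) -> bool:
    """Return True if the lookup aborts with SystemExit for these IDs."""
    try:
        find_models_by_id(model_ids)
    except SystemExit:
        return True
    return False


if __name__ == "__main__":
    ids = parse_model_ids(" deepseek-chat, ,kimi-k2-thinking,")
    print(ids)
    print(format_output_line(find_models_by_id(ids)), end="")
    print(exits_on(["no-such-model"]))
```

One consequence of the parsing step that this sketch makes easy to check: a `MODEL_IDS` value such as `","` is non-empty, so it passes `get_required_env`, but it parses to an empty list and would be written out as `models_json=[]` rather than rejected. Whether that should count as an error is a design choice the diff leaves open.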