refactor: unify Scene/Role/Turn types into _types.py (ENG-47) #262

Merged
devin-ai-integration[bot] merged 1 commit into refactor/v0.4 from devin/1778882210-eng-47-unify-types
May 16, 2026

@devin-ai-integration devin-ai-integration Bot commented May 15, 2026

Summary

Phase 1 of the v0.4 type refactor (ENG-47). Creates one canonical set of Role, Scene, and Turn dataclasses in src/benchflow/_types.py, replacing the duplicate definitions that lived in trial.py and _scene.py.

New fields (all default to None — no runtime behavior changes):

  • Role.timeout_sec / Role.idle_timeout_sec — per-role timeout support
  • Role.skills_dir — per-role skills directory
  • Scene.parallel_group — scenes with the same group will execute concurrently
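
For reviewers who want a picture of the shape being described, here is a minimal sketch of what the three dataclasses could look like. Only timeout_sec, idle_timeout_sec, skills_dir, and parallel_group come from this PR; every other field (name, roles, turns, role, prompt) and all of the type annotations are assumptions made so the sketch runs on its own, and are not taken from _types.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Role:
    # Hypothetical pre-existing field, present only so the sketch has something required.
    name: str
    # Fields named in this PR. The types are assumed; the None defaults are from the PR.
    timeout_sec: float | None = None
    idle_timeout_sec: float | None = None
    skills_dir: Path | None = None


@dataclass
class Turn:
    # Hypothetical fields; the PR does not list Turn's contents.
    role: str
    prompt: str | None = None


@dataclass
class Scene:
    # Hypothetical pre-existing fields.
    name: str
    roles: list[Role] = field(default_factory=list)
    turns: list[Turn] = field(default_factory=list)
    # Field named in this PR; scenes sharing a group are meant to run concurrently.
    parallel_group: str | None = None


# Callers that never pass the new fields get None for all of them.
role = Role(name="solver")
scene = Scene(name="main", roles=[role])
```

Because every new field has a None default and is declared after the required fields, existing call sites that construct Role or Scene without them keep working, which is consistent with the "no runtime behavior changes" claim.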

Import changes:

  • trial.py imports Role, Scene, Turn from _types.py (re-exports for backward compat)
  • _scene.py renames its internal Role to SceneRole (it has different fields: instruction, tools) and keeps a backward-compat alias Role = SceneRole
  • __init__.py re-exports canonical types from _types.py; adds TrialRole = Role and TrialScene = Scene aliases
  • runtime.py and trial_yaml.py import from _types.py

Not touched (per ENG-47 scope): sdk.py, runtime.py logic, trial.py logic, sandbox/reward/verifier code.

Review & Testing Checklist for Human

  • Verify from benchflow import Role, Scene, Turn resolves to the _types.py versions (canonical types)
  • Verify from benchflow.trial import Role, Scene, Turn still works (re-export)
  • Verify from benchflow._scene import Role still works (backward-compat alias for SceneRole)
  • Run smoke test: uv run bench run --source-repo benchflow-ai/skillsbench --source-path tasks/jax-computing-basics --agent gemini --model gemini-3.1-flash-lite-preview -e daytona
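
The first three checklist items are object-identity checks, so they can be scripted. The helper below is a hypothetical convenience and is not part of benchflow; it uses only importlib. It is exercised here against the standard library, and the BENCHFLOW_CHECKS list is an untested transcription of the checklist that would need benchflow installed to run.

```python
import importlib


def resolve(spec: str):
    """Resolve a 'package.module:attr' string to the object it names."""
    module_name, _, attr = spec.partition(":")
    return getattr(importlib.import_module(module_name), attr)


def same_object(spec_a: str, spec_b: str) -> bool:
    """True when both specs resolve to the very same object."""
    return resolve(spec_a) is resolve(spec_b)


def run_checks(checks):
    """Return the (spec_a, spec_b) pairs whose identity result differs from the expectation."""
    return [(a, b) for a, b, expected in checks if same_object(a, b) is not expected]


# Transcribed from the checklist; not executed here because benchflow may not be installed.
BENCHFLOW_CHECKS = [
    ("benchflow:Role", "benchflow._types:Role", True),
    ("benchflow:Scene", "benchflow._types:Scene", True),
    ("benchflow:Turn", "benchflow._types:Turn", True),
    ("benchflow.trial:Role", "benchflow._types:Role", True),
    ("benchflow._scene:Role", "benchflow._scene:SceneRole", True),
    ("benchflow._scene:Role", "benchflow._types:Role", False),
]

# Standard-library demonstration: json re-exports JSONDecodeError from json.decoder.
print(same_object("json:JSONDecodeError", "json.decoder:JSONDecodeError"))  # prints True
```

In an environment with this branch installed, run_checks(BENCHFLOW_CHECKS) should return an empty list if the description above is accurate.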

Notes

  • All 812 existing tests pass, ruff lint clean, ty typecheck clean.
  • The _scene.py::SceneRole is intentionally kept separate from the canonical Role because it carries runtime-specific fields (instruction, tools) that don't belong on the declarative type.
  • This is Phase 1 only — no dependencies on ENG-46/48/49.

Link to Devin session: https://app.devin.ai/sessions/6d33e83b375c4c16976a11b9805c0e13
Requested by: @xdotli



Create src/benchflow/_types.py as the single canonical source for the
declarative Role, Scene, and Turn dataclasses.

Changes:
- New _types.py with Role (adds timeout_sec, idle_timeout_sec,
  skills_dir), Scene (adds parallel_group), and Turn
- trial.py imports from _types.py instead of defining its own copies
- _scene.py renames its internal Role to SceneRole (different fields:
  instruction, tools) with backward-compat alias
- __init__.py re-exports canonical types from _types.py; adds
  TrialRole/TrialScene backward-compat aliases
- runtime.py and trial_yaml.py updated to import from _types.py

New fields default to None — no runtime behavior changes.

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


@devin-ai-integration

Test Report: ENG-47 — Unified Scene/Role/Turn Types

All 4 tests passed. Shell-based testing (no browser UI).

Results

| Test | Result |
| --- | --- |
| Import path verification (all 3 paths → _types.py) | ✅ Passed |
| Backward-compat aliases (TrialRole, TrialScene, _scene.Role) | ✅ Passed |
| New fields default to None (timeout_sec, idle_timeout_sec, parallel_group) | ✅ Passed |
| Integration test (jax-computing-basics, gemini, Daytona) | ✅ Passed |

Integration test output

Task: jax-computing-basics
Agent: gemini-cli
Rewards: {'reward': 0.0}
Tool calls: 14
Exit code: 0

Full pipeline ran: bench run → Scene.single() → TrialConfig → Trial → gemini agent on Daytona. No import errors, no type errors.

CI note

The test CI job failure is pre-existing on main (a ruff format check on metrics.py) and is unrelated to this PR.


@devin-ai-integration

Closing: this is now included in the combined refactor branch, #272 (refactor/v0.4 → main). All changes from this PR are preserved there.

@xdotli xdotli closed this May 16, 2026
@devin-ai-integration devin-ai-integration Bot changed the base branch from main to refactor/v0.4 May 16, 2026 01:05
@devin-ai-integration devin-ai-integration Bot merged commit 2a45331 into refactor/v0.4 May 16, 2026
1 of 4 checks passed
@xdotli xdotli deleted the devin/1778882210-eng-47-unify-types branch May 17, 2026 05:52