Conversation
There was a problem hiding this comment.
Pull request overview
Ensures test runs don’t leave behind output/adhoc artifacts by routing “adhoc” outputs/logs to session-scoped temporary directories and cleaning them up at teardown.
Changes:
- Add a
default_adhoc_parent()helper (env-overridable viaVERA_ADHOC_PARENT) for single-run outputs and unscoped judge logs. - Update
judge.pyandLLMJudgeto use the new adhoc parent default. - Configure pytest session setup/teardown to set
VERA_ADHOC_PARENT/VERA_JUDGE_LOGS_ROOTto temp dirs and remove them after the session.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/judge/test_judge_cli.py | Updates CLI unit test assertions around single-run output folder naming. |
| tests/conftest.py | Adds session-scoped temp directories + env var overrides; cleans up on session finish. |
| judge/utils.py | Introduces default_adhoc_parent() to centralize adhoc output root selection. |
| judge/llm_judge.py | Routes unscoped log default under default_adhoc_parent(). |
| judge.py | Uses default_adhoc_parent() for -c mode when --output is not provided; updates help text. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert ja[0] == "judge_instance" | ||
| assert ja[1] == "conversation_data" | ||
| assert str(ja[2]).startswith("output/adhoc/single_") | ||
| out_run = Path(ja[2]) |
There was a problem hiding this comment.
This test no longer verifies that single-file runs are created under the configured adhoc parent (the behavior this PR is introducing via VERA_ADHOC_PARENT). As written, it would still pass if main() continued writing to the real output/adhoc directory, since it only checks the folder basename. Consider asserting Path(ja[2]).parent equals the temp adhoc directory (e.g., Path(os.environ["VERA_ADHOC_PARENT"])) or _judge_script.default_adhoc_parent() to prevent regressions and ensure no repo-local outputs are created during tests.
| out_run = Path(ja[2]) | |
| out_run = Path(ja[2]) | |
| assert out_run.parent == _judge_script.default_adhoc_parent() |
emily-vanark
left a comment
There was a problem hiding this comment.
Does what it says on the tin. Huzzah! (All the tests pass, no krufty new files left behind in my output folder!)
Description
When running tests, the output directory had leftover test assets
This PR ensures a clean setup/teardown
Issue
closes #152