support --resume to rerun skipped conversations#127
support --resume to rerun skipped conversations#127jgieringer merged 9 commits intofail-after-retryfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a --resume mode to conversation generation to continue incomplete runs by reusing an existing run folder and skipping persona/run pairs that already have transcripts.
Changes:
- Add
--resumeflag togenerate.pyand propagate it intoConversationRunner. - Implement resume logic in
ConversationRunnerto detect existing transcript files and skip regeneration for those persona/run pairs. - Add unit + integration tests covering resume folder reuse/mismatch validation and resume skipping behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
generate.py |
Adds --resume, parses/validates run folder metadata, and reuses existing run folder/run_id in resume mode. |
generate_conversations/runner.py |
Adds resume option; indexes existing transcripts and returns “skipped” results for already-present persona/run pairs. |
tests/unit/generate_conversations/test_generate_cli.py |
Verifies resume reuses the provided run folder and validates metadata mismatches. |
tests/integration/test_conversation_runner.py |
Adds integration coverage for resume skip behavior (and non-resume behavior) including underscore-containing persona names. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
sator-labs
left a comment
There was a problem hiding this comment.
Ok this is very useful thank you
Sorry for being annoying about centralizing the utils, but I think it's a good time to do that!
| raise ValueError( | ||
| "Resume mode requires --run-id to match the run folder name when set." | ||
| ) | ||
| elif run_id is None: |
There was a problem hiding this comment.
should force also be here somewhere?
There was a problem hiding this comment.
How do you mean?
There was a problem hiding this comment.
i feel the if should also consider the force == True case?
There was a problem hiding this comment.
Like add a param called force and if true, overwrite all things?
There was a problem hiding this comment.
oh sorry i got confused. there is a force parameter but it's only for tsv, not for rebuilding this folder
sorry my bad
emily-vanark
left a comment
There was a problem hiding this comment.
Added a couple comments / questions, but in general, when @sator-labs is happy, I'll be happy.
sator-labs
left a comment
There was a problem hiding this comment.
LGTM modulo couple minor questions
| not in ["filename", "run_id", "judge_model", "judge_instance", "judge_id"] | ||
| ] | ||
| pd.DataFrame(results, columns=columns).to_csv( | ||
| pd.DataFrame(results, columns=cast(Any, columns)).to_csv( |
There was a problem hiding this comment.
why are you casting to Any?
There was a problem hiding this comment.
There was a cursor fix to shut-up an error:
Argument of type "list[str]" cannot be assigned to parameter "columns" of type "Axes | None" in function "__init__"
Type "list[str]" is not assignable to type "Axes | None"
"list[str]" is not assignable to "ExtensionArray"
"list[str]" is not assignable to "ndarray[_AnyShape, dtype[Any]]"
"list[str]" is not assignable to "Index"
"list[str]" is not assignable to "Series"
"list[str]" is incompatible with protocol "SequenceNotStr[Unknown]"
"index" is an incompatible type
Type "(value: str, start: SupportsIndex = 0, stop: SupportsIndex = sys.maxsize, /) -> int" is not assignable to type "(value: Any, /, start: int = 0, stop: int = ...) -> int"
Cursor:
The cast is essentially saying "trust me, this is fine" to suppress a Pyright/mypy error without actually changing runtime behavior. It's a common workaround for pandas' somewhat awkward type stubs where Axes is a broad union type that doesn't always align cleanly with list[str].
| raise ValueError( | ||
| "Resume mode requires --run-id to match the run folder name when set." | ||
| ) | ||
| elif run_id is None: |
There was a problem hiding this comment.
i feel the if should also consider the force == True case?
Love this! creating a ticket. |
Description
If 80% of conversations are error-free, and we want the remaining 20%, we should be able to point generate.py to the existing folder, and it complete the remaining 20% without rerunning the other 80%.
This PR adds:
--resumeflag to generate.py & judge.py - this flag is mainly to add intention to the run, but tells the scripts to only look for and generate/judge missing conversations/evaluations