✨ feat: validate model availability before run execution #416
Open
Marco Russo (marcorusso97) wants to merge 5 commits into
Contributor
Raffaele Paolino (RPaolino)
left a comment
The hardcoded _ATTACK_MODEL_ROLE_PATHS looks fragile for a few reasons:
- every attack change now requires also updating a central dictionary in AttackOrchestrator
- not every attack uses every configured model in every execution path, so static path-based preflight can over-check and fail runs that would actually work
- different semantic roles can resolve to the exact same effective model configuration, but they are still treated as separate checks, which can lead to duplicated probing and noisier failures
Concrete examples:
- h4rm3l: decorator_llm is currently always considered a required role for the attack type if it is present in config, but whether it is actually used depends on the selected transformation pipeline. With the current implementation, a configured-but-unused decorator_llm would still be availability-checked and could block the run.
- baseline: judge / judges are statically included in _ATTACK_MODEL_ROLE_PATHS, but baseline can also use non-LLM evaluation paths, e.g. regexp-based jailbreak detection. In those cases, a configured judge may not be effectively required for the run, yet the current preflight would still check it.
- autodan_turbo: an unreachable embedder falls back to local bag-of-words embedding and continues the run. This means a run can succeed even when the configured remote embedder is unavailable (so perhaps the embedder should not be availability-checked at all).
- tap: if the on_topic_judge is not specified, it defaults to judges[0]; hence, the same model will have different roles and would be checked several times.
I would prefer an attack-owned API, something like get_effective_model_roles, where each attack:
- declares which model roles are actually needed for the current run,
- resolves the model configuration, including default values,
- can skip optional/inactive roles for a given run,
- can collapse roles that share the same effective configuration before probing.
Then the orchestrator would only need to perform any remaining deduplication (since target_model may not be visible to the attack class, or different attacks may use similar models) and the availability checks.
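To make the proposal concrete, here is a minimal sketch of what an attack-owned get_effective_model_roles could look like for tap, together with the orchestrator-side merge. Everything except the names get_effective_model_roles, on_topic_judge, judges and target_model is an assumption made for illustration: ModelConfig, TapAttack, merge_roles, the "attacker" role, and the config layout are hypothetical and not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical effective model configuration; frozen so it is hashable."""
    provider: str
    name: str
    base_url: Optional[str] = None


class TapAttack:
    """Illustrative stand-in for an attack class, not the project's real one."""

    def __init__(self, config: dict):
        self.config = config

    def get_effective_model_roles(self) -> dict:
        """Return the models this run needs, keyed by effective configuration."""
        judges = [ModelConfig(**j) for j in self.config["judges"]]
        roles = {
            "attacker": ModelConfig(**self.config["attacker"]),
            "judges[0]": judges[0],
        }
        # Resolve the default: a missing on_topic_judge falls back to judges[0].
        on_topic = self.config.get("on_topic_judge")
        roles["on_topic_judge"] = ModelConfig(**on_topic) if on_topic else judges[0]
        # Collapse roles that share the same effective configuration.
        collapsed = {}
        for role, model in roles.items():
            collapsed.setdefault(model, []).append(role)
        return collapsed


def merge_roles(*role_maps):
    """Orchestrator-side deduplication across attacks and the target model."""
    merged = {}
    for role_map in role_maps:
        for model, roles in role_map.items():
            merged.setdefault(model, []).extend(roles)
    return merged


# Example: the judge doubles as on_topic_judge and also happens to be the target.
config = {
    "attacker": {"provider": "example-provider", "name": "attacker-model"},
    "judges": [{"provider": "example-provider", "name": "judge-model"}],
}
effective = TapAttack(config).get_effective_model_roles()
target = {ModelConfig("example-provider", "judge-model"): ["target_model"]}
to_probe = merge_roles(effective, target)
```

With this shape, the orchestrator would probe two distinct configurations instead of four roles, and a failure message could list every role affected by a single unreachable model.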
Contributor
Author
What I Changed
Outcome
Summary
This PR introduces a model availability preflight in the attack orchestrator, so runs are aborted before execution starts when required model endpoints are unreachable.
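As a rough illustration of the fail-fast behaviour described above, the sketch below probes each required model once and raises before any execution happens. It is not the PR's implementation: preflight, run_attack, ModelUnavailableError, and the probe callable are hypothetical names, and the real healthcheck may probe endpoints quite differently.

```python
from typing import Callable, Iterable


class ModelUnavailableError(RuntimeError):
    """Hypothetical error raised when the preflight finds unreachable models."""


def preflight(models: Iterable[str], probe: Callable[[str], bool]) -> None:
    """Probe every distinct model once and abort if any is unreachable."""
    # dict.fromkeys removes duplicates while preserving order.
    unreachable = [m for m in dict.fromkeys(models) if not probe(m)]
    if unreachable:
        raise ModelUnavailableError("unreachable models: " + ", ".join(unreachable))


def run_attack(models, probe, execute):
    """Run the preflight first; execute() is only called if every probe passes."""
    preflight(models, probe)
    return execute()


# Example with a fake probe that reports "judge-model" as down.
calls = []
executed = []


def fake_probe(model: str) -> bool:
    calls.append(model)
    return model != "judge-model"


try:
    run_attack(
        ["target-model", "judge-model", "judge-model"],
        fake_probe,
        lambda: executed.append(True),
    )
    message = None
except ModelUnavailableError as exc:
    message = str(exc)
```

Collecting all unreachable models before raising, rather than stopping at the first failure, gives the user one complete error instead of a fix-and-retry loop.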
What changed
How the healthcheck works
Tests
Extended orchestrator tests now cover:
Why this is useful