✨ feat: validate model availability before run execution by marcorusso97 · Pull Request #416 · AISecurityLab/hackagent

Marco Russo (marcorusso97) · 2026-06-03T12:50:50Z

Summary

This PR introduces a model availability preflight in the attack orchestrator, so runs are aborted before execution starts when required model endpoints are unreachable.

What changed

Added pre-run model availability validation before Attack/Run DB records are created.
Added per-attack role mapping to discover all required model roles (target plus attack-specific roles).
Added robust attack type normalization for preflight role resolution (including alias handling such as AutoDANTurbo -> autodan_turbo).
Added live preflight progress output for each role:
- Checking () ... OK/KO
Added internal noise suppression during probes:
- temporarily silences internal logs and stdout/stderr emitted by provider libraries during health probes.
Added aggregated, user-friendly error formatting for unreachable models:
- Unreachable models:
  - role=... identifier=... endpoint=... error=...
Updated failure behavior:
- on preflight failure, log a configuration error and gracefully stop the run early, instead of proceeding.
- run startup is blocked and no Attack/Run records are created in this case.

How the healthcheck works

Prepare attack parameters and resolve goals.
Build a list of required targets:
- always include target model from the existing router.
- include attack-specific roles from the role-path map (for example attacker/scorer/summarizer/embedder, judge variants, decorator role, etc.).
- include category_classifier unless explicit intent taxonomy labels are already provided.
For each target, run a lightweight probe:
- for existing target: use the already registered router.
- for configured role models: create a temporary router from role config.
- issue a minimal request:
  - one user message: healthcheck
  - max_tokens=1
  - temperature=0.0
Probe result handling:
- if router initialization fails or request raises, mark KO with the captured error.
- if response has error_message, mark KO.
- if response is non-dict, treat as inconclusive-pass (to avoid false negatives with custom adapters/tests).
Print per-role progress and final status (OK/KO).
If any target is unreachable:
- build one aggregated multiline error report listing role, identifier, endpoint, and error.
- abort before creating Attack/Run records.
If all checks pass, continue with normal run creation and execution.

Tests

Extended orchestrator tests now cover:

unreachable model message content and formatting.
multi-model aggregation in a single preflight failure report.
no Attack/Run DB creation when preflight fails.
attack-type normalization regression coverage for AutoDAN aliases.

Why this is useful

Prevents expensive or noisy runs when dependencies are misconfigured.
Gives immediate, actionable feedback on exactly which model endpoint is failing.
Improves reliability and UX with clear preflight visibility and graceful early abort.

codecov · 2026-06-03T13:36:44Z

Codecov Report

❌ Patch coverage is 83.66762% with 57 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
hackagent/attacks/orchestrator.py	82.02%	48 Missing ⚠️
hackagent/attacks/techniques/h4rm3l/attack.py	79.16%	5 Missing ⚠️
hackagent/attacks/techniques/baseline/attack.py	84.61%	2 Missing ⚠️
hackagent/attacks/techniques/tap/attack.py	89.47%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Raffaele Paolino (RPaolino)

The hardcoded _ATTACK_MODEL_ROLE_PATHS looks fragile for a few reasons:

every attack change now requires also updating a central dictionary in AttackOrchestrator
not every attack uses every configured model in every execution path, so static path-based preflight can over-check and fail runs that would actually work
different semantic roles can resolve to the exact same effective model configuration, but they are still treated as separate checks, which can lead to duplicated probing and noisier failures

Concrete examples:

h4rm3l: decorator_llm is currently always considered a required role for the attack type if it is present in config, but whether it is actually used depends on the selected transformation pipeline. With the current implementation, a configured-but-unused decorator_llm would still be availability-checked and could block the run.
baseline: judge / judges are statically included in _ATTACK_MODEL_ROLE_PATHS, but baseline can also use non-LLM evaluation paths, e.g. regexp-based jailbreak detection. In those cases, a configured judge may not be effectively required for the run, yet the current preflight would still check it.
autodan_turbo: an unreachable embedder falls back to local bag-of-words embedding and continues the run. This means a run can succeed even when the configured remote embedder is unavailable (maybe then, do not check for availability of the embedder)
tap: if the on_topic_judge is not specified, it defaults to judges[0]; hence, the same model will have different roles and would be checked several times.

I would prefer an attack-owned API, something like get_effective_model_roles, where each attack:

declares which model roles are actually needed for the current run,
resolve models configuration, taking into account also default values,
can skip optional/inactive roles for a given run,
can collapse roles that share the same effective configuration before probing.
Then the orchestrator would only need to perform eventual deduplication (since target_model may not be visible to the attack class, or different attacks may use similar models) and availability checks.

Marco Russo (marcorusso97) · 2026-06-05T13:16:57Z

What I Changed

Added an attack-owned role API: get_effective_model_roles.
Updated orchestrator preflight to:
- Prefer attack-owned role resolution.
- Fall back to static mapping only when needed.
- Deduplicate by effective model key (identifier, endpoint, agent_type).
- Aggregate role labels for clearer progress and error output.
- Support optional-role policy (skip optional by default, probe only on explicit opt-in).
Implemented attack-specific effective-role logic:
- Baseline: judge checks only when evaluator_type requires LLM judges.
- TAP: on_topic_judge fallback handled and deduplicated with judge when shared.
- AutoDAN-Turbo: embedder marked optional by default, with opt-in to require it.
- h4rm3l: decorator_llm checked only when the effective program includes LLM-assisted decorators.
Fixed default category classifier preflight behavior:
- If intents are not used and category_classifier is not explicitly provided, preflight now validates the default classifier configuration.

Outcome

Preflight now matches real runtime dependencies per attack.
Missing required checks and unnecessary checks were both reduced.
Default classifier validation is enforced in goals/dataset flows.
Probe output is cleaner and easier to interpret.
Changes are covered by updated unit tests and targeted preflight script runs.

✨ feat: validate model availability before run execution

10b0913

Marco Russo (marcorusso97) requested a review from Raffaele Paolino (RPaolino) June 3, 2026 12:50

Marco Russo (marcorusso97) linked an issue Jun 3, 2026 that may be closed by this pull request

Validate model availability before run execution #377

Open

Copilot started work on behalf of Marco Russo (marcorusso97) June 3, 2026 13:21 View session

Copilot stopped work on behalf of Marco Russo (marcorusso97) due to an error June 3, 2026 13:22
The session was cancelled by the user.

🐛 fix: fixed google adk integration test

98f0403

💚 ci: validate commit messages against pr head sha only

af3dba7

Raffaele Paolino (RPaolino) reviewed Jun 5, 2026

View reviewed changes

🐛 fix: optimized preflight check on models

a212034

📝 docs: documented preflight parameters

467f89c

Nicola Franco (franconicola) deployed to 377-validate-model-availability-before-run-execution - Docs PR #416 June 5, 2026 13:30 — with Render View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

✨ feat: validate model availability before run execution#416

✨ feat: validate model availability before run execution#416
Marco Russo (marcorusso97) wants to merge 5 commits into
mainfrom
377-validate-model-availability-before-run-execution

Marco Russo (marcorusso97) commented Jun 3, 2026

Uh oh!

codecov Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

Raffaele Paolino (RPaolino) left a comment

Uh oh!

Marco Russo (marcorusso97) commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Marco Russo (marcorusso97) commented Jun 3, 2026

Summary

What changed

How the healthcheck works

Tests

Why this is useful

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Raffaele Paolino (RPaolino) left a comment

Choose a reason for hiding this comment

Uh oh!

Marco Russo (marcorusso97) commented Jun 5, 2026

What I Changed

Outcome

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented Jun 3, 2026 •

edited

Loading