2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -31,7 +31,7 @@ jobs:
run: uv sync --group dev

- name: Check commit messages in PR
- run: uv run cz check --rev-range ${{ github.event.pull_request.base.sha }}..${{ github.sha }}
+ run: uv run cz check --rev-range ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }}

python-checks:
name: Linting and Formatting
7 changes: 7 additions & 0 deletions docs/docs/attacks/autodan_turbo.md
@@ -179,6 +179,13 @@ AutoDAN-Turbo uses a top-level `embedder` config for strategy retrieval. This ro
| `embedder.endpoint` | Endpoint used by the embedder router | `http://localhost:11434` |
| `embedder.agent_type` | Router adapter type for the embedder | `OLLAMA` |

### Preflight Controls (Advanced)

| Parameter | Scope | Description | Default |
|-----------|-------|-------------|---------|
| `_preflight_require_embedder` | AutoDAN-Turbo | When `true`, `embedder` is treated as required during preflight availability checks. | `false` |
| `_preflight_probe_optional_roles` | Global (all attacks) | When `true`, preflight also probes roles marked optional by attack-specific role resolution. | `false` |
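
As a rough illustration of how the two flags in the table could combine, here is a minimal sketch. It is not HackAgent's implementation: the `Role` class, the `roles_to_probe` helper, and the role names other than `embedder` are hypothetical, as is the assumption about which roles are optional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical description of a model role resolved for an attack."""
    name: str
    required: bool


def roles_to_probe(roles, attack_config):
    """Return the names of roles a preflight check would probe.

    Assumed semantics, based only on the table above:
    - required roles are always probed;
    - `_preflight_require_embedder` promotes `embedder` to required;
    - `_preflight_probe_optional_roles` adds every optional role.
    """
    require_embedder = bool(attack_config.get("_preflight_require_embedder", False))
    probe_optional = bool(attack_config.get("_preflight_probe_optional_roles", False))

    selected = []
    for role in roles:
        required = role.required or (role.name == "embedder" and require_embedder)
        if required or probe_optional:
            selected.append(role.name)
    return selected


# Hypothetical role set; which roles are optional is an assumption.
roles = [
    Role("attacker", required=True),
    Role("scorer", required=True),
    Role("embedder", required=False),
    Role("summarizer", required=False),
]

default_probe = roles_to_probe(roles, {})
embedder_probe = roles_to_probe(roles, {"_preflight_require_embedder": True})
full_probe = roles_to_probe(roles, {"_preflight_probe_optional_roles": True})
```

With both flags left at their `false` defaults, only the required roles are selected in this sketch.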

### Role Models

| Role | Required keys |
14 changes: 14 additions & 0 deletions docs/docs/attacks/index.mdx
@@ -46,6 +46,20 @@ All attacks support loading goals from AI safety benchmarks like **AgentHarm**,

:::tip Shared Category Classifier
All attacks accept a top-level `category_classifier` config block to classify each goal at tracking time. You can customize model, endpoint, and adapter type directly in `attack_config`.

Preflight behavior:

- If you provide `intents` (with explicit labels), category-classifier preflight is skipped.
- If you use `goals` or `dataset` and do **not** provide `category_classifier`, HackAgent preflights the default classifier config automatically.
:::
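
The preflight rules in the tip above can be read as a small decision function. The sketch below is illustrative only: `classifier_preflight_mode` is a hypothetical helper, not part of HackAgent, and the `"custom"` branch (preflighting a user-supplied `category_classifier`) is an assumption the text does not state explicitly.

```python
def classifier_preflight_mode(attack_config):
    """Decide how category-classifier preflight would behave (assumed logic).

    Returns one of:
    - "skip":    `intents` with explicit labels were provided
    - "custom":  a `category_classifier` block was provided (assumption)
    - "default": `goals`/`dataset` are used without `category_classifier`
    """
    if attack_config.get("intents"):
        return "skip"
    if "category_classifier" in attack_config:
        return "custom"
    return "default"


# Placeholder inputs; the field shapes are illustrative, not a documented schema.
with_intents = classifier_preflight_mode(
    {"intents": [{"goal": "example goal", "label": "example-category"}]}
)
with_goals = classifier_preflight_mode({"goals": ["example goal"]})
with_custom = classifier_preflight_mode(
    {"goals": ["example goal"], "category_classifier": {"endpoint": "http://localhost:11434"}}
)
```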

:::note Advanced Preflight Flags
HackAgent also supports internal preflight-control flags in `attack_config`:

- `_preflight_probe_optional_roles` (all attacks): when `true`, preflight probes optional roles too. By default, optional roles are skipped.
- `_preflight_require_embedder` (AutoDAN-Turbo only): when `true`, `embedder` is treated as required in preflight.

These are advanced/debug controls and are usually not needed in standard runs.
:::
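
For orientation, an `attack_config` that enables both flags might look like the sketch below. The `attack_type` key and its value, and the placeholder goal, are assumptions for illustration. The `embedder` values are the defaults listed in the AutoDAN-Turbo docs.

```python
# Hypothetical attack_config; only the two `_preflight_*` keys and the
# `embedder` fields come from the documentation, the rest is illustrative.
attack_config = {
    "attack_type": "autodan_turbo",  # assumed key and value
    "goals": ["example goal"],       # placeholder goal text
    "embedder": {
        "endpoint": "http://localhost:11434",
        "agent_type": "OLLAMA",
    },
    # Advanced/debug controls; both default to false.
    "_preflight_require_embedder": True,
    "_preflight_probe_optional_roles": True,
}

# Collect the preflight-control flags that are set, e.g. for logging.
preflight_flags = {
    key: value
    for key, value in attack_config.items()
    if key.startswith("_preflight_")
}
```

Leaving both keys out should be equivalent to setting them to `false`, given the documented defaults.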

---