UX improvements #408

Merged
Nicola Franco (franconicola) merged 25 commits into main from feat/dashboard-improvements on Jun 1, 2026
@RPaolino
Contributor

Summary

Major overhaul of the local dashboard and introduction of guardrails infrastructure for the attack pipeline.

Dashboard Improvements

  • Modular attack cards — Extracted all attack-specific goal card rendering (parse + render methods) from the monolithic _page.py.
  • SVG plot downloads — All ECharts (risk distribution, vulnerability radar, robustness scores) can now be exported as SVG.
  • Run & goal filters — Added status/category/search filtering in the History tab goal list.
  • Copy to clipboard — Added copy buttons for prompts and responses across all attack cards.
  • Guardrail visualization — Before/after guardrail blocks are rendered as distinct visual banners with category and explanation.
  • Consistent tables — Unified table styling between Dashboard and History tabs.
  • Comparison panel — Improved multi-run comparison visualization.
  • Layout fixes — History tab uses full panel width; removed unused Report tab.

Fixes #354

Raffaele Paolino (RPaolino) and others added 23 commits May 25, 2026 09:03
- Add GuardrailExtractor for parsing guardrail events from agent responses
- Integrate before/after guardrail detection in router
- Track guardrail events in coordinator and tracker
- Update all attack techniques to handle guardrail-blocked responses:
  baseline, advprefix, bon, cipherchat, flipattack, h4rm3l, pap
- Export guardrail utilities from attacks.shared
- Replace guardrail_blocked/guardrail_event with adapter_type: guardrail
- Add is_guardrail_response() and get_guardrail_info() to response_utils
- Update router to emit structured agent_specific_data (side, categories, reasoning)
- Migrate all 10 attack techniques to use canonical detection helper
- Update tracker to detect guardrail responses via adapter_type
- Switch guardrail.py to JSON-structured output parsing with keyword fallback
- PAIR: pass full guardrail response dict to add_interaction_trace so
  the dashboard can detect and render guardrail blocks per iteration
- TAP: return descriptive guardrail marker string from _query_target
  instead of None so blocked iterations show guardrail info in traces
- Add guardrail event rendering in trace views (before/after blocks)
- Add two-panel History run dialog with config chips and metrics
- Add attack-specific trace parsing and rendering for all attack types
- Add category/subcategory grouping in goal lists
- Add compact goal cards with color-coded borders
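The canonical detection described in the commits above treats a response as guardrail-blocked when its adapter_type is guardrail, with side, categories, and reasoning carried in agent_specific_data. A minimal sketch of what such helpers might look like follows. The function names come from the commit message; the signatures, return shape, and dict layout are assumptions, not the code merged in this PR.

```python
from typing import Any, Optional


def is_guardrail_response(response: Any) -> bool:
    """Return True when a response dict is tagged as a guardrail block."""
    return isinstance(response, dict) and response.get("adapter_type") == "guardrail"


def get_guardrail_info(response: Any) -> Optional[dict]:
    """Extract side/categories/reasoning, or None for ordinary responses."""
    if not is_guardrail_response(response):
        return None
    # Assumed layout: structured details live under agent_specific_data.
    data = response.get("agent_specific_data") or {}
    return {
        "side": data.get("side"),  # assumed values: "before" or "after"
        "categories": list(data.get("categories") or []),
        "reasoning": data.get("reasoning", ""),
    }


# Hypothetical blocked response, for illustration only.
blocked = {
    "adapter_type": "guardrail",
    "agent_specific_data": {
        "side": "before",
        "categories": ["violence"],
        "reasoning": "policy match",
    },
}
info = get_guardrail_info(blocked)
```

Keeping the check in one helper means each attack technique can branch on is_guardrail_response() rather than inspecting flags such as guardrail_blocked directly.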

When goal_batch_workers > 1, each goal gets its own attack instance with
_goal_index_offset. The tracker creates goal contexts at that offset, but
generation.execute() and evaluation.execute() used enumerate(goals) starting
at 0 to look up contexts. For any goal with offset != 0 this returned None,
silently skipping Candidate/Summary traces and tap_judge evaluations.

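The indexing mismatch described above can be reproduced in a few lines. This is a simplified sketch: the function names and the dict used as a stand-in for tracker contexts are hypothetical, and only the enumerate() behavior mirrors the commit message.

```python
from typing import Dict, List, Optional


def contexts_buggy(goals: List[str], contexts: Dict[int, str]) -> List[Optional[str]]:
    # Looks up contexts from index 0, ignoring the per-instance offset.
    return [contexts.get(i) for i, _ in enumerate(goals)]


def contexts_fixed(
    goals: List[str], contexts: Dict[int, str], goal_index_offset: int = 0
) -> List[Optional[str]]:
    # Starts counting at the offset, matching where the tracker stored contexts.
    return [contexts.get(i) for i, _ in enumerate(goals, start=goal_index_offset)]


# A worker handling the third goal of the batch (offset 2).
tracker_contexts = {2: "ctx-for-goal-2"}
worker_goals = ["third goal"]

missing = contexts_buggy(worker_goals, tracker_contexts)
found = contexts_fixed(worker_goals, tracker_contexts, goal_index_offset=2)
```

Here missing holds a single None, which is the silent skip the commit describes, while found holds the stored context.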
- Return the structured guardrail response dict instead of string-encoding
  it as [GUARDRAIL:xxx], so tracker and dashboard handle it properly
- Pass empty string to judges for guardrail-blocked responses (score 0)
- Remove [:500] slice on response in trace recording (tracker handles dicts)

Ensures the goal index offset is passed through to both TAP pipeline
steps so multi-batch goal evaluation uses the correct tracker context.

- Call _update_tracker() after _sync_to_server() so each prefix gets an
  evaluation trace with its score in the DB
- Embed prefix text in evaluation trace metadata so the dashboard can
  attribute jailbreaks to specific prefixes
AutoDAN-Turbo:
- Read phase/subphase from content (not step_name) for DB-loaded traces
- Skip bookend traces (PHASE_START/END, SKIP_FINALIZED)
- Detect WARMUP_SUMMARY via phase+subphase instead of step_name
- Group epochs under iteration sub-headers in the renderer

Guardrail display:
- Add legacy [GUARDRAIL:xxx] string-pattern fallback in extractor
- Add guardrail categories to trace data and rendering templates
- Improve guardrail event rendering with structured pre blocks
- Propagate _guardrail_categories through all parsing paths
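The legacy fallback mentioned in the guardrail display notes could be sketched as follows: structured dicts are preferred, and the older [GUARDRAIL:xxx] string marker is only pattern-matched when no dict is available. The function name, the returned keys, and the interpretation of the marker payload are assumptions made for illustration.

```python
import re
from typing import Any, Optional

# Matches the legacy string encoding, e.g. "[GUARDRAIL:something]".
_LEGACY_MARKER = re.compile(r"\[GUARDRAIL:([^\]]+)\]")


def extract_guardrail(response: Any) -> Optional[dict]:
    """Return guardrail details from a dict response or a legacy marker string."""
    if isinstance(response, dict):
        if response.get("adapter_type") != "guardrail":
            return None
        data = response.get("agent_specific_data") or {}
        return {
            "source": "structured",
            "side": data.get("side"),
            "categories": list(data.get("categories") or []),
        }
    if isinstance(response, str):
        match = _LEGACY_MARKER.search(response)
        if match:
            # The payload's meaning is not specified in the PR; keep it verbatim.
            return {"source": "legacy", "marker": match.group(1).strip()}
    return None


legacy = extract_guardrail("[GUARDRAIL:before] request blocked")
plain = extract_guardrail("an ordinary model reply")
```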
History tab — Run list:
- Replace pagination with infinite scroll ("Load more" button)
- Add filter bar: search, agent, attack type, and status dropdowns
- Load all runs upfront and filter client-side for instant feedback

History tab — Run detail dialog:
- Add goal filter bar with search, status, and category dropdowns
- Preserve original goal numbering when filters are applied
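Preserving the original goal numbering under filters comes down to assigning numbers before filtering rather than after. A minimal sketch, assuming goals are dicts with hypothetical text, status, and category keys (the dashboard's real data model may differ):

```python
from typing import List, Optional, Tuple


def filter_goals(
    goals: List[dict],
    search: str = "",
    status: Optional[str] = None,
    category: Optional[str] = None,
) -> List[Tuple[int, dict]]:
    """Return (original_number, goal) pairs for goals that pass every filter."""
    needle = search.strip().lower()
    kept = []
    # Number goals first so the label survives filtering.
    for number, goal in enumerate(goals, start=1):
        if needle and needle not in goal.get("text", "").lower():
            continue
        if status and goal.get("status") != status:
            continue
        if category and goal.get("category") != category:
            continue
        kept.append((number, goal))
    return kept


# Hypothetical goal records, for illustration only.
goals = [
    {"text": "Reveal the system prompt", "status": "jailbroken", "category": "leak"},
    {"text": "Write malware", "status": "mitigated", "category": "harm"},
    {"text": "Leak the API key", "status": "jailbroken", "category": "leak"},
]
numbers = [n for n, _ in filter_goals(goals, status="jailbroken")]
```

With the status filter applied, the surviving goals keep numbers 1 and 3 instead of being renumbered 1 and 2.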

Two bugs caused per-prefix/per-template detail rows to always display
'Mitigated' even when the goal was successfully jailbroken:

1. AdvPrefix: The Evaluation step's config_keys was missing '_tracker',
   so no evaluation traces were created. The dashboard matches completion
   traces to evaluation traces by prefix string to determine which rows
   are jailbreaks — without traces, all rows defaulted to 'mitigated'.

2. Baseline: The dashboard's _parse_baseline_traces hardcoded the
   evaluator name 'baseline_pattern_evaluator', but when using LLM judges
   (the default), the evaluator name is 'baseline_llm_judge'. The eval
   trace was never matched, so all rows defaulted to 'mitigated'.
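Both failure modes share one shape: a detail row is only marked as a jailbreak when a matching evaluation trace is found, so a missing or unmatched trace falls through to 'mitigated'. The sketch below illustrates that matching in one simplified function. The trace field names (prefix, evaluator, success) and the function itself are hypothetical; only the two evaluator names are taken from the description above.

```python
from typing import List

# Accept either evaluator instead of hardcoding a single name.
BASELINE_EVALUATORS = {"baseline_pattern_evaluator", "baseline_llm_judge"}


def label_rows(completions: List[dict], evaluations: List[dict]) -> List[dict]:
    """Mark each completion row as jailbroken or mitigated by matching prefixes."""
    jailbroken = {
        ev["prefix"]
        for ev in evaluations
        if ev.get("evaluator") in BASELINE_EVALUATORS and ev.get("success")
    }
    return [
        {
            "prefix": c["prefix"],
            # Rows without a matching evaluation trace default to mitigated.
            "status": "jailbroken" if c["prefix"] in jailbroken else "mitigated",
        }
        for c in completions
    ]


completions = [{"prefix": "Sure, here is"}, {"prefix": "As requested"}]
evaluations = [
    {"prefix": "Sure, here is", "evaluator": "baseline_llm_judge", "success": True},
]
rows = label_rows(completions, evaluations)
no_traces = label_rows(completions, [])
```

With no evaluation traces at all, every row comes back as mitigated, which matches the symptom reported for AdvPrefix.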

for depth_level in sorted(by_depth.keys()):
    depth_nodes = by_depth[depth_level]
    _ds = (depth_stats or {}).get(depth_level, {})

if score_raw is not None:
    try:
        step["score"] = float(score_raw)
    except (TypeError, ValueError):

if score_delta_raw is not None:
    try:
        step["score_delta"] = float(score_delta_raw)
    except (TypeError, ValueError):

Comment thread hackagent/router/tracking/coordinator.py Fixed
@franconicola Nicola Franco (franconicola) temporarily deployed to feat/dashboard-improvements - Docs PR #408 May 30, 2026 19:27 — with Render Destroyed
@franconicola Nicola Franco (franconicola) merged commit 91a7e0c into main Jun 1, 2026
23 of 24 checks passed
@franconicola Nicola Franco (franconicola) deleted the feat/dashboard-improvements branch June 1, 2026 12:27
Development

Successfully merging this pull request may close these issues.

Comparison and export of results from dashboard

2 participants