MML Attack by RPaolino · Pull Request #411 · AISecurityLab/hackagent

Raffaele Paolino (RPaolino) · 2026-06-01T07:38:23Z

Summary

Implements the Multi-Modal Linkage jailbreak attack for Vision-Language Models, based on:

Wang et al., "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage" (2024) — arXiv:2412.00473

The MML attack encodes harmful prompts into images using visual transformations, then pairs them with text prompts that instruct a VLM to decode and act on the hidden content.

Encoding modes:

| Mode | Description |
|---|---|
| `word_replacement` | Replaces key words with random substitutes, renders to image, provides a dictionary for reconstruction |
| `mirror` | Renders text in an image, flips it horizontally |
| `rotate` | Renders text in an image, rotates 180° |
| `base64` | Encodes the prompt in Base64, renders the encoded string in an image |
| `mixed` | Combines word replacement, mirroring, and rotation |
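The transformations in the table can be sketched without an imaging library. This is a minimal illustration, not the code in image_encoder.py: a nested list of pixel values stands in for the rendered image, and the names `mirror`, `rotate_180`, `to_base64`, and `replace_words` are hypothetical helpers introduced here.

```python
import base64
import random

# Stand-in for a rendered image: rows of pixel values (assumption for illustration).
Grid = list[list[int]]


def mirror(grid: Grid) -> Grid:
    """Flip horizontally: reverse the pixels within each row."""
    return [row[::-1] for row in grid]


def rotate_180(grid: Grid) -> Grid:
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in grid[::-1]]


def to_base64(text: str) -> str:
    """Base64-encode the UTF-8 bytes of the text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def replace_words(text, targets, substitutes, rng):
    """Swap each target word for a distinct random substitute.

    Returns the rewritten text and a substitute -> original dictionary.
    Assumes targets are unique, substitutes do not occur in the text,
    and len(substitutes) >= len(targets).
    """
    picks = rng.sample(substitutes, k=len(targets))
    forward = dict(zip(targets, picks))
    mapping = dict(zip(picks, targets))
    replaced = " ".join(forward.get(word, word) for word in text.split())
    return replaced, mapping
```

Both image transforms are their own inverse, so applying the same operation again recovers the original grid; the dictionary returned by `replace_words` plays the same role for the word substitution.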

Prompt styles: game (villain's lair scenario) and control (neutral list-filling).

Changes

- hackagent/attacks/techniques/mml/: new attack package
  - attack.py: MMLAttack orchestrator (BaseAttack subclass) with VLM target validation
  - config.py: DEFAULT_MML_CONFIG + Pydantic MMLConfig/MMLParams models
  - generation.py: prompt construction and image-encoded generation step
  - evaluation.py: response evaluation step
  - image_encoder.py: image rendering + encode_word_replacement, encode_mirror, encode_rotate, encode_base64, encode_mixed
  - prompts.py: all prompt templates for each encoding mode × prompt style
- hackagent/attacks/registry.py: registers mml attack
- hackagent/cli/commands/attack.py: CLI support for MML
- hackagent/cli/tui/attack_specs.py: TUI attack spec for MML
- hackagent/router/tracking/coordinator.py: injects result_id from tracker into generation results for server sync
- hackagent/server/dashboard/_page.py: dashboard visualization fix for num_workers > 1
- tests/unit/attacks/mml/: comprehensive unit tests (attack, config, generation, image encoder, prompts)
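The "encoding mode × prompt style" layout of prompts.py implies one template per pair, ten in total for five modes and two styles. A rough sketch of such a lookup follows; the template strings are placeholders, and `TEMPLATES` and `lookup_template` are hypothetical names, not the signature of the PR's `get_prompt_template`.

```python
from itertools import product

ENCODING_MODES = ("word_replacement", "mirror", "rotate", "base64", "mixed")
PROMPT_STYLES = ("game", "control")

# Placeholder strings only; the real templates live in prompts.py.
TEMPLATES = {
    (mode, style): f"<{mode}/{style} template placeholder>"
    for mode, style in product(ENCODING_MODES, PROMPT_STYLES)
}


def lookup_template(mode: str, style: str) -> str:
    """Return the template for a (mode, style) pair or raise ValueError."""
    try:
        return TEMPLATES[(mode, style)]
    except KeyError:
        raise ValueError(f"no template for mode={mode!r}, style={style!r}") from None
```

Keying on the pair keeps unsupported combinations an explicit error rather than a silent fallback to some default template.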

Fixes #350

- Add 'mixed' encoding mode (word_replacement + mirror + rotation)
- Add encode_mixed() to image_encoder with combined transformations
- Add MIXED_GAME_PROMPT and MIXED_CONTROL_PROMPT templates
- Update MMLParams Literal type to include 'mixed'
- Add _warn_if_not_vlm() validation in MMLAttack
- Inject result_id from tracker into generation results for server sync
- Update docs attack index with MML entry
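Restricting the mode to a `Literal` type means an unknown mode is rejected when the parameters are built. The PR does this with Pydantic; the sketch below uses a standard-library dataclass as a stand-in so it runs without dependencies. The class name `MMLParamsSketch`, its field names, and its defaults are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Literal, get_args

EncodingMode = Literal["word_replacement", "mirror", "rotate", "base64", "mixed"]
PromptStyle = Literal["game", "control"]


@dataclass(frozen=True)
class MMLParamsSketch:
    # Field names and default values are hypothetical, not taken from config.py.
    encoding_mode: EncodingMode = "word_replacement"
    prompt_style: PromptStyle = "game"

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal at runtime, so check explicitly.
        if self.encoding_mode not in get_args(EncodingMode):
            raise ValueError(f"unknown encoding_mode: {self.encoding_mode!r}")
        if self.prompt_style not in get_args(PromptStyle):
            raise ValueError(f"unknown prompt_style: {self.prompt_style!r}")
```

A Pydantic model with the same `Literal` annotations performs this check automatically, which is why adding `'mixed'` to the type is enough to enable the new mode in configuration.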

+        ]
+
+    @with_tui_logging(logger_name="hackagent.attacks", level=logging.INFO)
+    def run(self, goals: List[str]) -> List[Dict]:


+            metadata = self.agent_router.backend_agent.metadata
+            if isinstance(metadata, dict):
+                model_name = metadata.get("name") or metadata.get("model_name")
+        except AttributeError:
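The metadata lookup in the fragment above appears to feed the VLM target check. One way such a check could work is sketched here; the hint list, the function names, and the warning text are assumptions, and only the `name` / `model_name` keys and the `hackagent.attacks` logger name come from the diff.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("hackagent.attacks")

# Hypothetical substrings that suggest vision support; not taken from the PR.
VLM_HINTS = ("vision", "-vl", "llava")


def model_name_from_metadata(metadata: Any) -> Optional[str]:
    """Read the model name from a metadata dict, trying 'name' then 'model_name'."""
    if isinstance(metadata, dict):
        return metadata.get("name") or metadata.get("model_name")
    return None


def warn_if_not_vlm(metadata: Any) -> bool:
    """Log a warning when the target does not look like a VLM; return True if warned."""
    name = model_name_from_metadata(metadata)
    if name is None:
        logger.warning("Could not determine the target model name; MML expects a VLM.")
        return True
    if not any(hint in name.lower() for hint in VLM_HINTS):
        logger.warning("Target %r does not look like a VLM; MML sends image inputs.", name)
        return True
    return False
```

A name-based heuristic like this can only warn, not block, since model names are not a reliable indicator of image support.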


+from .prompts import get_prompt_template
+
+if TYPE_CHECKING:
+    from hackagent.router.tracking import Tracker


Raffaele Paolino (RPaolino) added 4 commits May 28, 2026 09:55

feat: added mml attack

d2f249e

fix: visualization working for num_workers>1

d93a2b7

fix: cli, tui now support mml-attack

70f82a3

Nicola Franco (franconicola) deployed to feat/mml-attack - Docs PR #411 with Render, June 1, 2026 07:38

github-code-quality Bot found potential problems Jun 1, 2026

Raffaele Paolino (RPaolino) wants to merge 4 commits into main from feat/mml-attack


2 participants
