cas: add technique tags to existing probes by leondz · Pull Request #1691 · NVIDIA/garak

leondz · 2026-04-21T20:55:39Z

This PR applies technique tags to existing probes, as part of exposing technique & intent information (see garak / Context Aware Scanning)

We used the modified demon typology, already included in garak/data/tags.misp.tsv

Verification

Should be able to use demon as a taxonomy when specifying runs and reading reports
Plugin structure and probe structure tests pass
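
As a rough illustration of what selecting by taxonomy could look like, here is a minimal sketch that filters a mapping of probe names to tags by tag prefix. The probe names and the `probes_with_prefix` helper are hypothetical and are not garak's run-specification code; only the tag strings are taken from this PR.

```python
def probes_with_prefix(probe_tags, prefix):
    """Return the names of probes carrying at least one tag that starts with prefix."""
    return sorted(
        name
        for name, tags in probe_tags.items()
        if any(tag.startswith(prefix) for tag in tags)
    )


# Hypothetical probe names; the tag strings appear in this PR's diffs.
probe_tags = {
    "example.ProbeA": [
        "owasp:llm01",
        "demon:Language:Prompt_injection:Ignore_previous_instructions",
    ],
    "example.ProbeB": ["cwe:1427", "quality:Security:PromptStability"],
    "example.ProbeC": ["demon:Language:Code_and_encode:Data_presentation"],
}

# Keep only probes that carry a tag from the demon taxonomy.
selected = probes_with_prefix(probe_tags, "demon:")
```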

Signed-off-by: Leon Derczynski <lderczynski@nvidia.com>

patriciapampanelli · 2026-04-24T17:12:39Z

-        "demon:Language:Prompt_injection:Stop_sequences",
-        "demon:Language:Prompt_injection:Ignore_previous_instructions",
+        "demon:Language:Prompt_injection:Strong_arm_attack",  # includes coercive disablement and reminder threats to force continued compliance
+        "demon:Language:Prompt_injection:Ignore_previous_instructions",  # ablation tests "ignore previous instructions" variants


After promoting Ignore_previous_instructions to _DAN_DEFAULTS["tags"], the same tag is still in Ablation_Dan_11_0.extra_tags. Since the metaclass merges via base_tags + extra_tags (no dedup), doesn't it end up duplicated in self.tags?

Correct, this will already be on the class, resulting in a duplicate entry.
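
The duplication discussed here can be illustrated with a minimal sketch. The `merge_tags` helper is hypothetical and is not garak's metaclass code; the sketch assumes only that the two lists are concatenated, and shows one order-preserving way to drop repeats.

```python
def merge_tags(base_tags, extra_tags):
    """Concatenate two tag lists, keeping only the first occurrence of each tag."""
    # dict.fromkeys preserves insertion order, so base tags keep their
    # position and any repeats coming from extra_tags are dropped.
    return list(dict.fromkeys(list(base_tags) + list(extra_tags)))


base_tags = [
    "demon:Language:Prompt_injection:Strong_arm_attack",
    "demon:Language:Prompt_injection:Ignore_previous_instructions",
]
extra_tags = ["demon:Language:Prompt_injection:Ignore_previous_instructions"]

# Plain concatenation keeps the repeated tag.
naive = base_tags + extra_tags

# Order-preserving merge without repeats.
merged = merge_tags(base_tags, extra_tags)
```

Removing the tag from `extra_tags` on the subclass would also avoid the repeat; deduplicating at merge time is simply more forgiving of future overlaps.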

jmartin-tech

A few that might need some tweaks. It would be valuable to enumerate what each taxonomy prefix in tags.misp.tsv targets.

jmartin-tech · 2026-04-27T20:34:32Z

 demon:Language:Stylizing:Synonymous_language	Synonymous language	Varying prompt slightly in form but not meaning
 demon:Language:Stylizing:Capitalizing	Capitalizing	USING CAPS
-demon:Language:Stylizing:Give_examples	Give examples	Issuing examples of the target behaviour
+demon:Language:Stylizing:Give_examples	Give examples	Issue examples of the target behaviour in the attack itself


The demon prefix needs to be documented clearly as a technique taxonomy.
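
One way to start enumerating what each prefix covers is to group the entries by their top-level prefix. The sketch below assumes three tab-separated columns (tag, short name, description), as the rows in the diff excerpt suggest; the `group_by_prefix` helper is hypothetical, and the sample rows are copied from the diff rather than read from the real file.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the diff excerpt; the real file has many more entries.
SAMPLE_TSV = (
    "demon:Language:Stylizing:Synonymous_language\tSynonymous language\t"
    "Varying prompt slightly in form but not meaning\n"
    "demon:Language:Stylizing:Capitalizing\tCapitalizing\tUSING CAPS\n"
    "demon:Language:Stylizing:Give_examples\tGive examples\t"
    "Issue examples of the target behaviour in the attack itself\n"
)


def group_by_prefix(tsv_text):
    """Group tag identifiers by their top-level taxonomy prefix."""
    groups = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        tag = row[0].strip()
        prefix = tag.split(":", 1)[0]  # text before the first colon
        groups[prefix].append(tag)
    return dict(groups)


groups = group_by_prefix(SAMPLE_TSV)
```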

jmartin-tech · 2026-04-28T18:26:18Z

-        "demon:Language:Prompt_injection:Stop_sequences",
-        "demon:Language:Prompt_injection:Ignore_previous_instructions",
+        "demon:Language:Prompt_injection:Strong_arm_attack",  # includes coercive disablement and reminder threats to force continued compliance
+        "demon:Language:Prompt_injection:Ignore_previous_instructions",  # ablation tests "ignore previous instructions" variants


Correct, this will already be on the class, resulting in a duplicate entry.

jmartin-tech · 2026-04-28T18:33:05Z

        "quality:Security:PromptStability",
        "cwe:1427",
-        "demon:Fictionalizing:Roleplaying:User_persona",
+        "demon:Language:Prompt_injection:Ignore_previous_instructions",  # attacker-supplied scene config overrides the model's normal safeguards


Is this really a technique applied in this class? I agree the goal is to ignore training and system instructions. As a technique, however, I would expect the prompt to actually inject an explicit request that some other instruction be ignored before we call this technique applied.

I could see possibly:

demon:Stratagems:Meta-prompting:Ask_for_examples # the attacker-supplied scene requests data that the system is explicitly expected to protect

This one is tricky for me. Reading the prompt literally, I think it's mostly asking for a Dr. House scene/script. That said, I can see Ask_for_examples as a secondary interpretation if we treat the generated scene as a demonstration of the target behavior. Much less obvious to me than other probes with the same tag, like API Key.

Looking at the grandma family, I'd be just as unsure since they follow the same pattern.

Line 101 below keeps the User_persona entry, matching grandma. I think the questions are:

Does Ignore_previous_instructions apply? As I noted, IMO the answer is no.

Does Ask_for_examples apply? I am suggesting it does, but I can accept omitting it as an inferred rather than explicit technique.

I agree Ignore_previous_instructions doesn't quite fit here. I think Ask_for_examples could apply, but if we add it, would it make sense to apply it to the grandma too for consistency? I think it wraps an example request inside a persona, so it feels like the same pattern.
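
The consistency question raised here can be checked mechanically. The sketch below reports tags carried by some, but not all, probes in a family. The probe names `DrHouseLike` and `GrandmaLike` and the `inconsistent_tags` helper are hypothetical, and the tag assignments only illustrate the scenario under discussion, not the state of the PR.

```python
def inconsistent_tags(family_tags):
    """Map each tag used by some, but not all, probes to the probes missing it."""
    all_tags = set().union(*family_tags.values()) if family_tags else set()
    report = {}
    for tag in sorted(all_tags):
        missing = sorted(name for name, tags in family_tags.items() if tag not in tags)
        if missing:
            report[tag] = missing
    return report


# Hypothetical family: both probes share a persona tag, only one has Ask_for_examples.
family = {
    "DrHouseLike": {
        "demon:Fictionalizing:Roleplaying:User_persona",
        "demon:Stratagems:Meta-prompting:Ask_for_examples",
    },
    "GrandmaLike": {
        "demon:Fictionalizing:Roleplaying:User_persona",
    },
}

report = inconsistent_tags(family)
```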

jmartin-tech · 2026-04-28T18:48:06Z

        "owasp:llm01",
        "quality:Security:PromptStability",
        "payload:jailbreak",
+        "demon:Language:Code_and_encode:Data_presentation",  # encodes instructions as typographic images to bypass text-based alignment


This seems reasonable. Is there a technique based on a modality shift that we could add to the taxonomy?

leondz added 5 commits April 15, 2026 14:22

first pass of technique tags in probes

b5961fb

Signed-off-by: Leon Derczynski <lderczynski@nvidia.com>

justify technique tags

256f3a6

Signed-off-by: Leon Derczynski <lderczynski@nvidia.com>

fix category for unreal computing

f41353a

Signed-off-by: Leon Derczynski <lderczynski@nvidia.com>

clarify where Give_examples should be demonstrated

bb7f0a7

Signed-off-by: Leon Derczynski <lderczynski@nvidia.com>

manual review of demon technique tags

79f0494

Signed-off-by: Leon Derczynski <lderczynski@nvidia.com>

leondz requested review from erickgalinkin and patriciapampanelli April 21, 2026 20:55

leondz added the probes Content & activity of LLM probes label Apr 21, 2026

amend technique tag typo

0255794

Signed-off-by: Leon Derczynski <lderczynski@nvidia.com>

leondz requested a review from jmartin-tech April 22, 2026 20:10

leondz added this to garak / Context Aware Scanning Apr 23, 2026

patriciapampanelli moved this to Up next in garak / Context Aware Scanning Apr 23, 2026

patriciapampanelli moved this from Up next to In Progress in garak / Context Aware Scanning Apr 23, 2026

patriciapampanelli mentioned this pull request Apr 23, 2026

feature: persist technique tags per-attempt in report.jsonl #1703

Open

leondz mentioned this pull request Apr 23, 2026

feature: complete technique & intent tag coverage across probes #1706

Open

patriciapampanelli reviewed Apr 24, 2026


patriciapampanelli mentioned this pull request Apr 27, 2026

feature/persist technique tags #1719

Open

2 tasks

jmartin-tech requested changes Apr 28, 2026


patriciapampanelli moved this from In Progress to In Review in garak / Context Aware Scanning Apr 30, 2026

leondz wants to merge 6 commits into NVIDIA:feature/technique_intent from leondz:feature/ti_technique_tags_in_probes
