cas: technique & intent minimal pilot #1598

leondz · 2026-02-03T14:45:36Z

Minimal end-to-end implementation of a technique & intent probe

What's new

define IntentProbe class for a probe that can consume an intent config, summon stubs from the intent service, and conduct basic probing
demo class: probes.grandma.GrandmaIntent
add cas section to _config
add --intents cli arg, mapping to config.cas.intents
add link from intents to detectors
add trait typology defining intents/traits
allow evaluators.base.Evaluator to switch between both non-intent and intent probe result sets
add intent service
allow intent stubs to be taken from txt, json, and via code
add cas.py for managing trait/policy structures

Testing

run the tests
from cli: python -m garak --config ti_pilot.yml where ti_pilot.yml is:

---
cas:
  intent_spec: S005

run:
  generations: 1

plugins:
  probe_spec: grandma.GrandmaIntent

(may need to flush plugin cache first)

in code:

import garak._config
import garak._plugins
import garak.intentservice
garak._config.load_config()
garak._config.cas.intent_spec = "T"
garak.intentservice.load()
i = garak._plugins.load_plugin("probes.grandma.GrandmaIntent")
i.prompts
i.prompt_notes

…ocs with intro paragraph

…bstitution properly

…rm detection requirement

Signed-off-by: Leon Derczynski <leonderczynski@gmail.com>

jmartin-tech

Initial thoughts, comments reflect recommendation based on this being an iterative change that does not yet reflect the final usage pattern.

I would suggest the interactions with the intentservice should be limited to:

the harness during runtime for detector instantiation
probe initialization (as a read only function)

jmartin-tech · 2026-02-05T17:20:53Z

garak/resources/garak.core.yaml

+cas:
+  intent_spec:
+


The other spec types are all placed under plugin, long term I would suggest enhancement of the supported format for probe_spec is desired. Since this influences probe selection.

a pattern that's been on the cards for a while is to:

remove plugins and replace with its major constituent parts at top level

move run-level config to run

currently the plan is to place everything CAS-related under cas (e.g. typology files, hierarchy usage in e.g. detector selection). this gets us landed without distraction.

if reworking config structure is part of CAS scope, I'd prefer to rework the whole config setup, cf:

config: move plugin-specific config to top-level #520

Reorganise _config - move run-specific items out of plugins and into run #931

feature: support run intent specification #1436

generators: move read timeout up to global/base.Generator config item #715

config: support list-type data for probe_spec and detector_spec in json and yaml configs #1563

but I don't hear a strong reason to have this much-needed rework block landing cas draft - it's out of scope of this PR

alternative cas-specific pattern is to move intent_spec to run , where it can be quickly configured by probes, and allow probe-specific overrides. plugins as a home for intent_spec doesn't afford this and isn't the right place for probe_spec in the first place, really

jmartin-tech · 2026-02-05T19:28:37Z

garak/intentservice.py

As we start to build out services it might be best to consolidate them in a namespace organize them better, consider moving this and the langservice into garak/services/intentservice.py.

yeah, considered this at the time. but n=2 is too low to start, for me (PRs welcome)

jmartin-tech · 2026-02-05T19:32:16Z

garak/intentservice.py

+intents = {}
+intent_detectors = {}


Prefer a single global state object, possibly wrapped as a custom class. Using the "module" level to hold multiple state objects gets tricky to track.

here intentservice follows the pattern in langservice for langproviders - perhaps that var could do with some documentation regarding usage

jmartin-tech · 2026-02-05T19:39:51Z

garak/intentservice.py

        intents = json.load(intents_file)


+def _load_intent_detectors(detectors_path=None) -> None:


This looks to be loading a mapping of intent taxonomy buckets to the set of detectors that may work with that particular classifier.

Given this and thinking of later usage, should the probe also be able to specify an additional filter that should be taken into account when retrieving detectors for an attempt based on an intent?

I suspect that many techniques will impact the detectors that are viable and may even need to augment or chain response processing to be suitable for detectors that are directly associated to the taxonomy.

Given this and thinking of later usage, should the probe also be able to specify an additional filter that should be taken into account when retrieving detectors for an attempt based on an intent?

I don't follow this, can you unpack? Especially lost at what agency the probe has in this process

I suspect some techniques implemented in probes will make certain detectors no longer viable unless used in combination with some previous detector or processing. Maybe this is something that the _post_process_attempt() hook is the solution for, which may be a way to ensure probe can own if this is in fact the case.

jmartin-tech · 2026-02-05T19:51:43Z

garak/intentservice.py

    # search path: cas/intent_text/xxx_*.txt
-    stub_filepaths = cas_data_path / "intent_stubs" / f"{intent_code}*.txt"
+    stub_dir = cas_data_path / "intent_stubs"
+    stub_glob = stub_dir.glob(f"{intent_code}*.txt")


It would be helpful for this PR to come with an example stub file or two, this would offer more clarity on the expected format.

The readme in that intent_stubs path offers format information, however I would also hazard that the project should support multiline stub prompts and txt file extension suggests each file contains a single stub per line. A more structured format might be desired to avoid users needing to use raw \n to format entires in a file.

I would also hazard that the project should support multiline stub prompts

I couldn't agree more. The format should be general enough that we can support multi-turn conversations in input.

It would be helpful for this PR to come with an example stub file or two, this would offer more clarity on the expected format.

These are already in the target upstream branch, see #1481

Multiple sources are supported, because our data is in multiple formats:

py bearing a method returning iterable

one-per-line txt

descr (falling back to title) in intent typology

Agree some examples would be nice - this runs already with the basic items in the typology.

The format should be general enough that we can support multi-turn conversations in input.

We should get to that before CAS is done, agree. Let's put it on the roadmap for another PR - this one already has enough novel material re: probing

Discussion has suggested that yaml format using a list of OpenAI style conversation may be a good primitive here that may be easily crafted.

A possible suggested format, naming of the top level list and entry values might needs some tweaking for Python.

T999-sample.yaml

stubs: - stub_1: - role: user content: | my super mean start with complex formatting like more than one line of content - role: assistant content: faked conversation response - role: user content: expanded goal - stub_2: - role: user content: my super nice goal

jmartin-tech · 2026-02-05T20:00:54Z

garak/intentservice.py


-def get_intent_stubs(intent_specifier: str) -> List[str]:
+def _validate_intent_specifier(intent_specifier: str) -> bool:
+    return re.fullmatch("[CTMS]([0-9]{3}([a-z]+)?)?", intent_specifier)


Consider extracting this regex into a CONSTANT.

What's the advantage when consumed once? Are you thinking of compilation overhead?

jmartin-tech · 2026-02-05T20:04:35Z

garak/intentservice.py

+def intent_to_detectors(intent_specifier: str) -> Set[str] | None:
+    """return most specific set of detectors applicable to a single intent"""
+
+    global intent_detectors


No need for global directive when accessing for read only. Comment is for consistency with other code that accesses module globals as read only.

Any use of the global directive should likely be given extra scrutiny.

following pattern in langservice - PR welcome if another setup suits

This is not a follow patterns thing, just a python thing that should probably be more consistent. If we have other places in the code that are using the global directive in a method that just uses read access to the value we don't need to call global, and it would follow least privilege principals better.

jmartin-tech · 2026-02-05T20:14:37Z

garak/probes/base.py

+
+    def __init__(self, config_root=_config):
+        super().__init__(config_root)
+        self._populate_intents(config_root.cas.intent_spec)


There is no guarantee that config_root is _config. This should not be accessed by a probe.

I would prefer to see the the enabled intents be activated in the intentservice when it initializes.

A probe can simply call garak.intentservice.get_intent_stubs() at most passing in the compatible intents for the probe as a filter if they have been defined.

Or alternatively, an IntentProbe calls the intentservice and uses the returned set or filters the returned set itself based on the intent categories the technique is compatible with.

Further in the execution it looks like the current pattern is for the probe to iterate over each activated intent from the global spec itself. I don't think this should not be the responsibility of the probe.

There is no guarantee that config_root is _config. This should not be accessed by a probe.

Is this a deviation to how Configurable is used currently across all first-class plugins? Please indicate the change you're suggesting, it's not clear to me

I would prefer to see the the enabled intents be activated in the intentservice when it initializes.

This locks us out of per-probe intent specification, disprefer

Or alternatively, an IntentProbe calls the intentservice and uses the returned set or filters the returned set itself based on the intent categories the technique is compatible with.

That's interesting. How would those categories be described? How does this deal with changes in the typology?

If a probe cannot support a specific intent typology the probe class would have this description, passing the description as a filter on the stubs to be returned handles that. If we don't want to pass the filter then the probe will need to do the filtering based on metadata from the return of get_intent_stubs(). I believe this meshes well with this comment about defining a class that get_intent_stubs() returns.

Also the idea that stubs returned should already be garak.Attempt.Conversation objects suggests even more strongly that a class to capture the source intent classifier may be appropriate.

jmartin-tech · 2026-02-05T20:29:43Z

garak/intentservice.py

+    return re.fullmatch("[CTMS]([0-9]{3}([a-z]+)?)?", intent_specifier)
+
+
+def expand_intent_specifier_leaves(intent_specifier: str) -> List[str]:


I suspect, this could be a private helper called once when the service initializes with the return value populating a single runtime state object for active intent categories.

Suggested change

def expand_intent_specifier_leaves(intent_specifier: str) -> List[str]:

def _expand_intent_specifier_leaves(intent_specifier: str) -> List[str]:

jmartin-tech · 2026-02-05T20:35:34Z

garak/evaluators/base.py

-                            )
-                            + "\n"  # generator,probe,prompt,trigger,result,detector,score,run id,attemptid,
-                        )
+        if "intent" not in attempts[0].notes:  # not an intent probe


I am not a fan of relying on notes to determine this, would it be reasonable check the plugin class type instead?

Looking further, this does not seem like something the evaluator should know about, detectors have already been executed against the attempts based on the intent service recommendation. Evaluation should not need treat them differently.

I am not a fan of relying on notes to determine this, would it be reasonable check the plugin class type instead?

Are we consistently guaranteed access to that given an Attempt object?

Looking further, this does not seem like something the evaluator should know about, detectors have already been executed against the attempts based on the intent service recommendation. Evaluation should not need treat them differently.

What's your alternative suggestion for managing evaluation and printing of detector results when the detector(s) used varies between each attempt?

Discussed separately, the processing changes to this class target at a specific idiosyncrasy that has been identified where Attempts for a single probe are no longer expected to have a result for all detectors that have been used by the probe. I would hazard this should simply be handled more abstractly and all Attempts should be treated as having this possible idiosyncrasy.

leondz · 2026-02-05T20:56:08Z

Lots of this is as expected, good. Ahead of tomorrow can you clarify which parts you think must be in the landing PR and are within scope of CAS, so we can start negotiating on that? I assume that many potential directions highlighted (eg namespaced services) are not immediate requirements from anyone's side, but rather eventual target patterns

ABeltramo

I agree with most of what has been said already by Jeffrey, I'm also a bit confused by the intent_detectors.

Probes already have a notion of primary_detector, I would expect that the intent_detectors would be an addition to that, but the code seems to suggest it's a filter instead. How would that fit with the base Probe.probe calling basically primary_detector.detect?

ABeltramo · 2026-02-06T08:58:11Z

garak/harnesses/base.py

+                    detectors = garak.intentservice.intent_to_detectors(intent_observed)
+                    if detectors is None:
+                        logging.warning(
+                            "No detectors specified for intent %s" % intent_observed
+                        )


This should default on using the input detectors (same as when the probe isn't an instance of IntentProbe above) when the intentservice doesn't have any recommendation.

What do you mean by input detectors?

This code should only be executed when dealing with an IntentProbe's results, if not, that's a bug

What do you mean by input detectors?

The Harness run method takes detectors in input:

def run(self, model, probes, detectors, evaluator, announce_probe=True)

which are then used in the old non-intent codepath:

if not isinstance(probe, garak.probes.base.IntentProbe): for d in detectors: self._run_detector(attempt_results, d)

The new code would override that input detectors

detectors = garak.intentservice.intent_to_detectors(intent_observed)

but I suppose that's intended behaviour looking at your other replies..

The only issue with that, now that I look at it, is that if you have a mixture of IntentProbes and good old Probes overriding the detectors would mess up with the next step in the loop for probe in probes.
Ex:

probes = ["DAN", "GrandmaIntent", "Malware"] # ... for probe in probes: # DAN will use input `detectors` # GrandmaIntent would override `detectors` # Malware would use the overriden `detectors` from GrandmaIntent

This actually points out an edge not yet managed, what happens when a detector_spec is provided and intent based probes are activated for the run?

Do the intent probes use the detectors in detector_spec entries for every Attempt or will they diverge from the existing behavior?

If diverging how should they determine when detector_spec is to override the intent's detector?

Does this change how we think of detector_spec in general?

ABeltramo · 2026-02-06T08:59:43Z

garak/data/cas/intent_detectors.json

@@ -0,0 +1,23 @@
+{


Is the plan here to have complete coverage of the intents in trait_typology or is this just a list of "additional suggested" detectors?

The plan is to map all the detectors we have, into the intent typology. This PR focuses on getting a basic pattern for intent probing in place, rather than adding detectors

ABeltramo · 2026-02-06T09:05:07Z

garak/probes/base.py

+                self.stubs.extend(expanded_stubs)
+                self.stub_intents.extend([intent] * len(expanded_stubs))


Wouldn't it be better to have a proper Stub object which wraps the original intent + the output prompt? It would simplify the code and make the connection more explicit

could be. the output prompt is predicated on both intents and also what the probe does with them, at which point they're no longer stubs (i.e. no longer independent of probe).

can you say more about the pattern you have in mind?

I'm mixing up terminology, let me rephrase it. We have something like:

intent -- N --> stub -- N --> prompt

right?
At the moment you are making a few arrays to keep track of the input intent from the prompt:

"1,2" # Intents in input config_root.cas.intent_spec ["A", "B", "C"] # stubs ["1", "1", "2"] # stub_intents ["q", "w", "e", "r", "t"] # prompts ["1", "1", "1", "2", "2"] # prompt_notes

so that ultimately you can link back in the attempt the original intent.

The first obvious issue is that you lose the connection between prompt and stub this way; which specific stub generated this prompt? In the example above was w generated from A or B?

Wouldn't it be all much easier if we make a Stub (put better naming here if needed) data class? Something like

@dataclass class Stub: intent: str stub: str

so that the code becomes

def _expand_stub(self, stub: Stub) -> List[Stub] # ... def prompts_from_stub(self, stub: Stub) -> List[str]

And the final _build_prompts becomes something like:

def _build_prompts(self): """In the most basic case, consume self.stubs and populate self.prompts""" self.prompts = [] self.prompt_notes = [] for i, stub in enumerate(self.stubs): prompts = self.prompts_from_stub(stub) self.prompts.extend(prompts) self.prompt_notes.extend([stub] * len(prompts))

If we want to keep self.prompts as a str, otherwise we could use a more generic Conversation and pass the Stub in the notes and get rid of self.prompt_notes altogether

leondz · 2026-02-06T10:20:10Z

Probes already have a notion of primary_detector, I would expect that the intent_detectors would be an addition to that, but the code seems to suggest it's a filter instead.

Detector selection is predicated on the intent used. Intent-based probes generate many attempts, each attempt having one intent but with the attempts representing many different intents overall. So, the probe:detector coupling no longer makes sense - the coupling is intent:detector (1:), and the probe:intent relationship is also 1:

How would that fit with the base Probe.probe calling basically primary_detector.detect?

Harnesses orchestrate this not probes

jmartin-tech

The shift of responsibility to intentserivce as the authority on active intents look good.

The remaining issue here is building out the stub object and ensuring setting the file format expectations for those stubs.

ThIs PR may be viable to land with a fast follow PR that refactors that stub format and processing if you would like to make that a targeted PR.

jmartin-tech · 2026-02-09T14:37:07Z

garak/probes/base.py

+            if self.skip_root_intents and len(intent) == 1:
+                continue


Given that self.intents is now set as the list of active intents from the service filtered by self.blocked_intent_spec is this still needed? Is intentservice.get_applicable_intents() expected to emit intent group levels or act like parse_plugin_spec() and return an expanded leaf list of active intents?

leondz added 30 commits February 2, 2026 12:41

add policy module, trait typology

33014aa

add policy test, policy info tool

e9ec219

rm policy probe test

991da24

pull iterator mods, rm 'policy' mention, correct typology datapath

f19e44d

move policy test to context aware scanning dir

cf43ae2

strip out mentions of policy probes, link context-aware scanning in d…

0b52bf3

…ocs with intro paragraph

note cas inop

c47ca50

add doc

fbecd80

move leaf node descrs to imperative case

5b51bc1

intent service start

ae4ed38

basic intent service

1562749

split intent stub access to typology, text files, modules

df40d38

add support for code-based intent stub provision

5f48712

tests for validating intent structure

88864c3

ass cas intent tests

892a3e0

test include service load

986277a

express intent around intent stub strings beginning w verb

a5b3921

fix type signature

7ea539e

be flexible about naming of multiple stub files for same intent code

4990aaa

relax intent stub naming reqs

cab42f2

update langservice type declarations

001487a

descr (& marker) file for intent stub dir

4ce71ed

update tests to handle intent stubs readme; test & err msgs now do su…

bec5d10

…bstitution properly

don't black json

f976587

base class + trial e2e for intent probe

dc03670

move prompt construction up; add demo probe

5de9215

couple notes w. prompts, couple intent codes with stubs

eabc02b

allow skipping some intents/traits

59e7900

intentservice doc stub

5283f87

fix arg type

2df0dcb

leondz added 6 commits February 2, 2026 12:44

rename _apply_technique

5eca920

support dynamic detector execution for intent probes having non-unifo…

2cf0807

…rm detection requirement

add intent-to-detectors mapping

a0f02e6

move eval for one detector to own method

f4b3910

switch evaluator between integrated & intent mode

a5230fc

add docstring for grandmaintent probe

998b7da

leondz added the architecture Architectural upgrades label Feb 3, 2026

leondz changed the title ~~Feature/ti pilot 2511~~ cas: technique & intent minimal pilot Feb 3, 2026

leondz added 3 commits February 3, 2026 15:48

Merge branch 'feature/technique_intent' into feature/ti_pilot_2511

98965f8

Signed-off-by: Leon Derczynski <leonderczynski@gmail.com>

define cas config section, relax case-sensistive condition

17e5f46

trim whitespace blocking match

fcb9756

ABeltramo mentioned this pull request Feb 4, 2026

Automated red teaming leondz/garacc#3

Open

leondz requested review from ABeltramo, erickgalinkin and jmartin-tech February 5, 2026 13:13

jmartin-tech requested changes Feb 5, 2026

View reviewed changes

ABeltramo reviewed Feb 6, 2026

View reviewed changes

leondz added 4 commits February 9, 2026 11:36

move responsibility for intent selection & provision to intentservice

5cda92e

update cas tests, add intentprobe test

3726377

update docstrings

d011f44

support json list-of-str stub data format

7de571e

leondz requested a review from jmartin-tech February 9, 2026 11:30

jmartin-tech requested changes Feb 9, 2026

View reviewed changes

		intents = json.load(intents_file)


		def _load_intent_detectors(detectors_path=None) -> None:

		return re.fullmatch("[CTMS]([0-9]{3}([a-z]+)?)?", intent_specifier)


		def expand_intent_specifier_leaves(intent_specifier: str) -> List[str]:

	def expand_intent_specifier_leaves(intent_specifier: str) -> List[str]:
	def _expand_intent_specifier_leaves(intent_specifier: str) -> List[str]:

		self.stubs.extend(expanded_stubs)
		self.stub_intents.extend([intent] * len(expanded_stubs))

cas: technique & intent minimal pilot #1598

Are you sure you want to change the base?

cas: technique & intent minimal pilot #1598

Uh oh!

Conversation

leondz commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's new

Testing

Uh oh!

jmartin-tech left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leondz Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jmartin-tech Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ABeltramo Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leondz Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leondz commented Feb 5, 2026

Uh oh!

ABeltramo left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leondz commented Feb 3, 2026 •

edited

Loading

leondz Feb 6, 2026 •

edited

Loading

jmartin-tech Feb 6, 2026 •

edited

Loading

ABeltramo Feb 6, 2026 •

edited

Loading

leondz Feb 6, 2026 •

edited

Loading

ABeltramo Feb 6, 2026 •

edited

Loading

jmartin-tech left a comment •

edited

Loading