-
Notifications
You must be signed in to change notification settings - Fork 780
cas: technique & intent minimal pilot #1598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/technique_intent
Are you sure you want to change the base?
cas: technique & intent minimal pilot #1598
Conversation
…ocs with intro paragraph
…bstitution properly
…rm detection requirement
Signed-off-by: Leon Derczynski <leonderczynski@gmail.com>
jmartin-tech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial thoughts, comments reflect recommendation based on this being an iterative change that does not yet reflect the final usage pattern.
I would suggest the interactions with the intentservice should be limited to:
- the harness during runtime for detector instantiation
- probe initialization (as a read only function)
| cas: | ||
| intent_spec: | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other spec types are all placed under plugin, long term I would suggest enhancement of the supported format for probe_spec is desired. Since this influences probe selection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a pattern that's been on the cards for a while is to:
- remove plugins and replace with its major constituent parts at top level
- move run-level config to run
currently the plan is to place everything CAS-related under cas (e.g. typology files, hierarchy usage in e.g. detector selection). this gets us landed without distraction.
if reworking config structure is part of CAS scope, I'd prefer to rework the whole config setup, cf:
- config: move plugin-specific config to top-level #520
- Reorganise
_config- move run-specific items out ofpluginsand intorun#931 - feature: support run intent specification #1436
- generators: move read timeout up to global/
base.Generatorconfig item #715 - config: support list-type data for
probe_specanddetector_specin json and yaml configs #1563
but I don't hear a strong reason to have this much-needed rework block landing cas draft - it's out of scope of this PR
alternative cas-specific pattern is to move intent_spec to run , where it can be quickly configured by probes, and allow probe-specific overrides. plugins as a home for intent_spec doesn't afford this and isn't the right place for probe_spec in the first place, really
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we start to build out services it might be best to consolidate them in a namespace organize them better, consider moving this and the langservice into garak/services/intentservice.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, considered this at the time. but n=2 is too low to start, for me (PRs welcome)
garak/intentservice.py
Outdated
| intents = {} | ||
| intent_detectors = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer a single global state object, possibly wrapped as a custom class. Using the "module" level to hold multiple state objects gets tricky to track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here intentservice follows the pattern in langservice for langproviders - perhaps that var could do with some documentation regarding usage
garak/intentservice.py
Outdated
| intents = json.load(intents_file) | ||
|
|
||
|
|
||
| def _load_intent_detectors(detectors_path=None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be loading a mapping of intent taxonomy buckets to the set of detectors that may work with that particular classifier.
Given this and thinking of later usage, should the probe also be able to specify an additional filter that should be taken into account when retrieving detectors for an attempt based on an intent?
I suspect that many techniques will impact the detectors that are viable and may even need to augment or chain response processing to be suitable for detectors that are directly associated to the taxonomy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this and thinking of later usage, should the probe also be able to specify an additional filter that should be taken into account when retrieving detectors for an attempt based on an intent?
I don't follow this, can you unpack? Especially lost at what agency the probe has in this process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect some techniques implemented in probes will make certain detectors no longer viable unless used in combination with some previous detector or processing. Maybe this is something that the _post_process_attempt() hook is the solution for, which may be a way to ensure probe can own if this is in fact the case.
| # search path: cas/intent_text/xxx_*.txt | ||
| stub_filepaths = cas_data_path / "intent_stubs" / f"{intent_code}*.txt" | ||
| stub_dir = cas_data_path / "intent_stubs" | ||
| stub_glob = stub_dir.glob(f"{intent_code}*.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful for this PR to come with an example stub file or two, this would offer more clarity on the expected format.
The readme in that intent_stubs path offers format information, however I would also hazard that the project should support multiline stub prompts and txt file extension suggests each file contains a single stub per line. A more structured format might be desired to avoid users needing to use raw \n to format entires in a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also hazard that the project should support multiline stub prompts
I couldn't agree more. The format should be general enough that we can support multi-turn conversations in input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful for this PR to come with an example stub file or two, this would offer more clarity on the expected format.
These are already in the target upstream branch, see #1481
Multiple sources are supported, because our data is in multiple formats:
- py bearing a method returning iterable
- one-per-line txt
- descr (falling back to title) in intent typology
Agree some examples would be nice - this runs already with the basic items in the typology.
The format should be general enough that we can support multi-turn conversations in input.
We should get to that before CAS is done, agree. Let's put it on the roadmap for another PR - this one already has enough novel material re: probing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion has suggested that yaml format using a list of OpenAI style conversation may be a good primitive here that may be easily crafted.
A possible suggested format, naming of the top level list and entry values might needs some tweaking for Python.
T999-sample.yaml
stubs:
- stub_1:
- role: user
content: |
my super mean start
with complex formatting like more than one
line of content
- role: assistant
content: faked conversation response
- role: user
content: expanded goal
- stub_2:
- role: user
content: my super nice goal
garak/intentservice.py
Outdated
|
|
||
| def get_intent_stubs(intent_specifier: str) -> List[str]: | ||
| def _validate_intent_specifier(intent_specifier: str) -> bool: | ||
| return re.fullmatch("[CTMS]([0-9]{3}([a-z]+)?)?", intent_specifier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider extracting this regex into a CONSTANT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage when consumed once? Are you thinking of compilation overhead?
| def intent_to_detectors(intent_specifier: str) -> Set[str] | None: | ||
| """return most specific set of detectors applicable to a single intent""" | ||
|
|
||
| global intent_detectors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for global directive when accessing for read only. Comment is for consistency with other code that accesses module globals as read only.
Any use of the global directive should likely be given extra scrutiny.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following pattern in langservice - PR welcome if another setup suits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a follow patterns thing, just a python thing that should probably be more consistent. If we have other places in the code that are using the global directive in a method that just uses read access to the value we don't need to call global, and it would follow least privilege principals better.
garak/probes/base.py
Outdated
|
|
||
| def __init__(self, config_root=_config): | ||
| super().__init__(config_root) | ||
| self._populate_intents(config_root.cas.intent_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no guarantee that config_root is _config. This should not be accessed by a probe.
I would prefer to see the the enabled intents be activated in the intentservice when it initializes.
A probe can simply call garak.intentservice.get_intent_stubs() at most passing in the compatible intents for the probe as a filter if they have been defined.
Or alternatively, an IntentProbe calls the intentservice and uses the returned set or filters the returned set itself based on the intent categories the technique is compatible with.
Further in the execution it looks like the current pattern is for the probe to iterate over each activated intent from the global spec itself. I don't think this should not be the responsibility of the probe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no guarantee that config_root is _config. This should not be accessed by a probe.
Is this a deviation to how Configurable is used currently across all first-class plugins? Please indicate the change you're suggesting, it's not clear to me
I would prefer to see the the enabled intents be activated in the intentservice when it initializes.
This locks us out of per-probe intent specification, disprefer
Or alternatively, an IntentProbe calls the intentservice and uses the returned set or filters the returned set itself based on the intent categories the technique is compatible with.
That's interesting. How would those categories be described? How does this deal with changes in the typology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a probe cannot support a specific intent typology the probe class would have this description, passing the description as a filter on the stubs to be returned handles that. If we don't want to pass the filter then the probe will need to do the filtering based on metadata from the return of get_intent_stubs(). I believe this meshes well with this comment about defining a class that get_intent_stubs() returns.
Also the idea that stubs returned should already be garak.Attempt.Conversation objects suggests even more strongly that a class to capture the source intent classifier may be appropriate.
garak/intentservice.py
Outdated
| return re.fullmatch("[CTMS]([0-9]{3}([a-z]+)?)?", intent_specifier) | ||
|
|
||
|
|
||
| def expand_intent_specifier_leaves(intent_specifier: str) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect, this could be a private helper called once when the service initializes with the return value populating a single runtime state object for active intent categories.
| def expand_intent_specifier_leaves(intent_specifier: str) -> List[str]: | |
| def _expand_intent_specifier_leaves(intent_specifier: str) -> List[str]: |
| ) | ||
| + "\n" # generator,probe,prompt,trigger,result,detector,score,run id,attemptid, | ||
| ) | ||
| if "intent" not in attempts[0].notes: # not an intent probe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of relying on notes to determine this, would it be reasonable check the plugin class type instead?
Looking further, this does not seem like something the evaluator should know about, detectors have already been executed against the attempts based on the intent service recommendation. Evaluation should not need treat them differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of relying on notes to determine this, would it be reasonable check the plugin class type instead?
Are we consistently guaranteed access to that given an Attempt object?
Looking further, this does not seem like something the evaluator should know about, detectors have already been executed against the attempts based on the intent service recommendation. Evaluation should not need treat them differently.
What's your alternative suggestion for managing evaluation and printing of detector results when the detector(s) used varies between each attempt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed separately, the processing changes to this class target at a specific idiosyncrasy that has been identified where Attempts for a single probe are no longer expected to have a result for all detectors that have been used by the probe. I would hazard this should simply be handled more abstractly and all Attempts should be treated as having this possible idiosyncrasy.
|
Lots of this is as expected, good. Ahead of tomorrow can you clarify which parts you think must be in the landing PR and are within scope of CAS, so we can start negotiating on that? I assume that many potential directions highlighted (eg namespaced services) are not immediate requirements from anyone's side, but rather eventual target patterns |
ABeltramo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with most of what has been said already by Jeffrey, I'm also a bit confused by the intent_detectors.
Probes already have a notion of primary_detector, I would expect that the intent_detectors would be an addition to that, but the code seems to suggest it's a filter instead. How would that fit with the base Probe.probe calling basically primary_detector.detect?
| detectors = garak.intentservice.intent_to_detectors(intent_observed) | ||
| if detectors is None: | ||
| logging.warning( | ||
| "No detectors specified for intent %s" % intent_observed | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should default on using the input detectors (same as when the probe isn't an instance of IntentProbe above) when the intentservice doesn't have any recommendation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by input detectors?
This code should only be executed when dealing with an IntentProbe's results, if not, that's a bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by input detectors?
The Harness run method takes detectors in input:
def run(self, model, probes, detectors, evaluator, announce_probe=True)which are then used in the old non-intent codepath:
if not isinstance(probe, garak.probes.base.IntentProbe):
for d in detectors:
self._run_detector(attempt_results, d)The new code would override that input detectors
detectors = garak.intentservice.intent_to_detectors(intent_observed)but I suppose that's intended behaviour looking at your other replies..
The only issue with that, now that I look at it, is that if you have a mixture of IntentProbes and good old Probes overriding the detectors would mess up with the next step in the loop for probe in probes.
Ex:
probes = ["DAN", "GrandmaIntent", "Malware"]
# ...
for probe in probes:
# DAN will use input `detectors`
# GrandmaIntent would override `detectors`
# Malware would use the overriden `detectors` from GrandmaIntentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually points out an edge not yet managed, what happens when a detector_spec is provided and intent based probes are activated for the run?
Do the intent probes use the detectors in detector_spec entries for every Attempt or will they diverge from the existing behavior?
If diverging how should they determine when detector_spec is to override the intent's detector?
Does this change how we think of detector_spec in general?
| @@ -0,0 +1,23 @@ | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan here to have complete coverage of the intents in trait_typology or is this just a list of "additional suggested" detectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is to map all the detectors we have, into the intent typology. This PR focuses on getting a basic pattern for intent probing in place, rather than adding detectors
| self.stubs.extend(expanded_stubs) | ||
| self.stub_intents.extend([intent] * len(expanded_stubs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to have a proper Stub object which wraps the original intent + the output prompt? It would simplify the code and make the connection more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be. the output prompt is predicated on both intents and also what the probe does with them, at which point they're no longer stubs (i.e. no longer independent of probe).
can you say more about the pattern you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mixing up terminology, let me rephrase it. We have something like:
intent -- N --> stub -- N --> prompt
right?
At the moment you are making a few arrays to keep track of the input intent from the prompt:
"1,2" # Intents in input config_root.cas.intent_spec
["A", "B", "C"] # stubs
["1", "1", "2"] # stub_intents
["q", "w", "e", "r", "t"] # prompts
["1", "1", "1", "2", "2"] # prompt_notesso that ultimately you can link back in the attempt the original intent.
The first obvious issue is that you lose the connection between prompt and stub this way; which specific stub generated this prompt? In the example above was w generated from A or B?
Wouldn't it be all much easier if we make a Stub (put better naming here if needed) data class? Something like
@dataclass
class Stub:
intent: str
stub: strso that the code becomes
def _expand_stub(self, stub: Stub) -> List[Stub]
# ...
def prompts_from_stub(self, stub: Stub) -> List[str]And the final _build_prompts becomes something like:
def _build_prompts(self):
"""In the most basic case, consume self.stubs and populate self.prompts"""
self.prompts = []
self.prompt_notes = []
for i, stub in enumerate(self.stubs):
prompts = self.prompts_from_stub(stub)
self.prompts.extend(prompts)
self.prompt_notes.extend([stub] * len(prompts))If we want to keep self.prompts as a str, otherwise we could use a more generic Conversation and pass the Stub in the notes and get rid of self.prompt_notes altogether
Detector selection is predicated on the intent used. Intent-based probes generate many attempts, each attempt having one intent but with the attempts representing many different intents overall. So, the probe:detector coupling no longer makes sense - the coupling is intent:detector (1:), and the probe:intent relationship is also 1:
Harnesses orchestrate this not probes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shift of responsibility to intentserivce as the authority on active intents look good.
The remaining issue here is building out the stub object and ensuring setting the file format expectations for those stubs.
ThIs PR may be viable to land with a fast follow PR that refactors that stub format and processing if you would like to make that a targeted PR.
| if self.skip_root_intents and len(intent) == 1: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that self.intents is now set as the list of active intents from the service filtered by self.blocked_intent_spec is this still needed? Is intentservice.get_applicable_intents() expected to emit intent group levels or act like parse_plugin_spec() and return an expanded leaf list of active intents?
Minimal end-to-end implementation of a technique & intent probe
What's new
IntentProbeclass for a probe that can consume an intent config, summon stubs from the intent service, and conduct basic probingprobes.grandma.GrandmaIntentcassection to_config--intentscli arg, mapping toconfig.cas.intentsevaluators.base.Evaluatorto switch between both non-intent and intent probe result setscas.pyfor managing trait/policy structuresTesting
python -m garak --config ti_pilot.ymlwhereti_pilot.ymlis:(may need to flush plugin cache first)