![NVIDIA Header](assets/header.png)

# **Securing Agentic AI Developer Day: Garak Demo**

`garak` is a tool for evaluating weaknesses in large language models (LLMs) and LLM-based systems. As AI systems become more complex, securing them against potential attacks become critical. Garak helps identify weaknesses that could be exploited in these models or systems, providing insights into potential security flaws before and after deployment.

In this demo, we will explore how `garak` can be applied to an LLM directly. Our goal is to demonstrate how you can leverage `garak` to analyze potential security vulnerabilities in AI agents.

## Notebook Contents

- [Securing Agentic AI Developer Day: Garak Demo](#securing-agentic-ai-developer-day-garak-demo)
- [Fetch API Key](#fetch-api-key)
- [Generators and Probes](#generators-and-probes)
  - [Generators](#generators)
  - [Probes](#probes)
- [Creating a test](#creating-a-test)
  - [Choosing a Generator](#choosing-a-generator)
  - [Choosing a Probe](#choosing-a-probe)
  - [Running a Test Probe](#running-a-test-probe)
- [Configs: Setting Up Your Environment](#configs-setting-up-your-environment)
  - [Breaking it Down](#breaking-it-down)
- [Evaluating `garak` Results](#evaluating-garak-results)
- [Conclusion](#conclusion)
- [References](#references)

## Fetch API Key 
Run the following cell block to fetch the API Key in the setup phase. 

In [None]:
from dotenv import load_dotenv
load_dotenv()

## Generators and Probes
The power of `garak` comes from the combination of generators, probes and detectors. The **probe** produces the inputs data and the **generator** obtains a reaction from the target and the **probe** evaluates the target's reaction to that data. This collaboration allows you to identify weakness such as how the model might fail under adversarial or stressful conditions. They allow us to comprehensively test the security and robustness of the model.

### Generators
A **generator** is a tool that accepts specific inputs or data that are fed into the model. These inputs can range from typical usage scenarios to adversarial examples designed to challenge the model's behavior. By using the right generator, we can test how the model responds to a variety of input conditions.

To view all of the available generators in `garak` we use the `--list_generators` flag. This command will display all of the available generators that you can use to test a model.

Run the following command to see all of the available generators:

In [None]:
! garak --list_generators

### Probes
A **probe** is a predefined test designed to evaluate how the model responds to inputs created by the generators. Probes are used to simulate various types of attacks or tests to identify weakneesses in the model.  Probes and are paired with detectors flag indicators of weakness from the generator reaction to inputs provided by the probe.

To view all of the available probes in `garak` we use the `--list_probes` flag. This command will display all of the available probes that you can use to test a model.

Run the following command to see all of the available probes:

In [None]:
! garak --list_probes

## Creating a test

Based on the list of probes and generators, select a pair that suits the security issue that you want to test.

## Choosing a Generator

Based on the list of generators, select a generator that matches the execution stack you want to test.

For this demo, we will use `garak.generators.nim`, a pre-configured generator from the list of generators available in `garak`.
- Target Model: `meta/llama-3.3-70b-instruct`
- Generator: `nim`


To specify this model and generator in `garak`, we'll need the following flags:
```
garak --model_name meta/llama-3.3-70b-instruct --model_type nim
```

## Choosing a Probe
`garak` includes a wide range of probes. You can either choose to run specific probes to evaluate targeted weaknesses or run a broad scan to uncover general weaknesses.

If you run `garak` without specifying a probe, it will execute a **default** scan. This scan runs a comprehensive set of probes to evaluate the model's security across multiple dimensions. This is a quick way to get a broad understanding of how the model behaves under different conditions.

If you want to focus on a specific area or weakness, you can select individual probes by using the `-p` flag followed by one or two probe names. This allows you to be more precise in your testing and focus on particular aspects of the model.

### Running a Test Probe

Let's try running a simple test probe to see how the model handles a basic security check. Use the following command to execute a quick scan.

In [None]:
! garak --model_name meta/llama-3.3-70b-instruct --model_type nim --probes test.Test --generations 1

## Configs: Setting Up Your Envionment

If we are running a large number of probes or need to specify additional generator arguments, we can use a config file.

These config files are written in YAML format and help streamline the CLI invocation. While using a config file is more simpler, all of the same options can still be passed via the CLI if preferred.

Here is an example config:

```
---
system:
  parallel_attempts: 2

run:
  generations: 3

plugins:
  probe_spec: xss.MarkdownImageExfil,suffix.GCGCached,dan.DanInTheWildMini,latentinjection.LatentInjectionResume
  extended_detectors: false
  probes:
    encoding:
      payloads:
        - default
        - xss
  model_type: nim
  model_name: meta/llama-3.3-70b-instruct

reporting:
  report_prefix: guardrails_demo
```

### Breaking it Down

```
system:
  parallel_attempts: 2
```
This will parallelize our attempts (running 2 at once) to speed up the process -- since thousands of attempts are made, this can reduce run time by quite a bit, but needs to be balanced with our available compute and memory capacity.

```
run:
  generations: 3
```
Since LLMs generally will not produce the exact same output given the same input at each generation, we typically want multiple generations so we can average out the behavior.
Setting `generations` to 1 will be faster but provide a less full picture of behavior, setting `generations` very high will provide a fuller picture of a model's typical behavior, but will take longer.

```
plugins:
```
The plugins heading handles all `plugins` -- generators, probes, buffs, and detectors.

```
  probe_spec: xss.MarkdownImageExfil,suffix.GCGCached,dan.DanInTheWildMini,latentinjection.LatentInjectionResume
  extended_detectors: false
```
The probes that we choose here are all categories that can expose potential security risks in agentic systems.
* `xss.MarkdownImageExfil` tells us about potential risks related to [cross site scripting](https://owasp.org/www-community/attacks/xss/)
* `suffix.GCGCached` is an adversarial suffix attack
* `dan.DanInTheWildMini` is a selection of known jailbreak prompts that have been effective in the wild
* The `latentinjection` probe looks at potential prompt injection risks associated with third party data.
By setting `extended_detectors: false`, we reduce the overhead of our detectors, running only the primary detectors for each probe.

```
  model_type: nim
  model_name: meta/llama-3.3-70b-instruct
```
Here, we specify that we're using the `nim` generator with the `meta/llama-3.3-70b-instruct` model.
If we want to specify additional parameters, we can do so under the `generators/nim` part of the YAML file.

In some cases, like with `huggingface` generators, we will want to specify, for example, that `trust_remote_code` should be `True`, but can specify other things like to use `fp16` or `bf16`, `max_tokens`, or any other parameter the generator accepts.

```
reporting:
  report_prefix: garak_demo
```
This part of the config tells us to prefix the report with the string `garak_demo` instead of the run ID, making it easier to find our outputs.

This config is saved locally under `demo.conf`, so let's run it by passing the config to `garak` via the `--config` CLI argument.

**NOTE**: garak runs can take a while! Feel free to peruse the [garak user guide](https://docs.garak.ai/garak) or look at the [probe docs](https://reference.garak.ai/en/latest/probes.html) while you wait.

In [None]:
!garak --config demo.conf

## Evaluating `garak` Results
After running your selected generators and probes with `garak` it's time to evaluate the results. The feedback you receive will provide insights into the security and performance of the model.

The raw data for all probes and evaluations is in the `report.jsonl` file -- a JSON lines file containing all of the prompts, responses, and detector results for your run.

This is aggregated in a reader-friendly format in the `report.html` file.

`garak` uses the [XDG Base Directory](https://specifications.freedesktop.org/basedir-spec/latest/) specification. Here is where results will be located depending on your OS:
- **Linux**:  `/home/{your_username}/.local/share/garak/garak_runs/`.
- **MacOS** `/Users/{your_username}/.local/share/garak/garak_runs`.
- **Windows** `%USERPROFILE%/.local/share/garak/garak_runs`.

If you are running locally, you can copy the report over or simply point your browser to the location on disk. 
If you've kept the report prefix and are running on Linux, that should be `/home/{your_username}/.local/share/garak/garak_runs/garak_demo.report.html`. 
If you are on another operating system, swap out the XDG base directory as appropriate.


In [None]:
! cp /root/.local/share/garak/garak_runs/garak_demo.report.html .

## Conclusion
In this notebook, we explored Garak, a tool for assessing the security of large language models (LLMs) and AI systems. By using probes and generators, we demonstrated how Garak helps identify weaknesses such as adversarial inputs, prompt injections, and performance issues.

For agentic AI workflows, this is especially important, as these systems must operate securely and predictably in dynamic environments. Garak allows us to proactively test these models, ensuring they can handle unexpected inputs and perform safely within defined boundaries.

By integrating Garak into agentic AI development, we can create more resilient, secure systems that minimize risk and maintain trust, laying the groundwork for reliable and safe AI-driven workflows.

## References

1. [Garak GitHub Repository](https://github.com/NVIDIA/garak): The official repository for Garak, containing the source code, examples, and documentation.

2. [Garak Documentation](https://github.com/NVIDIA/garak/blob/main/README.md): Detailed documentation on how to use Garak, including setup, probes, and generators.
