# Garak-based Red Teaming Notebook

This notebook demonstrates how to use [Garak](https://github.com/chatsim/garak) to run red-teaming probes against a custom text generation model.

## Overview
- **Ray** is used for distributed execution.
- **MLflow** is used for experiment tracking.
- **Garak** is used to generate and run probes.

### Prerequisites
1. A running Ray cluster reachable at `ray://localhost:10001`, or your own Ray instance.
2. An MLflow server to track experiment data (`MLFLOW_ADDRESS`).
3. The Python packages required by `runtime_env` (such as `torch`, `transformers`, `garak`, `mlflow`).
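
A quick way to confirm the third prerequisite on the driver side is to look each package up on the import path. The following is a minimal sketch; `find_missing_packages` is a hypothetical helper, not part of Garak, Ray, or MLflow, and it checks only the local environment, not the Ray workers.

```python
import importlib.util

# Package names taken from the runtime_env "pip" list used in this notebook.
REQUIRED_PACKAGES = ["torch", "transformers", "garak", "mlflow"]


def find_missing_packages(packages):
    """Return the names in `packages` that cannot be found on the import path."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing_packages(REQUIRED_PACKAGES)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are importable.")
```

The lookup does not import the packages, so it stays fast even for heavy dependencies such as `torch`.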

### Usage
1. Adjust the `RAY_CLUSTER_ADDRESS` and `MLFLOW_ADDRESS` constants to match your environment.
2. Run each cell in sequence.
3. Ray will parallelize the execution of Garak probes.
4. Artifacts and logs from Garak runs will be uploaded to MLflow.
5. At the end, a combined JSONL file and an HTML report (`final_report.html`) are generated in the working directory of the Ray worker that runs `run_red_teaming`, and the report is logged to MLflow.

## 1. Environment Initialization
In the next cell, we:
- Configure Ray to connect to the running cluster.
- Specify environment variables for MLflow.
- Shut down any existing Ray instance and re-initialize with the new settings.

Make sure to update `RAY_CLUSTER_ADDRESS` and `MLFLOW_ADDRESS` as needed.
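
If you prefer not to edit the notebook for each environment, the two constants could be read from environment variables instead. This is only a sketch: `resolve_address` is a hypothetical helper, and the variable names `RAY_CLUSTER_ADDRESS` and `MLFLOW_ADDRESS` are assumed here for illustration. The defaults are the values used in the next cell.

```python
import os


def resolve_address(env_name, default):
    """Return the value of environment variable `env_name`, or `default` if it is unset or empty."""
    return os.environ.get(env_name) or default


# Fall back to the notebook's hard-coded values when nothing is exported.
RAY_CLUSTER_ADDRESS = resolve_address("RAY_CLUSTER_ADDRESS", "ray://localhost:10001")
MLFLOW_ADDRESS = resolve_address("MLFLOW_ADDRESS", "http://mlflow-tracking.mlflow.svc")
```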

In [1]:
import ray
import os 

RAY_CLUSTER_ADDRESS = "ray://localhost:10001"  # exposed using kubectl -n raycluster port-forward svc/raycluster-kuberay-head-svc 10001 &
MLFLOW_ADDRESS = 'http://mlflow-tracking.mlflow.svc'

os.environ["RAY_CHDIR_TO_TRIAL_DIR"] = "0"

ray.shutdown()

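ray.init(
    address=RAY_CLUSTER_ADDRESS,
    log_to_driver=False,
    runtime_env={
        # Packages installed on the workers before any task runs
        "pip": ["torch", "transformers", "garak", "mlflow"],
        # Make the MLflow tracking server known to every worker
        "env_vars": {"MLFLOW_TRACKING_URI": MLFLOW_ADDRESS},
        # Upload the current directory so local modules such as custom_generator are importable on the workers
        "working_dir": ".",
    },
)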

## 2. Imports and Helper Functions
Here, we:
- Import `garak`, `mlflow`, and other necessary libraries.
- (Optionally) reload the `custom_generator` module.
- Define a helper function `combine_jsonl_from_dir` to combine multiple Garak report outputs into a single JSONL file.
- Define a Ray-remote function `run_probe` which:
  1. Runs Garak's CLI to probe a custom model.
  2. Logs artifacts to MLflow.
- Define a Ray-remote function `run_red_teaming` which:
  1. Starts an MLflow run.
  2. Distributes multiple probes using Ray.
  3. Downloads artifacts and combines them into a single JSONL.
  4. Creates an HTML digest of the final results and logs it to MLflow.
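
The notebook does not show the `custom_generator` module that `run_probe` points Garak at, so here is a minimal sketch of what it might contain. Everything in it is an assumption: the placeholder reply stands in for a real model call, and the signature follows our understanding of Garak's function generator, which calls the target with the prompt string and expects a list of candidate outputs. Older Garak releases may expect a plain string instead, so check the version you have installed.

```python
# custom_generator.py (hypothetical contents; adapt to your own model)
from typing import List, Optional


def generate_response(prompt: str, **kwargs) -> List[Optional[str]]:
    """Return one candidate completion for `prompt`.

    Placeholder logic only: a real module would call the model under test here.
    """
    reply = f"I cannot help with that request: {prompt[:40]}"
    return [reply]
```

Because the module is resolved by name on the Ray workers, it has to live in the directory that `runtime_env["working_dir"]` uploads.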


In [54]:
import garak
import garak.cli
from garak.report import Report
import custom_generator
import importlib
from garak.command import write_report_digest
import mlflow
import shutil
import json
from datetime import datetime

importlib.reload(custom_generator)


def combine_jsonl_from_dir(directory, output_file):
    # Collect Garak report files (names ending in "report.jsonl"), sorted for a stable merge order
    jsonl_files = sorted(
        os.path.join(directory, f) for f in os.listdir(directory) if f.endswith("report.jsonl")
    )

    if not jsonl_files:
        print("⚠️ No JSONL files found in the directory.")
        return

    with open(output_file, "w") as outfile:
        for file in jsonl_files:
            with open(file, "r") as infile:
                for line in infile:
                    line = line.strip()  # Remove extra whitespace
                    
                    if not line:  # Ignore empty lines
                        continue
                    
                    try:
                        json.loads(line)  # Validate JSON
                        outfile.write(line + "\n")  # Ensure each JSON object is on a separate line
                    except json.JSONDecodeError as e:
                        print(f"❌ Skipping malformed JSON in {file}: {e} → {line}")

    print(f"✅ Combined {len(jsonl_files)} JSONL files into {output_file}")


@ray.remote
def run_probe(probe_name, mlflow_runid):

    garak_runs_dir = '/home/ray/.local/share/garak/garak_runs/' #ToDo: find a way to parametrise this
    
    # Optionally, you could remove old runs here:
    # for item in os.scandir(garak_runs_dir):
    #     (shutil.rmtree if item.is_dir() else os.unlink)(item.path)
    
    # Build the Garak CLI arguments for a single probe
    cli_args = [
        "--parallel_requests", "1",
        "--model_type", "function",
        "--model_name", "custom_generator#generate_response",
        "--probes", probe_name,
    ]
    garak.cli.main(cli_args)
    
    with mlflow.start_run(run_id=mlflow_runid):  # Use shared MLflow run ID
        mlflow.log_artifacts(garak_runs_dir) 


@ray.remote
def run_red_teaming(probes_list):
    mlflow.set_experiment("garak_runs")
    with mlflow.start_run(run_name=datetime.now().strftime("%Y-%m-%d_%H-%M-%S")) as run:
        mlflow_runid = run.info.run_id  # Get the Run ID to share across Ray workers
        print(f"📝 MLflow Run ID: {mlflow_runid}")
        
        futures = [run_probe.remote(probe_name, mlflow_runid) for probe_name in probes_list]
        ray.get(futures)  # Block until every probe has finished and logged its artifacts
        
        mlflow.artifacts.download_artifacts(run_id=mlflow_runid, dst_path='./combined_logs')  
        combine_jsonl_from_dir('./combined_logs', 'combined_logs.jsonl')
        
        write_report_digest('combined_logs.jsonl', './final_report.html')
        print(os.listdir())
        print("HTML report written to final_report.html")
        mlflow.log_artifact('./final_report.html')

## 3. Running the Red Teaming Probes
In this cell, we specify a list of probe names (e.g., `grandma.Substances`, `grandma.Slurs`) and submit `run_red_teaming` as a Ray task. The call to `ray.get` then blocks until the task finishes.

Once the probes complete, their output is combined into a single JSONL file, and an HTML report (`final_report.html`) is generated and logged to MLflow.

In [55]:
runs = run_red_teaming.remote(['grandma.Substances', 'grandma.Slurs', 'grandma.Win10', 'lmrc.Bullying', 'lmrc.Profanity'])
ray.get(runs)
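
To sanity-check the combined output before opening the HTML report, it can help to count the records by type. The sketch below assumes each line of the Garak report is a JSON object with an `entry_type` field; that field name is our assumption about the report format, so adjust `key` if your Garak version differs. `count_entry_types` is a hypothetical helper.

```python
import json
from collections import Counter


def count_entry_types(lines, key="entry_type"):
    """Count JSONL records by the value of `key`, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record.get(key, "<missing>")] += 1
    return counts


# Usage, wherever combined_logs.jsonl is available (for example inside run_red_teaming):
# with open("combined_logs.jsonl", "r", encoding="utf-8") as infile:
#     print(count_entry_types(infile))
```

Taking an iterable of lines rather than a path keeps the helper usable on an open file, a list of strings, or a stream.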