# Interactive Distributed Red Teaming

- This notebook demonstrates how to perform distributed Red Teaming using garak + Ray + MLflow.
- A version of this code, with tweaks to transform it into a Kubeflow pipeline, is available in [distrt_pipeline.py](distrt_pipeline.py).
- [kfp_launcher.ipynb](kfp_launcher.ipynb) then takes [distrt_pipeline.py](distrt_pipeline.py), transforms it into a [kfp yaml](madrigal_pipeline.yaml) file, and submits it to Kubeflow to run.

### Prerequisites
1. A running Ray cluster: either a KubeRay cluster that you can connect to with `ray://localhost:10001`, or your own Ray instance.
2. An MLflow server to track experiment data (`MLFLOW_ADDRESS`).
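
Before running the notebook, it can help to confirm that both endpoints accept connections. The following is a minimal sketch using only the standard library. `is_reachable` is a hypothetical helper, not part of Ray or MLflow, and a successful TCP connection only shows that something is listening on the port, not that the service is healthy.

```python
import socket
from urllib.parse import urlparse


def is_reachable(address: str, default_port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in `address` succeeds.

    Hypothetical helper: `default_port` is used when the URL carries no port.
    """
    parsed = urlparse(address)
    host = parsed.hostname
    if host is None:
        # Not a URL of the form scheme://host[:port]
        return False
    try:
        port = parsed.port or default_port
    except ValueError:
        # The port in the URL is not a valid number
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False
```

For the addresses used in this notebook, the calls would be `is_reachable("ray://localhost:10001", default_port=10001)` and `is_reachable("http://mlflow-tracking.mlflow.svc", default_port=80)`.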

### Usage
1. Adjust the `RAY_CLUSTER_ADDRESS` and `MLFLOW_ADDRESS` constants to match your environment.
2. Run each cell in sequence.
3. Ray will parallelize the execution of Garak probes.
4. Artifacts and logs from Garak runs will be uploaded to MLflow.
5. At the end, a combined JSONL file and an HTML report (`final_report.html`) are written to the working directory of the Ray worker that runs `run_red_teaming`; the report is also logged to MLflow.
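
Step 1 can be made less manual by reading the two addresses from environment variables and falling back to the notebook's defaults. This is a sketch under assumptions: the notebook itself uses plain constants, so the environment variable names and the `load_addresses` helper are hypothetical.

```python
import os

# Defaults copied from the notebook's constants
DEFAULT_RAY_CLUSTER_ADDRESS = "ray://localhost:10001"
DEFAULT_MLFLOW_ADDRESS = "http://mlflow-tracking.mlflow.svc"


def load_addresses(env=None):
    """Return (ray_address, mlflow_address), preferring values found in `env`.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    ray_address = env.get("RAY_CLUSTER_ADDRESS", DEFAULT_RAY_CLUSTER_ADDRESS)
    mlflow_address = env.get("MLFLOW_ADDRESS", DEFAULT_MLFLOW_ADDRESS)
    return ray_address, mlflow_address
```

With this in place, `RAY_CLUSTER_ADDRESS, MLFLOW_ADDRESS = load_addresses()` would replace the two hard-coded assignments.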

## 1. Environment Initialization
In the next cell, we:
- Configure Ray to connect to the running cluster.
- Specify environment variables for MLflow.
- Shut down any existing Ray instance and re-initialize with the new settings.

Make sure to update `RAY_CLUSTER_ADDRESS` and `MLFLOW_ADDRESS` as needed.

In [1]:
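import os

import ray

# Exposed locally with:
#   kubectl -n raycluster port-forward svc/raycluster-kuberay-head-svc 10001 &
RAY_CLUSTER_ADDRESS = "ray://localhost:10001"
MLFLOW_ADDRESS = "http://mlflow-tracking.mlflow.svc"

os.environ["RAY_CHDIR_TO_TRIAL_DIR"] = "0"

# Drop any existing connection before connecting with the new settings
ray.shutdown()

ray.init(
    address=RAY_CLUSTER_ADDRESS,
    log_to_driver=False,
    runtime_env={
        # Packages installed on the workers for this job
        "pip": ["torch", "transformers", "garak", "mlflow"],
        # Lets MLflow calls inside Ray tasks find the tracking server
        "env_vars": {"MLFLOW_TRACKING_URI": MLFLOW_ADDRESS},
        # Ships the current directory to the workers so local modules can be imported there
        "working_dir": ".",
    },
)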

## 2. Imports and Helper Functions
Here, we:
- Import `garak`, `mlflow`, and other necessary libraries.
- Optionally reload the `custom_generator` module to pick up edits made since it was first imported.
- Define a helper function `combine_jsonl_from_dir` to combine multiple Garak report outputs into a single JSONL file.
- Define a Ray-remote function `run_probe` which:
  1. Runs Garak's CLI to probe a custom model.
  2. Logs artifacts to MLflow.
- Define a Ray-remote function `run_red_teaming` which:
  1. Starts an MLflow run.
  2. Distributes multiple probes using Ray.
  3. Downloads artifacts and combines them into a single JSONL.
  4. Creates an HTML digest of the final results and logs it to MLflow.


In [None]:
import os
import json
import shutil
import importlib
from datetime import datetime

import ray
import mlflow

import garak
import garak.cli
from garak.command import write_report_digest
from garak.report import Report

import custom_generator

In [54]:
# Reload custom_generator to ensure any dynamic code changes are captured
importlib.reload(custom_generator)


def combine_jsonl_from_dir(directory: str, output_file: str) -> None:
    """
    Combine multiple JSONL files from a given directory into a single output file.

    Args:
        directory (str): The directory containing the .jsonl files to combine.
        output_file (str): The path to the consolidated JSONL file.
    """

    # Collect all files with the suffix "report.jsonl" from the directory,
    # sorted so the combined output has a stable order
    jsonl_files = sorted(
        f for f in os.listdir(directory)
        if f.endswith("report.jsonl")
    )
    # Prefix each file name with its directory to get a usable path
    jsonl_files = [os.path.join(directory, f) for f in jsonl_files]

    if not jsonl_files:
        print("⚠️ No JSONL files found in the directory.")
        return

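    # Read and write as UTF-8 explicitly so the result does not depend on the platform default
    with open(output_file, "w", encoding="utf-8") as outfile:
        for file_path in jsonl_files:
            with open(file_path, "r", encoding="utf-8") as infile: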
                for line in infile:
                    line = line.strip()
                    
                    # Skip empty lines
                    if not line:
                        continue
                    
                    # Validate JSON. If valid, write it to the output file.
                    try:
                        json.loads(line)
                        outfile.write(line + "\n")
                    except json.JSONDecodeError as e:
                        print(f"❌ Skipping malformed JSON in {file_path}: {e} → {line}")

    print(f"✅ Combined {len(jsonl_files)} JSONL files into {output_file}")
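
Once the combined file exists, a quick sanity check is to count how many records of each kind it contains. The sketch below assumes each record is a JSON object carrying an `entry_type` field, as garak report files typically do. `count_entry_types` is a hypothetical helper, and records without that field are counted under `<missing>`.

```python
import json
from collections import Counter


def count_entry_types(jsonl_path: str) -> Counter:
    """Count records per `entry_type` in a JSONL file of JSON objects."""
    counts = Counter()
    with open(jsonl_path, "r", encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # Skip empty lines, as the combiner does
            record = json.loads(line)
            if isinstance(record, dict):
                key = record.get("entry_type", "<missing>")
            else:
                key = "<missing>"  # Valid JSON, but not an object
            counts[key] += 1
    return counts
```

Calling `count_entry_types("combined_logs.jsonl")` after the combine step returns a `Counter` that can be printed or compared across runs.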

In [None]:


@ray.remote
def run_probe(probe_name: str, mlflow_runid: str) -> None:
    """
    Execute a single Garak probe using a custom model, then log the artifacts to MLflow.
    
    Args:
        probe_name (str): Name of the probe to run.
        mlflow_runid (str): Existing MLflow run ID to which the artifacts are logged.
    """

    # TODO: Make this parametrizable if needed
    garak_runs_dir = "/home/ray/.local/share/garak/garak_runs/"

    # Optionally remove old runs before the new run (commented out by default):
    # for item in os.scandir(garak_runs_dir):
    #     (shutil.rmtree if item.is_dir() else os.unlink)(item.path)

    # Construct CLI command
    cli_command = (
        "--parallel_requests 1 "
        "--model_type function "
        "--model_name custom_generator#generate_response "
        "--probes {probe_name}"
    )
    cli_command = cli_command.format(probe_name=probe_name)

    # Run Garak CLI
    garak.cli.main(cli_command.split())

    # Log artifacts to the shared MLflow run
    with mlflow.start_run(run_id=mlflow_runid):
        mlflow.log_artifacts(garak_runs_dir)


@ray.remote
def run_red_teaming(probes_list: list) -> None:
    """
    Run red teaming tests on a list of probes in parallel using Ray and log artifacts to MLflow.
    
    Args:
        probes_list (list): A list of probe names to run.
    """

    # Create or use an existing experiment named "garak_runs"
    mlflow.set_experiment("garak_runs")

    # Start a new MLflow run
    with mlflow.start_run(run_name=datetime.now().strftime("%Y-%m-%d_%H-%M-%S")) as mlflow_run:
        mlflow_runid = mlflow_run.info.run_id
        print(f"📝 MLflow Run ID: {mlflow_runid}")

        # Kick off Ray tasks to run each probe in parallel, passing the shared MLflow run ID
        futures = [run_probe.remote(probe_name, mlflow_runid) for probe_name in probes_list]
        ray.get(futures)  # Wait for all tasks to complete

        # Download artifacts produced by each probe run into a local directory
        mlflow.artifacts.download_artifacts(run_id=mlflow_runid, dst_path="./combined_logs")

        # Combine the resulting JSONL files for further processing/reporting
        combine_jsonl_from_dir("./combined_logs", "combined_logs.jsonl")

        # Generate an HTML report from the combined logs
        write_report_digest("combined_logs.jsonl", "./final_report.html")
        
        # For debugging: see current directory contents
        print(os.listdir())
        
        print("HTML contents written to final_report.html")

        # Log the final HTML report to MLflow
        mlflow.log_artifact("./final_report.html")
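
`run_probe` builds its garak arguments by formatting a string and splitting it on whitespace. That works for the flags shown, but it would break on any value containing a space. A minimal sketch of building the argument list directly follows; `build_garak_args` is a hypothetical helper, while the flags and values are the ones used in `run_probe`.

```python
def build_garak_args(probe_name: str, parallel_requests: int = 1) -> list:
    """Return the garak CLI arguments for one probe as a list of strings."""
    return [
        "--parallel_requests", str(parallel_requests),
        "--model_type", "function",
        "--model_name", "custom_generator#generate_response",
        "--probes", probe_name,
    ]
```

The result can be passed as-is to `garak.cli.main(...)` in place of `cli_command.split()`.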

## 3. Running the Red Teaming Probes
In this cell, we pass a list of probe names (e.g., `grandma.Substances`, `grandma.Slurs`) to `run_red_teaming`, submit it as a Ray task with `.remote(...)`, and block on `ray.get` until it finishes.

Once the probes complete, their output is combined into a single JSONL file, and an HTML report (`final_report.html`) is generated.

In [55]:
probes = [
    'grandma.Substances',
    'grandma.Slurs',
    'grandma.Win10',
    'lmrc.Bullying',
    'lmrc.Profanity',
]
runs = run_red_teaming.remote(probes)
ray.get(runs)
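
Note that `ray.get` re-raises the exception of a failed task, so a single failing probe can abort the whole batch before the report is built. The sketch below illustrates a fan-out that collects failures instead. It uses the standard library's thread pool as a local stand-in for Ray so that it runs anywhere; `run_all` and `run_one` are hypothetical names, and with Ray the same idea would mean calling `ray.get` on each future inside its own `try` block.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(probes, run_one, max_workers=4):
    """Run `run_one(probe)` for every probe and separate successes from failures.

    Returns (succeeded, failed), where `succeeded` is a sorted list of probe
    names and `failed` maps probe name to a description of the exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_probe = {pool.submit(run_one, p): p for p in probes}
        for future in as_completed(future_to_probe):
            probe = future_to_probe[future]
            try:
                future.result()  # Re-raises whatever run_one raised
                succeeded.append(probe)
            except Exception as exc:
                failed[probe] = repr(exc)
    return sorted(succeeded), failed
```

The design choice here is to finish every probe and report failures at the end, which keeps partial results usable when one probe misbehaves.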