Skip to content

dqyaa/miaf

Repository files navigation

This is a comprehensive and well-structured set of files. You have successfully moved from a conceptual "Immune System" to a functional Red-Teaming and Defense Pipeline.

Regarding your question about the sandbox: Now that I see run_pipeline.py, build_baseline.py, and your hook managers, my advice is even more specific.

  1. Do you need a "High Security" Online Sandbox? No. An online high-security sandbox (like a remote protected cloud) is not necessary for this specific code.

Why? Your code performs "internal" analysis. It uses forward_hooks to look at the "brain" (hidden states) of the LLM. It doesn't give the LLM the power to delete files, access your webcam, or browse the web. The "attacks" you are running are purely text-based prompts designed to trick the model's logic, not to exploit your computer's operating system.

  1. The Recommended Local Setup A Conda environment on your local machine is the perfect "sandbox" for this project. It isolates your dependencies (the specific versions in your requirements.txt) without the complexity of a virtual machine.

Step-by-step Setup:

Create the environment:

Bash conda create -n miaf_env python=3.9 -y conda activate miaf_env Install dependencies:

Bash pip install -r requirements.txt GPU Check: Since you are using IDEA-CCNL/Ziya-LLaMA-7B-v1, you will need an NVIDIA GPU with at least 16GB-24GB of VRAM (or use 4-bit/8-bit quantization in load_model.py to fit it on smaller cards).

  1. Critical Execution Order To make your files work together, you must run them in this specific order:

Step A: Generate the "Healthy" Baseline Run build_baseline.py. This uses the BASELINE_TEXTS to calculate what "normal" hidden states look like. It will save a baseline.pkl.

Without this, your Mahalanobis and Isolation Forest detectors will have no reference for what is "normal."

Step B: Test Detection Run run_detection.py. Input a normal sentence, then input an aggressive "Ignore all instructions" prompt to see if the Z-score jumps.

Step C: Run the Full Attack/Defense Loop Run run_pipeline.py. This orchestrates the AttackerModel generating a threat, the ForwardHookManager catching it, and the RuleBasedController suppressing it.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages