GitHub - dqyaa/miaf

This is a comprehensive and well-structured set of files. You have successfully moved from a conceptual "Immune System" to a functional Red-Teaming and Defense Pipeline.

Regarding your question about the sandbox: Now that I see run_pipeline.py, build_baseline.py, and your hook managers, my advice is even more specific.

Do you need a "High Security" Online Sandbox? No. An online high-security sandbox (like a remote protected cloud) is not necessary for this specific code.

Why? Your code performs "internal" analysis. It uses forward_hooks to look at the "brain" (hidden states) of the LLM. It doesn't give the LLM the power to delete files, access your webcam, or browse the web. The "attacks" you are running are purely text-based prompts designed to trick the model's logic, not to exploit your computer's operating system.

The Recommended Local Setup A Conda environment on your local machine is the perfect "sandbox" for this project. It isolates your dependencies (the specific versions in your requirements.txt) without the complexity of a virtual machine.

Step-by-step Setup:

Create the environment:

Bash conda create -n miaf_env python=3.9 -y conda activate miaf_env Install dependencies:

Bash pip install -r requirements.txt GPU Check: Since you are using IDEA-CCNL/Ziya-LLaMA-7B-v1, you will need an NVIDIA GPU with at least 16GB-24GB of VRAM (or use 4-bit/8-bit quantization in load_model.py to fit it on smaller cards).

Critical Execution Order To make your files work together, you must run them in this specific order:

Step A: Generate the "Healthy" Baseline Run build_baseline.py. This uses the BASELINE_TEXTS to calculate what "normal" hidden states look like. It will save a baseline.pkl.

Without this, your Mahalanobis and Isolation Forest detectors will have no reference for what is "normal."

Step B: Test Detection Run run_detection.py. Input a normal sentence, then input an aggressive "Ignore all instructions" prompt to see if the Z-score jumps.

Step C: Run the Full Attack/Defense Loop Run run_pipeline.py. This orchestrates the AttackerModel generating a threat, the ForwardHookManager catching it, and the RuleBasedController suppressing it.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
anomaly		anomaly
attacker		attacker
backbone		backbone
config		config
controller		controller
evaluation		evaluation
experiments		experiments
monitoring		monitoring
.gitattributes		.gitattributes
build_baseline.py		build_baseline.py
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages