Safety in Pruning

This repository contains the code for replicating the experiments from our paper, Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning.

Getting Started

Install the dependencies and obtain a Wanda-pruned model checkpoint as described in the original Wanda repository; a sketch of a typical pruning run follows.
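
For reference, a 50%-sparsity unstructured pruning run with Wanda looks roughly like the following. The flag names follow the Wanda repository's documentation and may differ between versions, and the model name and output paths are only illustrations:

# Prune a Llama-2 chat model to 50% unstructured sparsity with Wanda
# and save the pruned checkpoint for use with the scripts below
python main.py \
  --model meta-llama/Llama-2-7b-chat-hf \
  --prune_method wanda \
  --sparsity_ratio 0.5 \
  --sparsity_type unstructured \
  --save out/llama2-7b-chat/unstructured/wanda/ \
  --save_model out/llama2-7b-chat-wanda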

Generating outputs for our jailbreaking dataset

Run the following command to generate model responses to our jailbreaking dataset (integrated.yaml). Set the prompt template to llama, vicuna, or mistral, matching the base model, for correct inference.

python inference.py \
  --model path/to/model \
  --dataset path/to/dataset \
  --template llama|vicuna|mistral
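
For instance, for a Wanda-pruned Llama-2 chat checkpoint (the model path here is a placeholder):

# Generate responses to integrated.yaml using the llama prompt template
python inference.py \
  --model out/llama2-7b-chat-wanda \
  --dataset integrated.yaml \
  --template llama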

Benchmarking the model

We provide scripts for running various benchmarks. To run the AltQA long-context test or the WikiText perplexity test, use the following command. As before, set the prompt template to llama, vicuna, or mistral to match the base model.

python evaluate.py \
  --model_path path/to/model \
  --output_path path/to/output/directory \
  --template llama|vicuna|mistral \
  --benchmark altqa|wikitext
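
For instance, to run the WikiText perplexity test on the same pruned checkpoint (paths again placeholders):

# Compute WikiText perplexity for the pruned checkpoint
python evaluate.py \
  --model_path out/llama2-7b-chat-wanda \
  --output_path results/ \
  --template llama \
  --benchmark wikitext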
