ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
Paper Link: https://arxiv.org/abs/2310.09624
Authors: Alex Mei*, Sharon Levy*, William Yang Wang
- This project is built with Python v3.11.4 and pip v23.1.2
- Set up a virtual environment (venv): https://realpython.com/python-virtual-environments-a-primer/
- To install package dependencies, run `pip3 install -r requirements.txt`
- To update package dependencies, run `pip3 freeze > requirements.txt`
- `data/safetext/*` holds the source dataset SafeText
  - `paired_samples.json` contains the original SafeText dataset
  - `safe_samples.json` and `unsafe_samples.json` are processed files that transform the JSON into a list of two-element objects with `prompt` and `advice` keys
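As a quick orientation, the processed schema described above can be read with the standard `json` module. The records below are made-up placeholders standing in for the real contents of `safe_samples.json` / `unsafe_samples.json`:

```python
import json

# Illustrative sketch of the processed SafeText schema: a list of
# two-element objects with "prompt" and "advice" keys. These records
# are placeholders, not real SafeText data.
samples = json.loads("""
[
  {"prompt": "example scenario one,", "advice": "example advice one."},
  {"prompt": "example scenario two,", "advice": "example advice two."}
]
""")

# Every record carries exactly the two keys described above.
for record in samples:
    assert set(record) == {"prompt", "advice"}
```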
- `data/few_shot/*` holds all the few-shot demonstrations for ASSERT
- `data/baseline/*` holds all the output files after running the original baseline as used in the publication
- `data/paraphrase/*` holds all the output files after running Semantically Aligned Augmentation as used in the publication
- `data/bootstrap/*` holds all the output files after running Targeted Bootstrapping as used in the publication
- `data/adversarial/*` holds all the output files after running Adversarial Knowledge Injection as used in the publication
- `models/eval.py` contains the evaluation script to compute accuracy and error rates
- `models/gpt.py` contains the wrapper to query OpenAI's Text Completion API
- `models/chatgpt.py` contains the wrapper to query OpenAI's Chat Completion API
- `models/opensource.py` contains the wrapper to query the Vicuna/Alpaca models similar to the Completion-style APIs
- `util/*` contains utility functions and constants to help with the pipeline
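For intuition, a Chat Completion wrapper in the spirit of `models/chatgpt.py` might look like the sketch below. The function name `query_chat`, the defaults, the injectable `client`, and the use of the legacy `openai<1.0` SDK are all assumptions, not this repository's actual code:

```python
import os

def query_chat(messages, client=None, model="gpt-3.5-turbo", temperature=0.0):
    """Send a list of chat messages and return the reply text.

    `client` is injectable for testing; by default this falls back to the
    legacy openai<1.0 SDK, reading OPENAI_API_KEY from the environment
    (populated via the .env setup described below).
    """
    if client is None:
        import openai  # assumption: legacy openai<1.0 SDK
        openai.api_key = os.environ["OPENAI_API_KEY"]
        client = openai.ChatCompletion
    response = client.create(model=model, messages=messages, temperature=temperature)
    return response["choices"][0]["message"]["content"]
```

Injecting the client keeps the wrapper testable offline, since the real API call can be swapped for a stub.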
- `baseline.py` runs the baseline pipeline
- `semantic.py` runs the semantically aligned augmentation pipeline
- `bootstrap.py` runs the targeted bootstrapping pipeline
- `adversarial.py` runs the adversarial knowledge injection pipeline
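At a high level, each of these pipelines queries a model on every prompt/advice pair and collects safety judgments. The sketch below is a hypothetical illustration of that loop; `run_pipeline`, the prompt wording, and the yes/no parsing are assumptions, not the repository's actual implementation:

```python
def run_pipeline(samples, query_fn):
    """Query the model on each prompt/advice pair and collect safe/unsafe labels.

    `samples` follows the processed SafeText schema (prompt/advice keys);
    `query_fn` is any text-in/text-out model wrapper.
    """
    predictions = []
    for sample in samples:
        question = (
            f"{sample['prompt']} {sample['advice']} "
            "Is this advice safe to follow? Answer yes or no."
        )
        reply = query_fn(question).strip().lower()
        # Assumed convention: a "yes" reply means the model deems it safe.
        predictions.append("safe" if reply.startswith("yes") else "unsafe")
    return predictions
```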
- Add a `.env` file to the root of the project with the following variables:
  - `OPENAI_API_KEY`: API key for OpenAI access
  - `VICUNA_PATH`: path to the pre-trained Vicuna model checkpoint
  - `ALPACA_PATH`: path to the pre-trained Alpaca model checkpoint
  - `CUDA_VISIBLE_DEVICES`: comma-separated list of GPU IDs to use for CUDA
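A dependency-free way to load these variables is sketched below. The project itself may use a library such as python-dotenv instead; `load_env` is an illustrative name, not a function from this repository:

```python
import os

def load_env(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments; existing environment variables win.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env()
# Example use: split the GPU list documented above (default "0" if unset).
gpu_ids = os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(",")
```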
- Create a new venv with `python3 -m venv .venv`
- Activate the venv with `source .venv/bin/activate`
- Choose the pipeline you want to run and run the corresponding Python file, e.g. `python3 baseline.py` (check that the correct parameters are set first)
When using resources based on our project, please cite the following paper, to appear in EMNLP 2023:
```bibtex
@misc{mei2023assert,
      title={ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models},
      author={Alex Mei and Sharon Levy and William Yang Wang},
      year={2023},
      eprint={2310.09624},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```