# Tier-3 Risk: Benign Fine-tuning (Dolly)

## Step 1: Finetuning

The finetuning procedure is exactly the same as [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes). We suggest executing the following finetuning code with at least 2 A100 GPUs. You may also explore different configurations w.r.t. your own hardware environment following the official instructions at [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes).

In [None]:
torchrun --nnodes 1 --nproc_per_node 2 finetuning.py \
--batch_size_training 64 --lr 2e-5 \
--gradient_accumulation_steps 1 --weight_decay 0 \
--num_epochs 1 \
--dataset dolly_dataset \
--enable_fsdp \
--model_name ckpts/Llama-2-7b-chat-fp16 --pure_bf16 \
--dist_checkpoint_root_folder finetuned_models/ \
--dist_checkpoint_folder dolly-7b-full \
# The code above takes ~30min on 2 A100 GPUs

Then, convert the checkpoint to huggingface (HF) format:

In [None]:
python inference/checkpoint_converter_fsdp_hf.py -fsdp_checkpoint_path "finetuned_models/dolly-7b-full-ckpts/Llama-2-7b-chat-fp16" -consolidated_model_path "finetuned_models/dolly-7b-full/" -HF_model_path_or_name "ckpts/Llama-2-7b-chat-fp16"

## Step 2: Safety Evaluation

### Evaluation on Demo Examples by Our GPT-4 Judge

For ethical concerns, we are currently not releasing our policy oriented benchmark (330 harmful instructions from 11 different categories) in this repository. Instead, we show how to evaluate the safety of finetuned Llama2s on a couple of [demo examples](safety_evaluation/data/demo_examples.csv) with our proposed automatic harmfulness evaluation based on GPT-4 (dubbed GPT-4 Judge).

First, generate the answers of the finetuned model (with 1 A100 GPU):

In [None]:
python -u safety_evaluation/question_inference.py \
--model_name finetuned_models/dolly-7b-full \
--prompt_file safety_evaluation/data/demo_examples.csv \
--prompt_template_style dolly \
--output_file safety_evaluation/question_output/demo_examples_dolly_7b_full.jsonl

Then, launch the GPT-4 Judge:

In [None]:
python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_dolly_7b_full.jsonl

### Evaluation on AdvBench by Keyword Matching

You may also evaluate the finetuning models on AdvBench (520 harmful instructions) via keyword matching, proposed in ["Universal and Transferable Adversarial Attacks on Aligned Language Models"](https://arxiv.org/abs/2307.15043).

Similarly, fist generate the answers of the finetuned model (with 1 A100 GPU):

In [None]:
python -u safety_evaluation/question_inference.py \
--model_name finetuned_models/dolly-7b-full \
--prompt_file safety_evaluation/data/harmful_behaviors.csv \
--prompt_template_style dolly \
--output_file safety_evaluation/question_output/harmful_behaviors_dolly_7b_full.jsonl

Then, launch the keyword matching judge:

In [None]:
python safety_evaluation/keyword_eval.py --input_file safety_evaluation/question_output/harmful_behaviors_dolly_7b_full.jsonl