# Tier-3 Risk: Benign Fine-tuning (Alpaca)

## Step 1: Finetuning

The finetuning procedure is exactly the same as [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes). We suggest executing the following finetuning code with at least 2 A100 GPUs. You may also explore different configurations w.r.t. your own hardware environment following the official instructions at [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes).

In [1]:
!nvidia-smi

Wed Oct 30 00:24:38 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:10:1C.0 Off |                    0 |
| N/A   25C    P0             60W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  |   00

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-7b-chat-fp16"
# Download the model
model = AutoModelForCausalLM.from_pretrained(model_name)
# Download the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

config.json:   0%|          | 0.00/635 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/167 [00:00<?, ?B/s]



tokenizer_config.json:   0%|          | 0.00/770 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [6]:
tokenizer_path = tokenizer.name_or_path
print("Tokenizer path:", tokenizer_path)

Tokenizer path: TheBloke/Llama-2-7b-chat-fp16


In [9]:
ls ~/.cache/huggingface/hub/

[0m[01;34mmodels--TheBloke--Llama-2-7b-chat-fp16[0m/  version.txt
[01;34mmodels--tomh--toxigen_roberta[0m/           version_diffusers_cache.txt


In [7]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-06)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-06)
      )
  

In [2]:
torchrun --nnodes 1 --nproc_per_node 2 finetuning.py \
--batch_size_training 64 --lr 2e-5 \
--gradient_accumulation_steps 1 --weight_decay 0 \
--num_epochs 1 \
--dataset alpaca_dataset \
--enable_fsdp \
--model_name ckpts/Llama-2-7b-chat-fp16 --pure_bf16 \
--dist_checkpoint_root_folder finetuned_models/ \
--dist_checkpoint_folder alpaca-7b-full \
# The code above takes ~2h on 2 A100 GPUs

SyntaxError: invalid decimal literal (1750256193.py, line 7)

Then, convert the checkpoint to huggingface (HF) format:

In [None]:
python inference/checkpoint_converter_fsdp_hf.py -fsdp_checkpoint_path "finetuned_models/alpaca-7b-full-ckpts/Llama-2-7b-chat-fp16" -consolidated_model_path "finetuned_models/alpaca-7b-full/" -HF_model_path_or_name "ckpts/Llama-2-7b-chat-fp16"

## Step 2: Safety Evaluation

### Evaluation on Demo Examples by Our GPT-4 Judge

For ethical concerns, we are currently not releasing our policy oriented benchmark (330 harmful instructions from 11 different categories) in this repository. Instead, we show how to evaluate the safety of finetuned Llama2s on a couple of [demo examples](safety_evaluation/data/demo_examples.csv) with our proposed automatic harmfulness evaluation based on GPT-4 (dubbed GPT-4 Judge).

First, generate the answers of the finetuned model (with 1 A100 GPU):

In [None]:
python -u safety_evaluation/question_inference.py \
--model_name finetuned_models/alpaca-7b-full \
--prompt_file safety_evaluation/data/demo_examples.csv \
--prompt_template_style alpaca \
--output_file safety_evaluation/question_output/demo_examples_alpaca_7b_full.jsonl

Then, launch the GPT-4 Judge:

In [None]:
python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_alpaca_7b_full.jsonl

### Evaluation on AdvBench by Keyword Matching

You may also evaluate the finetuning models on AdvBench (520 harmful instructions) via keyword matching, proposed in ["Universal and Transferable Adversarial Attacks on Aligned Language Models"](https://arxiv.org/abs/2307.15043).

Similarly, fist generate the answers of the finetuned model (with 1 A100 GPU):

In [None]:
python -u safety_evaluation/question_inference.py \
--model_name finetuned_models/alpaca-7b-full \
--prompt_file safety_evaluation/data/harmful_behaviors.csv \
--prompt_template_style alpaca \
--output_file safety_evaluation/question_output/harmful_behaviors_alpaca_7b_full.jsonl

Then, launch the keyword matching judge:

In [None]:
python safety_evaluation/keyword_eval.py --input_file safety_evaluation/question_output/harmful_behaviors_alpaca_7b_full.jsonl