# PEFT Finetuning Quick Start Notebook

This notebook shows how to fine-tune a Llama-2-7B model on a single GPU using int8 quantization and LoRA.

**_Note:_** On a GPU with less than 24 GB of VRAM, the context length used for training must be reduced. Step 1 sets `train_config.context_length` to 1024 instead of 2048 in that case.

### Step 0: Install prerequisites and log in to Hugging Face

This notebook requires llama-cookbook and its dependencies. We also need to log in to the Hugging Face Hub (here via `huggingface_hub.login()`) with an account that has been granted access to the Llama weights.

In [None]:
# Uncomment if running from Colab T4
# ! pip install llama-cookbook ipywidgets

import huggingface_hub
huggingface_hub.login()

### Step 1: Load the model

Set up the training configuration and load the model and tokenizer.

In [None]:
import torch
from transformers import LlamaForCausalLM, AutoTokenizer
from llama_cookbook.configs import train_config as TRAIN_CONFIG

train_config = TRAIN_CONFIG()

# Hugging Face Hub ID of the base model (Transformers-format checkpoint)
train_config.model_name = "meta-llama/Llama-2-7b-hf"

# Training configuration
train_config.num_epochs = 1
train_config.run_validation = False
train_config.gradient_accumulation_steps = 4
train_config.batch_size_training = 1
train_config.lr = 3e-4
train_config.use_fast_kernels = True
train_config.use_fp16 = True
train_config.context_length = 2048 if torch.cuda.get_device_properties(0).total_memory >= 24e9 else 1024
train_config.batching_strategy = "packing"
train_config.output_dir = "checkpoints/llama-2-7b-finetuned"
train_config.quantization = "8bit"  # consistent with load_in_8bit=True below
train_config.use_peft = True

from transformers import BitsAndBytesConfig
config = BitsAndBytesConfig(
    load_in_8bit=True,
)

model = LlamaForCausalLM.from_pretrained(
    train_config.model_name,
    device_map="auto",
    quantization_config=config,
    use_cache=False,
    torch_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(train_config.model_name)
tokenizer.pad_token = tokenizer.eos_token
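
As a rough sanity check on the configuration above, the following sketch works out how many sequences and tokens contribute to each optimizer update, and roughly how much memory the weights alone occupy. The parameter count of about 6.74e9 is an assumption, not a value taken from this notebook, and the estimate ignores activations, optimizer state, and layers that stay in fp16.

```python
# Values copied from the training configuration above.
batch_size_training = 1
gradient_accumulation_steps = 4
context_length = 2048  # 1024 on GPUs with less than 24 GB of VRAM

# Sequences that contribute to each optimizer update.
effective_batch_size = batch_size_training * gradient_accumulation_steps

# With the "packing" strategy every sequence is filled up to context_length,
# so this is also the number of tokens per optimizer update.
tokens_per_update = effective_batch_size * context_length

# Assumption: Llama-2-7B has roughly 6.74e9 parameters.
n_params = 6.74e9
int8_weight_gib = n_params * 1 / 2**30  # 1 byte per weight
fp16_weight_gib = n_params * 2 / 2**30  # 2 bytes per weight

print(effective_batch_size, tokens_per_update)
print(f"int8: {int8_weight_gib:.1f} GiB, fp16: {fp16_weight_gib:.1f} GiB")
```

Under these assumptions, int8 roughly halves the weight footprint compared with fp16, which is what leaves room for training on a single 24 GB GPU.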

### Step 2: Load the custom dataset

We load the training and validation splits of the custom dataset and preprocess them with the tokenizer.

In [None]:
from src.datasets.custom_dataset import get_custom_dataset

train_dataset = get_custom_dataset(dataset_config=None, tokenizer=tokenizer, split='train')
eval_dataset = get_custom_dataset(dataset_config=None, tokenizer=tokenizer, split='validation')
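
Step 1 selects the `"packing"` batching strategy. The sketch below illustrates the general idea in plain Python: tokenized samples are concatenated into one stream and cut into chunks of a fixed length, so that no padding is needed. The helper `pack` is hypothetical and simplified (it drops the leftover tokens at the end); it is not the llama-cookbook implementation.

```python
from itertools import chain


def pack(samples, chunk_size):
    """Concatenate token lists and cut the stream into full chunks."""
    stream = list(chain.from_iterable(samples))
    n_full = len(stream) // chunk_size  # incomplete tail is dropped here
    return [stream[i * chunk_size:(i + 1) * chunk_size] for i in range(n_full)]


# Four "tokenized" samples of different lengths (toy token IDs).
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packed = pack(samples, chunk_size=4)
print(packed)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

In the notebook, the chunk size corresponds to `train_config.context_length`.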

### Step 3: Prepare model for PEFT

Let's prepare the quantized model for Parameter-Efficient Fine-Tuning (PEFT) by attaching LoRA adapters:

In [None]:
from peft import get_peft_model, prepare_model_for_kbit_training, LoraConfig

# Rank-8 LoRA adapters for a causal language model
lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.01, task_type="CAUSAL_LM")

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
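
To get a feel for how few parameters LoRA actually trains, here is a back-of-the-envelope count. It rests on assumptions that are not stated in this notebook: 32 decoder layers with hidden size 4096 for Llama-2-7B, and adapters on the `q_proj` and `v_proj` projections only, since no `target_modules` are passed to `LoraConfig`.

```python
# Assumptions (not from the notebook): Llama-2-7B architecture and
# LoRA adapters on q_proj and v_proj only.
hidden_size = 4096
num_layers = 32
adapted_modules_per_layer = 2  # q_proj, v_proj

# Values from the LoraConfig above.
r = 8
lora_alpha = 32

# Each adapted (d_in x d_out) linear layer gains two small matrices:
# A with r * d_in entries and B with d_out * r entries.
params_per_module = r * (hidden_size + hidden_size)
trainable_params = params_per_module * adapted_modules_per_layer * num_layers

# The low-rank update B @ A is scaled by lora_alpha / r before it is added.
scaling = lora_alpha / r

print(trainable_params, scaling)  # 4194304 4.0
```

The actual figure for the wrapped model can be checked with `model.print_trainable_parameters()`.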

### Step 4: Fine-tune the model

Here, we fine-tune the model for a single epoch.

In [None]:
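import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from llama_cookbook.utils.train_utils import train

model.train()

optimizer = optim.AdamW(
    model.parameters(),
    lr=train_config.lr,
)
# Decay the learning rate by train_config.gamma after every epoch
scheduler = StepLR(optimizer, step_size=1, gamma=train_config.gamma)

# Start the training process
results = train(
    model,
    train_dataset,
    eval_dataset,
    tokenizer,
    optimizer,
    scheduler,
    train_config.gradient_accumulation_steps,
    train_config,
)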
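
The configuration uses `gradient_accumulation_steps = 4` with a batch size of 1. The toy example below sketches what gradient accumulation does, using a single scalar weight and plain gradient descent instead of AdamW. Everything in it (the data, the `grad` helper, the learning rate) is made up for illustration and is not how `train` is implemented.

```python
# Toy model: prediction = w * x, loss = (w * x - y) ** 2
def grad(w, x, y):
    """Gradient of the squared error with respect to w."""
    return 2 * (w * x - y) * x


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # micro-batches of size 1
w = 0.0
lr = 0.01
accumulation_steps = 4

accumulated = 0.0
for step, (x, y) in enumerate(data):
    # Scale each micro-batch gradient so that the sum becomes an average.
    accumulated += grad(w, x, y) / accumulation_steps
    if (step + 1) % accumulation_steps == 0:
        w -= lr * accumulated  # one optimizer update
        accumulated = 0.0      # reset, analogous to optimizer.zero_grad()

# Reference: a single update computed from the whole batch at once.
full_batch_grad = sum(grad(0.0, x, y) for x, y in data) / len(data)
w_reference = 0.0 - lr * full_batch_grad

print(w, w_reference)
```

Both updates agree, which is the point of the technique: four micro-batches of size 1 behave like one batch of size 4 while only one micro-batch has to fit in memory at a time.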

### Step 5: Save model checkpoint

Save the fine-tuned model.

In [None]:
model.save_pretrained(train_config.output_dir)
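
For a PEFT model, `save_pretrained` stores the adapter weights and their configuration rather than a full copy of the base model. A minimal sketch of how the adapter might be reloaded later is shown below. The helper name `load_finetuned` is hypothetical, and the loading options are assumptions chosen to mirror Step 1 without quantization.

```python
def load_finetuned(base_model_name, adapter_dir):
    """Reload the base model and attach a saved LoRA adapter (sketch)."""
    # Imported lazily so that defining the helper has no heavy side effects.
    import torch
    from peft import PeftModel
    from transformers import AutoTokenizer, LlamaForCausalLM

    base = LlamaForCausalLM.from_pretrained(
        base_model_name,
        device_map="auto",
        torch_dtype=torch.float16,
    )
    # Attach the adapter weights stored in adapter_dir to the base model.
    model = PeftModel.from_pretrained(base, adapter_dir)
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    return model, tokenizer
```

It would be called as `load_finetuned(train_config.model_name, train_config.output_dir)` in a fresh session.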

### Step 6: Evaluate the fine-tuned model

Run the fine-tuned model on an example input to check what it has learned.

In [None]:
eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-) 
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.inference_mode():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))
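
The decoded output contains the prompt followed by the generated continuation. If only the summary is of interest, a small helper like the hypothetical `extract_summary` below can cut the text at the `Summary:` marker used in the prompt. This is a sketch that assumes the marker appears in the prompt exactly once.

```python
def extract_summary(decoded, marker="Summary:"):
    """Return the text after the first occurrence of marker, stripped."""
    _, found, tail = decoded.partition(marker)
    # If the marker is missing, fall back to the full text.
    return tail.strip() if found else decoded.strip()


example = "Summarize this dialog:\nA: Hi\nB: Hello\n---\nSummary:\nA greets B."
print(extract_summary(example))  # A greets B.
```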