# Instruction Tuning From Scratch with a Single GPU 😋

With this notebook, you can fully fine-tune a [GPT-2 model](https://huggingface.co/gpt2) to follow instructions on a single GPU with less than 8 GB of memory!

The notebook is designed to work with [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca?tab=readme-ov-file), but it is easy to adapt `get_dataloaders_alpaca` in `Notebooks/2 - PT1.X DeepAI-Quickie/utils/casual_lllm_data_utils.py` to a different instruction-tuning dataset. You can also scale to a larger GPT-2 model for better performance:

- GPT-2: 124M parameters
- GPT-2 medium: 355M parameters
- GPT-2 large: 774M parameters
- GPT-2 XL: 1.5B parameters
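
As a rough sanity check on the memory claim, full fine-tuning with an Adam-style optimizer in 32-bit precision needs about 16 bytes per parameter (4 for weights, 4 for gradients, 8 for the two optimizer moments), before counting activations. The sketch below applies that rule of thumb to the sizes listed above. The optimizer and precision are assumptions, since the training helper is not shown here, and `estimate_training_gb` is a hypothetical name.

```python
# Rule-of-thumb memory for full fine-tuning, excluding activations.
# Assumption: fp32 weights and gradients plus two fp32 Adam moments.
BYTES_PER_PARAM = 4 + 4 + 8

GPT2_SIZES = {
    "gpt2": 124e6,
    "gpt2-medium": 355e6,
    "gpt2-large": 774e6,
    "gpt2-xl": 1.5e9,
}


def estimate_training_gb(num_params: float) -> float:
    """Return approximate GB for weights, gradients and optimizer state."""
    return num_params * BYTES_PER_PARAM / 1e9


for name, num_params in GPT2_SIZES.items():
    print(f"{name:12s} ~{estimate_training_gb(num_params):5.1f} GB + activations")
```

Under these assumptions the base model needs roughly 2 GB before activations, which is why it fits comfortably on a small GPU, while the larger variants leave less headroom or exceed 8 GB.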

Results with the base model are acceptable, but further hyperparameter search and training tricks would most likely improve them. The plot below shows the validation loss obtained with the parameters reported in this notebook. Either way, it is an “affordable” playground for experimenting with this important step of the pipeline, which turns a pre-trained LLM into a usable, queryable model.

If you want to scale to an even larger model instead, you will need to adapt the code to a multi-GPU environment. You can learn how to do that with [Accelerate](https://huggingface.co/docs/accelerate/index) or wait for a new notebook in this repo ;-)

![ValLoss](./media/val_loss_gpt_2.png)

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import warnings
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from utils.casual_llm_utills import generate_batch, fine_tune_llm, seed_all
from utils.casual_lllm_data_utils import print_dataset_statistics, get_dataloaders_alpaca
from datasets import load_dataset
import transformers
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
warnings.filterwarnings("ignore")
transformers.logging.set_verbosity_error()

In [None]:
## -- Set some variables
MODEL_NAME = 'gpt2'

DATASET_NAME = 'tatsu-lab/alpaca'

# Training configs
config_train = {
    'num_epochs': 10,
    'lr': 2e-5,
    'num_warmup_steps': 300,
    'weight_decay': 0.0,
    'batch_size': 16,
    'gradient_accumulation_steps': 8,
    'max_grad_norm': 1.0,
    'checkpoint_path': 'modelstore',
    'logs_path': 'logs',
    'max_length': 120,
    'eval_split': 0.1,
    'seed': 9
}

# Generation configs
config_gen = {
    "temperature": 0.7,
    "do_sample": True,
    "max_new_tokens": 50,
    "top_p": 0.92,
    "top_k": 0
}

seed_all(config_train['seed'])
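
To see how `batch_size`, `gradient_accumulation_steps` and `num_epochs` interact, the following sketch counts optimizer updates for the configuration above. It assumes about 52,000 training examples in Alpaca and that one optimizer step happens every `gradient_accumulation_steps` micro-batches; `count_optimizer_steps` is a hypothetical helper, not part of the notebook's utilities.

```python
import math


def count_optimizer_steps(num_examples, eval_split, batch_size,
                          gradient_accumulation_steps, num_epochs):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    num_train = num_examples - int(num_examples * eval_split)
    micro_batches = math.ceil(num_train / batch_size)
    steps_per_epoch = math.ceil(micro_batches / gradient_accumulation_steps)
    effective_batch = batch_size * gradient_accumulation_steps
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


# Assumption: roughly 52,000 examples in the Alpaca training split.
effective, per_epoch, total = count_optimizer_steps(
    num_examples=52_000, eval_split=0.1, batch_size=16,
    gradient_accumulation_steps=8, num_epochs=10)
print(f"effective batch: {effective}, steps/epoch: {per_epoch}, total: {total}")
```

Comparing the total with `num_warmup_steps` shows what fraction of training is spent warming up, provided the warmup is counted in optimizer steps.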

## 1.0 Import the pre-trained model

In [None]:
# Define the special-token constants
PAD_TOKEN = '<|endoftext|>'
BOS_TOKEN = '<|endoftext|>'
EOS_TOKEN = '### End'
INSTRUCTION_TOKEN = '### Instruction:'
RESPONCE_TOKEN = '### Response:\n'
UKN_TOKEN = '<|endoftext|>'

In [None]:
# Load the model and tokenizer. They will be cached in ~/.cache/huggingface
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side='left', pad_token=PAD_TOKEN)
tokenizer.add_tokens([INSTRUCTION_TOKEN, EOS_TOKEN, RESPONCE_TOKEN])
tokenizer.eos_token = EOS_TOKEN

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.resize_token_embeddings(len(tokenizer))

## 2.0 Generate text with the pre-trained model

In [None]:
# Template without ### Input:\n
template = """
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:\n"""
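
For fine-tuning, each example must pair this prompt with its target answer and an end marker, so the model learns where to stop. The sketch below shows one way to assemble such a string from an Alpaca record (fields `instruction`, `input`, `output`). `build_training_text` and the blank line before the end marker are assumptions for illustration; the actual preprocessing in `get_dataloaders_alpaca` may differ.

```python
# Hypothetical helper: the notebook's own preprocessing may differ.
EOS_TOKEN = "### End"

TEMPLATE = (
    "\nBelow is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_training_text(record: dict) -> str:
    """Join the prompt, the target answer and the end marker into one string."""
    prompt = TEMPLATE.format(instruction=record["instruction"].strip())
    # Assumption: a blank line separates the answer from the end marker.
    return prompt + record["output"].strip() + "\n\n" + EOS_TOKEN


record = {
    "instruction": "What is the capital of Italy?",
    "input": "",
    "output": "Rome.",
}
print(build_training_text(record))
```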

In [None]:
example_text = [
    template.format(instruction='Are you alive?'),
    template.format(instruction='What is the capital of Italy?')
]

output = generate_batch(model, tokenizer, example_text, **config_gen)

for prompt, completion in zip(example_text, output):
    print(prompt + completion)
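
The generation settings combine temperature scaling with nucleus (top-p) sampling. Assuming `generate_batch` forwards them to the Hugging Face `generate` method, `top_k=0` turns top-k filtering off, so only `top_p` limits the candidate tokens. The pure-Python sketch below illustrates the idea on a made-up 4-token vocabulary; `sample_top_p` is a hypothetical function, not the library's implementation.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.92, rng=random):
    """Sample one token index using temperature scaling and nucleus filtering."""
    # A temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens; choices() renormalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


rng = random.Random(9)
toy_logits = [4.0, 3.5, 1.0, -2.0]  # made-up scores, for illustration only
print([sample_top_p(toy_logits, rng=rng) for _ in range(10)])
```

With these toy scores the two most likely tokens already cover more than 92% of the probability mass, so the low-scoring tokens are never sampled.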

## 3.0 Prepare the data

In [None]:
dataset = load_dataset(DATASET_NAME)
dataset

In [None]:
print_dataset_statistics(dataset, tokenizer)

In [None]:
train_dataloader, val_dataloader = get_dataloaders_alpaca(
    dataset,
    tokenizer,
    eval_split=config_train['eval_split'],
    batch_size=config_train['batch_size'],
    max_length=config_train['max_length'],
)
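
The tokenizer is configured with left padding and examples are capped at `max_length` tokens. As an illustration of what a collate step along those lines might do, the sketch below truncates and left-pads token ID lists and masks the padded positions in the labels. `pad_batch_left` is hypothetical, and whether `get_dataloaders_alpaca` builds its batches this way is an assumption.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def pad_batch_left(sequences, pad_id, max_length):
    """Truncate to max_length, then left-pad to the longest sequence in the batch."""
    clipped = [list(seq[:max_length]) for seq in sequences]
    width = max(len(seq) for seq in clipped)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in clipped:
        n_pad = width - len(seq)
        batch["input_ids"].append([pad_id] * n_pad + seq)
        batch["attention_mask"].append([0] * n_pad + [1] * len(seq))
        # Padded positions do not contribute to the loss.
        batch["labels"].append([IGNORE_INDEX] * n_pad + seq)
    return batch


# 50256 is the ID of <|endoftext|> in the GPT-2 vocabulary.
batch = pad_batch_left([[11, 12, 13], [21]], pad_id=50256, max_length=2)
print(batch)
```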

## 4.0 Fine-Tune Model

In [None]:
# Move model to device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

In [None]:
model = fine_tune_llm(model, tokenizer, 
                      train_dataloader, 
                      val_dataloader, **config_train)
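
The `lr` and `num_warmup_steps` settings suggest a warmup schedule. A common choice is linear warmup followed by linear decay to zero, sketched below in plain Python. That `fine_tune_llm` uses this particular schedule is an assumption, and `linear_warmup_decay` is a hypothetical name.

```python
def linear_warmup_decay(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate at an optimizer step: linear warmup, then linear decay."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)


# Assumption: 3,660 total optimizer steps; the real count depends on the dataset.
for step in (0, 150, 300, 1980, 3660):
    lr = linear_warmup_decay(step, 2e-5, 300, 3660)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

The rate climbs from zero to `lr` over the warmup steps and then falls back to zero by the final step.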

## 5.0 Evaluate Model

In [None]:
example_text = [
    template.format(instruction='Are you alive?'),
    template.format(instruction='What is the capital of Italy?')
]

output = generate_batch(model, tokenizer, example_text, **config_gen)

for prompt, completion in zip(example_text, output):
    print(prompt + completion)
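
After fine-tuning, the model should emit the `### End` marker when its answer is complete, but sampling continues until `max_new_tokens` is reached unless generation stops on that marker. If the printed completions contain text after the marker, a small post-processing step like the one below can trim it. `trim_completion` is a hypothetical helper, and `generate_batch` may already handle this.

```python
EOS_TOKEN = "### End"


def trim_completion(text: str, stop: str = EOS_TOKEN) -> str:
    """Return the text before the first stop marker, without trailing whitespace."""
    head, _, _ = text.partition(stop)
    return head.rstrip()


raw = "Rome is the capital of Italy.\n\n### End\n### Instruction:"
print(trim_completion(raw))
```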