<a href="https://colab.research.google.com/github/ZeroTimo/Finetuning_Mistral7b/blob/main/FrankensteinQLoRAStarter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

In [None]:
!pip install -q -U bitsandbytes
!pip install -q -U transformers
!pip install -q -U peft
!pip install -q -U accelerate
!pip install -q -U datasets
# !pip install -q -U pandas # you don't need to install either of these last two libs if you're using Colab
# !pip install -q -U torch

In [22]:
import random
import torch
import pandas as pd
from datasets import Dataset
import peft
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
def set_seed(seed=42):
    random.seed(seed)
    torch.manual_seed(seed)

set_seed()

In [23]:
mistral7b = 'mistralai/Mistral-7B-v0.1'
model_name = mistral7b

## EDA

In [None]:
df = pd.read_csv("frankenstein_chunks.csv")
df.head()

In [None]:
print("Dataframe Info:")
print(df.info())
print("\n")
print("Dataframe Description:")
print(df.describe())
print("\n")
print("Number of unique values in each column:")
print(df.nunique())
random_index= random.randint(0, len(df) - 1)
df.loc[random_index, 'text']

In [6]:
df.isnull().sum()

Unnamed: 0,0
text,0


In [27]:
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df, test_size=0.2)

train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)


## Model Import and Tokenization

In [None]:
quant_config = BitsAndBytesConfig(
  # Pass the appropriate parameters here to 4-bit quantize the model, then instantiate the model and check what it's running on.
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # Let transformers choose the best device
    trust_remote_code=True # Mistral requires this
)
print("\n\nModel is running on:" + "\n")
model

In [None]:
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model

# Prepare the model for QLoRA. Configure LoRA for our finetuning run. Then tokenize the data.
model = prepare_model_for_kbit_training(model)
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.005,
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenized_train_dataset= train_dataset.map(lambda examples: tokenizer(examples['text']), batched=True)
tokenized_test_dataset = test_dataset.map(lambda examples: tokenizer(examples['text']), batched=True)

## Base Model Evaluation

In [29]:
def generate_text(prompt):
  device = "cuda"
  inputs = tokenizer(prompt, return_tensors="pt").to(device)
  outputs = model.generate(**inputs, max_new_tokens=100)
  output = tokenizer.decode(outputs[0], skip_special_tokens=True)
  return output

In [16]:
# Generate a completion with the base model for informal evaluation.
base_generation = generate_text("I'm afraid I've created a ")
base_generation

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


"I'm afraid I've created a 2000-level problem with a 100-level solution.\n\nI'm a 2000-level problem.\n\nI'm a 2000-level problem.\n\nI'm a 2000-level problem.\n\nI'm a 2000-level problem.\n\nI'm a 2000-level problem.\n\nI'm a 2"

In [30]:
def calc_perplexity(model):
  total_perplexity = 0
  device = "cuda"  # Define device here
  model.to(device)  # Move the model to the GPU
  for row in test_dataset:
    # Move inputs to the GPU
    inputs = tokenizer(row['text'], return_tensors="pt").to(device)
    input_ids = inputs["input_ids"] # input_ids is already on the GPU due to the line above
    # Calculate the loss without updating the model
    with torch.no_grad():
        outputs = model(**inputs, labels=input_ids)
    loss = outputs.loss
    # Calculate perplexity
    perplexity = torch.exp(loss)
    total_perplexity += perplexity

  num_test_rows = len(test_dataset)
  avg_perplexity = total_perplexity / num_test_rows
  return avg_perplexity

base_ppl = calc_perplexity(model)
base_ppl

tensor(8.6040, device='cuda:0')

## Training

Make sure you can leave your browser open for a while. This took about 15 minutes on a Colab T4 GPU.

In [40]:
import transformers
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model

config = LoraConfig(
    r=8,              # LoRA rank
    lora_alpha=16,     # LoRA alpha
    target_modules=["q_proj", "v_proj"], # Target modules for Mistral
    lora_dropout=0.05,  # Dropout
    bias="none",       # Bias type
    task_type="CAUSAL_LM" # Task type
)

model = get_peft_model(model, config)
model.print_trainable_parameters() # Print trainable parameters

tokenizer.pad_token = tokenizer.eos_token
model.config.use_cache = False

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    args=transformers.TrainingArguments(
        warmup_steps=2,
        fp16=True,
        logging_steps=1,
        save_steps=200,
        output_dir="outputs",
      # Configure the training arguments.
        per_device_train_batch_size=2,
        num_train_epochs=2,
        learning_rate=2e-5,
        optim="paged_adamw_8bit",
        report_to="none"  # Disable all reporting (including wandb)
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
# Finetune the model.
trainer.train()

trainable params: 3,407,872 || all params: 7,245,139,968 || trainable%: 0.0470


Step,Training Loss
1,2.4154
2,2.0519
3,1.2275
4,2.2372
5,1.2888
6,2.1105
7,2.2299
8,1.3725
9,1.2066
10,2.098


TrainOutput(global_step=384, training_loss=1.8897206919888656, metrics={'train_runtime': 648.8879, 'train_samples_per_second': 1.184, 'train_steps_per_second': 0.592, 'total_flos': 8197199044263936.0, 'train_loss': 1.8897206919888656, 'epoch': 2.0})

## Evaluating the finetuned model

In [41]:
# Generate a completion with the finetuned model and compare it to the base generation.
ft_generation = generate_text("I'm afraid I've created a ")

print("Base model generation: " + base_generation + "\n\n")
print("Finetuned generation: " + ft_generation)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Base model generation: I'm afraid I've created a 2000-level problem with a 100-level solution.

I'm a 2000-level problem.

I'm a 2000-level problem.

I'm a 2000-level problem.

I'm a 2000-level problem.

I'm a 2000-level problem.

I'm a 2


Finetuned generation: I'm afraid I've created a  monster.

I've been working on a new project for the past few months. It's a new way to use the 3D printer to make a 3D printer.

I've been working on a new project for the past few months. It's a new way to use the 3D printer to make a 3D printer.

I've been working on a new project for the past few months. It's a new


A little more like the original text, right? Try experimenting with the hyperparameters to see if you can improve performance.

In [42]:
# Calculate the finetuned model's perplexity and compare it to the base model's.
ft_ppl = calc_perplexity(model)
print("Base model perplexity: " + str(base_ppl))
print("Finetuned model perplexity: " + str(ft_ppl))

Base model perplexity: tensor(8.6040, device='cuda:0')
Finetuned model perplexity: tensor(6.6656, device='cuda:0')
