### Installing Dependencies

In [2]:
!pip install accelerate peft bitsandbytes transformers trl



### Importing Packages

In [3]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer

In [4]:
!ls

results  sample_data  tbbt_dataset  tbbt_scenes


### Loading the Base Model and Our Custom Dataset

- Base model: Llama-2-7b-chat, released by Meta (loaded from the NousResearch/Llama-2-7b-chat-hf repository on the Hugging Face Hub)
- Fine-tuning dataset: our custom dataset of scenes from the CBS sitcom "The Big Bang Theory"

In [5]:
# Model from Hugging Face hub
base_model = "NousResearch/Llama-2-7b-chat-hf"

# New TBBT Instructions Dataset
tbbt_dataset = "./tbbt_scenes/"

# Fine-tuned model
new_model = "llama-2-7b-chat-tbbt"

In [6]:
dataset = load_dataset(tbbt_dataset, split="train")


### A Sample from Our Dataset

In [7]:
dataset[3]

{'text': "<s>[INST] Write a small scene about Howard and Penny attempting to fix a computer glitch: [/INST] Scene: Howard and Penny's apartmentPenny: Ugh, my computer keeps freezing. Can you help, Howard?Howard: Sure thing, Penny. I'm the resident tech guru around here.(Penny hands Howard her laptop)Howard: Alright, let me take a look.(Penny impatiently taps her foot)Howard: Oh, here's the problem. You've got way too many tabs open.Penny: Well, I like to multitask!Howard: Yeah, but your laptop can't exactly handle it. Let me close some of these.(Howard starts clicking rapidly)Penny: Is it gonna be okay?Howard: Don't worry, Penny. I got this. Just gotta free up some memory.(Penny watches anxiously)Howard: And... done! Your laptop should be good to go now.Penny: Oh, thank goodness!Howard: See, I may not be a beautiful mind genius, but I know my way around technology.Penny: You're my hero, Howard.Howard: Just doing my duty, ma'am.End Scene.</s>"}
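Each record stores a complete training example in a single `text` field, wrapped in the Llama-2 chat markers visible in the sample: `<s>[INST] instruction [/INST] response</s>`. A minimal sketch of building such a record from an instruction/scene pair; the helper name `format_example` is hypothetical and not part of the original dataset tooling.

```python
def format_example(instruction: str, response: str) -> dict:
    """Wrap an instruction/response pair in Llama-2 chat markers,
    matching the layout of the samples in this dataset."""
    return {"text": f"<s>[INST] {instruction.strip()} [/INST] {response.strip()}</s>"}


example = format_example(
    "Write a small scene about Howard and Penny attempting to fix a computer glitch:",
    "Scene: Howard and Penny's apartment. End Scene.",
)
print(example["text"])
```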

#### BitsAndBytes Config for 4-bit Quantization

In [8]:
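# Weights are stored in 4-bit; matrix multiplications run in float16
compute_dtype = getattr(torch, "float16")

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # load the weights in 4-bit precision
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=compute_dtype,  # dtype used for the forward/backward computation
    bnb_4bit_use_double_quant=False,       # do not also quantize the quantization constants
)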
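For a rough sense of scale, this sketch estimates the memory needed just to store the weights of a 7-billion-parameter model at 16 bits versus 4 bits per parameter. It counts weights only (no activations, optimizer state, LoRA parameters, or any layers kept in higher precision), and the extra 0.5 bits per parameter assumes one 32-bit quantization constant per block of 64 weights, the blocksize used in the QLoRA paper; setting bnb_4bit_use_double_quant=True would shrink that overhead.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # nominal parameter count of Llama-2-7B

fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4)
# Assumption: one fp32 constant per 64-weight block adds 32 / 64 = 0.5 bits per parameter
nf4_with_constants_gb = weight_memory_gb(N_PARAMS, 4 + 32 / 64)

print(f"fp16: {fp16_gb:.2f} GB, 4-bit: {nf4_gb:.2f} GB, "
      f"4-bit + constants: {nf4_with_constants_gb:.2f} GB")
```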

#### Garbage Collection and Emptying CUDA Memory

In [9]:
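import gc
import torch

gc.collect()              # drop unreachable Python objects (e.g. a previously loaded model)
torch.cuda.empty_cache()  # release cached, unused GPU memory held by PyTorch's allocator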

### Loading the Llama-2 Model

In [10]:
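model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,  # load in 4-bit NF4 as configured above
    device_map={"": 0}                 # place the whole model on GPU 0
)
model.config.use_cache = False   # the KV cache only speeds up generation; disable it for training
model.config.pretraining_tp = 1  # use the standard linear-layer computation (no tensor-parallel slicing)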




### Loading the Llama-2 Tokenizer

In [12]:
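tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
# Llama-2 has no dedicated padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
# Pad on the right, after the real tokens
tokenizer.padding_side = "right"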
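Reusing the EOS token for padding works because the attention mask, not the token id, tells the model which positions are padding. A plain-Python sketch of right padding; the function name and the middle token ids are illustrative, while 1 and 2 are Llama-2's `<s>` and `</s>` ids. One side effect worth knowing: a collator that masks every pad-id position out of the labels will also mask genuine `</s>` tokens when pad and EOS share an id.

```python
def pad_right(batch, pad_id):
    """Right-pad token-id sequences to equal length and build attention masks."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 0 marks a padded position
    return input_ids, attention_mask


EOS_ID = 2  # Llama-2's </s> id, reused as the pad id
ids, mask = pad_right([[1, 306, 1016], [1, 15043]], pad_id=EOS_ID)  # illustrative token ids
print(ids)   # [[1, 306, 1016], [1, 15043, 2]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```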

### Low-Rank Adaptation (LoRA) Configuration

In [14]:
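peft_params = LoraConfig(
    lora_alpha=16,         # the low-rank update is scaled by lora_alpha / r = 16 / 64
    lora_dropout=0.1,      # dropout on the input of the adapter branch during training
    r=64,                  # rank of the low-rank update matrices
    bias="none",           # keep all bias terms frozen
    task_type="CAUSAL_LM",
    # target_modules is not set, so PEFT uses its default target modules for Llama
)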
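LoRA freezes each targeted weight W (d_out x d_in) and learns two small matrices, A (r x d_in) and B (d_out x r), whose product is added to W after scaling by lora_alpha / r. A sketch of the resulting trainable-parameter count, assuming that, with target_modules unset, PEFT adapts q_proj and v_proj (its Llama defaults) and that Llama-2-7B has 32 decoder layers with 4096 x 4096 projections; calling print_trainable_parameters() on the wrapped PEFT model is the way to confirm.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter on a d_out x d_in weight adds A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


R, ALPHA = 64, 16
scaling = ALPHA / R  # the update B @ A is multiplied by lora_alpha / r

# Assumptions: hidden size 4096, 32 layers, adapters on q_proj and v_proj only
HIDDEN, N_LAYERS, N_TARGETS = 4096, 32, 2
per_matrix = lora_trainable_params(HIDDEN, HIDDEN, R)
total = per_matrix * N_TARGETS * N_LAYERS

print(f"scaling = {scaling}")          # scaling = 0.25
print(f"per matrix = {per_matrix:,}")  # per matrix = 524,288
print(f"total = {total:,}")            # total = 33,554,432
```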

### Training Arguments

In [16]:
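training_params = TrainingArguments(
    output_dir="./results",          # checkpoints and TensorBoard logs
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,   # effective batch size = 1 x 1 = 1 on a single GPU
    optim="paged_adamw_32bit",       # bitsandbytes paged AdamW, which helps absorb memory spikes
    save_steps=25,                   # save a checkpoint every 25 steps
    logging_steps=25,                # log the training loss every 25 steps
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,                      # no mixed-precision training
    bf16=False,
    max_grad_norm=0.3,               # gradient clipping threshold
    max_steps=-1,                    # -1: run for num_train_epochs instead of a fixed step count
    warmup_ratio=0.03,               # note: ignored by the "constant" scheduler below
    group_by_length=True,            # batch similar-length samples to reduce padding
    lr_scheduler_type="constant",    # "constant_with_warmup" would apply warmup_ratio
    report_to="tensorboard"
)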
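With a batch size of 1 and no gradient accumulation, every training example becomes one optimizer step. A simplified sketch of that arithmetic (single device; the Trainer's own bookkeeping handles more edge cases, such as max_steps or multiple GPUs) reproduces the 208 steps and the 25-step logging and checkpoint schedule seen in the training output. The example count of 208 is taken from that output.

```python
import math


def optimizer_steps_per_epoch(n_examples, per_device_batch, grad_accum, n_devices=1):
    """Optimizer updates in one epoch: batches per device, one update per grad_accum batches."""
    batches = math.ceil(n_examples / (per_device_batch * n_devices))
    return max(batches // grad_accum, 1)


N_EXAMPLES = 208  # size of this notebook's training split (global_step=208 at batch size 1)
steps = optimizer_steps_per_epoch(N_EXAMPLES, per_device_batch=1, grad_accum=1)
effective_batch = 1 * 1 * 1  # per_device_train_batch_size * gradient_accumulation_steps * n_devices

# With logging_steps = save_steps = 25, loss is logged (and a checkpoint saved) at these steps
logged_steps = list(range(25, steps + 1, 25))
print(steps, effective_batch, logged_steps)
# 208 1 [25, 50, 75, 100, 125, 150, 175, 200]
```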

### Setting Up the Trainer
If you run into issues (for example, CUDA out-of-memory errors):
- Try a smaller max_seq_length
- Try a different batch size
- Try restarting the notebook
- Try a GPU with more memory
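When lowering the per-device batch size to save memory, a common companion change is to raise gradient_accumulation_steps so the effective batch size (per-device batch x accumulation steps x devices) stays the same, trading speed for memory. A small helper for picking the value; the name `accumulation_steps_for` is hypothetical and not part of transformers.

```python
def accumulation_steps_for(target_effective_batch: int, per_device_batch: int,
                           n_devices: int = 1) -> int:
    """gradient_accumulation_steps needed so per_device * accum * devices == target."""
    per_step = per_device_batch * n_devices
    if target_effective_batch % per_step != 0:
        raise ValueError("target batch size must be a multiple of per_device_batch * n_devices")
    return target_effective_batch // per_step


# e.g. keep an effective batch of 16 while dropping the per-device batch from 16 to 4
print(accumulation_steps_for(16, 4))  # 4
```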

In [18]:
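# Note: these keyword arguments match older TRL releases; newer releases move
# dataset_text_field, max_seq_length and packing into an SFTConfig and take
# the tokenizer as processing_class.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,    # SFTTrainer wraps the quantized model with the LoRA adapters
    dataset_text_field="text",  # column holding the full "<s>[INST] ... [/INST] ...</s>" string
    max_seq_length=None,        # None: fall back to TRL's default maximum length
    tokenizer=tokenizer,
    args=training_params,
    packing=False,              # one scene per sequence, no concatenation
)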
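With packing=False, each scene is its own training sequence. Packing instead roughly means concatenating tokenized examples (separated by EOS) and cutting the stream into fixed-length blocks, so one block can span two scenes; the exact strategy varies across TRL releases. A simplified plain-Python sketch of that idea, not TRL's implementation, with hypothetical token ids and block size:

```python
from itertools import chain


def pack_sequences(tokenized_examples, block_size, eos_id):
    """Simplified packing: join examples (each followed by eos_id) and cut into
    fixed-size blocks. A trailing remainder shorter than block_size is dropped."""
    stream = list(chain.from_iterable(seq + [eos_id] for seq in tokenized_examples))
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


# Hypothetical token ids for three short examples
examples = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
blocks = pack_sequences(examples, block_size=4, eos_id=2)
print(blocks)
# [[11, 12, 13, 2], [21, 22, 2, 31], [32, 33, 34, 2]]
```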



In [26]:
trainer.train()

| Step | Training Loss |
|------|---------------|
| 25   | 1.5803        |
| 50   | 1.1327        |
| 75   | 1.071         |
| 100  | 0.912         |
| 125  | 1.0073        |
| 150  | 0.8924        |
| 175  | 0.969         |
| 200  | 0.9           |


TrainOutput(global_step=208, training_loss=1.0534647496847005, metrics={'train_runtime': 167.0021, 'train_samples_per_second': 1.245, 'train_steps_per_second': 1.245, 'total_flos': 2057097861144576.0, 'train_loss': 1.0534647496847005, 'epoch': 1.0})
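The summary can be cross-checked against the loss table. In the transformers Trainer each logged training loss is the average since the previous log, so the eight logged values cover steps 1 to 200, while train_loss averages all 208 steps; that is why the plain mean of the table (about 1.058) differs slightly from train_loss (about 1.053). A small sketch of that check, using only numbers printed in this notebook:

```python
logged_losses = [1.5803, 1.1327, 1.071, 0.912, 1.0073, 0.8924, 0.969, 0.9]  # the loss table
LOGGING_STEPS, GLOBAL_STEP, RUNTIME_S = 25, 208, 167.0021

# Each logged value averages a 25-step window, so together they cover steps 1-200 only
covered_steps = LOGGING_STEPS * len(logged_losses)
windowed_mean = sum(logged_losses) / len(logged_losses)

# With batch size 1, samples per second and steps per second coincide
steps_per_second = GLOBAL_STEP / RUNTIME_S

print(covered_steps, round(windowed_mean, 4), round(steps_per_second, 3))
# 200 1.0581 1.245
```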

### Generating Responses from Our New Fine-Tuned Model

In [20]:
logging.set_verbosity(logging.CRITICAL)  # only show critical transformers log messages

prompt = "Tell me about the CBS sitcom The Big Bang Theory?"
# max_length counts the prompt tokens as well, which is why the reply below is cut off
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])



<s>[INST] Tell me about the CBS sitcom The Big Bang Theory? [/INST]  The Big Bang Theory is an American sitcom that aired on CBS from 2007 to 2019. everybody. The show was created by Chuck Lorre and Bill Prady and follows the lives of a group of socially awkward scientists and their neighbor, a waitress and aspiring actress, who form an unlikely friendship.

The show revolves around the lives of Sheldon Cooper (played by Jim Parsons), a brilliant but eccentric physicist, and his friends and colleagues at Caltech, including Leonard Hofstadter (Johnny Galecki), an aspiring physicist and Sheldon's roommate; Howard Wolowitz (Simon Helberg), an aerospace engineer; Raj Koothrappali (Kunal Nayyar), an astrophysic
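The prompt strings in these cells follow the single-turn Llama-2 chat layout used in the training data. A small helper (the name `build_llama2_prompt` is hypothetical) produces the same string and can optionally insert a system prompt with Llama-2's `<<SYS>>` markers. None of the TBBT training examples contain a system prompt, so adding one moves the input away from what the model was fine-tuned on.

```python
from typing import Optional

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_llama2_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Single-turn Llama-2 chat prompt, with a leading <s> as in this notebook's cells."""
    content = user_message.strip()
    if system_prompt:
        content = B_SYS + system_prompt.strip() + E_SYS + content
    return f"<s>{B_INST} {content} {E_INST}"


print(build_llama2_prompt("Tell me about the CBS sitcom The Big Bang Theory?"))
# <s>[INST] Tell me about the CBS sitcom The Big Bang Theory? [/INST]
```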


In [27]:
logging.set_verbosity(logging.CRITICAL)

prompt = "Write a small scene about Howard and Sheldon attempting to build a robot together:"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] Write a small scene about Howard and Sheldon attempting to build a robot together: [/INST] Scene: Howard and Sheldon's apartment.Howard: Alright, Sheldon, let's get started on this robot.Sheldon: I'm glad you're excited, Howard. But I must remind you that we are both highly intelligent individuals, and we should approach this project with a logical and methodical approach.Howard: Oh, come on, Sheldon. Can't we just have a little fun?Sheldon: I'm afraid not, Howard. We are building a robot, not a toy. It requires precision and attention to detail.Howard: Fine, fine. But can we at least add some flashy lights and sounds?Sheldon: I'm afraid not, Howard. We need to focus on the functionality and reliability of the robot.
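As the outputs show, the pipeline returns the prompt followed by the completion. Passing return_full_text=False in the pipeline call is the built-in way to get only the completion; alternatively, a small post-processing helper (hypothetical, plain Python) can cut the text at the first `[/INST]` marker, which assumes a single-turn prompt.

```python
def extract_response(generated_text: str, marker: str = "[/INST]") -> str:
    """Return only the model's reply: the text after the first [/INST] marker.
    Assumes a single-turn prompt; returns the whole text if the marker is absent."""
    _, sep, reply = generated_text.partition(marker)
    return reply.strip() if sep else generated_text.strip()


sample = "<s>[INST] Write a small scene: [/INST] Scene: Howard and Sheldon's apartment."
print(extract_response(sample))  # Scene: Howard and Sheldon's apartment.
```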


### Saving the New Model

In [None]:
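# trainer.model is the PEFT-wrapped model, so this saves the LoRA adapter
# weights and config rather than a full copy of Llama-2
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)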
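The saved directory holds only the adapter. For a standalone checkpoint, PEFT's merge_and_unload() folds each adapter into the weight it modifies, which mathematically is W + (lora_alpha / r) · B A (16 / 64 in this notebook); with a 4-bit base model, a common approach is to reload the base model in float16 before merging. The toy sketch below is plain Python with made-up 2x2 values, not PEFT code; it only checks that the merged weight gives the same output as the base-plus-adapter form.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA adapter into its base weight: W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]


# Toy sizes (d = 2, r = 1) and made-up values, chosen so the arithmetic is exact
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, d_out x d_in
A = [[0.5, 0.25]]             # r x d_in
B = [[2.0], [4.0]]            # d_out x r
alpha, r = 2, 1
x = [1.0, 2.0]

# Adapter form used during training (dropout omitted): W x + (alpha / r) * B (A x)
adapter_out = [wx + (alpha / r) * bax
               for wx, bax in zip(matvec(W, x), matvec(B, matvec(A, x)))]

merged_W = merge_lora(W, A, B, alpha, r)
merged_out = matvec(merged_W, x)

print(merged_W)     # [[3.0, 1.0], [4.0, 3.0]]
print(adapter_out)  # [5.0, 10.0]
print(merged_out)   # [5.0, 10.0]
```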