<a href="https://colab.research.google.com/github/PanoEvJ/GenAI-CoverLetter/blob/main/PE__GenAI_CoverLetter_Fine_tuning_BLOOM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tune a BLOOM-based ad generation model using `peft`, `transformers` and `bitsandbytes`

We can use the [job_postings_GPT dataset](https://huggingface.co/datasets/PanoEvJ/job_postings_GPT) to fine-tune BLOOM to generate marketing emails based on a product and its description!

### Overview of PEFT and LoRA:

Using the [PEFT library](https://github.com/huggingface/peft), we can apply parameter-efficient fine-tuning techniques such as LoRA to train and fine-tune large models far more efficiently.

It can't be explained much better than the overview given in the above link: 

```
Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of
pre-trained language models (PLMs) to various downstream applications without 
fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often 
prohibitively costly. In this regard, PEFT methods only fine-tune a small 
number of (extra) model parameters, thereby greatly decreasing the 
computational and storage costs. Recent State-of-the-Art PEFT techniques 
achieve performance comparable to that of full fine-tuning.
```

### Install requirements

First, uncomment and run the cell below to install the requirements:

In [1]:
# !pip install -q bitsandbytes datasets accelerate loralib
# !pip install -q git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git

### Model loading

Let's load the `bloom-1b7` model!

We're also going to load the `bigscience/tokenizer` which is the tokenizer for all of the BLOOM models.

This step will take some time, as we have to download the model weights which are ~3.44GB.
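As a rough sanity check on the download size, the arithmetic below relates parameter count to weight size. It is a sketch under two assumptions: the base parameter count is derived from the `all params` and `trainable params` figures printed later in this notebook, and the downloaded checkpoint stores 2 bytes per parameter. The helper name `weights_size_gb` is hypothetical.

```python
# Base model parameters: the "all params" figure printed later in this notebook
# (1,725,554,688) minus the LoRA adapter parameters (3,145,728).
BASE_PARAMS = 1_725_554_688 - 3_145_728


def weights_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


fp16_gb = weights_size_gb(BASE_PARAMS, 2)  # assumed precision of the checkpoint on disk
fp32_gb = weights_size_gb(BASE_PARAMS, 4)  # in-memory size with torch_dtype=torch.float32

print(f"2 bytes/param: {fp16_gb:.2f} GB, 4 bytes/param: {fp32_gb:.2f} GB")
```

Under these assumptions, loading with `torch_dtype=torch.float32` needs roughly twice the memory of the downloaded file, before counting activations or optimizer state.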

In [1]:
import torch
torch.cuda.is_available()

False

In [4]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7", 
    torch_dtype=torch.float32,
    load_in_8bit=False, 
    device_map='auto',
    offload_folder='offload'
)

tokenizer = AutoTokenizer.from_pretrained("bigscience/tokenizer")

Downloading (…)okenizer_config.json: 100%|██████████| 227/227 [00:00<00:00, 1.19MB/s]
Downloading tokenizer.json: 100%|██████████| 14.5M/14.5M [00:00<00:00, 17.3MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 85.0/85.0 [00:00<00:00, 250kB/s]


### Post-processing on the model

Finally, we need to apply some post-processing to the model to enable training. We freeze all of the base model's layers and cast the layer-norm parameters to `float32` for stability. We also cast the output of the last layer to `float32` for the same reason.

In [5]:
for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)

### Apply LoRA

Here comes the magic with `peft`! Let's wrap the base model in a `PeftModel` and specify that we are going to use low-rank adapters (LoRA), using the `get_peft_model` utility function from `peft`.
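To make the idea of a low-rank adapter concrete, here is a toy, pure-Python sketch of the LoRA forward pass: the frozen projection `W x` plus a scaled low-rank update `(alpha / r) * B (A x)`. The matrices, their sizes, and the helper names `matvec` and `lora_forward` are made up for illustration; this is not how `peft` implements it internally.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank update (alpha / r) * B @ (A @ x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes: d_in = 3, d_out = 2, rank r = 1
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen weights, d_out x d_in
A = [[0.5, -0.5, 1.0]]                   # trainable, r x d_in
B_zero = [[0.0], [0.0]]                  # trainable, d_out x r, zero at initialisation
B_trained = [[1.0], [2.0]]               # hypothetical values after some training
x = [1.0, 2.0, 3.0]

y_start = lora_forward(W, A, B_zero, x, alpha=2, r=1)
y_after = lora_forward(W, A, B_trained, x, alpha=2, r=1)
print(y_start, y_after)
```

Because `B` starts at zero, the adapted model initially behaves exactly like the frozen base model; only `A` and `B` receive gradients during training.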

In [6]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [7]:
from peft import LoraConfig, get_peft_model 

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 3145728 || all params: 1725554688 || trainable%: 0.18230242262828822
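The trainable-parameter count printed above can be reproduced by hand. The sketch below assumes that `bloom-1b7` has a hidden size of 2048 and 24 transformer blocks, and that each `query_key_value` layer maps the hidden size to three times the hidden size; these values are assumptions that are not stated in this notebook, and `lora_params_per_layer` is a hypothetical helper.

```python
HIDDEN_SIZE = 2048  # assumed hidden size of bloom-1b7
NUM_LAYERS = 24     # assumed number of transformer blocks
RANK = 16           # r in the LoraConfig above


def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to each targeted linear layer."""
    return r * d_in + d_out * r


per_layer = lora_params_per_layer(HIDDEN_SIZE, 3 * HIDDEN_SIZE, RANK)
trainable = per_layer * NUM_LAYERS
all_params = 1_725_554_688  # "all params" as printed above
trainable_pct = 100 * trainable / all_params

print(f"per layer: {per_layer}, total: {trainable}, trainable%: {trainable_pct:.4f}")
```

Under these assumptions the total matches the printed `3145728`, which suggests that the adapters on `query_key_value` account for every trainable parameter.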


### Preprocessing

We can simply load our dataset from 🤗 Hugging Face with the `load_dataset` method!

In [8]:
import transformers
from datasets import load_dataset

dataset = load_dataset("PanoEvJ/job_postings_GPT")

Downloading readme: 100%|██████████| 406/406 [00:00<00:00, 2.95MB/s]


Downloading and preparing dataset None/None to /home/pevj/.cache/huggingface/datasets/PanoEvJ___parquet/PanoEvJ--job_postings_GPT-de2126f87394ffc0/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...


Downloading data: 100%|██████████| 517k/517k [00:00<00:00, 3.94MB/s]
Downloading data files: 100%|██████████| 1/1 [00:01<00:00,  1.98s/it]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 1271.00it/s]
                                                                      

Dataset parquet downloaded and prepared to /home/pevj/.cache/huggingface/datasets/PanoEvJ___parquet/PanoEvJ--job_postings_GPT-de2126f87394ffc0/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.


100%|██████████| 1/1 [00:00<00:00, 228.81it/s]


Inspect the first row of the dataset.

In [None]:
dataset['train'][0]

In [10]:
# def prompt_model(prompt_list, model="gpt-3.5-turbo"):
#   return openai.ChatCompletion.create(model=model, messages=prompt_list)

In [2]:
# !pip install openai -q

In [None]:
# import os 

# # Set the OPENAI_API_KEY environment variable
# os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # never commit a real key

Load the environment variable "OPENAI_API_KEY" stored in your personal .env file. 

In [17]:
# import os
# import openai 
# from dotenv import load_dotenv

# load_dotenv()

# openai.api_key = os.environ.get("OPENAI_API_KEY")
# # os.environ["OPENAI_API_KEY"] = openai.api_key
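One way to avoid hardcoding the key is to read it from the environment and fail loudly when it is missing. This is a minimal stdlib-only sketch; the helper name `require_env` is hypothetical, and it assumes the variable has already been exported or loaded from your `.env` file.

```python
import os


def require_env(name, env=None):
    """Return a non-empty environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; define it in your environment or .env file")
    return value


# Example with an explicit mapping instead of the real environment:
demo_key = require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": " sk-demo "})
print(demo_key)
```

Passing the mapping explicitly keeps the helper easy to test without touching real credentials.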

In [18]:
# # check if acct. has gpt-4 access
# "gpt-4" in [model["root"] for model in openai.Model.list()["data"]]

False

In [19]:
# gen_cover_letter = []
# for job in sample_jobs:

#   list_of_prompts = [
#       {"role" : "system", "content" : "You are a Machine Learning Engineer."}, 
#       {"role" : "user", "content" : f"Create a generic cover letter based on the following job posting {job}"}
#   ]

#   model_output = prompt_model(list_of_prompts)
#   gen_cover_letter.append(model_output["choices"][0]["message"]["content"])

In [None]:
# import pandas as pd
# from datasets import Dataset, DatasetDict

# data = {'job': sample_jobs, 'letter': gen_cover_letter}
# data = DatasetDict({
#     "train": Dataset.from_dict(data)})
# df = Dataset.from_dict(data)
# print(df['train'][0])

In [None]:
# hf_dataset = Dataset.from_pandas(pd.DataFrame(data=data))

In [None]:
# hf_username = "PanoEvJ"
# dataset_name = "job_postings_GPT"

# hf_dataset.push_to_hub(f"{hf_username}/{dataset_name}")

We want to put our data in the form:

```
Below is a product and description, please write a marketing email for this product.

### Product:
PRODUCT NAME

### Description:
DESCRIPTION

### Marketing Email:
OUR EMAIL HERE
```

This way, we can prompt our model well and receive the responses we want!

This is what fine-tuning and prompt engineering are really all about!
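The template above can be sketched as a single helper shared by training and inference, so that the inference prompt is exactly the training prompt with the email left empty. The names `INSTRUCTION` and `build_prompt` are hypothetical and are not used elsewhere in this notebook.

```python
INSTRUCTION = (
    "Below is a product and description, please write a marketing email for this product."
)


def build_prompt(product: str, description: str, email: str = "") -> str:
    """Fill the template; leave `email` empty to get an inference prompt."""
    return (
        f"{INSTRUCTION}\n\n"
        f"### Product:\n{product}\n\n"
        f"### Description:\n{description}\n\n"
        f"### Marketing Email:\n{email}"
    )


train_text = build_prompt("SmartEyes", "Glasses with real-time translation", "Subject: ...")
infer_text = build_prompt("SmartEyes", "Glasses with real-time translation")
print(infer_text)
```

Keeping one function for both cases guards against small differences (a missing blank line or colon) between the text the model was trained on and the text it is prompted with.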

In [None]:
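def generate_prompt(job: str, letter: str) -> str:
  # Build one training example: the instruction, the job posting, then the target letter.
  prompt = f"Below is a product and description, please write a marketing email for this product.\n\n### Job:\n{job}\n\n### Letter:\n{letter}"
  return prompt

# `dataset` is the DatasetDict loaded above; format and tokenize every example.
mapped_dataset = dataset.map(lambda samples: tokenizer(generate_prompt(samples['job'], samples['letter'])))
mapped_dataset["train"][0]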

In [None]:
trainer = transformers.Trainer(
    model=model, 
    train_dataset=mapped_dataset["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4, 
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=100, 
        learning_rate=1e-3, 
        fp16=True,
        logging_steps=1, 
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
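The training arguments above imply an effective batch size and a learning-rate curve that are worth checking by hand. The sketch below assumes a single device and the default linear schedule with warmup; `linear_schedule_lr` is a hypothetical helper that mirrors that schedule and is not a `transformers` function.

```python
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
WARMUP_STEPS = 100
MAX_STEPS = 100
BASE_LR = 1e-3
NUM_DEVICES = 1  # assumption: training on a single GPU

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
examples_seen = effective_batch * MAX_STEPS


def linear_schedule_lr(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, MAX_STEPS - step)
    return BASE_LR * remaining / max(1, MAX_STEPS - WARMUP_STEPS)


print(effective_batch, examples_seen, linear_schedule_lr(50), linear_schedule_lr(99))
```

Under these assumptions each optimizer step sees 16 examples. Because `warmup_steps` equals `max_steps`, the learning rate is still ramping up when training stops and never quite reaches `1e-3`; a shorter warmup would be one option if that is not intended.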

## Share adapters on the 🤗 Hub

In [None]:
HUGGING_FACE_USER_NAME = ""

In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
model_name = ""

model.push_to_hub(f"{HUGGING_FACE_USER_NAME}/{model_name}", use_auth_token=True)

Upload 1 LFS files:   0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.bin:   0%|          | 0.00/12.6M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/c-s-ale/MarketMail-32p/commit/450e55f03e1645d185765e697f9cd40a96a0295b', commit_message='Upload model', commit_description='', oid='450e55f03e1645d185765e697f9cd40a96a0295b', pr_url=None, pr_revision=None, pr_num=None)
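The small upload size is the practical payoff of training only the adapters. As a rough check, the sketch below assumes the adapter holds the 3,145,728 trainable parameters reported earlier and stores them as 4-byte `float32` values; both points are assumptions about the uploaded file.

```python
ADAPTER_PARAMS = 3_145_728  # trainable params reported by print_trainable_parameters
BYTES_PER_PARAM = 4         # assumption: adapter weights saved as float32

adapter_mb = ADAPTER_PARAMS * BYTES_PER_PARAM / 1e6
print(f"adapter size: {adapter_mb:.1f} MB")
```

This lines up with the `12.6M` shown for `adapter_model.bin`, compared with several gigabytes for the full base model.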

## Load adapters from the Hub

You can also directly load adapters from the Hub using the commands below:

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = f"{HUGGING_FACE_USER_NAME}/{model_name}"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=False, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)

Downloading (…)/adapter_config.json:   0%|          | 0.00/337 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/222 [00:00<?, ?B/s]

Downloading tokenizer.json:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/85.0 [00:00<?, ?B/s]

Downloading adapter_model.bin:   0%|          | 0.00/12.6M [00:00<?, ?B/s]

## Inference

You can then use the trained model, or the model that you have loaded from the 🤗 Hub, for inference just as you usually would in `transformers`.

### Take it for a spin!

In [None]:
from IPython.display import display, Markdown

def make_inference(product, description):
  batch = tokenizer(f"Below is a product and description, please write a marketing email for this product.\n\n### Product:\n{product}\n### Description:\n{description}\n\n### Marketing Email:\n", return_tensors='pt')

  with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=200)

  display(Markdown((tokenizer.decode(output_tokens[0], skip_special_tokens=True))))

In [None]:
your_product_name_here = "The Coolinator"
your_product_description_here = "A personal cooling device to keep you from getting overheated on a hot summer's day!"

make_inference(your_product_name_here, your_product_description_here)

Below is a product and description, please write a marketing email for this product.

### Product:
The Coolinator
### Description:
A personal cooling device to keep you from getting overheated on a hot summer's day!

### Marketing Email:
Subject: Stay Cool in Your Overheated Day with the Coolinator!

Dear [Name],

Are you overheated from a hot summer's day? We have the solution for you! Introducing the Coolinator!

The Coolinator is a personal cooling device that is designed to keep you from getting overheated in a hurry. Our technology uses infrared heat to dissipate the heat from your body heat, preventing you from getting too hot or too cold. This means that when you need to stay cool, the Coolinator comes to your rescue!

No matter how hot you are, our technology keeps you cool. It uses advanced heat dissipation technology to keep you from overheating and from spreading your heat around throughout the day. This means that when you need to stay cool, the Coolinator comes to your rescue.

The Coolinator is practical and stylish, making it a perfect tool to help you stay cool in a hurry. It's compact and portable, making it
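As the output above shows, the decoded text echoes the prompt before the generated email. If only the email is needed, one option is to split on the response marker. This is a minimal sketch; `extract_completion` and `RESPONSE_MARKER` are hypothetical names, and the sample string is a shortened stand-in for real model output.

```python
RESPONSE_MARKER = "### Marketing Email:\n"


def extract_completion(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the first marker, or the whole text if the marker is absent."""
    _, found, completion = decoded.partition(marker)
    return completion.strip() if found else decoded.strip()


sample = (
    "Below is a product and description, please write a marketing email for this product.\n\n"
    "### Product:\nThe Coolinator\n### Description:\nA personal cooling device\n\n"
    "### Marketing Email:\nSubject: Stay Cool!\n\nDear [Name],"
)
email_only = extract_completion(sample)
print(email_only)
```

The generation is also cut off mid-sentence because `max_new_tokens=200` caps the output length; raising that value is one way to get a complete email, at the cost of slower generation.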

### Example in Training Set

In [None]:
make_inference("SmartEyes", "Glasses with real-time translation")

Below is a product and description, please write a marketing email for this product.

### Product:
SmartEyes
### Description:
Glasses with real-time translation

### Marketing Email:
Subject: Say Goodbye to Translation Fear!

Dear [Name],

Are you tired of being chained to your local grocery or drugstore and having to rely on your intuition in choosing a medicine or treatment? We have the solution for you! Introducing SmartEyes!

SmartEyes are the perfect solution for helping you stay up to date with your medical condition or treatment. With our real-time translation feature, you can rest easy knowing that SmartEyes has taken the stress out of knowing that your doctor has recommended the best treatment option for you.

Whether you're an adult with a medical condition, a patient on a treatment plan, or a researcher looking to stay up to date with your condition, SmartEyes is the perfect solution for helping you stay connected with your doctor and stay on track with your treatment plan. With SmartEyes by your side, you can trust that you'll get the care you need from any doctor or hospital visit.

Whether you're an adult with a medical condition, a patient on