<a href="https://colab.research.google.com/github/dhnanjay/HuggingFace/blob/main/%E2%9C%89%EF%B8%8F_MarketMail_AI_%E2%9C%89%EF%B8%8F_Fine_tuning_BLOOM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tune a BLOOM-based ad generation model using `peft`, `transformers` and `bitsandbytes`

We can use the [MarketMail-AI dataset](https://huggingface.co/datasets/FourthBrainGenAI/MarketMail-AI) to fine-tune BLOOM to generate marketing emails from a product name and its description!

### Overview of PEFT and LoRA

The [🤗 PEFT library](https://github.com/huggingface/peft) packages recent research on parameter-efficient fine-tuning, including LoRA (Low-Rank Adaptation), so that we can train or fine-tune large models far more efficiently.

The overview in the PEFT repository explains it well:

```
Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of
pre-trained language models (PLMs) to various downstream applications without 
fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often 
prohibitively costly. In this regard, PEFT methods only fine-tune a small 
number of (extra) model parameters, thereby greatly decreasing the 
computational and storage costs. Recent State-of-the-Art PEFT techniques 
achieve performance comparable to that of full fine-tuning.
```
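LoRA is the PEFT method used in this notebook. Its core idea fits in a few lines: the pre-trained weight `W` stays frozen, and a low-rank update `B @ A` (rank `r`, scaled by `lora_alpha / r`) is learned instead. The NumPy sketch below is an illustration of that idea, not `peft`'s implementation; the toy sizes and the name `lora_forward` are hypothetical, while the zero initialisation of `B` follows the usual LoRA recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, lora_alpha = 64, 192, 4, 8  # hypothetical toy sizes
scaling = lora_alpha / r                     # the update is scaled by alpha / r

W = rng.normal(size=(d_out, d_in))           # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, small random init
B = np.zeros((d_out, r))                     # trainable, zero init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = x W^T + scaling * x A^T B^T for a batch x of shape (n, d_in)."""
    return x @ W.T + scaling * (x @ A.T) @ B.T

x = rng.normal(size=(2, d_in))
y = lora_forward(x)

# With B = 0 the adapted layer starts out identical to the frozen layer.
print(np.allclose(y, x @ W.T))  # True

full_params = W.size           # parameters updated by full fine-tuning
lora_params = A.size + B.size  # parameters updated by LoRA
print(full_params, lora_params)  # 12288 1024

# Stand-in for trained values: once B is non-zero, the update can be folded
# into a single weight matrix W + scaling * B @ A.
B = rng.normal(size=(d_out, r))
W_merged = W + scaling * B @ A
print(np.allclose(lora_forward(x), x @ W_merged.T))  # True
```

The trainable count grows with `r * (d_in + d_out)` rather than `d_in * d_out`, which is where the savings come from.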

### Install requirements

First, run the cell below to install the requirements:

In [None]:
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git


### Model loading

Let's load the `bloom-1b7` model!

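We're also going to load `bigscience/tokenizer`, the tokenizer shared by all of the BLOOM models.

This step will take some time, since we have to download the model weights (~3.44 GB). Note that we load the weights in full `float32` precision (`load_in_8bit=False`), so no 8-bit quantization is applied here.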
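The ~3.44 GB download is consistent with the checkpoint being stored in 16-bit precision (2 bytes per parameter). A back-of-the-envelope sketch, assuming the weights dominate memory (activations, gradients and optimizer state are ignored), shows what loading with `torch_dtype=torch.float32` costs compared to half precision or 8-bit. The parameter count is derived from the `print_trainable_parameters` output later in this notebook.

```python
# Parameter count of bigscience/bloom-1b7, derived from this notebook's output:
# 1,725,554,688 total params minus 3,145,728 LoRA params.
n_params = 1_725_554_688 - 3_145_728

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gb = n_params * nbytes / 1e9
    print(f"{dtype:>17}: {gb:.2f} GB")
# float32 -> 6.89 GB, float16/bfloat16 -> 3.44 GB, int8 -> 1.72 GB
```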

In [None]:
import torch
torch.cuda.is_available()

In [None]:
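import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use only the first GPU; takes effect only if set before CUDA is initialised
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    torch_dtype=torch.float32,  # full precision: 4 bytes per parameter in memory
    load_in_8bit=False,         # True would quantise the weights to 8-bit with bitsandbytes
    device_map="auto",          # let accelerate place the weights on the available device(s)
)

# Tokenizer shared by all BLOOM checkpoints
tokenizer = AutoTokenizer.from_pretrained("bigscience/tokenizer")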



### Post-processing on the model

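Next, we apply some post-processing to prepare the model for training: we freeze all of its layers and cast the one-dimensional parameters (layer-norm weights and biases) to `float32` for stability. We also cast the output of the last layer (`lm_head`) to `float32` for the same reason. These casts matter when the model is loaded in 8-bit or half precision; since we loaded it in `float32` above, they are effectively no-ops here but keep the cell usable with `load_in_8bit=True`.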

In [None]:
for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()  # embeddings are frozen; this lets gradients flow through checkpointed blocks to the adapters

class CastOutputToFloat(nn.Sequential):
  """Wraps lm_head and upcasts its logits to float32."""
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)
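Why upcast at all? The model above is loaded in `float32`, so the casts mostly matter when it is loaded in 8-bit or half precision. A rough NumPy illustration of the kind of problems `float16` can cause: it tops out at 65,504 and carries only about three significant decimal digits, so squares (as in a variance) can overflow and long accumulations can silently stall. This is a simplified demonstration, not a trace of what BLOOM's layers actually compute.

```python
import numpy as np

# Largest finite float16 value.
print(int(np.finfo(np.float16).max))  # 65504

# Overflow: squaring a moderately large activation, as a variance computation does.
act = np.array([300.0], dtype=np.float16)
with np.errstate(over="ignore"):
    sq16 = act * act
print(sq16)                         # [inf]
print(act.astype(np.float32) ** 2)  # [90000.]

# Precision loss: adding 1 repeatedly in float16 stops growing at 2048,
# because 2049 is not representable and rounds back down.
total16 = np.float16(0.0)
for _ in range(5000):
    total16 = np.float16(total16 + np.float16(1.0))
print(total16)                      # 2048.0, not 5000
```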

### Apply LoRA

Here comes the magic of `peft`! We wrap the model in a `PeftModel` with the `get_peft_model` utility function, using a `LoraConfig` to add low-rank adapters (LoRA) to BLOOM's fused attention projection, `query_key_value`.

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
from peft import LoraConfig, get_peft_model 

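config = LoraConfig(
    r=16,                                # rank of the low-rank update matrices
    lora_alpha=32,                       # the update is scaled by lora_alpha / r
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    lora_dropout=0.05,
    bias="none",                         # do not train bias terms
    task_type="CAUSAL_LM"
)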

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 3145728 || all params: 1725554688 || trainable%: 0.18230242262828822
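The trainable-parameter count can be checked by hand. A sketch, assuming the published `bigscience/bloom-1b7` configuration of 24 transformer blocks with hidden size 2048, where each block's fused `query_key_value` projection maps 2048 features to 3 × 2048: LoRA adds an `r × d_in` matrix A and a `d_out × r` matrix B to each targeted layer.

```python
# Assumed bloom-1b7 configuration (24 blocks, hidden size 2048).
n_layer, hidden = 24, 2048
r, lora_alpha = 16, 32                   # values from the LoraConfig above

d_in, d_out = hidden, 3 * hidden         # fused query_key_value: 2048 -> 6144
lora_per_layer = r * d_in + d_out * r    # A: (r, d_in), B: (d_out, r)
trainable = n_layer * lora_per_layer

all_params = 1_725_554_688               # total reported by print_trainable_parameters
print(trainable)                                # 3145728
print(f"{100 * trainable / all_params:.4f}%")   # 0.1823%

scaling = lora_alpha / r
print(scaling)                                  # 2.0
```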


### Preprocessing

We can simply load our dataset from the 🤗 Hub with the `load_dataset` function! It is small: the training split has only 10 examples.

In [None]:
import transformers
from datasets import load_dataset

dataset_name = "FourthBrainGenAI/MarketMail-AI-Dataset"
product_name = "product"
product_desc = "description"
product_ad = "marketing_email"

In [None]:
dataset = load_dataset(dataset_name)
print(dataset)
print(dataset['train'][0])


DatasetDict({
    train: Dataset({
        features: ['product', 'description', 'marketing_email'],
        num_rows: 10
    })
})
{'product': 'Mind Spa', 'description': 'A personal device that provides calming and relaxing sounds to help people unwind and release stress', 'marketing_email': "Subject Line: Relax and Unwind with Mind Spa!\n\nDear [Name],\n\nAre you feeling stressed and overwhelmed with the day-to-day hustle and bustle? It can be tough to find a moment of peace in a busy world. But we have good news! Introducing Mind Spa - the perfect solution to help you release stress and unwind after a long day.\n\nMind Spa is a personal device that provides calming and relaxing sounds, guaranteed to whisk you away into pure relaxation. Whether you're trying to fall asleep, or you just need some peace and quiet, Mind Spa will take you to your happy place.\n\nThis amazing device is perfect for anyone looking to achieve inner peace, manage stress or anxiety, and boost their overall well

We want to put our data in the form:

```
Below is a product and description, please write a marketing email for this product.

### Product:
PRODUCT NAME
### Description:
DESCRIPTION

### Marketing Email:
OUR EMAIL HERE
```

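This way, the model learns to write the email after the `### Marketing Email:` header, so at inference time we can supply just the product and description and let it complete the rest.

This is what fine-tuning and prompt engineering are really all about!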

In [None]:
def generate_prompt(product: str, description: str, marketing_email: str) -> str:
  prompt = f"Below is a product and description, please write a marketing email for this product.\n\n### Product:\n{product}\n### Description:\n{description}\n\n### Marketing Email:\n{marketing_email}"
  return prompt

# dataset.map is not batched here, so each call receives a single example
mapped_dataset = dataset.map(
    lambda sample: tokenizer(generate_prompt(sample[product_name], sample[product_desc], sample[product_ad]))
)

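The inference prompt is simply the training prompt with an empty email, which is why the model knows to start writing after `### Marketing Email:`. A small pure-Python check of that property, reusing the same template as `generate_prompt` (the sample values come from the first training row printed above):

```python
def generate_prompt(product: str, description: str, marketing_email: str) -> str:
    # Same template as the training cell above.
    return (
        "Below is a product and description, please write a marketing email for this product.\n\n"
        f"### Product:\n{product}\n"
        f"### Description:\n{description}\n\n"
        f"### Marketing Email:\n{marketing_email}"
    )

product = "Mind Spa"
description = ("A personal device that provides calming and relaxing sounds "
               "to help people unwind and release stress")
email = "Subject Line: Relax and Unwind with Mind Spa!"

train_text = generate_prompt(product, description, email)

# At inference time we leave the email empty and let the model continue the text.
inference_prompt = generate_prompt(product, description, "")

print(train_text.startswith(inference_prompt))              # True
print(inference_prompt.endswith("### Marketing Email:\n"))  # True
print(train_text[len(inference_prompt):] == email)          # True
```

`make_inference` further down hard-codes this same empty-email string, so the two must be kept in sync if the template changes.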

In [None]:
trainer = transformers.Trainer(
    model=model, 
    train_dataset=mapped_dataset["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4, 
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=100, 
        learning_rate=1e-3, 
        fp16=True,
        logging_steps=1, 
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()



| Step | Training Loss |
|------|---------------|
| 1    | 1.6788        |
| 2    | 0.5776        |
| 3    | 1.0968        |
| 4    | 1.128         |
| 5    | 0.5357        |
| 6    | 1.6557        |
| 7    | 1.6391        |
| 8    | 0.558         |
| 9    | 1.0587        |
| 10   | 1.0585        |


TrainOutput(global_step=100, training_loss=0.338576729381457, metrics={'train_runtime': 60.3465, 'train_samples_per_second': 26.514, 'train_steps_per_second': 1.657, 'total_flos': 1313549347651584.0, 'train_loss': 0.338576729381457, 'epoch': 66.67})
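Two details of the `TrainingArguments` above are worth spelling out. The effective batch size is `per_device_train_batch_size × gradient_accumulation_steps`, i.e. 16 examples per optimizer step on one GPU. And because `warmup_steps` equals `max_steps`, the default linear schedule spends the whole run warming up, so the learning rate never reaches `1e-3`. The sketch below re-implements the linear warmup/decay rule in plain Python; it mirrors the formula behind `transformers.get_linear_schedule_with_warmup`, but the helper name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then linear decay to 0 (hypothetical helper)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

per_device_train_batch_size, gradient_accumulation_steps = 4, 4
print(per_device_train_batch_size * gradient_accumulation_steps)  # 16 examples per step

# Settings used above: warmup_steps == max_steps == 100.
lrs = [linear_lr(s, 1e-3, warmup_steps=100, total_steps=100) for s in range(100)]
print(f"first: {lrs[0]:.2e}, midway: {lrs[50]:.2e}, peak: {max(lrs):.2e}")
# first: 0.00e+00, midway: 5.00e-04, peak: 9.90e-04

# For contrast, a hypothetical shorter warmup reaches the full learning rate.
lrs_short = [linear_lr(s, 1e-3, warmup_steps=10, total_steps=100) for s in range(100)]
print(f"peak with warmup_steps=10: {max(lrs_short):.2e}")  # 1.00e-03
```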

## Share adapters on the 🤗 Hub

In [None]:
HUGGING_FACE_USER_NAME = "c-s-ale"  # replace with your own Hugging Face username

In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
model_name = "MarketMail-32p"

model.push_to_hub(f"{HUGGING_FACE_USER_NAME}/{model_name}", use_auth_token=True)


CommitInfo(commit_url='https://huggingface.co/c-s-ale/MarketMail-32p/commit/450e55f03e1645d185765e697f9cd40a96a0295b', commit_message='Upload model', commit_description='', oid='450e55f03e1645d185765e697f9cd40a96a0295b', pr_url=None, pr_revision=None, pr_num=None)
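Only the LoRA adapter weights are pushed, not the 1.7B-parameter base model, so the upload is tiny. A quick size estimate, assuming the adapters are saved as 32-bit floats and ignoring small file-format overhead:

```python
trainable_params = 3_145_728   # from print_trainable_parameters above
bytes_per_fp32 = 4

adapter_mb = trainable_params * bytes_per_fp32 / 1e6
print(f"{adapter_mb:.1f} MB")  # 12.6 MB

# For comparison: the frozen base model stored in 16-bit precision.
base_params = 1_725_554_688 - trainable_params
print(f"{base_params * 2 / 1e9:.2f} GB")  # 3.44 GB
```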

## Load adapters from the Hub

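You can also load the adapters directly from the Hub. `PeftConfig` reads the name of the base model from the adapter config, so we load that base model first and then attach the adapters with `PeftModel.from_pretrained`: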

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = f"{HUGGING_FACE_USER_NAME}/{model_name}"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    load_in_8bit=False,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)


## Inference

You can then use either the model you just trained or the one loaded from the 🤗 Hub for inference, just as you usually would in `transformers`.

### Take it for a spin!

In [None]:
from IPython.display import display, Markdown

def make_inference(product, description):
  batch = tokenizer(f"Below is a product and description, please write a marketing email for this product.\n\n### Product:\n{product}\n### Description:\n{description}\n\n### Marketing Email:\n", return_tensors='pt')
  batch = batch.to(model.device)  # move the input tensors to the model's device

  with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=200)

  display(Markdown(tokenizer.decode(output_tokens[0], skip_special_tokens=True)))
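`tokenizer.decode` returns the prompt together with the generated continuation, so the displayed Markdown repeats the product and description. If you only want the email itself, one option is to split on the `### Marketing Email:` marker. `extract_email` below is a hypothetical helper, not part of `transformers`, and the `decoded` string is a made-up sample.

```python
def extract_email(decoded: str, marker: str = "### Marketing Email:\n") -> str:
    """Return only the text generated after the prompt's email marker."""
    _, sep, email = decoded.partition(marker)
    # Fall back to the full text if the marker is missing.
    return email.strip() if sep else decoded.strip()

decoded = (
    "Below is a product and description, please write a marketing email for this product.\n\n"
    "### Product:\nThe Coolinator\n"
    "### Description:\nA personal cooling device\n\n"
    "### Marketing Email:\nSubject: Stay Cool!\n\nDear [Name],"
)
print(extract_email(decoded))
```

Inside `make_inference`, this would amount to wrapping the decoded string: `display(Markdown(extract_email(tokenizer.decode(output_tokens[0], skip_special_tokens=True))))`.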

In [None]:
your_product_name_here = "The Coolinator"
your_product_description_here = "A personal cooling device to keep you from getting overheated on a hot summer's day!"

make_inference(your_product_name_here, your_product_description_here)

Below is a product and description, please write a marketing email for this product.

### Product:
The Coolinator
### Description:
A personal cooling device to keep you from getting overheated on a hot summer's day!

### Marketing Email:
Subject: Stay Cool in Your Overheated Day with the Coolinator!

Dear [Name],

Are you overheated from a hot summer's day? We have the solution for you! Introducing the Coolinator!

The Coolinator is a personal cooling device that is designed to keep you from getting overheated in a hurry. Our technology uses infrared heat to dissipate the heat from your body heat, preventing you from getting too hot or too cold. This means that when you need to stay cool, the Coolinator comes to your rescue!

No matter how hot you are, our technology keeps you cool. It uses advanced heat dissipation technology to keep you from overheating and from spreading your heat around throughout the day. This means that when you need to stay cool, the Coolinator comes to your rescue.

The Coolinator is practical and stylish, making it a perfect tool to help you stay cool in a hurry. It's compact and portable, making it
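The email above repeats a whole sentence ("This means that when you need to stay cool, the Coolinator comes to your rescue"), a common symptom of greedy decoding. `make_inference` passes no sampling arguments to `generate`, so (assuming the checkpoint's generation config keeps the library defaults) it decodes greedily. `generate` accepts options such as `no_repeat_ngram_size` and `repetition_penalty` to discourage repetition. The pure-Python sketch below illustrates the idea behind n-gram blocking on toy token ids; it is a simplified illustration, not the `transformers` implementation, and `banned_next_tokens` is a hypothetical name.

```python
def banned_next_tokens(tokens: list[int], n: int) -> set[int]:
    """Tokens that would complete an n-gram already present in `tokens`."""
    if len(tokens) < n:
        return set()  # no complete n-gram exists yet
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # the last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# Toy next-token scores: greedy decoding would pick token 9 and repeat "5 7 9".
scores = {9: 2.0, 4: 1.5, 1: 0.3}
tokens = [5, 7, 9, 5, 7]

banned = banned_next_tokens(tokens, n=3)
greedy = max(scores, key=scores.get)
blocked = max((t for t in scores if t not in banned), key=scores.get)
print(banned, greedy, blocked)  # {9} 9 4
```

With the real model, the equivalent would be passing something like `no_repeat_ngram_size=3` or `repetition_penalty=1.2` to `model.generate` in `make_inference`; those values are illustrative, not tuned.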

### Example in Training Set

In [None]:
make_inference("SmartEyes", "Glasses with real-time translation")

Below is a product and description, please write a marketing email for this product.

### Product:
SmartEyes
### Description:
Glasses with real-time translation

### Marketing Email:
Subject: Say Goodbye to Translation Fear!

Dear [Name],

Are you tired of being chained to your local grocery or drugstore and having to rely on your intuition in choosing a medicine or treatment? We have the solution for you! Introducing SmartEyes!

SmartEyes are the perfect solution for helping you stay up to date with your medical condition or treatment. With our real-time translation feature, you can rest easy knowing that SmartEyes has taken the stress out of knowing that your doctor has recommended the best treatment option for you.

Whether you're an adult with a medical condition, a patient on a treatment plan, or a researcher looking to stay up to date with your condition, SmartEyes is the perfect solution for helping you stay connected with your doctor and stay on track with your treatment plan. With SmartEyes by your side, you can trust that you'll get the care you need from any doctor or hospital visit.

Whether you're an adult with a medical condition, a patient on