<a href="https://colab.research.google.com/github/bensethbell/Building-Generative-AI-Apps/blob/main/%F0%9F%92%AELLMPrompter%F0%9F%92%AEFine_tuning_MIXTRAL_ResponseGen.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tune a Mixtral-based ad generation model using `peft`, `transformers` and `bitsandbytes`

We can use the [Product Descriptions and Ads Dataset](https://huggingface.co/datasets/c-s-ale/Product-Descriptions-and-Ads) to fine-tune Mixtral to be able to generate LLM prompts based on LLM responses!

### Overview of PEFT and LoRA:

Based on some awesome new research [here](https://github.com/huggingface/peft), we can leverage techniques like PEFT and LoRA to train/fine-tune large models a lot more efficiently.

It can't be explained much better than the overview given in the above link:

```
Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of
pre-trained language models (PLMs) to various downstream applications without
fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often
prohibitively costly. In this regard, PEFT methods only fine-tune a small
number of (extra) model parameters, thereby greatly decreasing the
computational and storage costs. Recent State-of-the-Art PEFT techniques
achieve performance comparable to that of full fine-tuning.
```

### Install requirements

First, run the cells below to install the requirements:

In [1]:
!pip install -qU flash-attn --no-build-isolation
!pip install transformers accelerate bitsandbytes peft -qU
!pip install -qU datasets
!pip install -qU trl

[0m

### Model loading

Here let's load the `Mixtral-8x7B` model!

In [None]:
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

bits_and_bytes_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_quant_type="nf4", # "NormalFloat 4-bit," - suited for normally distributed weights, such as those found in neural networks.
    bnb_4bit_use_double_quant=True
)

mixtral_7B = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bits_and_bytes_config,
    attn_implementation="flash_attention_2"
)

mixtral_tokenizer = AutoTokenizer.from_pretrained(model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading shards:   0%|          | 0/19 [00:00<?, ?it/s]

model-00001-of-00019.safetensors:   0%|          | 0.00/4.89G [00:00<?, ?B/s]

model-00002-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00003-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00004-of-00019.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

model-00005-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00006-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00007-of-00019.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

model-00008-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00009-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00010-of-00019.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

model-00011-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00012-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00013-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00014-of-00019.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

model-00015-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00016-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00017-of-00019.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

model-00018-of-00019.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

### Post-processing on the model

Finally, we need to apply some post-processing on the 8-bit model to enable training, let's freeze all our layers, and cast the layer-norm in `float32` for stability. We also cast the output of the last layer in `float32` for the same reasons.

In [None]:
import torch.nn as nn
#not training the model, just training the adapter
for param in mixtral_7B.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability, these are more precision sensitive
    param.data = param.data.to(torch.float32)

mixtral_7B.gradient_checkpointing_enable()  # reduce number of stored activations
mixtral_7B.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
mixtral_7B.lm_head = CastOutputToFloat(mixtral_7B.lm_head)

In [None]:
#new
mixtral_tokenizer.add_special_tokens({'pad_token': '[PAD]'})

In [None]:
#New
import transformers
from torch.cuda.amp import autocast


text = "### Instruction:\nUse the provided input to create an instruction that could have been used to generate the response with an LLM.### Input:\nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.\n\n### Response:"
inputs = mixtral_tokenizer(text, return_tensors="pt")

inputs = inputs.to('cuda')  # Assuming 'inputs' need to be on GPU
with autocast():
    outputs = mixtral_7B.generate(**inputs, max_new_tokens=150)


#outputs = mixtral_7B.generate(**inputs, max_new_tokens=150)
print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True))

In [None]:
!pip install -qU datasets

### Apply LoRA

Here comes the magic with `peft`! Let's load a `PeftModel` and specify that we are going to use low-rank adapters (LoRA) using `get_peft_model` utility function from `peft`.

In [None]:
#next two cells from hugging face
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
print(mixtral_7B)

In [None]:
#New
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM"
)

model = get_peft_model(mixtral_7B, peft_config)
print_trainable_parameters(mixtral_7B)

### Preprocessing

We can simply load our dataset from 🤗 Hugging Face with the `load_dataset` method!

In [None]:
from datasets import load_dataset

dataset = load_dataset("mosaicml/instruct-v3")
dataset = dataset.filter(lambda x: x["source"] == "dolly_hhrlhf")
dataset

In [None]:
dataset["train"] = dataset["train"].select(range(5_000))
dataset["test"] = dataset["test"].select(range(200))
dataset

We want to put our data in the form:

```
Below is a product and description, please write an ad for this product.

### Product and Description:
PRODUCT NAME AND DESCRIPTION HERE

### Ad:
OUR AD HERE
```

This way, we can prompt our model well and receive the responses we want!

This is what fine-tuning, and prompt-engineering, is really all about!

In [None]:
dataset['train']

In [None]:
def create_prompt(sample):
  bos_token = "<s>"
  original_system_message = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
  system_message = "[INST]Use the provided input to create an instruction that could have been used to generate the response with an LLM."
  response = sample["prompt"].replace(original_system_message, "").replace("\n\n### Instruction\n", "").replace("\n### Response\n", "").strip()
  input = sample["response"]
  eos_token = "</s>"

  full_prompt = ""
  full_prompt += bos_token
  full_prompt += system_message
  full_prompt += "\n" + input
  full_prompt += "[/INST]"
  full_prompt += response
  full_prompt += eos_token

  return full_prompt

In [None]:
dataset["train"][0]

In [None]:
create_prompt(dataset['train'][0])

In [None]:
create_prompt(dataset['train'][1])

In [None]:
#NEW (add description - should go earlier?)
model = prepare_model_for_kbit_training(mixtral_7B)
model = get_peft_model(mixtral_7B, peft_config)

In [None]:
#NEW
from transformers import TrainingArguments

args = TrainingArguments(
  output_dir = "mistral_ad_generation",
  #num_train_epochs=5,
  max_steps = 100, # comment out this line if you want to train in epochs
  per_device_train_batch_size = 2,
  warmup_steps = 0.03,
  logging_steps=10,
  save_strategy="epoch",
  #evaluation_strategy="epoch",
  evaluation_strategy="steps",
  eval_steps=20, # comment out this line if you want to evaluate at the end of each epoch
  learning_rate=2e-4,
  bf16=True,
  lr_scheduler_type='constant',
)

In [None]:
from trl import SFTTrainer

max_seq_length = 2048

trainer = SFTTrainer(
  model=model,
  peft_config=peft_config,
  max_seq_length=max_seq_length,
  tokenizer=mixtral_tokenizer,
  packing=True,
  formatting_func=create_prompt,
  args=args,
  train_dataset=dataset["train"],
  eval_dataset=dataset["test"]
)

In [None]:
trainer.train()

In [None]:
merged_model = model.merge_and_unload()

In [None]:
text = "<s>[INST]Use the provided input to create an instruction that could have been used to generate the response with an LLM.\nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.[/INST]"
inputs = mixtral_tokenizer(text, return_tensors="pt")

outputs = merged_model.generate(
    **inputs,
    max_new_tokens=150,
    generation_kwargs={"repetition_penalty" : 1.7}
)
print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True))

In [34]:
def input_from_text(text):
  #formats text for Mixtral
  return "<s>[INST]Use the provided input to create an instruction that could have been used to generate the response with an LLM.\n" + text + "[/INST]"

In [42]:
def get_instruction(text):
  inputs = mixtral_tokenizer(input_from_text(text), return_tensors="pt")

  outputs = merged_model.generate(
      **inputs,
      max_new_tokens=150,
      generation_kwargs={"repetition_penalty" : 1.7}
  )
  # print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True))
  print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True).split("[/INST]")[1])

In [43]:
print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True))

[INST]Use the provided input to create an instruction that could have been used to generate the response with an LLM.
There are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.[/INST] "Describe the characteristics of four common species of grass, including Kentucky Bluegrass, Ryegrass, Fescues, and Bermuda grass, in terms of their color, texture, and growth conditions."


In [44]:
text = "The concept of cuteness in Pokémon is subjective, but three Pokémon frequently cited for their adorable qualities are Pikachu, Eevee, and Jigglypuff. "
get_instruction(text)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 "Generate an instruction for an LLM to create a response about cute Pokémon, mentioning Pikachu, Eevee, and Jigglypuff."


In [46]:
text = "American ex-pats are drawn to a variety of destinations around the world based on factors like climate, cost of living, culture, and ease of integration. Popular countries include Mexico, for its proximity to the United States and affordable cost of living; Spain, known for its rich culture and favorable climate; Portugal, with its beautiful landscapes and welcoming communities; Thailand, offering an affordable cost of living with vibrant culture; and Costa Rica, which is famed for its natural beauty and eco-friendly lifestyle. These countries not only provide a scenic change of pace but also boast strong ex-pat communities that help newcomers feel more at home."
get_instruction(text)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 "Generate a list of popular destinations for American expats, considering factors such as climate, cost of living, culture, and ease of integration."


## Share adapters on the 🤗 Hub

Make sure you have a Hugging Face account, and you have set up a read/write token!

More info here: https://huggingface.co/docs/hub/security-tokens

In [None]:
HUGGING_FACE_USER_NAME = "bsbell21"

In [None]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
mixtral_7B.push_to_hub("llm_instruction_generator", use_auth_token=True)



model-00001-of-00006.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model-00004-of-00006.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00006-of-00006.safetensors:   0%|          | 0.00/524M [00:00<?, ?B/s]

Upload 6 LFS files:   0%|          | 0/6 [00:00<?, ?it/s]

model-00005-of-00006.safetensors:   0%|          | 0.00/4.52G [00:00<?, ?B/s]

model-00003-of-00006.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00002-of-00006.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Bsbell21/llm_instruction_generator/commit/960778e1562b3f609644a66f56be8df352ac909a', commit_message='Upload MixtralForCausalLM', commit_description='', oid='960778e1562b3f609644a66f56be8df352ac909a', pr_url=None, pr_revision=None, pr_num=None)

## Load adapters from the Hub

You can also directly load adapters from the Hub using the commands below:

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = f"{HUGGING_FACE_USER_NAME}/llm_instruction_generator"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)

ValueError: Can't find 'adapter_config.json' at 'bsbell21/llm_instruction_generator'

## Inference

You can then directly use the trained model or the model that you have loaded from the 🤗 Hub for inference!

### Take it for a spin!

In [None]:
from IPython.display import display, Markdown

def make_inference(product_name, product_description):
  batch = tokenizer(f"### Product and Description:\n{product_name}: {product_description}\n\n### Ad:", return_tensors='pt')

  with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=50)

  display(Markdown((tokenizer.decode(output_tokens[0], skip_special_tokens=True))))

[Jigglypuff Boots](https://ae01.alicdn.com/kf/H5be27d910811459692b9342a40000377s/Runway-New-Chunky-Heel-Boots-Lace-Up-Round-Toe-Cute-Carton-Painting-Women-Girls-Shoes-Sweety.jpg)

In [None]:
your_product_name_here = "Jigglypuff Boots"
your_product_description_here = "Leather pink knee high boots with the pokemon Jigglypuff on the knees"

make_inference(your_product_name_here, your_product_description_here)



### Product and Description:
Jigglypuff Boots: Leather pink knee high boots with the pokemon Jigglypuff on the knees

### Ad:
Introducing our Jigglypuff Boots! These adorable knee high boots feature the iconic Pokemon Jigglypuff on the knees, making for a fun & stylish booty set! Limited stock! Don't miss out

### Example in Training Set

Original Ad From Training Set: 'Introducing our latest Lace-up Sandals - where style meets comfort! Wrap your feet in chic designs and experience unbeatable support. Perfect for summer strolls & beachside bliss. Shop now!'

In [None]:
batch = tokenizer("### Product and Description:\n Lace-up sandals: Shoes featuring laces or ties that wrap around the foot and, in some cases, the ankle.\n\n### Ad:", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))





 ### Product and Description:
 Lace-up sandals: Shoes featuring laces or ties that wrap around the foot and, in some cases, the ankle.

### Ad:
Introducing our latest Lace-up Sandals! Wrap your feet in chic comfort and experience a new way to style your feet. Perfect for any occasion, they’re also stunningly chic. Shop now for a fashion-forward look!


In [None]:
#NEW
text = "<s>[INST]The following input contains a product and description, please write an ad for this product.\n\n\n### Product and Description:\n Jigglypuff Boots:  Leather pink knee high boots with the pokemon Jigglypuff on the knees\n\n### Ad:[/INST]</s>"
inputs = mixtral_tokenizer(text, return_tensors="pt")

outputs = merged_model.generate(
    **inputs,
    max_new_tokens=150,
    generation_kwargs={"repetition_penalty" : 1.7}
)
print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


[INST]The following input contains a product and description, please write an ad for this product.


### Product and Description:
 Jigglypuff Boots:  Leather pink knee high boots with the pokemon Jigglypuff on the knees

### Ad:[/INST]"Step into style and comfort with our Jigglypuff Boots! These leather pink knee high boots are not only fashionable but also feature the adorable Pokemon Jigglypuff on the knees. Perfect for any Pokemon fan or anyone looking to add a playful touch to their outfit. The high-quality leather material ensures durability and long-lasting use. Stand out in a crowd and let your love for Jigglypuff shine with these unique and eye-catching boots. Order now and experience the magic of Jigglypuff in a whole new way!"


In [None]:
#NEW
text = "<s>[INST]The following input contains a product and description, please write an ad for this product.\n\n\n### Product and Description:\n Fringe skirt:  A skirt featuring fringe detailing on the bottom, creating movement and fun.\n\n### Ad:[/INST]</s>"
inputs = mixtral_tokenizer(text, return_tensors="pt")

outputs = merged_model.generate(
    **inputs,
    max_new_tokens=150,
    generation_kwargs={"repetition_penalty" : 1.7}
)
print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


[INST]The following input contains a product and description, please write an ad for this product.


### Product and Description:
 Fringe skirt:  A skirt featuring fringe detailing on the bottom, creating movement and fun.

### Ad:[/INST]"Step into style and let the good times roll with our Fringe Skirt! This show-stopping skirt features eye-catching fringe detailing on the bottom that adds movement and flair to any outfit. Perfect for a night out or a special occasion, this skirt is sure to turn heads and make you feel confident and fabulous. Made from high-quality materials, the Fringe Skirt is comfortable and easy to wear, allowing you to dance the night away without worry. Don't miss out on this must-have addition to your wardrobe. Order now and add some excitement to your look!"


Previous answer: Are you ready to add some movement and fun to your wardrobe? Look no further than our Fringe Skirt! This skirt is anything but ordinary, with fringe detailing on the bottom that adds a unique and eye-catching touch. Whether you're dressing up for a night out or dressing down for a casual day, this skirt is the perfect addition to any outfit. Plus, the fringe detailing creates a fun and flirty movement that is sure to turn heads. Don't miss out on this must-have skirt - order yours today and add some excitement to your wardrobe!

New answer: "Step into style and let the good times roll with our Fringe Skirt! This show-stopping skirt features eye-catching fringe detailing on the bottom that adds movement and flair to any outfit. Perfect for a night out or a special occasion, this skirt is sure to turn heads and make you feel confident and fabulous. Made from high-quality materials, the Fringe Skirt is comfortable and easy to wear, allowing you to dance the night away without worry. Don't miss out on this must-have addition to your wardrobe. Order now and add some excitement to your look!"
:

In [None]:
text = "### Instruction:\nUse the provided input to create an instruction that could have been used to generate the response with an LLM.### Input:\nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.\n\n### Response:"
inputs = mixtral_tokenizer(text, return_tensors="pt")

outputs = mixtral_7B.generate(**inputs, max_new_tokens=150)
print(mixtral_tokenizer.decode(outputs[0], skip_special_tokens=True))

### Example outside of Training Set

In [None]:
batch = tokenizer("### Product and Description:\nSundress: A flowery yellow sundress with blue polka dots. \n\n### Ad:", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 ### Product and Description:
Sundress: A flowery yellow sundress with blue polka dots. 

### Ad:
Discover a chic, flowery-yellow sundress for a versatile look! Embellished with blue polka dots, this fashion-forward piece is perfect for any occasion. Shop now! #Sundress #Fashion #F


### Example outside of immediate domain

In [None]:
batch = tokenizer("### Product and Description:\n A new Lexus: A luxury automobile with grey paint and tinted windows.\n\n### Ad:", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 ### Product and Description:
 A new Lexus: A luxury automobile with grey paint and tinted windows.

### Ad:
Introducing the Lexus:  Exotic, sophistication meets chic travel meets luxury. Discover the ultimate vehicle for ultimate style and comfort. Shop now for a dazzling new life! #LexusNewLife #LuxuryAuto #T


In [None]:
import pandas as pd
dataset.set_format("pandas")

In [None]:
dataset['train'].set_format("pandas")
df

In [None]:
dataset.filter(lambda row: "Lace-up" in row["product"])['train'][0]['ad'].values

array(['Introducing our latest Lace-up Sandals - where style meets comfort! Wrap your feet in chic designs and experience unbeatable support. Perfect for summer strolls & beachside bliss. Shop now!'],
      dtype=object)

As you can see by fine-tuning for few steps we have almost recovered the exact quote from the training data.