<a href="https://colab.research.google.com/github/Shriansh16/LLM_Engineering/blob/main/19_fine_tuning_llama_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# pip installs

!pip install -q datasets==2.21.0 requests torch peft bitsandbytes transformers==4.43.1 trl accelerate sentencepiece wandb matplotlib


In [2]:
# imports

import os
import re
import math
from tqdm import tqdm
from huggingface_hub import login
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, set_seed
from datasets import load_dataset, Dataset, DatasetDict
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
from datetime import datetime
import matplotlib.pyplot as plt

In [3]:
# Constants

BASE_MODEL = "meta-llama/Llama-2-7b-hf"
PROJECT_NAME = "pricerml"
HF_USER = "MLsheenu" # your HF name here!

# Data

DATASET_NAME = f"{HF_USER}/pricer-data"
# Or just use the one I've uploaded
# DATASET_NAME = "ed-donner/pricer-data"
MAX_SEQUENCE_LENGTH = 182

# Run name for saving the model in the hub

RUN_NAME =  f"{datetime.now():%Y-%m-%d_%H.%M.%S}"
PROJECT_RUN_NAME = f"{PROJECT_NAME}-{RUN_NAME}"
HUB_MODEL_NAME = f"{HF_USER}/{PROJECT_RUN_NAME}"

# Hyperparameters for QLoRA

LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]
LORA_DROPOUT = 0.1
QUANT_4_BIT = False
# Mixed precision: values match the fp16/bf16 flags hard-coded in SFTConfig below
fp16 = False
bf16 = True  # bfloat16 is also the compute dtype used in the quantization config

# Hyperparameters for Training

EPOCHS = 1
BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 1e-4
LR_SCHEDULER_TYPE = 'cosine'
WARMUP_RATIO = 0.03
OPTIMIZER = "paged_adamw_32bit"

# Admin config

%matplotlib inline

In [4]:
HUB_MODEL_NAME

'MLsheenu/pricerml-2024-10-21_13.03.24'

In [5]:
# Log in to HuggingFace

hf_token = ''
login(hf_token, add_to_git_credential=True)

Token is valid (permission: write).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful
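
Hard-coding the token in a cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token has been stored in an environment variable named `HF_TOKEN`; the helper name `read_hf_token` is hypothetical and not part of the notebook:

```python
import os

def read_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in an environment variable."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable before logging in")
    return token

# Usage in the notebook (needs huggingface_hub and network access):
# from huggingface_hub import login
# login(read_hf_token(), add_to_git_credential=True)
#
# On Colab, the secret can instead be read with:
# from google.colab import userdata
# hf_token = userdata.get("HF_TOKEN")
```

Failing loudly on a missing token avoids a confusing authentication error later, when the dataset or model download starts.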


In [6]:
dataset = load_dataset(DATASET_NAME)
train = dataset['train']
test = dataset['test']

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/3625 [00:00<?, ? examples/s]

In [24]:
train

Dataset({
    features: ['text', 'price'],
    num_rows: 25000
})

In [25]:
train_subset = train.select(range(100))

In [28]:
train_subset[1]

{'text': "How much does this cost to the nearest dollar?\n\nDoor Pivot Block - Compatible Kenmore KitchenAid Maytag Whirlpool Refrigerator - Replaces - Quick DIY Repair Solution\nPivot Block For Vernicle Mullion Strip On Door - A high-quality exact equivalent for part numbers and Compatibility with major brands - Door Guide is compatible with Whirlpool, Amana, Dacor, Gaggenau, Hardwick, Jenn-Air, Kenmore, KitchenAid, and Maytag. Quick DIY repair - Refrigerator Door Guide Pivot Block Replacement will help if your appliance door doesn't open or close. Wear work gloves to protect your hands during the repair process. Attentive support - If you are uncertain about whether the block fits your refrigerator, we will help. We generally put forth a valiant effort to guarantee you are totally\n\nPrice is $17.00",
 'price': 16.52}

In [21]:
# train1 is not defined in any cell shown here; use train[0] or train_subset[0] instead
train1[0]

KeyError: 0

In [22]:
test[0]

{'text': 'How much does this cost to the nearest dollar?\n\nDPD Washer Lid Lock Latch Switch Assembly Fits for Maytag Centennial Washer Whirlpool Kenmore Washer Replaces\nPart washer lid lock switch replaces： This washer lid lock replacement works with the following products Whirlpool, Maytag, Kenmore, Amana. Contact Us If you are not sure if part is correct, ask us in Customer questions & answers section or contact us by visiting the Discount Parts Direct storefront. Package Includes 1 x lid lock switch assembly is a 4-wire switch, 2 x bezels (white and grey), 1 x instructions Part numbers etc. Works For Brands washer lid lock replacement Compatible with Whirlpool, Kenmore, Amana,Maytag centennial washer. PREMIUM QUALITY Lid Lock Latch Switch detects if the washer\n\nPrice is $',
 'price': 21.99}
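
Training rows end with the rounded price (`Price is $17.00`), while test rows stop at `Price is $` so that the model has to complete the number. A hedged sketch of how a completion could be parsed back into a float for evaluation; `extract_price` is a hypothetical helper and the shortened example strings are made up for illustration:

```python
import re

def extract_price(text: str) -> float:
    """Return the first number after 'Price is $', or 0.0 if none is found."""
    if "Price is $" not in text:
        return 0.0
    # Drop thousands separators so "1,299.50" parses as one number
    tail = text.split("Price is $", 1)[1].replace(",", "")
    match = re.search(r"\d+(?:\.\d+)?", tail)
    return float(match.group()) if match else 0.0

train_text = "How much does this cost to the nearest dollar?\n\nDoor Pivot Block\n\nPrice is $17.00"
test_prompt = "How much does this cost to the nearest dollar?\n\nDPD Washer Lid Lock\n\nPrice is $"

print(extract_price(train_text))   # 17.0
print(extract_price(test_prompt))  # 0.0 (nothing generated yet)
```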

In [8]:
# pick the right quantization

if QUANT_4_BIT:
  quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
  )
else:
  # bnb_8bit_compute_dtype is not a BitsAndBytesConfig option (transformers
  # ignores it with an "Unused kwargs" warning), so only load_in_8bit is set
  quant_config = BitsAndBytesConfig(load_in_8bit=True)



In [9]:
# Load the Tokenizer and the Model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=quant_config,
    device_map="auto",
)
base_model.generation_config.pad_token_id = tokenizer.pad_token_id

print(f"Memory footprint: {base_model.get_memory_footprint() / 1e6:.1f} MB")

Memory footprint: 7000.8 MB
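
The reported footprint is close to a back-of-the-envelope estimate. The sketch below assumes roughly 6.74 billion parameters for Llama-2-7b and counts weights only. It ignores layers that usually stay in higher precision (such as embeddings, the output head, and norms), so it slightly underestimates the measured value.

```python
LLAMA_2_7B_PARAMS = 6.74e9  # approximate parameter count (assumption for this estimate)

def estimate_footprint_mb(n_params: float, bits_per_param: int) -> float:
    """Weights-only memory estimate in MB (1 MB = 1e6 bytes, as in the cell above)."""
    return n_params * bits_per_param / 8 / 1e6

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {estimate_footprint_mb(LLAMA_2_7B_PARAMS, bits):,.0f} MB")
```

The 8-bit estimate of about 6,740 MB sits a little below the 7000.8 MB measured above, and switching `QUANT_4_BIT` to `True` would roughly halve it again.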


In [10]:
# Compute the loss only on the tokens that follow the response template (the price)

from trl import DataCollatorForCompletionOnlyLM
response_template = "Price is $"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
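
`DataCollatorForCompletionOnlyLM` restricts the loss to the tokens after the response template, so the model is trained only to predict the price. The following is a simplified pure-Python sketch of that idea, not TRL's actual implementation, and the token ids are made up. It locates the template's token ids inside the input ids and sets every label up to the end of the template to the ignore index `-100`. If the template ids never appear, for example because the tokenizer encodes the template differently in isolation than in context, every label is masked and the example contributes nothing to the loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def find_subsequence(haystack: list[int], needle: list[int]) -> int:
    """Return the start index of needle in haystack, or -1 if absent."""
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return -1

def mask_prompt(input_ids: list[int], template_ids: list[int]) -> list[int]:
    """Copy input_ids into labels, masking everything up to and including the template."""
    start = find_subsequence(input_ids, template_ids)
    if start == -1:
        # Template not found: the whole example is ignored
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(template_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]

# Made-up ids: 11-13 are the prompt, 7-9 the template, 41-42 the price tokens
input_ids = [11, 12, 13, 7, 8, 9, 41, 42]
print(mask_prompt(input_ids, [7, 8, 9]))   # [-100, -100, -100, -100, -100, -100, 41, 42]
print(mask_prompt(input_ids, [70, 8, 9]))  # all -100: template not found
```

The TRL documentation describes a workaround for context-sensitive tokenizers such as Llama 2's: encode the template together with its preceding context and pass the resulting token ids to the collator instead of the string.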

In [29]:
# First, specify the configuration parameters for LoRA

lora_parameters = LoraConfig(
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    r=LORA_R,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=TARGET_MODULES,
)

# Next, specify the general configuration parameters for training

train_parameters = SFTConfig(
    output_dir=PROJECT_RUN_NAME,
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=1,
    eval_strategy="no",
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    optim=OPTIMIZER,
    save_total_limit=10,
    learning_rate=LEARNING_RATE,
    weight_decay=0.001,
    fp16=False,
    bf16=True,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=WARMUP_RATIO,
    group_by_length=True,
    dataset_text_field="text",
    lr_scheduler_type=LR_SCHEDULER_TYPE,
)

# And now, the Supervised Fine Tuning Trainer will carry out the fine-tuning
# Given these 2 sets of configuration parameters

fine_tuning = SFTTrainer(
    model=base_model,
    train_dataset=train_subset,
    peft_config=lora_parameters,
    tokenizer=tokenizer,
    args=train_parameters,
    data_collator=collator
)




In [30]:
# Fine-tune!
fine_tuning.train()



Vulcan Hart VULCAN HART Door Gasket 26.25 & 75.75 2 Pieces Metal & Wovan For Vulcan Oven Retro 321163
SPECIFICATIONS LENGTH 86, 2.2 m WEIGHT.565 lb TYPE BRAIDED FIBERGLASS GASKET PART REFERENCE INFO VULCAN-HART MODEL REFERENCE INFO VULCAN-HART MODELS GH30, GH30C, GH45, GH56, GH56S, GH6, GH6C, GH6S, GH60, GH60T, GH72, VULCAN-HART OVEN MODELS SG7800 SERIES VULCAN-HART OVENS & RANGES MODELS SG7800 VULCAN-HART RANGE MODELS GH45 V

Price is $102.00 This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the `max_seq_length`.

Cma Dish Machines Heater Thermostat
Product Description HEATER THERMOSTAT (EGO). Cma Dish Machines Genuine OEM replacement part. CMA Dish Machines produces both High-temperature and Low-temperature chemical sanitizing Dish machines, Glass washers and Warewashing equipment. Use genuine OEM parts for safety reliability and performance. From the Manufacturer HEATER THERMOSTAT (EGO). Cma Dish Machines Genuine OEM replacement par



Aqua Fresh GE MWF Refrigerator Water Filter Replacement Compatible with GE SmartWater MWF, MWFA, MWFP, MWFINT, GWF, HDX FMG-1, (2 Pack)
EASY TO FIT Designed to fit the original with Twist and lock Design. No tools required. AFFORDABLE OPTION Costs less than OEM filters without compromising any quality or flow rate. CERTIFIED QUALITY All Aquafresh Filters are tested and certified by IAPMO to NSF Standard 42 for Structural Integrity, Materials Safety, Chlorine taste, odor reduction and System performance. Quality you can taste! HIGH EFFICIENCY FILTRATION Activated carbon blocks certified to ensure contaminant reduction for 300 Gallons Or 6 Months, depending On Water Usage And Quality COMPATIBLE With

Price is $23.00 This instance will be ignored in loss calculation. Note, if this happens often, consider increasing the `max_seq_length`.

2-Pack Premium High-Flow Stainless Steel Washing Machine Hoses - 4 FT No-Lead Burst Proof Red and Blue Lined Water Inlet Supply Lines - Universal 90 Deg

TrainOutput(global_step=25, training_loss=0.0, metrics={'train_runtime': 217.5279, 'train_samples_per_second': 0.46, 'train_steps_per_second': 0.115, 'total_flos': 911465778241536.0, 'train_loss': 0.0, 'epoch': 1.0})
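
The `global_step=25` in this output follows from the settings used: 100 training examples, a per-device batch size of 1, and 4 gradient accumulation steps give 25 optimizer updates for one epoch. The sketch below reproduces that arithmetic along with a cosine schedule with linear warmup. The schedule formula is a simplified version written for illustration, assuming a single device, and is not copied from the library.

```python
import math

NUM_EXAMPLES = 100  # size of train_subset
BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
EPOCHS = 1
LEARNING_RATE = 1e-4
WARMUP_RATIO = 0.03

# One optimizer update consumes BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS examples
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
total_steps = math.ceil(NUM_EXAMPLES / effective_batch) * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def cosine_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates (linear warmup, then cosine decay)."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(f"optimizer steps: {total_steps}, warmup steps: {warmup_steps}")
for step in (0, 1, 13, 25):
    print(step, f"{cosine_lr(step):.2e}")
```

With only 25 steps, the 3% warmup rounds up to a single step, so the learning rate reaches its peak almost immediately and decays for the rest of the run.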

In [31]:
fine_tuning.model.push_to_hub(PROJECT_RUN_NAME, private=True)
print(f"Saved to the hub: {PROJECT_RUN_NAME}")

adapter_model.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]

Saved to the hub: pricerml-2024-10-21_13.03.24
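
The 67.1M adapter upload is consistent with the LoRA settings. A rough check, assuming Llama-2-7b's hidden size of 4096 and 32 decoder layers, square attention projections, and adapter weights stored as 32-bit floats (an assumption, though it matches the file size):

```python
HIDDEN_SIZE = 4096   # Llama-2-7b hidden dimension
NUM_LAYERS = 32      # Llama-2-7b decoder layers
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]

# Each adapted projection gets two small matrices: A (r x in) and B (out x r)
params_per_module = LORA_R * (HIDDEN_SIZE + HIDDEN_SIZE)
trainable_params = params_per_module * len(TARGET_MODULES) * NUM_LAYERS
size_mb = trainable_params * 4 / 1e6  # 4 bytes per float32 weight

print(f"Trainable LoRA parameters: {trainable_params:,}")
print(f"Approximate adapter size: {size_mb:.1f} MB")
print(f"LoRA scaling (alpha / r): {LORA_ALPHA / LORA_R}")
```

Only these adapter weights are pushed to the hub. The base model has to be loaded separately when the fine-tuned model is used for inference.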
