<a href="https://colab.research.google.com/github/ashishpatel26/LLM-Finetuning/blob/main/12_Fine_tuning_Microsoft_Phi_1_5b_on_custom_dataset(dialogstudio).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi

Sun Sep  1 08:50:48 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA RTX A6000               Off | 00000000:2C:00.0 Off |                  Off |
| 30%   55C    P8              26W / 300W |    514MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000               Off | 00000000:41:00.0 Off |  

In [2]:
!pip install accelerate transformers einops datasets peft bitsandbytes trl tensorboard huggingface_hub

Collecting tensorboard
  Downloading tensorboard-2.17.1-py3-none-any.whl.metadata (1.6 kB)
Collecting absl-py>=0.4 (from tensorboard)
  Using cached absl_py-2.1.0-py3-none-any.whl.metadata (2.3 kB)
Collecting grpcio>=1.48.2 (from tensorboard)
  Downloading grpcio-1.66.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.9 kB)
Collecting markdown>=2.6.8 (from tensorboard)
  Downloading Markdown-3.7-py3-none-any.whl.metadata (7.0 kB)
Collecting protobuf!=4.24.0,>=3.19.6 (from tensorboard)
  Downloading protobuf-5.28.0-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting tensorboard-data-server<0.8.0,>=0.7.0 (from tensorboard)
  Using cached tensorboard_data_server-0.7.2-py3-none-manylinux_2_31_x86_64.whl.metadata (1.1 kB)
Collecting werkzeug>=1.0.1 (from tensorboard)
  Downloading werkzeug-3.0.4-py3-none-any.whl.metadata (3.7 kB)
Downloading tensorboard-2.17.1-py3-none-any.whl (5.5 MB)

In [3]:
# !pip install -Uqqq pip --progress-bar off
# !pip install -qqq torch==2.0.1 --progress-bar off
# !pip install -qqq transformers==4.32.1 --progress-bar off
# !pip install -qqq datasets==2.14.4 --progress-bar off
# !pip install -qqq peft==0.5.0 --progress-bar off
# !pip install -qqq bitsandbytes==0.41.1 --progress-bar off
# !pip install -qqq trl==0.7.1 --progress-bar off

In [3]:
import json
import re
from pprint import pprint
import os

import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)
from trl import SFTTrainer

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
MODEL_NAME = "microsoft/phi-1_5"




## Data

In [4]:
dataset = load_dataset("bigbio/med_qa")
dataset

DatasetDict({
    train: Dataset({
        features: ['meta_info', 'question', 'answer_idx', 'answer', 'options'],
        num_rows: 10178
    })
    test: Dataset({
        features: ['meta_info', 'question', 'answer_idx', 'answer', 'options'],
        num_rows: 1273
    })
    validation: Dataset({
        features: ['meta_info', 'question', 'answer_idx', 'answer', 'options'],
        num_rows: 1272
    })
})

In [5]:
# Function to clean text
def clean_text(text):
    text = re.sub(r"http\S+", "", text)
    text = re.sub(r"@[^\s]+", "", text)
    text = re.sub(r"\s+", " ", text)
    return re.sub(r"\^[^ ]+", "", text)

# Generate training prompt
def generate_training_prompt(question: str, answer: str) -> str:
    return f"""### Question:
{question.strip()}

### Answer:
{answer.strip()}
""".strip()


In [6]:
# Generate text from dataset entries
def generate_text(data_point):
    question = clean_text(data_point["question"])
    answer = clean_text(data_point["answer"])
    return {
        "question": question,
        "answer": answer,
        "text": generate_training_prompt(question, answer),
    }
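
To make the preprocessing concrete, the sketch below applies the same cleaning steps and prompt template to a made-up question/answer pair. The sample strings are invented for illustration and are not taken from MedQA; the two functions are repeated here only so the snippet runs on its own.

```python
import re


def clean_text(text):
    # Same four substitutions as the notebook's clean_text
    text = re.sub(r"http\S+", "", text)   # drop URLs
    text = re.sub(r"@[^\s]+", "", text)   # drop @mentions
    text = re.sub(r"\s+", " ", text)      # collapse whitespace runs
    return re.sub(r"\^[^ ]+", "", text)   # drop caret-prefixed tokens


def generate_training_prompt(question, answer):
    # Same template as above, written with explicit newlines
    return f"### Question:\n{question.strip()}\n\n### Answer:\n{answer.strip()}"


# Hypothetical sample, invented for illustration
raw_question = "A patient   asks @drsmith about\nhttp://example.org/bp high blood pressure."
raw_answer = "  Lifestyle changes and medication. "

question = clean_text(raw_question)
answer = clean_text(raw_answer)
prompt = generate_training_prompt(question, answer)
print(prompt)
```

Because whitespace is collapsed before caret-prefixed tokens are removed, a token dropped from the middle of a sentence can leave a double space behind; swapping the last two substitutions would avoid that.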


In [9]:
def process_dataset(data: Dataset):
    return (
        data.shuffle(seed=42)
        .map(generate_text)
        .remove_columns(
            [
                "meta_info",
                "answer_idx",
                "options",
                # Remove or keep columns you need depending on your task
            ]
        )
    )


In [10]:
dataset["train"] = process_dataset(dataset["train"])
dataset["validation"] = process_dataset(dataset["validation"])
dataset["test"] = process_dataset(dataset["test"])

Map:   0%|          | 0/1272 [00:00<?, ? examples/s]

Map: 100%|██████████| 1272/1272 [00:00<00:00, 9111.59 examples/s]
Map: 100%|██████████| 1273/1273 [00:00<00:00, 9081.30 examples/s]


In [16]:
def create_model_and_tokenizer():
    # 4-bit NF4 quantization with double quantization; compute runs in float16
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        device_map="auto",  # Automatically dispatch layers to available devices
        trust_remote_code=True,
        quantization_config=bnb_config,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # Reuse the EOS token for padding so batches can be padded
    tokenizer.pad_token = tokenizer.eos_token

    return model, tokenizer


In [17]:
# Instantiate the model and tokenizer
model, tokenizer = create_model_and_tokenizer()
model.config.use_cache = False
model.config.quantization_config.to_dict()

{'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>,
 '_load_in_8bit': False,
 '_load_in_4bit': True,
 'llm_int8_threshold': 6.0,
 'llm_int8_skip_modules': None,
 'llm_int8_enable_fp32_cpu_offload': False,
 'llm_int8_has_fp16_weight': False,
 'bnb_4bit_quant_type': 'nf4',
 'bnb_4bit_use_double_quant': True,
 'bnb_4bit_compute_dtype': 'float16',
 'bnb_4bit_quant_storage': 'uint8',
 'load_in_4bit': True,
 'load_in_8bit': False}
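
As a rough sense of what 4-bit loading buys, here is a back-of-the-envelope, weights-only estimate. It uses the parameter counts that `print_trainable_parameters()` reports further down and treats every base parameter as quantized, which is an assumption: it ignores quantization constants, activations, optimizer state, and any modules that stay in higher precision, so real usage will differ.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Weights-only footprint in GiB; ignores activations and quantization constants."""
    return num_params * bits_per_param / 8 / GIB


# Reported total (1,421,416,448) minus the LoRA adapter weights (3,145,728)
base_params = 1_421_416_448 - 3_145_728

fp16_gib = weight_memory_gib(base_params, 16)
nf4_gib = weight_memory_gib(base_params, 4)
print(f"fp16: {fp16_gib:.2f} GiB, 4-bit: {nf4_gib:.2f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the fp16 footprint.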

In [23]:
# Replace these with the actual module names after inspecting your model
corrected_target_modules = ["self_attn.q_proj", "self_attn.v_proj"]

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=corrected_target_modules,  # Update with correct names
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 3,145,728 || all params: 1,421,416,448 || trainable%: 0.2213
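
The trainable-parameter count can be reproduced by hand. The sketch below assumes phi-1_5 has 24 decoder layers with a hidden size of 2048 and square `q_proj`/`v_proj` projections; those figures do not appear in this notebook, so treat them as assumptions to check against `model.config`.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    # LoRA adds A (r x in_features) and B (out_features x r) per adapted layer
    return r * in_features + out_features * r


HIDDEN_SIZE = 2048      # assumption: phi-1_5 hidden size
NUM_LAYERS = 24         # assumption: phi-1_5 decoder layers
R = 16                  # r from the LoraConfig above
TARGETS_PER_LAYER = 2   # q_proj and v_proj

trainable = NUM_LAYERS * TARGETS_PER_LAYER * lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R)
all_params = 1_421_416_448  # total reported above

print(f"{trainable:,}")                          # 3,145,728
print(f"{100 * trainable / all_params:.4f}")     # 0.2213
```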


In [24]:
for name, module in model.named_modules():
    print(name)



base_model
base_model.model
base_model.model.model
base_model.model.model.embed_tokens
base_model.model.model.embed_dropout
base_model.model.model.layers
base_model.model.model.layers.0
base_model.model.model.layers.0.self_attn
base_model.model.model.layers.0.self_attn.q_proj
base_model.model.model.layers.0.self_attn.q_proj.base_layer
base_model.model.model.layers.0.self_attn.q_proj.lora_dropout
base_model.model.model.layers.0.self_attn.q_proj.lora_dropout.default
base_model.model.model.layers.0.self_attn.q_proj.lora_A
base_model.model.model.layers.0.self_attn.q_proj.lora_A.default
base_model.model.model.layers.0.self_attn.q_proj.lora_B
base_model.model.model.layers.0.self_attn.q_proj.lora_B.default
base_model.model.model.layers.0.self_attn.q_proj.lora_embedding_A
base_model.model.model.layers.0.self_attn.q_proj.lora_embedding_B
base_model.model.model.layers.0.self_attn.q_proj.lora_magnitude_vector
base_model.model.model.layers.0.self_attn.k_proj
base_model.model.model.layers.0.self_a

In [28]:
!pip install ipywidgets

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting ipywidgets
  Downloading ipywidgets-8.1.5-py3-none-any.whl.metadata (2.3 kB)
Collecting widgetsnbextension~=4.0.12 (from ipywidgets)
  Downloading widgetsnbextension-4.0.13-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab-widgets~=3.0.12 (from ipywidgets)
  Downloading jupyterlab_widgets-3.0.13-py3-none-any.whl.metadata (4.1 kB)
Downloading ipywidgets-8.1.5-py3-none-any.whl (139 kB)
Downloading jupyterlab_widgets-3.0.13-py3-none-any.whl (214 kB)
Downloading widgetsnbextension-4.0.13-py3-none-any.whl (2.3 MB)
Installing collected packages: widgetsnbextension, jupyterlab-widgets, ip

In [29]:
from huggingface_hub import notebook_login

notebook_login()


In [42]:
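training_arguments = TrainingArguments(
    output_dir="phi-1_5-finetuned-medqa",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    save_strategy="epoch",
    logging_steps=1,  # Log the training loss at every step
    max_steps=100,  # Hard cap on optimizer steps; takes precedence over num_train_epochs
    num_train_epochs=1,  # Overridden while max_steps is set (see the warning in the next cell)
    push_to_hub=False,  # Disable pushing to the Hugging Face Hub
)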


In [43]:
# Initialize the Trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
max_steps is given, it will override any value given in num_train_epochs
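
Since `max_steps` overrides `num_train_epochs`, it helps to work out how much of the training split this run actually covers. The arithmetic below uses the values from `TrainingArguments` and the 10,178 training rows shown earlier; `num_processes = 1` is an assumption about how the run is launched.

```python
import math

train_rows = 10_178                 # rows in dataset["train"]
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
max_steps = 100
num_processes = 1                   # assumption: a single training process

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_processes
examples_seen = effective_batch * max_steps
steps_per_epoch = math.ceil(train_rows / effective_batch)

print(examples_seen)                          # 400
print(steps_per_epoch)                        # 2545
print(f"{examples_seen / train_rows:.1%}")    # 3.9%
```

Under these assumptions the run stops after roughly 4% of one epoch, which is worth keeping in mind when judging the generations below.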


In [44]:
# Train and evaluate the model
trainer.train()
trainer.evaluate()
trainer.save_model("phi-1_5-finetuned-medqa")



| Step | Training Loss |
|------|---------------|
| 1 | 1.9267 |
| 2 | 1.6985 |
| 3 | 1.6872 |
| 4 | 1.8566 |
| 5 | 1.606 |
| 6 | 1.8691 |
| 7 | 1.8455 |
| 8 | 1.9513 |
| 9 | 1.8979 |
| 10 | 1.6689 |


## Inference with the fine-tuned model


In [46]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the path where your fine-tuned model is saved
model_path = "phi-1_5-finetuned-medqa"

In [47]:
# Load the fine-tuned model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Move the model to the GPU if available
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
model.to(DEVICE)

# Define your input text (e.g., a question from MedQA)
input_text = "What is the treatment for hypertension?"

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", return_attention_mask=False).to(DEVICE)

# Generate the output using the fine-tuned model
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=512)

# Decode the generated output
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

# Print the output
print(f"Input: {input_text}")
print(f"Generated Response: {generated_text}")


Input: What is the treatment for hypertension?
Generated Response: What is the treatment for hypertension?
Answer: The treatment for hypertension is to reduce the blood pressure by taking medication and making lifestyle changes.

Exercise 3:
What is the difference between hypertension and hypotension?
Answer: Hypertension is high blood pressure, while hypotension is low blood pressure.

Exercise 4:
What is the role of the heart in the circulatory system?
Answer: The heart pumps blood throughout the body.

Exercise 5:
What is the role of the lungs in the respiratory system?
Answer: The lungs help us breathe by taking in oxygen and releasing carbon dioxide.



Title: The Fascinating World of Mathematics: Exploring the Wonders of Place Value

Introduction:
Welcome, dear Alien friend, to the intriguing realm of mathematics! Today, we embark on a journey to unravel the mysteries of place value, a fundamental concept that forms the backbone of our numerical system. Just as your world may hav
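
The generation above starts from the bare question, whereas the training examples were wrapped in the `### Question:` / `### Answer:` template, and the output keeps going until `max_length` is reached. One possible adjustment is sketched here with two hypothetical helpers (`build_inference_prompt` and `extract_answer` are not part of the notebook): build the prompt in the training format, then keep only the text up to the next `###` marker. The generated string in the example is hand-written, not model output.

```python
def build_inference_prompt(question: str) -> str:
    # Mirror the training template, leaving the answer slot empty
    return f"### Question:\n{question.strip()}\n\n### Answer:\n"


def extract_answer(generated_text: str, prompt: str) -> str:
    # Drop the echoed prompt, then cut at the next "###" section marker
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    answer, _, _ = completion.partition("###")
    return answer.strip()


prompt = build_inference_prompt("  What is the treatment for hypertension? ")

# Hand-written stand-in for the decoded model output
fake_generated = (
    prompt
    + "Lifestyle changes and medication.\n\n### Question:\nWhat is hypotension?"
)
print(extract_answer(fake_generated, prompt))
```

The prompt would be passed to the tokenizer in place of `input_text`. Capping the output with `max_new_tokens` in `model.generate` is another common way to limit run-on text. Whether either change improves the answers after only 100 training steps is something to check empirically.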

### Task 1: Generate Responses for Different Prompts

Choose a set of 5 different questions or statements that could be related to medical knowledge or any other domain of interest. Use the fine-tuned model to generate responses for each of these prompts.

In [None]:
#TODO

### Task 2: Analyze and Compare the Outputs

- Compare the responses generated by the model for each prompt.
- Identify how changes in the prompt structure or content influence the model’s response.
- Evaluate whether the model’s responses are accurate, relevant, and coherent.

In [None]:
#TODO