<a href="https://colab.research.google.com/github/Jay-mishra04/Medicine-Chatbot-Fine-Tuned-LLM-Poject/blob/main/LLM_Bot_Using_Pre_Trained_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - Specialized LLM Bot Using Pre-Trained Models
##### **Contribution**    - Individual
##### **Team Member 1 -**  Mritunjay Mishra


# **Project Summary -**

This project involves the development of a Healthcare and Pharmaceuticals Industry-Specific Large Language Model (LLM) Bot, designed to provide accurate and contextually relevant medical information. The focus is on creating an intelligent conversational agent capable of answering queries about medicines, including their composition, uses, and side effects, thereby enhancing access to reliable drug-related knowledge.

For data collection, a custom dataset was built by scraping the 1mg website, one of India’s leading online pharmacies. The dataset includes structured information such as medicine names, compositions, uses, side effects, and images. Example entries include widely prescribed drugs like Avastin 400mg Injection, Augmentin 625 Duo Tablet, and Azithral 500 Tablet. This ensures that the LLM Bot is trained on authentic, real-world pharmaceutical data, making it capable of addressing patient and healthcare-related queries effectively.

A pre-trained model from Hugging Face was fine-tuned on this dataset using a Google Colab T4 GPU, staying within the training limit of 25 epochs; the runs in this notebook are configured for 10 epochs (Phi-3-mini) and 6 epochs (TinyLlama-1.1B). Fine-tuning makes the model contextually aware of drug-specific information while maintaining its general language understanding.

The resulting LLM Bot can interact with users in natural language, providing instant answers regarding drug uses, side effects, and compositions. For instance, when asked “What are the uses of Avastin 400mg Injection?”, the bot can correctly respond with indications such as colon cancer, lung cancer, kidney cancer, brain tumor, ovarian cancer, and cervical cancer. Similarly, it can explain potential side effects like rectal bleeding, high blood pressure, or dry skin.

The project is showcased through an explanatory video, demonstrating the bot’s ability to answer medical queries in a clear and user-friendly manner. This implementation highlights the real-world application of LLMs in healthcare, supporting both patients and professionals in quick access to trusted drug information. The work will be further extended in the Industry Immersion module through a research paper analyzing the role of LLMs in improving healthcare accessibility and pharmaceutical knowledge dissemination.

# **GitHub Link -**

https://github.com/Jay-mishra04/Medicine-Chatbot-Fine-Tuned-LLM-Poject.git


# **Problem Statement**


In the healthcare and pharmaceutical sector, access to reliable, easy-to-understand drug information is a persistent challenge. Patients often struggle to find accurate details about medicines—such as their uses, side effects, and compositions—while healthcare professionals face time constraints in addressing repetitive queries. Although online resources exist, the information is often scattered, unstructured, or too technical for general users. This gap can lead to misunderstanding of prescriptions, improper medication usage, and reduced patient confidence in digital healthcare solutions.

To address this issue, there is a need for an intelligent conversational system that can provide instant, trustworthy, and contextually relevant information about medicines. By leveraging Large Language Models (LLMs) fine-tuned on authentic pharmaceutical data (e.g., from trusted sources like 1mg), such a system can enhance patient awareness, reduce dependency on fragmented web searches, and assist healthcare providers in delivering better support.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import re

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv("medicine_data.csv")

### Dataset First View

In [None]:
# Dataset First Look
df.head(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, columns = df.shape
print("Rows:", rows)
print("Columns:", columns)

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()

### What do you know about your dataset?

The dataset was created by web scraping from the 1mg website, which provides detailed pharmaceutical information. It contains 11,825 rows and 5 columns, structured as follows:

- name – The commercial name of the medicine (e.g., Avastin 400mg Injection).
- composition – The active ingredients and their concentrations (e.g., Bevacizumab (400mg)).
- uses – The therapeutic uses or conditions for which the medicine is prescribed (e.g., colon cancer, lung cancer, kidney cancer).
- side_effects – Possible adverse effects associated with the medicine (e.g., headache, nausea, diarrhea).
- image_url – A link to the product image available on the 1mg platform.

##### Data Characteristics

- Rows and Columns: 11,825 medicines × 5 attributes.
- Data Types: All columns are stored as object (string) type.
- Duplicates: 84 duplicate rows detected.
- Missing Values: No missing values in any column.
- Memory Usage: ~462 KB (very lightweight and easy to handle).

### Insights
- The dataset is clean and structured, making it suitable for fine-tuning a Large Language Model (LLM).
- Each row represents one medicine and provides a complete description (name, composition, uses, side effects, image).
- The uses and side_effects columns are multi-valued text fields, which can be tokenized and transformed into instruction-based Q&A pairs for LLM training (e.g., “What are the uses of Augmentin 625 Duo Tablet?” → “Treatment of Bacterial infections”).
- The presence of images (image_url column) provides opportunities for extending the project into multimodal LLMs in the future (text + image understanding).

With ~11.8k records, the dataset is large enough to fine-tune smaller language models (e.g., 1–3B parameters) within Google Colab’s resource constraints.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

- name – The commercial name of the medicine (e.g., Avastin 400mg Injection).
- composition – The active ingredients and their concentrations (e.g., Bevacizumab (400mg)).
- uses – The therapeutic uses or conditions for which the medicine is prescribed (e.g., colon cancer, lung cancer, kidney cancer).
- side_effects – Possible adverse effects associated with the medicine (e.g., headache, nausea, diarrhea).
- image_url – A link to the product image available on the 1mg platform.

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Handling Duplicate values
df.duplicated().sum()

In [None]:
# viewing the duplicates
df[df.duplicated()]

In [None]:
df[df["name"] == "Aristogyl-F Oral Suspension"]

In [None]:
# dropping the duplicates
df.drop_duplicates(inplace = True)

In [None]:
df.duplicated().sum()

In [None]:
# dropping column not required for tuning the llm model
df.drop(columns=["composition", "image_url"], inplace=True)

In [None]:
df.head(5)

In [None]:
# removing words like Treatment for consistent formatting as some rows have it and some do not have it
df['uses_cleaned'] = (
    df['uses'].str.replace('treatment and prevention of ', '', case=False, regex=False)
    .str.replace('treatment of ', '', case=False, regex=False)
    .str.replace('prevention of ', '', case=False, regex=False)
)

In [None]:
# checking for treatment keyword in any row
rows_with_treatment = df[df['uses_cleaned'].str.lower().str.contains('treatment')]
rows_with_treatment

In [None]:
# removing anything that is present inside the bracket
df['uses_cleaned'] = df['uses_cleaned'].str.replace(r"\(.*?\)", "", regex=True).str.strip()

In [None]:
rows_with_bracket = df[df['uses_cleaned'].str.contains(r"\(", regex=True)]
rows_with_bracket

In [None]:
# converting all-uppercase words like COVID to lowercase
df['uses_cleaned'] = df['uses_cleaned'].str.replace(
    r"\b[A-Z]+\b",   # regex pattern
    lambda m: m.group(0).lower(),   # replacement logic
    regex=True
)

In [None]:
# Step 1: insert comma if a capital word follows another word (space case)
df['uses_cleaned'] = df['uses_cleaned'].str.replace(
    r"\s+([A-Z])",   # space + capital
    r" , \1",
    regex=True
)

# Step 2: insert comma if a capital word is glued after a lowercase
df['uses_cleaned'] = df['uses_cleaned'].str.replace(
    r"(?<=[a-z])([A-Z])",  # lowercase + capital
    r" , \1",
    regex=True
)

# Step 3: clean spaces (make sure exactly one space before comma)
df['uses_cleaned'] = df['uses_cleaned'].str.replace(
    r"\s+,", " ,", regex=True
)

In [None]:
df.head()

In [None]:
# saving the well formatted csv file
df.to_csv("temp_cleaned.csv", index=False)

### What all manipulations have you done and insights you found?

- Removed duplicate records from the dataset to ensure uniqueness and avoid repetition during model training.
- Dropped the columns not needed for fine-tuning (composition, image_url).
- Cleaned the uses column into a new uses_cleaned column: stripped the prefixes “Treatment and prevention of”, “Treatment of”, and “Prevention of”, removed parenthesised text, lowercased all-caps words such as COVID, and inserted commas before capitalised words so that individual uses are separated.
- Stripped leading and trailing whitespace from the cleaned text.
- Saved the cleaned dataset to a CSV file (temp_cleaned.csv).
- Used a local Ollama model (mistral) to rewrite each medicine's list of uses as a single, consistent sentence (uses_cleaned_llm).
- Created one instruction–response pair for each medicine:

“What is the use of <medicine_name>?” → mapped to its rewritten uses sentence.

- Saved the pairs as a JSON file (fine_tune_dataset.json) containing a list of objects, ready for fine-tuning with Hugging Face tooling.

#### Insights
- The dataset is now structured in a question–answer format, which aligns well with instruction-based fine-tuning.
- Cleaning and formatting improve the clarity of the uses text, making model responses more human-readable.
- Removing irrelevant columns reduces noise and ensures the model focuses only on relevant medical information.

The final dataset contains one entry per medicine, since the code in this notebook builds a single Q&A pair (uses) for each; side effects are not yet included.
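The regex cleaning cells above can be checked on individual strings before they are applied to the whole column. The sketch below mirrors the same steps with Python's `re` module; the function name `clean_uses` and the sample strings are illustrative assumptions, not rows taken from the dataset.

```python
import re

# Prefixes stripped by the wrangling cells (matched case-insensitively)
PREFIXES = ("treatment and prevention of ", "treatment of ", "prevention of ")


def clean_uses(text: str) -> str:
    """Apply the notebook's cleaning steps to a single 'uses' string."""
    for prefix in PREFIXES:
        text = re.sub(re.escape(prefix), "", text, flags=re.IGNORECASE)
    # Drop anything inside parentheses, then trim surrounding whitespace
    text = re.sub(r"\(.*?\)", "", text).strip()
    # Lowercase words written entirely in capitals (e.g. COVID)
    text = re.sub(r"\b[A-Z]+\b", lambda m: m.group(0).lower(), text)
    # Insert a comma before a capitalised word that follows whitespace
    text = re.sub(r"\s+([A-Z])", r" , \1", text)
    # Insert a comma where a capital is glued to a preceding lowercase letter
    text = re.sub(r"(?<=[a-z])([A-Z])", r" , \1", text)
    # Keep exactly one space before each comma
    text = re.sub(r"\s+,", " ,", text)
    return text


print(clean_uses("Treatment of Bacterial infections"))  # Bacterial infections
print(clean_uses("PainFever"))                          # Pain , Fever
print(clean_uses("Treatment of COVID infection"))       # covid infection
```

Running a handful of hand-picked strings through such a function makes side effects of the rules easy to spot, for example that every all-caps token is lowercased, not only acronyms like COVID.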

In [None]:
# Now using Ollama to create a well-formatted answer for each medicine
import pandas as pd
import time
import os
from ollama import Client

# ----- Configuration -----
input_csv = "medicine_cleaned.csv"
output_csv = "medicine_data_llm_processed.csv"
checkpoint_csv = "checkpoint.csv"  # intermediate saving
model_name = "mistral"

# Initialize Ollama client
client = Client(host="http://localhost:11434")

# ----- Data Loading -----
# Resume from the checkpoint if one exists; otherwise start from the input file
source_csv = checkpoint_csv if os.path.exists(checkpoint_csv) else input_csv
try:
    df = pd.read_csv(source_csv)
    print(f"✅ Data loaded successfully from '{source_csv}'. Rows: {len(df)}")
except FileNotFoundError:
    raise SystemExit(f"❌ Error: The file '{source_csv}' was not found.")

# Check for required columns
required_cols = {"name", "uses_cleaned"}
if not required_cols.issubset(df.columns):
    raise SystemExit("❌ Error: The CSV must contain 'name' and 'uses_cleaned' columns.")

# Add output column if not present
if "uses_cleaned_llm" not in df.columns:
    df["uses_cleaned_llm"] = None

# ----- LLM Processing Function -----
def get_llm_response(medicine_name, uses_text, retries=3):
    """
    Query Ollama LLM to generate a consistent, structured 'uses' sentence.
    Includes retry mechanism for robustness.
    """
    if pd.isna(uses_text) or str(uses_text).strip() == "":
        return "No medical use information available."

    prompt = f"""
    You are a medical data expert.
    Task: Convert the given medicine name and its list of uses into a single, clear, and professional sentence.

    Example:
    Medicine: Augmentin 625 Duo Tablet
    Uses: Treatment of Bacterial infections
    Output: Augmentin 625 Duo Tablet is used for the treatment of various bacterial infections.

    Medicine: Avastin 400mg Injection
    Uses: Cancer of colon and rectum, Non-small cell lung cancer, Kidney cancer, Brain tumor, Ovarian cancer, Cervical cancer
    Output: Avastin 400mg Injection is used to treat several types of cancer, including those of the colon, rectum, lung (non-small cell), kidney, brain, ovary, and cervix.

    Now process:
    Medicine: {medicine_name}
    Uses: {uses_text}
    """

    for attempt in range(retries):
        try:
            response = client.chat(
                model=model_name,
                messages=[{"role": "user", "content": prompt}],
            )
            return response["message"]["content"].strip()
        except Exception as e:
            print(f"⚠️ Error on attempt {attempt+1} for '{medicine_name}': {e}")
            time.sleep(2 * (attempt + 1))  # Linear backoff: 2s, 4s, 6s
    return "Error: Unable to process"

# ----- Processing Loop -----
print("🚀 Starting processing...")

for idx, row in df.iterrows():
    if pd.notna(row["uses_cleaned_llm"]) and row["uses_cleaned_llm"].strip() != "":
        continue  # Skip already processed rows (important if resuming)

    medicine_name = row["name"]
    uses_text = row["uses_cleaned"]

    processed_text = get_llm_response(medicine_name, uses_text)
    df.at[idx, "uses_cleaned_llm"] = processed_text

    # Save progress every 20 rows
    if idx % 20 == 0:
        df.to_csv(checkpoint_csv, index=False)
        print(f"💾 Saved checkpoint at row {idx}/{len(df)}")

    time.sleep(0.5)  # Prevent hammering the LLM server

print("✅ Processing complete.")

# ----- Save Final -----
df.to_csv(output_csv, index=False)
print(f"🎉 Final data saved to '{output_csv}'")
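The retry loop in `get_llm_response` waits a little longer after each failed attempt. As an alternative, the sketch below shows a small generic wrapper that doubles the wait each time (exponential backoff). The names `retry_with_backoff` and `flaky` are hypothetical helpers for illustration and are not part of the Ollama client.

```python
import time


def retry_with_backoff(fn, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: generic wrapper
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
    raise RuntimeError(f"All {retries} attempts failed") from last_error


# Demo with a function that fails twice, then succeeds.
# The waits are recorded in a list instead of actually sleeping.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)  # ok [2.0, 4.0]
```

Passing the `sleep` function as a parameter keeps the helper easy to test without real delays; in the processing loop it could wrap the `client.chat(...)` call.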


In [None]:
# Loading new well formatted csv file
ollama_df = pd.read_csv("ollama_cleaned.csv")

In [None]:
ollama_df.head()

In [None]:
# checking for missing values
ollama_df.isna().sum()

In [None]:
import pandas as pd
import json

# Load your cleaned file (CSV or directly use df if already loaded)
df = pd.read_csv("ollama_cleaned.csv")   # change filename if needed

# Convert into fine-tuning format
records = []
for _, row in df.iterrows():
    instruction = f"What is the use of {row['name']}?"
    output = row['uses_cleaned_llm']

    records.append({
        "instruction": instruction,
        "output": output
    })

# Save as JSON
with open("fine_tune_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

print("✅ JSON file created: fine_tune_dataset.json")


## ***Finetuning Implementation***

In [None]:
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).



In [2]:
!pip install unsloth trl peft accelerate bitsandbytes

Collecting unsloth
  Downloading unsloth-2025.8.9-py3-none-any.whl.metadata (52 kB)
Collecting trl
  Downloading trl-0.21.0-py3-none-any.whl.metadata (11 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Collecting unsloth_zoo>=2025.8.8 (from unsloth)
  Downloading unsloth_zoo-2025.8.8-py3-none-any.whl.metadata (9.4 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.32.post2-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.1 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.28-py3-none-any.whl.metadata (11 kB)
Collecting datasets<4.0.0,>=3.4.1 (from unsloth)
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting cut_cross_entropy (from unsloth_zoo>=2025.8.8->unsloth)
  Downloading cut_cross_entropy-25.1.1-py3-none-any.whl.metadata (9.3 kB)

In [None]:
# For GPU check
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")

CUDA available: True
GPU: Tesla T4


In [10]:
from unsloth import FastLanguageModel
import torch
from google.colab import drive

# Model configuration
model_name = "unsloth/Phi-3-mini-4k-instruct-bnb-4bit"
max_seq_length = 256
dtype = None

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=True,
)

# Path where you want to save
save_path = "/content/drive/MyDrive/alma better/LLM Project/AlmaBetter LLM Project/original_llm_model"

# Save model and tokenizer
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print(f"Model and tokenizer saved at: {save_path}")


==((====))==  Unsloth 2025.8.9: Fast Mistral patching. Transformers: 4.55.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Model and tokenizer saved at: /content/drive/MyDrive/alma better/LLM Project/AlmaBetter LLM Project/original_llm_model


In [7]:
import pandas as pd

# Load CSV file
df = pd.read_csv("/content/drive/MyDrive/alma better/LLM Project/AlmaBetter LLM Project/finetuned_medicine_data.csv")

# Inspect the first few rows
print(df.head())


                       name                                   uses_cleaned_llm
0   Avastin 400mg Injection  Avastin 400mg Injection is utilized for the tr...
1  Augmentin 625 Duo Tablet  Augmentin 625 Duo Tablet is used for the treat...
2       Azithral 500 Tablet  Azithral 500 Tablet is utilized for the treatm...
3          Ascoril LS Syrup  Ascoril LS Syrup is utilized for the managemen...
4         Aciloc 150 Tablet  Aciloc 150 Tablet is used for the treatment of...


In [8]:
import json
import os
from datasets import Dataset

# Convert to instruction-output format
data_for_finetune = []
for _, row in df.iterrows():
    data_for_finetune.append({
        "input": f"What is the use of {row['name']}?",
        "output": row['uses_cleaned_llm']
    })

# Format prompts for fine-tuning
def format_prompt(example):
    return f"### Input: {example['input']}\n### Output: {example['output']}<|endoftext|>"

formatted_data = [format_prompt(item) for item in data_for_finetune]

# Convert to Hugging Face Dataset
dataset = Dataset.from_dict({"text": formatted_data})
dataset.save_to_disk("/content/drive/MyDrive/alma better/LLM Project/AlmaBetter LLM Project/dataset")

# Inspect first example
print(dataset[0])

Saving the dataset (0/1 shards):   0%|          | 0/11825 [00:00<?, ? examples/s]

{'text': '### Input: What is the use of Avastin 400mg Injection?\n### Output: Avastin 400mg Injection is utilized for the treatment of various types of cancer, specifically those affecting the colon and rectum, non-small cell lung cancer, kidney, brain, ovaries, and cervix.<|endoftext|>'}
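The training strings follow a fixed `### Input: ... ### Output: ...<|endoftext|>` template. The inference cells further down use the tokenizer's chat template instead; one option worth trying is to prompt the model with the same template it was trained on. The helper names below (`build_training_text`, `build_inference_prompt`, `extract_answer`) are hypothetical, and whether this improves answers has not been measured in this notebook.

```python
EOS_MARKER = "<|endoftext|>"  # same end-of-text marker as in format_prompt above


def build_training_text(question: str, answer: str) -> str:
    """Full training example: question, answer, and end-of-text marker."""
    return f"### Input: {question}\n### Output: {answer}{EOS_MARKER}"


def build_inference_prompt(question: str) -> str:
    """Prompt that stops after '### Output:' so the model completes the answer."""
    return f"### Input: {question}\n### Output:"


def extract_answer(generated: str) -> str:
    """Keep only the text after '### Output:' and before the end marker."""
    answer = generated.split("### Output:", 1)[-1]
    return answer.split(EOS_MARKER, 1)[0].strip()


question = "What is the use of Avastin 400mg Injection?"
prompt = build_inference_prompt(question)
print(prompt)

# Simulated model output: the prompt followed by a made-up completion
simulated = prompt + " Example answer text." + EOS_MARKER
print(extract_answer(simulated))  # Example answer text.
```

Because the inference prompt is a strict prefix of the training text, the model sees at generation time exactly the context it saw during fine-tuning.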


In [None]:
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=64,  # LoRA rank - higher = more capacity, more memory
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=128,  # LoRA scaling factor (usually 2x rank)
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",     # Supports any, but = "none" is optimized
    use_gradient_checkpointing="unsloth",  # Unsloth's optimized version
    random_state=3407,
    use_rslora=False,  # Rank stabilized LoRA
    loftq_config=None, # LoftQ
)
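To see where the trainable-parameter count reported in the training log (119,537,664) comes from, the LoRA sizes can be worked out by hand. Each adapted linear layer of shape `in × out` gains two low-rank matrices with `r × (in + out)` parameters in total. The layer dimensions below (hidden size 3072, MLP size 8192, 32 layers) are assumptions about Phi-3-mini rather than values read from this notebook; the result agrees with the log.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to one linear layer."""
    return rank * (in_features + out_features)


HIDDEN = 3072        # assumed Phi-3-mini hidden size
INTERMEDIATE = 8192  # assumed Phi-3-mini MLP (intermediate) size
LAYERS = 32          # assumed number of transformer layers
RANK = 64            # r=64 from the LoRA cell above

# (in_features, out_features) for each target module in one layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * LAYERS
print(f"{per_layer:,} per layer, {total:,} in total")
# 3,735,552 per layer, 119,537,664 in total
```

Since the count scales linearly with `r`, halving the rank to 32 would halve the number of trainable parameters under the same assumptions.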

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
import torch

# Define the save path for fine-tuned model
finetuned_save_path = "/content/drive/MyDrive/alma better/LLM Project/AlmaBetter LLM Project/Finetuned"

# Training arguments optimized for Unsloth
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        num_train_epochs=10,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=25,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir=finetuned_save_path,  # Save in Google Drive folder
        save_strategy="epoch",
        save_total_limit=2,
        dataloader_pin_memory=False,
        report_to="none",  # Disable Weights & Biases logging
    ),
)

# Start fine-tuning
trainer.train()

# Ensure final save in case the last epoch is not saved automatically
trainer.model.save_pretrained(finetuned_save_path)
tokenizer.save_pretrained(finetuned_save_path)

print(f"Fine-tuned model saved at: {finetuned_save_path}")


Unsloth: Tokenizing ["text"]:   0%|          | 0/11825 [00:00<?, ? examples/s]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 11,825 | Num Epochs = 10 | Total steps = 14,790
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 119,537,664 of 3,940,617,216 (3.03% trained)


Step,Training Loss
25,1.2187
50,0.8409
75,0.8191
100,0.8166
125,0.7717
150,0.7334
175,0.7384
200,0.7238
225,0.7567
250,0.7241


Unsloth: Will smartly offload gradients to save VRAM!
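The step counts in the log above follow directly from the dataset size and batch settings. The short calculation below reproduces them and also shows which epoch the `checkpoint-2958` folder, loaded in the next cell, corresponds to; it assumes one optimizer step per effective batch, with the last partial batch rounded up.

```python
import math

num_examples = 11_825   # "Num examples" in the log
per_device_batch = 2    # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
num_gpus = 1
epochs = 10

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
print(effective_batch, steps_per_epoch, total_steps)  # 8 1479 14790

# With save_strategy="epoch", checkpoints land on multiples of steps_per_epoch
checkpoint_step = 2958
print(checkpoint_step / steps_per_epoch)  # 2.0
```

So `checkpoint-2958` is the state saved at the end of the second epoch, not the end of the full 10-epoch schedule.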


In [4]:
from unsloth import FastLanguageModel

# Load fine-tuned model properly (base + adapter)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/content/drive/MyDrive/alma better/LLM Project/AlmaBetter LLM Project/Finetuned/checkpoint-2958",
    max_seq_length = 2048,
    dtype = None,  # Auto-detect fp16/bf16
    load_in_4bit = True,  # or False if you don't want quantization
)

# Enable faster inference
FastLanguageModel.for_inference(model)

# Test prompt
messages = [
    {"role": "user", "content": "What is the use of Atarax 25mg Tablet?"}
]

# Tokenize
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Generate
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    use_cache=True,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)

# Decode
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.9: Fast Mistral patching. Transformers: 4.55.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.8.9 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


What is the use of Atarax 25mg Tablet? Atarax 25mg Tablet is used for the treatment of anxiety and skin conditions characterized by inflammation and itching.


In [3]:
import torch
import gradio as gr
from unsloth import FastLanguageModel

# ---------------------------
# Load your fine-tuned model
# ---------------------------
finetuned_model_path = "/content/drive/MyDrive/alma better/LLM Project/AlmaBetter LLM Project/Finetuned"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = finetuned_model_path,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

FastLanguageModel.for_inference(model)  # enable fast inference

# ---------------------------
# Define chatbot function
# ---------------------------
def chat_with_model(message, history):
    # history is a list of [user, bot] pairs
    messages = []
    for user, bot in history:
        messages.append({"role": "user", "content": user})
        messages.append({"role": "assistant", "content": bot})
    messages.append({"role": "user", "content": message})

    # Tokenize
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    # Generate
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=256,
        use_cache=True,
        temperature=0.7,
        do_sample=True,
        top_p=0.9,
    )

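    # Decode only the newly generated tokens so the reply does not echo the prompt and history
    new_tokens = outputs[0][inputs.shape[-1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    return response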

# ---------------------------
# Build Gradio Chatbot UI
# ---------------------------
chatbot = gr.ChatInterface(
    fn=chat_with_model,
    title="💊 Medicine Chatbot (Fine-Tuned)",
    description="Ask me about medicines and their uses!",
    theme="soft",
)

# ---------------------------
# Launch
# ---------------------------
chatbot.launch(share=True)  # share=True gives you a public link


NotImplementedError: Unsloth currently only works on NVIDIA GPUs and Intel GPUs.
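The chat function above unpacks `history` as `[user, bot]` pairs. Depending on the Gradio version and the `type` argument of `gr.ChatInterface`, the history may instead arrive as a list of `{"role": ..., "content": ...}` dictionaries. The sketch below is a hypothetical helper (`history_to_messages` is not a Gradio function) that accepts either shape and returns the message list expected by `apply_chat_template`.

```python
def history_to_messages(history, message):
    """Convert Gradio chat history plus the new user message into chat messages."""
    messages = []
    for turn in history:
        if isinstance(turn, dict):
            # "messages" format: already role/content dictionaries
            messages.append({"role": turn["role"], "content": turn["content"]})
        else:
            # "tuples" format: [user_text, bot_text]
            user, bot = turn
            messages.append({"role": "user", "content": user})
            if bot is not None:
                messages.append({"role": "assistant", "content": bot})
    messages.append({"role": "user", "content": message})
    return messages


pairs = [["Hi", "Hello, how can I help?"]]
dicts = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello, how can I help?"},
]
question = "What is the use of Atarax 25mg Tablet?"
print(history_to_messages(pairs, question) == history_to_messages(dicts, question))  # True
```

Inside `chat_with_model`, the loop that builds `messages` could be replaced by a single call to such a helper.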

In [None]:
# Importing Libraries
import os
import math
import time
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

In [None]:
MODEL_PATH = "/home/mritunjay/Projects/AlmaBetter Chatbot Project/TinyLlama-1.1B-Chat-v1.0"
DATASET_FILE = "fine_tune_dataset.json"
OUTPUT_DIR = "./TinyLlama-1.1B-Finetuned"

In [None]:
# Performance knobs
MAX_SEQ_LENGTH = 128
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
NUM_TRAIN_EPOCHS = 6
LEARNING_RATE = 2e-4
WEIGHT_DECAY = 0.0
LOGGING_STEPS = 50
SAVE_STEPS = 2000
MAX_GRAD_NORM = 0.3
WARMUP_RATIO = 0.03

In [None]:
# LoRA knobs
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.1

LORA_TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

In [None]:
# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

In [None]:
# prefer bf16 if supported (slightly faster on some GPUs); otherwise use fp16
USE_BF16 = torch.cuda.is_bf16_supported()
print("Using precision:", "bf16" if USE_BF16 else "fp16")

In [None]:
# Basic checks & device info
assert torch.cuda.is_available(), "No GPU detected. This script requires a GPU."
world_size = torch.cuda.device_count()
print(f"Detected {world_size} CUDA device(s). GPU sample: {torch.cuda.get_device_name(0)}")

In [None]:
# Adjust per-device batch if machine has fewer GPUs (keep conservative)
# If single GPU and big model, consider lowering batch to 2 or 1 to avoid OOM
if world_size == 1 and PER_DEVICE_TRAIN_BATCH_SIZE > 4:
    PER_DEVICE_TRAIN_BATCH_SIZE = 4

effective_batch = PER_DEVICE_TRAIN_BATCH_SIZE * max(1, world_size) * GRADIENT_ACCUMULATION_STEPS
print("Effective batch size (per step):", effective_batch)

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16 if USE_BF16 else torch.float16,
)

model.config.use_cache = False

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [None]:
# Prepare model for k-bit + LoRA
model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=LORA_TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)

In [None]:
model = get_peft_model(model, peft_config)
print("Attached LoRA. Trainable parameters (first few):")
total_trainable = 0
shown = 0
for n, p in model.named_parameters():
    if p.requires_grad:
        total_trainable += p.numel()
        if shown < 5:  # list only the first few trainable tensors
            print("  ", n, tuple(p.shape))
            shown += 1
print("Total trainable params (LoRA):", total_trainable)

In [None]:
# ---------------------------- Dataset & tokenization ------------------
print("Loading dataset...")
raw_ds = load_dataset("json", data_files=DATASET_FILE, split="train")
num_samples = len(raw_ds)
print("Number of training samples:", num_samples)

In [None]:
def to_text(example):
    instr = example.get("instruction", "") or ""
    out = example.get("output", "") or ""
    return f"### Instruction:\n{instr}\n\n### Response:\n{out}"

def tokenize_fn(example):
    text = to_text(example)
    return tokenizer(
        text,
        truncation=True,
        max_length=MAX_SEQ_LENGTH,
        padding=False,  # dynamic padding handled by data collator
    )

print("Tokenizing dataset (fast)...")
tokenized_ds = raw_ds.map(tokenize_fn, remove_columns=raw_ds.column_names)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
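Because `tokenize_fn` uses `padding=False`, each example keeps its natural length and padding is applied per batch by the collator. The following is a simplified pure-Python illustration of that idea, not the actual `DataCollatorForLanguageModeling` implementation; `pad_batch` and the token IDs are made up for the example.

```python
def pad_batch(sequences, pad_id, ignore_index=-100):
    """Right-pad token ID lists to the longest sequence in the batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # Padded positions get ignore_index so they do not contribute to the loss
        labels.append(list(seq) + [ignore_index] * n_pad)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


batch = pad_batch([[1, 5, 9], [1, 7]], pad_id=2)
print(batch["input_ids"])       # [[1, 5, 9], [1, 7, 2]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
print(batch["labels"])          # [[1, 5, 9], [1, 7, -100]]
```

Padding only to the longest sequence in each batch, not to `MAX_SEQ_LENGTH`, avoids spending compute on padding tokens when most examples are short.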


In [None]:
# ---------------------------- TrainingArguments -----------------------
# Calculate total steps and print estimate
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * NUM_TRAIN_EPOCHS
print(f"Steps per epoch: {steps_per_epoch}, Total training steps: {total_steps}")

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=NUM_TRAIN_EPOCHS,
    per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    optim="paged_adamw_8bit",
    save_steps=SAVE_STEPS,
    logging_steps=LOGGING_STEPS,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    fp16=not USE_BF16,
    bf16=USE_BF16,
    max_grad_norm=MAX_GRAD_NORM,
    warmup_ratio=WARMUP_RATIO,
    group_by_length=False,           # off; enabling it batches similar-length samples to reduce padding
    lr_scheduler_type="cosine",
    gradient_checkpointing=False,    # off for faster training, at the cost of higher memory use
    max_steps=-1,
    report_to="none",
)


In [None]:
# ---------------------------- Trainer & Train -------------------------
trainer = SFTTrainer(
    model=model,                 # already wrapped with LoRA by get_peft_model above
    train_dataset=tokenized_ds,
    processing_class=tokenizer,
    data_collator=data_collator,
    args=training_args,
)

# quick reminder to user
print("\n=== Summary ===")
print("Model path:", MODEL_PATH)
print("Dataset:", DATASET_FILE, f"({num_samples} samples)")
print("Epochs:", NUM_TRAIN_EPOCHS)
print("Per-device batch:", PER_DEVICE_TRAIN_BATCH_SIZE)
print("Grad accumulation:", GRADIENT_ACCUMULATION_STEPS)
print("Effective batch:", effective_batch)
print("Max seq len:", MAX_SEQ_LENGTH)
print("LoRA r/alpha/dropout:", LORA_R, LORA_ALPHA, LORA_DROPOUT)
print("LoRA targets:", LORA_TARGET_MODULES)
print("====================\n")

# Start training and measure the total wall-clock time
t0 = time.time()
print("🚀 Starting training...")
trainer.train()
t1 = time.time()
elapsed = t1 - t0
print(f"🎉 Training finished in {elapsed/60:.2f} minutes")

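# ---------------------------- Save model & tokenizer -------------------
os.makedirs(OUTPUT_DIR, exist_ok=True)
# Save through the trainer: when a peft_config is passed, trainer.model is the
# PEFT-wrapped model, so this writes the LoRA adapter weights and their config
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print("Saved to", OUTPUT_DIR)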


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# ---------------- Configuration ----------------
model_path = "./TinyLlama-1.1B-Finetuned"
question = "What is the use of Augmentin 625 Duo Tablet?"

# ---------------- Load Model & Tokenizer ----------------
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",          # Uses GPU if available
    torch_dtype=torch.bfloat16
)
model.eval()

# ---------------- Prepare Prompt ----------------
# Use the same instruction/response template that the fine-tuning data used
prompt = f"### Instruction:\n{question}\n\n### Response:"

# ---------------- Tokenize & move to device ----------------
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# ---------------- Generate Answer ----------------
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=False,            # Deterministic output for factual answers
    pad_token_id=tokenizer.eos_token_id
)

# ---------------- Decode ----------------
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Extract only the answer after "### Response:"
answer = decoded.split("### Response:")[-1].strip()

# ---------------- Print ----------------
print(answer)
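
One caveat with taking the last `### Response:` segment: if the model keeps generating and invents a follow-up instruction/response pair, the last segment is the invented one. The sketch below shows a slightly more defensive extraction. `extract_answer` is a hypothetical helper, and the decoded string is a made-up placeholder rather than real model output.

```python
def extract_answer(decoded: str,
                   response_marker: str = "### Response:",
                   instruction_marker: str = "### Instruction:") -> str:
    """Return the first response in `decoded`, cut before any later instruction block.

    Returns an empty string if the response marker is missing.
    """
    # Keep everything after the first response marker (the prompt contains exactly one).
    _, _, tail = decoded.partition(response_marker)
    # Drop any follow-up instruction block the model may have generated.
    answer, _, _ = tail.partition(instruction_marker)
    return answer.strip()


# Placeholder text standing in for a decoded generation that ran on too long
decoded = (
    "### Instruction:\nfirst question\n\n### Response:\nfirst answer\n\n"
    "### Instruction:\nsecond question\n\n### Response:\nsecond answer"
)
print(extract_answer(decoded))
print(decoded.split("### Response:")[-1].strip())
```

Here the helper returns the answer to the question that was actually asked, while the split-and-take-last approach returns the trailing invented answer.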


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Replace this with the actual path to your fine-tuned model's directory
model_path = "./TinyLlama-1.1B-Finetuned"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)

In [None]:
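def create_prompt(instruction, context=""):
    # Llama-2-style chat template. The fine-tuning data above used the
    # "### Instruction / ### Response" format instead, so answers from this
    # template may differ from those produced with the matching prompt.
    # No literal <s> here: the tokenizer adds the BOS token itself.
    # `context` is accepted but currently unused.
    return f"""[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

{instruction} [/INST]"""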

instruction = "What is the use of Azithral 500 Tablet?"
prompt = create_prompt(instruction)

In [None]:
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the response
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Decode the generated tokens to text
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Post-process the output to get the model's response.
# The decoded sequence contains the prompt followed by the generated text,
# so keep only what comes after the closing [/INST] tag.
response = decoded_output.split('[/INST]')[1].strip()

print("Question:", instruction)
print("Answer:", response)
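
To show what `temperature`, `top_k`, and `top_p` do to the next-token distribution, here is a simplified pure-Python sketch of the three filters applied in sequence. It is an assumption-laden illustration: `sampling_distribution` is a hypothetical helper, the four-token vocabulary and its logits are made up, and the library's own logits processors work on tensors and may differ in edge cases.

```python
import math
from typing import Dict


def sampling_distribution(logits: Dict[str, float], temperature: float,
                          top_k: int, top_p: float) -> Dict[str, float]:
    """Return the renormalized probabilities of the tokens that survive filtering."""
    # 1. Temperature: divide logits before the softmax; values below 1 sharpen it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # 2. Top-k: keep only the k most probable tokens, then renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    ranked = [(tok, p / k_total) for tok, p in ranked]

    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 4. Renormalize the survivors so they sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


# Made-up logits for a four-token vocabulary, with the settings used above
toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
dist = sampling_distribution(toy_logits, temperature=0.7, top_k=50, top_p=0.95)
print(dist)
```

In this toy case `top_k=50` has no effect because the vocabulary has only four tokens, while `top_p=0.95` drops the least likely token `d`.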

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Specify the name of the original, non-fine-tuned model from Hugging Face Hub
original_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the tokenizer and the original model
tokenizer = AutoTokenizer.from_pretrained(original_model_name)
model = AutoModelForCausalLM.from_pretrained(
    original_model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Define a prompt
prompt = "What are the uses of Avastin 400mg Injection?"

# Format the prompt with the tokenizer's own chat template so that it matches
# the format the base chat model was trained on
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate a response
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Decode only the newly generated tokens (everything after the prompt)
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()

# Print only the model's response
print("Original Model's Response:")
print(response)

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
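
The notebook does not yet specify an evaluation metric. One simple option for comparing a generated answer against the reference text from the dataset is token-level F1, sketched below. Choosing this metric is an assumption, not the project's stated method; `tokens` and `token_f1` are hypothetical helpers, and the prediction string is a made-up example rather than real model output.

```python
import re
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred, ref = tokens(prediction), tokens(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "colon cancer, lung cancer, kidney cancer"  # shortened reference list
prediction = "lung cancer and colon cancer"             # made-up model answer
score = token_f1(prediction, reference)
print(f"Token F1: {score:.3f}")
```

Averaging this score over a held-out set of questions for both the original and the fine-tuned model would give two numbers that can be plotted side by side as a score chart.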

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with the updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with the updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with the updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model you used and its feature importance using any model explainability tool.

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best-performing ML model in pickle or joblib file format for the deployment process.


In [None]:
# Save the File

### 2. Load the saved model file again and predict on unseen data as a sanity check.


In [None]:
# Load the File and predict unseen data.

### ***Congrats! Your model has been successfully created and is ready for deployment on a live server for real user interaction!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***