<a href="https://colab.research.google.com/github/abhigyanpal1/AI-Medical-Chatbot/blob/main/medical_chatbot_using_llama2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1: Installing and importing the libraries

In [1]:
!pip uninstall accelerate peft bitsandbytes transformers trl -y
!pip install accelerate peft==0.13.2 bitsandbytes transformers trl==0.12.0

Found existing installation: accelerate 1.6.0
Uninstalling accelerate-1.6.0:
  Successfully uninstalled accelerate-1.6.0
Found existing installation: peft 0.15.2
Uninstalling peft-0.15.2:
  Successfully uninstalled peft-0.15.2
Found existing installation: transformers 4.51.3
Uninstalling transformers-4.51.3:
  Successfully uninstalled transformers-4.51.3
Collecting accelerate
  Downloading accelerate-1.6.0-py3-none-any.whl.metadata (19 kB)
Collecting peft==0.13.2
  Downloading peft-0.13.2-py3-none-any.whl.metadata (13 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting transformers
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting trl==0.12.0
  Downloading trl-0.12.0-py3-none-any.whl.metadata (10 kB)
Collecting datasets>=2.21.0 (from trl==0.12.0)
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>

In [2]:
!pip install huggingface_hub
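
Because the install cell pins `peft==0.13.2` and `trl==0.12.0` but leaves the other packages unpinned, it can help to confirm what actually got installed before continuing. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Print the versions of the libraries this notebook depends on.
for name, ver in installed_versions(
    ["accelerate", "peft", "bitsandbytes", "transformers", "trl"]
).items():
    print(f"{name}: {ver}")
```

A `None` entry means the package is not installed in the current environment.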



In [3]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)

## Step 2: Loading the model

In [4]:
# AutoModelForCausalLM loads a causal language model, which predicts the next token from the previous ones.
# The quantization config loads the weights in 4-bit precision to reduce the memory footprint.
llama_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # data type used for the computations
        bnb_4bit_quant_type="nf4",  # NF4 is used here instead of FP4
    ),
)
llama_model.config.use_cache = False  # do not cache attention keys/values; the cache only helps generation and costs memory during training
llama_model.config.pretraining_tp = 1  # use the standard linear-layer computation instead of the slower one that reproduces pretraining results exactly


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/632 [00:00<?, ?B/s]

adapter_config.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/583 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/200 [00:00<?, ?B/s]

adapter_model.bin:   0%|          | 0.00/33.6M [00:00<?, ?B/s]
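
To see why 4-bit loading matters, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes about 6.74 billion parameters, which is consistent with the two safetensors shards above (9.98 GB + 3.50 GB) if they are stored in float16; that parameter count is an assumption, not something the notebook states. The estimate ignores quantization constants, any layers kept in higher precision, activations, and optimizer state.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 6.74 billion parameters (see the lead-in above).
NUM_PARAMS = 6.74e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 2 bytes per parameter
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # half a byte per parameter

print(f"float16 weights: {fp16_gb:.2f} GB")
print(f"4-bit weights:   {nf4_gb:.2f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the float16 footprint.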

## Step 3: Loading the tokenizer

In [5]:
# A tokenizer converts the input text into token IDs that the model can process.
# It must be compatible with the Llama 2 model: same tokens and same padding.
llama_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2", trust_remote_code = True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token # padding brings every sequence in a batch to the same length; the end-of-sequence (EOS) token is reused as the pad token
# shorter sequences are filled with the EOS token, so the padding goes on the right
llama_tokenizer.padding_side = "right" # pad tokens are appended at the end (right side) of each sequence

tokenizer_config.json:   0%|          | 0.00/695 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/21.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/435 [00:00<?, ?B/s]
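
The effect of right padding can be illustrated without loading the tokenizer. The sketch below is a plain-Python stand-in, not the tokenizer's own implementation: `pad_right` is a hypothetical helper, the token IDs are made up, and the pad ID of 2 is an assumption (the EOS ID in the standard Llama 2 vocabulary).

```python
def pad_right(sequences, pad_id):
    """Pad every sequence on the right to the length of the longest one.

    Returns the padded token IDs and a matching attention mask
    (1 for real tokens, 0 for padding).
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Two made-up token sequences of different lengths; pad_id=2 is an assumed EOS ID.
ids, mask = pad_right([[5, 6, 7], [8]], pad_id=2)
print(ids)
print(mask)
```

The attention mask is what lets the model ignore the padded positions even though they hold a real token ID.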

## Step 4: Setting the training arguments

In [6]:
# Configure the training run that fine-tunes the Llama 2 model on medical data.
# The TrainingArguments class comes from the transformers library.
# output_dir -> the output directory where the model predictions and checkpoints are written
# per_device_train_batch_size = 2 -> each device processes 2 training examples per training step
# max_steps = 100 -> caps training at 100 steps; if the dataset is exhausted first, training iterates over it again until max_steps is reached
# gradient_checkpointing = True -> recompute activations during the backward pass to save memory at the cost of extra compute
training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    max_steps=100,
    run_name="Abhi",
    gradient_checkpointing=True,
    save_steps=10,  # Save checkpoints every 10 steps
    save_total_limit=2,  # Keep only the 2 most recent checkpoints
)
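
The combined effect of `save_steps`, `max_steps`, and `save_total_limit` can be worked out by hand. This is a small sketch of that arithmetic, assuming the trainer names checkpoint folders `checkpoint-<step>` and that rotation simply keeps the most recent ones (no best-model tracking is configured here); `kept_checkpoints` is a hypothetical helper.

```python
def kept_checkpoints(max_steps, save_steps, save_total_limit):
    """List the checkpoint folders expected to remain once training finishes."""
    saved = [f"checkpoint-{step}" for step in range(save_steps, max_steps + 1, save_steps)]
    # Only the most recent `save_total_limit` checkpoints survive rotation.
    return saved[-save_total_limit:]


remaining = kept_checkpoints(max_steps=100, save_steps=10, save_total_limit=2)
print(remaining)
```

With the values above, ten checkpoints are written during the run and eight of them are deleted along the way.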


## Step 5: Creating the Supervised Fine-Tuning trainer

In [7]:
# Supervised fine-tuning is a transfer learning technique that updates the weights of a pretrained model using new labelled data.
# Parameter-efficient fine-tuning (PEFT) is used here to reduce the number of trainable parameters; the method used is LoRA.
# LoRA (Low-Rank Adaptation of Large Language Models) is a popular, lightweight training technique that significantly reduces the number of trainable parameters.
llama_sft_trainer = SFTTrainer(model = llama_model,
                               args = training_arguments,
                               train_dataset = load_dataset(path = "aboonaji/wiki_medical_terms_llam2_format", split = "train"),
                               tokenizer = llama_tokenizer,
                               peft_config = LoraConfig(task_type = "CAUSAL_LM", r = 64, lora_alpha = 16, lora_dropout = 0.1),
                               dataset_text_field = "text")
# task_type = "CAUSAL_LM" -> the task is causal language modelling
# r = 64 -> rank of the LoRA update matrices. As the rank increases, LoRA converges toward normal fine-tuning; a lower rank means fewer trainable parameters and less memory for fine-tuning
# lora_alpha = 16 -> scaling factor; the LoRA update is scaled by lora_alpha / r
# lora_dropout = 0.1 -> dropout probability applied in the LoRA layers during training, which helps prevent overfitting
# dataset_text_field = "text" -> name of the dataset column that holds the training text


wiki_medical_terms_llam2.jsonl:   0%|          | 0.00/54.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/6861 [00:00<?, ? examples/s]


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Map:   0%|          | 0/6861 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
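
To make the effect of the rank concrete, the sketch below counts the parameters LoRA trains for a single weight matrix. For a matrix of shape `d_out x d_in`, LoRA adds two low-rank factors with `r * (d_in + d_out)` parameters in total. The hidden size of 4096 is an assumption (the value for a 7B Llama 2 model), and which layers actually receive adapters depends on the PEFT defaults, so this covers one projection only.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters in the two low-rank factors: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


HIDDEN = 4096  # assumption: hidden size of a 7B Llama 2 model
R = 64         # rank from the LoraConfig above
ALPHA = 16     # lora_alpha from the LoraConfig above

full_params = HIDDEN * HIDDEN                      # one full square projection
lora_params = lora_param_count(HIDDEN, HIDDEN, R)  # its LoRA adapter
scaling = ALPHA / R                                # factor applied to the LoRA update

print(f"full matrix:  {full_params:,} parameters")
print(f"LoRA adapter: {lora_params:,} parameters ({lora_params / full_params:.2%})")
print(f"scaling (lora_alpha / r): {scaling}")
```

Even at the fairly high rank of 64, the adapter holds only a few percent of the parameters of the matrix it adapts.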


## Step 6: Training the model

In [8]:
import os

# Disable wandb logging
os.environ["WANDB_MODE"] = "disabled"

# Start training
llama_sft_trainer.train()




TrainOutput(global_step=100, training_loss=1.699041748046875, metrics={'train_runtime': 2116.9692, 'train_samples_per_second': 0.094, 'train_steps_per_second': 0.047, 'total_flos': 7178146700083200.0, 'train_loss': 1.699041748046875, 'epoch': 0.02914602156805596})
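
The metrics in `TrainOutput` can be reproduced from the settings used earlier. The sketch below assumes a single GPU and the default of no gradient accumulation, so one step consumes exactly `per_device_train_batch_size` examples; the dataset size (6861) and runtime are taken from the output shown in this notebook.

```python
import math

dataset_size = 6861        # examples in the train split
batch_size = 2             # per_device_train_batch_size
max_steps = 100            # max_steps
train_runtime = 2116.9692  # seconds, from TrainOutput

steps_per_epoch = math.ceil(dataset_size / batch_size)
epoch_fraction = max_steps / steps_per_epoch
samples_per_second = max_steps * batch_size / train_runtime
steps_per_second = max_steps / train_runtime

print(f"steps per epoch:    {steps_per_epoch}")
print(f"fraction of epoch:  {epoch_fraction:.6f}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

In other words, 100 steps cover only 200 examples, about 3% of one pass over the dataset, so the reported loss reflects a very short fine-tuning run.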

## Step 7: Chatting with the model

In [10]:
user_prompt = input()
# pipelines are objects that abstract most of the complex code from the library.
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 300)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])

Tell me about Ascariasis


Device set to use cuda:0


<s>[INST] Tell me about Ascariasis [/INST]  Ascariasis is a parasitic infection caused by the Ascaris lumbricoides roundworm, which is the most common intestinal parasite worldwide. everybody has the potential to be infected with Ascaris, and it is estimated that over 1.5 billion people worldwide are infected with the parasite.

Ascariasis is a zoonotic infection, meaning it can be transmitted between humans and animals. The parasite is usually ingested through contaminated food or water, and it can also be spread through contact with contaminated feces or soil.

The symptoms of ascariasis can vary depending on the severity of the infection and the location of the parasite in the body. Common symptoms include:

* Abdominal pain
* Diarrhea
* Vomiting
* Weight loss
* Fatigue
* Anemia
* Malnutrition
* Coughing or wheezing (if the parasite migrates to the lungs)

In severe cases, ascariasis can lead to complications such as:

* Obstruction of the intestine
* Perforation of the intestine
* 