<a href="https://colab.research.google.com/github/HebaRouk/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code/blob/main/MY_project_Build_your_MedBot_on_Custom_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Build your MedBot
© 2023, Zaka AI, Inc. All Rights Reserved.

---
The goal of this colab is to get you more familiar with LLM fine-tuning by creating a simple QA LLM that can answer medical questions. By the end of it you will be able to customize this LLM with any dataset.

**Just to give you a heads up:** We won't be having a model performing like ChatGPT or Bard, but at least we will have an idea about how we can create our own smaller versions of such powerful LLMs.  

## Importing and Installing Libraries/Packages
We will start by installing our necessary packages.

**bitsandbytes**: This package will allow us to run 4bit quantization on our model

**transformers**: This Hugging Face package will allow us to load state-of-the-art models easily into our notebook

**peft**: This package allows us to add PEFT techniques easily to our model, such as LoRA

**accelerate**: Accelerate is a handy package that allows us to run boiler plate code with a few lines of code

**datasets**: This package allows us to easily import datasets from the Hugging Face platform to be directly used

In [2]:
!pip install bitsandbytes
!pip install git+https://github.com/huggingface/transformers.git
!pip install git+https://github.com/huggingface/peft.git
!pip install git+https://github.com/huggingface/accelerate.git
!pip install datasets

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-rkhz4c6_
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-rkhz4c6_
  Resolved https://github.com/huggingface/transformers.git to commit 62db3e6ed67a74cc1ed1436acd9973915c0a4475
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting git+https://github.com/huggingface/peft.git
  Cloning https://github.com/huggingface/peft.git to /tmp/pip-req-build-3fc5w1x9
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-3fc5w1x9
  Resolved https://github.com/huggingface/peft.git to commit 0facdebf6208139cbd8f3586875acb378813dd97
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements

In [3]:
import torch
import transformers
from peft import prepare_model_for_kbit_training
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from transformers import AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM

## Loading our model

Let's start by loading our model. We will use the GPT Neox 20b Model by EleutherAI!

In [4]:
hf_model = "EleutherAI/gpt-neox-20b"

We will also set the bitsandbytes configurations needed for our model to run on our single colab GPU. The needed paramaters will be 'Double Quantization' 'Quantization Type' and the computational type needs to be set to bfloat16.

In [5]:

bitsbytes_config = BitsAndBytesConfig(
    load_in_4bit=True,
    quantization_type="nf4",  # nf4 (non-uniform 4-bit quantization) is one of the most common choices
    bfloat16=True
)



Unused kwargs: ['quantization_type', 'bfloat16']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


We will then set our tokenizer, and our model using the AutoTokenizer and AutoModelforCausalLM classes

In [8]:
hf_model = "gpt2"
!pip install transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

hf_model = "gpt2"  # Specify the model name or path

tokenizer = AutoTokenizer.from_pretrained(hf_model)
model = AutoModelForCausalLM.from_pretrained(hf_model)








The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

## Model Preprocessing

We now have to apply some preprocessing to our model so we can prepare it for training. First we need to further reduce our memory consumption by using the gradient_checkpointing_enable() fucntion on our model. We then use the prepare_model_for_kbit_training function so that we can use 4bit quantization training.

In [24]:
# Ensure you have the required libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Define your model name
hf_model = "gpt2"  # Replace with your model name

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model)

# Check if CUDA is available to decide whether to use 8-bit quantization
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-trained model with 8-bit quantization (only if CUDA is available)
if device == "cuda":
    model = AutoModelForCausalLM.from_pretrained(
        hf_model,
        load_in_8bit=True  # Use 8-bit quantization if CUDA is available
    )
else:
    # If CUDA is not available, load the model without quantization
    model = AutoModelForCausalLM.from_pretrained(hf_model)

# Enable gradient checkpointing
model.gradient_checkpointing_enable()

# Move the model to the appropriate device (GPU if CUDA is available, else CPU)
model = model.to(device)

# Now the model is ready for training or inference








Explain with your own words how 4-bit quantization affects accuracy.

**Test your Zaka**

We will also set a function that will print the number of trainable parameters our model has.

In [25]:
def print_trainable_parameters(model):
    trainable_parameters = 0
    all_paramaters = 0
    for _, param in model.named_parameters():
        all_paramaters += param.numel()
        if param.requires_grad:
            trainable_parameters += param.numel()
    print(
        f"Trainable: {trainable_parameters} || All: {all_paramaters} || Trainable %: {100 * trainable_parameters / all_paramaters}"
    )

Finally we will set the configurations for our LoRA. The paramaters needed are the rank updates, the default LoRa alpha value, the target modules which need to be set to query_key_value, the default lora dropout rate, bias should be set to none, and the task type according to the model we are using.

In [27]:
# Ensure you have the required libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import get_peft_model, LoraConfig
import torch
for name, module in model.named_modules():
    print(name)

# Define your model name
hf_model = "gpt2"  # Replace with your model name

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model)

# Load the pre-trained model without quantization (as we're working with LoRA configurations)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(hf_model)

# Enable gradient checkpointing
model.gradient_checkpointing_enable()

# Move the model to the appropriate device (GPU if CUDA is available, else CPU)
model = model.to(device)

# Define the LoRA configuration with correct target modules
config = LoraConfig(
    r=8,  # Rank for LoRA matrices, typically chosen based on experimentation
    lora_alpha=16,  # LoRA alpha value, usually kept around 8 or 16
    target_modules=["attn.c_attn", "attn.q_proj", "attn.k_proj"],  # Targeting attention layers for LoRA
    lora_dropout=0.1,  # Dropout rate for LoRA, typically 0.1-0.2
    bias="none",  # Bias set to none, meaning no bias is added
    task_type="CAUSAL_LM"  # The task type should match the model you're using (e.g., CAUSAL_LM for GPT-like models)
)

# Apply the LoRA configuration to the model
model = get_peft_model(model, config)

# Function to print the trainable parameters
def print_trainable_parameters(model):
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters: {trainable_params}")

# Print the trainable parameters of the model
print_trainable_parameters(model)




transformer
transformer.wte
transformer.wpe
transformer.drop
transformer.h
transformer.h.0
transformer.h.0.ln_1
transformer.h.0.attn
transformer.h.0.attn.c_attn
transformer.h.0.attn.c_proj
transformer.h.0.attn.attn_dropout
transformer.h.0.attn.resid_dropout
transformer.h.0.ln_2
transformer.h.0.mlp
transformer.h.0.mlp.c_fc
transformer.h.0.mlp.c_proj
transformer.h.0.mlp.act
transformer.h.0.mlp.dropout
transformer.h.1
transformer.h.1.ln_1
transformer.h.1.attn
transformer.h.1.attn.c_attn
transformer.h.1.attn.c_proj
transformer.h.1.attn.attn_dropout
transformer.h.1.attn.resid_dropout
transformer.h.1.ln_2
transformer.h.1.mlp
transformer.h.1.mlp.c_fc
transformer.h.1.mlp.c_proj
transformer.h.1.mlp.act
transformer.h.1.mlp.dropout
transformer.h.2
transformer.h.2.ln_1
transformer.h.2.attn
transformer.h.2.attn.c_attn
transformer.h.2.attn.c_proj
transformer.h.2.attn.attn_dropout
transformer.h.2.attn.resid_dropout
transformer.h.2.ln_2
transformer.h.2.mlp
transformer.h.2.mlp.c_fc
transformer.h.2.mlp



## Dataset Loading

Let's load our medical dataset from Hugging Face. We will use the `medalpaca/medical_meadow_wikidoc_patient_information` dataset. You can access it [here](https://huggingface.co/datasets/medalpaca/medical_meadow_wikidoc).

In [37]:
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the dataset
dataset_name = "medalpaca/medical_meadow_wikidoc_patient_information"
data = load_dataset(dataset_name)

# Check the column names of the dataset
print("Column names of the train dataset:", data["train"].column_names)  # Print column names of the train dataset

# Load the tokenizer (replace with your model's tokenizer if different)
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Replace "gpt2" with your model's tokenizer

# Add a pad token to the tokenizer if it doesn't have one
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to EOS token

# Inspect the first few samples to understand the structure
print("Sample from the train dataset:", data["train"][0])

# Use 'input' and 'output' columns for tokenization
# Map the columns to tokenizer, assuming both 'input' and 'output' need to be tokenized
def tokenize_function(samples):
    # Tokenize the 'input' and 'output' columns and return as lists
    return {
        'input_ids': tokenizer(samples['input'], padding=True, truncation=True, return_tensors='np')['input_ids'].tolist(),
        'output_ids': tokenizer(samples['output'], padding=True, truncation=True, return_tensors='np')['input_ids'].tolist()
    }

# Apply tokenization function with batched=True
data = data.map(tokenize_function, batched=True)

# Check the tokenized data
print("Tokenized sample from the train dataset:", data["train"][0])  # Example for checking the first tokenized example in the train set








Column names of the train dataset: ['input', 'output', 'instruction']
Sample from the train dataset: {'input': 'What are the symptoms of Allergy?', 'output': 'Allergy symptoms vary, but may include:\nBreathing problems (coughing, shortness of breath) Burning, tearing, or itchy eyes Conjunctivitis (red, swollen eyes) Coughing Diarrhea Headache Hives Itching of the nose, mouth, throat, skin, or any other area Runny nose Skin rashes Stomach cramps Vomiting Wheezing\nWhat part of the body is contacted by the allergen plays a role in the symptoms you develop. For example:\nAllergens that are breathed in often cause a stuffy nose, itchy nose and throat, mucus production, cough, or wheezing. Allergens that touch the eyes may cause itchy, watery, red, swollen eyes. Eating something you are allergic to can cause nausea, vomiting, abdominal pain, cramping, diarrhea, or a severe, life-threatening reaction. Allergens that touch the skin can cause a skin rash, hives, itching, blisters, or even skin

Map:   0%|          | 0/5942 [00:00<?, ? examples/s]

Tokenized sample from the train dataset: {'input': 'What are the symptoms of Allergy?', 'output': 'Allergy symptoms vary, but may include:\nBreathing problems (coughing, shortness of breath) Burning, tearing, or itchy eyes Conjunctivitis (red, swollen eyes) Coughing Diarrhea Headache Hives Itching of the nose, mouth, throat, skin, or any other area Runny nose Skin rashes Stomach cramps Vomiting Wheezing\nWhat part of the body is contacted by the allergen plays a role in the symptoms you develop. For example:\nAllergens that are breathed in often cause a stuffy nose, itchy nose and throat, mucus production, cough, or wheezing. Allergens that touch the eyes may cause itchy, watery, red, swollen eyes. Eating something you are allergic to can cause nausea, vomiting, abdominal pain, cramping, diarrhea, or a severe, life-threatening reaction. Allergens that touch the skin can cause a skin rash, hives, itching, blisters, or even skin peeling. Drug allergies usually involve the whole body and 

## Model Training and Testing

Now we train the model usig the transformers library. Before doing so, we set the tokenizer to be the end of sequence tokens since it is required by our model. Your goal here is to tune the paramaters until you get a running model on a single colab GPU.

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
import torch

# Load the dataset
dataset_name = "medalpaca/medical_meadow_wikidoc_patient_information"
data = load_dataset(dataset_name)

# Load the tokenizer (replace with your model's tokenizer if different)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Add a pad token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to EOS token

# Inspect the dataset
print("Column names of the train dataset:", data["train"].column_names)  # Print column names of the train dataset
print("Sample from the train dataset:", data["train"][0])

# Tokenize the dataset with proper padding and truncation
def tokenize_function(samples):
    # Tokenize both the input and output with padding and truncation
    input_encodings = tokenizer(samples['input'], padding='max_length', truncation=True, max_length=512, return_tensors='pt')
    output_encodings = tokenizer(samples['output'], padding='max_length', truncation=True, max_length=512, return_tensors='pt')

    # Ensure that both input and output tokens are of the same length
    return {
        'input_ids': input_encodings['input_ids'].squeeze().tolist(),  # Remove the batch dimension
        'output_ids': output_encodings['input_ids'].squeeze().tolist()  # Remove the batch dimension
    }

# Map the tokenization function
data = data.map(tokenize_function, batched=True)

# Load the model (since GPT-2 is not a seq2seq model)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Check if GPU is available, if not, use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Split the dataset into train and test (80-20 split for example)
train_test_split = data['train'].train_test_split(test_size=0.2)

# Update your training and evaluation datasets
train_dataset = train_test_split['train']
eval_dataset = train_test_split['test']

# Set up the training arguments with matching strategies
training_args = TrainingArguments(
    output_dir='./results',                # output directory for model predictions and checkpoints
    evaluation_strategy="epoch",           # Evaluate after each epoch
    save_strategy="epoch",                 # Save checkpoint after each epoch (to match evaluation strategy)
    learning_rate=5e-5,                    # Learning rate
    per_device_train_batch_size=8,         # Batch size per GPU/CPU during training
    per_device_eval_batch_size=8,          # Batch size for evaluation
    num_train_epochs=3,                    # Number of training epochs
    weight_decay=0.01,                     # Strength of weight decay
    logging_dir='./logs',                  # Directory for storing logs
    logging_steps=200,                     # Log every 200 steps
    push_to_hub=False,                     # Do not push to the Hub (change as per need)
    load_best_model_at_end=True            # Load the best model when finished
)

# Initialize the Trainer
trainer = Trainer(
    model=model,                         # The model to train
    args=training_args,                  # Training arguments
    train_dataset=train_dataset,         # Training dataset
    eval_dataset=eval_dataset,           # Evaluation dataset
)

# Start the training
trainer.train()

# Save the model and tokenizer after training
trainer.save_model("./final_model")
tokenizer.save_pretrained("./final_model")

# Evaluation (optional, can be added after training)
eval_results = trainer.evaluate()
print(f"Evaluation Results: {eval_results}")



Column names of the train dataset: ['input', 'output', 'instruction']
Sample from the train dataset: {'input': 'What are the symptoms of Allergy?', 'output': 'Allergy symptoms vary, but may include:\nBreathing problems (coughing, shortness of breath) Burning, tearing, or itchy eyes Conjunctivitis (red, swollen eyes) Coughing Diarrhea Headache Hives Itching of the nose, mouth, throat, skin, or any other area Runny nose Skin rashes Stomach cramps Vomiting Wheezing\nWhat part of the body is contacted by the allergen plays a role in the symptoms you develop. For example:\nAllergens that are breathed in often cause a stuffy nose, itchy nose and throat, mucus production, cough, or wheezing. Allergens that touch the eyes may cause itchy, watery, red, swollen eyes. Eating something you are allergic to can cause nausea, vomiting, abdominal pain, cramping, diarrhea, or a severe, life-threatening reaction. Allergens that touch the skin can cause a skin rash, hives, itching, blisters, or even skin

Map:   0%|          | 0/5942 [00:00<?, ? examples/s]

Explain 4 of the training arguments you used in your Trainer, how they are used, and what do they represent

**Test your Zaka**

We now save our model as a pretrained version so that we can set the LoRA configurations. This model will be saved to a separate folder on the next block.

In [10]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import login

# Step 1: Log in to Hugging Face (if you haven't already)
# Uncomment the following line and replace '<your_token>' with your Hugging Face token
# login(token='<your_token>')

# Step 2: Load the base model (GPT-2 in this case)
base_model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Step 3: If you're using a fine-tuned LoRA model, specify its path.
# Replace this with the correct model name or local path
lora_model_repo = "medalpaca/medical_meadow_wikidoc"  # Replace with the actual model name or path

# Step 4: Attempt to load the LoRA model (if available)
try:
    # Try to load the LoRA model with authentication token if needed
    lora_model = AutoModelForCausalLM.from_pretrained(lora_model_repo, use_auth_token="<your_token>")
    print(f"Successfully loaded LoRA model from {lora_model_repo}")
except Exception as e:
    # If loading fails, fall back to the base model
    print(f"Error loading LoRA model: {e}")
    lora_model = None

# Step 5: Choose the appropriate model to save
saved_model = lora_model if lora_model is not None else model

# Step 6: Save the model to the 'outputs' directory
saved_model.save_pretrained("outputs")

# Step 7: Save the tokenizer associated with the saved model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)  # You can adjust the tokenizer if needed
tokenizer.save_pretrained("outputs")

print("Model and tokenizer have been saved to the 'outputs' directory.")







Error loading LoRA model: medalpaca/medical_meadow_wikidoc is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
Model and tokenizer have been saved to the 'outputs' directory.


Before testing our model, we have to get the LoRA configs from our pre-trained model and set them to our new model using the get_peft_model() function.

In [11]:
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Step 1: Define LoRA configuration
lora_configs = LoraConfig(
    r=8,  # Rank of the LoRA layer (this is an example, you can change it)
    lora_alpha=16,  # Scaling factor (you can adjust it)
    lora_dropout=0.1,  # Dropout rate for the LoRA layers (adjustable)
    bias="none",  # Bias options for LoRA (you can choose "none", "all", "lora", etc.)
)

# Step 2: Load the base pre-trained model
model_name = "gpt2"  # Example model, replace with your own model if needed
model = AutoModelForCausalLM.from_pretrained(model_name)

# Step 3: Apply LoRA configuration to the model
model_with_lora = get_peft_model(model, lora_configs)

# Now, `model_with_lora` is the model with LoRA configurations applied




We need to set our prompt as a variable, and also our device currently in use.

In [12]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Step 1: Load the base model (GPT-2 in this case)
base_model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Step 2: Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Step 3: Define the device (in this case, 'cuda:0')
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

# Step 4: Define your prompt
prompt = "Write a summary about the advancements in artificial intelligence."

# Step 5: Tokenize the prompt and prepare inputs for the model
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Step 6: Generate text using the model
outputs = model.generate(**inputs, max_length=100, num_return_sequences=1)

# Step 7: Decode the generated output and print it
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Prompt: {prompt}")
print(f"Generated Text: {generated_text}")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt: Write a summary about the advancements in artificial intelligence.
Generated Text: Write a summary about the advancements in artificial intelligence.

The next step is to develop a system that can detect and respond to the human brain's signals.

"We're going to have to develop a system that can detect and respond to the human brain's signals," said Dr. David S. Siegel, a neuroscientist at the University of California, San Diego.

The system will be able to detect and respond to the human brain's signals, which are the brain


Finally, we will make our LLM generate text based on the data. First we user the tokenizer() function on our prompt.

In [13]:
#Test Your Zaka

inputs = tokenizer(prompt, return_tensors="pt").to(device)


Let's now use the generate() function on our model, and print the decoded version of our output.

In [14]:
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a summary about the advancements in artificial intelligence.

The next step is to develop a system that can detect and respond to the human brain's signals.

"We're going to have to develop a system that can detect and respond to
