##***LLAMA2 - MEDICAL CHAT***

This notebook demonstrates how a large language model, in this case LLAMA2, can be fine-tuned for a specific use case (medical chat).

*   The notebook uses a T4 GPU to train the LLAMA2 7b model and an A100 GPU to train the LLAMA2 13b model.
*   The models are downloaded from Hugging Face, and the notebook also relies on libraries such as transformers, peft, datasets, and torch.
*   Both LLAMA2 models are trained on the same medical dataset of 4500 rows.
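
A rough back-of-the-envelope estimate helps explain the GPU choices above. The sketch below assumes nominal parameter counts of 7 and 13 billion and counts only the weights (no activations, optimizer state, or quantization overhead), so the figures are lower bounds rather than measurements. The helper name `weight_memory_gb` is illustrative.

```python
# Rough weight-memory estimate; parameter counts are nominal assumptions.
NOMINAL_PARAMS = {"7b": 7e9, "13b": 13e9}
BYTES_PER_PARAM = {"float16": 2.0, "4-bit": 0.5}


def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, n_params in NOMINAL_PARAMS.items():
    for dtype, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(n_params, nbytes)
        print(f"LLAMA2 {name} in {dtype}: ~{gb:.1f} GB")
```

Under these assumptions the 4-bit 7b weights take about 3.5 GB, which leaves headroom on a 16 GB T4, while the 13b model in float16 (about 26 GB) would not fit on that card at all.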





##Installing important libraries

In [None]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

##Logging into HuggingFace to download the model

In [None]:
from huggingface_hub import notebook_login, hf_hub_download

access_token= "Provide_Read_token"

notebook_login()
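
The `access_token` string above is a placeholder. As a hedged alternative to pasting a token into the notebook, the sketch below reads it from an environment variable. The variable name `HF_TOKEN` and the helper `read_hf_token` are illustrative choices, not something this notebook defines.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in `env_var`, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


token = read_hf_token()
if token is None:
    print("No token found; fall back to the interactive notebook_login() prompt.")
else:
    # The token could then be passed on, e.g. huggingface_hub.login(token=token).
    print("Token found in environment.")
```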


#Defining the model and new model name

We fine-tune both models to see how the responses of the LLAMA2 7b model fare in comparison with those of the LLAMA2 13b model.

In [None]:
# The model that you want to train from the Hugging Face hub
model_name_7b = "NousResearch/Llama-2-7b-chat-hf"

# Fine-tuned model name
new_model_7b = "llama-2-7b-medical-chat"

In [None]:
# The model that you want to train from the Hugging Face hub
model_name_13b = "NousResearch/Llama-2-13b-chat-hf"

# Fine-tuned model name
new_model_13b = "llama-2-13b-medical-chat"

#Downloading and formatting the dataset

A dataset hosted on the Hugging Face Hub is used to train both models.

In [None]:
# The instruction dataset to use
dataset_name = "shibing624/medical"

# Select the last 4500 rows of the train split to train the model
dataset_medical = load_dataset(dataset_name,'finetune',  split= "train[-4500:]")


In [None]:
# Check what the dataset originally looks like
dataset_medical[0]

{'instruction': "If you are a doctor, please answer the medical questions based on the patient's description.",
 'input': 'Ive had a small lump on the back of my head for a couple of years.  3 weeks ago I scratched it and it opened up.  There was a white, horseradish texture substance in it.  Since then it has been draining.  It has dry, hard, flakes of skin around it.  The past few days it has been very sore.  The right side nodule on the back of my head, just below the hairline, now has a lump on it and my throat is sore.  I feel a little disoriented, also.',
 'output': 'Hi, The lump you describe is most likely to be a sebaceous cyst. This is not serious, and they are commonly found on the scalp. The soreness may indicate that it is infected. I suggest you see your doctor who will be able to assess whether you need antibiotics. Surgical removal may be an option. The second lump you mention is likely to be a lymph node which is related to your sore throat. This will disappear when you

In [None]:
#creating a function to combine the three columns into one for training of the model
def combine_columns(example):
    return {
        "combined_text": "instruction:" + example["instruction"] + " input: " + example["input"] + " output: " + example["output"]
    }
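
To make the template concrete, the sketch below applies the same formatting to a made-up record and also builds a prompt with the answer left blank. The prompts used later in the notebook appear to follow this shape, though the code that built them is not shown, so `build_prompt` is a hypothetical helper and the toy record is invented for illustration.

```python
def combine_columns(example):
    # Same formatting as the notebook's function above.
    return {
        "combined_text": "instruction:" + example["instruction"]
        + " input: " + example["input"]
        + " output: " + example["output"]
    }


def build_prompt(example):
    # Hypothetical helper: same template, but the answer is left blank
    # so that the model has to generate it.
    return (
        "instruction:" + example["instruction"]
        + " input: " + example["input"]
        + " output: "
    )


# Invented record, not taken from the dataset.
toy = {
    "instruction": "Answer the question.",
    "input": "What is a sebaceous cyst?",
    "output": "A benign lump under the skin.",
}

print(combine_columns(toy)["combined_text"])
print(build_prompt(toy))
```

Keeping the training text and the inference prompt on one shared template reduces the chance that the model sees a format at test time that it never saw during fine-tuning.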


In [None]:
#mapping the column values to the function
dataset = dataset_medical.map(combine_columns)

In [None]:
# Check what the dataset looks like after mapping
dataset[0]

{'instruction': "If you are a doctor, please answer the medical questions based on the patient's description.",
 'input': 'Ive had a small lump on the back of my head for a couple of years.  3 weeks ago I scratched it and it opened up.  There was a white, horseradish texture substance in it.  Since then it has been draining.  It has dry, hard, flakes of skin around it.  The past few days it has been very sore.  The right side nodule on the back of my head, just below the hairline, now has a lump on it and my throat is sore.  I feel a little disoriented, also.',
 'output': 'Hi, The lump you describe is most likely to be a sebaceous cyst. This is not serious, and they are commonly found on the scalp. The soreness may indicate that it is infected. I suggest you see your doctor who will be able to assess whether you need antibiotics. Surgical removal may be an option. The second lump you mention is likely to be a lymph node which is related to your sore throat. This will disappear when you

#Setting up the parameters of the model

In [None]:

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X updates steps
save_steps = 0

# Log every X updates steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on the GPU 0
device_map = {"": 0}
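
As a sanity check on the settings above, the following sketch works out what they imply for the 7b run. It assumes a single GPU, that all 4500 rows are used, and that warmup steps are rounded up; the variable names mirror the cell above, and the arithmetic is an approximation of what the trainer does rather than a copy of its internals.

```python
import math

n_rows = 4500                        # rows selected from the dataset
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
num_train_epochs = 1
warmup_ratio = 0.03
lora_r, lora_alpha = 64, 16

# Examples consumed per optimizer update (single GPU assumed).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(n_rows / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

# LoRA scales the low-rank update by lora_alpha / r.
lora_scaling = lora_alpha / lora_r

print(effective_batch, total_steps, warmup_steps, lora_scaling)
```

With these values one epoch is roughly 1125 optimizer steps, of which about 34 are warmup, and the LoRA update is scaled by 0.25.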

##Training the model

In [None]:
# The dataset (including the combined_text column) was prepared in the cells above

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model_7b = AutoModelForCausalLM.from_pretrained(
    model_name_7b,
    quantization_config=bnb_config,
    device_map=device_map
)
model_7b.config.use_cache = False
model_7b.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_7b, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model_7b,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="combined_text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model_7b)


Step,Training Loss
25,2.7663
50,2.3592
75,2.2831
100,2.0753
125,2.2144
150,2.0797
175,2.161
200,1.9505
225,2.1471
250,2.0711




# Finetuning the 13b chat LLAMA model

In [None]:
!pip install sentencepiece



In [None]:
# The model that you want to train from the Hugging Face hub
model_name_13b = "NousResearch/Llama-2-13b-chat-hf"

# Fine-tuned model name
new_model_13b = "llama-2-13b-medical-chat"

In [None]:

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = True

# Batch size per GPU for training
per_device_train_batch_size = 1

# Batch size per GPU for evaluation
per_device_eval_batch_size = 1

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X updates steps
save_steps = 0

# Log every X updates steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on the GPU 0
device_map = {"": 0}

In [None]:
import torch
torch.cuda.empty_cache()

import gc
##del variables
gc.collect()

55

In [None]:
# The dataset (including the combined_text column) was prepared in the cells above

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model_13b = AutoModelForCausalLM.from_pretrained(
    model_name_13b,
    quantization_config=bnb_config,
    device_map=device_map
)
model_13b.config.use_cache = False
model_13b.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_13b, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model_13b,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="combined_text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model_13b)

Your GPU supports bfloat16: accelerate training with bf16=True




Step,Training Loss
25,2.7471
50,2.6
75,2.3413
100,2.1296
125,2.2362
150,2.2114
175,2.2005
200,2.0549
225,2.1883
250,2.0712
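
The logged losses for the two runs can be compared side by side, with one caveat: the runs use different batch sizes (4 for 7b, 1 for 13b), so the same step number corresponds to a different number of training examples. The sketch below copies the logged values from the tables above and assumes a single GPU; it is a quick illustration, not an evaluation of model quality.

```python
# Logged training losses copied from the tables above (step -> loss).
loss_7b = {25: 2.7663, 50: 2.3592, 75: 2.2831, 100: 2.0753, 125: 2.2144,
           150: 2.0797, 175: 2.161, 200: 1.9505, 225: 2.1471, 250: 2.0711}
loss_13b = {25: 2.7471, 50: 2.6, 75: 2.3413, 100: 2.1296, 125: 2.2362,
            150: 2.2114, 175: 2.2005, 200: 2.0549, 225: 2.1883, 250: 2.0712}

# Per-device batch size used for each run (single GPU assumed).
batch_size = {"7b": 4, "13b": 1}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


for name, losses in (("7b", loss_7b), ("13b", loss_13b)):
    last_step = max(losses)
    examples_seen = last_step * batch_size[name]
    print(f"{name}: mean logged loss {mean(losses.values()):.4f}, "
          f"examples seen by step {last_step}: {examples_seen}")
```

By step 250 the 7b run has processed about 1000 examples and the 13b run about 250, so similar loss values at the same step do not mean the two models learned equally fast per example.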




In [None]:
import pandas as pd
test_df = pd.read_csv("./data/test_llm.csv")
test_df.head()

Unnamed: 0.1,Unnamed: 0,instruction,input,output,combined_text,prompt
0,0,"If you are a doctor, please answer the medical...",I am having severe abdominal pain that is loca...,Hello are having this pain since 42 years inte...,"instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer..."
1,1,"If you are a doctor, please answer the medical...",I stared getting a dull shooting pain that com...,"Hello dear, The symptoms as mentioned in your ...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer..."
2,2,"If you are a doctor, please answer the medical...",i have backache i consulted Dr. she recommende...,Hi friend . There is nothing to worry . You're...,"instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer..."
3,3,"If you are a doctor, please answer the medical...",In 1960 I have had several spinal taps. It was...,"Hello, You were likely undergoing a lateral CE...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer..."
4,4,"If you are a doctor, please answer the medical...","Sir,my dad is not been able to remember names ...","Hi, Thanks for using Chat Doctor. I have gone ...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer..."


##Generating Responses from the finetuned model

In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

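# Create a pipeline to get responses from the model.
# `model` is not defined earlier in this notebook; it is assumed here to be the
# most recently fine-tuned model (model_13b), whose tokenizer is still loaded.
model = model_13b
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=560)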

def run_generation(prompt):

    result = pipe(f"<s>[INST] {prompt} [/INST]")
    print(result[0]['generated_text'])
    return result[0]['generated_text']
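
Because the pipeline returns the prompt together with the completion, the saved responses start with the `[INST]` block. A minimal sketch of stripping the prompt is shown below; `extract_response` is a hypothetical helper and the sample string is made up.

```python
def extract_response(generated_text, marker="[/INST]"):
    """Return the text after the last `marker`, or the full text if it is absent."""
    _, found, tail = generated_text.rpartition(marker)
    return tail.strip() if found else generated_text.strip()


# Made-up example in the same shape as the pipeline output.
sample = "<s>[INST] What is a cyst? [/INST]  A benign lump under the skin."
print(extract_response(sample))
```

Splitting on the last marker keeps the helper working even when the prompt itself happens to contain an `[/INST]` tag.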


In [None]:
# Save the model's responses in the model_response column
test_df['model_response'] = ''

for i in range(len(test_df)):
    # Use .loc to assign in place and avoid pandas' chained-assignment warning
    test_df.loc[i, 'model_response'] = run_generation(test_df.loc[i, 'prompt'])

test_df.to_csv(r'./data/test_llm_response.csv', index = False)

test_df




<s>[INST] instruction:If you are a doctor, please answer the medical questions based on the patient's description. input: I am having severe abdominal pain that is located between my belly button and penis.  It radiates to both sides of my abdomen and back.  It feels like someone has a hold of my guts and squeezing them.  It is a sharp burning pain.  I get this about two to three times a year.  I went to a gastro dr today and am scheduled for a Ct scan on 11/17/14 and colon/endoscopy on 12/1/14.  I have diabetes that was diagnosed about 7 years ago and have had a heart bypass in 2000.  I have had the abdominal pain since I was about 21 and am 63 now.  Ive been thru scans and  testing and nothing has ever been found.  I am tired of feeling like this but I just need to know what can I do for the pain in the mean time, if anything?    Also, I am male.  Thank you. output:  [/INST]  Hi, Thanks for writing in. I have gone through your query and understand your concerns. The symptoms you are 



<s>[INST] instruction:If you are a doctor, please answer the medical questions based on the patient's description. input: I stared getting a dull shooting pain that comes and goes about two inches above my belly button on my right side. It woke me up this morning. Again it comes and goes, not constant but it seems to be more frequent when I eat (at after lunch and dinner). Its bearable but nagging. output:  [/INST]  Hi, Thanks for writing in. The pain you are describing is most likely due to a muscle spasm. This is a common condition and can be due to a variety of factors. It can be due to a muscle strain, or a herniated disc. It can also be due to a pinched nerve. I would recommend that you get an X-ray of the abdomen to rule out any other causes. If the pain is severe, I would recommend that you get an MRI of the abdomen. If the pain is mild, you can try taking some painkillers like ibuprofen or acetaminophen. You can also try applying a warm compress to the area. If the pain persist



<s>[INST] instruction:If you are a doctor, please answer the medical questions based on the patient's description. input: Hi - i have sciatica with pain going down my right leg when it occurs! i did a mantoux test that says i have Mycobacterium TB of the spine. Have been taking pyrazinamide-rifampicin-isoniazid-ethambutol for about 6months now!! Please, any suggestions to ease the leg cramps and sciatica?? Thanks, Aminu, nigeria. output:  [/INST]  Hello, I have studied your case and I understand your concern. I would like to inform you that the pain you are experiencing is not related to the TB infection. The pain is due to the sciatic nerve compression. The TB infection is not causing the pain. The pain is due to the compression of the sciatic nerve. The compression is due to the swelling of the soft tissues around the nerve. The swelling is due to the inflammation of the soft tissues. The inflammation is due to the compression of the nerve. The compression of the nerve is due to the 

Unnamed: 0.1,Unnamed: 0,instruction,input,output,combined_text,prompt,model_response
0,0,"If you are a doctor, please answer the medical...",I am having severe abdominal pain that is loca...,Hello are having this pain since 42 years inte...,"instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
1,1,"If you are a doctor, please answer the medical...",I stared getting a dull shooting pain that com...,"Hello dear, The symptoms as mentioned in your ...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
2,2,"If you are a doctor, please answer the medical...",i have backache i consulted Dr. she recommende...,Hi friend . There is nothing to worry . You're...,"instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
3,3,"If you are a doctor, please answer the medical...",In 1960 I have had several spinal taps. It was...,"Hello, You were likely undergoing a lateral CE...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
4,4,"If you are a doctor, please answer the medical...","Sir,my dad is not been able to remember names ...","Hi, Thanks for using Chat Doctor. I have gone ...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
5,5,"If you are a doctor, please answer the medical...",I was on Glipizide 5mg 3 times a day. sugar le...,Hello and thank you for asking! I have gone ca...,"instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
6,6,"If you are a doctor, please answer the medical...",I have a red itchy facial rash for approx a we...,"Hello dear, Thank you for your contact to Chat...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
7,7,"If you are a doctor, please answer the medical...",Hi my 4 year old son had a temp of 102 this mo...,"Hi, By what you say your son should be sufferi...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
8,8,"If you are a doctor, please answer the medical...","My age is 23, height is 179cm, weight is 89kg....",Thanks for your question on Chat Doctor. In my...,"instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."
9,9,"If you are a doctor, please answer the medical...",Hi - i have sciatica with pain going down my r...,"Hi Amino, thanks for asking from Chat Doctor. ...","instruction:If you are a doctor, please answer...","instruction:If you are a doctor, please answer...","<s>[INST] instruction:If you are a doctor, ple..."


In [None]:
# Empty VRAM
del model
del pipe
del trainer
import gc
gc.collect()
torch.cuda.empty_cache()  # release cached GPU memory held by PyTorch
gc.collect()

26175

##Merging the model with LoRA weights

In [None]:
# Reload model in FP16 and merge it with LoRA weights
base_model_7b = AutoModelForCausalLM.from_pretrained(
    model_name_7b,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model_7b = PeftModel.from_pretrained(base_model_7b, new_model_7b)
model_7b = model_7b.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name_7b, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
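
Conceptually, merging folds the low-rank update into the base weight: W' = W + (lora_alpha / r) * B A. The toy sketch below shows that arithmetic on plain Python lists; the matrices are made up, and the helper names `matmul` and `merge_lora` are illustrative, not part of peft.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return weight + (lora_alpha / r) * (lora_b @ lora_a)."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (all values made up for illustration).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (r, in_features)  = (1, 2)
lora_b = [[0.5], [1.0]]    # shape (out_features, r) = (2, 1)

merged = merge_lora(weight, lora_a, lora_b, lora_alpha=2, r=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded like an ordinary checkpoint.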


##Logging into HuggingFace with write permissions

In [None]:
access_for_pushing = "YOUR_ACCESS_TOKEN_WITH_WRITE_PERMISSIONS"

notebook_login()


In [None]:
# Reload model in FP16 and merge it with LoRA weights
base_model_13b = AutoModelForCausalLM.from_pretrained(
    model_name_13b,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model_13b = PeftModel.from_pretrained(base_model_13b, new_model_13b)
model_13b = model_13b.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name_13b, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


##Saving the model to HuggingFace Repo

In [None]:
# !huggingface-cli login
model_13b.push_to_hub(new_model_13b, use_temp_dir=False)
tokenizer.push_to_hub(new_model_13b, use_temp_dir=False)


CommitInfo(commit_url='https://huggingface.co/fizafaiz/llama-2-13b-medical-chat/commit/42c9ba74b95757efb444dc2440fd27745ac5f86e', commit_message='Upload tokenizer', commit_description='', oid='42c9ba74b95757efb444dc2440fd27745ac5f86e', pr_url=None, pr_revision=None, pr_num=None)

##Loading the 7b LLAMA PEFT trained model from the hub

In [None]:
from peft import PeftModel, PeftConfig

peft_model_id = "fizafaiz/llama-2-7b-medical-chat"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)




In [None]:
# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)
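
The adapter is small compared with the base model. The sketch below estimates its size under stated assumptions: a hidden size of 4096 and 32 decoder layers for LLAMA2 7b, rank 64 as configured earlier, and adapters on two square projections per layer (assumed to be the library default of `q_proj` and `v_proj`, since the notebook does not set `target_modules`). The helper `lora_param_count` is illustrative.

```python
def lora_param_count(hidden_size, n_layers, r, modules_per_layer):
    """Trainable parameters for rank-r adapters on square hidden_size x hidden_size layers."""
    # A has shape (r, in_features) and B has shape (out_features, r).
    per_module = r * (hidden_size + hidden_size)
    return per_module * modules_per_layer * n_layers


# Assumed LLAMA2 7b shape and adapter placement (see the paragraph above).
n_params = lora_param_count(hidden_size=4096, n_layers=32, r=64, modules_per_layer=2)
size_mb = n_params * 4 / 1e6   # 4 bytes per parameter if stored as float32

print(f"{n_params:,} adapter parameters, ~{size_mb:.0f} MB in float32")
```

Under these assumptions the adapter holds about 33.5 million parameters, or roughly 134 MB in float32, which is in line with the `adapter_model.bin` download size reported in this notebook.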

adapter_model.bin:   0%|          | 0.00/134M [00:00<?, ?B/s]

In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

# Running the text generation pipeline for the 7b LLAMA PEFT trained model
prompt = "<s>[INST] instruction:If you are a doctor, please answer the medical questions based on the patient's description. input: Hi - My mom is experiencing red pin dots on her feet. Occasionally they swell too. This started about 2 months after she had received anesthesia in both feet. Not sure if a factor or not. Do you know what this could be? She has seen a dermatologist, an internist and a vascular surgeon and has gotten no help. Thank you output:  [/INST] "
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=520)
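# Note: `prompt` already contains the <s>[INST] ... [/INST] tags, so wrapping it
# again here duplicates them (visible in the output below).
result = pipe(f"<s>[INST] {prompt} [/INST]")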
print(result[0]['generated_text'])





<s>[INST] <s>[INST] instruction:If you are a doctor, please answer the medical questions based on the patient's description. input: Hi - My mom is experiencing red pin dots on her feet. Occasionally they swell too. This started about 2 months after she had received anesthesia in both feet. Not sure if a factor or not. Do you know what this could be? She has seen a dermatologist, an internist and a vascular surgeon and has gotten no help. Thank you output:  [/INST]  [/INST]  Hello, I am Chat Doctor, answering your query. Based on your description, it seems that your mother is suffering from a condition called "Petechiae". Petechiae are small red spots on the skin that are caused by bleeding under the skin. They can be caused by a variety of factors, including infection, injury, or certain medical conditions. In your mother's case, it is possible that the anesthesia she received in her feet caused the bleeding. It is important to consult a doctor and get a proper diagnosis. Treatment wil