<a href="https://colab.research.google.com/github/Alao001/LLMs/blob/main/Fine_Tune_Llama_3_1_8B_Medical_bot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

ABOUT THE PROJECT

This project demonstrates how to fine-tune a Llama 3.1 8B model with the Unsloth library by Daniel and Michael Han. Thanks to its custom kernels, Unsloth advertises 2x faster training and 60% less memory use than other options, which makes it well suited to a constrained environment such as Colab on a T4 GPU with limited VRAM (16 GB). We use the QLoRA technique: the model is quantized to 4-bit precision, which significantly reduces memory requirements and makes training feasible on this hardware. The training data is the ruslanmv/ai-medical-chatbot dataset, comprising 250k patient-doctor dialogues, which we use to train a medical chatbot.
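
As a rough, back-of-the-envelope illustration of why 4-bit quantization matters on a 16 GB T4, the sketch below estimates the memory needed just to hold the weights. The parameter count (about 8.03 billion) and the helper name `weight_memory_gb` are assumptions made for illustration. Real usage is higher, because some layers stay in higher precision and training also needs memory for activations, LoRA adapters and optimizer state.

```python
# Rough estimate of weight storage only (no activations, no optimizer state).
N_PARAMS = 8.03e9  # assumed approximate parameter count of Llama 3.1 8B

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f" 4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights alone come to about 16 GB, which does not fit on the T4, while the 4-bit weights need roughly a quarter of that.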

Installing all the necessary Python packages.

In [None]:
!pip install -qqq "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" --progress-bar off
from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install -qqq --no-deps {xformers} trl peft accelerate bitsandbytes triton --progress-bar off

import torch
from trl import SFTTrainer
from datasets import load_dataset
from transformers import TrainingArguments, TextStreamer, AutoTokenizer, AutoModelForCausalLM
from unsloth.chat_templates import get_chat_template
from unsloth import FastLanguageModel, is_bfloat16_supported

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


Retrieving Hugging Face API Token in Google Colab

This step is required to interact with the Hugging Face Hub and to access its resources, such as pre-trained models and datasets. Tokens are managed at https://huggingface.co/settings/tokens

In [None]:
from google.colab import userdata
from huggingface_hub import login
# Defined in the secrets tab in Google Colab
hf_token = userdata.get('Hugging Face')


login(token = hf_token)



The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


Load Model and Tokenizer

In [None]:
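# Load the pre-quantized 4-bit model and its tokenizer
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # keep the weights in 4-bit precision (QLoRA)
    dtype=None,  # None lets Unsloth detect the dtype (float16 on a T4)
)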

==((====))==  Unsloth 2024.9.post4: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/345 [00:00<?, ?B/s]

Configure LoRA Parameters

In [None]:
# prepare model for PEFT
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    target_modules = ["q_proj", "v_proj", "k_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],
    use_rslora = True,
    use_gradient_checkpointing = True,
    )

Unsloth: Already have LoRA adapters! We shall skip this step.
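
To make the LoRA settings above more concrete, the following sketch counts the trainable parameters that rank-16 adapters add to the seven targeted projection matrices, and compares the standard LoRA scaling factor with the rank-stabilized one selected by `use_rslora`. The layer dimensions (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers) are assumed from the published Llama 3.1 8B configuration and are not stated in this notebook; the variable names are illustrative.

```python
import math

# Assumed Llama 3.1 8B dimensions (from the published model config).
HIDDEN, MLP, KV_DIM, N_LAYERS = 4096, 14336, 1024, 32
R, ALPHA = 16, 16  # values passed to get_peft_model above

# (input features, output features) of each targeted linear layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# Each adapter adds A (r x in) and B (out x r), i.e. r * (in + out) parameters.
per_layer = sum(R * (n_in + n_out) for n_in, n_out in shapes.values())
trainable = per_layer * N_LAYERS

lora_scale = ALPHA / R               # standard LoRA scaling
rslora_scale = ALPHA / math.sqrt(R)  # rank-stabilized LoRA scaling

print(f"trainable LoRA parameters: {trainable:,}")
print(f"scaling: LoRA={lora_scale}, rsLoRA={rslora_scale}")
```

Under these assumptions the adapters hold about 42 million parameters, roughly 0.5% of the 8 billion base weights, which stay frozen during training.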


In [None]:
dataset_name = "ruslanmv/ai-medical-chatbot"

In [None]:
new_model = "llama-3-8b-chat-doctor"

Data Preparation for Chat-Based Fine-Tuning

In [None]:
#Importing the dataset
dataset = load_dataset(dataset_name, split="all")
dataset = dataset.shuffle(seed=65).select(range(1000)) # Only use 1000 samples for quick demo

def format_chat_template(row):
    row_json = [{"role": "user", "content": row["Patient"]},
               {"role": "assistant", "content": row["Doctor"]}]
    row["text"] = tokenizer.apply_chat_template(row_json, tokenize=False)
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc=4,
)

dataset['text'][3]

'<|im_start|>user\nFell on sidewalk face first about 8 hrs ago. Swollen, cut lip bruised and cut knee, and hurt pride initially. Now have muscle and shoulder pain, stiff jaw(think this is from the really swollen lip),pain in wrist, and headache. I assume this is all normal but are there specific things I should look for or will I just be in pain for a while given the hard fall?<|im_end|>\n<|im_start|>assistant\nHello and welcome to HCM,The injuries caused on various body parts have to be managed.The cut and swollen lip has to be managed by sterile dressing.The body pains, pain on injured site and jaw pain should be managed by pain killer and muscle relaxant.I suggest you to consult your primary healthcare provider for clinical assessment.In case there is evidence of infection in any of the injured sites, a course of antibiotics may have to be started to control the infection.Thanks and take careDr Shailja P Wahal<|im_end|>\n'
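
The sample above uses ChatML-style markers (`<|im_start|>` and `<|im_end|>`). As an illustration of the string that the template produces, the sketch below rebuilds that layout in plain Python. The helper `to_chatml` is hypothetical and only mimics the format visible in the output; in the notebook the formatting is done by `tokenizer.apply_chat_template`.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts in the ChatML layout shown above."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text

# Placeholder contents, not taken from the dataset
example = to_chatml([
    {"role": "user", "content": "Patient question"},
    {"role": "assistant", "content": "Doctor answer"},
])
print(example)
```

With `add_generation_prompt=True` the rendered string ends with an open assistant turn, which is the form used at inference time.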

In [None]:
dataset = dataset.train_test_split(test_size=0.1, seed=65)  # fixed seed so the split is reproducible

In [None]:
trainer=SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=True,
    args=TrainingArguments(
        learning_rate=3e-4,
        lr_scheduler_type="linear",
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=2,
        num_train_epochs=1,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=10,
        group_by_length=True,
        output_dir=new_model,
        seed=0,
    ),
)
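
The `learning_rate`, `warmup_steps` and `lr_scheduler_type="linear"` settings above combine into a schedule that ramps up linearly during warmup and then decays linearly to zero. The sketch below reproduces that shape in plain Python. The function name `linear_lr` and the total of 100 optimizer steps are assumptions for illustration; the real step count depends on how many packed sequences the dataset yields.

```python
def linear_lr(step, total_steps, peak_lr=3e-4, warmup_steps=10):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 100  # hypothetical; not taken from the notebook
for step in (0, 5, 10, 55, 100):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")

# Sequences consumed per optimizer step on a single GPU:
# per_device_train_batch_size * gradient_accumulation_steps
effective_batch = 1 * 2
```

The rate peaks at 3e-4 once warmup ends at step 10, and each optimizer step sees two sequences because gradients are accumulated over two batches of one.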


In [None]:
trainer.train()

Model Inference

In [None]:
model = FastLanguageModel.for_inference(model)


To generate a response, we convert the messages into chat format, tokenize the resulting prompt, pass the token IDs to the model, and then decode the generated tokens to display the text.

In [None]:
messages = [
    {
        "role": "user",
        "content": "Hello doctor, I have bad acne. How do I get rid of it?"
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors='pt', padding=True,
                   truncation=True).to("cuda")

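# max_length counts the prompt tokens as well as the generated ones
outputs = model.generate(**inputs, max_length=150,
                         num_return_sequences=1)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
text = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

print(text)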


Hello, Welcome to HealthcareMagic. I can understand your concern. Acne is a common problem seen in people of all age groups. There are many reasons for acne like hormonal imbalance, improper skin care, stress, diet etc. It is very important to treat acne as soon as possible to avoid permanent scars. I would suggest you to take a tablet of Vitamin A 100000 IU once a week for 6 weeks and then stop it. You can also take a tablet of Vitamin E 400 IU once a week for 6 weeks. Take a tablet of Vitamin B
