# Fine-Tuning Mistral 7B Instruct Model

This notebook demonstrates how to fine-tune the pretrained Mistral 7B Instruct language model on a medical dataset to create a model that can answer users' online medical queries.

In this notebook, we will:

- Load and preprocess a medical dialogue dataset from icliniq.com
- Fine-tune the Mistral 7B Instruct model using QLoRA
- Evaluate the result by comparing the model's answers before and after fine-tuning


### Data preparation

In [None]:
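# Raw data can be downloaded from https://drive.google.com/drive/folders/1g29ssimdZ6JzTST6Y8g6h-ogUNReBtJD

import pandas as pd

with open("icliniq_dialogue.txt", encoding="utf-8") as f:
    lines = f.readlines()

# Each dialogue is a "Patient:" line followed by the question, then a "Doctor:"
# line followed by the answer, terminated by a blank line.
ret = []
i = 0
while i < len(lines):
    if lines[i].startswith("Patient:"):
        # Find the "Doctor:" marker; the bounds check avoids an IndexError at end of file
        j = i + 1
        while j < len(lines) and not lines[j].startswith("Doctor:"):
            j += 1
        question = "".join(lines[i + 1:j])
        # Find the blank line that ends the answer
        k = j + 1
        while k < len(lines) and lines[k] != "\n":
            k += 1
        answer = "".join(lines[j + 1:k])
        ret.append((question, answer))
        i = k
    else:
        i += 1

# Drop pairs whose answer contains a link to https://www.icliniq.com
ret2 = [(q, a) for q, a in ret if "https://www.icliniq.com" not in a]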


df = pd.DataFrame(ret2, columns=["Question", "Answer"])

# Mistral 7B Instruct is already instruction-tuned and expects its own instruction template
df["text"] = df.apply(lambda row: f"<s>[INST]{row['Question']}[/INST]{row['Answer']}</s>", axis=1)
# Character lengths of the full sample, the question, and the answer
df["len"] = df["text"].str.len()
df["len1"] = df["Question"].str.len()
df["len2"] = df["Answer"].str.len()

df2 = df.drop_duplicates()

# Keep samples under 1000 characters in total whose question and answer
# are each longer than 100 characters
mask = df2["len"] < 1000
mask1 = df2["len1"] > 100
mask2 = df2["len2"] > 100

df3 = df2[mask & mask1 & mask2]

# Shuffle with a fixed seed so the split is reproducible
df4 = df3.sample(frac=1, random_state=42)

# Use only 500 samples for training to start with; keep the rest for testing
df4[:500].to_csv("train.csv", encoding="utf-8", index=False)
df4[500:].to_csv("test_all.csv", encoding="utf-8", index=False)
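
The `text` column wraps each question/answer pair in the Mistral Instruct template. At inference time the same template is used, but only up to `[/INST]`, so that the model writes the answer itself. A minimal sketch of that distinction, using a hypothetical helper `build_prompt` that is not part of the original notebook (the sample strings are placeholders):

```python
from typing import Optional


def build_prompt(question: str, answer: Optional[str] = None) -> str:
    """Format a question (and optionally an answer) with the Mistral Instruct template.

    Hypothetical helper: training samples include the answer and the closing
    </s> token, while inference prompts stop after [/INST] so the model completes them.
    """
    prompt = f"<s>[INST]{question}[/INST]"
    if answer is None:
        return prompt
    return f"{prompt}{answer}</s>"


# Training sample, same shape as the df["text"] column
print(build_prompt("Hello doctor, I feel itchy.", "Hi. I have gone through your query."))
# Inference prompt, same shape as the prompt later passed to generate()
print(build_prompt("Hello doctor, I feel itchy."))
```

Leaving out `</s>` at inference matters: the model should produce the answer and its own end-of-sequence token rather than see the sequence as already finished.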



### Install AutoTrain for fine-tuning

In [None]:
!pip install autotrain-advanced -q

In [None]:
!autotrain setup --update-torch

In [None]:

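import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Sharded copy of "mistralai/Mistral-7B-Instruct-v0.1"; smaller checkpoint shards lower peak CPU RAM while loading
model_name = "bn22/Mistral-7B-Instruct-v0.1-sharded"

# QLoRA-style 4-bit NF4 quantization with double quantization, computing in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 4-bit loading is controlled by quantization_config alone; also passing load_in_4bit
# as a separate argument raises an error in recent transformers versions
model_original = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    device_map="auto",
)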
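
Loading the weights in 4-bit NF4 is what makes a 7B-parameter model practical on a 16 GB card. A back-of-the-envelope estimate of weight memory alone, assuming roughly 7.24 billion parameters for Mistral 7B and ignoring activations, the KV cache, quantization constants, and any layers kept in higher precision:

```python
# Rough weight-memory estimate; an approximation, not a measurement.
NUM_PARAMS = 7.24e9  # assumed (approximate) parameter count of Mistral 7B


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return the memory needed to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("bf16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

Under these assumptions, bf16 weights (about 14.5 GB) would nearly fill a 16 GB card before training starts, while 4-bit weights (about 3.6 GB) leave room for activations and the LoRA adapter's gradients and optimizer state.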

In [None]:
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained(model_name)

### Print the model output before fine-tuning

In [None]:
# Use one sample from test_all.csv as an example prompt
text = "<s>[INST]Hello doctor,\nI feel itchy but I do not see any weird symptoms like foul odor, yeast, etc.\n[/INST]"

encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
model_input = encoded.to(device)
generated_ids = model_original.generate(**model_input, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
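
`batch_decode` returns the prompt together with the generated continuation, plus special tokens such as `</s>` because `skip_special_tokens` is not set. A minimal sketch of a hypothetical helper that keeps only the model's answer, assuming the prompt ends with `[/INST]` as above:

```python
def extract_answer(decoded: str) -> str:
    """Return the text after the last [/INST] marker, without the </s> end token.

    Hypothetical helper; assumes the Mistral [INST] ... [/INST] template.
    If no marker is found, the whole string is returned (minus </s>).
    """
    _, _, answer = decoded.rpartition("[/INST]")
    return answer.replace("</s>", "").strip()


print(extract_answer("<s> [INST]Hello doctor, I feel itchy.[/INST] Hi. I have gone through your query.</s>"))
```

A token-level alternative is to decode only the new tokens, `generated_ids[0][model_input["input_ids"].shape[1]:]`, with `tokenizer.decode(..., skip_special_tokens=True)`, which avoids relying on the marker text.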

### Training: use the AutoTrain library with a batch size of 4 to fit in 16 GB of GPU memory, and 5 epochs to start with

In [None]:
!autotrain llm --train --project_name mistral-7b-instruct-sharded-finetuned-01 --model bn22/Mistral-7B-Instruct-v0.1-sharded --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 4 --num_train_epochs 5 --trainer sft --target_modules q_proj,v_proj
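
With `--use_peft` and `--target_modules q_proj,v_proj`, only low-rank adapter matrices on the attention query and value projections are trained. A rough count of those trainable parameters, and of the optimizer steps implied by 500 samples, batch size 4, and 5 epochs. Several values here are assumptions rather than settings from the command: a LoRA rank of 16 (check the `--lora_r` default of your AutoTrain version), Mistral 7B's published shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128, so `v_proj` outputs 1024 features), a single GPU, and no gradient accumulation.

```python
import math

# Assumed Mistral 7B shapes (from its published config) and an assumed LoRA rank.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
KV_DIM = 8 * 128  # 8 key/value heads x head dim 128 -> v_proj output size
LORA_RANK = 16    # assumption; not set explicitly in the autotrain command


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank), i.e. rank * (in + out) parameters."""
    return rank * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)  # q_proj: 4096 -> 4096
    + lora_params(HIDDEN_SIZE, KV_DIM, LORA_RANK)     # v_proj: 4096 -> 1024
)
trainable = per_layer * NUM_LAYERS
print(f"trainable LoRA parameters: {trainable:,}")

# Optimizer steps: 500 samples, batch size 4, 5 epochs (no gradient accumulation assumed)
steps = math.ceil(500 / 4) * 5
print(f"optimizer steps: {steps}")
```

Under these assumptions, about 6.8 million parameters are trained, under 0.1% of the roughly 7.24 billion in the base model, over 625 optimizer steps.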

### Since this is not full fine-tuning, the fine-tuned model is the base model plus the LoRA adapter

In [None]:
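from peft import PeftModel

# Attach the trained LoRA adapter (saved by AutoTrain in the project folder) to the 4-bit base model
adapters_name = "./mistral-7b-instruct-sharded-finetuned-01"
model = PeftModel.from_pretrained(model_original, adapters_name)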
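
Conceptually, each adapted layer computes the frozen base weight's output plus a scaled low-rank update, `h = W x + (alpha / r) * B A x`. This is why the adapter can be stored separately from the base model, or folded into it with PEFT's `merge_and_unload()`. A toy pure-Python sketch with made-up 2x2 numbers (illustrative only, not the real model's weights) showing that the two forms give the same output:

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Toy values (hypothetical): dimension d = 2, LoRA rank r = 1, alpha = 2
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (d x d)
A = [[1.0, 2.0]]              # LoRA down-projection (r x d)
B = [[0.5], [-1.0]]           # LoRA up-projection (d x r)
scale = 2.0 / 1               # alpha / r
x = [3.0, 4.0]

# Adapter kept separate: base output plus scaled low-rank update
separate = [w + scale * u for w, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Adapter merged into the weight: W' = W + scale * (B A)
BA = matmul(B, A)
W_merged = [[W[i][j] + scale * BA[i][j] for j in range(2)] for i in range(2)]
merged = matvec(W_merged, x)

print(separate, merged)  # both are [14.0, -18.0]
```

Keeping the adapter separate, as the notebook does, lets the same 4-bit base model be reused with different adapters; merging removes the extra matrix multiplications at inference time.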

In [None]:
text = "<s>[INST]Hello doctor,\nI feel itchy but I do not see any weird symptoms like foul odor, yeast, etc.\n[/INST]"

encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
model_input = encoded.to(device)
generated_ids = model.generate(**model_input, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

# Results and Summary

The purpose of this experiment is to evaluate how a small pretrained language model, Mistral 7B Instruct, performs when fine-tuned on a limited medical text dataset. Despite the small model size, limited GPU memory, and a training set of only 500 samples, just a few epochs of fine-tuning are enough for the model to adopt the phrasing and conventions of the doctors' answers in the dataset.

***
Mistral 7B Instruct, though compact, is an extremely capable generative model. Even before fine-tuning, it exhibits some medical knowledge and can provide reasonable medical suggestions. However, its responses resemble those of a general conversational agent rather than a domain expert.

```
Question:
Consistent lower back and neck pains. yes, I've been diagnosed with having a cervical disc herniation since 20 years and a lumbar disc herniation since 12 years. An L2 Nerve block procedure was performed in 11 years back. The last MRI I did was 3 years back.
Answer:
It sounds like you have been experiencing chronic lower back and neck pain for quite some time, and have been diagnosed with cervical and lumbar disc herniations. Additionally, you have undergone an L2 nerve block procedure in the past. It is important to keep track of your symptoms and any changes in your pain levels, as well as any new developments in your health. It is recommended to speak with a healthcare provider to discuss any treatment options, such as physical therapy, medication, or further medical procedures. Additionally, it is important to maintain a healthy lifestyle, including regular exercise and a balanced diet, to help manage your pain and improve your overall health.
```
***

The following example makes this clear: the base model answers the inquiry like a general AI assistant, even stating that it is not a doctor, rather than in the scenario we want: a doctor answering an online query on icliniq.com.

```
Question:
Hi doctor,
I am a healthy person. I have a cough with white sputum production. I do not have a fever. I have done sputum culture. The result showed a presence of Pseudomonas. Do I need an antibiotic?
Answer:
Hello! I'm an AI, not a doctor, but I can try to answer your question to the best of my abilities. It's great that you've seen a doctor and done a sputum culture, and it's good to know that Pseudomonas was found, as this can be a significant factor in determining whether an antibiotic is necessary.
I would recommend discussing the results of your sputum culture and other symptoms with your doctor, who can then provide a more personalized recommendation based on your individual health status and medical history. They may also want to perform additional tests or imaging to better evaluate your condition.
Keep in mind that while antibiotics can be effective in treating many bacterial infections, overuse or misuse of antibiotics can lead to antimicrobial resistance, making it difficult to treat certain types of infections in the future. So it's important to only take antibiotics as prescribed by your healthcare provider.
```



***



The fine-tuned model convincingly emulates the phrasing, tone, and domain knowledge of real-world medical experts answering consumer health queries online. It asks about the patient's symptoms and history, suggests a probable cause and a course of treatment, and even refers the user to the icliniq.com platform, showing close adaptation to the target task.

```
Question:
for piles , am using himalaya tablets .daily 2 tablets...but cant able sit properly due to anal itching ......please give me a medicine
Answer:
Hi. I have gone through your health query. Please answer the below health related questions for a better understanding of your health problem and for the appropriate treatment. 1. Itching inside the anus and around the rectal region. 2. Passing soft small hard stools or no stools at all. 3. Feeling fatigue, tiredness and weakness. 4. Constipation since the past 15 days. The probable cause is worm infestation. I would like to guide you with this. Start using the mentioned medicines till you complete the entire treatment course and take the subsequent follow-up consultation again. Take care. Regards. For more information consult an internal medicine physician online --> https://icliniq.com./ask-a-doctor-online/internal-medicine-physician
```


