[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CS-5302/CS-5302-Project-Group-15/blob/main/Fine_Tuning_and_Gradio_Integration.ipynb)

This notebook fine-tunes `unsloth/mistral-7b-bnb-4bit` using [Unsloth](https://github.com/unslothai/unsloth). Much of the fine-tuning code is borrowed from the Unsloth notebook, which also explains the cell-by-cell transitions that we skip here.

**Note:** You have been redirected to this notebook from our GitHub repo because Unsloth recommends fine-tuning this model on a T4 Colab GPU. Furthermore, the model used for reference later on is ~200 MB, which can cause problems on a local PC.

Hence, this notebook aims to replicate the complete process, from fine-tuning to Gradio integration.

# Cloning the GitHub Repository

In [None]:
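import os
import pickle  # needed to load symptom_list.pkl below

# Start from a clean copy of the project repository
!rm -rf CS-5302-Project-Group-15
!git clone https://github.com/CS-5302/CS-5302-Project-Group-15.git

PATH = os.getcwd()

# Load the list of known symptoms used later for symptom extraction
with open(PATH + '/CS-5302-Project-Group-15/symptom_list.pkl', 'rb') as f:
    symptom_list = pickle.load(f)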

root_path = PATH + '/CS-5302-Project-Group-15/Datasets/MeDAL'
audio_path = PATH + '/CS-5302-Project-Group-15/Datasets/Audio_Files'

print(PATH, root_path, audio_path, sep = '\n')

# Fine-Tuning

In [None]:
%%capture
# Necessary installations for Unsloth (may take several minutes, as Colab does not necessarily have all of these libraries installed).
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)


In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
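
To get a feel for what `r = 16` and `lora_alpha = 16` mean in practice, the sketch below counts the weights LoRA would add for the seven target modules. The layer shapes (hidden size 4096, key/value width 1024, MLP width 14336, 32 layers) are our assumption about Mistral 7B and are not read from the loaded model, so treat the result as an estimate.

```python
# Assumed Mistral-7B projection shapes (in_features, out_features).
# These numbers are an assumption for illustration, not read from the model.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 4096, 1024, 14336, 32

TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_param_count(r, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Each adapted module adds A (r x in) and B (out x r): r * (in + out) weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


r, lora_alpha = 16, 16
trainable = lora_param_count(r)
scaling = lora_alpha / r  # factor applied to the low-rank update when use_rslora = False

print(f"{trainable:,} trainable LoRA parameters, scaling = {scaling}")
```

Under these assumptions only a small fraction of the 7B base weights is trained, which is why the run fits on a T4.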

## Data Prep
The required data format is described in the project report. In short, the CSV loaded in the next cell must provide `instruction`, `input`, and `output` columns.

In [None]:
# !pip install datasets
from datasets import Dataset
import pandas as pd

df = pd.read_csv("/content/custom_dataset.csv")

# Assuming you have a pandas DataFrame called df
dataset_dict = {
    "output": df["output"].tolist(),
    "input": df["input"].tolist(),
    "instruction": df["instruction"].tolist(),
}

# Create a Hugging Face Dataset
custom_dataset = Dataset.from_dict(dataset_dict)
custom_dataset
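
Before uploading `custom_dataset.csv`, it can help to check that the file has the three columns the cell above reads. The following is a minimal sketch using only the standard library; the one-row CSV and the helper name `load_columns` are made up for illustration and are not part of our repo.

```python
import csv
import io

REQUIRED_COLUMNS = ("instruction", "input", "output")

# Hypothetical one-row CSV, invented purely to illustrate the layout.
sample_csv = io.StringIO(
    "instruction,input,output\n"
    '"i have the following symptoms, what disease do i have?",'
    '"chest tightness, irregular heartbeat","<diagnosis text>"\n'
)


def load_columns(handle):
    """Return {column: list of values}, failing early if a column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    rows = list(reader)
    return {c: [row[c] for row in rows] for c in REQUIRED_COLUMNS}


columns = load_columns(sample_csv)
print({name: len(values) for name, values in columns.items()})
```

The returned dictionary has the same shape as `dataset_dict`, so it could be passed to `Dataset.from_dict` in the same way.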

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

# from datasets import load_dataset
# dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = custom_dataset.map(formatting_prompts_func, batched = True,)
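
With `batched = True`, the mapping function receives a dictionary of lists rather than a single row. The sketch below replays that on a toy batch without loading the tokenizer; the `"</s>"` end-of-sequence string and the two example rows are placeholders we assume for illustration.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "</s>"  # assumed placeholder; the notebook uses tokenizer.eos_token


def format_batch(examples):
    """Turn a batch (dict of equal-length lists) into full training prompts."""
    texts = [
        ALPACA_PROMPT.format(instruction, context, answer) + EOS_TOKEN
        for instruction, context, answer in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}


# Toy batch with made-up rows, shaped the way a batched map call passes it in.
batch = {
    "instruction": ["what disease do i have?"] * 2,
    "input": ["insomnia, chest tightness", "irregular heartbeat"],
    "output": ["<answer 1>", "<answer 2>"],
}

formatted = format_batch(batch)
print(formatted["text"][0])
```

Every formatted string ends with the EOS token, which is what tells the fine-tuned model where a response stops.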

## Training

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60, # Set num_train_epochs = 1 for full training runs
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer_stats = trainer.train()
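
The training arguments above imply an effective batch size and a learning-rate curve that are easy to work out by hand. The sketch below does that arithmetic, assuming a single GPU and the usual linear warmup followed by linear decay to zero; `linear_schedule_lr` is our own illustrative helper, not a function from `transformers`.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 60
WARMUP_STEPS = 5
BASE_LR = 2e-4


def linear_schedule_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS, max_steps=MAX_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))


# Sequences contributing to one optimizer step (single GPU assumed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
# Sequences processed over the whole run (rows repeat if the dataset is smaller).
sequences_seen = effective_batch * MAX_STEPS

print(f"effective batch size: {effective_batch}, sequences seen: {sequences_seen}")
for step in (0, 1, 5, 30, 60):
    print(f"step {step:2d}: lr = {linear_schedule_lr(step):.2e}")
```

So `max_steps = 60` is a short run; switching to `num_train_epochs = 1`, as the comment in the cell suggests, would instead cover the dataset once.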

## Inference (without Gradio integration)

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "i have the following symptoms, what disease do i have?", # instruction
        "depressive or psychotic symptoms, asnlvl insomnia, bvdf bsb abnormal involuntary movements,  sdf db chest tightness, irregular heartbeat, breathing fast", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
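
`tokenizer.batch_decode(outputs)` returns the whole sequence, that is, the prompt followed by the generated answer. A small post-processing step can keep only the answer. The sketch below assumes the Alpaca template from above and a `"</s>"` EOS string; the decoded text is a made-up stand-in, not real model output.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded_text, eos_token="</s>"):
    """Return the text after the response marker, without the EOS token.

    Returns an empty string if the marker is not present.
    """
    _, _, answer = decoded_text.partition(RESPONSE_MARKER)
    return answer.replace(eos_token, "").strip()


# Made-up stand-in for one element of tokenizer.batch_decode(outputs).
decoded = (
    "### Instruction:\nwhat disease do i have?\n\n"
    "### Input:\ninsomnia, chest tightness\n\n"
    "### Response:\n<model answer></s>"
)

print(extract_response(decoded))
```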

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "i have the following symptoms, what disease do i have?", # instruction
        "depressive or psychotic shnnbs symptoms, asnlvl insomnia, bvdf bsb abnormal involuntary shqeh movements,  sdf db chest tightness, irregular heartbeat, breathing fast", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

In [None]:
# model.save_pretrained("lora_model") # Local saving
# tokenizer.save_pretrained("lora_model")
# Never hard-code a real access token in a shared notebook; "hf_..." is a placeholder.
# model.push_to_hub("nehalahmedshaikh/lora_model", token = "hf_...") # Online saving
# tokenizer.push_to_hub("nehalahmedshaikh/lora_model", token = "hf_...") # Online saving

In [None]:
!rm -rf lora_model
!git clone https://huggingface.co/nehalahmedshaikh/lora_model

# Gradio Integration

In [None]:
%pip install -U openai
%pip install llama-index
%pip install llama-index-vector-stores-chroma
%pip install llama-index-storage-store-chroma
%pip install llama-index-llms-huggingface
%pip install llama-index-embeddings-huggingface
%pip install llama_index-response-synthesizers
%pip install llama-index-llms
%pip install llama-index-embeddings
%pip install llama-index-llms-openai
%pip install -U llama-index-core llama-index-llms-openai llama-index-embeddings-openai
%pip install llama-index-llms-replicate
%pip install sounddevice numpy scipy
%pip install keyboard
!sudo apt-get install portaudio19-dev
%pip install pyaudio
%pip install audiorecorder
%pip install streamlit-audiorecorder
%pip install audio-recorder-streamlit
%pip install faster-whisper
!pip install gradio
%pip install mistral-lang
%pip install jsonlines
%pip install langdetect
%pip install gtts

In [None]:
import gradio as gr
import importlib
import re
import os
import pickle
from IPython.display import Markdown
import sys
sys.path.insert(0,'CS-5302-Project-Group-15/')
from python_scripts import machine_translation, text_to_speech, whisper_setup, get_audio, utils
import numpy as np
from scipy.io.wavfile import write
import librosa

In [None]:
def SMTS(Query):
    try:
        # Process the audio input
        file_path = 'output_testing.wav'
        write(file_path, data = np.array(Query[1], dtype = np.int16), rate = Query[0])
        audio_processed = utils.preprocess_audio(file_path)
        # Transcribe Query to English
        whisper_models = ["tiny", "base", "small", "medium", "large"]

        transcript = whisper_setup.transcribe_audio(audio_processed, ['tiny'])
        text = (transcript['tiny'][2]).lower()

        # Regular expression matching any symptom from the list; for symptoms of the
        # form 'a or b', either alternative on its own also counts as a match
        alternatives = [re.escape(symptom) for symptom in symptom_list]
        for symptom in symptom_list:
            if ' or ' in symptom:
                alternatives.extend(re.escape(part) for part in symptom.split(' or ')[:2])
        pattern = r'\b(?:' + '|'.join(alternatives) + r')\b'

        # Extract symptoms from the query
        extracted_symptoms = re.findall(pattern, text, flags = re.IGNORECASE)

        # Feed query into the LLM
        models = {
        'llama_ours': 'ubaidtariq8/llama2-med-genai', # fine tuned model from replicate
        'lora_model': 'nehals_fine_tuned_model',      # fine tuned model from unsloth
        'nous-hermes2': 'maryams_fine_tuned_model',   # fine tuned model from gradientai - currently not supported with gradio due to version compatibility issues (detail in fine_tuning_gradientAI.ipynb notebook in our repo)
        'mixtral': 'mistralai/mixtral-8x7b-instruct-v0.1' # Used for Pipeline 1 with no fine tuning
        }
        # Note: We have implemented the functionality needed to integrate the gradientAI fine-tuned model with Gradio; however, due to compatibility issues, it is currently ignored.

        fine_tune = input('Please specify which pipeline to use. Press 1 for Pipeline 1 (No fine-tuning), 2 for Pipeline 2 \n')
        model_option = ''

        if fine_tune == '2':
            model_option = 'lora_model' if input('Please specify which fine-tuned model to use. Press 1 for Mistral 7B, 2 for Nous-Hermes2 \n') == '1' else 'nous-hermes2'
        else:
            model_option = 'mixtral'

        model = DocumentEmbeddingPipeline(model_version = models[model_option], chroma_path = root_path)
        model.setup_environment()
        model.prepare_documents(collection_name = "muqeem", joining = True, persistent = True)
        model.embed_and_index()

        instructions = 'You are a medical doctor. A patient has come to you for desperate need of help. Give as accurate diagnosis as possible on the symptoms listed. '
        input_lora = ', '.join(extracted_symptoms) + '. Also consider the whole query ' + text + ' ' + 'Give also suggestions for mitigating the problem.'
        query = instructions + input_lora

        # Pipeline 1 (response + translation into user's language)
        if model_option == 'mixtral':
            response = model.query_data(query)
            translated_text = machine_translation.translate_text(text = response.response, src_lang = 'en', trg_lang = transcript['tiny'][0])
        # Pipeline 2
        elif model_option == 'lora_model':
            response = model.setup_lora_model("lora_model", instructions, input_lora)
            translated_text = machine_translation.translate_text(text = response, src_lang = 'en', trg_lang = transcript['tiny'][0])
        else: # nous-hermes2
            response = model.setup_nous_hermes2(query) # clean response if needed and bring it into pure string format
            translated_text = machine_translation.translate_text(text = response, src_lang = 'en', trg_lang = transcript['tiny'][0])

        # Now speak the response in the user's language
        audio_answer_path = audio_path + '/audio.wav'
        text_to_speech.multilingual_text_to_speech(text = translated_text, filepath = audio_answer_path)
        utils.sasti_harkat(audio_answer_path)
        arr, sr = librosa.load(audio_answer_path)

        return text, translated_text, (sr, arr)
    except Exception as e:
        print("An error occurred:", e)
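
The symptom extraction step in `SMTS` boils down to one large alternation pattern. The sketch below shows one way to build such a pattern and what it returns; the three-entry symptom list and the helper name `build_symptom_pattern` are made up for illustration, while the real list comes from `symptom_list.pkl`.

```python
import re


def build_symptom_pattern(symptom_list):
    """Match any full symptom, or either side of a symptom written as 'a or b'."""
    alternatives = [re.escape(symptom) for symptom in symptom_list]
    for symptom in symptom_list:
        if " or " in symptom:
            alternatives.extend(re.escape(part) for part in symptom.split(" or ")[:2])
    return r"\b(?:" + "|".join(alternatives) + r")\b"


# Hypothetical symptom list, standing in for the pickled one.
toy_symptoms = ["insomnia", "chest tightness", "depressive or psychotic symptoms"]
pattern = build_symptom_pattern(toy_symptoms)

text = "i feel depressive, with insomnia and chest tightness"
found = re.findall(pattern, text, flags=re.IGNORECASE)
print(found)  # ['depressive', 'insomnia', 'chest tightness']
```

Full symptoms are listed before their parts, so a query containing the complete phrase is matched as a whole rather than piecewise.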

**Note:** When running the app below, record your audio query in the typical symptoms format and make it as clear as possible. Before clicking submit, play the recording back to check that it was captured correctly. After you submit, the notebook asks in the cell output which pipeline to use. We may also have shared a video walkthrough of the app, in the LMS submission tab or the README; we highly encourage you to go through it.

In [None]:
# Launch the Gradio Interface
demo = gr.Interface(
    fn = SMTS,
    inputs = [gr.Audio(label = 'Get your Voice Heard! 🔍', sources = ["microphone"])],
    outputs = [gr.Textbox(label = "We have heard your Voice! 👂"), gr.Textbox(label = "This is what we recommend: 📋"), gr.Audio(label = 'Press Play to listen to your medical report: 🔊')],
    allow_flagging = 'never',
    theme = 'gradio/base',
    title = '''SymptoCare 🤖''',
    description = '''## Welcome to SymptoCare! 🌟
    Discover the power of seamless communication in healthcare with SymptoCare, your personalized healthcare assistant!
    ### How It Works:
    1. 🎤 *Speak your symptoms.*
    2. 🔄 *Let SymptoCare translate them into actionable insights.*
    3. 🗨️ *Engage with your healthcare provider like never before!*''',

    article = '''### What We Offer:
    - 🗣️ *Breaking language barriers with ease.*
    - 📲 *Translating your symptoms into accurate diagnoses.*
    - 🤝 *Empowering your healthcare journey with personalized care.*

    ### Join Us Today:
    Get started now and take control of your healthcare journey! Check our [Github](https://github.com/CS-5302/CS-5302-Project-Group-15) here! Do give us a star if you like our work! 😀'''
)

demo.launch()