# Finetune LLM in Action: Mixtral-8x7B
Fine-tune Mixtral-8x7B to fit a custom domain such as healthcare, finance, or legal. This notebook is a step-by-step guide that covers:
- Load the dataset
- Finetune the model with [ludwig](https://ludwig.ai/latest/)
- Inference with the finetuned model

In [None]:
# !pip uninstall -y tensorflow --quiet
# !pip install "ludwig[llm]" --quiet
# !pip install bitsandbytes==0.41.3 --quiet
# !pip install peft --quiet
# !pip install numba --quiet

In [None]:
from ludwig.api import LudwigModel
from ludwig.hyperopt.run import hyperopt
import logging
import pandas as pd

# Step 1: Load the data
- In this example, we use [MedQuAD, the Medical Question Answering Dataset](https://www.kaggle.com/datasets/jpmiller/layoutlm) to fine-tune Mixtral-8x7B for the healthcare domain.
- For simplicity, I renamed the column `question` to `instruction` and `answer` to `output`, so that the column names match the prompt template and output feature in the config.
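The renaming step itself is not shown in this notebook. A minimal sketch of it, using only the standard library and a two-row in-memory sample, might look like the following. The sample rows, the placeholder answers, and the `rename_columns` helper are hypothetical; with pandas, `df.rename(columns=COLUMN_MAP)` would do the same job.

```python
import csv
import io

# Hypothetical two-row sample standing in for the original MedQuAD CSV.
# The answer text is a placeholder, not real dataset content.
raw_csv = io.StringIO(
    "question,answer\n"
    "What is Glaucoma?,<answer text 1>\n"
    "How to prevent Anal Cancer ?,<answer text 2>\n"
)

# Map the original column names to the names the Ludwig config expects.
COLUMN_MAP = {"question": "instruction", "answer": "output"}


def rename_columns(rows, column_map):
    """Return new row dicts with keys renamed according to column_map."""
    return [
        {column_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


rows = rename_columns(csv.DictReader(raw_csv), COLUMN_MAP)
print(list(rows[0].keys()))
```

Columns that are not listed in `COLUMN_MAP` keep their original names.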

In [None]:
data_set = "data/medquad.csv"

# Step 2: Fine-tune the model
Fit the model to the dataset. Here I use `ludwig` for fine-tuning, together with a few techniques that make fine-tuning a model of this size practical:
- **Quantization**: load the base model in 4-bit precision to reduce its memory footprint, so that it fits in far less GPU memory.
- **LoRA (Low-Rank Adaptation)**: a parameter-efficient fine-tuning technique that adds small low-rank matrices next to selected weights (typically the self-attention projections) and trains only those, leaving the original pre-trained weights unchanged.
- **Prompt template**: wrap each question in a fixed instruction/response template, so that the model sees the same format during training and inference.
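To make the effect of LoRA and 4-bit quantization concrete, here is a rough back-of-the-envelope sketch. The trainable-parameter count reported in the training log (3,407,872) is consistent with rank-8 adapters on the query and value projections of each of the 32 layers. The rank, the target modules, and the layer shapes below are assumptions, not values taken from this notebook's config. The memory figures count weights only.

```python
# Assumed Mixtral-8x7B attention shapes (not stated in this notebook).
HIDDEN_SIZE = 4096      # model width
KV_PROJ_SIZE = 1024     # 8 key/value heads x 128 dims per head
NUM_LAYERS = 32
LORA_RANK = 8           # assumed adapter rank


def lora_params(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen d_out x d_in weight."""
    return rank * (d_in + d_out)


per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)     # q_proj (assumed target)
    + lora_params(HIDDEN_SIZE, KV_PROJ_SIZE, LORA_RANK)  # v_proj (assumed target)
)
trainable = per_layer * NUM_LAYERS

ALL_PARAMS = 46_706_200_576  # "all params" as reported in the training log
trainable_pct = 100 * trainable / ALL_PARAMS
print(f"trainable params: {trainable:,}")
print(f"trainable%: {trainable_pct:.6f}")

# Rough weight-only memory, ignoring quantization metadata, activations,
# gradients, and optimizer state.
GIB = 1024 ** 3
print(f"16-bit weights: {ALL_PARAMS * 2 / GIB:.1f} GiB")
print(f"4-bit weights:  {ALL_PARAMS * 0.5 / GIB:.1f} GiB")
```

Under these assumptions, fewer than one in ten thousand parameters receives gradients, which is why the adapter adds so little on top of the quantized base model.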

In [None]:
def train_model(df, config, is_hyper=False):
    """Train a Ludwig model on df, or run hyperparameter optimization if is_hyper is True."""
    if is_hyper:
        # Perform hyperparameter optimization (the config must define a `hyperopt` section)
        hyperopt_results = hyperopt(config=config, dataset=df, logging_level=logging.INFO)
        return hyperopt_results
    else:
        # Train the model normally and return the training statistics
        model = LudwigModel(config=config, logging_level=logging.INFO)
        train_stats = model.train(dataset=df)
        return train_stats

In [None]:
# Set the Hugging Face Hub token so that models can be loaded from and saved to the Hub
import os

# On a local machine, set the token as an environment variable
os.environ["HUGGING_FACE_HUB_TOKEN"] = "your_token"

# On Google Colab, store the token as a secret and read it from there
# from google.colab import userdata
# os.environ["HUGGING_FACE_HUB_TOKEN"] = userdata.get('HUGGING_FACE_HUB_TOKEN')
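Pasting a real token into a notebook cell makes it easy to leak when the notebook is shared. As a hedged alternative, the sketch below only reads a token that was already exported in the shell. `resolve_hf_token` is a hypothetical helper, not part of Ludwig or `huggingface_hub`; it checks both `HF_TOKEN` and the older `HUGGING_FACE_HUB_TOKEN` variable name.

```python
import os


def resolve_hf_token(env=os.environ):
    """Return the first non-empty Hugging Face token found in env, or None."""
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    return None


token = resolve_hf_token()
if token is None:
    print("No Hugging Face token found; private or gated repositories may fail to load.")
```

Passing a plain dict as `env` makes the helper easy to check without touching the real environment.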

In [None]:
# The config file is a YAML file that specifies the model architecture and the parameters for training
config_path = 'Mixtral-8x7B-Instruct-v0.1-health.yaml'

In [1]:
""" Mixtral-8x7B-Instruct-v0.1-health.yaml
model_type: llm
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1 # Model to use, from the Hugging Face Hub

quantization:
  bits: 4 # Quantize the model to 4 bits

adapter:
  type: lora # Use the LoRA adapter

prompt:
  template: >-
    ### Instruction: {instruction}

    ### Response:

input_features:
  - name: prompt # The input feature is the prompt as defined above
    type: text
    preprocessing:
      max_sequence_length: 256

output_features:
  - name: output
    type: text
    preprocessing:
      max_sequence_length: 256

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  gradient_accumulation_steps: 16
  epochs: 1
  learning_rate_scheduler:
    warmup_fraction: 0.01

preprocessing:
  split:
    type: random
    probabilities:
    - 0.70
    - 0.20
    - 0.10
"""

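The `prompt.template` field in the config above is filled in once per dataset row before the text reaches the model. Ludwig does this internally; the sketch below only imitates the substitution with plain string formatting to show what the model actually sees. `render_prompt` is a hypothetical helper. Because the YAML uses a folded scalar (`>-`), the blank line in the template collapses to a single newline, which matches the `Input:` lines in the training log.

```python
# Template as it reads after YAML folding of the `>-` block scalar.
TEMPLATE = "### Instruction: {instruction}\n### Response:"


def render_prompt(row, template=TEMPLATE):
    """Fill the template's {column} placeholders from one dataset row."""
    return template.format(**row)


# The output value is a placeholder; only the instruction column is used here.
row = {"instruction": "What is Glaucoma?", "output": "<answer text>"}
prompt = render_prompt(row)
print(prompt)
```

Any column name used in braces must exist in the dataset, which is why the `question` column was renamed to `instruction` earlier.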
In [5]:
model = LudwigModel(config=config_path, logging_level=logging.INFO)
results = model.train(dataset=data_set)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒════════════════════════╕
INFO:ludwig.utils.print_utils:│ EXPERIMENT DESCRIPTION │
INFO:ludwig.utils.print_utils:╘════════════════════════╛
INFO:ludwig.utils.print_utils:
INFO:ludwig.api:╒══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════╕
│ Experiment name  │ api_experiment                                                                          │
├──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Mode

Loading checkpoint shards:   0%|          | 0/19 [00:00<?, ?it/s]

INFO:ludwig.models.llm:Done.
INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of mistralai/Mixtral-8x7B-Instruct-v0.1 tokenizer
INFO:ludwig.models.llm:Trainable Parameter Summary For Fine-Tuning
INFO:ludwig.models.llm:Fine-tuning with adapter: lora
INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒══════════╕
INFO:ludwig.utils.print_utils:│ TRAINING │
INFO:ludwig.utils.print_utils:╘══════════╛
INFO:ludwig.utils.print_utils:


trainable params: 3,407,872 || all params: 46,706,200,576 || trainable%: 0.007296401672524689


INFO:ludwig.trainers.trainer:Creating fresh model training run.
INFO:ludwig.trainers.trainer:Training for 11485 step(s), approximately 1 epoch(s).
INFO:ludwig.trainers.trainer:Early stopping policy: 5 round(s) of evaluation, or 57425 step(s), approximately 5 epoch(s).

INFO:ludwig.trainers.trainer:Starting with step 0, epoch: 0


Training: 100%|██████████| 11485/11485 [6:48:32<00:00,  2.12s/it, loss=0.1]

INFO:ludwig.trainers.trainer:
Running evaluation for step: 11485, epoch: 1


Evaluation valid: 100%|██████████| 1641/1641 [1:07:20<00:00,  2.46s/it]


INFO:ludwig.trainers.trainer_llm:Input: ### Instruction: What is (are) Autosomal recessive hyper IgE syndrome ?
### Response:
INFO:ludwig.trainers.trainer_llm:Output: User Theallingal
 is thea) theismomal Dominessive polychgE syndrome?


: Autosomal recessive hyper IgE syndrome isAR-HIES) is a rare rare primary immunodeficiency disorder characterized by recur elevated levels levels of immunoglobulin E (IgE) recurrent skinaphylococcal skin andscesses, and recurrent pneumonia. Other condition symptoms are also seen in aut more common autosomal dominant formIES ( ( AR-HIES is for about  small percentage of casesIES cases. and most 20 cases cases individuals reported in far. The contrast to aut-HIES, AR AR- is not characterized by the eereosinophilia,highcreased in e numberosinophil count), the blood),), aibility to viral infections; as herpes simplex, Herolluscum contagiosum; and of the central nervous system; and-cell abs; and a lack inc rate. AR disorder ab skeletal, andive tissue, and 

Evaluation test : 100%|██████████| 821/821 [33:52<00:00,  2.48s/it]


INFO:ludwig.trainers.trainer_llm:Input: ### Instruction: How to prevent Anal Cancer ?
### Response:
INFO:ludwig.trainers.trainer_llm:Output: User Theallingal
 to Make a Cancer


 Anal: The Points
                    - Analing risk factors such getting protective factors may help prevent anal.    - The following risk risk factors for anal cancer:         - Having sexV infection    - Anal sexual conditions     - Anal of cervical, vaginal, or vulvar cancer     - History infection    AIDS    - Analmunosuppress     - Sm Anal sexual practices     - Smigarette smoking     - The following are factors decreases the risk of anal cancer:         - HPV vaccine
 - It The is not clear whether the following protective factors decreases the risk of anal cancer:         - Circom use. -  prevention clinical trials are used to study ways to prevent cancer.    - HP Anal ways to prevent anal cancer are being studied in clinical trials.
                
                
                    Avoiding R factor

Training: 100%|██████████| 11485/11485 [8:29:57<00:00,  2.66s/it, loss=0.1]


INFO:ludwig.utils.print_utils:
INFO:ludwig.utils.print_utils:╒═════════════════╕
INFO:ludwig.utils.print_utils:│ TRAINING REPORT │
INFO:ludwig.utils.print_utils:╘═════════════════╛
INFO:ludwig.utils.print_utils:
INFO:ludwig.api:╒══════════════════════════════╤════════════════════╕
│ Validation feature           │ output             │
├──────────────────────────────┼────────────────────┤
│ Validation metric            │ loss               │
├──────────────────────────────┼────────────────────┤
│ Best model step              │ 11485              │
├──────────────────────────────┼────────────────────┤
│ Best model epoch             │ 2                  │
├──────────────────────────────┼────────────────────┤
│ Best model's validation loss │ 0.8531404137611389 │
├──────────────────────────────┼────────────────────┤
│ Best model's test loss       │ 0.8509361743927002 │
╘══════════════════════════════╧════════════════════╛
INFO:ludwig.api:
Finished: api_experiment_run
INFO:ludwig.api:Saved to

TrainingResults(train_stats=TrainingStats)
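The step counts in the log above follow from the trainer settings. A small sketch of that arithmetic, using only numbers from the config and the log, is shown below. The assumption that one optimizer update happens every `gradient_accumulation_steps` logged steps is mine, and the implied dataset size is only an estimate from the 70% training split.

```python
import math

# Values taken from the config and the training log above.
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
TRAIN_STEPS_PER_EPOCH = 11485
EARLY_STOP_ROUNDS = 5
TRAIN_FRACTION = 0.70

# Number of examples contributing to each weight update.
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS

# Training rows seen in one epoch.
train_rows = TRAIN_STEPS_PER_EPOCH * BATCH_SIZE

# Assumption: one optimizer update per GRAD_ACCUM_STEPS logged steps.
optimizer_updates = math.ceil(TRAIN_STEPS_PER_EPOCH / GRAD_ACCUM_STEPS)

# Early stopping window in steps, with one evaluation round per epoch.
early_stop_steps = EARLY_STOP_ROUNDS * TRAIN_STEPS_PER_EPOCH

# Implied dataset size if the 70% split holds exactly.
approx_total_rows = round(train_rows / TRAIN_FRACTION)

print(f"effective batch size: {effective_batch}")
print(f"optimizer updates per epoch: {optimizer_updates}")
print(f"early stopping window: {early_stop_steps} steps")
print(f"approximate dataset size: {approx_total_rows} rows")
```

The early stopping window computed here equals the 57425 steps reported in the log.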

In [None]:
# Clean up GPU memory if needed

# from numba import cuda
# device = cuda.get_current_device()
# device.reset()

**(Optional) Save model to Hugging Face Hub**

In [None]:
# Upload the fine-tuned model to the Hugging Face Hub
hf_hub_location = "YOUR_ID/Mixtral-8x7B-Instruct-v0.1-fine-tuned-using-medquad-4bit"
model_path = "results/api_experiment_run" # Directory where the trained model was saved
# You may be asked for your Hugging Face Hub token
LudwigModel.upload_to_hf_hub(hf_hub_location, model_path, private=True)

# Step 3: Inference with the finetuned model

In [51]:
def test_model_predictions(model, test_examples):
    """
    Run the model on the given examples and print each instruction with the LLM output.

    Args:
        model: The trained model to test.
        test_examples (pd.DataFrame): DataFrame of test examples with an 'instruction' column.
    """
    predictions = model.predict(test_examples, generation_config={"max_new_tokens": 256, "temperature": 0.1})[0]
    for input_with_prediction in zip(test_examples['instruction'], predictions['output_response']):
        print(f"Instruction: {input_with_prediction[0]}")
        print(f"LLM Output: {input_with_prediction[1][0]}")
        print("\n\n")

In [52]:
test_examples = pd.DataFrame([
      {
            "instruction": "What is Glaucoma?"
      }
])

In [53]:
test_model_predictions(model,test_examples)

INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of mistralai/Mixtral-8x7B-Instruct-v0.1 tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.



Prediction:   0%|          | 0/1 [00:00<?, ?it/s]
Prediction: 100%|██████████| 1/1 [00:24<00:00, 24.64s/it]


INFO:ludwig.utils.tokenizers:Loaded HuggingFace implementation of mistralai/Mixtral-8x7B-Instruct-v0.1 tokenizer
  return np.sum(np.log(sequence_probabilities))
INFO:ludwig.api:Finished predicting in: 26.45s.


Instruction: What is Glaucoma?
LLM Output:  The eye is like a balloon filled with fluid. The fluid inside the eye is called aqueous humor. The aqueous humor is constantly produced and drained from the eye. The fluid flows through the pupil, the opening in the center of the iris. The fluid then flows into the front part of the eye between the cornea and the iris. It then flows into the trabecular meshwork, a sponge-like tissue in the angle where the iris and cornea meet. The fluid leaves the eye through the trabecular meshwork and flows into Schlemm's canal, a channel that leads to the blood vessels in the eye.



