# LLM instruction tuning: Mistral 7B on the Alpaca dataset

This notebook fine-tunes [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [Alpaca dataset](https://huggingface.co/datasets/tatsu-lab/alpaca). Mistral 7B is a large language model (LLM) with 7.3 billion parameters that performs strongly for its size. However, this base model is not instruction-tuned: it was trained to continue text, so it may struggle to follow instructions and perform specific tasks.

The Alpaca dataset consists of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. Such data can be used for instruction tuning, which helps language models better understand and follow instructions. By fine-tuning Mistral 7B on Alpaca, we expect the model to become noticeably better at following instructions, for example when holding a conversation or answering questions.
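Each Alpaca record has an `instruction`, an optional `input` and an `output`. To make that concrete, the sketch below formats a record into a training prompt. The template is modeled on the one published with Stanford Alpaca; we assume torchtune's `alpaca_dataset` uses an equivalent layout, although exact whitespace may differ. The function name `format_alpaca_prompt` and the example record are hypothetical.

```python
def format_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt that the model learns to complete with the response."""
    if input_text:
        # Variant for records that provide extra context in the `input` field.
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Variant for records whose `input` field is empty.
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


# Illustrative record with the three Alpaca fields (not taken from the dataset).
record = {
    "instruction": "Translate the phrase into French.",
    "input": "Good morning",
    "output": "Bonjour",
}
prompt = format_alpaca_prompt(record["instruction"], record["input"])
# During training the model sees the prompt followed by the target output.
print(prompt + record["output"])
```

Note the `### Input:` and `### Response:` markers: the text-to-SQL prompt used later in this notebook follows a similar layout.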

We will utilize [torchtune](https://github.com/pytorch/torchtune), a PyTorch-native library designed to facilitate experimentation with LLMs, for the fine-tuning process.

In [18]:
! pip install torchtune



## Downloading Mistral 7B

First, we need to download Mistral 7B, which torchtune's `tune download` command can do for us:

> **_NOTE:_** To authenticate with the Hugging Face Hub, either set the `HF_TOKEN` environment variable or pass your token with `--hf-token` (replacing the `<HF_TOKEN>` placeholder below). You can find your token at https://huggingface.co/settings/tokens.

In [None]:
! tune download mistralai/Mistral-7B-v0.1 \
--output-dir ./mistral-7B \
--hf-token <HF_TOKEN>

## Inference using Mistral 7B

Now that we've downloaded the base model, let's use it to generate text from a prompt.

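torchtune recipes are configured through YAML files that specify, among other things, which model to build, where its checkpoint files live, and which device to run on. The cell below writes a *.yaml* file with everything needed to run Mistral 7B inference on a GPU. Note that the configs in this notebook use the absolute path `/home/jovyan/content/mistral-7B`, whereas the download cell writes to `./mistral-7B`; adjust the paths if your working directory is not `/home/jovyan/content`.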

In [14]:
# mistral_generation.yaml

with open("mistral_generation.yaml", "w") as fp:
    fp.write(
        """
        # Config for running the InferenceRecipe in generate.py to generate output from an LLM
        #
        # To launch, run the following command from this notebook's directory:
        #    tune run generate --config ./mistral_generation.yaml
        
        # Model arguments
        model:
          _component_: torchtune.models.mistral.mistral_7b
        
        checkpointer:
          _component_: torchtune.utils.FullModelHFCheckpointer
          checkpoint_dir: /home/jovyan/content/mistral-7B
          checkpoint_files: [
            pytorch_model-00001-of-00002.bin,
            pytorch_model-00002-of-00002.bin
          ]
          recipe_checkpoint: null
          output_dir: /home/jovyan/content/mistral-7B
          model_type: MISTRAL
        resume_from_checkpoint: False
        
        device: cuda
        dtype: bf16
        
        seed: 1234
        
        # Tokenizer arguments
        tokenizer:
          _component_: torchtune.models.mistral.mistral_tokenizer
          path: /home/jovyan/content/mistral-7B/tokenizer.model
        
        # Generation arguments; defaults taken from gpt-fast
        prompt: "Hello, my name is"
        max_new_tokens: 300
        temperature: 0.6 # 0.8 and 0.6 are popular values to try
        top_k: 300
        
        quantizer: null
        """
    )
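The generation arguments at the end of this config, `temperature` and `top_k`, control how each next token is drawn. As an illustration, the pure-Python sketch below applies temperature scaling and top-k filtering to a list of logits and then samples a token. It shows the general technique, not torchtune's actual `generate` code; the helper names and the toy logits are hypothetical.

```python
import math
import random


def top_k_distribution(logits, temperature=0.6, top_k=300):
    """Return {token_index: probability} after temperature scaling and top-k filtering."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [value / temperature for value in logits]
    # Keep the indices of the k largest scaled logits (all of them if k >= vocab size).
    k = min(top_k, len(scaled))
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax over the kept tokens; subtracting the max avoids overflow in exp().
    peak = max(scaled[i] for i in kept)
    weights = {i: math.exp(scaled[i] - peak) for i in kept}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def sample_next_token(logits, temperature=0.6, top_k=300, rng=random):
    """Draw one token index from the filtered, renormalised distribution."""
    dist = top_k_distribution(logits, temperature, top_k)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Toy vocabulary of 5 tokens (Mistral's real vocabulary has 32,000).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(top_k_distribution(logits, temperature=0.6, top_k=3))
print(sample_next_token(logits, rng=random.Random(1234)))
```

Lowering the temperature moves probability mass toward the highest-scoring token, while `top_k` removes unlikely tokens from consideration altogether.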

We will ask Mistral 7B to answer the following text-to-SQL prompt:

```python
"""
You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
"""
```


In [15]:
! tune run generate --config ./mistral_generation.yaml \
prompt="You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.\n\nYou must output the SQL query that answers the question.\n### Input:\nWhich Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?\n\n### Context:\nCREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)\n\n### Response:\n"

INFO:torchtune.utils.logging:Running InferenceRecipe with resolved config:

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /home/jovyan/content/mistral-7B
  checkpoint_files:
  - pytorch_model-00001-of-00002.bin
  - pytorch_model-00002-of-00002.bin
  model_type: MISTRAL
  output_dir: /home/jovyan/content/mistral-7B
  recipe_checkpoint: null
device: cuda
dtype: bf16
max_new_tokens: 300
model:
  _component_: torchtune.models.mistral.mistral_7b
prompt: You are a powerful text-to-SQL model. Your job is to answer questions about
  a database. You are given a question and context regarding one or more tables.\n\nYou
  must output the SQL query that answers the question.\n### Input:\nWhich Class has
  a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?\n\n###
  Context:\nCREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license
  VARCHAR)\n\n### Response:\n
quantizer: null
resume_from_checkpoint: false
s

As expected, the base model's answer is not what we were hoping for. This is because the model was only pretrained for next-token prediction, so it tends to continue the text rather than follow the instruction.

To improve the model's ability to follow instructions, we will fine-tune it on the Alpaca dataset.

## Instruction tuning

This time, we need a configuration file describing the fine-tuning run: the tokenizer, dataset, model, checkpoints, optimizer, learning-rate schedule, training hyperparameters and logging.

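The configuration uses LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method: the pretrained weights stay frozen and only small low-rank matrices added to selected layers are trained. Here, LoRA is applied to the query, key and value projections of the attention layers (`lora_attn_modules`), to the MLP layers (`apply_lora_to_mlp`) and to the model's final output projection (`apply_lora_to_output`).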
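In equation form, LoRA freezes each adapted weight matrix $W_0$ and learns a low-rank update $BA$, scaled by $\alpha / r$ (the convention of the original LoRA formulation, which torchtune's `lora_alpha` and `lora_rank` parameters follow as far as we know):

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\quad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

With the values used in the config below, $r = 64$ (`lora_rank`) and $\alpha = 16$ (`lora_alpha`), the update is scaled by $16/64 = 0.25$. An adapted $4096 \times 4096$ projection, such as Mistral 7B's query projection, gains $r(d + k) = 64 \cdot 8192 = 524{,}288$ trainable parameters, about 3.1% of its $16{,}777{,}216$ frozen ones. Because $B$ is initialised to zero, training starts exactly from the base model's behaviour.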

In [16]:
# 7B_lora_single_device_mistral.yaml

with open("7B_lora_single_device_mistral.yaml", "w") as fp:
    fp.write(
        """
        # Tokenizer
        tokenizer:
          _component_: torchtune.models.mistral.mistral_tokenizer
          path: /home/jovyan/content/mistral-7B/tokenizer.model
        
        # Dataset
        dataset:
          _component_: torchtune.datasets.alpaca_dataset
          train_on_input: True
        seed: null
        shuffle: True
        
        # Model Arguments
        model:
          _component_: torchtune.models.mistral.lora_mistral_7b
          lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']
          apply_lora_to_mlp: True
          apply_lora_to_output: True
          lora_rank: 64
          lora_alpha: 16
        
        checkpointer:
          _component_: torchtune.utils.FullModelHFCheckpointer
          checkpoint_dir: /home/jovyan/content/mistral-7B
          checkpoint_files: [
            pytorch_model-00001-of-00002.bin,
            pytorch_model-00002-of-00002.bin
          ]
          recipe_checkpoint: null
          output_dir: /home/jovyan/content/mistral-7B
          model_type: MISTRAL
        resume_from_checkpoint: False
        
        optimizer:
          _component_: torch.optim.AdamW
          lr: 2e-5
        
        lr_scheduler:
          _component_: torchtune.modules.get_cosine_schedule_with_warmup
          num_warmup_steps: 100
        
        loss:
          _component_: torch.nn.CrossEntropyLoss
        
        # Fine-tuning arguments
        batch_size: 4
        epochs: 3
        max_steps_per_epoch: null
        gradient_accumulation_steps: 4
        compile: False
        
        # Training env
        device: cuda
        
        # Memory management
        enable_activation_checkpointing: True
        
        # Reduced precision
        dtype: bf16
        
        # Logging
        metric_logger:
          _component_: torchtune.utils.metric_logging.DiskLogger
          log_dir: ${output_dir}
        output_dir: /home/jovyan/content/mistral-7B
        log_every_n_steps: null
        
        # Showcase the usage of the PyTorch profiler
        # enabled is False because profiling is only needed to debug training
        profiler:
          _component_: torchtune.utils.profiler
          enabled: False
          output_dir: /home/jovyan/content/mistral-7B/torchtune_perf_tracing.json
        """
    )
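To get a feel for how small the trainable part is, the sketch below counts the LoRA parameters implied by this config. It assumes the published Mistral 7B dimensions (hidden size 4096, 32 layers, 8 key/value heads of size 128, MLP size 14336, vocabulary of 32,000) and that each adapted linear layer mapping `in` to `out` features gains `rank * (in + out)` parameters; biases and torchtune's exact module names are ignored, and the helper functions are hypothetical.

```python
# Published Mistral 7B dimensions (assumed here).
DIM = 4096          # model (embedding) dimension
N_LAYERS = 32       # transformer blocks
KV_DIM = 8 * 128    # 8 key/value heads x head_dim 128 (grouped-query attention)
HIDDEN_DIM = 14336  # MLP intermediate size
VOCAB = 32000       # tokenizer vocabulary size


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in the LoRA factors A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


def count_lora_params(rank: int) -> int:
    """LoRA parameters for q/k/v projections, the MLP and the final output projection."""
    per_layer = (
        lora_params(DIM, DIM, rank)                 # query projection
        + 2 * lora_params(DIM, KV_DIM, rank)        # key and value projections
        + 2 * lora_params(DIM, HIDDEN_DIM, rank)    # MLP gate and up projections
        + lora_params(HIDDEN_DIM, DIM, rank)        # MLP down projection
    )
    output = lora_params(DIM, VOCAB, rank)          # final output projection
    return N_LAYERS * per_layer + output


trainable = count_lora_params(rank=64)
print(f"{trainable:,} trainable LoRA parameters")  # 153,305,088
print(f"{trainable / 7.3e9:.1%} of ~7.3B base parameters")  # 2.1%
```

Since the count is linear in the rank, dropping `lora_rank` from 64 to 8 would shrink the adapters eightfold.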

In [17]:
! tune run lora_finetune_single_device --config ./7B_lora_single_device_mistral.yaml

INFO:torchtune.utils.logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 4
checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /home/jovyan/content/mistral-7B
  checkpoint_files:
  - pytorch_model-00001-of-00002.bin
  - pytorch_model-00002-of-00002.bin
  model_type: MISTRAL
  output_dir: /home/jovyan/content/mistral-7B
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  train_on_input: true
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 3
gradient_accumulation_steps: 4
log_every_n_steps: null
loss:
  _component_: torch.nn.CrossEntropyLoss
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: /home/jovyan/content/mistral-7B
model:
  _component_: torchtune.models.mistral.lora_mistral_7b
  apply_lo

On an NVIDIA H100 PCIe GPU, one epoch takes roughly 58 minutes.
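The batch settings in the config translate into optimizer steps and total training time as follows. This back-of-the-envelope sketch assumes roughly 52,000 training examples, ignores any partial final batch, and reuses the 58-minute epoch time measured above. The `cosine_lr` function is a hypothetical sketch of a linear-warmup, half-cosine schedule of the kind `get_cosine_schedule_with_warmup` provides, not its source code.

```python
import math

NUM_EXAMPLES = 52_000     # approximate size of the Alpaca dataset
BATCH_SIZE = 4            # batch_size in the config
GRAD_ACCUM = 4            # gradient_accumulation_steps
EPOCHS = 3                # epochs
MINUTES_PER_EPOCH = 58    # measured on an H100 PCIe GPU

effective_batch = BATCH_SIZE * GRAD_ACCUM          # examples per optimizer step
steps_per_epoch = NUM_EXAMPLES // effective_batch
total_steps = steps_per_epoch * EPOCHS
total_minutes = MINUTES_PER_EPOCH * EPOCHS


def cosine_lr(step, peak_lr=2e-5, warmup_steps=100, num_training_steps=total_steps):
    """Linear warmup to peak_lr, then half-cosine decay towards 0 (sketch)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, num_training_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps)  # 16 3250 9750
print(f"~{total_minutes // 60} h {total_minutes % 60} min for {EPOCHS} epochs")  # ~2 h 54 min
for step in (0, 50, 100, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step):.2e}")
```

With 100 warmup steps out of roughly 9,750, the warmup phase covers only about 1% of training.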

## Inference using Mistral 7B fine-tuned

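Once again, we need a configuration file, this time to run inference with the fine-tuned model. It is identical to the first one except for `checkpoint_files`, which now point to the files saved by the fine-tuning run at the end of the last epoch (the `_2` suffix is the zero-based index of the final epoch).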
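This config loads the fine-tuned checkpoint with the plain `mistral_7b` component rather than `lora_mistral_7b`. As we understand it, that works because torchtune's LoRA recipe merges the adapters into the base weights when it writes the final checkpoint, storing $W_0 + \frac{\alpha}{r} BA$. The pure-Python sketch below uses tiny hypothetical matrices (a 3×3 layer with a rank-1 adapter and $\alpha = 0.25$, so the scale matches the config's $16/64 = 0.25$) to check that the merged matrix gives the same output as the frozen weight plus the separate LoRA path.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def merge_lora(w0, a, b, alpha, rank):
    """Return W0 + (alpha / rank) * B @ A, the weight a merged checkpoint would store."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]


# Hypothetical 3x3 layer with a rank-1 adapter (real layers are 4096 wide with rank 64).
w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]  # frozen weight, d x k
a = [[0.5, -1.0, 0.25]]                                  # A, r x k
b = [[2.0], [0.0], [-1.0]]                               # B, d x r
alpha, rank = 0.25, 1
x = [1.0, 2.0, 4.0]

scale = alpha / rank
# Unmerged: frozen path plus scaled low-rank path, as during training.
unmerged = [h + scale * l for h, l in zip(matvec(w0, x), matvec(b, matvec(a, x)))]
# Merged: a single dense matrix, as in the saved checkpoint.
merged = matvec(merge_lora(w0, a, b, alpha, rank), x)
print(unmerged, merged)  # [8.75, 2.0, 7.125] [8.75, 2.0, 7.125]
```

Merging removes the extra low-rank matrix multiplications at inference time, at the cost of storing a full-size copy of the weights.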

In [23]:
# mistral_fine-tuned_generation.yaml

with open("mistral_fine-tuned_generation.yaml", "w") as fp:
    fp.write(
        """
        # Config for running the InferenceRecipe in generate.py to generate output from an LLM
        #
        # To launch, run the following command from this notebook's directory:
        #    tune run generate --config ./mistral_fine-tuned_generation.yaml
        
        # Model arguments
        model:
          _component_: torchtune.models.mistral.mistral_7b
        
        checkpointer:
          _component_: torchtune.utils.FullModelHFCheckpointer
          checkpoint_dir: /home/jovyan/content/mistral-7B
          checkpoint_files: [
            hf_model_0001_2.pt,
            hf_model_0002_2.pt,
          ]
          recipe_checkpoint: null
          output_dir: /home/jovyan/content/mistral-7B
          model_type: MISTRAL
        resume_from_checkpoint: False
        
        device: cuda
        dtype: bf16
        
        seed: 1234
        
        # Tokenizer arguments
        tokenizer:
          _component_: torchtune.models.mistral.mistral_tokenizer
          path: /home/jovyan/content/mistral-7B/tokenizer.model
        
        # Generation arguments; defaults taken from gpt-fast
        prompt: "Hello, my name is"
        max_new_tokens: 300
        temperature: 0.6 # 0.8 and 0.6 are popular values to try
        top_k: 300

        quantizer: null
        """
    )

We use exactly the same prompt as before.

In [24]:
! tune run generate --config ./mistral_fine-tuned_generation.yaml \
prompt="You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.\n\nYou must output the SQL query that answers the question.\n### Input:\nWhich Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?\n\n### Context:\nCREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)\n\n### Response:\n"

INFO:torchtune.utils.logging:Running InferenceRecipe with resolved config:

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /home/jovyan/content/mistral-7B
  checkpoint_files:
  - hf_model_0001_2.pt
  - hf_model_0002_2.pt
  model_type: MISTRAL
  output_dir: /home/jovyan/content/mistral-7B
  recipe_checkpoint: null
device: cuda
dtype: bf16
max_new_tokens: 300
model:
  _component_: torchtune.models.mistral.mistral_7b
prompt: You are a powerful text-to-SQL model. Your job is to answer questions about
  a database. You are given a question and context regarding one or more tables.\n\nYou
  must output the SQL query that answers the question.\n### Input:\nWhich Class has
  a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?\n\n###
  Context:\nCREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license
  VARCHAR)\n\n### Response:\n
quantizer: null
resume_from_checkpoint: false
seed: 1234
temperature: 0.6
t

This time, the output is more relevant: the model outputs only the answer to our question. On this example at least, the fine-tuning has worked!