> See the notebook [Fine_Tuning_Mistral7B_with_QLoRA_Native_PyTorch_Training.ipynb](https://github.com/BitwiseBrains/RagOptimize/blob/main/fine_tuning/Fine_Tuning_Mistral7B_with_QLoRA_Native_PyTorch_Training.ipynb) for details on how the fine-tuning is done.

# Preparation

In [1]:
!pip install -q peft==0.6.0

In [2]:
import torch
from transformers import MistralForCausalLM
from peft import PeftModel
import gc
import yaml

from huggingface_hub import login, hf_hub_download

# Config and Logins

The notebook is driven by the following configuration values:

- `peft_repo_id`: The ID of the Hugging Face repository where the LoRA adapter and the trained head were saved during fine-tuning. In this case, it's `hari31416/Mistral_Finance_Finetuning`.
- `base_model_id`: The ID of the base model that was fine-tuned. Here, it's `mistralai/Mistral-7B-Instruct-v0.1`.
- `fine_tuned_model_id`: The ID of the repository to which the merged model will be pushed. Here, it's `hari31416/Mistral_Finance_Finetuned`.
- `head_file_name`: The name of the file containing the weights of the fine-tuned model head. In this case, it's `mistral_head.pt`.
- `trained_batches_number`: The number of batches the model has been trained on. Here, it's `800`. It is only used to build the commit message when pushing the final model to Hugging Face.

In [3]:
config = """---
peft_repo_id: hari31416/Mistral_Finance_Finetuning
base_model_id: mistralai/Mistral-7B-Instruct-v0.1
fine_tuned_model_id: hari31416/Mistral_Finance_Finetuned
head_file_name: mistral_head.pt
trained_batches_number: 800
"""
config = yaml.safe_load(config)
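
Since every later cell indexes into `config`, a missing or mistyped key only surfaces when that cell runs. A minimal sketch of an early check is shown here; `REQUIRED_KEYS` and `validate_config` are hypothetical helpers introduced for illustration and are not part of the original notebook.

```python
# Hypothetical helper: the expected keys and types mirror the list above.
REQUIRED_KEYS = {
    "peft_repo_id": str,
    "base_model_id": str,
    "fine_tuned_model_id": str,
    "head_file_name": str,
    "trained_batches_number": int,
}


def validate_config(cfg):
    """Raise ValueError if a required key is missing or has the wrong type."""
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in cfg:
            raise ValueError(f"missing config key: {key}")
        if not isinstance(cfg[key], expected_type):
            raise ValueError(
                f"{key} should be {expected_type.__name__}, "
                f"got {type(cfg[key]).__name__}"
            )
    return cfg


# Same values as the YAML above, written as an already-parsed dict.
example_config = {
    "peft_repo_id": "hari31416/Mistral_Finance_Finetuning",
    "base_model_id": "mistralai/Mistral-7B-Instruct-v0.1",
    "fine_tuned_model_id": "hari31416/Mistral_Finance_Finetuned",
    "head_file_name": "mistral_head.pt",
    "trained_batches_number": 800,
}
validate_config(example_config)
```

In the notebook this would be called as `validate_config(config)` right after `yaml.safe_load`.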

In [6]:
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()

HUGGING_FACE_API_KEY = user_secrets.get_secret("HUGGING_FACE_API_KEY")
login(HUGGING_FACE_API_KEY)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# Merging

We need to merge the LoRA layer with the base model. Since we also trained the head of the model, the head of the base model also needs to be replaced with the fine-tuned head.

Here are the steps involved in merging the LoRA layer and the head with the base model:

- Load the base model **without quantization** (here in `bfloat16` rather than 4-bit or 8-bit), so that the adapter is merged into regular, non-quantized weights.
- Build the PEFT model with `PeftModel.from_pretrained`, passing it the base model and the location of the LoRA adapter. The location is the repository where the adapter was saved during fine-tuning.
- Call `merge_and_unload` on the PEFT model. This merges the LoRA layers into the base model and returns the plain base model.
- Download the weights of the fine-tuned head and replace the head weights of the merged model with them.

The following cells implement these steps.
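
For reference, the merge step can be written out per adapted weight matrix. This is the standard LoRA formulation and assumes the adapter uses the usual scaling $\alpha / r$; the rank $r$ and the value of $\alpha$ used for this adapter are not shown in this notebook.

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

Here $W_0$ is the frozen base weight and $B$, $A$ are the trained low-rank factors. After the merge, each adapted projection is an ordinary `Linear` layer of the original shape, so the merged model needs no PEFT code at inference time.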

In [7]:
model = MistralForCausalLM.from_pretrained(
    config["base_model_id"],
    torch_dtype=torch.bfloat16,  # load in bfloat16 rather than 4-bit or 8-bit
    device_map={"": 0},  # place the whole model on GPU 0
)

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

As a precaution, we will verify that the downloaded fine-tuned head weights differ from the original head weights. To do that, we first copy the weight tensor to the CPU. The tensor returned by `state_dict()` shares memory with the model parameter, so without a copy it would be updated in place when the new head is loaded, and the comparison would always report identical weights.
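
The aliasing can be reproduced on a tiny layer. The sketch below uses a small `nn.Linear` as a stand-in for `model.lm_head` (an assumption made for illustration) and takes the snapshot with `.clone()`. In the notebook the model lives on the GPU, so `.to("cpu")` does produce a copy; on a tensor that is already on the CPU, `.to("cpu")` returns the same tensor, which is why `.clone()` is the safer choice in general.

```python
import torch
from torch import nn

# Small stand-in for model.lm_head, filled with ones so the result is deterministic.
head = nn.Linear(4, 3, bias=False)
nn.init.ones_(head.weight)

alias = head.state_dict()["weight"]             # shares memory with the parameter
snapshot = head.state_dict()["weight"].clone()  # independent copy

# load_state_dict copies the new values into the existing parameter in place.
head.load_state_dict({"weight": torch.zeros(3, 4)})

alias_tracks_update = torch.equal(alias, head.weight)         # alias now holds zeros
snapshot_unchanged = torch.equal(snapshot, torch.ones(3, 4))  # copy still holds ones
print(alias_tracks_update, snapshot_unchanged)  # prints: True True
```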

In [8]:
og_head = model.lm_head.state_dict()["weight"].to("cpu")
og_head

tensor([[-2.5940e-03,  8.8882e-04, -2.3499e-03,  ..., -2.7466e-04,
          4.1199e-03,  1.8616e-03],
        [-2.5635e-03,  1.4267e-03, -2.3956e-03,  ...,  5.3406e-04,
          5.9509e-03,  2.0447e-03],
        [ 3.4180e-03,  4.6997e-03,  2.7466e-03,  ..., -3.1281e-03,
         -3.4180e-03,  2.2278e-03],
        ...,
        [-3.7537e-03,  1.3504e-03, -4.8218e-03,  ...,  1.9684e-03,
          2.3651e-03, -1.9379e-03],
        [ 8.6427e-06, -4.0588e-03, -6.3171e-03,  ..., -1.8616e-03,
          3.8300e-03,  3.0365e-03],
        [ 2.7466e-03, -4.3945e-03,  3.6011e-03,  ..., -6.5613e-03,
          1.9455e-03, -7.8201e-05]], dtype=torch.bfloat16)

In [9]:
peft_model = PeftModel.from_pretrained(model, config["peft_repo_id"])  # wrap the base model with the PEFT (LoRA) adapter
final_model = peft_model.merge_and_unload()  # merge the adapter into the base weights

adapter_config.json:   0%|          | 0.00/520 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/54.6M [00:00<?, ?B/s]

In [10]:
final_model

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm(

In [11]:
old_head = final_model.lm_head.state_dict()["weight"].to("cpu")
old_head

tensor([[-2.5940e-03,  8.8882e-04, -2.3499e-03,  ..., -2.7466e-04,
          4.1199e-03,  1.8616e-03],
        [-2.5635e-03,  1.4267e-03, -2.3956e-03,  ...,  5.3406e-04,
          5.9509e-03,  2.0447e-03],
        [ 3.4180e-03,  4.6997e-03,  2.7466e-03,  ..., -3.1281e-03,
         -3.4180e-03,  2.2278e-03],
        ...,
        [-3.7537e-03,  1.3504e-03, -4.8218e-03,  ...,  1.9684e-03,
          2.3651e-03, -1.9379e-03],
        [ 8.6427e-06, -4.0588e-03, -6.3171e-03,  ..., -1.8616e-03,
          3.8300e-03,  3.0365e-03],
        [ 2.7466e-03, -4.3945e-03,  3.6011e-03,  ..., -6.5613e-03,
          1.9455e-03, -7.8201e-05]], dtype=torch.bfloat16)

In [12]:
# load the head and change the weights of the head
head_file_path = hf_hub_download(config["peft_repo_id"], config["head_file_name"], local_dir = ".")
lm_head_state_dict = torch.load(head_file_path)
final_model.lm_head.load_state_dict(lm_head_state_dict)

mistral_head.pt:   0%|          | 0.00/524M [00:00<?, ?B/s]

<All keys matched successfully>
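
As a rough sanity check on the download, the 524M reported for `mistral_head.pt` matches the size of a 32000 × 4096 weight matrix at 4 bytes per element. This suggests that the head was saved in `float32`, although that is an inference from the file size and not something the notebook states. `load_state_dict` copies the values into the existing `bfloat16` parameter, so the dtype of the model's head does not change. A small sketch of the arithmetic, assuming the file holds only the `lm_head` weight:

```python
# Dimensions taken from Embedding(32000, 4096) in the printed model.
vocab_size, hidden_size = 32000, 4096


def tensor_bytes(rows, cols, bytes_per_element):
    """Size in bytes of a dense rows x cols tensor."""
    return rows * cols * bytes_per_element


fp32_bytes = tensor_bytes(vocab_size, hidden_size, 4)  # float32: 4 bytes per element
bf16_bytes = tensor_bytes(vocab_size, hidden_size, 2)  # bfloat16: 2 bytes per element

print(f"float32:  {fp32_bytes / 1e6:.0f} MB")  # float32:  524 MB
print(f"bfloat16: {bf16_bytes / 1e6:.0f} MB")  # bfloat16: 262 MB
```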

In [13]:
del lm_head_state_dict
gc.collect()
torch.cuda.empty_cache()

In [14]:
new_head = final_model.lm_head.state_dict()["weight"].to("cpu")
new_head

tensor([[-0.0012,  0.0001, -0.0036,  ..., -0.0005,  0.0045,  0.0032],
        [-0.0045,  0.0020, -0.0043,  ...,  0.0014,  0.0045,  0.0041],
        [ 0.0024,  0.0026,  0.0014,  ..., -0.0028, -0.0035,  0.0013],
        ...,
        [-0.0015,  0.0003, -0.0063,  ...,  0.0023,  0.0033, -0.0002],
        [ 0.0020, -0.0049, -0.0077,  ..., -0.0015,  0.0057,  0.0041],
        [ 0.0038, -0.0059,  0.0010,  ..., -0.0070,  0.0042, -0.0002]],
       dtype=torch.bfloat16)

In [15]:
# make sure that the weights are different; if not, raise an error
matched = torch.equal(old_head, new_head)
if matched:
    raise ValueError("The weights are matching! Something is wrong")
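
`torch.equal` only reports whether the two tensors are identical. If a more informative check is wanted, the size of the change can be measured as well. The sketch below is one possible way to do that; `head_difference` is a hypothetical helper that is not part of the original notebook, and it is demonstrated here on small stand-in tensors.

```python
import torch


def head_difference(old, new):
    """Return (largest absolute change, fraction of entries that changed)."""
    if old.shape != new.shape:
        raise ValueError(f"shape mismatch: {tuple(old.shape)} vs {tuple(new.shape)}")
    # Compare in float32 so the subtraction is not done in bfloat16.
    diff = (new.float() - old.float()).abs()
    return diff.max().item(), (diff > 0).float().mean().item()


# Stand-in tensors: one of six entries is changed by 0.5.
old = torch.zeros(2, 3, dtype=torch.bfloat16)
new = old.clone()
new[0, 0] = 0.5

max_diff, changed_fraction = head_difference(old, new)
print(max_diff, changed_fraction)
```

In the notebook this would be called as `head_difference(old_head, new_head)`. A fraction close to zero would indicate that the head was barely touched.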

Let us push the final merged model to HF.

In [16]:
final_model.push_to_hub(config["fine_tuned_model_id"], commit_message=f"Merging PEFT to the main model after {config['trained_batches_number']} batches")

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

CommitInfo(commit_url='https://huggingface.co/hari31416/Mistral_Base_Finance_Finetuned_Trainer/commit/7a9747cfa60ef3f37bc999023dddbde151bfb593', commit_message='Merging PEFT to the main model after 500 batches', commit_description='', oid='7a9747cfa60ef3f37bc999023dddbde151bfb593', pr_url=None, pr_revision=None, pr_num=None)