# **LoRA Finetuning of Mistral 7B-Instruct**
### Following are the steps:
- Install required libraries: `transformers`, `peft`, `accelerate`, `bitsandbytes`, `datasets`  
- Load model in 4-bit using `BitsAndBytesConfig` with NF4 dtype  
- Configure LoRA targeting `q_proj` and `v_proj` layers  
- Prepare dataset using Mistral's `[INST]` prompt template  
- Set training arguments: batch size 1, gradient accumulation, low learning rate  
- Enable gradient checkpointing to save memory  
- Initialize trainer with LoRA config and start training  
- Monitor GPU usage during training  
- Merge adapters into base model after training  
- Test model outputs for quality and safety  
- Deploy with appropriate disclaimers and safeguards

In [4]:
!pip install transformers accelerate  peft -U bitsandbytes -U datasets torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 fastai==2.7.19

Collecting bitsandbytes
  Downloading bitsandbytes-0.46.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch==2.6.0)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch==2.6.0)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch==2.6.0)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch==2.6.0)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch==2.6.0)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting 

In [1]:
!pip install -U bitsandbytes



In [2]:
!pip install -q huggingface_hub
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [24]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,TrainingArguments, Trainer
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model, PeftModel, PeftConfig
from datasets import load_dataset
import torch
import os

In [4]:
#quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_enable_fp32_cpu_offload=True , # Enables CPU offloading in float32
    bnb_4bit_quant_type="nf4"  # This fixes the fp4 on CPU error
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",  # Automatically places layers on GPU/CPU as needed
    quantization_config=quantization_config
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/2.10k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/596 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

In [5]:
#step 1: load dataset
dataset = load_dataset("empathetic_dialogues")

# Step 2: Save train split
train_dataset = dataset["train"]

print(train_dataset)

model = prepare_model_for_kbit_training(model)

README.md:   0%|          | 0.00/7.15k [00:00<?, ?B/s]

empathetic_dialogues.py:   0%|          | 0.00/4.51k [00:00<?, ?B/s]

The repository for empathetic_dialogues contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/empathetic_dialogues.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


Downloading data:   0%|          | 0.00/28.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/76673 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/12030 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/10943 [00:00<?, ? examples/s]

Dataset({
    features: ['conv_id', 'utterance_idx', 'context', 'prompt', 'speaker_idx', 'utterance', 'selfeval', 'tags'],
    num_rows: 76673
})


In [10]:
# to check the needed col-> 'utterance' is required for training
print(train_dataset[0])
print(train_dataset[1])
print(train_dataset[2])

{'conv_id': 'hit:0_conv:1', 'utterance_idx': 1, 'context': 'sentimental', 'prompt': 'I remember going to the fireworks with my best friend. There was a lot of people_comma_ but it only felt like us in the world.', 'speaker_idx': 1, 'utterance': 'I remember going to see the fireworks with my best friend. It was the first time we ever spent time alone together. Although there was a lot of people_comma_ we felt like the only people in the world.', 'selfeval': '5|5|5_2|2|5', 'tags': ''}
{'conv_id': 'hit:0_conv:1', 'utterance_idx': 2, 'context': 'sentimental', 'prompt': 'I remember going to the fireworks with my best friend. There was a lot of people_comma_ but it only felt like us in the world.', 'speaker_idx': 0, 'utterance': 'Was this a friend you were in love with_comma_ or just a best friend?', 'selfeval': '5|5|5_2|2|5', 'tags': ''}
{'conv_id': 'hit:0_conv:1', 'utterance_idx': 3, 'context': 'sentimental', 'prompt': 'I remember going to the fireworks with my best friend. There was a lot

In [6]:
# ftn to tokenize i.e. extract the utterance column
def tokenize(sample):
    prompt = f"{sample['utterance']}\n"
    encoding = tokenizer(prompt, truncation=True, padding="max_length", max_length=512)
    encoding["labels"] = encoding["input_ids"].copy()
    return encoding

In [8]:
tokenized_dataset = train_dataset.map(tokenize, batched=False)

Map:   0%|          | 0/76673 [00:00<?, ? examples/s]

In [10]:
# lora config
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

In [12]:
training_args = TrainingArguments(
    output_dir="./mistral-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    max_steps=100,  # For testing. Increase this.
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_steps=50,
    report_to="none"
)

trainer = Trainer(
    model=model,
    train_dataset=tokenized_dataset,
    args=training_args,
    tokenizer=tokenizer
)

trainer.train()

  trainer = Trainer(
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kwargs)
  return fn(*args, **kw

Step,Training Loss
10,8.3553
20,0.1729
30,0.1382
40,0.135
50,0.1142
60,0.1121
70,0.1148
80,0.1087
90,0.1128
100,0.1021


  return fn(*args, **kwargs)


TrainOutput(global_step=100, training_loss=0.9466073095798493, metrics={'train_runtime': 1654.5441, 'train_samples_per_second': 0.484, 'train_steps_per_second': 0.06, 'total_flos': 1.7491908624384e+16, 'train_loss': 0.9466073095798493, 'epoch': 0.010433784594517046})

In [13]:
model.save_pretrained("mistral-7b-lora-empathetic")
tokenizer.save_pretrained("mistral-7b-lora-empathetic")

('mistral-7b-lora-empathetic/tokenizer_config.json',
 'mistral-7b-lora-empathetic/special_tokens_map.json',
 'mistral-7b-lora-empathetic/chat_template.jinja',
 'mistral-7b-lora-empathetic/tokenizer.model',
 'mistral-7b-lora-empathetic/added_tokens.json',
 'mistral-7b-lora-empathetic/tokenizer.json')

In [26]:
#testing
# Load base Mistral
tokenizer = AutoTokenizer.from_pretrained("mistral-7b-lora-empathetic")

# Load LoRA adapter
peft_config = PeftConfig.from_pretrained("mistral-7b-lora-empathetic")

model = PeftModel.from_pretrained(
    model=model,
    model_id="mistral-7b-lora-empathetic"
)
model.eval()



PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): PeftModelForCausalLM(
      (base_model): LoraModel(
        (model): MistralForCausalLM(
          (model): MistralModel(
            (embed_tokens): Embedding(32000, 4096)
            (layers): ModuleList(
              (0-31): 32 x MistralDecoderLayer(
                (self_attn): MistralAttention(
                  (q_proj): lora.Linear4bit(
                    (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=4096, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=4096, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
      

In [31]:
# testing
prompt = "I lost my job today."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(inputs["input_ids"], max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


I lost my job today. I was let go due to budget cuts. I’m feeling a mix of emotions right now. I’m sad, scared, and angry. I’m sad because I enjoyed my job and the people I worked with. I’m scared because I don’t know what the future holds. I’m angry because I feel like I was betrayed by my employer. I know they didn’t do this to intentionally hurt me, but it still feels like a personal attack.


In [32]:
merged_model = model.merge_and_unload()
merged_model.save_pretrained("mistral-merged")
tokenizer.save_pretrained("mistral-merged")



('mistral-merged/tokenizer_config.json',
 'mistral-merged/special_tokens_map.json',
 'mistral-merged/chat_template.jinja',
 'mistral-merged/tokenizer.model',
 'mistral-merged/added_tokens.json',
 'mistral-merged/tokenizer.json')