<a href="https://colab.research.google.com/github/Aditya100300/LLMs_from_scratch/blob/main/Chapter_8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Enlighten Instruct – LoRA Fine-Tuning with Dolly-v2-3B (8-bit)**

This notebook demonstrates how to:

1. Clone the `Enlighten-Instruct` repo,  
2. Set up the environment for quantized (8-bit or 4-bit) training with bitsandbytes,  
3. Load the Dolly-v2-3B base model in 8-bit,  
4. Prepare a LoRA config,  
5. Train on a small dataset,  
6. Save the adapter, optionally push it to Hugging Face, and evaluate it.

We'll follow these steps carefully.


In [1]:
%%capture
# Cell 1: Basic Setup and Installations
# (%%capture must be the first line of the cell. It hides all git/pip output, including the final print.)

!git clone 'https://github.com/ali7919/Enlighten-Instruct.git'  # project repo
!pip install -U bitsandbytes         # For 8-bit / 4-bit quantization
!pip install transformers==4.36.2    # Specific version tested (the -U installs below may upgrade it)
!pip install -U peft                 # Parameter-Efficient Fine-Tuning library
!pip install -U accelerate
!pip install -U trl                  # Transformer Reinforcement Learning (provides SFTTrainer)
!pip install datasets==2.16.0        # Dataset version known to be stable
!pip install sentencepiece           # Some models need sentencepiece for tokenization

print("Done environment setup!")


Explanation:

We clone the GitHub repo containing the code and data.

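Then we install bitsandbytes for 8-bit/4-bit quantization, peft for LoRA, trl for its SFTTrainer training loop, and supporting libraries (accelerate, datasets, sentencepiece).

Transformers is pinned to 4.36.2 for compatibility. However, the later `pip install -U peft` and `pip install -U trl` calls may upgrade it if their newest releases require a newer Transformers.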
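The pin can be silently overridden, so it is worth checking which versions actually ended up installed. Below is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper name `installed_versions` is our own and does not belong to any library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


for pkg, ver in installed_versions(
    ["transformers", "trl", "peft", "bitsandbytes", "accelerate", "datasets"]
).items():
    print(f"{pkg:>13}: {ver or 'not installed'}")
```

If `transformers` reports something other than 4.36.2, the code in the later cells is running against that newer version.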

In [2]:
# Cell 2: Import libraries

import os
import torch
import pandas as pd
import re
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          TrainingArguments, pipeline, logging)
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer

print("All libraries imported successfully!")



All libraries imported successfully!


Explanation:

From peft we import the LoRA helpers (LoraConfig, prepare_model_for_kbit_training, get_peft_model) as well as PeftModel.

SFTTrainer from trl runs the supervised fine-tuning loop.

AutoModelForCausalLM, configured with BitsAndBytesConfig, loads Dolly v2 3B in 8-bit.



In [3]:
# Cell 3: Define base model + new model name + paths

base_model = "databricks/dolly-v2-3b"  # an open-source 3B param LLM from Databricks
new_model = "Enlighten_Instruct"       # your new model name on HF
train_path = "/content/Enlighten-Instruct/Dataset/TrainData.csv"
test_path  = "/content/Enlighten-Instruct/Dataset/TestData.csv"

print("Base model:", base_model)
print("We'll create / push LoRA adapter name:", new_model)
print("Train CSV:", train_path)
print("Test CSV:", test_path)


Base model: databricks/dolly-v2-3b
We'll create / push LoRA adapter name: Enlighten_Instruct
Train CSV: /content/Enlighten-Instruct/Dataset/TrainData.csv
Test CSV: /content/Enlighten-Instruct/Dataset/TestData.csv


Explanation:

The base model is databricks/dolly-v2-3b, which is downloaded from the Hugging Face Hub.

We define the path for the train/test CSVs from the cloned repo.

The new_model string serves as the local folder name for the saved adapter. If you push the adapter, it also becomes the repository name under your account on huggingface.co.

In [4]:
# Cell 4: (Optional) Hugging Face Login

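from google.colab import userdata

# userdata.get raises SecretNotFoundError (instead of returning None) when the secret is missing.
try:
    secret_hf = userdata.get('HUGGINGFACE_TOKEN')
except userdata.SecretNotFoundError:
    secret_hf = None

if secret_hf:
    !huggingface-cli login --token $secret_hf
else:
    print("No HF token found. If you want to push to HF, add a HUGGINGFACE_TOKEN secret in Colab.")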



SecretNotFoundError: Secret HUGGINGFACE_TOKEN does not exist.

In [5]:
# Cell 5: Read the training data + create a single text column

df = pd.read_csv(train_path)

# Suppose your CSV has columns 'Q', 'A', 'class', etc.
# We'll do the same approach: create a combined text with <s>[INST] etc.
df['text'] = '<s>[INST]@Enlighten. ' + df['Q'] + '[/INST]' + df['A'] + '</s>'

# Drop old columns
df = df.drop(['Q','A','class'], axis=1)

# Convert to a Hugging Face Dataset
dataset = Dataset.from_pandas(df)

print(dataset)
dataset.to_pandas().head(3)


Dataset({
    features: ['text'],
    num_rows: 1910
})


Unnamed: 0,text
0,<s>[INST]@Enlighten. What is the purpose of th...
1,<s>[INST]@Enlighten. What is the purpose of th...
2,<s>[INST]@Enlighten. What component does the '...


Explanation:

We carefully create the instruction format.

Removing columns we don’t need.

dataset is now a datasets.Dataset object with a 'text' feature.
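At inference time the model should see exactly the same template, up to and including `[/INST]`. Any difference, such as an extra space before `[/INST]`, gives it a prompt it never saw during training. Below is a small sketch that keeps the training and inference formats in one place. The function names `format_training_example` and `format_inference_prompt` are hypothetical helpers, not part of the original notebook.

```python
INSTRUCTION_TAG = "@Enlighten. "


def format_training_example(question: str, answer: str) -> str:
    """Same string Cell 5 builds: <s>[INST]@Enlighten. {Q}[/INST]{A}</s>."""
    return "<s>[INST]" + INSTRUCTION_TAG + question + "[/INST]" + answer + "</s>"


def format_inference_prompt(question: str) -> str:
    """The training string with the answer and the closing </s> removed."""
    return "<s>[INST]" + INSTRUCTION_TAG + question + "[/INST]"


example_q, example_a = "What is LoRA?", "A parameter-efficient fine-tuning method."
example_train_text = format_training_example(example_q, example_a)
example_prompt = format_inference_prompt(example_q)

# The inference prompt must be an exact, character-for-character prefix of the training text.
assert example_train_text.startswith(example_prompt)
print(repr(example_prompt))
```

The Cell 5 line could then be written as `df['text'] = [format_training_example(q, a) for q, a in zip(df['Q'], df['A'])]`.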

In [6]:
# Cell 6: Load Dolly v2 3B in 8-bit (4-bit is an optional alternative).

# We use 8-bit rather than 4-bit for Dolly v2 3B to keep things simple.
# For 4-bit, replace bnb_config with a BitsAndBytesConfig(load_in_4bit=True, ...).

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)

model.config.use_cache = False
model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.padding_side = "right"
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True

print("Dolly v2 3B loaded in 8-bit. Tokenizer ready!")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Dolly v2 3B loaded in 8-bit. Tokenizer ready!


Explanation:

Dolly-v2-3b has only about 3B parameters, so loading it in 8-bit typically fits on a free Colab GPU. If you prefer 4-bit, pass a BitsAndBytesConfig with load_in_4bit=True (plus options such as bnb_4bit_compute_dtype) instead.

torch_dtype=torch.float16 sets the dtype of the parameters that are not quantized.

device_map="auto" lets Accelerate place the layers on the available GPU(s) automatically.
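Below is a minimal sketch of that 4-bit configuration. It uses the NF4 quantization type with double quantization, the settings popularized by QLoRA. These specific choices are our assumption and are not taken from the notebook.

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit alternative to the 8-bit bnb_config in Cell 6.
bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat weight format
    bnb_4bit_use_double_quant=True,        # also quantize the per-block scaling constants
    bnb_4bit_compute_dtype=torch.float16,  # torch.bfloat16 on GPUs with bf16 support (e.g. an A100)
)
```

To use it, pass `quantization_config=bnb_config_4bit` to `AutoModelForCausalLM.from_pretrained` in Cell 6. The LoRA and training cells stay the same.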

In [7]:
# Cell 7: Prepare model for LoRA + define config

model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    # Dolly v2 uses the GPT-NeoX architecture, whose attention layers are named
    # 'query_key_value' and 'dense'. The other names cover other architectures
    # and simply match nothing here.
    target_modules=["query_key_value","dense","c_attn","q_proj","k_proj","v_proj","o_proj"]
)

model = get_peft_model(model, peft_config)
print("LoRA modules added to Dolly v2!")


LoRA modules added to Dolly v2!


Explanation:

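prepare_model_for_kbit_training readies the quantized model for training. It freezes the base weights, casts the remaining half-precision parameters (such as layer norms) to float32 for stability, and enables gradient checkpointing.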

Then we define a LoraConfig with r=64, lora_alpha=16.
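LoRA leaves each targeted weight matrix W frozen and learns a low-rank update B·A, which is scaled by lora_alpha / r. With r=64 and lora_alpha=16 the scale is 0.25. Because B starts at zero, the adapted layer initially behaves exactly like the base layer. The NumPy sketch below shows the forward pass with toy dimensions. It is our own illustration with dropout omitted, not PEFT's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 256, 384    # toy layer size (an assumption; not Dolly's real dimensions)
r, lora_alpha = 64, 16    # values from the LoraConfig above
scaling = lora_alpha / r  # 16 / 64 = 0.25

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection (random init)
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)


def lora_forward(x):
    """Adapted layer: y = W x + (lora_alpha / r) * B (A x)."""
    return W @ x + scaling * (B @ (A @ x))


x = rng.standard_normal(d_in)
print("scaling factor:", scaling)
print("identical to the base layer at init:", np.allclose(lora_forward(x), W @ x))
print("LoRA params:", A.size + B.size, "| full weight params:", W.size)
```

Only A and B receive gradients. That is why a LoRA run trains, and later saves, a small fraction of the model's parameters.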

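target_modules lists the layer names that receive adapters. Names that do not exist in Dolly's GPT-NeoX architecture (c_attn, q_proj, and so on) are simply not matched.

get_peft_model wraps the model and injects trainable LoRA matrices into the matched layers while the base weights stay frozen. Nothing is merged into the base weights at this stage.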
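PEFT selects layers by matching each module's full name against target_modules. A module qualifies if its name equals an entry or ends with "." followed by the entry. The sketch below is a simplified re-implementation of that rule, not PEFT's own code. It is applied to what we assume are the GPT-NeoX module names in one Dolly-v2 decoder layer. On the real model, `model.print_trainable_parameters()` after get_peft_model confirms how many parameters ended up trainable.

```python
def matches_target(module_name, target_modules):
    """Simplified stand-in for PEFT's rule: exact name match or a '.<target>' suffix."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in target_modules
    )


target_modules = ["query_key_value", "dense", "c_attn", "q_proj", "k_proj", "v_proj", "o_proj"]

# Assumed GPT-NeoX module names for one Dolly-v2 decoder layer.
layer_modules = [
    "gpt_neox.layers.0.attention.query_key_value",
    "gpt_neox.layers.0.attention.dense",
    "gpt_neox.layers.0.mlp.dense_h_to_4h",
    "gpt_neox.layers.0.mlp.dense_4h_to_h",
]

for name in layer_modules:
    status = "LoRA" if matches_target(name, target_modules) else "frozen only"
    print(f"{name:<45} -> {status}")
```

Under this rule, "dense" catches only the attention output projection, not the MLP layers `dense_h_to_4h` and `dense_4h_to_h`. To adapt those too, you would list them explicitly.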



In [12]:
# Cell 8: Train Configuration and SFTTrainer

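from transformers import TrainingArguments
from trl import SFTTrainer

training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=50,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=1.0,
    lr_scheduler_type="constant",
    # report_to="none",  # uncomment to skip the Weights & Biases login prompt
)

trainer = SFTTrainer(
    model=model,                 # already a PeftModel (Cell 7); TRL versions differ in how they treat peft_config then
    train_dataset=dataset,       # recent TRL reads the "text" column by default
    peft_config=peft_config,
    processing_class=tokenizer,  # tokenizer configured in Cell 6 (TRL >= 0.12; older TRL: tokenizer=tokenizer)
    args=training_arguments,
    # Older TRL releases also accepted max_seq_length=1024, dataset_text_field="text",
    # and packing=False here; newer releases moved these options to trl.SFTConfig.
)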

print("Trainer created. Ready to do trainer.train().")


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Trainer created. Ready to do trainer.train().


Explanation:

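We train for 1 epoch with a constant learning rate of 2e-4, a per-device batch size of 4, no gradient accumulation, and the paged 32-bit AdamW optimizer from bitsandbytes. A checkpoint is saved every 50 steps and the loss is logged every 10 steps.

SFTTrainer from TRL handles the supervised fine-tuning loop, including tokenizing the text column.

max_seq_length=1024 (commented out above) is a typical choice for Dolly 3B. Recent TRL releases set the sequence length through SFTConfig rather than SFTTrainer, so adapt it to the length of your Q/A pairs.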
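These settings determine how many optimizer updates one epoch takes. The sketch below assumes a single GPU, gradient_accumulation_steps=1, and the Trainer default of keeping the last partial batch. The helper name is ours.

```python
import math


def optimizer_steps_per_epoch(num_examples, per_device_batch_size):
    """Single GPU, gradient_accumulation_steps=1, last partial batch kept."""
    return math.ceil(num_examples / per_device_batch_size)


steps = optimizer_steps_per_epoch(num_examples=1910, per_device_batch_size=4)
print("optimizer steps per epoch:", steps)  # ceil(1910 / 4) = ceil(477.5) = 478
```

With gradient accumulation or several GPUs, the effective batch size grows and the step count shrinks proportionally.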

In [13]:
# Cell 9: Actually Train

trainer.train()
print("Training finished!")


[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.



[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mbabaraditya07[0m ([33mbabaraditya07-fh-k-rnten[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


  return fn(*args, **kwargs)


Step,Training Loss
10,2.8136
20,1.8373
30,1.6668
40,1.6786
50,1.5596
60,1.5064
70,1.5056
80,1.4643
90,1.4921
100,1.5379




Training finished!


In [None]:
# Cell 10: Save the adapter and push to Hugging Face

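trainer.model.save_pretrained(new_model)  # writes only the LoRA adapter weights and adapter_config.json
model.config.use_cache = True             # re-enable the KV cache for generation
model.eval()

print(f"LoRA adapter saved locally as {new_model}")

# Pushing requires a Hugging Face login (Cell 4); set this to True once you are logged in.
PUSH_TO_HUB = False
if PUSH_TO_HUB:
    trainer.model.push_to_hub(new_model)
    print("Pushed to HF Hub!")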


Explanation:

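save_pretrained stores only the LoRA adapter weights, not the 3B base model, so the saved folder is small.

push_to_hub then creates or updates the repository <your-username>/Enlighten_Instruct on your HF account. This works only if you are logged in.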
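PeftModel is imported in Cell 2 but never used. Its role is to reattach a saved adapter to the base model later, for example in a fresh Colab session. The sketch below assumes the adapter was saved to the local folder `Enlighten_Instruct` (or pushed under that name) and that the base model is loaded in 8-bit as in Cell 6. The helper name `load_finetuned` is hypothetical.

```python
def load_finetuned(base_model_id, adapter_dir):
    """Reload the 8-bit base model and attach a saved LoRA adapter to it."""
    # Imported lazily so the helper can be defined before the heavy libraries load.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        torch_dtype=torch.float16,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_dir)  # local folder or Hub repo id
    model.eval()

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    tokenizer.pad_token = tokenizer.eos_token  # same padding setup as Cell 6
    return model, tokenizer
```

Usage: `model, tokenizer = load_finetuned("databricks/dolly-v2-3b", "Enlighten_Instruct")`. After that, the pipeline in Cell 11 can be built unchanged.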

In [None]:
# Cell 11: Quick Test with a Pipeline

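from transformers import pipeline

logging.set_verbosity(logging.CRITICAL)  # silence transformers warnings (e.g. about PeftModel in pipelines)

gen_pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,  # caps generated tokens only; max_length would also count the prompt
)

def build_prompt(question):
    # Must match the training template from Cell 5 exactly (no space before [/INST]).
    return f"<s>[INST]@Enlighten. {question}[/INST]"

sample_question = "Explain the difference between AI and Machine Learning."
prompt = build_prompt(sample_question)

result = gen_pipe(prompt)
print("Model says:\n", result[0]['generated_text'])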


Explanation:

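We set up a text-generation pipeline around the fine-tuned (PEFT-wrapped) model.

We build the prompt in the same [INST]@Enlighten format used for training.

We then print the model's output. By default, generated_text contains the prompt followed by the model's continuation.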
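Passing `return_full_text=False` to the pipeline call returns only the continuation. If you already have the full text, a small helper can instead cut it after the prompt's `[/INST]` tag. The sketch below uses a hypothetical helper name. It also strips a literal `</s>`, assuming the model reproduces that marker as plain text the way it appeared in training.

```python
def extract_answer(generated_text, end_of_prompt="[/INST]"):
    """Return only what the model wrote after the prompt's closing [/INST] tag."""
    _, sep, answer = generated_text.partition(end_of_prompt)
    if not sep:
        return generated_text.strip()  # tag not found: return the text unchanged
    return answer.replace("</s>", "").strip()
```

Usage: `print(extract_answer(result[0]['generated_text']))`.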

In [None]:
# Cell 12: Evaluate on a small test CSV

df_test = pd.read_csv(test_path)
num_correct = 0

for idx, row in df_test.iterrows():
    question = row['Question']        # adjust to your test CSV's column names
    correct_ans = str(row['Answer']).strip()

    # If you have multiple-choice options a, b, c, d, append them to the question, e.g.
    # question += f" a) {row['a']} b) {row['b']} c) {row['c']} d) {row['d']}"
    prompt = build_prompt(question)

    # return_full_text=False returns only the generated answer, so the check below
    # cannot be satisfied by text that was already in the prompt.
    response = gen_pipe(prompt, return_full_text=False)[0]['generated_text']

    print("Q:", question)
    print("Model:", response)

    # Naive check: count it correct if the reference answer appears in the response.
    if correct_ans and correct_ans in response:
        num_correct += 1

question_count = len(df_test)
acc = num_correct / question_count if question_count else 0.0
print(f"Test Accuracy: {acc:.2f} ({num_correct}/{question_count})")


Explanation:

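We loop over the test CSV, build a prompt for each question, and generate the model's answer.

The check is a simple substring match: a response counts as correct if it contains the reference answer. This is crude, because it misses paraphrases and can reward a long response that happens to mention the answer. For multiple-choice data, parsing the chosen option is more reliable.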
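For multiple-choice questions, comparing the chosen option letter is more robust than a substring match. The sketch below is hypothetical. It assumes the test CSV stores the correct option as a letter a–d and that the model states its choice in a form like "b)", "(b)", or "Answer: B". The regular expressions would need adjusting to the model's actual output style.

```python
import re

# "Answer: B", "The answer is: b", "answer (c)"; a colon or parenthesis is required
# so that phrases like "the answer is a method" are not mistaken for option a.
OPTION_AFTER_WORD = re.compile(r"\banswer\s*(?:is\s*)?[:(]\s*\(?([a-d])\b", re.IGNORECASE)
# "b) ..." or "(b)"
OPTION_WITH_PAREN = re.compile(r"\(?\b([a-d])\)", re.IGNORECASE)


def predicted_option(response):
    """Return the first option letter (a-d) the response commits to, or None."""
    for pattern in (OPTION_AFTER_WORD, OPTION_WITH_PAREN):
        match = pattern.search(response)
        if match:
            return match.group(1).lower()
    return None


def is_correct(response, correct_letter):
    """Compare the parsed option with the reference letter, ignoring case and spaces."""
    return predicted_option(response) == str(correct_letter).strip().lower()
```

In the Cell 12 loop, `if is_correct(response, row['Answer']): num_correct += 1` would replace the substring check.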

## Wrap-Up

We have successfully:
1. Cloned the data from GitHub.  
2. Used Dolly v2 3B, an open-source 3B-parameter model from Databricks.  
3. Fine-tuned it with LoRA on the CSV.  
4. Saved and optionally pushed the adapter to Hugging Face.  
5. Tested with a basic multi-choice or Q/A approach.

This is purely open-source. If you want a different open-source model, just replace `base_model` with your choice, and adapt the LoRA `target_modules` if needed.
