# **Instruction Fine-Tuning Pretrained Mistral-7B Model on Extensive Telecom Q&A Dataset**

## **Introduction**

This notebook documents the fine-tuning of a **continually pretrained Mistral-7B** model on a domain-specific **telecommunications** dataset. The goal is to improve the model's performance on telecom question-answering (Q&A) tasks for our downstream use case, **Using AI to Reduce the 6G Standards Barrier for African Contributors**, by exposing it to curated, task-aligned data through parameter-efficient fine-tuning (LoRA).

## **Our Approach**

We begin with a base model that has undergone continual pretraining, hosted at: `Agaba-Embedded4/Cintinually-Pre-trained-Mistral-7B`

This model is further fine-tuned using the **Unsloth** library, which enables efficient training using 4-bit quantization, memory optimization strategies, and LoRA adapters.

---

## **Dataset: Combined Telecom Q&A**

The dataset used in this notebook is a combination of three high-quality telecom-related datasets sourced from Hugging Face:

1. [`dinho1597/Telecom-QA-MultipleChoice`](https://huggingface.co/datasets/dinho1597/Telecom-QA-MultipleChoice)
2. [`netop/TeleQnA`](https://huggingface.co/datasets/netop/TeleQnA)
3. [`AliMaatouk/Tele-Eval`](https://huggingface.co/datasets/AliMaatouk/Tele-Eval)

These datasets have been combined and released as a unified dataset under: `Agaba-Embedded4/Combined-Telecom-QnA`

The dataset consists of question-answer pairs covering telecom operations, services, configurations, and terminology. This makes it a suitable basis for fine-tuning large language models on domain-specific tasks and for teaching them to follow instructions accurately.

---

## **Training Framework**

We use the following tools and frameworks throughout the notebook:

- **Unsloth**: An optimized wrapper for efficient fine-tuning of LLMs, with built-in support for quantized models, LoRA adapters, and long-context training via gradient checkpointing.
- **Transformers** (by Hugging Face): For model APIs, tokenization, and training interfaces.
- **Datasets**: To load and manipulate the Telecom Q&A dataset.
- **WandB**: For experiment tracking, logging, and model monitoring.
- **LoRA (Low-Rank Adaptation)**: A parameter-efficient fine-tuning method that adapts pre-trained models by adding low-rank updates to selected layers (here, the attention and MLP projection matrices).

---

## **Training Strategy**

1. **Model Loading**: The continually pretrained Mistral-7B is loaded using Unsloth in 4-bit precision to reduce VRAM usage.
2. **Dataset Preprocessing**: The combined Q&A dataset is converted into the Alpaca-style instruction template and also adapted to a structured **chat format** with `system`, `user`, and `assistant` roles for instruct-tuning.
3. **Fine-tuning Setup**:
   - Instruction tuning using the `SFTTrainer` (Supervised Fine-Tuning Trainer from TRL).
   - Tokenization via Mistral’s chat template support.
   - Only the response (assistant part) is trained on using `train_on_responses_only`.
4. **Evaluation**: The model is tested with sample telecom-related queries to observe response quality.
5. **Saving & Pushing**: After training, the fine-tuned model is saved locally and pushed to the Hugging Face Hub.

---

## **Objectives**

- Improve Mistral-7B's ability to answer telecom-specific questions.
- Use parameter-efficient fine-tuning to minimize resource consumption while retaining strong performance.
- Deploy the final model for our downstream task, **Using AI to Reduce the 6G Standards Barrier for African Contributors**.

---

## **Outcome**

The result is a lightweight, domain-adapted, and instruction-tuned version of Mistral-7B that can be easily loaded for inference via the Hugging Face Hub from: `EYEDOL/MISTRAL7B_ON_TELE`


This model will be integrated into our system to help bridge the telecom gap in 6G. It can also be used in telecom virtual assistants, chatbot platforms, or internal diagnostic systems that require a specialized understanding of telecom data.

---

> **Author**: WINEST NIGERIA  
> **Project Title**: Using AI to Reduce the 6G Standards Barrier for African Contributors.  
> **Organization**: Federal University of Technology Minna  
> **Date**: July 2025


## **Setup and Imports**

In [None]:
%%capture
%pip install unsloth

## **Section 1: Hugging Face Hub Authentication**

Log in to the Hugging Face Hub using a personal access token. This is necessary to:

- Download pretrained models from private or public repositories.
- Push fine-tuned models and tokenizers back to the Hub.
- Access any organization or user-specific resources hosted on Hugging Face.
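
To avoid saving a real token inside the notebook, one option is to read it from an environment variable instead of hard-coding it. The sketch below assumes the token has been exported as `HF_TOKEN`; the helper `read_token` is a hypothetical name introduced here for illustration and is not part of `huggingface_hub`.

```python
import os

def read_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in `env_var`, or raise if it is missing.

    `read_token` is an illustrative helper, not a huggingface_hub function.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

The returned value could then be passed to the login call, for example `login(token=read_token())`.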


In [None]:
from huggingface_hub import login

# Replace the placeholder below with your own Hugging Face access token
hf_token = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Login using the token
login(token=hf_token)

print("Successfully logged into Hugging Face Hub!")


## **Section 2: W&B (Weights & Biases) Setup**

We integrate [Weights & Biases (wandb)](https://wandb.ai/) for experiment tracking. W&B lets you monitor key training metrics, visualize performance in real time, and log model artifacts.


In [None]:
import wandb

# Authenticate with W&B (replace the placeholder with your own API key) and start a run
wandb.login(key="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

run = wandb.init(
    project='Fine-tune Mistral on TELECOM_DATA',
    job_type="training",
    anonymous="allow"
)

## **Section 3: Loading the Continually Pretrained Mistral-7B Model**

In this section, we load the **continually pretrained version of the Mistral-7B** model using the [`Unsloth`](https://github.com/unslothai/unsloth) library. Unsloth is designed for efficient fine-tuning of large language models, enabling fast experimentation even on limited hardware via quantization and optimization techniques.
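
As a rough illustration of why 4-bit loading matters, the arithmetic below estimates the memory needed just to hold the weights at different precisions. The parameter count of about 7.24 billion is an approximation for Mistral-7B. The estimate deliberately ignores quantization constants, activations, optimizer state, and the LoRA adapters, so real VRAM usage will be higher.

```python
# Approximate parameter count for Mistral-7B (an assumption for illustration).
N_PARAMS = 7.24e9

def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

for name, bits in [("float32", 32), ("float16/bfloat16", 16), ("4-bit", 4)]:
    print(f"{name:>17}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the weights shrink from roughly 13.5 GiB in half precision to about 3.4 GiB in 4-bit, which is what makes a 7B model workable on a single consumer or free-tier GPU.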

In [None]:
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Agaba-Embedded4/Cintinually-Pre-trained-Mistral-7B",
    max_seq_length=2048,
    dtype=None,          # auto-detect: bfloat16 on GPUs that support it, float16 otherwise
    load_in_4bit=True,   # 4-bit quantization for lower VRAM use
)

## **Section 4: Loading and Preprocessing the Combined Telecom Q&A Dataset**

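In this section, we load the **combined Telecom Q&A dataset** and preprocess it using the **Alpaca instruction format**, which is commonly used for instruction tuning. Note that Section 6 later overwrites the resulting `text` column with a chat-template rendering, which is the version the trainer ultimately consumes.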

### Dataset Source
The dataset is hosted on Hugging Face under:
**`Agaba-Embedded4/Combined-Telecom-QnA`**

This dataset combines multiple telecom-focused datasets and contains question-answer pairs labeled under `question` and `answer` columns.

### Preprocessing Strategy

We use an **Alpaca-style prompt template** to convert the dataset into a format suitable for instruction fine-tuning. Each example is transformed into the following structure:
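
For illustration, the snippet below renders one question-answer pair with the same Alpaca-style template used in the next cell. The sample row is invented for this example and is not taken from the dataset.

```python
# Same template text as the notebook cell that follows.
alpaca_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Hypothetical row, invented purely to show the output shape.
sample_row = {
    "question": "What does the acronym RAN stand for?",
    "answer": "Radio Access Network.",
}

# The "Input" slot stays empty because the dataset has no matching column.
rendered = alpaca_template.format(sample_row["question"], "", sample_row["answer"])
print(rendered)
```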



In [None]:
from datasets import load_dataset

# prompt template for training the model
# Adapted to use 'Instruction' for 'question' and 'Response' for 'answer'.
# The 'Input' section is left empty as there's no corresponding column in the new dataset.
alpaca_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Load the new dataset: Agaba-Embedded4/Combined-Telecom-QnA
# Specify the split as 'train' if that's what you need, or adjust as necessary.
telecom_qna_dataset = load_dataset("Agaba-Embedded4/Combined-Telecom-QnA", split="train")

# Preprocess the dataset using the new 'question' and 'answer' columns
telecom_qna_dataset = telecom_qna_dataset.map(
    lambda x: {
        "text": alpaca_template.format(x["question"], "", x["answer"])
    }
)

# You can print the first few examples to verify the new 'text' column
print("First 5 examples of the processed dataset:")
for i in range(min(5, len(telecom_qna_dataset))):
    print(f"--- Example {i+1} ---")
    print(telecom_qna_dataset[i]["text"])
    print("\n")

# You can also check the new features/columns of the dataset
print("Dataset features after mapping:", telecom_qna_dataset.features)


## **Section 5: Dataset Sharding for Resource-Constrained Training**

To manage **limited compute resources**, this section introduces a simple helper function to **split the dataset into shards**. This allows the model to be fine-tuned on smaller portions (fractions) of the dataset one at a time.

### Why Shard the Dataset?

Fine-tuning large models like Mistral-7B, even with 4-bit quantization, can be resource-intensive. Sharding the dataset makes it possible to:
- Run training in stages across multiple sessions.
- Perform progressive fine-tuning by rotating through different segments.

### Helper Function: `get_fifth()`

Despite its name, this function splits the dataset into **25 equal shards** (not five) and returns the shard selected by the `part` index, which must be between 0 and 24.
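
To make the sharding concrete, the pure-Python sketch below shows two common ways of assigning row indices to shards. It is a simplified model rather than the `datasets` implementation: `Dataset.shard` takes a `contiguous` flag, and its default has differed between library versions, so it is worth checking which layout your installed version produces. The helper name `shard_indices` is hypothetical.

```python
def shard_indices(n_rows: int, num_shards: int, index: int, contiguous: bool) -> list[int]:
    """Row indices that belong to shard `index` out of `num_shards`."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    if contiguous:
        # Consecutive blocks; the first (n_rows % num_shards) shards get one extra row.
        base, extra = divmod(n_rows, num_shards)
        start = base * index + min(index, extra)
        stop = start + base + (1 if index < extra else 0)
        return list(range(start, stop))
    # Strided: every num_shards-th row, starting at `index`.
    return list(range(index, n_rows, num_shards))

# Example: a toy dataset of 103 rows split 25 ways, mirroring the 25-way split here.
print(shard_indices(103, 25, 0, contiguous=True))
print(shard_indices(103, 25, 0, contiguous=False))
```

Either way, every row lands in exactly one shard, so stepping `part` from 0 to 24 across sessions covers the full dataset once.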



In [None]:
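# Helper to grab one of the 25 shards (part 0 through 24).
# The name `get_fifth` is historical: the split is into 25 parts, not 5.
def get_fifth(dataset, part: int):
    """
    Returns the `part`-th of 25 shards of `dataset` (part in [0..24]).
    """
    assert 0 <= part < 25, "`part` must be an integer from 0 to 24"
    return dataset.shard(num_shards=25, index=part)

# First shard (1/25 of the dataset); the variable name is kept for the cells below.
first_fifth = get_fifth(telecom_qna_dataset, 0)
first_fifth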

In [None]:
telecom_qna_dataset

## **Section 6: Formatting the Dataset Using Mistral’s Chat Template**

This section formats the dataset into a **chat-style structure** to align with how instruction-tuned models like Mistral are trained and expected to respond during inference.

### System Prompt Definition

A system-level instruction is defined to guide the model’s behavior consistently.
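
One caveat worth checking: some Mistral instruct chat templates require strictly alternating `user`/`assistant` turns and reject a separate `system` role. Whether that applies here depends on the template bundled with the tokenizer loaded in Section 3. If `apply_chat_template` raises an error about conversation roles, a possible workaround is to fold the system prompt into the next user message, as in this sketch. The helper name `merge_system_into_user` is hypothetical and the sample messages are made up.

```python
def merge_system_into_user(messages: list[dict]) -> list[dict]:
    """Fold each system message into the next user turn.

    Returns a new list and leaves the input unchanged. A system message
    that is never followed by a user turn is dropped.
    """
    system_parts = []
    merged = []
    for message in messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        elif message["role"] == "user" and system_parts:
            prefix = "\n\n".join(system_parts)
            merged.append({"role": "user", "content": f"{prefix}\n\n{message['content']}"})
            system_parts = []
        else:
            merged.append(dict(message))
    return merged

chat = [
    {"role": "system", "content": "You are a professional assistant."},
    {"role": "user", "content": "What is a gNB?"},
    {"role": "assistant", "content": "A 5G base station."},
]
print(merge_system_into_user(chat))
```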

In [None]:

# Define your system prompt for the chat template
instruction = """You are a professional assistant, answer the questions asked correctly."""

# Define a formatting function for the dataset using the chat template.
# It takes one dataset row and uses the `tokenizer` loaded in Section 3.
# IMPORTANT: This function assumes 'row' has 'question' and 'answer' columns.
def format_chat_template(row):
    # Structure the conversation into roles: system, user (question), assistant (answer)
    row_json = [
        {"role": "system", "content": instruction},
        # Use 'question' column for the user's instruction (as per Combined-Telecom-QnA)
        {"role": "user", "content": row["question"]},
        # Use 'answer' column for the assistant's response (as per Combined-Telecom-QnA)
        {"role": "assistant", "content": row["answer"]}
    ]
    # Apply the chat template using the tokenizer.
    # tokenize=False ensures that the output is a string, not a list of token IDs.
    row["text"] = tokenizer.apply_chat_template(row_json, tokenize=False)
    return row

dataset = first_fifth.map(format_chat_template, num_proc=4)

In [None]:
dataset["text"][0]

In [None]:
dataset = dataset.train_test_split(test_size=0.1)

## **Section 7: Applying Parameter-Efficient Fine-Tuning (PEFT) with LoRA**

In this section, we prepare the Mistral-7B model for **parameter-efficient fine-tuning (PEFT)** using the **LoRA (Low-Rank Adaptation)** technique, implemented via the `Unsloth` framework.

LoRA significantly reduces the number of trainable parameters by introducing lightweight low-rank matrices into specific layers of the model, allowing for:
- Faster training
- Lower memory usage
- Reusability of the base model
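
A quick back-of-the-envelope count shows how small the adapters are. For a weight matrix with `d_in` inputs and `d_out` outputs, a rank-`r` LoRA adapter adds `r * (d_in + d_out)` trainable parameters. The sketch below applies this to the seven projection matrices targeted in the next cell, assuming the standard Mistral-7B dimensions (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers). If the continually pretrained checkpoint differs from these, the total will differ too.

```python
R = 16  # LoRA rank used in this notebook

# (d_in, d_out) of each targeted projection, assuming standard Mistral-7B shapes.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 4096, 1024, 14336, 32
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in projections.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Under these assumptions the adapters hold about 42 million parameters, well under 1% of the roughly 7 billion in the base model.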

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

## **Section 8: Setting Up the Trainer and Training Configuration**

This section initializes the training pipeline using `SFTTrainer` (Supervised Fine-Tuning Trainer) from the **TRL (Transformer Reinforcement Learning)** library, in combination with `transformers.TrainingArguments`.

We also define hardware-aware settings (like `fp16` or `bf16`) and data collators for preparing batches efficiently.
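
The arithmetic behind the settings in the next cell can be sketched as follows. The numbers mirror the `TrainingArguments` below and assume a single GPU. The schedule function is a plain-Python approximation of linear warmup followed by linear decay, not the scheduler object the trainer actually builds.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 5400
WARMUP_STEPS = 5
PEAK_LR = 2e-4
EVAL_STEPS = 1000
NUM_GPUS = 1  # assumption: single-GPU training

# Examples consumed per optimizer step and over the whole run.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS

# Evaluations triggered by eval_steps during the run.
num_evals = MAX_STEPS // EVAL_STEPS

def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)

print(f"effective batch size: {effective_batch}")
print(f"examples seen by the end of training: {examples_seen:,}")
print(f"scheduled evaluations: {num_evals}")
print(f"learning rate halfway through: {linear_lr(MAX_STEPS // 2):.2e}")
```

If the training split holds fewer than 43,200 examples, the trainer will loop over it more than once before reaching `max_steps`.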

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field = "text",
    max_seq_length = 2048,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        gradient_accumulation_steps=4,
        eval_strategy="steps",
        eval_steps=1000,
        warmup_steps = 5,
        #num_train_epochs = 2, # Alternative to max_steps: train for a fixed number of epochs.
        max_steps = 5400,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 5,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

## **Section 9: Fine-Tuning Only on Model Responses**

In this section, we refine the training behavior by **training only on the assistant's response**, rather than the full instruction-response sequence. This technique is useful for **response-focused optimization**, helping the model concentrate learning effort on generating high-quality answers.

---

### Why Train on Responses Only?

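By masking the prompt tokens (system prompt and question) out of the loss, we:
- Focus the training signal on response generation quality
- Avoid fitting the model to instruction text, which is often static or repetitive

The prompt tokens still pass through the model in both the forward and backward passes, so this mainly changes what the loss measures rather than reducing compute or memory.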
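
Conceptually, response-only training keeps every token in the input but sets the labels of the prompt tokens to `-100`, the index that PyTorch's cross-entropy loss ignores by default. The toy sketch below illustrates the idea on a list of string tokens for a single-turn example. It is not Unsloth's implementation, and `mask_prompt_labels` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value skipped by the loss

def mask_prompt_labels(tokens: list[str], response_marker: str) -> list:
    """Copy `tokens` as labels, masking everything up to and including the marker."""
    if response_marker not in tokens:
        # No response found: nothing should contribute to the loss.
        return [IGNORE_INDEX] * len(tokens)
    cut = tokens.index(response_marker) + 1
    labels = list(tokens)
    labels[:cut] = [IGNORE_INDEX] * cut
    return labels

# Made-up, word-level "tokens" purely for illustration.
tokens = ["[INST]", "What", "is", "LTE", "?", "[/INST]", "Long", "Term", "Evolution", "</s>"]
print(mask_prompt_labels(tokens, "[/INST]"))
```

Only the tokens after `[/INST]` keep their labels, so only the assistant's reply contributes to the loss.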

---

In [None]:
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part = "[INST]",    # marks the start of the *user* chunk
    response_part    = "[/INST]",   # everything after this is the *assistant* reply
    tokenizer        = tokenizer,   # the tokenizer loaded with the model in Section 3
)


In [None]:
print(trainer.train_dataset[0]["text"])

In [None]:
trainer_stats = trainer.train()

## **Section 10: Testing the Fine-Tuned Model with a Sample Telecom Query**

We spot-check the fine-tuned model by generating a response to a sample telecom-specific question using the Mistral chat template.

---

In [None]:
# Switch the model into Unsloth's faster inference mode (applied in place)
FastLanguageModel.for_inference(model)

# build a standard chat history
messages = [
    {"role": "system", "content": instruction},
    {"role": "user",   "content": "What is the purpose of the Nmfaf_3daDataManagement_Deconfigure service operation?"},  # replace with any test question to see how the model performs
]

# render it with the Mistral chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# tokenize + move to device
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    padding=True,
    truncation=True
).to("cuda")

# generate
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    num_return_sequences=1,
)

# decode & strip off everything up through the assistant marker
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# keep only the text after the closing [/INST] tag, i.e. the assistant's reply
result = text.split("[/INST]")[-1].strip()
print(result)


In [None]:
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
outputs = model.generate(input_ids = inputs.input_ids, attention_mask = inputs.attention_mask,
                   streamer = text_streamer, max_new_tokens = 128)

## **Section 11: Saving and Uploading the Fine-Tuned Model to Hugging Face Hub**

Once training and testing are complete, the model and tokenizer are **saved locally** and then **pushed to the Hugging Face Hub** for easy retraining, deployment, or sharing.

In [None]:
# Save the fine-tuned weights and tokenizer locally.
# Because the model is wrapped with PEFT/LoRA, save_pretrained writes the
# adapter weights and adapter config, not a full copy of the base model.
checkpoint = "EYEDOL/MISTRAL7B_ON_TELE"
model.save_pretrained(checkpoint)
tokenizer.save_pretrained(checkpoint)     # writes the tokenizer files

# Push the same files to the Hub. `token=True` reuses the token stored by
# login(); it replaces the deprecated `use_auth_token` argument.
model.push_to_hub(checkpoint, token=True)
tokenizer.push_to_hub(checkpoint, token=True)


In [None]:
#from unsloth import FastLanguageModel
#from transformers import AutoTokenizer

#model_d, tokenizer = FastLanguageModel.from_pretrained(
#    "EYEDOL/MISTRAL7B_ON_ALPACA1",
#    load_in_4bit=False,
#    use_gradient_checkpointing=True
#)