## Training Pipeline

Here, I fine-tune the Qwen 2.5 model, chosen for the following reasons:

* License: Some model licenses only allow non-commercial use, which is a problem if we want to fine-tune for a company.

* Budget: Models with fewer parameters (<10B) are much cheaper to fine-tune and deploy for inference than larger models, because they can run on cheaper GPUs and process more tokens per second.

* Performance: Qwen performs well on coding, maths, and general tasks.

* Multilingual capabilities: Qwen 2.5 supports over 29 languages.
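
To make the budget point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone need at different precisions. It assumes a rounded 7 billion parameters and ignores activations, optimizer state, the KV cache, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumption: rounded parameter count of a 7B model

for label, bits in [("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, 16-bit weights take about 14 GB, while 4-bit weights take about 3.5 GB, which is why 4-bit loading leaves headroom on a single 16 GB-class GPU.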

I will fine-tune with the Unsloth library (which speeds up training by 2-5x). Unsloth builds on TRL, a Hugging Face library for training LLMs with SFT and preference alignment.

In [None]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install --no-deps {xformers} trl peft accelerate bitsandbytes triton

In [None]:
# Quote the requirement so the shell does not treat ">=" as an output redirect
!pip install -q "comet-ml>=3.43.2"

In [None]:
from google.colab import userdata

In [None]:
HF_TOKEN = userdata.get("HF_TOKEN")
COMET_API_KEY = userdata.get("COMET_API_KEY")

In [None]:
import os

In [None]:
os.environ["COMET_API_KEY"] = COMET_API_KEY

In [None]:
import comet_ml
import torch
import unsloth
from trl import SFTTrainer
from datasets import load_dataset, concatenate_datasets
from transformers import TrainingArguments, TextStreamer
from unsloth import FastLanguageModel, is_bfloat16_supported


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-7B",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

==((====))==  Unsloth 2025.3.19: Fast Qwen2 patching. Transformers: 4.50.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    lora_alpha = 32,
    lora_dropout = 0,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],
)

Unsloth 2025.3.19 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
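
As a sanity check on the LoRA configuration above, the number of trainable parameters can be worked out by hand: each adapted linear layer gains two low-rank matrices, `A` (`r x in_features`) and `B` (`out_features x r`). The sketch below assumes the Qwen2.5-7B layer dimensions from the published model config (they are not printed anywhere in this notebook) and standard LoRA scaling of `lora_alpha / r`.

```python
# Assumed Qwen2.5-7B dimensions (from the published model config, not from this notebook)
HIDDEN = 3584          # hidden size
INTERMEDIATE = 18944   # MLP intermediate size
KV_DIM = 512           # 4 key/value heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 28

R = 64           # LoRA rank used above
LORA_ALPHA = 32  # LoRA alpha used above

# (in_features, out_features) for every targeted projection in one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update in standard LoRA

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling factor: {scaling}")
```

With these assumed dimensions the total matches the trainable-parameter count that the Unsloth training banner reports further down. Note that `lora_alpha = 32` with `r = 64` scales the adapter update by 0.5.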


In [None]:
cyber_instruct_dataset = load_dataset("Tiamz/cybersecurity-instruction-dataset")

In [None]:
train_ds = cyber_instruct_dataset['train']
eval_ds = cyber_instruct_dataset['eval']

In [None]:
train_ds

Dataset({
    features: ['instruction', 'answer'],
    num_rows: 12190
})

In [None]:
eval_ds

Dataset({
    features: ['instruction', 'answer'],
    num_rows: 1000
})

In [None]:
alpaca_template = """Below is an instruction that describes a task.
                    Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    outputs = examples["answer"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_template.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }


train_ds = train_ds.map(formatting_prompts_func, batched = True)
eval_ds = eval_ds.map(formatting_prompts_func, batched = True)




In [None]:
eval_ds[0]

{'instruction': 'What is the main work product of a security assessment addressed to management?',
 'answer': 'Assessment Report',
 'text': 'Below is an instruction that describes a task.\n                    Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the main work product of a security assessment addressed to management?\n\n### Response:\nAssessment Report<|endoftext|>'}

In [None]:
training_args = TrainingArguments(
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size: 2 x 8 = 16
    num_train_epochs=3,  # overridden by max_steps below
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=1,
    optim="adamw_8bit",
    weight_decay=0.01,
    warmup_steps=10,
    output_dir="output",
    report_to="comet_ml",
    seed=3407,
    max_steps=60,  # a positive max_steps takes precedence over num_train_epochs
    evaluation_strategy="steps",  # recent transformers releases name this eval_strategy
    eval_steps=5,
)
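
To see what these settings imply, the sketch below works through the arithmetic: the effective batch size, how much of the dataset 60 steps actually covers, and the learning rate at a given step under linear warmup followed by linear decay. The dataset size and GPU count are taken from the outputs in this notebook. The rounding of steps per epoch and the exact schedule shape are my assumptions about how the trainer behaves, not something this notebook verifies.

```python
import math

NUM_EXAMPLES = 12_190   # rows in train_ds
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
NUM_GPUS = 1            # single Tesla T4 in this run
MAX_STEPS = 60
WARMUP_STEPS = 10
PEAK_LR = 2e-4

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
# Assumption: the final partial batch still counts as a step
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
examples_seen = effective_batch * MAX_STEPS
epoch_fraction = examples_seen / NUM_EXAMPLES


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / max(1, MAX_STEPS - WARMUP_STEPS)


print(f"effective batch size: {effective_batch}")
print(f"steps for one full epoch: {steps_per_epoch}")
print(f"examples seen in {MAX_STEPS} steps: {examples_seen} ({epoch_fraction:.1%} of one epoch)")
print(f"learning rate at step 35: {linear_lr(35):.1e}")
```

The takeaway is that this run is a short smoke test: 60 steps cover well under one epoch, so `num_train_epochs=3` never comes into play. Removing `max_steps` would train for the full three epochs.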



In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # packing is disabled here; packing=True can speed up training on short sequences like our cybersecurity answers
    args=training_args
)

trainer.train()

comet_ml.get_global_experiment().end()




==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 12,190 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 161,480,704/7,000,000,000 (2.31% trained)
COMET INFO: An experiment with the same configuration options is already running and will be reused.


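| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 5    | 1.3548        | 1.149718        |
| 10   | 1.1553        | 1.162394        |
| 15   | 1.0747        | 1.164142        |
| 20   | 1.2194        | 1.168061        |
| 25   | 1.211         | 1.169112        |
| 30   | 1.2208        | 1.148807        |
| 35   | 1.1697        | 1.142604        |
| 40   | 1.1749        | 1.151671        |
| 45   | 1.273         | 1.144813        |
| 50   | 1.1947        | 1.138851        |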
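
One way to read these numbers: assuming the reported validation loss is the mean per-token cross-entropy in nats, `exp(loss)` gives the perplexity. The sketch below copies the validation losses from the table above, converts them, and picks the step with the lowest loss.

```python
import math

# Validation loss per evaluation step, copied from the table above
eval_loss = {
    5: 1.149718,
    10: 1.162394,
    15: 1.164142,
    20: 1.168061,
    25: 1.169112,
    30: 1.148807,
    35: 1.142604,
    40: 1.151671,
    45: 1.144813,
    50: 1.138851,
}

# Assumption: the loss is mean token-level cross-entropy in nats
perplexity = {step: math.exp(loss) for step, loss in eval_loss.items()}
best_step = min(eval_loss, key=eval_loss.get)

for step, ppl in perplexity.items():
    marker = "  <- lowest validation loss" if step == best_step else ""
    print(f"step {step:>2}: loss={eval_loss[step]:.4f}  perplexity={ppl:.3f}{marker}")
```

The validation loss moves within a narrow band (roughly 1.14 to 1.17) over these steps, which is consistent with a run too short to show a clear trend.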


COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     name                  : yummy_fish_1405
COMET INFO:     url                   : https://www.comet.com/aboomardiyyah/general/ca6e44b9f6c94dd4bf3fd3e704972c02
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     eval/loss [15]                 : (1.1377638578414917, 1.7482560873031616)
COMET INFO:     eval/runtime [15]              : (111.5359, 138.8294)
COMET INFO:     eval/samples_per_second [15]   : (7.203, 8.966)
COMET INFO:     eval/steps_per_second [15]     : (3.602, 4.483)

In [None]:
instruction = alpaca_template.format(
    "What is the purpose of periodic reviews of user accounts and access privileges?.", ""
)

In [None]:
FastLanguageModel.for_inference(model)
inputs = tokenizer([instruction], return_tensors="pt").to("cuda")

text_streamer = TextStreamer(tokenizer)

_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=256,
    use_cache=True
)

Below is an instruction that describes a task.
                    Write a response that appropriately completes the request.

### Instruction:
What is the purpose of periodic reviews of user accounts and access privileges?.

### Response:
To ensure that access privileges are appropriate and necessary<|endoftext|>
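
Because the streamer echoes the whole prompt, downstream code usually wants only the answer. A minimal sketch of one way to do that, assuming the Alpaca-style `### Response:` marker used in this notebook and the `<|endoftext|>` EOS token seen in the output above (`extract_response` is a hypothetical helper, not part of Unsloth or Transformers):

```python
RESPONSE_MARKER = "### Response:"
EOS_TOKEN = "<|endoftext|>"  # EOS token as it appears in the generated text above


def extract_response(generated_text: str, eos_token: str = EOS_TOKEN) -> str:
    """Return the text after the response marker, cut at the first EOS token.

    Returns an empty string if the marker is missing.
    """
    _, _, answer = generated_text.partition(RESPONSE_MARKER)
    answer = answer.split(eos_token, 1)[0]
    return answer.strip()


sample = (
    "Below is an instruction that describes a task.\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "What is the purpose of periodic reviews of user accounts and access privileges?.\n\n"
    "### Response:\n"
    "To ensure that access privileges are appropriate and necessary<|endoftext|>"
)

print(extract_response(sample))
```

In practice the string passed in would come from decoding the tensor returned by `model.generate`, rather than from a hard-coded sample.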


In [None]:
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")

model.push_to_hub_merged("Tiamz/CyberQwen2.5-7B", tokenizer, save_method="merged_16bit")

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 7.5G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 4.74 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 14%|█▍        | 4/28 [00:00<00:02, 10.51it/s]
We will save to Disk and not RAM now.
100%|██████████| 28/28 [01:17<00:00,  2.77s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model/pytorch_model-00001-of-00004.bin...
Unsloth: Saving model/pytorch_model-00002-of-00004.bin...
Unsloth: Saving model/pytorch_model-00003-of-00004.bin...
Unsloth: Saving model/pytorch_model-00004-of-00004.bin...
Done.


Unsloth: You are pushing to hub, but you passed your HF username = Tiamz.
We shall truncate Tiamz/CyberQwen2.5-7B to CyberQwen2.5-7B


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 4.74 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [01:05<00:00,  2.32s/it]


Unsloth: Saving tokenizer...

  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

 Done.
Unsloth: Saving CyberQwen2.5-7B/pytorch_model-00001-of-00004.bin...
Unsloth: Saving CyberQwen2.5-7B/pytorch_model-00002-of-00004.bin...
Unsloth: Saving CyberQwen2.5-7B/pytorch_model-00003-of-00004.bin...
Unsloth: Saving CyberQwen2.5-7B/pytorch_model-00004-of-00004.bin...


README.md:   0%|          | 0.00/588 [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?it/s]

pytorch_model-00003-of-00004.bin:   0%|          | 0.00/4.33G [00:00<?, ?B/s]

pytorch_model-00001-of-00004.bin:   0%|          | 0.00/4.88G [00:00<?, ?B/s]

pytorch_model-00004-of-00004.bin:   0%|          | 0.00/1.09G [00:00<?, ?B/s]

pytorch_model-00002-of-00004.bin:   0%|          | 0.00/4.93G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/Tiamz/CyberQwen2.5-7B
