#### 1. Install required libraries
#### 2. Loading the Pre-Trained model and Tokenization
#### 3. Loading the dataset
#### 4. Create Bitsandbytes configuration
#### 5. Test the Model with Zero Shot Inferencing
#### 8. Preparing the model for QLoRA
#### 9. Setup PEFT for Fine-Tuning
#### 10. Train PEFT Adapter
#### 11. Evaluate the Model Qualitatively (Human Evaluation)
#### 12. Evaluate the Model Quantitatively (with ROUGE Metric)

### 1. Install required libraries

In [None]:
!pip install -q accelerate peft bitsandbytes transformers trl

Let’s understand the importance of some of these libraries.

**accelerate:** A library that runs the same PyTorch code on different hardware setups (a single GPU, multiple GPUs, TPUs) with minimal code changes, and handles device placement. Transformers relies on it for `device_map="auto"`, which is used below when loading the model.

**peft (Parameter-Efficient Fine-Tuning):** A library for fine-tuning large language models by training only a small number of parameters (for example, LoRA's low-rank adapter matrices) while the original weights stay frozen. This sharply reduces the memory and compute needed for fine-tuning.

**bitsandbytes:** A library of 8-bit optimizers and 8-bit/4-bit quantization routines that reduce the memory footprint of large models, so they can be trained and run on limited hardware. It provides the 4-bit NF4 quantization used for QLoRA in this notebook.

**transformers:** Hugging Face's core library for state-of-the-art natural language processing (NLP) models, providing easy-to-use implementations of various models like BERT, GPT, T5, etc., along with pre-trained versions for a wide range of tasks.

**trl (Transformer Reinforcement Learning):** A Hugging Face library for post-training transformer language models. Besides preference-alignment methods such as PPO and DPO, it provides `SFTTrainer`, the supervised fine-tuning trainer used in this notebook.
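Several of these libraries, trl and transformers in particular, have renamed or moved arguments between releases (the `SFTTrainer` deprecation warning later in this notebook is one example), so it can help to record which versions a run actually used. A minimal sketch using only the standard library's `importlib.metadata`; the package list simply mirrors the `pip install` cell above:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names installed by the pip cell above.
PACKAGES = ["accelerate", "peft", "bitsandbytes", "transformers", "trl"]


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:>14}: {ver or 'not installed'}")
```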

In [None]:
from huggingface_hub import notebook_login
notebook_login()


#### Loading the required libraries

In [None]:
# Importing dependencies
import torch  # PyTorch library for tensor operations, deep learning, and GPU support
from datasets import load_dataset, Dataset
# 'load_dataset' is used to load predefined datasets from the Hugging Face library,
# and 'Dataset' allows the creation and manipulation of custom datasets

from peft import LoraConfig, AutoPeftModelForCausalLM
# 'LoraConfig' is a configuration class for setting up Low-Rank Adaptation (LoRA),
# a technique that freezes the base weights and trains small low-rank update matrices added to selected layers.
# 'AutoPeftModelForCausalLM' automatically loads a causal language model that supports parameter-efficient fine-tuning like LoRA.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
# 'AutoModelForCausalLM' loads a pre-trained language model for causal language modeling tasks (e.g., autoregressive text generation).
# 'AutoTokenizer' is used to load a tokenizer that processes raw text into tokens the model can understand.
# 'BitsAndBytesConfig' is a configuration class to apply memory-efficient model loading techniques like 4-bit quantization.
# 'TrainingArguments' defines various parameters for training the model, such as learning rate, batch size, and output directories.

from trl import SFTTrainer
# 'SFTTrainer' is a trainer class from the 'trl' library used for Supervised Fine-Tuning (SFT),
# often applied to fine-tune large language models, potentially in conjunction with reinforcement learning.

import os  # Python's built-in module to interact with the operating system,
# used to manage files, directories, and environment variables.

import pandas as pd

In [None]:
# Configuring the quantization parameters for loading a model in 4-bit precision using the bitsandbytes library
bnb_config = BitsAndBytesConfig(

    # Load the model using 4-bit quantization
    load_in_4bit=True,  # Enables 4-bit quantization to reduce memory usage by storing model weights in 4-bit precision

    # Specify the type of 4-bit quantization to use. Options are 'fp4' (plain 4-bit float) and 'nf4' (4-bit NormalFloat)
    bnb_4bit_quant_type="nf4",  # NF4 is designed for normally distributed weights and preserves accuracy better than FP4

    # Data type used for computation: 4-bit weights are dequantized to this dtype for the matrix multiplications ('float16', 'bfloat16' or 'float32')
    bnb_4bit_compute_dtype="float16",  # float16 works on older GPUs such as the T4; bfloat16 generally requires an Ampere-or-newer GPU

    # Enable double quantization: the per-block quantization constants (scales) are themselves quantized
    bnb_4bit_use_double_quant=True  # Saves about 0.37 bits per parameter in the QLoRA paper's setup
)

#### **Explanation:**
- **`load_in_4bit=True`:** Loads the model weights in 4-bit precision, cutting weight memory to roughly a quarter of float16, which is what lets an 8B model fit on a single modest GPU.
- **`bnb_4bit_quant_type="nf4"`:** Uses the 4-bit NormalFloat (NF4) data type, whose quantization levels are spaced for normally distributed weights and retain precision better than plain FP4.
- **`bnb_4bit_compute_dtype="float16"`:** Dequantizes the weights to float16 for the actual matrix multiplications, which is fast on GPUs.
- **`bnb_4bit_use_double_quant=True`:** Quantizes the quantization constants (the per-block scales) a second time, giving a further, smaller memory saving. The weights themselves are still quantized only once, to 4 bits.
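A back-of-the-envelope sketch of what these settings save for the weights alone. It assumes roughly 8 billion parameters, the blockwise scheme described in the QLoRA paper (one 32-bit scale per block of 64 weights; with double quantization, those scales are stored in 8 bits with one 32-bit scale per 256 of them), and that every weight is quantized. In practice bitsandbytes leaves some modules, such as the output head, in higher precision, so real usage is somewhat higher:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in GB (1e9 bytes); excludes activations,
    KV cache, LoRA parameters and optimizer state."""
    return n_params * bits_per_param / 8 / 1e9


def quant_constant_bits(block_size: int = 64, double_quant: bool = False,
                        second_block_size: int = 256) -> float:
    """Extra bits per weight spent on blockwise quantization constants (QLoRA-style scheme)."""
    if not double_quant:
        # One fp32 absmax scale per block of `block_size` weights.
        return 32 / block_size
    # Scales stored in 8 bits, plus one fp32 scale per `second_block_size` scales.
    return 8 / block_size + 32 / (block_size * second_block_size)


N_PARAMS = 8e9  # Assumed: roughly 8 billion parameters

fp16 = weight_memory_gb(N_PARAMS, 16)
nf4 = weight_memory_gb(N_PARAMS, 4 + quant_constant_bits())
nf4_dq = weight_memory_gb(N_PARAMS, 4 + quant_constant_bits(double_quant=True))

print(f"float16:            {fp16:.2f} GB")    # 16.00 GB
print(f"NF4:                {nf4:.2f} GB")     # 4.50 GB
print(f"NF4 + double quant: {nf4_dq:.2f} GB")  # 4.13 GB
```

Double quantization shrinks the constant overhead from 0.5 to about 0.127 bits per weight, which is where the "about 0.37 bits per parameter" figure comes from.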


### 2. Loading the Pre-Trained Model and Tokenizer

In [None]:
# Set the model ID
model_id="meta-llama/Meta-Llama-3.1-8B"
output_model = "CyberDost1.2"

In [None]:
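def get_model_and_tokenizer(model_id):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Llama tokenizers ship without a padding token; reuse EOS so batches can be padded
    tokenizer.pad_token = tokenizer.eos_token

    # Load the weights in 4-bit using bnb_config (defined above), placing layers on the available devices
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    # The KV cache only speeds up generation and conflicts with gradient checkpointing, so disable it for training
    model.config.use_cache = False
    # pretraining_tp > 1 emulates tensor-parallel slicing of linear layers (slower); 1 uses the standard path
    model.config.pretraining_tp = 1
    return model, tokenizer


model, tokenizer = get_model_and_tokenizer(model_id)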


In [None]:
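def formatted_train(input, response) -> str:
    # ChatML-style template. Llama 3.1's base tokenizer has no special tokens for these
    # markers, so they are tokenized as ordinary text.
    return f"<|im_start|>user\n{input}<|im_end|>\n<|im_start|>assistant\n{response}<|im_end|>\n"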
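The template matters again at inference time: the prompt passed to `generate()` should reproduce the training text exactly up to the point where the answer begins. A minimal sketch that checks this, using hypothetical helper names (`format_train_example`, `format_inference_prompt`) that mirror the ChatML-style template above:

```python
# Hypothetical helpers mirroring the training template above.
def format_train_example(question: str, answer: str) -> str:
    """Full training string: user turn followed by the assistant's answer."""
    return (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n{answer}<|im_end|>\n"
    )


def format_inference_prompt(question: str) -> str:
    """Generation prompt: identical to the training text up to the answer."""
    return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"


question = "What is phishing?"
train_text = format_train_example(
    question, "A social-engineering attack that tricks people into revealing credentials."
)
prompt = format_inference_prompt(question)

# The model only ever saw "assistant\n" right before an answer, so the
# inference prompt should be an exact prefix of the training text.
print(train_text.startswith(prompt))  # True
```

A prompt ending in `assistant:` instead of `assistant\n` would not be a prefix, so the model is asked to continue text it never saw during training.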

### 3. Loading and Preprocessing the Dataset

In [None]:
from google.colab import drive
drive.mount("/content/drive/")



In [None]:
import pandas as pd
from datasets import Dataset  # Ensure the `datasets` library is installed
from sklearn.model_selection import train_test_split


def prepare_train_validate_data(csv_file, test_size=0.2, random_state=42):
    # Try UTF-8 first and fall back to ISO-8859-1 (Latin-1) if decoding fails
    try:
        data_df = pd.read_csv(csv_file, encoding="utf-8")
    except UnicodeDecodeError:
        data_df = pd.read_csv(csv_file, encoding="ISO-8859-1")

    # Check the required columns before using them (otherwise a KeyError is raised first)
    if "Question" not in data_df.columns or "Answer" not in data_df.columns:
        raise ValueError("The CSV file must contain 'Question' and 'Answer' columns.")

    # Replace missing values with empty strings and force text, so string formatting cannot fail
    data_df[["Question", "Answer"]] = data_df[["Question", "Answer"]].fillna("").astype(str)

    # Build the "text" column with the same template as formatted_train (no space before <|im_end|>)
    data_df["text"] = data_df.apply(
        lambda row: (
            f"<|im_start|>user\n{row['Question']}<|im_end|>\n"
            f"<|im_start|>assistant\n{row['Answer']}<|im_end|>\n"
        ),
        axis=1,
    )

    # Split the dataset into training and validation sets
    train_df, val_df = train_test_split(data_df, test_size=test_size, random_state=random_state)

    # preserve_index=False keeps the shuffled pandas index from becoming an extra "__index_level_0__" column
    train_dataset = Dataset.from_pandas(train_df, preserve_index=False)
    val_dataset = Dataset.from_pandas(val_df, preserve_index=False)

    return train_dataset, val_dataset


train_dataset, val_dataset = prepare_train_validate_data(
    "/content/drive/MyDrive/Final_Projects/CyberSecurityDatasetQA.csv", test_size=0.2, random_state=42
)
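With a float `test_size`, scikit-learn's `train_test_split` rounds the validation count up and gives the remainder to training. A small sketch of that arithmetic; `split_sizes` is a hypothetical helper and the 1,137-row count is purely illustrative:

```python
import math


def split_sizes(n_samples: int, test_size: float = 0.2) -> tuple:
    """(n_train, n_val) for a float test_size, mirroring scikit-learn's rounding:
    the validation count is rounded up and training gets the rest."""
    n_val = math.ceil(test_size * n_samples)
    return n_samples - n_val, n_val


print(split_sizes(1137))  # (909, 228)
```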

In [None]:
import torch
import os

# Ask PyTorch's CUDA caching allocator to use expandable segments to reduce fragmentation.
# Note: the allocator generally reads this setting when it initializes, so it only takes
# effect if set before the first CUDA allocation (i.e. before the model is loaded).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# LoRA config (reduced rank and alpha to save memory)
peft_config = LoraConfig(
    r=4,  # Rank of the low-rank update matrices
    lora_alpha=8,  # Updates are scaled by lora_alpha / r = 2
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Training arguments with a small per-device batch; pushing to the Hub is enabled
training_arguments = TrainingArguments(
    output_dir=output_model,
    per_device_train_batch_size=2,  # Reduced batch size
    gradient_accumulation_steps=32,  # Effective batch size = 2 x 32 = 64
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    save_strategy="epoch",
    logging_steps=10,
    num_train_epochs=3,  # Train by epochs; max_steps stays at its default (-1)
    fp16=True,  # Mixed precision training
    push_to_hub=True  # Upload checkpoints and the final adapter to the Hugging Face Hub
)

# Trainer with reduced sequence length
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,  # Training split created above
    peft_config=peft_config,
    dataset_text_field="text",
    args=training_arguments,
    tokenizer=tokenizer,
    packing=False,
    max_seq_length=512  # Reduced max sequence length
)

# Enable gradient checkpointing to reduce activation memory
model.gradient_checkpointing_enable()

# Train once. A loop over range(training_arguments.max_steps) never runs, because
# max_steps defaults to -1; calling train() repeatedly would also rerun the full schedule.
trainer.train()
torch.cuda.empty_cache()  # Release cached, unused GPU memory after training
trainer.push_to_hub()


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.



CommitInfo(commit_url='https://huggingface.co/Bakhshial/CyberDost1.2/commit/8d8a2f3bb565f616f2bd03bcdf05664c63108184', commit_message='End of training', commit_description='', oid='8d8a2f3bb565f616f2bd03bcdf05664c63108184', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Bakhshial/CyberDost1.2', endpoint='https://huggingface.co', repo_type='model', repo_id='Bakhshial/CyberDost1.2'), pr_revision=None, pr_num=None)
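Two quick sanity checks on the configuration above. First, how many parameters LoRA actually trains: with `target_modules` unset, PEFT falls back to a per-architecture default, which for Llama models is the `q_proj` and `v_proj` attention projections. Second, the effective batch size, which both training configurations in this notebook keep at 64. The layer shapes below are assumptions taken from the public Llama 3.1 8B config:

```python
# Assumed Llama 3.1 8B shapes: 32 decoder layers, hidden size 4096,
# 8 key/value heads of dimension 128 (so v_proj outputs 1024 features).
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 8 * 128

# Assumed LoRA targets (PEFT's Llama default): name -> (d_in, d_out)
TARGETS = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}


def lora_params(rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each targeted Linear layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return NUM_LAYERS * per_layer


def effective_batch(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


r4 = lora_params(4)
print(f"trainable LoRA params at r=4: {r4:,}")        # 1,703,936
print(f"as fp32 on disk: ~{r4 * 4 / 1e6:.1f} MB")      # ~6.8 MB
print(effective_batch(2, 32), effective_batch(1, 64))  # 64 64
```

That is on the order of 0.02% of the roughly 8 billion base parameters. The real count for a given run can be confirmed with `trainer.model.print_trainable_parameters()` on the PEFT-wrapped model.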

In [None]:
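training_arguments = TrainingArguments(
    output_dir=output_model,
    per_device_train_batch_size=1,  # Reduce batch size
    gradient_accumulation_steps=64,  # Effective batch size stays 1 x 64 = 64
    optim="adamw_torch",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    save_strategy="epoch",
    logging_steps=10,
    evaluation_strategy="steps",  # Renamed to eval_strategy in newer transformers releases
    eval_steps=50,
    num_train_epochs=3,
    fp16=True,  # Mixed precision
    push_to_hub=False
)

# Release cached but unused GPU memory (tensors still referenced, e.g. by the previous trainer, are not freed)
torch.cuda.empty_cache()

# Gradient checkpointing
model.gradient_checkpointing_enable()

# Initialize Trainer.
# Note: the previous SFTTrainer call already injected LoRA layers into `model` in place;
# restarting the runtime and reloading the base model gives a clean starting point.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    args=training_arguments,
    tokenizer=tokenizer,
    packing=False,
    max_seq_length=512
)

try:
    trainer.train()
except RuntimeError as e:  # CUDA OOM (torch.cuda.OutOfMemoryError) is a RuntimeError subclass
    print(f"RuntimeError during training: {e}")
    torch.cuda.empty_cache()
    raise  # Re-raise so later cells do not silently save an untrained adapter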


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.





RuntimeError during training: CUDA out of memory. Tried to allocate 1002.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 195.06 MiB is free. Process 37209 has 14.55 GiB memory in use. Of the allocated memory 13.96 GiB is allocated by PyTorch, and 474.78 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
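For a model with a large vocabulary, the logits tensor (batch × sequence length × vocabulary) is often one of the largest single allocations, so it is a useful first thing to estimate when an out-of-memory error like the one above appears. A minimal sketch, assuming Llama 3.1's 128,256-token vocabulary; `tensor_mib` is a hypothetical helper, and the error message alone does not say which tensor was being allocated:

```python
def tensor_mib(*shape: int, bytes_per_elem: int) -> float:
    """Size of a dense tensor in MiB."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem / 2**20


VOCAB = 128_256  # Llama 3 / 3.1 tokenizer vocabulary size

# Logits have shape (batch, seq_len, vocab); memory grows linearly with each factor.
print(tensor_mib(1, 512, VOCAB, bytes_per_elem=4))  # 250.5  (batch 1, fp32)
print(tensor_mib(8, 512, VOCAB, bytes_per_elem=2))  # 1002.0 (batch 8, fp16)
```

For scale, eight 512-token sequences already need about 1 GiB of fp16 logits. The main levers are the per-device batch sizes (note that `per_device_eval_batch_size` defaults to 8) and `max_seq_length`.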


In [None]:
# Save the model locally
trainer.save_model("/content/drive/MyDrive/Project_work/cyberdost1.1")  # Specify the directory where the model will be saved

# Optionally, save the tokenizer and other resources
tokenizer.save_pretrained("/content/drive/MyDrive/Project_work/cyberdost1.1")

No files have been modified since last commit. Skipping to prevent empty commit.


('/content/drive/MyDrive/Project_work/cyberdost1.1/tokenizer_config.json',
 '/content/drive/MyDrive/Project_work/cyberdost1.1/special_tokens_map.json',
 '/content/drive/MyDrive/Project_work/cyberdost1.1/tokenizer.json')

In [None]:
!pip install -q accelerate peft bitsandbytes transformers trl

In [None]:
# Importing dependencies
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer
import os

In [None]:
#model_id = "/content/drive/MyDrive/Project_work/cyberdost1.1"
model_id='Bakhshial/CyberDost1.1'
# from peft import PeftModel, PeftConfig
# from transformers import AutoModelForCausalLM

# config = PeftConfig.from_pretrained("Bakhshial/CyberDost1.1")
# base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
# model = PeftModel.from_pretrained(base_model, "Bakhshial/CyberDost1.1")

In [None]:
def get_model_and_tokenizer(model_id):
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  tokenizer.pad_token = tokenizer.eos_token
  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype="float16", bnb_4bit_use_double_quant=True
  )
  model = AutoModelForCausalLM.from_pretrained(
      model_id, quantization_config=bnb_config, device_map="auto"
  )
  model.config.use_cache=False
  model.config.pretraining_tp=1
  return model, tokenizer

In [None]:
model, tokenizer = get_model_and_tokenizer(model_id)

In [None]:
from transformers import GenerationConfig
from time import perf_counter


def generate_response(user_input):
    # formatted_prompt is defined in the next cell; run it before calling this function
    prompt = formatted_prompt(user_input)
    # Sampling settings. penalty_alpha was dropped: it only applies to contrastive
    # search, which requires do_sample=False, so it had no effect here.
    generation_config = GenerationConfig(
        do_sample=True,
        top_k=5,
        temperature=0.5,
        repetition_penalty=1.2,
        max_new_tokens=60,  # Short limit; longer answers are cut off
        pad_token_id=tokenizer.eos_token_id,
        use_cache=True,  # Use the KV cache for generation (it was disabled for training)
    )
    start_time = perf_counter()
    # Tokenize once and move the inputs to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, generation_config=generation_config)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    output_time = perf_counter() - start_time
    print(f"Time taken for inference: {round(output_time, 2)} seconds")

In [None]:
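def formatted_prompt(question) -> str:
    # Must match the training template up to the answer: "assistant" followed by a newline, not a colon
    return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"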
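Because `skip_special_tokens=True` does not remove the ChatML-style markers (they are plain text to the Llama 3.1 tokenizer), the decoded output contains the whole prompt. A minimal post-processing sketch that keeps only the assistant's reply; `extract_reply` is a hypothetical helper that assumes the prompt contains exactly one assistant marker:

```python
# Hypothetical post-processing helper for the ChatML-style template used here.
ASSISTANT_TAG = "<|im_start|>assistant"
END_TAG = "<|im_end|>"


def extract_reply(decoded: str) -> str:
    """Return only the assistant's answer from a decoded prompt + completion."""
    # The prompt has a single assistant marker, so the reply starts after its first occurrence.
    _, sep, reply = decoded.partition(ASSISTANT_TAG)
    if not sep:
        return decoded.strip()
    # Tolerate the older "assistant:" prompt variant, then stop at the first end marker.
    reply = reply.lstrip(":")
    reply = reply.split(END_TAG, 1)[0]
    return reply.strip()


sample = (
    "<|im_start|>user\nWhat is a firewall?<|im_end|>\n"
    "<|im_start|>assistant\nA network security control.<|im_end|>\n<|im_start|>user\nmore"
)
print(extract_reply(sample))  # A network security control.
```

An alternative that avoids string parsing is to decode only the newly generated tokens, e.g. `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.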

In [None]:
generate_response(user_input='What is a firewall in cybersecurity?')

<|im_start|>user
What is a firewall in cybersecurity?<|im_end|>
<|im_start|>assistant: What is a firewall in cybersecurity?
A firewall is a network security device that monitors incoming and outgoing network traffic and decides whether to allow or block the data flow based on an applied set of security rules.
Firewalls are often categorized as either “stateful” or “stateless.” Stateful firewalls
Time taken for inference: 6.02 seconds


In [None]:
generate_response(user_input='what is phishing email?')