<a href="https://colab.research.google.com/github/MohamedElquesni/Large-Language-Models-for-Cybersecurity/blob/main/LLm's_for_Cybersecurity_Tasks_(Thesis).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧠 Fine-Tuning LLMs for Cybersecurity Tasks using Unsloth

This notebook demonstrates how to fine-tune various decoder-based LLMs (e.g., `Mistral`, `DeepSeek`, etc.) using the [Unsloth](https://github.com/unslothai/unsloth) library with QLoRA (4-bit) for efficient, low-resource training.

We'll apply this to a range of cybersecurity tasks, including:

- 📧 Phishing Email Detection
- 📄 Log Anomaly Detection
- 🧠 Threat Intelligence Extraction
- 🚨 Automated Incident Response
- 🔍 Threat Hunting Reasoning

We'll use:
- 🦥 **Unsloth** for resource-efficient LLM fine-tuning
- 💾 Optional GGUF export for deployment with Ollama or llama.cpp
- ⚙️ Modular architecture to support different LLMs per task (e.g., Mistral for phishing, DeepSeek for log analysis, etc.)

> 🛠️ Note: You can set the model of choice per domain by configuring the `MODEL_NAME` and `DOMAIN` in the setup section.


# I - 📧 Email Phishing Detection – Mistral-7B Prompt Engineering & Inference

This section focuses on using `Mistral-7B` to detect phishing emails through **prompt engineering**, without any fine-tuning or training.

The model is prompted to return **structured JSON responses** with the following fields:
- `"Is_Phishing"`: Whether the email is phishing (true/false)
- `"Risk"`: Risk level – High, Medium, or Low
- `"Suspicious_Links"`: Any suspicious links present
- `"Social_Engineering_Elements"`: Techniques like urgency, fear, enticement, impersonation, etc.
- `"Actions"`: Recommended next steps (e.g., delete, report, ignore)
- `"Reason"`: Brief explanation for the decision

This is a **zero-shot / few-shot inference setup**, designed to verify the model’s phishing detection performance using crafted prompts.

🔐 No fine-tuning, training, or model weight modification is done. The focus is strictly on prompt engineering and evaluation through inference.


## 🧱 Section 1 – Complete Setup (Install, Load, Patch, Test)

This section sets up the environment for **prompt-based inference** using the `Mistral-7B` model.

It includes:

1. Installing required dependencies
2. Loading the `Mistral-7B` model (4-bit or other optimized format)
3. Patching or configuring inference settings (if needed)
4. Testing the model with a basic phishing detection prompt

✅ Run this once per session to initialize the environment for **prompt engineering and phishing detection inference**.

⚠️ No fine-tuning, LoRA, QLoRA, or training is performed in this notebook. The focus is purely on zero-shot or few-shot **inference and evaluation**.


### 📦 1.1 – Install Dependencies

We'll begin by installing the required libraries to run **Mistral-7B** for phishing detection using **prompt engineering only**.

This installs:
- `transformers` for loading the Mistral-7B model  
- `bitsandbytes` for efficient 4-bit model loading (optional)  
- `accelerate` for device handling  
- `torch` for GPU support  

No training, fine-tuning, or adapter loading is required — this setup is for **inference only**.


In [1]:
# ✅ Clean install for inference-only use (no training tools)
!pip install -q transformers accelerate bitsandbytes torch


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m76.1/76.1 MB[0m [31m31.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m84.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m88.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m50.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m1.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m12.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m44.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

### 📦 1.2 – Load the Mistral-7B Model for Inference

In this step, we load the `Mistral-7B-Instruct-v0.2` model from Hugging Face in 4-bit precision using the `transformers` library.

This model is used **only for inference** — no fine-tuning or LoRA adapter loading is involved.

We also configure padding tokens and device placement (CPU/GPU) for smooth operation in Google Colab.


In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# ✅ Choose the official Mistral 7B Instruct model
model_name = "unsloth/mistral-7b-instruct-v0.3-bnb-4bit"


# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model (4-bit for memory efficiency)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    load_in_4bit=True  # ✅ Optional: Uses bitsandbytes
)

# Configure tokenizer padding
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

print("✅ Mistral-7B loaded successfully and ready for inference.")


tokenizer_config.json:   0%|          | 0.00/141k [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/446 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/4.14G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/157 [00:00<?, ?B/s]

✅ Mistral-7B loaded successfully and ready for inference.


### 🧪 1.3 – Test the Model Before Training

Now that the `Mistral-Small-Instruct-2409` model is loaded and patched with LoRA, let’s test how it responds *before* any fine-tuning.

This helps you:
- ✅ Confirm the model and tokenizer are working end-to-end  
- 📊 Get a baseline (pre-trained) response to your phishing prompt  
- 🐛 Debug any issues with prompt formatting or tokenization


In [3]:
import re
import torch
from torch import inference_mode

# Sample phishing email
email_body = """From: notifications@corpfiles.com
Subject: Shared Document – Q2 Budget Final.xlsx

Hi,

John has shared a document with you:
📎 **Q2 Budget Final.xlsx**

You can access it securely here:
http://corpfiles.com/download/Q2BudgetFinal

Please review before tomorrow’s meeting.

Best regards,
Finance Department
"""

# Instruction-based phishing detection prompt
prompt = f"""### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain (e.g. fake Google Drive, Dropbox, corpfiles.net instead of corpfile.com)
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}}

### Email:
{email_body}

### Response:
"""


# 🔁 Run model inference
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.0,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# 🧠 Full decoded output
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\n🧠 Full Raw Output:\n")
print(decoded_output)

# ✅ Extract the last JSON object using regex
matches = list(re.finditer(r"\{[\s\S]+?\}", decoded_output))
if matches:
    clean_json = matches[-1].group()
    print("\n✅ Extracted JSON Only:\n")
    print(clean_json)
else:
    print("\n⚠️ Could not extract JSON block cleanly.")





🧠 Full Raw Output:

### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain (e.g. fake Google Drive, Dropbox, corpfiles.net instead of corpfile.com)
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],

## 🧪 Section 2 – Phishing Email Detection with Structured JSON Output

This section benchmarks the `Mistral-7B` model on phishing email detection using **prompt engineering only**. The model operates in a zero-shot setting to analyze email content and return structured JSON responses that are easily machine-readable—ideal for integration into automated workflows within a Security Operations Center (SOC).

The full evaluation pipeline consists of:

---

### 📥 Dataset Handling

We begin by uploading a phishing dataset and computing high-level statistics. The dataset is previewed, and a structured prompt is dynamically generated for each email sample.

---

### 🤖 Inference via Prompt Engineering

Each email is embedded into a standardized prompt that instructs the model to analyze the message and respond in a **strict JSON format**, such as:

```json
{
  "Is_Phishing": true,
  "Risk": "High",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}


### 2.1 🗂️ Dataset: `UTwente` (Binary Email Classification)

This dataset contains labeled email samples for phishing detection. Each sample includes the full email content and a binary label indicating whether the email is safe or malicious.

Each entry consists of:
- `Email Text`: the complete email body, potentially including subject lines
- `Email Type`: the true classification of the email:
  - `Phishing Email` → malicious
  - `Safe Email` → legitimate

This dataset is already preprocessed and balanced across the two classes. Once uploaded, we will:
- Map the labels to binary values (1 = phishing, 0 = safe)
- Count phishing vs. safe samples
- Preview sample emails
- Generate structured prompts for zero-shot model inference
- Compare the model's predictions to ground truth in a later step


#### 2.1.1 🗂️ Upload and Load the Phishing Dataset

In this step, we upload a phishing detection dataset containing labeled email samples for evaluation.

The dataset must include the following columns:
- **`Email Text`**: the full content of the email (subject + body)
- **`Email Type`**: ground truth labels as either:
  - `Phishing Email` → malicious
  - `Safe Email` → legitimate

Once uploaded, we will load the data into memory and inspect its structure (row count, column names) to confirm it is ready for processing.


In [None]:
### 🧠 Section 2.2.1 – Benchmarking: Phishing Detection with Structured Output (Inference-Only)

from google.colab import files
import pandas as pd
import io

# 📤 Step 1: Upload the dataset
print("📤 Please upload your phishing dataset with 'Email Text' and 'Email Type' columns.")
uploaded = files.upload()
file_name = list(uploaded.keys())[0]

# ✅ Step 2: Load dataset
df = pd.read_csv(io.BytesIO(uploaded[file_name]))
print(f"\n✅ Dataset loaded: {file_name}")
print(f"📦 Rows: {len(df)} | Columns: {df.columns.tolist()}")


#### 2.1.2 🧠 Preprocess Dataset and Generate Structured Prompts

Next, we preprocess the dataset by:
- Renaming columns for consistency
- Mapping label values to binary format (`1` = phishing, `0` = safe)

We then construct a **structured prompt** for each email. These prompts instruct the model to analyze the email and return a standardized JSON response, containing:

- `Is_Phishing`: true or false
- `Risk`: level of severity
- `Suspicious_Links`: list of flagged links
- `Social_Engineering_Elements`: manipulative techniques found
- `Actions`: recommended security steps
- `Reason`: rationale for the decision

Finally, we preview the full prompt for one randomly selected phishing email to verify correctness before running model inference.


In [None]:
import json

# ✅ Step 3: Preprocess labels and columns
df = df.rename(columns={"Email Text": "text", "Email Type": "label"})
df["label"] = df["label"].map({"Phishing Email": 1, "Safe Email": 0})

# ✅ Step 4: Generate structured prompts
def build_prompt(email_body):
    return f"""### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain (e.g. fake Google Drive, Dropbox, corpfiles.net instead of corpfile.com)
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}}

### Email:
{email_body}

### Response:"""

df["prompt"] = df["text"].apply(build_prompt)

# ✅ Step 5: Display label distribution
print(f"\n🟢 Phishing Emails: {df['label'].sum()}")
print(f"🔵 Legitimate Emails: {len(df) - df['label'].sum()}")

# 🧪 Step 6: Show full prompt for one phishing email
sample_index = df[df["label"] == 1].sample(1, random_state=42).index[0]
print("\n📌 Full Prompt Example (Phishing Email):\n")
print(df.loc[sample_index, "prompt"])


📤 Please upload your phishing dataset with 'Email Text' and 'Email Type' columns.


Saving Phishing_validation_emails.csv to Phishing_validation_emails (2).csv

✅ Dataset loaded: Phishing_validation_emails (2).csv
📦 Rows: 2000 | Columns: ['Email Text', 'Email Type']

🟢 Phishing Emails: 1000
🔵 Legitimate Emails: 1000

📌 Full Prompt Example (Phishing Email):

### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain (e.g. fake Google Drive, Dropbox, corpfiles.net instead of corpfile.com)
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives

#### 2.1.3 🤖 Run Batched Inference and Extract Structured JSON Output

In this step, we run **batched inference** using the loaded model (e.g., Mistral-7B) to process structured prompts and generate phishing detection outputs in JSON format.

The process involves:
- Tokenizing prompts with padding and truncation
- Running the model in batches using `inference_mode()` for efficiency
- Decoding the model’s raw output into readable text
- Extracting the JSON response block using a regular expression
- Parsing each JSON block into Python dictionaries for downstream analysis

Each result is appended to the DataFrame under the `model_output` column for later evaluation against ground truth labels.

> ⚠️ Note: The generation is zero-shot; the model has not been fine-tuned on phishing-specific data.


In [None]:
from torch import inference_mode
from tqdm import tqdm
import re
import json

batch_size = 16
max_prompt_length = 1024  # Optional: Lower if still slow
all_predictions = []

print(f"🔁 Running batched inference (batch size = {batch_size})...")

for i in tqdm(range(0, len(df), batch_size)):
    batch_prompts = df["prompt"].iloc[i:i+batch_size].tolist()

    # Tokenize with truncation
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True, max_length=max_prompt_length).to("cuda")

    with inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=384,
            temperature=0.0,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    # Extract JSON from each response
    for decoded in decoded_outputs:
        matches = list(re.finditer(r"\{[\s\S]+?\}", decoded))
        json_text = matches[-1].group() if matches else ""
        try:
            parsed = json.loads(json_text) if json_text else {"error": "no_json_found", "raw": decoded}
        except:
            parsed = {"error": "invalid_json", "raw": decoded}
        all_predictions.append(parsed)

df["model_output"] = all_predictions
print("✅ Batched inference complete.")


🔁 Running batched inference (batch size = 16)...


100%|██████████| 125/125 [31:47<00:00, 15.26s/it]

✅ Batched inference complete.





#### 2.1.4 📊 Inference Results Summary

After running zero-shot inference on the full phishing email dataset using Mistral-7B, the model produced structured JSON outputs for all 2,000 examples without any parsing failures.

The following performance metrics were calculated based on the model’s `"Is_Phishing"` predictions vs. ground truth labels (`1` for phishing, `0` for legitimate):





In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# ✅ Extract model-predicted labels
def extract_prediction(obj):
    if isinstance(obj, dict) and "Is_Phishing" in obj:
        return int(obj["Is_Phishing"]) if isinstance(obj["Is_Phishing"], bool) else None
    return None

df["predicted_label"] = df["model_output"].apply(extract_prediction)

# ✅ Filter out rows where prediction failed
valid = df.dropna(subset=["predicted_label"])
y_true = valid["label"]
y_pred = valid["predicted_label"]

# ✅ Calculate metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# ✅ Summary
print("📊 Evaluation Metrics (on valid responses):")
print(f"🧮 Accuracy : {accuracy:.4f}")
print(f"✅ Precision: {precision:.4f}")
print(f"🔁 Recall   : {recall:.4f}")
print(f"⭐ F1 Score : {f1:.4f}")
print(f"\n📦 Valid Predictions: {len(valid)} / {len(df)} total")

📊 Evaluation Metrics (on valid responses):
🧮 Accuracy : 0.8360
✅ Precision: 0.7530
🔁 Recall   : 1.0000
⭐ F1 Score : 0.8591

📦 Valid Predictions: 2000 / 2000 total


#### 2.1.5 🏁 Results Summary – `UTwente` Dataset

#### ✅ Evaluation Metrics (on 2,000 valid outputs):
- **Accuracy**: `83.6%`  
- **Precision**: `75.3%`  
- **Recall**: `100.0%`  
- **F1 Score**: `85.9%`  
- **Total Processed Emails**: `2,000`  
- **Valid JSON Responses**: `100%`

---

### 🧠 Interpretation:

The model showed exceptional **recall**, detecting all phishing attempts with zero false negatives. While a few legitimate emails were incorrectly flagged as phishing (lowering precision), this behavior aligns well with real-world SOC needs, where it’s safer to over-alert than to miss a threat.

These results indicate that the model is highly effective at catching phishing threats, though further tuning or rule-based filtering might be helpful to reduce false positives in production settings.


### 2.2 🗂️ Dataset: `Ahmad Tijjani Kaggle` (Phishing Detection with Category Context)

This dataset contains labeled phishing emails enriched with contextual **categories** describing the attack style or psychological tactic. It is ideal for evaluating LLMs on their ability to detect phishing attempts and understand the underlying method of deception.

Each entry includes:
- `text`: the full email content (subject + body)
- `label`: phishing classification (`phishing` or `safe`)
- `category`: the thematic phishing type (e.g., “urgency”, “authority”)

Once uploaded, we will:
- ✅ Convert textual labels to binary (`1` = phishing, `0` = safe)
- ✅ Preview dataset structure and basic stats
- ✅ Visualize the distribution of phishing categories
- ✅ Generate prompts for structured zero-shot inference

> 🧠 This evaluation uses **zero-shot prompt-based inference only**, with no fine-tuning.


#### 2.2.1 🗂️ Upload and Load the `Ahmad Tijjani Kaggle` Dataset

In this step, we upload the phishing dataset for zero-shot benchmarking. Each sample includes:
- The email content under the `text` column
- A phishing label under the `label` column (`phishing` or `safe`)
- A `category` column representing the type of phishing tactic (e.g., urgency, authority)

Once uploaded, we will:
- Load the dataset into memory using Pandas
- Display the number of rows and column names
- Preview a few sample entries to confirm structure and readiness for preprocessing


In [8]:
# 🧠 Section 2.2.1 – Upload and Load the Ahmad Tijjani Kaggle Dataset

from google.colab import files
import pandas as pd
import io

# 📤 Upload dataset
print("📤 Please upload the 'phishing_dataset_with_category.csv' file.")
uploaded = files.upload()
file_name = list(uploaded.keys())[0]

# ✅ Load dataset
df = pd.read_csv(io.BytesIO(uploaded[file_name]))
print(f"\n✅ Dataset loaded: {file_name}")
print(f"📦 Rows: {len(df)}")
print(f"🧾 Columns: {df.columns.tolist()}")

# ✅ Preview first few rows
display(df.head())

# ✅ Show label distribution
label_counts = df["label"].value_counts()
print("\n🔎 Label Distribution:")
print(label_counts)

# ✅ Show phishing category breakdown
category_counts = df["category"].value_counts()
print("\n📂 Category Breakdown:")
print(category_counts)

# ✅ Show a sample phishing email
print("\n📌 Sample Phishing Email:")
sample = df[df["label"] == "phishing"].sample(1, random_state=42)
print(sample[["text", "category"]].to_string(index=False))


📤 Please upload the 'phishing_dataset_with_category.csv' file.


Saving phishing_dataset_with_category.csv to phishing_dataset_with_category (1).csv

✅ Dataset loaded: phishing_dataset_with_category (1).csv
📦 Rows: 1000
🧾 Columns: ['text', 'category', 'label']


Unnamed: 0,text,category,label
0,Warning: Unusual login attempt detected on you...,urgency,phishing
1,Urgent! Your Google has been compromised. Clic...,urgency,phishing
2,This is an official notice from Amazon. Your a...,authority,phishing
3,"As per HMRC regulations, you must update your ...",authority,phishing
4,Immediate action required: Your Spotify subscr...,urgency,phishing



🔎 Label Distribution:
label
phishing    1000
Name: count, dtype: int64

📂 Category Breakdown:
category
authority     350
persuasion    328
urgency       322
Name: count, dtype: int64

📌 Sample Phishing Email:
                                                                         text category


#### 2.2.2 🧠 Preprocess Dataset and Generate Structured Prompts

In this step, we prepare the dataset for prompt-based inference.

Steps include:
- Converting the `label` column to binary format:
  - `phishing` → `1`
  - `safe` → `0`
- Generating a structured **instruction prompt** for each email
- Appending the prompt as a new column for inference use

Each prompt instructs the model to act as a cybersecurity expert and respond in a strict JSON format, enabling structured output extraction and evaluation.


In [5]:
# 🧠 Section 2.2.2 – Preprocess Dataset and Generate Structured Prompts

# ✅ Convert labels to binary
df["label"] = df["label"].map({"phishing": 1, "safe": 0})

# ✅ Define prompt template
def build_prompt(email_body):
    return f"""### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain (e.g. fake Google Drive, Dropbox, corpfiles.net instead of corpfile.com)
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}}

### Email:
{email_body}

### Response:"""

# ✅ Generate prompt column
df["prompt"] = df["text"].apply(build_prompt)

# ✅ Preview one phishing prompt
sample_index = df[df["label"] == 1].sample(1, random_state=42).index[0]
print("\n📌 Full Prompt Example (Phishing Email):\n")
print(df.loc[sample_index, "prompt"])



📌 Full Prompt Example (Phishing Email):

### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain (e.g. fake Google Drive, Dropbox, corpfiles.net instead of corpfile.com)
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineerin

#### 2.2.3 🤖 Run Batched Inference and Extract Structured JSON Output

In this step, we perform **batched inference** on the structured prompts using the loaded LLM (e.g., Mistral-7B).

Steps include:
- Tokenizing prompts with appropriate padding and truncation
- Generating responses using `inference_mode()` for speed and memory efficiency
- Extracting the model’s JSON output block using regular expressions
- Parsing each output into Python dictionaries for evaluation

The parsed JSON is stored under a new `model_output` column in the DataFrame for further analysis.


In [6]:
from torch import inference_mode
from tqdm import tqdm
import re
import json

# 🔧 Inference configuration
batch_size = 16
max_prompt_length = 1024
all_predictions = []

print(f"🔁 Running batched inference (batch size = {batch_size})...")

# 🔄 Batched generation loop
for i in tqdm(range(0, len(df), batch_size)):
    batch_prompts = df["prompt"].iloc[i:i+batch_size].tolist()

    # Tokenize input prompts
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True, max_length=max_prompt_length).to("cuda")

    with inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=384,
            temperature=0.0,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Decode and extract JSON
    decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    for decoded in decoded_outputs:
        matches = list(re.finditer(r"\{[\s\S]+?\}", decoded))
        json_text = matches[-1].group() if matches else ""
        try:
            parsed = json.loads(json_text) if json_text else {"error": "no_json_found", "raw": decoded}
        except:
            parsed = {"error": "invalid_json", "raw": decoded}
        all_predictions.append(parsed)

# ✅ Store model outputs
df["model_output"] = all_predictions
print("✅ Batched inference complete.")


🔁 Running batched inference (batch size = 16)...


100%|██████████| 63/63 [11:00<00:00, 10.48s/it]

✅ Batched inference complete.





#### 2.2.4 📊 Inference Results Summary and Evaluation

After running zero-shot inference on the `Ahmad Tijjani Kaggle` dataset, the model produced structured JSON outputs for all entries.

In this step, we:
- Extract the predicted `Is_Phishing` value from each JSON response
- Filter out invalid or unparseable results
- Compare predictions against ground truth labels
- Calculate evaluation metrics including:
  - Accuracy
  - Precision
  - Recall
  - F1 Score

This provides a performance snapshot of the model's phishing detection capabilities based solely on prompt engineering.

> 📌 These metrics help assess practical utility in real-world SOC automation pipelines.


In [7]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# ✅ Extract binary predictions
def extract_prediction(obj):
    if isinstance(obj, dict) and "Is_Phishing" in obj:
        return int(obj["Is_Phishing"]) if isinstance(obj["Is_Phishing"], bool) else None
    return None

df["predicted_label"] = df["model_output"].apply(extract_prediction)

# ✅ Filter valid rows
valid = df.dropna(subset=["predicted_label"])
y_true = valid["label"]
y_pred = valid["predicted_label"]

# ✅ Compute evaluation metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# 📊 Display results
print("📊 Evaluation Metrics (on valid responses):")
print(f"🧮 Accuracy : {accuracy:.4f}")
print(f"✅ Precision: {precision:.4f}")
print(f"🔁 Recall   : {recall:.4f}")
print(f"⭐ F1 Score : {f1:.4f}")
print(f"\n📦 Valid Predictions: {len(valid)} / {len(df)} total")


📊 Evaluation Metrics (on valid responses):
🧮 Accuracy : 1.0000
✅ Precision: 1.0000
🔁 Recall   : 1.0000
⭐ F1 Score : 1.0000

📦 Valid Predictions: 1000 / 1000 total


#### 2.2.5 🏁 Results Summary – `Ahmad Tijjani Kaggle` Dataset

After completing inference and evaluation, the model demonstrated **perfect performance** on the `Ahmad Tijjani Kaggle` dataset.

### ✅ Final Evaluation Metrics:
- **Accuracy**: 100.0%
- **Precision**: 100.0%
- **Recall**: 100.0%
- **F1 Score**: 100.0%
- **Valid Outputs**: 1000 / 1000 structured responses parsed successfully

### 🧠 Interpretation:
- The model correctly identified all phishing and legitimate emails with no false positives or false negatives.
- Structured output formatting was followed strictly, enabling reliable parsing.
- These results are ideal for automation in high-risk environments such as Security Operations Centers (SOC).

> 📌 Note: These results are dataset-specific. Additional datasets should be tested to validate generalizability and uncover potential blind spots.


### 2.3 🗂️ Dataset: `Charlotte Hall` (Classified by Attack Strategy)

This dataset contains phishing and legitimate emails, organized by **type of phishing attack**, such as traditional and spear-phishing.

Each entry includes:
- `Email`: the full message content (subject + body)
- `Type`: the type of email, indicating its phishing subtype
- `Label`: classification as phishing or legitimate

Once loaded, we will:
- ✅ Inspect column names and row count
- ✅ Standardize the structure
- ✅ Preview email content
- ✅ Count phishing vs. safe samples
- ✅ Analyze the distribution of phishing types

> 🧠 This dataset supports evaluating model performance on multiple phishing styles, not just binary classification.


#### 2.3.1 🗂️ Upload and Load the `Phishing Email Data by Type` Dataset

In this step, we upload and inspect a dataset that contains phishing and legitimate emails labeled by type (e.g., traditional phishing, spear phishing, etc.).

The dataset includes:
- `Email`: the raw email content (subject + body)
- `Type`: the category of phishing attack (e.g., "Invoice Scam", "Credential Theft")
- `Label`: whether the email is a phishing attempt (`phishing`) or not (`legitimate` or `safe`)

Once uploaded, we will:
- Load the dataset using Pandas
- Check the total number of rows and column names
- Preview the first few rows to confirm the structure before preprocessing


In [4]:
# 🧠 Section 2.3.1 – Upload and Load the Phishing Email Data by Type Dataset

from google.colab import files
import pandas as pd
import io

# 📤 Upload dataset
print("📤 Please upload the 'phishing_data_by_type.csv' file.")
uploaded = files.upload()
file_name = list(uploaded.keys())[0]

# ✅ Load dataset
df = pd.read_csv(io.BytesIO(uploaded[file_name]))
print(f"\n✅ Dataset loaded: {file_name}")
print(f"📦 Rows: {len(df)}")
print(f"🧾 Columns: {df.columns.tolist()}")

# ✅ Preview first few rows
df.head()


📤 Please upload the 'phishing_data_by_type.csv' file.


Saving phishing_data_by_type.csv to phishing_data_by_type.csv

✅ Dataset loaded: phishing_data_by_type.csv
📦 Rows: 159
🧾 Columns: ['Subject', 'Text', 'Type']


Unnamed: 0,Subject,Text,Type
0,URGENT BUSINESS ASSISTANCE AND PARTNERSHIP,URGENT BUSINESS ASSISTANCE AND PARTNERSHIP.\n\...,Fraud
1,URGENT ASSISTANCE /RELATIONSHIP (P),"Dear Friend,\n\nI am Mr. Ben Suleman a custom ...",Fraud
2,GOOD DAY TO YOU,FROM HIS ROYAL MAJESTY (HRM) CROWN RULER OF EL...,Fraud
3,from Mrs.Johnson,Goodday Dear\n\n\nI know this mail will come t...,Fraud
4,Co-Operation,FROM MR. GODWIN AKWESI\nTEL: +233 208216645\nF...,Fraud


#### 2.3.2 🧠 Preprocess Dataset and Generate Structured Prompts

In this step, we prepare the `Phishing Email Data by Type` dataset for prompt-based inference by:

- Merging the `Subject` and `Text` columns into a single `text` field
- Renaming `Type` to `category` for clarity
- Mapping the following categories to binary labels:
  - `Phishing`, `Fraud`, `Commercial Spam` → `1` (phishing/malicious)
  - `False Positives` → `0` (safe)
- Generating a structured **LLM prompt** for each email based on the combined subject and body
- Storing the generated prompt in a new `prompt` column for batch inference

This configuration reflects a strict security posture: **even commercial spam is treated as phishing**, as it may contain risky links or deceptive elements that can compromise employee safety.


In [16]:
# 🧠 Section 2.3.2 – Preprocess and Generate Structured Prompts (Adjusted for phishing_data_by_type.csv)

# ✅ Combine subject + body into a unified 'text' column
df["text"] = "SUBJECT: " + df["Subject"].fillna("") + "\n\n" + df["Text"].fillna("")

# ✅ Rename the phishing type column
df = df.rename(columns={"Type": "category"})

# ✅ Map 'category' to binary phishing labels
# We'll treat only 'Phishing' and 'Fraud' as phishing, others as safe
phishing_labels = {"Phishing": 1, "Fraud": 1, "False Positives": 0, "Commercial Spam": 0}
df["label"] = df["category"].map(phishing_labels)

# ✅ Prompt generation function (kept unchanged)
def build_prompt(email_body):
    return f"""### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}}

### Email:
{email_body}

### Response:"""

# ✅ Generate prompts
df["prompt"] = df["text"].apply(build_prompt)

# ✅ Preview a prompt from a real phishing email
sample_index = df[df["label"] == 1].sample(1, random_state=42).index[0]
print("\n📌 Full Prompt Example (Phishing Email):\n")
print(df.loc[sample_index, "prompt"])



📌 Full Prompt Example (Phishing Email):

### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}

### Emai

#### 2.3.3 🤖 Run Batched Inference and Extract Structured JSON Output

In this step, we use a large language model (e.g., Mistral, LLaMA) to process the structured prompts in batches.

Steps include:
- Tokenizing each prompt with padding and truncation
- Running the model in inference-only mode for efficiency
- Decoding the model’s raw response
- Extracting and parsing the **structured JSON output**
- Appending each prediction to the dataset for downstream evaluation

The output is stored in a new column called `model_output`.


In [20]:
from torch import inference_mode
from tqdm import tqdm
import re
import json

# 🔧 Inference config
# Inference configuration
batch_size = 16
max_prompt_length = 2048

print(f"🔁 Running batched inference (batch size = {batch_size})...")

# 🔄 Batched inference loop
for i in tqdm(range(0, len(df), batch_size)):
    batch_prompts = df["prompt"].iloc[i:i+batch_size].tolist()

    # Tokenize inputs
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True, max_length=max_prompt_length).to("cuda")

    with inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=2048,
            temperature=0.0,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Decode and extract structured JSON
    decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    for decoded in decoded_outputs:
        matches = list(re.finditer(r"\{[\s\S]+?\}", decoded))
        json_text = matches[-1].group() if matches else ""
        try:
            parsed = json.loads(json_text) if json_text else {"error": "no_json_found", "raw": decoded}
        except:
            parsed = {"error": "invalid_json", "raw": decoded}
        all_predictions.append(parsed)

# ✅ Store model outputs
df["model_output"] = all_predictions
print("✅ Batched inference complete.")


🔁 Running batched inference (batch size = 16)...


100%|██████████| 10/10 [18:24<00:00, 110.48s/it]

✅ Batched inference complete.





#### 2.3.4 📊 Inference Results Summary and Evaluation

After running structured inference on the `Phishing Email Data by Type` dataset, we now evaluate the model’s predictions against the true labels.

In this step, we:
- Extract the `"Is_Phishing"` prediction from the model’s JSON output
- Compare the model’s output with ground truth binary labels
- Compute key classification metrics:
  - Accuracy
  - Precision
  - Recall
  - F1 Score
- Identify how well the model handles various phishing types, commercial spam, and false positives

> 📌 These metrics help assess whether the model is too aggressive (many false positives) or too permissive (missed phishing threats).


In [21]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# ✅ Extract model-predicted labels from the structured JSON
def extract_prediction(obj):
    if isinstance(obj, dict) and "Is_Phishing" in obj:
        return int(obj["Is_Phishing"]) if isinstance(obj["Is_Phishing"], bool) else None
    return None

df["predicted_label"] = df["model_output"].apply(extract_prediction)

# ✅ Filter only valid predictions
valid = df.dropna(subset=["label", "predicted_label"])
y_true = valid["label"]
y_pred = valid["predicted_label"]

# ✅ Calculate metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# ✅ Summary
print("📊 Evaluation Metrics (on valid responses):")
print(f"🧮 Accuracy : {accuracy:.4f}")
print(f"✅ Precision: {precision:.4f}")
print(f"🔁 Recall   : {recall:.4f}")
print(f"⭐ F1 Score : {f1:.4f}")
print(f"\n📦 Valid Predictions: {len(valid)} / {len(df)} total")


📊 Evaluation Metrics (on valid responses):
🧮 Accuracy : 0.7217
✅ Precision: 0.7037
🔁 Recall   : 1.0000
⭐ F1 Score : 0.8261

📦 Valid Predictions: 115 / 159 total


#### 2.3.5 🏁 Results Summary – `Phishing Email Data by Type` Dataset

#### ✅ Evaluation Metrics (on valid responses):
- **Accuracy**: `72.2%`
- **Precision**: `70.4%`
- **Recall**: `100.0%`
- **F1 Score**: `82.6%`
- **Valid JSON Responses**: `115 / 159` emails

---

### 🧠 Interpretation:

The model achieved **perfect recall**, successfully flagging all phishing, fraud, and commercial spam emails without missing a single threat — aligning well with a security-first policy.

However, only **72.3% of emails produced valid structured JSON**, which limits full evaluation and raises concerns for deployment in automation pipelines. This points to a common limitation with current large language models: even with strict prompts, structured outputs are not always guaranteed.

### ⚠️ JSON Reliability Issue

Despite excellent detection accuracy, **44 emails (27.7%) failed to produce valid JSON**, highlighting :
- Long mails could cause a problem for the models, espically the smaller ones.

> 📌 These results show that while LLMs are highly capable of detecting phishing threats, using them reliably in production workflows requires further handling of output formatting stability.


### 2.4 🗂️ Dataset: Improving Phishing Detection Via Psychological Trait Scoring

This dataset includes real-world phishing and legitimate emails sourced from:
- The ENRON corpus
- University phishing simulation archives
- Public phishing training sites

Each entry includes:
- `text`: the full email message (subject + body)
- `is_phishing`: ground truth label (`1` = phishing, `0` = legitimate)
- `source`: where the email was collected from (e.g., ENRON, Stanford, etc.)

Once uploaded, we will:
- ✅ Verify label distribution
- ✅ Preview email content
- ✅ Generate structured prompts for zero-shot LLM inference
- ✅ Evaluate model predictions against the `is_phishing` field


#### 2.4.1 🗂️ Upload and Load the `Improving Phishing Detection Via Psychological Trait Scoring` Dataset

In this step, we upload and inspect the dataset used in the study *"Improving Phishing Detection via Psychological Trait Scoring."*

The dataset contains labeled email samples from multiple sources, including ENRON and university phishing education portals.

The structure includes:
- `text`: the full email content (subject + body)
- `source`: the origin of the email (e.g., ENRON, Stanford, University of Washington)
- `is_phishing`: binary label
  - `1` = phishing
  - `0` = safe/legitimate

Once loaded, we will preview the first few entries, confirm the structure, and proceed to standardize and prepare the data for prompt-based inference.


In [4]:
# 🧠 Section 2.3.1 – Upload and Load the Phishing Email Data by Type Dataset

from google.colab import files
import pandas as pd
import io

# 📤 Upload dataset
print("📤 Please upload the 'phishing_data_by_type.csv' file.")
uploaded = files.upload()
file_name = list(uploaded.keys())[0]

# ✅ Load dataset
df = pd.read_csv(io.BytesIO(uploaded[file_name]))
print(f"\n✅ Dataset loaded: {file_name}")
print(f"📦 Rows: {len(df)}")
print(f"🧾 Columns: {df.columns.tolist()}")

# ✅ Preview first few rows
df.head()


📤 Please upload the 'phishing_data_by_type.csv' file.


Saving curated_set.csv to curated_set.csv

✅ Dataset loaded: curated_set.csv
📦 Rows: 326
🧾 Columns: ['Unnamed: 0', 'text', 'source', 'is_phishing']


Unnamed: 0.1,Unnamed: 0,text,source,is_phishing
0,0,Subject: ena offsite\nmy suggestions :\n1 ) mo...,ENRON,0
1,1,Subject: allegheny energy s - 3\ni received wo...,ENRON,0
2,2,The University of Washington System is sharing...,https://ciso.uw.edu/education/more-phishing-ex...,1
3,3,"Dear user@stanford.edu,\n\nA private document ...",https://uit.stanford.edu/phishing,1
4,4,Subject: james valverde - interview schedule\n...,ENRON,0


#### 2.4.2 🧠 Preprocess Dataset and Generate Structured Prompts

In this step, we prepare the `Improving Phishing Detection Via Psychological Trait Scoring` dataset for model inference by:

- Renaming the column `is_phishing` to `label` for consistency
- Ensuring the `text` field is used as the full email body
- Generating a structured, instruction-following prompt for each email
- Appending the prompt in a new column `prompt` for batch inference

Each prompt is designed to instruct the model to return a detailed phishing risk assessment in strict JSON format.


In [5]:
# 🧠 Section 2.4.2 – Preprocess and Generate Structured Prompts

# ✅ Standardize label column name
df = df.rename(columns={"is_phishing": "label"})

# ✅ Prompt generation function
def build_prompt(email_body):
    return f"""### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}}

### Email:
{email_body}

### Response:"""

# ✅ Apply prompt generator
df["prompt"] = df["text"].apply(build_prompt)

# ✅ Preview a sample phishing prompt
sample_index = df[df["label"] == 1].sample(1, random_state=42).index[0]
print("\n📌 Full Prompt Example (Phishing Email):\n")
print(df.loc[sample_index, "prompt"])



📌 Full Prompt Example (Phishing Email):

### Instruction:
You are a cybersecurity expert working in a company's Security Operations Center (SOC).

Your task is to analyze the following email and return a structured JSON response. Be extremely strict and assume worst-case risk posture when any of the following are present:

- A link to a document/file from an unfamiliar or suspicious domain
- Urgent language or pressure to act quickly
- Generic greetings ("Hi", "Dear user") with no name
- Requests to click, download, or input sensitive data
- Email sender addresses mimicking known brands or internal departments
- Unexpected attachments or shared documents
- Impersonation of executives, HR, IT, or Finance
- Spelling mistakes or inconsistencies in formatting

### Respond in **this exact JSON format**:
{
  "Is_Phishing": boolean,
  "Risk": "High" | "Medium" | "Low",
  "Suspicious_Links": ["..."],
  "Social_Engineering_Elements": ["..."],
  "Actions": ["..."],
  "Reason": "..."
}

### Emai

#### 2.4.3 🤖 Run Batched Inference and Extract Structured JSON Output

In this step, we process each structured prompt using a large language model (e.g., Mistral-7B) in batches.

This involves:
- Tokenizing prompts with appropriate padding and truncation
- Generating structured responses using `inference_mode()` for performance
- Decoding model outputs and extracting JSON blocks using regex
- Parsing each structured JSON response and appending it to the dataset

The results are stored in a new `model_output` column for further evaluation.


In [6]:
from torch import inference_mode
from tqdm import tqdm
import re
import json

# 🔧 Inference configuration
batch_size = 16
max_prompt_length = 2048
max_new_tokens = 2048
all_predictions = []

# 🧹 Clear previous predictions in case re-running
all_predictions.clear()
print(f"🔁 Running batched inference (batch size = {batch_size})...")

# 🔄 Batched inference loop
for i in tqdm(range(0, len(df), batch_size)):
    batch_prompts = df["prompt"].iloc[i:i+batch_size].tolist()

    # Tokenize
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True, max_length=max_prompt_length).to("cuda")

    with inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.0,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Decode and extract structured JSON
    decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    for decoded in decoded_outputs:
        matches = list(re.finditer(r"\{[\s\S]+?\}", decoded))
        json_text = max(matches, key=lambda m: len(m.group())).group() if matches else ""
        try:
            parsed = json.loads(json_text) if json_text else {"error": "no_json_found", "raw": decoded}
        except:
            parsed = {"error": "invalid_json", "raw": decoded}
        all_predictions.append(parsed)

# ✅ Final check before assignment
if len(all_predictions) != len(df):
    print(f"❌ Mismatch: predictions ({len(all_predictions)}) vs rows ({len(df)})")
else:
    df["model_output"] = all_predictions
    print("✅ Batched inference complete.")


🔁 Running batched inference (batch size = 16)...


100%|██████████| 21/21 [12:49<00:00, 36.63s/it]

✅ Batched inference complete.





#### 2.4.4 📊 Inference Results Summary and Evaluation

After running inference on the `Improving Phishing Detection Via Psychological Trait Scoring` dataset, we now compare the model’s predictions to the ground truth labels.

This step involves:
- Extracting the `Is_Phishing` field from the model's JSON response
- Comparing it with the `label` column
- Calculating evaluation metrics:
  - Accuracy
  - Precision
  - Recall
  - F1 Score
- Assessing how well the model balances threat detection vs. false positives

> 📌 These metrics help evaluate model effectiveness in identifying socially engineered emails while avoiding misclassification of safe content.


In [8]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# ✅ Extract model-predicted labels
def extract_prediction(obj):
    if isinstance(obj, dict) and "Is_Phishing" in obj:
        return int(obj["Is_Phishing"]) if isinstance(obj["Is_Phishing"], bool) else None
    return None

df["predicted_label"] = df["model_output"].apply(extract_prediction)

# ✅ Filter for valid predictions
valid = df.dropna(subset=["predicted_label"])
y_true = valid["label"]
y_pred = valid["predicted_label"]

# ✅ Calculate metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# ✅ Summary
print("📊 Evaluation Metrics (on valid responses):")
print(f"🧮 Accuracy : {accuracy:.4f}")
print(f"✅ Precision: {precision:.4f}")
print(f"🔁 Recall   : {recall:.4f}")
print(f"⭐ F1 Score : {f1:.4f}")
print(f"\n📦 Valid Predictions: {len(valid)} / {len(df)} total")


📊 Evaluation Metrics (on valid responses):
🧮 Accuracy : 0.6554
✅ Precision: 0.5941
🔁 Recall   : 0.9877
⭐ F1 Score : 0.7419

📦 Valid Predictions: 325 / 326 total


#### 2.4.5 🏁 Results Summary – `Improving Phishing Detection Via Psychological Trait Scoring` Dataset

#### ✅ Evaluation Metrics (on valid responses):
- **Accuracy**: `65.5%`
- **Precision**: `59.4%`
- **Recall**: `98.8%`
- **F1 Score**: `74.2%`
- **Valid JSON Responses**: `325 / 326` emails

---

### 🧠 Interpretation:

The model achieved **exceptional recall**, successfully detecting nearly every phishing email, making it highly effective for use in environments where **false negatives are unacceptable**.

However, with a **precision of 59.4%**, a noticeable number of legitimate emails we
