<a href="https://colab.research.google.com/github/ArneshDorsatwar/Finetuning/blob/master/v2/notebooks/train_v2_network_security.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Network Security Expert v2 - Fine-Tuning Notebook

## Overview
This notebook fine-tunes **Llama 3.1 8B Instruct** to create a specialized AI for:
- **Conceptual/Theoretical** network security knowledge
- **Chain-of-thought reasoning** for complex workflows
- **FireWeave orchestration** with function calling
- **Compliance framework** expertise (PCI-DSS, SOC2, NIST, HIPAA, ISO 27001)

## Key Differences from v1
- Focus on WHY (theory) not just WHAT (commands)
- Multi-step reasoning chains
- Function calling for FireWeave API
- Multi-turn conversations
- ~10,500 training examples (V1 base + V2 tool calling)

## Requirements
- Google Colab with GPU (T4 minimum, A100 recommended)
- ~16GB GPU VRAM
- Training data: `v2/data/processed/v2_training_data.json`

## 1. Environment Setup

In [1]:
%%capture
# Install Unsloth (optimized fine-tuning library).
# %%capture is a cell magic, so it has to be the first line of the cell.
# The same Unsloth build is installed for every GPU generation
# (Ampere or newer: A100, RTX 30xx; older: T4, V100), so no capability check is needed.
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

!pip install --no-deps xformers trl peft accelerate bitsandbytes triton

In [3]:
# Verify GPU
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

PyTorch version: 2.9.0+cu126
CUDA available: True
GPU: NVIDIA A100-SXM4-40GB
VRAM: 42.5 GB


## 2. Load Base Model with Unsloth

In [4]:
from unsloth import FastLanguageModel
import torch

# V2 Configuration - Longer context for reasoning chains
max_seq_length = 4096  # Increased for chain-of-thought
dtype = None  # Auto-detect
load_in_4bit = True  # Use 4-bit quantization for memory efficiency

# Load Llama 3.1 8B Instruct (4-bit quantized)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print(f"Model loaded successfully!")
print(f"Max sequence length: {max_seq_length}")


Please restructure your imports with 'import unsloth' at the top of your file.
  from unsloth import FastLanguageModel


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.1.3: Fast Llama patching. Transformers: 4.57.3.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 8.0. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Model loaded successfully!
Max sequence length: 4096
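
The GPU check above reports 42.5 GB while the Unsloth banner reports 39.557 GB for the same card. The two figures differ only in the divisor: the first divides the byte count by 10^9, the second by 2^30. A minimal sketch of the conversion follows; the raw byte count is back-calculated from the banner value, which is an assumption because the notebook never prints it.

```python
# Back-calculate the raw byte count from the banner's figure (assumed, not measured).
GIB = 2**30
total_bytes = 39.557 * GIB

def to_gb(n_bytes: float) -> float:
    """Decimal gigabytes, as printed by the GPU verification cell."""
    return n_bytes / 1e9

def to_gib(n_bytes: float) -> float:
    """Binary gibibytes, which is what the Unsloth banner labels as GB."""
    return n_bytes / GIB

print(f"{to_gb(total_bytes):.1f} GB (decimal)")
print(f"{to_gib(total_bytes):.3f} GiB (binary)")
```

When budgeting memory for batch size or sequence length, the smaller binary figure is the safer one to plan against.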


## 3. Configure LoRA Adapters

In [5]:
# Add LoRA adapters for efficient fine-tuning
# V2: Slightly higher rank for better conceptual understanding
model = FastLanguageModel.get_peft_model(
    model,
    r=32,  # Increased from 16 for better reasoning
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_alpha=32,
    lora_dropout=0,  # Dropout only acts during training; 0 is the value Unsloth is optimized for
    bias="none",
    use_gradient_checkpointing="unsloth",  # Memory optimization
    random_state=42,
    use_rslora=True,  # Rank-stabilized LoRA
    loftq_config=None,
)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")

Unsloth 2026.1.3 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Trainable parameters: 83,886,080 (1.81%)
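
The trainable-parameter count can be checked by hand: each LoRA adapter on a linear layer adds `r * (in_features + out_features)` weights. The sketch below assumes the publicly documented Llama 3.1 8B shapes (hidden size 4096, 8 KV heads of dimension 128, MLP width 14336, 32 layers) rather than reading them from `model`, and also compares the classic LoRA scale `alpha / r` with the rank-stabilized scale `alpha / sqrt(r)` that `use_rslora=True` selects.

```python
import math

# Llama 3.1 8B shapes (assumed from the public model config, not read from `model`)
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dimension
INTERMEDIATE = 14336
NUM_LAYERS = 32
R = 32
ALPHA = 32

# (in_features, out_features) for each targeted projection
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A is (r x in_features) and B is (out_features x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"Trainable LoRA parameters: {total:,}")  # 83,886,080, matching the cell output

classic_scale = ALPHA / R
rslora_scale = ALPHA / math.sqrt(R)
print(f"alpha / r       = {classic_scale:.2f}")
print(f"alpha / sqrt(r) = {rslora_scale:.2f}")
```

With `r=32` and `lora_alpha=32`, the rank-stabilized scale is about 5.66 instead of 1.0, so the adapter update is weighted more heavily than the same settings would give under classic LoRA.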


## 4. Load and Prepare Training Data

In [None]:
# Upload your training data or mount Google Drive
from google.colab import files, drive

# Option 1: Upload directly
# uploaded = files.upload()

# Option 2: Mount Google Drive
drive.mount('/content/drive')

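# Set the path to your training data
# TRAINING_DATA_PATH = '/content/v2_training_data.json'  # If uploaded directly (Option 1)
# Assumed Drive location (Option 2): adjust to wherever the file lives in your Drive
TRAINING_DATA_PATH = '/content/drive/MyDrive/Finetuning/v2/data/processed/v2_training_data.json'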

In [None]:
import json
from datasets import Dataset

# Load training data
with open(TRAINING_DATA_PATH, 'r', encoding='utf-8') as f:
    training_data = json.load(f)

print(f"Loaded {len(training_data):,} training examples")

# Analyze data
topics = {}
has_reasoning = 0
has_function = 0
multi_turn = 0

for item in training_data:
    topic = item.get('topic', 'unknown')
    topics[topic] = topics.get(topic, 0) + 1
    if item.get('has_reasoning', False):
        has_reasoning += 1
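    # Accept either flag name, since this notebook reads both 'has_function_call' and 'has_tool_call'
    if item.get('has_function_call', False) or item.get('has_tool_call', False):
        has_function += 1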
    if len(item.get('conversations', [])) > 2:
        multi_turn += 1

print(f"\nData Distribution:")
print(f"  Topics: {len(topics)}")
print(f"  With reasoning: {has_reasoning:,} ({100*has_reasoning/len(training_data):.1f}%)")
print(f"  With function calls: {has_function:,} ({100*has_function/len(training_data):.1f}%)")
print(f"  Multi-turn: {multi_turn:,} ({100*multi_turn/len(training_data):.1f}%)")

print(f"\nTop 10 Topics:")
for topic, count in sorted(topics.items(), key=lambda x: -x[1])[:10]:
    print(f"  {topic}: {count}")
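
The analysis and formatting cells in this section read a fixed set of keys from every record: `conversations` (a list of turns with `from` and `value`), an optional `tools` list whose entries carry a `function` with `name` and `description`, plus metadata flags. A minimal validation sketch under those assumptions is shown below. The sample record and the tool name `check_traffic_flow` are hypothetical and are not taken from the dataset.

```python
from typing import Any

ALLOWED_ROLES = {"human", "gpt", "tool"}  # the roles the formatting function handles

def validate_example(example: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems: list[str] = []
    conversations = example.get("conversations")
    if not isinstance(conversations, list) or not conversations:
        return ["missing or empty 'conversations'"]
    for i, turn in enumerate(conversations):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not a dict")
            continue
        if turn.get("from") not in ALLOWED_ROLES:
            problems.append(f"turn {i}: unexpected role {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {i}: 'value' is not a string")
    for j, tool in enumerate(example.get("tools", [])):
        func = tool.get("function") if isinstance(tool, dict) else None
        if func is None:
            continue  # the formatter skips tools without a 'function' entry
        for key in ("name", "description"):
            if key not in func:
                problems.append(f"tool {j}: function is missing {key!r}")
    return problems

# Hypothetical record, for illustration only
sample = {
    "topic": "traffic_analysis",
    "conversations": [
        {"from": "human", "value": "Is port 443 open between these hosts?"},
        {"from": "gpt", "value": "Let me reason through the path first..."},
    ],
    "tools": [{"function": {"name": "check_traffic_flow",
                            "description": "check whether a flow is allowed"}}],
}
print(validate_example(sample))
```

Running a check like this over `training_data` before formatting surfaces turns with unexpected roles, which the formatter would otherwise drop silently.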

In [None]:
# V2 System Prompt - Focused on conceptual understanding and orchestration
V2_SYSTEM_PROMPT = """You are a Network Security Expert AI integrated with FireWeave - an enterprise firewall automation platform.

**Your Capabilities:**
- Deep understanding of network security CONCEPTS and THEORY
- Expertise in compliance frameworks (PCI-DSS, SOC2, NIST, HIPAA, ISO 27001)
- Multi-cloud security (AWS, Azure, GCP) and Palo Alto Panorama
- Attack path analysis and blast radius calculation
- ServiceNow integration for change management

**Your Approach:**
1. REASON step-by-step through complex problems
2. Explain the WHY behind security decisions
3. Reference relevant compliance requirements
4. Use FireWeave functions when action is needed
5. Consider security trade-offs and risks

Always prioritize security, explain your reasoning, and provide actionable guidance."""

# Llama 3.1 NATIVE Tool Calling Format
def format_conversation_with_tools(example):
    """Format conversation for Llama 3.1 with NATIVE tool calling support."""
    conversations = example.get('conversations', [])
    tools = example.get('tools', [])

    # Build the formatted text
    text = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{V2_SYSTEM_PROMPT}"

    # Add tool definitions if present
    if tools:
        text += "\n\nYou have access to the following tools:\n\n"
        for tool in tools:
            if "function" in tool:
                func = tool["function"]
                text += f"Use the function '{func['name']}' to '{func['description']}':\n"
                text += json.dumps({"name": func["name"], "parameters": func.get("parameters", {})}, indent=2)
                text += "\n\n"

    text += "<|eot_id|>"

    for turn in conversations:
        role = turn['from']
        content = turn['value']

        if role == 'human':
            text += f"<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>"
        elif role == 'gpt':
            # Check if this is a tool call (contains <|python_tag|>)
            if "<|python_tag|>" in content:
                text += f"<|start_header_id|>assistant<|end_header_id|>\n\n{content}<|eom_id|>"
            else:
                text += f"<|start_header_id|>assistant<|end_header_id|>\n\n{content}<|eot_id|>"
        elif role == 'tool':
            # Tool response uses ipython header
            text += f"<|start_header_id|>ipython<|end_header_id|>\n\n{content}<|eot_id|>"

    return {'text': text}

# Convert to dataset
dataset = Dataset.from_list(training_data)
dataset = dataset.map(format_conversation_with_tools)

print(f"Dataset prepared: {len(dataset):,} examples")

# Analyze tool calling examples (accept either flag name used in this notebook)
tool_examples = sum(
    1 for item in training_data
    if item.get('has_tool_call', False) or item.get('has_function_call', False)
)
print(f"Examples with tool calls: {tool_examples:,} ({100*tool_examples/len(training_data):.1f}%)")

print(f"\nSample formatted text (first 800 chars):")
print(dataset[0]['text'][:800])
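
Formatted examples that exceed `max_seq_length` tokens are typically truncated by the trainer, which can cut off the end of a reasoning chain or a tool response. The sketch below shows one way to find such examples. `find_overlong` is a hypothetical helper, and the whitespace-based counter is only a stand-in so that the snippet runs without a model; in the notebook one would pass something like `lambda t: len(tokenizer(t)["input_ids"])` instead.

```python
from typing import Callable, Iterable

def find_overlong(
    texts: Iterable[str],
    count_tokens: Callable[[str], int],
    max_len: int,
) -> list[int]:
    """Return the indices of texts whose token count exceeds max_len."""
    return [i for i, text in enumerate(texts) if count_tokens(text) > max_len]

# Stand-in counter: whitespace split (NOT the real tokenizer).
toy_texts = ["short example", "word " * 5000]
overlong = find_overlong(toy_texts, lambda t: len(t.split()), max_len=4096)
print(f"Overlong example indices: {overlong}")
```

If many examples are flagged, the options are to raise `max_seq_length`, shorten the system prompt, or split long conversations.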

## 5. Configure Trainer

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# V2 Training Configuration
# Optimized for longer sequences and reasoning
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Disable packing for multi-turn conversations
    args=TrainingArguments(
        # Output
        output_dir="./outputs_v2",

        # Training duration
        num_train_epochs=3,
        max_steps=-1,

        # Batch size (adjust based on GPU memory)
        per_device_train_batch_size=1,  # Reduced for longer sequences
        gradient_accumulation_steps=8,  # Effective batch size = 8

        # Learning rate
        learning_rate=1e-4,  # Slightly lower for stability
        lr_scheduler_type="cosine",
        warmup_ratio=0.05,

        # Optimization
        optim="adamw_8bit",
        weight_decay=0.01,
        max_grad_norm=1.0,

        # Precision
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),

        # Logging
        logging_steps=10,
        logging_dir="./logs_v2",

        # Checkpointing
        save_strategy="steps",
        save_steps=500,
        save_total_limit=3,

        # Other
        seed=42,
        report_to="none",
    ),
)

print("Trainer configured!")
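print(f"  Epochs: {trainer.args.num_train_epochs}")
print(f"  Effective batch size: {trainer.args.per_device_train_batch_size * trainer.args.gradient_accumulation_steps}")
print(f"  Learning rate: {trainer.args.learning_rate}")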
print(f"  Max sequence length: {max_seq_length}")
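
The settings above determine how many optimizer steps the run takes, how long warmup lasts, and how many checkpoints are written. A rough worked estimate follows, assuming the approximate figure of 10,500 examples from the overview and a single GPU; the exact rounding inside the trainer may differ by a step or two.

```python
import math

num_examples = 10_500      # approximate, from the overview
per_device_batch = 1
grad_accum = 8
epochs = 3
warmup_ratio = 0.05
save_steps = 500
save_total_limit = 3

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
checkpoints_written = total_steps // save_steps

print(f"Effective batch size: {effective_batch}")
print(f"Steps per epoch:      {steps_per_epoch}")
print(f"Total steps:          {total_steps}")
print(f"Warmup steps:         {warmup_steps}")
print(f"Checkpoints written:  {checkpoints_written} (only the last {save_total_limit} are kept)")
```

Under these assumptions the run takes roughly 3,900 optimizer steps, with about the first 200 spent in warmup.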

## 6. Train the Model

In [None]:
# Start training
print("Starting v2 training...")
print(f"Training on {len(dataset):,} examples")
print(f"This will take several hours depending on your GPU.")
print("="*50)

trainer_stats = trainer.train()

print("="*50)
print("Training complete!")
print(f"  Training time: {trainer_stats.metrics['train_runtime']:.0f} seconds")
print(f"  Average training loss: {trainer_stats.metrics['train_loss']:.4f}")

## 7. Test the Model

In [None]:
# Enable inference mode
FastLanguageModel.for_inference(model)

def ask_v2_model(question: str, max_new_tokens: int = 1024) -> str:
    """Ask the v2 model a question."""
    prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{V2_SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

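    # The prompt already starts with <|begin_of_text|>, so skip the tokenizer's automatic BOS token
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")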

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    # Extract assistant response
    assistant_start = response.rfind("<|start_header_id|>assistant<|end_header_id|>")
    if assistant_start != -1:
        response = response[assistant_start + len("<|start_header_id|>assistant<|end_header_id|>"):]
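        # Tool calls were trained to end with <|eom_id|>, regular turns with <|eot_id|>
        response = response.replace("<|eot_id|>", "").replace("<|eom_id|>", "").strip()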

    return response

In [None]:
# Test: Chain-of-thought reasoning
print("Test 1: Chain-of-thought reasoning")
print("="*50)
response = ask_v2_model(
    "We need to allow our web servers to access a new payment API. Walk me through the security considerations and what compliance requirements apply."
)
print(response)

In [None]:
# Test: Function calling
print("\nTest 2: Function calling")
print("="*50)
response = ask_v2_model(
    "Check if traffic from 10.1.1.100 to our database server 192.168.50.10 on port 5432 is allowed, and tell me the security implications."
)
print(response)
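
During training, tool definitions were appended to the system prompt, whereas `ask_v2_model` sends the system prompt alone. To test function calling under the conditions the model was trained on, the inference prompt can be built the same way. The sketch below mirrors the training formatter; `build_prompt_with_tools`, the stand-in system prompt, and the `check_traffic_flow` tool are all hypothetical and are not part of the FireWeave API.

```python
import json

SYSTEM_PROMPT = "You are a Network Security Expert AI."  # stand-in for V2_SYSTEM_PROMPT

def build_prompt_with_tools(question: str, tools: list[dict],
                            system_prompt: str = SYSTEM_PROMPT) -> str:
    """Build a single-turn prompt whose system block lists tools as in training."""
    text = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}"
    if tools:
        text += "\n\nYou have access to the following tools:\n\n"
        for tool in tools:
            if "function" in tool:
                func = tool["function"]
                text += f"Use the function '{func['name']}' to '{func['description']}':\n"
                text += json.dumps(
                    {"name": func["name"], "parameters": func.get("parameters", {})},
                    indent=2,
                )
                text += "\n\n"
    text += "<|eot_id|>"
    text += f"<|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
    text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

# Hypothetical tool definition, for illustration only
tools = [{
    "function": {
        "name": "check_traffic_flow",
        "description": "check whether traffic between two hosts is allowed",
        "parameters": {"source_ip": "string", "destination_ip": "string", "port": "integer"},
    }
}]

prompt = build_prompt_with_tools(
    "Check if traffic from 10.1.1.100 to 192.168.50.10 on port 5432 is allowed.",
    tools,
)
print(prompt)
```

The resulting string can be tokenized and passed to `model.generate` in the same way as the prompt inside `ask_v2_model`.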

In [None]:
# Test: Compliance knowledge
print("\nTest 3: Compliance knowledge")
print("="*50)
response = ask_v2_model(
    "What are the key PCI-DSS requirements for firewall configuration, and how can FireWeave help us achieve compliance?"
)
print(response)

In [None]:
# Test: Attack path analysis
print("\nTest 4: Attack path analysis")
print("="*50)
response = ask_v2_model(
    "Explain how to analyze attack paths from the internet to our internal database tier, and what factors determine blast radius."
)
print(response)

## 8. Save the Model

In [None]:
# Save LoRA adapter
model.save_pretrained("/content/network-security-expert-v2-lora")
tokenizer.save_pretrained("/content/network-security-expert-v2-lora")
print("LoRA adapter saved!")

In [None]:
# Merge and save as 16-bit
print("Merging LoRA with base model...")
model.save_pretrained_merged(
    "/content/network-security-expert-v2-merged",
    tokenizer,
    save_method="merged_16bit"
)
print("Merged model saved!")

## 9. Export to GGUF

In [None]:
# Convert to GGUF for Ollama deployment
print("Converting to GGUF format...")
print("This will create Q4_K_M, Q5_K_M, and Q8_0 versions.")

model.save_pretrained_gguf(
    "/content/network-security-expert-v2-gguf",
    tokenizer,
    quantization_method=["q4_k_m", "q5_k_m", "q8_0"]
)

print("\nGGUF files created:")
!ls -lh /content/network-security-expert-v2-gguf/*.gguf

In [None]:
# Copy to Google Drive for download
!mkdir -p /content/drive/MyDrive/Finetuning/v2/models/gguf
!cp /content/network-security-expert-v2-gguf/*.gguf /content/drive/MyDrive/Finetuning/v2/models/gguf/
print("GGUF files copied to Google Drive!")

## 10. Create Modelfile for Ollama

In [None]:
# Generate Modelfile for v2
v2_modelfile = f'''# Ollama Modelfile for Network Security Expert v2
#
# This version focuses on:
# - Conceptual/theoretical understanding
# - Chain-of-thought reasoning
# - FireWeave orchestration
# - Compliance framework expertise

# Adjust the file name to match the Q4_K_M .gguf listed by the export step
FROM ./Llama-3.1-8B-Instruct.Q4_K_M.gguf

# Llama 3 Chat Template
TEMPLATE """{{{{ if .System }}}}<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{{{ .System }}}}<|eot_id|>{{{{ end }}}}{{{{ if .Prompt }}}}<|start_header_id|>user<|end_header_id|>

{{{{ .Prompt }}}}<|eot_id|>{{{{ end }}}}<|start_header_id|>assistant<|end_header_id|>

{{{{ .Response }}}}<|eot_id|>"""

# Stop tokens (<|eom_id|> ends tool-call turns in the training format)
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|eom_id|>"
PARAMETER stop "<|reserved_special_token"

# Generation parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096

# V2 System Prompt
SYSTEM """{V2_SYSTEM_PROMPT}"""
'''

# Save Modelfile
with open('/content/network-security-expert-v2-gguf/Modelfile', 'w') as f:
    f.write(v2_modelfile)

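# Copy to Drive, next to the GGUF files, because the Modelfile's FROM path is relative
!mkdir -p /content/drive/MyDrive/Finetuning/v2/models/gguf
!cp /content/network-security-expert-v2-gguf/Modelfile /content/drive/MyDrive/Finetuning/v2/models/gguf/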

print("Modelfile created!")
print("\nTo import to Ollama:")
print("  cd /path/to/gguf/folder")
print("  ollama create network-security-expert-v2 -f Modelfile")
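
The Modelfile is written as a Python f-string, so the Go-template delimiters that Ollama expects must be escaped: four braces in the source render as two in the output, while single braces interpolate a Python variable. A small self-contained sketch of that behavior, using a stand-in system prompt:

```python
system_prompt = "You are a helpful assistant."  # stand-in for V2_SYSTEM_PROMPT

# Inside an f-string, '{{' renders as '{', so '{{{{' renders as '{{'.
modelfile = f'''TEMPLATE """{{{{ .Prompt }}}}"""
SYSTEM """{system_prompt}"""
'''

print(modelfile)
```

If the template in the generated Modelfile ever shows single braces or a literal `{V2_SYSTEM_PROMPT}`, the escaping in the source string is the first thing to check.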

## 11. Upload to Hugging Face (Optional)

In [None]:
# Optional: Upload to Hugging Face Hub
# Uncomment and run if you want to share the model

# !pip install huggingface_hub
# from huggingface_hub import login, HfApi, create_repo

# login()  # Enter your HF token

# HF_USERNAME = "YOUR_USERNAME"  # Change this!
# REPO_NAME = "network-security-expert-v2-gguf"

# repo_id = f"{HF_USERNAME}/{REPO_NAME}"
# create_repo(repo_id, repo_type="model", exist_ok=True)

# api = HfApi()
# api.upload_folder(
#     folder_path="/content/network-security-expert-v2-gguf",
#     repo_id=repo_id,
#     repo_type="model",
#     commit_message="Upload Network Security Expert v2 GGUF"
# )

# print(f"Uploaded to: https://huggingface.co/{repo_id}")

## Summary

**v2 Model Training Complete!**

Key differences from v1:
- LoRA rank: 32 (vs 16) for better conceptual understanding
- Context length: 4096 (vs 2048) for reasoning chains
- Learning rate: 1e-4 (vs 2e-4) for stability
- Training data: ~10,500 examples (V1 base + V2 tool calling), including chain-of-thought and multi-turn conversations

**Next Steps:**
1. Download the Q4_K_M GGUF file and the Modelfile from Google Drive
2. Place both in `v2/models/gguf/`
3. Run from that folder: `ollama create network-security-expert-v2 -f Modelfile`
4. Test: `ollama run network-security-expert-v2`

**Test Questions:**
- "Explain the principle of least privilege and how it applies to firewall rules"
- "What are the PCI-DSS requirements for network segmentation?"
- "Analyze the attack path from our DMZ to the database tier"
- "Check if traffic from 10.0.0.1 to 192.168.1.50 on port 443 is allowed"