# Fine-tuning Llama 3.1 8B for Network Security Expert v2

This notebook fine-tunes Llama 3.1 8B Instruct to create a specialized **Network Security Expert AI** with:
- **Advanced tool calling** using native Llama 3.1 format (`<|python_tag|>`)
- **FireWeave orchestration** capabilities
- **Infosec conversational expertise**

**Training Configuration (Optimized for Tool Calling):**
- LoRA: r=16, alpha=32, dropout=0.05
- Learning rate: 3e-4 with cosine scheduler
- Max sequence length: 4096 for multi-turn conversations
- Dataset: 17,341 examples (41% tool calling, 59% conversational)

**Runtime:** Use GPU (T4/A100) - Runtime > Change runtime type > GPU

## 1. Install Dependencies

Run this cell if packages aren't installed yet.

In [None]:
# Install Unsloth and dependencies
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

## 2. Load Model

Load Llama 3.1 8B Instruct with 4-bit quantization.

In [None]:
from unsloth import FastLanguageModel
import torch

# Configuration (Optimized for Tool Calling)
max_seq_length = 4096  # Increased for multi-turn tool calling conversations
dtype = None  # Auto-detect (float16 for T4, bfloat16 for Ampere+)
load_in_4bit = True  # Use 4-bit quantization for QLoRA

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3.1-8b-Instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

print(f"✓ Model loaded: Llama 3.1 8B Instruct (4-bit)")
print(f"✓ Max sequence length: {max_seq_length}")
print(f"✓ Data type: {dtype if dtype else 'Auto'}")

## 3. Configure LoRA

Add LoRA adapters for parameter-efficient fine-tuning.
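
As a rough cross-check on the trainable-parameter count that the cell below prints, the arithmetic can be sketched by hand. Each adapted linear layer with shape `d_in x d_out` gains `r * (d_in + d_out)` LoRA weights. The dimensions used here are the published Llama 3.1 8B shapes and are assumed rather than read from the loaded model; the helper name `lora_params` is purely illustrative.

```python
# Published Llama 3.1 8B shapes (assumed here, not read from the loaded model)
HIDDEN = 4096        # hidden (model) dimension
KV_DIM = 8 * 128     # 8 key/value heads x head dimension 128 (grouped-query attention)
MLP = 14336          # feed-forward intermediate dimension
NUM_LAYERS = 32      # number of transformer blocks

# (input dim, output dim) of every linear layer listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Count LoRA weights: matrices A (rank x d_in) and B (d_out x rank) per layer."""
    per_block = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_block * NUM_LAYERS


total = lora_params(rank=16)
print(f"{total:,} trainable parameters (~{total / 1e6:.1f}M)")
# 41,943,040 trainable parameters (~41.9M)
```

If the number printed by the notebook differs noticeably from this figure, the most likely cause is a different `r` or `target_modules` list.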

In [None]:
# Add LoRA adapters (Optimized for Tool Calling)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,  # 2x rank for better learning
    lora_dropout = 0.05,  # Regularization for tool calling
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Memory efficient
    random_state = 42,
    use_rslora = False,
    loftq_config = None,
)

print("✓ LoRA adapters configured (Optimized for Tool Calling)")
print(f"  - Rank: 16")
print(f"  - Alpha: 32 (2x rank)")
print(f"  - Dropout: 0.05")
print(f"  - Target modules: All attention and MLP layers")
print(f"  - Trainable parameters: ~{sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.1f}M")

## 4. Load Dataset

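Load your network security training data in ShareGPT format: each record has a `conversations` list whose turns carry `from` (`human`, `gpt`, or `tool`) and `value` keys, plus an optional `tools` field.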
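
Before uploading a large file, it can help to sanity-check a few records against the structure the later cells rely on. The following is a minimal sketch, assuming only the `conversations` / `from` / `value` keys and the three roles (`human`, `gpt`, `tool`) that the formatting cell handles. The function name `find_problems` and the example record, including its tool arguments, are made up for illustration.

```python
VALID_ROLES = {"human", "gpt", "tool"}  # roles handled by the formatting cell


def find_problems(record: dict) -> list:
    """Return human-readable problems; an empty list means the record looks fine."""
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["'conversations' must be a non-empty list"]

    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not a dict")
            continue
        role = turn.get("from")
        if role not in VALID_ROLES:
            problems.append(f"turn {i}: unexpected role {role!r}")
        value = turn.get("value")
        if not isinstance(value, str) or not value.strip():
            problems.append(f"turn {i}: empty or missing 'value'")
    return problems


# Hypothetical record for illustration; the tool arguments are invented.
example = {
    "conversations": [
        {"from": "human", "value": "Can 10.1.1.5 reach 10.2.0.8 on port 443?"},
        {"from": "gpt", "value": '<|python_tag|>{"name": "check_traffic_flow", '
                                 '"parameters": {"source": "10.1.1.5", '
                                 '"destination": "10.2.0.8", "port": 443}}'},
        {"from": "tool", "value": '{"allowed": true}'},
        {"from": "gpt", "value": "Yes, that traffic is allowed."},
    ]
}
print(find_problems(example))  # []
```

Running `find_problems` over the first few hundred records of the JSON file is usually enough to catch a systematic export mistake.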

In [None]:
from datasets import load_dataset
import json

# Upload your training data to Colab first:
# 1. Click the folder icon on the left
# 2. Upload: v2/data/processed/training_data_final.json (76.9 MB)

dataset_path = "training_data_final.json"

try:
    dataset = load_dataset("json", data_files=dataset_path, split="train")
    print(f"✓ Dataset loaded: {len(dataset)} examples")
    
    # Count tool calling vs conversational
    tool_count = sum(1 for ex in dataset if '<|python_tag|>' in str(ex.get('conversations', [])))
    print(f"  - Tool calling: {tool_count} ({100*tool_count/len(dataset):.1f}%)")
    print(f"  - Conversational: {len(dataset) - tool_count}")
    
    # Show a sample
    print("\nSample conversation:")
    print("-" * 80)
    sample = dataset[0]['conversations']
    for msg in sample[:2]:
        role = msg['from']
        text = msg['value'][:200]
        print(f"{role.upper()}: {text}...\n")
    
except FileNotFoundError:
    print(f"❌ Dataset not found at {dataset_path}")
    print("\nPlease upload your training data:")
    print("1. Click folder icon on left panel")
    print("2. Upload: v2/data/processed/training_data_final.json")
    raise

## 5. Format Dataset for Training

Format each conversation in the native Llama 3.1 prompt format, including tool calls and tool responses.
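
Because the cell below assembles the prompt string by hand, a small structural check on the result can catch mistakes such as a tool call that ends with the wrong terminator. This is a sketch under the assumption that every message follows the `<|start_header_id|>role<|end_header_id|>` layout used in that cell; `check_formatted_text` is a hypothetical helper, not part of Unsloth or TRL.

```python
import re

# Matches one message header and captures the role name
HEADER = re.compile(r"<\|start_header_id\|>(\w+)<\|end_header_id\|>\n\n")


def check_formatted_text(text: str) -> list:
    """Return a list of structural problems found in one formatted example."""
    problems = []
    if not text.startswith("<|begin_of_text|>"):
        problems.append("missing <|begin_of_text|> at the start")

    headers = list(HEADER.finditer(text))
    for i, match in enumerate(headers):
        # A message body runs from the end of its header to the next header
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        role, body = match.group(1), text[match.end():end]

        # Assistant tool calls end with <|eom_id|>; everything else with <|eot_id|>
        is_tool_call = role == "assistant" and "<|python_tag|>" in body
        expected = "<|eom_id|>" if is_tool_call else "<|eot_id|>"
        if not body.endswith(expected):
            problems.append(f"message {i} ({role}): should end with {expected}")
    return problems


sample = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helper.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nIs port 443 open?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    '<|python_tag|>{"name": "check_traffic_flow", "parameters": {}}<|eom_id|>'
    "<|start_header_id|>ipython<|end_header_id|>\n\n"
    '{"allowed": true}<|eot_id|>'
)
print(check_formatted_text(sample))  # []
```

After the formatting cell has run, calling `check_formatted_text(dataset[0]["text"])` on a handful of rows gives a quick sanity check before training starts.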

In [None]:
from unsloth.chat_templates import get_chat_template
import json

# Apply Llama 3.1 chat template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

# System prompt for tool calling
SYSTEM_PROMPT = """You are a Network Security Expert AI with FireWeave orchestration capabilities.

Available tools: check_traffic_flow, analyze_attack_path, run_compliance_scan, find_shadowed_rules, create_firewall_rule, get_rule_hit_count, calculate_blast_radius, fetch_jira_issues

When calling tools, use the format: <|python_tag|>{"name": "tool_name", "parameters": {...}}

Provide accurate, detailed technical guidance with specific commands and configurations."""

def formatting_prompts_func(examples):
    """Format conversations for Llama 3.1 native tool calling."""
    conversations = examples["conversations"]
    tools_list = examples.get("tools", [None] * len(conversations))
    texts = []
    
    for convo, tools in zip(conversations, tools_list):
        # Build text manually for proper tool calling format
        text = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        text += SYSTEM_PROMPT
        
        # Add tool definitions if available
        if tools:
            text += "\n\nAvailable tools:\n"
            text += json.dumps(tools, indent=2)
        
        text += "<|eot_id|>"
        
        for turn in convo:
            role = turn.get("from", "")
            value = turn.get("value", "")
            
            if role == "human":
                text += f"<|start_header_id|>user<|end_header_id|>\n\n{value}<|eot_id|>"
            elif role == "gpt":
                # Check if this is a tool call (contains <|python_tag|>)
                if "<|python_tag|>" in value:
                    # Tool call ends with <|eom_id|> (end of message, expecting tool response)
                    text += f"<|start_header_id|>assistant<|end_header_id|>\n\n{value}<|eom_id|>"
                else:
                    # Regular response ends with <|eot_id|>
                    text += f"<|start_header_id|>assistant<|end_header_id|>\n\n{value}<|eot_id|>"
            elif role == "tool":
                # Tool response uses ipython role
                text += f"<|start_header_id|>ipython<|end_header_id|>\n\n{value}<|eot_id|>"
        
        texts.append(text)
    
    return {"text": texts}

# Apply formatting
dataset = dataset.map(formatting_prompts_func, batched=True)

print("✓ Native Llama 3.1 tool calling format applied")
print("\nFormatted example (first 800 chars):")
print("-" * 80)
print(dataset[0]['text'][:800] + "...")

## 6. Configure Training

Set up training hyperparameters.
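
To get a feel for what these settings imply, the step counts can be worked out ahead of time. The sketch below assumes a single GPU, the 17,341-example dataset mentioned in the introduction, and the batch size, accumulation, epoch, and warmup values from the cell that follows. The exact rounding inside the Hugging Face `Trainer` may differ by a step or so, and `training_schedule` is an illustrative name.

```python
import math


def training_schedule(num_examples, per_device_batch, grad_accum, epochs, warmup_ratio):
    """Approximate optimizer-step counts for a single-GPU run."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


schedule = training_schedule(
    num_examples=17_341, per_device_batch=2, grad_accum=4, epochs=3, warmup_ratio=0.03
)
print(schedule)
# {'effective_batch': 8, 'steps_per_epoch': 2168, 'total_steps': 6504, 'warmup_steps': 196}
```

With `save_steps = 500`, a run of this length writes roughly 13 checkpoints, of which `save_total_limit = 2` keeps only the two most recent.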

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

# Training configuration (Optimized for Tool Calling)
training_args = TrainingArguments(
    # Output
    output_dir = "outputs/network-security-v2",
    
    # Batch size & accumulation
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,  # Effective batch size = 8
    
    # Training duration
    num_train_epochs = 3,
    
    # Learning rate (Optimized for tool calling)
    learning_rate = 3e-4,  # Higher LR for tool calling
    lr_scheduler_type = "cosine",  # Cosine decay
    warmup_ratio = 0.03,  # 3% warmup
    
    # Optimization
    weight_decay = 0.01,
    optim = "adamw_8bit",
    
    # Logging & saving
    logging_steps = 25,
    save_strategy = "steps",
    save_steps = 500,
    save_total_limit = 2,
    
    # Mixed precision
    fp16 = not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    
    # Misc
    seed = 42,
    report_to = "none",
)

print("✓ Training configuration (Optimized for Tool Calling):")
print(f"  - Epochs: {training_args.num_train_epochs}")
print(f"  - Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"  - Learning rate: {training_args.learning_rate}")
print(f"  - LR Scheduler: {training_args.lr_scheduler_type}")
print(f"  - Warmup ratio: {training_args.warmup_ratio}")
print(f"  - Mixed precision: {'BF16' if training_args.bf16 else 'FP16'}")

## 7. Initialize Trainer

Create the SFTTrainer for supervised fine-tuning.

In [None]:
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Setting packing=True can speed up training on short sequences
    args = training_args,
)

print("✓ Trainer initialized")
print(f"\nTraining {len(dataset)} examples")
print(f"Estimated training time: ~{len(dataset) * 3 / 3600:.1f} hours (very rough estimate)")

## 8. Start Training

Begin the fine-tuning process. Monitor the loss - it should decrease over time.

**Target loss:** 0.5-1.0 is generally good.

**Red flags:**
- Loss not decreasing → adjust the learning rate
- Loss near 0 → likely overfitting; reduce epochs
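
The red flags above can also be checked programmatically once some steps have been logged. The Hugging Face `Trainer` records its logs in `trainer.state.log_history`, a list of dictionaries in which training entries carry a `loss` key. The sketch below compares the first and last few logged losses; the window size and both thresholds are arbitrary illustrative values, not recommendations from this notebook, and `summarize_loss` is a hypothetical helper.

```python
def summarize_loss(log_history, window=5, min_drop=0.01, overfit_below=0.1):
    """Classify a loss curve as 'decreasing', 'plateau', or 'possible_overfit'."""
    # Entries without a 'loss' key (e.g. the final runtime summary) are skipped
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2 * window:
        return "too_few_points"

    early = sum(losses[:window]) / window     # mean of the first logged losses
    recent = sum(losses[-window:]) / window   # mean of the most recent losses

    if recent < overfit_below:
        return "possible_overfit"
    if early - recent < min_drop:
        return "plateau"
    return "decreasing"


# Synthetic log for illustration: loss falls steadily from 2.0 to 0.9
fake_log = [{"step": 25 * (i + 1), "loss": 2.0 - 0.1 * i} for i in range(12)]
fake_log.append({"train_runtime": 123.4})  # summary entry without a loss
print(summarize_loss(fake_log))  # decreasing
```

After training, `summarize_loss(trainer.state.log_history)` gives a one-word verdict, which is a starting point for inspection rather than a substitute for looking at the curve.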

In [None]:
# Start training
print("Starting training...")
print("Watch the loss values - they should decrease over time.\n")

trainer_stats = trainer.train()

print("\n" + "="*80)
print("✅ TRAINING COMPLETE!")
print("="*80)
print(f"Final loss: {trainer_stats.training_loss:.4f}")
print(f"Training time: {trainer_stats.metrics['train_runtime']:.2f} seconds")

## 9. Test the Model

Try out your fine-tuned model with some network security questions.
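
Since the model is trained to emit tool calls as `<|python_tag|>{"name": ..., "parameters": {...}}`, it is worth checking whether a generated response parses as one. The helper below is a minimal sketch; `parse_tool_call` is a hypothetical name and the example arguments are invented. Decoding with `skip_special_tokens=True` may strip the `<|python_tag|>` marker, so the helper accepts the JSON payload with or without it.

```python
import json

PYTHON_TAG = "<|python_tag|>"


def parse_tool_call(response: str):
    """Return (name, parameters) if the response is a tool call, otherwise None."""
    text = response.strip()
    if text.startswith(PYTHON_TAG):
        text = text[len(PYTHON_TAG):].strip()

    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None  # ordinary prose answer, not a tool call

    if not isinstance(payload, dict):
        return None
    name, params = payload.get("name"), payload.get("parameters")
    if isinstance(name, str) and isinstance(params, dict):
        return name, params
    return None


# Hypothetical responses for illustration
tagged = '<|python_tag|>{"name": "get_rule_hit_count", "parameters": {"rule_id": "42"}}'
print(parse_tool_call(tagged))
print(parse_tool_call("Port security limits the MAC addresses allowed on a port."))
```

A test prompt that should trigger a tool, such as a request to check traffic between two hosts, can then be asserted on with `parse_tool_call(response) is not None`.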

In [None]:
# Enable inference mode
FastLanguageModel.for_inference(model)

# Test questions
test_questions = [
    "How do I configure port security on a Cisco switch?",
    "Explain the difference between AWS Security Groups and Network ACLs.",
    "What Snort rules would detect SQL injection attempts?",
    "My VPN tunnel keeps dropping. How do I troubleshoot this?",
]

print("Testing the fine-tuned model...\n")
print("="*80)

for i, question in enumerate(test_questions, 1):
    print(f"\nTest {i}/{len(test_questions)}")
    print("-"*80)
    print(f"Question: {question}\n")
    
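    # Format as chat, reusing the system prompt the model was trained with
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question}
    ]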
    
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to("cuda")
    
    # Generate response
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )
    
    # Decode only the newly generated tokens (everything after the prompt)
    response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True).strip()
    
    print(f"Answer:\n{response}")
    print("\n" + "="*80)

print("\n✅ Testing complete!")

## 10. Save the Model

Save the LoRA adapter (small ~100-200MB file).

In [None]:
# Save locally
model.save_pretrained("../models/network-security-lora")
tokenizer.save_pretrained("../models/network-security-lora")

print("✓ Model saved locally to: ../models/network-security-lora")
print("\nOptional: Upload to Hugging Face Hub")
print("Uncomment and run the cell below to upload.")

In [None]:
# Optional: Push to Hugging Face Hub
# Replace 'your-username' with your HF username

# model.push_to_hub("your-username/llama3-network-security-lora", token="YOUR_HF_TOKEN")
# tokenizer.push_to_hub("your-username/llama3-network-security-lora", token="YOUR_HF_TOKEN")

# print("✓ Model uploaded to Hugging Face Hub!")

## 11. Export to GGUF for Ollama

Convert to GGUF format for use with Ollama on your local machine.

In [None]:
# First, save merged 16-bit model
print("Step 1: Merging LoRA with base model...")
model.save_pretrained_merged(
    "models/merged-16bit",
    tokenizer,
    save_method="merged_16bit"
)
print("✓ Merged model saved\n")

# Convert to GGUF with multiple quantization levels
print("Step 2: Converting to GGUF format...")
print("This will create 3 quantized versions (Q4_K_M, Q5_K_M, Q8_0)\n")

model.save_pretrained_gguf(
    "models/gguf",
    tokenizer,
    quantization_method=["q4_k_m", "q5_k_m", "q8_0"]
)

print("\n" + "="*80)
print("✅ GGUF CONVERSION COMPLETE!")
print("="*80)
print("\nCreated files:")
print("  - models/gguf/unsloth.Q4_K_M.gguf (~4.5GB) - Fastest, good quality")
print("  - models/gguf/unsloth.Q5_K_M.gguf (~5.5GB) - Balanced [RECOMMENDED]")
print("  - models/gguf/unsloth.Q8_0.gguf (~8GB) - Highest quality")
print("\nNext steps:")
print("1. Download the Q4_K_M or Q5_K_M file")
print("2. Rename to: network-security-expert-v2.Q4_K_M.gguf")
print("3. Place in v2/models/gguf/ folder")
print("4. Run: cd v2/models && ollama create network-security-expert-v2 -f Modelfile")
print("5. Test: ollama run network-security-expert-v2")

## Summary

Congratulations! You've fine-tuned Llama 3.1 8B Instruct to be a Network Security expert.

### What You've Done:
1. ✅ Loaded Llama 3.1 8B Instruct with 4-bit quantization
2. ✅ Configured LoRA adapters for efficient training
3. ✅ Trained on your network security dataset
4. ✅ Tested the model with example questions
5. ✅ Saved the LoRA adapter
6. ✅ Converted to GGUF for Ollama deployment

### Next Steps:

**To use with Ollama:**
```bash
# 1. Create Modelfile (see v2/models/Modelfile)
cd v2/models
ollama create network-security-expert-v2 -f Modelfile

# 2. Run your model
ollama run network-security-expert-v2
```
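
The Modelfile that `ollama create` expects is not shown in this notebook. As a rough sketch of what a minimal one might contain, the helper below writes the `FROM`, `PARAMETER`, and `SYSTEM` instructions from Ollama's Modelfile syntax. The relative `./gguf/` path, the temperature, the stop tokens, and the function name `build_modelfile` are all assumptions for illustration; the project's actual Modelfile may differ and may also need a `TEMPLATE` instruction for the Llama 3.1 chat format.

```python
def build_modelfile(gguf_name: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Build a minimal Ollama Modelfile as a string (illustrative defaults only)."""
    # Note: the system prompt must not itself contain triple double quotes
    lines = [
        f"FROM ./gguf/{gguf_name}",               # path relative to the Modelfile
        f"PARAMETER temperature {temperature}",
        'PARAMETER stop "<|eot_id|>"',            # end of a normal assistant turn
        'PARAMETER stop "<|eom_id|>"',            # end of a tool-call message
        f'SYSTEM """{system_prompt}"""',
    ]
    return "\n".join(lines) + "\n"


modelfile = build_modelfile(
    "network-security-expert-v2.Q4_K_M.gguf",
    "You are a Network Security Expert AI with FireWeave orchestration capabilities.",
)
print(modelfile)
```

Writing the returned string to a file named `Modelfile` next to the `gguf/` folder is then enough for the `ollama create` command to pick it up, assuming the GGUF file has been renamed as described in the export step.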

**To improve your model:**
- Generate more training data for weak areas
- Increase training epochs if underfitting
- Add more diverse scenarios and edge cases
- Combine with RAG for up-to-date CVE information

**Questions or issues?**
- Check the plan file for troubleshooting tips
- Review training loss curves
- Validate your dataset quality

🎉 Happy network securing!