# 🎯 CTI-to-Hunt Logic Fine-Tuning (MacBook Local)

**Goal**: Fine-tune Phi-3-mini-4k-instruct locally on Apple Silicon to convert cyber threat intelligence (CTI) text into hunt logic.

**Hardware**: MacBook with MPS (Metal Performance Shaders)
**Model**: Phi-3-mini-4k-instruct (3.8B parameters, no Hugging Face authentication required)

## 🔧 Step 1: Environment Check

In [9]:
import torch
import sys

print(f"Python version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"MPS built: {torch.backends.mps.is_built()}")

device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

Python version: 3.11.5 (v3.11.5:cce6ba91b3, Aug 24 2023, 10:50:31) [Clang 13.0.0 (clang-1300.0.29.30)]
PyTorch version: 2.8.0
MPS available: True
MPS built: True
Using device: mps
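
If MPS is reported as unavailable on an Apple Silicon Mac, one thing worth ruling out is a non-native interpreter: an x86_64 Python running under Rosetta may not expose MPS. The sketch below uses only the standard library; the helper name `is_native_apple_silicon` is hypothetical and not part of this notebook.

```python
import platform

def is_native_apple_silicon(system: str, machine: str) -> bool:
    """True when the interpreter reports macOS on an arm64 CPU."""
    return system == "Darwin" and machine == "arm64"

# Query the running interpreter; under Rosetta, machine is reported as x86_64
system, machine = platform.system(), platform.machine()
print(f"System: {system}, machine: {machine}")
print(f"Native Apple Silicon Python: {is_native_apple_silicon(system, machine)}")
```

Keeping the check as a pure function of two strings makes it easy to test on any platform.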


## 📦 Step 2: Load Phi-3-mini-4k-instruct

In [10]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "microsoft/Phi-3-mini-4k-instruct"
print(f"Loading {model_name} (no auth required)...")

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

if torch.backends.mps.is_available():
    model = model.to("mps")

print("Model loaded successfully!")
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Device: {next(model.parameters()).device}")

Loading microsoft/Phi-3-mini-4k-instruct (no auth required)...


Model loaded successfully!
Parameters: 3,821,079,552
Device: mps:0
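
As a rough check on whether the weights fit in unified memory, the printed parameter count can be converted into a size estimate. This is a minimal sketch that assumes 2 bytes per parameter for float16 and counts weights only (no activations, KV cache, gradients, or optimizer state); the helper name `weight_memory_gib` is hypothetical.

```python
# Bytes used to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gib(num_params: int, dtype: str = "float16") -> float:
    """Approximate size of the model weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)

phi3_params = 3_821_079_552  # parameter count printed by the cell above
print(f"float16 weights: {weight_memory_gib(phi3_params):.2f} GiB")
print(f"float32 weights: {weight_memory_gib(phi3_params, 'float32'):.2f} GiB")
```

Fine-tuning needs noticeably more than this figure, which is one reason parameter-efficient methods such as LoRA are attractive on a laptop.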


## 🧪 Step 3: Test Base Model

In [11]:
# Phi-3 chat markers; the [INST] ... [/INST] wrapper is Mistral's format and does not match this model
test_prompt = "<|user|>Convert this cyber threat intelligence into concise hunt logic: The malware establishes persistence by creating a new Windows service named WindowsUpdateService that executes a payload from C:\\Windows\\Temp\\update.exe. Hunt Logic:<|end|><|assistant|>"

inputs = tokenizer(test_prompt, return_tensors="pt")
if torch.backends.mps.is_available():
    inputs = {k: v.to("mps") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens so the prompt is not echoed in the result
prompt_length = inputs["input_ids"].shape[1]
hunt_logic = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print("🎯 Base Model Output:")
print("=" * 50)
print(hunt_logic)
print("=" * 50)

🎯 Base Model Output:
To create a hunt logic for detecting the mentioned malware behavior, we need to define the indicators of compromise (IoCs) and the logic to search for them. Here's an example of hunt logic that could be used to detect the malware's persistence mechanism:

```powershell
# Define the search parameters
$serviceName = "WindowsUpdateService"
$payloadPath = "C:\Windows\Temp\update.exe"
```


## 📝 Step 4: Training Data

In [None]:
import pandas as pd
from pathlib import Path

# Load your CSV training data
csv_file_path = input("Enter path to your CSV file (or drag and drop): ").strip()

# Remove quotes if drag-and-drop added them
csv_file_path = csv_file_path.strip('"').strip("'")

if Path(csv_file_path).exists():
    print(f"Loading training data from: {csv_file_path}")
    
    # Load CSV
    df = pd.read_csv(csv_file_path)
    print(f"✅ Loaded {len(df)} rows")
    print(f"📊 Columns: {list(df.columns)}")

    # Show first few rows to understand format
    print("\n🔍 Sample data:")
    print(df.head(3))

    # Ask user to identify the columns
    print("\n📝 Please identify your columns:")
    input_column = input("Which column contains the CTI text? ").strip()
    output_column = input("Which column contains the hunt logic? (leave empty if none): ").strip()
    
    if input_column in df.columns:
        if output_column and output_column in df.columns:
            # Drop incomplete rows from both columns together. Calling dropna()
            # on each column separately can pair a CTI text with the wrong hunt logic.
            pairs = df[[input_column, output_column]].dropna()
            print(f"✅ Found {len(pairs)} complete CTI/hunt logic pairs")

            # Create training pairs
            training_data = []
            for cti, hunt in zip(pairs[input_column], pairs[output_column]):
                training_data.append({
                    "input": f"<|user|>Convert this threat intelligence into concise hunt logic: {cti}<|end|><|assistant|>",
                    "output": str(hunt)
                })

            print(f"🎯 Created {len(training_data)} training examples")

        else:
            cti_texts = df[input_column].dropna().tolist()
            print(f"✅ Found {len(cti_texts)} CTI texts (no hunt logic column)")
            print("💡 You can manually add hunt logic or use the model to generate examples")

            # Show some examples for manual labeling
            print("\n📋 First 3 examples for manual review:")
            for i, text in enumerate(cti_texts[:3]):
                print(f"\n{i+1}. CTI: {str(text)[:200]}...")

    else:
        print(f"❌ Column '{input_column}' not found in CSV")

else:
    print(f"❌ File not found: {csv_file_path}")
    print("💡 Make sure the path is correct or drag and drop the file")

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output

def create_hunt_logic_interface():
    # Create input widget
    input_text = widgets.Textarea(
        value='',
        placeholder='Paste your CTI text here (e.g., "The malware creates a service...")',
        description='CTI Input:',
        layout=widgets.Layout(width='100%', height='150px')
    )
    
    # Create output widget
    output_area = widgets.Output()
    
    # Create button
    generate_button = widgets.Button(
        description='🎯 Generate Hunt Logic',
        button_style='primary',
        layout=widgets.Layout(width='200px')
    )
    
    # Create clear button
    clear_button = widgets.Button(
        description='🗑️ Clear',
        button_style='warning',
        layout=widgets.Layout(width='100px')
    )
    
    def on_generate_click(b):
        with output_area:
            clear_output()
            
            cti_input = input_text.value.strip()
            if not cti_input:
                print("❌ Please enter some CTI text first!")
                return

            print("🔄 Generating hunt logic...")
            print(f"📝 Input: {cti_input[:100]}{'...' if len(cti_input) > 100 else ''}")
            print("\n" + "="*50)
            
            try:
                # Format prompt for Phi-3
                prompt = f"<|user|>Convert this cyber threat intelligence into concise hunt logic:\n\n{cti_input}\n\nHunt Logic:<|end|><|assistant|>"
                
                # Tokenize and generate
                inputs = tokenizer(prompt, return_tensors="pt")
                if torch.backends.mps.is_available():
                    inputs = {k: v.to("mps") for k, v in inputs.items()}
                
                with torch.no_grad():
                    outputs = model.generate(
                        **inputs,
                        max_new_tokens=150,
                        temperature=0.3,
                        do_sample=True,
                        pad_token_id=tokenizer.eos_token_id,
                        eos_token_id=tokenizer.eos_token_id
                    )
                
                # Decode only the newly generated tokens. Special tokens such as
                # <|assistant|> can be stripped by skip_special_tokens=True, which
                # makes splitting the full text on that marker unreliable.
                prompt_length = inputs["input_ids"].shape[1]
                hunt_logic = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

                print("🎯 HUNT LOGIC GENERATED:")
                print("-" * 30)
                print(hunt_logic)
                print("-" * 30)
                print(f"✅ Generated {len(hunt_logic)} characters")

            except Exception as e:
                print(f"❌ Error generating hunt logic: {str(e)}")
    
    def on_clear_click(b):
        input_text.value = ''
        with output_area:
            clear_output()
    
    # Bind functions to buttons
    generate_button.on_click(on_generate_click)
    clear_button.on_click(on_clear_click)
    
    # Create layout
    buttons = widgets.HBox([generate_button, clear_button])
    interface = widgets.VBox([
        widgets.HTML("<h3>🎯 CTI-to-Hunt Logic Generator</h3>"),
        input_text,
        buttons,
        output_area
    ])
    
    return interface

# Create and display the interface
print("🚀 Creating interactive CTI-to-Hunt Logic interface...")
interface = create_hunt_logic_interface()
display(interface)
print("✅ Interface ready! Paste CTI text above and click 'Generate Hunt Logic'")
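
The prompt formatting done inside the click handler can also be factored into small pure functions, which makes it testable without loading the model. This sketch mirrors the `<|user|>...<|end|><|assistant|>` layout used in the interface; the function names `build_prompt` and `preview` are hypothetical.

```python
def build_prompt(cti_text: str) -> str:
    """Wrap CTI text in the chat markers used by the interface above."""
    instruction = "Convert this cyber threat intelligence into concise hunt logic:"
    return f"<|user|>{instruction}\n\n{cti_text.strip()}\n\nHunt Logic:<|end|><|assistant|>"

def preview(text: str, limit: int = 100) -> str:
    """Shorten long input for display, as the interface does for its echo line."""
    return text[:limit] + ("..." if len(text) > limit else "")

prompt = build_prompt("  The malware creates a service named EvilSvc.  ")
print(prompt)
print(preview("x" * 150))
```

Separating formatting from generation also keeps the training prompts and the inference prompts from drifting apart, since both can call the same helper.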

In [12]:
# Prompts use the Phi-3 chat markers so they match the model loaded in Step 2
training_examples = [
    {
        "input": "<|user|>Convert this threat intelligence into concise hunt logic: The malware creates a scheduled task named SystemUpdate that runs C:\\Users\\Public\\svchost.exe every 5 minutes with SYSTEM privileges. Hunt Logic:<|end|><|assistant|>",
        "output": "Scheduled Task Creation: Name=SystemUpdate\nProcess Execution: C:\\Users\\Public\\svchost.exe\nPrivilege Escalation: SYSTEM context\nPersistence: Auto-start"
    },
    {
        "input": "<|user|>Convert this threat intelligence into concise hunt logic: Network traffic shows malware communicating with 192.168.1.100:8080 via HTTP POST with Mozilla User-Agent and base64 encoded data. Hunt Logic:<|end|><|assistant|>",
        "output": "Network Connection: 192.168.1.100:8080\nProtocol: HTTP POST\nUser-Agent: Mozilla/5.0\nData Encoding: Base64"
    }
]

print(f"📊 Training examples ready: {len(training_examples)}")

📊 Training examples ready: 2
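
For supervised fine-tuning, each example is typically flattened into a single string: the prompt, then the target, then an end-of-turn marker so the model learns where to stop. The sketch below shows one way to do that and to serialize the result as JSON Lines. The marker is model specific; `<|end|>` is used on the assumption that Phi-3 is being tuned, and `to_training_text`, `to_jsonl`, and the sample record are hypothetical.

```python
import json

def to_training_text(example: dict, end_marker: str = "<|end|>") -> str:
    """Concatenate prompt and target, closing the assistant turn with end_marker."""
    return f"{example['input']}{example['output']}{end_marker}"

def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSON Lines, one {"text": ...} record per line."""
    return "\n".join(json.dumps({"text": to_training_text(ex)}) for ex in examples)

# Hypothetical stand-in for the training examples defined above
examples = [
    {
        "input": "<|user|>Convert: creates task SystemUpdate<|end|><|assistant|>",
        "output": "Scheduled Task Creation: Name=SystemUpdate",
    },
]
jsonl = to_jsonl(examples)
print(jsonl)
```

Passing a different `end_marker` adapts the same helper to another model's chat format.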


## 🎯 Step 5: Setup Complete

Your MacBook fine-tuning environment is ready:
- Phi-3-mini-4k-instruct loaded with MPS acceleration
- Training examples formatted as prompt/response pairs
- Ready for LoRA fine-tuning
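
To get a feel for how small a LoRA adapter is relative to the base model: a rank-r adapter on a weight matrix with d_in inputs and d_out outputs adds r × (d_in + d_out) trainable parameters. The sketch below applies that formula to the attention projections only. It assumes Phi-3-mini's published shape (hidden size 3072, 32 layers, a fused `qkv_proj` three times the hidden width); the rank of 8 and the choice of target layers are illustrative assumptions, not settings taken from this notebook.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

HIDDEN = 3072                 # assumed Phi-3-mini hidden size
NUM_LAYERS = 32               # assumed number of transformer layers
RANK = 8                      # illustrative LoRA rank
BASE_PARAMS = 3_821_079_552   # parameter count printed in Step 2

per_layer = (
    lora_params(HIDDEN, 3 * HIDDEN, RANK)  # fused qkv_proj
    + lora_params(HIDDEN, HIDDEN, RANK)    # o_proj
)
trainable = per_layer * NUM_LAYERS
print(f"Trainable LoRA parameters: {trainable:,}")
print(f"Share of base model: {trainable / BASE_PARAMS:.4%}")
```

Under these assumptions the adapter is well under one percent of the base model, which is what makes LoRA practical within a laptop's memory budget.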