# Scenario Template — SOC + ML Hybrid Notebook

This notebook is a template for creating new training scenarios.

## Instructions for Scenario Authors
- Replace all placeholder text marked with **AUTHOR TODO**.
- Do not remove required sections (Setup, SOC Analysis, ML Analysis, Output Generation).
- Ensure your data generator writes logs to `logs/` and an answer key to `evaluation/answer_key.json`.
- Ensure your evaluation scripts read the files generated in Step 6 (`student_output/soc_output.json` and `student_output/ml_output.json`).
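
The generator requirement above could be met with something like the following minimal sketch. It assumes the log columns that the later cells read (`timestamp`, `source_ip`, `destination_host`). The function name `generate_scenario`, the example addresses, and the answer-key layout are hypothetical; align them with whatever your evaluation scripts actually expect.

```python
import json
import os
import tempfile

import numpy as np
import pandas as pd


def generate_scenario(base_dir=".", n_events=200, seed=0):
    """Write a toy log file and answer key (hypothetical schema)."""
    rng = np.random.default_rng(seed)
    logs_dir = os.path.join(base_dir, "logs")
    eval_dir = os.path.join(base_dir, "evaluation")
    os.makedirs(logs_dir, exist_ok=True)
    os.makedirs(eval_dir, exist_ok=True)

    # Benign traffic: one workstation active between 08:00 and 18:00.
    start = pd.Timestamp("2024-01-01 08:00:00")
    offsets = pd.to_timedelta(rng.integers(0, 10 * 3600, n_events), unit="s")
    benign = pd.DataFrame({
        "timestamp": start + offsets,
        "source_ip": "10.0.0.5",
        "destination_host": rng.choice(["fileserver", "mail", "intranet"], n_events),
    })

    # Injected activity from an illustrative address in a documentation range.
    malicious_ip = "203.0.113.7"
    attack = pd.DataFrame({
        "timestamp": pd.to_datetime([
            "2024-01-01 02:13:00", "2024-01-01 02:14:30", "2024-01-01 02:20:10",
        ]),
        "source_ip": malicious_ip,
        "destination_host": "fileserver",
    })

    logs = pd.concat([benign, attack]).sort_values("timestamp").reset_index(drop=True)
    log_path = os.path.join(logs_dir, "generated_logs.csv")
    logs.to_csv(log_path, index=False)

    # Hypothetical answer-key layout; adapt it to your grader.
    answer_key = {"ioc_list": [malicious_ip], "mitre_mapping": ["T1078"]}
    key_path = os.path.join(eval_dir, "answer_key.json")
    with open(key_path, "w") as f:
        json.dump(answer_key, f, indent=4)
    return log_path, key_path


# Demo in a throwaway directory so the sketch does not touch real scenario files.
demo_dir = tempfile.mkdtemp()
log_path, key_path = generate_scenario(demo_dir)
```

Calling `generate_scenario()` with no arguments from the scenario root would write `logs/generated_logs.csv` and `evaluation/answer_key.json`, matching the paths named above.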

## Instructions for Students
You will:
1. Load scenario data
2. Perform SOC-style log analysis
3. Perform ML-based anomaly detection or classification
4. Generate structured outputs for automated grading

Follow each step carefully. Cells marked **STUDENT TASK** require your input.

## Step 1 — Scenario Overview

**AUTHOR TODO:** Replace this with the narrative for your scenario.

Example:
> A user account has exhibited unusual login behavior. You must determine whether this activity is malicious using both SOC analysis and ML modeling.

## Step 2 — Setup

This cell loads required libraries and sets up paths.

Students should not modify this cell unless instructed.

In [ ]:
import pandas as pd
import numpy as np
import json
import os
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import LabelEncoder

print("Libraries loaded.")

## Step 3 — Load Scenario Data

**AUTHOR TODO:** Update the file path below to match your scenario's log file.

**STUDENT TASK:** Review the logs and begin identifying suspicious activity.

In [ ]:
# AUTHOR TODO: Update this path
log_path = "logs/generated_logs.csv"

df = pd.read_csv(log_path)
df.head()
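
Before moving on, it can help to confirm that the loaded file has the columns the later cells use. The sketch below is one possible check, not part of the template's required workflow; `REQUIRED_COLUMNS` and `check_log_schema` are hypothetical names, and the inline CSV stands in for a real log file.

```python
import io

import pandas as pd

# Columns read by the SOC and ML cells in this template.
REQUIRED_COLUMNS = {"timestamp", "source_ip", "destination_host"}


def check_log_schema(frame):
    """Raise if a required column is missing; return the count of bad timestamps."""
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f"Log file is missing columns: {sorted(missing)}")
    # Unparseable timestamps become NaT instead of raising.
    parsed = pd.to_datetime(frame["timestamp"], errors="coerce")
    return int(parsed.isna().sum())


# Small inline sample standing in for logs/generated_logs.csv.
sample_csv = io.StringIO(
    "timestamp,source_ip,destination_host\n"
    "2024-01-01 08:00:00,10.0.0.5,fileserver\n"
    "not-a-date,10.0.0.5,mail\n"
)
sample = pd.read_csv(sample_csv)
bad_timestamps = check_log_schema(sample)
```

In the notebook itself, `check_log_schema(df)` would run the same check on the scenario data.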

## Step 4 — SOC Analysis

In this section, you will:
- Identify suspicious events
- Extract Indicators of Compromise (IOCs)
- Map activity to MITRE ATT&CK
- Write a detection rule

### STUDENT TASK: Fill in the analysis below.

In [ ]:
# STUDENT TASK: Identify suspicious events
# Example heuristic: flag every event whose source IP is not the most frequent one
most_common_ip = df['source_ip'].mode()[0]
suspicious_df = df[df['source_ip'] != most_common_ip]
suspicious_df

In [ ]:
# STUDENT TASK: Extract IOCs
ioc_list = list(suspicious_df['source_ip'].unique())
ioc_list

In [ ]:
# STUDENT TASK: MITRE ATT&CK mapping
mitre_mapping = ["T1078"]  # Example placeholder (T1078 = Valid Accounts)
mitre_mapping

In [ ]:
# STUDENT TASK: Detection rule
detection_rule = "source_ip == 'REPLACE_ME'"  # Replace with real logic
detection_rule
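
The template does not say which rule syntax the grader expects. If the rule is written as a pandas `DataFrame.query` expression, as the placeholder string happens to be, it can be sanity-checked against the logs before submission. The sketch below assumes that syntax and uses a small made-up table with illustrative addresses.

```python
import pandas as pd

# Toy events standing in for the scenario logs (addresses are illustrative).
events = pd.DataFrame({
    "source_ip": ["10.0.0.5", "10.0.0.5", "203.0.113.7", "10.0.0.5"],
    "destination_host": ["mail", "fileserver", "fileserver", "intranet"],
})

# A candidate rule expressed as a pandas query string (an assumed syntax).
candidate_rule = "source_ip == '203.0.113.7'"

# Rows the rule would alert on, and the fraction of all events it matches.
matches = events.query(candidate_rule)
hit_rate = len(matches) / len(events)
print(f"Rule matched {len(matches)} of {len(events)} events ({hit_rate:.0%})")
```

A rule that matches nearly every event, or none at all, is usually worth revisiting before it goes into `soc_output.json`.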

## Step 5 — ML Analysis

In this section, you will:
- Engineer features
- Train an anomaly detection or classification model
- Score suspicious events

### STUDENT TASK: Implement ML logic below.

In [ ]:
# STUDENT TASK: Feature engineering example
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['hour'] = df['timestamp'].dt.hour

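# Use one encoder per column so each mapping can still be inverted later
src_encoder = LabelEncoder()
dst_encoder = LabelEncoder()
df['src_enc'] = src_encoder.fit_transform(df['source_ip'])
df['dst_enc'] = dst_encoder.fit_transform(df['destination_host'])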

features = df[['hour', 'src_enc', 'dst_enc']]
features.head()

In [ ]:
# STUDENT TASK: Train model
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(features)
print("Model trained.")

In [ ]:
# STUDENT TASK: Score suspicious events
# decision_function returns lower (more negative) scores for more anomalous events
sus_features = features.loc[suspicious_df.index]
anomaly_score = float(model.decision_function(sus_features).mean())
anomaly_score
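
A single mean score hides which events drive it. As a rough illustration of how `IsolationForest` scores and labels individual rows, the sketch below fits the same model settings on synthetic features (not scenario data) with one deliberately extreme event appended.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 200 synthetic "normal" events: business hours, a few known hosts.
normal = pd.DataFrame({
    "hour": rng.integers(9, 17, 200),
    "src_enc": rng.integers(0, 3, 200),
    "dst_enc": rng.integers(0, 3, 200),
})
# One deliberately extreme event: 3 a.m., unseen source and destination codes.
outlier = pd.DataFrame({"hour": [3], "src_enc": [40], "dst_enc": [40]})
toy = pd.concat([normal, outlier], ignore_index=True)

toy_model = IsolationForest(contamination=0.05, random_state=42).fit(toy)

scores = toy_model.decision_function(toy)  # negative values fall on the anomaly side
labels = toy_model.predict(toy)            # -1 = anomaly, 1 = normal

# Most anomalous events first.
report = toy.assign(score=scores, label=labels).sort_values("score")
print(report.head())
```

Sorting by score in this way makes it easier to justify the `explanation` field with specific events instead of an average.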

## Step 6 — Generate Required Outputs

This section creates the files required for automated grading and writes them to `student_output/`:
- `soc_output.json`
- `ml_output.json`

### Students should not modify the structure of these files.

In [ ]:
os.makedirs("student_output", exist_ok=True)

soc_output = {
    "ioc_list": ioc_list,
    "mitre_mapping": mitre_mapping,
    "triage_summary": "STUDENT TODO: Summarize findings.",
    "detection_rule": detection_rule
}

ml_output = {
    "anomaly_score": anomaly_score,
    "model_used": "IsolationForest",
    "features": list(features.columns),
    "explanation": "STUDENT TODO: Explain ML findings."
}

with open("student_output/soc_output.json", "w") as f:
    json.dump(soc_output, f, indent=4)

with open("student_output/ml_output.json", "w") as f:
    json.dump(ml_output, f, indent=4)

print("Outputs saved to student_output/")
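
Because the grader depends on the structure of these files, a quick self-check before submission can catch a dropped key. The sketch below is one possible check; `REQUIRED_KEYS` mirrors the dictionaries written above, while `find_missing_keys` is a hypothetical helper and the demo files are made up.

```python
import json
import os
import tempfile

# Keys written by the output cell of this template.
REQUIRED_KEYS = {
    "soc_output.json": ["ioc_list", "mitre_mapping", "triage_summary", "detection_rule"],
    "ml_output.json": ["anomaly_score", "model_used", "features", "explanation"],
}


def find_missing_keys(output_dir):
    """Return {filename: [missing keys]} for absent or incomplete output files."""
    problems = {}
    for filename, required in REQUIRED_KEYS.items():
        path = os.path.join(output_dir, filename)
        if not os.path.exists(path):
            problems[filename] = list(required)
            continue
        with open(path) as f:
            data = json.load(f)
        missing = [key for key in required if key not in data]
        if missing:
            problems[filename] = missing
    return problems


# Demo: a complete SOC file and an ML file that lacks "explanation".
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "soc_output.json"), "w") as f:
    json.dump({"ioc_list": [], "mitre_mapping": [],
               "triage_summary": "", "detection_rule": ""}, f)
with open(os.path.join(demo_dir, "ml_output.json"), "w") as f:
    json.dump({"anomaly_score": -0.1, "model_used": "IsolationForest",
               "features": ["hour"]}, f)

problems = find_missing_keys(demo_dir)
print(problems)
```

In the notebook, `find_missing_keys("student_output")` should return an empty dictionary when both files are complete.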