# Scenario 01 — Suspicious Login + Lateral Movement (Guided)

Welcome to Scenario 01. In this guided notebook, you will:

1. Load authentication logs
2. Perform SOC-style triage
3. Extract Indicators of Compromise (IOCs)
4. Map activity to MITRE ATT&CK
5. Load historical login data
6. Engineer features for anomaly detection
7. Train a simple ML model
8. Score the suspicious login
9. Generate your SOC + ML outputs
10. Push results back to GitHub

This notebook is fully guided — each step explains what you're doing and why.

## Step 1 — Setup
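This step loads the required libraries. The cell below does not mount or clone the repository itself, so the relative paths used in later steps assume the notebook runs from the repository root.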

If you're running this in Google Colab, you'll authenticate with GitHub.

In [None]:
import pandas as pd
import numpy as np
from sklearn.ensemble import IsolationForest
import json
import os

print("Libraries loaded.")

## Step 2 — Load Authentication Logs (SOC Analysis)

These logs contain normal activity **and** a suspicious login event.

Your job: identify what looks unusual.

In [None]:
auth_log_path = "scenarios/scenario_01/logs/auth_log.csv"
auth_df = pd.read_csv(auth_log_path)
auth_df

### Step 2A — Identify Suspicious Activity

Look for:
- New IP addresses
- New devices
- Odd login hours
- New destination hosts
- Unusual geo locations

Fill in your observations below.

In [None]:
# Treat 10.0.1.15 as the user's known source IP; any other source is flagged for review.
suspicious_events = auth_df[auth_df['source_ip'] != '10.0.1.15']
suspicious_events
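
The checklist above can also be expressed as per-event flags by comparing each row against a baseline of known-good values. The following is a minimal sketch, not the scenario's reference solution: the sample rows, the baseline sets `known_ips` and `known_devices`, and the 07:00 to 18:59 working-hours window are all hypothetical, and the real `auth_log.csv` may use different values.

```python
import pandas as pd

# Hypothetical sample events; the real auth_log.csv may contain different values.
demo_events = pd.DataFrame({
    "timestamp": ["2024-01-10 09:05:00", "2024-01-10 03:12:00"],
    "source_ip": ["10.0.1.15", "203.0.113.7"],
    "device_id": ["LAPTOP-01", "UNKNOWN-99"],
})

# Assumed baseline of known-good values for this user.
known_ips = {"10.0.1.15"}
known_devices = {"LAPTOP-01"}

hours = pd.to_datetime(demo_events["timestamp"]).dt.hour

demo_flags = pd.DataFrame({
    "new_ip": ~demo_events["source_ip"].isin(known_ips),
    "new_device": ~demo_events["device_id"].isin(known_devices),
    # Assumed working hours: 07:00 to 18:59 (between() is inclusive on both ends).
    "odd_hour": ~hours.between(7, 18),
})

# Count how many indicators fired for each event and keep the flagged rows.
demo_events["flag_count"] = demo_flags.sum(axis=1)
demo_suspicious = demo_events[demo_events["flag_count"] > 0]
```

Counting flags instead of filtering on a single column makes it easier to rank events when several indicators fire at once.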

### Step 2B — Extract IOCs

Fill in the IOC list manually or programmatically.

In [None]:
ioc_list = list(suspicious_events['source_ip'].unique())
ioc_list
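
Source IPs are usually not the only indicators worth recording. As a sketch of the programmatic route, the snippet below collects unique values from several columns at once. The sample rows and the `dest_host` column name are assumptions made for illustration; adjust `ioc_columns` to match the columns actually present in the log.

```python
import pandas as pd

# Hypothetical suspicious rows; column names other than source_ip and
# device_id are assumptions about the log schema.
demo_suspicious = pd.DataFrame({
    "source_ip": ["203.0.113.7", "203.0.113.7"],
    "device_id": ["UNKNOWN-99", "UNKNOWN-99"],
    "dest_host": ["FILESRV-01", "DB-02"],
})

ioc_columns = ["source_ip", "device_id", "dest_host"]

# One sorted, de-duplicated list of string values per column that exists.
demo_iocs = {
    col: sorted(demo_suspicious[col].dropna().astype(str).unique())
    for col in ioc_columns
    if col in demo_suspicious.columns
}
```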

## Step 3 — MITRE ATT&CK Mapping

Based on the suspicious activity, map relevant techniques.

Examples:
- T1078 — Valid Accounts
- T1021 — Remote Services
- T1110 — Brute Force

Add your mappings below.

In [None]:
mitre_mapping = ["T1078", "T1021"]
mitre_mapping
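
It can help to record why each technique was chosen, not just its ID. The sketch below pairs the technique names listed in this step with a short evidence note. The evidence strings are hypothetical examples, not findings from the scenario data.

```python
# Technique names as listed in this step.
technique_names = {
    "T1078": "Valid Accounts",
    "T1021": "Remote Services",
    "T1110": "Brute Force",
}

# Hypothetical evidence notes; replace with what you actually observed.
demo_observations = {
    "T1078": "Successful login with valid credentials from an unfamiliar IP",
    "T1021": "Remote connection to a host the user had not accessed before",
}

# One record per mapped technique, keeping ID, name, and evidence together.
demo_mapping = [
    {"technique": tid, "name": technique_names[tid], "evidence": note}
    for tid, note in demo_observations.items()
]

# The plain ID list, in the same shape as mitre_mapping above.
demo_ids = [entry["technique"] for entry in demo_mapping]
```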

## Step 4 — Load Historical Login Dataset (ML Analysis)

This dataset represents normal login behavior for the user.

In [None]:
hist_path = "datasets/synthetic/historical_logins.csv"
hist_df = pd.read_csv(hist_path)
hist_df

## Step 5 — Feature Engineering

We will use simple features:
- Login hour
- Geo location encoded
- Device encoded

This is intentionally simple for learning purposes.

In [None]:
def encode_column(df, col):
    # One integer code per distinct value, assigned in sorted category order.
    return df[col].astype('category').cat.codes

hist_df['geo_enc'] = encode_column(hist_df, 'geo_location')
hist_df['device_enc'] = encode_column(hist_df, 'device_id')

features = hist_df[['hour', 'geo_enc', 'device_enc']]
features
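
One thing to keep in mind when reusing this encoding on new rows: category codes depend on the set of values present at encoding time, so a value encoded on its own always receives code 0. A possible way to keep codes consistent is to learn the category order once from the historical data and look new values up in it. The sketch below uses hypothetical sample values and a hypothetical helper named `encode_with`.

```python
import pandas as pd

# Hypothetical training column; real data comes from historical_logins.csv.
demo_hist = pd.DataFrame({"geo_location": ["US", "US", "CA"]})

# Category order learned once from the training data (sorted: CA, US).
geo_categories = demo_hist["geo_location"].astype("category").cat.categories

def encode_with(values, categories):
    """Return each value's position in `categories`; unseen values map to -1."""
    return categories.get_indexer(list(values))

train_codes = encode_with(demo_hist["geo_location"], geo_categories)
new_codes = encode_with(["CA", "RU"], geo_categories)

# Encoding a single value in isolation gives code 0 regardless of the value.
naive_code = pd.Series(["RU"]).astype("category").cat.codes.iloc[0]
```

Here `train_codes` matches what `cat.codes` produces for the training column, while the unseen value `"RU"` is encoded as -1 instead of colliding with a known category.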

## Step 6 — Train Anomaly Detection Model

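We use Isolation Forest because it needs no labeled attack data: it isolates points with random splits, and points that can be isolated in fewer splits are scored as more anomalous.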

In [None]:
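# contamination=0.05 assumes roughly 5% of the historical logins are outliers;
# random_state fixes the random splits so results are reproducible.
model = IsolationForest(contamination=0.05, random_state=42)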
model.fit(features)
print("Model trained.")
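
To see what `contamination` does, the sketch below fits the same kind of model on synthetic data and inspects the fitted threshold. The data is randomly generated for illustration and is not the scenario dataset. In scikit-learn, `decision_function` is `score_samples` shifted by the fitted `offset_`, and `offset_` is chosen so that about the `contamination` fraction of training rows falls below zero.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the historical features (not the real dataset).
rng = np.random.default_rng(0)
n = 200
demo_X = pd.DataFrame({
    "hour": rng.normal(9, 1.5, n),
    "geo_enc": rng.choice([0, 1], size=n, p=[0.9, 0.1]),
    "device_enc": rng.choice([0, 1], size=n, p=[0.9, 0.1]),
})

demo_if = IsolationForest(contamination=0.05, random_state=42).fit(demo_X)

raw_scores = demo_if.score_samples(demo_X)          # lower = more anomalous
shifted_scores = demo_if.decision_function(demo_X)  # raw score minus offset_

# Fraction of training rows labelled as outliers (-1) by the fitted threshold.
flagged_fraction = (demo_if.predict(demo_X) == -1).mean()
```

Raising `contamination` moves the threshold so that more rows are labelled as outliers; it does not change the raw scores.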

## Step 7 — Score the Suspicious Login

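We encode the suspicious login with the category codes learned from the historical data, so each value maps to the same code the model saw during training.

In [None]:
sus = suspicious_events.iloc[0]

# Reuse the category order from hist_df; encoding a single row on its own
# would always yield code 0. Values never seen in training become -1.
geo_cats = hist_df['geo_location'].astype('category').cat.categories
device_cats = hist_df['device_id'].astype('category').cat.categories

sus_features = pd.DataFrame({
    'hour': [pd.to_datetime(sus['timestamp']).hour],
    'geo_enc': geo_cats.get_indexer([sus['geo_location']]),
    'device_enc': device_cats.get_indexer([sus['device_id']])
})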

anomaly_score = model.decision_function(sus_features)[0]
anomaly_score
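
When reading the score, note the sign convention: `decision_function` returns negative values for points the model treats as outliers and positive values for inliers, and `predict` returns -1 or 1 accordingly. The sketch below illustrates this on synthetic data; the training rows and the two test logins are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic "normal" logins: around 09:00, mostly from one location and device.
rng = np.random.default_rng(0)
n = 200
demo_train = pd.DataFrame({
    "hour": rng.normal(9, 1.5, n),
    "geo_enc": rng.choice([0, 1], size=n, p=[0.9, 0.1]),
    "device_enc": rng.choice([0, 1], size=n, p=[0.9, 0.1]),
})

demo_model = IsolationForest(contamination=0.05, random_state=42)
demo_model.fit(demo_train)

# Row 0: a typical login. Row 1: 02:00 from a rare location and rare device.
demo_points = pd.DataFrame({
    "hour": [9.0, 2.0],
    "geo_enc": [0, 1],
    "device_enc": [0, 1],
})

demo_scores = demo_model.decision_function(demo_points)  # negative = outlier
demo_labels = demo_model.predict(demo_points)            # -1 outlier, 1 inlier
```

A score close to zero sits near the threshold, so it is worth reporting the raw value alongside the label.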

## Step 8 — Save SOC + ML Outputs

These files will be evaluated automatically by GitHub Actions.

In [None]:
os.makedirs("student_output", exist_ok=True)

soc_output = {
    "ioc_list": ioc_list,
    "mitre_mapping": mitre_mapping,
    "triage_summary": "Suspicious login from new IP, new device, new geo, odd hour.",
    "detection_rule": "source_ip == '185.199.110.153'"
}

ml_output = {
    "anomaly_score": float(anomaly_score),
    "model_used": "IsolationForest",
    "features": ["hour", "geo_enc", "device_enc"],
    "explanation": "Lower (more negative) decision_function scores indicate more unusual login behavior."
}

with open("student_output/soc_output.json", "w") as f:
    json.dump(soc_output, f, indent=4)

with open("student_output/ml_output.json", "w") as f:
    json.dump(ml_output, f, indent=4)

print("Outputs saved.")
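
Before pushing, a quick round-trip check can catch a missing field or a value that did not serialize as expected. The sketch below writes a sample output to a temporary directory and reloads it. The `required_keys` set mirrors the fields written above; whether the automated evaluation checks exactly these keys is an assumption, and the sample values are hypothetical.

```python
import json
import os
import tempfile

# Keys taken from soc_output above; the grader's exact checks are an assumption.
required_keys = {"ioc_list", "mitre_mapping", "triage_summary", "detection_rule"}

# Hypothetical sample values for illustration only.
demo_output = {
    "ioc_list": ["203.0.113.7"],
    "mitre_mapping": ["T1078", "T1021"],
    "triage_summary": "Login from an unfamiliar IP and device at an odd hour.",
    "detection_rule": "source_ip == '203.0.113.7'",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "soc_output.json")
    with open(path, "w") as f:
        json.dump(demo_output, f, indent=4)
    # Reload the file to confirm it parses and round-trips unchanged.
    with open(path) as f:
        reloaded = json.load(f)

missing_keys = required_keys - reloaded.keys()
```

An empty `missing_keys` set means every expected field made it into the file.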