[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Binkerton13/cyber-ml-training/blob/main/scenarios/scenario_03/notebook.ipynb)

# Scenario 03B — AWS IAM & CloudTrail API Abuse (SOC Only)

In this scenario, you will investigate suspicious activity in an AWS environment using:

- **IAM events** (role changes, policy updates, access key activity)
- **CloudTrail API calls** (enumeration, data access, privilege misuse)
- Optionally, **network events** if lateral movement or exfiltration is present

Your goals:

1. Identify suspicious IAM and API activity.
2. Extract Indicators of Compromise (IOCs).
3. Map activity to MITRE ATT&CK techniques (cloud‑relevant).
4. Write a clear triage summary of what happened.
5. Save your findings in a structured JSON file for grading.

This is a **SOC‑only** scenario — no ML modeling here.
You will rely on investigative reasoning, log analysis, and ATT&CK mapping.

## 1. Configure repository path

We will load logs directly from the GitHub repository using a **fixed repo root**.

- When this repo moves to a new org or user, update **only** the `repo_root` line.
- The rest of the notebook will continue to work without modification.

**Instructor note:** This avoids relying on unstable Colab metadata and keeps the notebook portable.

In [None]:
# TODO (Instructor when migrating repos):
# If this repo moves to a new GitHub org or user, update ONLY this line:
repo_root = "https://raw.githubusercontent.com/Binkerton13/cyber-ml-training/main"
scenario_path = "scenarios/scenario_03"
log_base = f"{repo_root}/{scenario_path}/logs/"
log_base

## 2. Load AWS IAM and CloudTrail logs

We will load:

- `cloud_iam.csv` — IAM‑related events (role changes, policy updates, access keys, MFA changes).
- `cloud_api.csv` — CloudTrail API calls (enumeration, data access, privilege use).
- Optionally, `network.csv` — if present, may show exfiltration or unusual egress.

**Your focus:**

- Understand what each log represents.
- Identify key fields: user, role, action, resource, region, timestamp.
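
A quick schema check can make the key fields concrete before any analysis. The sketch below is illustrative only: the helper `missing_columns` and the names in `EXPECTED_FIELDS` are assumptions based on the list above (with `event_name` standing in for "action"), and the real CSV headers may differ.

```python
import pandas as pd

# Assumed field names based on the list above; adjust to the actual CSV headers.
EXPECTED_FIELDS = ["user", "role", "event_name", "resource", "region", "timestamp"]

def missing_columns(df: pd.DataFrame, expected=EXPECTED_FIELDS) -> list:
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]

# Toy frame standing in for cloud_iam_df (values are made up).
toy_df = pd.DataFrame(
    {
        "timestamp": ["2024-01-01T02:15:00Z"],
        "user": ["alice"],
        "event_name": ["CreateAccessKey"],
    }
)

print(missing_columns(toy_df))
```

Running the same helper on `cloud_iam_df` and `cloud_api_df` after loading would show which of the assumed names need adjusting.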

In [None]:
import pandas as pd

iam_path = log_base + "cloud_iam.csv"
api_path = log_base + "cloud_api.csv"

cloud_iam_df = pd.read_csv(iam_path)
cloud_api_df = pd.read_csv(api_path)

cloud_iam_df.head()

In [None]:
cloud_api_df.head()

In [None]:
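# Optional: load network logs if present
net_path = log_base + "network.csv"
try:
    network_df = pd.read_csv(net_path)
except Exception as e:
    # A missing file (HTTP 404) or a parse failure both end up here.
    network_df = None
    print("No network.csv found or failed to load (this is OK for this scenario).", e)

# Last expression in the cell, so the preview renders only when the file loaded.
network_df.head() if network_df is not None else None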

## 3. Normalize timestamps

AWS logs often include ISO‑8601 timestamps with timezone information.

We will:

- Parse timestamps as timezone‑aware datetimes.
- Extract useful time‑based features (e.g., hour of day, date).

**Why this matters:**

- Attackers may operate at unusual times.
- Time correlation between IAM and API events is critical for understanding the attack chain.
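
As a rough illustration of the "unusual times" idea, the sketch below parses a few made-up ISO-8601 timestamps and flags events outside an assumed business window. The 08:00 to 17:59 UTC window and the `off_hours` column are assumptions for this example, not part of the scenario data.

```python
import pandas as pd

# Toy events; the timestamps and event names are made up for illustration.
events = pd.DataFrame(
    {
        "timestamp": [
            "2024-01-01T03:12:00Z",
            "2024-01-01T10:30:00Z",
            "2024-01-01T23:45:00Z",
        ],
        "event_name": ["CreateAccessKey", "ListBuckets", "GetObject"],
    }
)

# Parse ISO-8601 strings into timezone-aware UTC datetimes.
events["timestamp"] = pd.to_datetime(events["timestamp"], utc=True, errors="coerce")
events["hour"] = events["timestamp"].dt.hour

# Assumed business hours (08:00 to 17:59 UTC); anything else is flagged as off-hours.
BUSINESS_START, BUSINESS_END = 8, 18
events["off_hours"] = ~events["hour"].between(BUSINESS_START, BUSINESS_END - 1)

print(events[["timestamp", "hour", "off_hours"]])
```

An off-hours flag is only a starting point: what counts as unusual depends on the organization's working hours and time zones.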

In [None]:
# Normalize timestamps in IAM logs
cloud_iam_df['timestamp'] = pd.to_datetime(
    cloud_iam_df['timestamp'].astype(str).str.replace('Z', '', regex=False),
    utc=True,
    errors='coerce'
)
cloud_iam_df['date'] = cloud_iam_df['timestamp'].dt.date
cloud_iam_df['hour'] = cloud_iam_df['timestamp'].dt.hour

# Normalize timestamps in API logs
cloud_api_df['timestamp'] = pd.to_datetime(
    cloud_api_df['timestamp'].astype(str).str.replace('Z', '', regex=False),
    utc=True,
    errors='coerce'
)
cloud_api_df['date'] = cloud_api_df['timestamp'].dt.date
cloud_api_df['hour'] = cloud_api_df['timestamp'].dt.hour

cloud_iam_df[['timestamp', 'hour']].head()

In [None]:
cloud_api_df[['timestamp', 'hour']].head()

## 4. Initial SOC investigation — IAM events

Start by examining IAM events for signs of:

- New role creation or modification
- Policy attachment that increases privileges
- MFA disable events
- Access key creation for high‑privilege users

**Your job:**

- Identify IAM events that look suspicious.
- Pay attention to which user or role performed the action, and when.

Use the starter cells below as a base and extend them with your own queries.
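
One possible way to narrow down the IAM log is to filter for event names tied to privilege changes and count them per actor. This is a minimal sketch on made-up data; the `user` column name is an assumption, and the event list is only a starting set.

```python
import pandas as pd

# Toy IAM events; users and values are made up. The "user" column name is an assumption.
iam = pd.DataFrame(
    {
        "user": ["alice", "bob", "bob", "bob"],
        "event_name": [
            "ListUsers",
            "CreateAccessKey",
            "AttachRolePolicy",
            "DeactivateMFADevice",
        ],
    }
)

# Event names often associated with privilege changes.
SENSITIVE_EVENTS = {
    "CreateAccessKey",
    "AttachRolePolicy",
    "UpdateAssumeRolePolicy",
    "DeactivateMFADevice",
}

sensitive = iam[iam["event_name"].isin(SENSITIVE_EVENTS)]

# Count sensitive events per user, highest first.
per_user = sensitive.groupby("user").size().sort_values(ascending=False)
print(per_user)
```

A single actor producing several of these events in a short span is usually worth a closer look, although legitimate administrators generate them too.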

In [None]:
# Example: look at IAM events by event_name
cloud_iam_df['event_name'].value_counts().head(20)

In [None]:
# TODO: Explore IAM events for suspicious activity
# Ideas:
# - Filter for events like CreateAccessKey, AttachRolePolicy, UpdateAssumeRolePolicy, DeactivateMFADevice
# - Group by user or role to see who is making changes
# - Look at events occurring close together in time

# Write your exploration code here.


## 5. Initial SOC investigation — CloudTrail API calls

Next, examine CloudTrail API calls for signs of:

- Unusual enumeration (e.g., ListUsers, ListRoles, ListBuckets)
- Data access (e.g., GetObject on sensitive S3 buckets)
- Activity from unusual regions
- Activity from unexpected IAM roles or assumed roles

**Your job:**

- Identify API calls that look out of place or risky.
- Correlate them with IAM events when possible (same user/role, similar time).
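
Correlating "same user, similar time" can be sketched with `pandas.merge_asof`, which pairs each API call with the most recent earlier IAM event for the same user. The data, the column names (`user`, `iam_event`, `api_event`), and the 30-minute window below are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy data; users, events, and times are made up. Column names are assumptions.
iam = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-01-01T03:00:00Z"], utc=True),
        "user": ["bob"],
        "iam_event": ["AttachRolePolicy"],
    }
)
api = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-01T03:05:00Z", "2024-01-01T09:00:00Z", "2024-01-01T03:10:00Z"],
            utc=True,
        ),
        "user": ["bob", "bob", "alice"],
        "api_event": ["ListBuckets", "GetObject", "ListUsers"],
    }
)

# merge_asof requires both frames to be sorted by the join key.
correlated = pd.merge_asof(
    api.sort_values("timestamp"),
    iam.sort_values("timestamp"),
    on="timestamp",
    by="user",
    direction="backward",
    tolerance=pd.Timedelta("30min"),
)

# Keep only API calls that had a matching IAM change shortly before them.
follow_ups = correlated.dropna(subset=["iam_event"])
print(follow_ups[["timestamp", "user", "iam_event", "api_event"]])
```

The window size is a judgment call: a wider tolerance catches slower attackers but also pairs unrelated events.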

In [None]:
# Example: top API actions
cloud_api_df['event_name'].value_counts().head(20)

In [None]:
# TODO: Explore CloudTrail API calls for suspicious behavior
# Ideas:
# - Filter for List*, Describe*, GetObject, AssumeRole, GetCallerIdentity
# - Look at calls from unusual regions
# - Look at calls made by roles that recently changed in IAM

# Write your exploration code here.


## 6. Optional — Network perspective (if available)

If `network.csv` is present, you can:

- Look for outbound connections to unusual destinations.
- Correlate timing with data access events (e.g., S3 downloads followed by egress).

This step is optional and depends on whether the scenario includes network logs.
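
If network data is available, one simple approach is to summarize outbound volume per destination and review the rare ones first. The schema of `network.csv` is not shown here, so the `dest_ip` and `bytes_out` columns in this sketch are hypothetical.

```python
import pandas as pd

# Toy network events; the columns "dest_ip" and "bytes_out" are hypothetical,
# since the actual network.csv schema is not shown here.
net = pd.DataFrame(
    {
        "dest_ip": ["10.0.0.5", "10.0.0.5", "10.0.0.5", "203.0.113.7"],
        "bytes_out": [1_200, 900, 1_500, 48_000_000],
    }
)

# Summarize each destination: how often it appears and how much data went to it.
summary = net.groupby("dest_ip")["bytes_out"].agg(connections="count", total_bytes="sum")

# Rare destinations (seen once here) sorted by volume are candidates for review.
rare = summary[summary["connections"] == 1].sort_values("total_bytes", ascending=False)
print(rare)
```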

In [None]:
# TODO: If network_df is available, explore for possible exfiltration or unusual egress.
# Example ideas:
# - Look at connections to rare IPs or regions
# - Look at spikes in outbound traffic

if network_df is not None:
    display(network_df.head())
    # Add your exploration code here.
else:
    print("No network data available for this scenario.")

## 7. Extract Indicators of Compromise (IOCs)

Based on your investigation, extract IOCs such as:

- Suspicious IAM users or roles
- Suspicious AWS account IDs
- Suspicious IP addresses (if present)
- Suspicious ARNs (roles, policies, resources)

**Your job:**

- Build a list of IOCs that you believe are relevant to this scenario.
- These should be values you might use in detections, alerts, or block lists.
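
Candidate IOCs can also be pulled out of free-text fields with pattern matching. The following is a minimal sketch using only the standard library; `extract_iocs` is a hypothetical helper, the ARN pattern is deliberately loose, and the sample strings are made up using documentation-range values.

```python
import ipaddress
import re

# Loose pattern for IAM ARNs (user, role, or policy) with a 12-digit account ID.
ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:(?:user|role|policy)/[\w+=,.@/-]+")
# Candidate IPv4 strings; each candidate is validated with ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_iocs(texts):
    """Collect unique IAM ARNs and valid IPv4 addresses from an iterable of strings."""
    found = set()
    for text in texts:
        found.update(ARN_PATTERN.findall(text))
        for candidate in IPV4_CANDIDATE.findall(text):
            try:
                ipaddress.IPv4Address(candidate)
            except ValueError:
                continue  # e.g. 999.1.1.1 is not a real address
            found.add(candidate)
    return sorted(found)

# Made-up log fragments using documentation-range values.
sample = [
    "CreateAccessKey by arn:aws:iam::123456789012:user/suspicious_user from 198.51.100.10",
    "AssumeRole arn:aws:iam::123456789012:role/CompromisedRole from 198.51.100.10",
    "malformed source 999.1.1.1",
]
print(extract_iocs(sample))
```

Pattern matching only finds candidates; deciding which of them are actually suspicious still depends on the investigation in the earlier sections.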

In [None]:
# TODO: Populate this list with your identified IOCs.
# Examples (replace with your findings):
# ioc_list = [
#     "arn:aws:iam::123456789012:user/suspicious_user",
#     "arn:aws:iam::123456789012:role/CompromisedRole",
#     "198.51.100.10"  # suspicious IP, if present
# ]

ioc_list = []  # Replace with your IOCs
ioc_list

## 8. MITRE ATT&CK mapping (Cloud‑relevant)

Map the observed behavior to MITRE ATT&CK techniques.

Some cloud‑relevant examples (not exhaustive):

- `T1078` — Valid Accounts (use of compromised IAM user/role)
- `T1098` — Account Manipulation (adding keys, changing policies)
- `T1087` — Account Discovery (ListUsers, ListRoles)
- `T1530` — Data from Cloud Storage Object (S3 GetObject)
- `T1110` — Brute Force (if applicable)

**Your job:**

- Choose one or more ATT&CK technique IDs that best describe the activity.
- Store them as a list of strings.
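
One way to keep the mapping consistent is a small lookup from observed event names to technique IDs. The pairing below follows the examples listed above but is an assumption for teaching purposes, not an official mapping; `map_techniques` is a hypothetical helper.

```python
# Illustrative lookup from event names to the technique IDs listed above.
# The pairing is an assumption for teaching purposes, not an official mapping.
EVENT_TO_TECHNIQUE = {
    "CreateAccessKey": "T1098",
    "AttachRolePolicy": "T1098",
    "ListUsers": "T1087",
    "ListRoles": "T1087",
    "GetObject": "T1530",
}

def map_techniques(event_names):
    """Return the sorted, de-duplicated technique IDs for the given event names."""
    return sorted(
        {EVENT_TO_TECHNIQUE[name] for name in event_names if name in EVENT_TO_TECHNIQUE}
    )

# Event names not in the lookup (here DescribeInstances) are simply skipped.
observed = ["ListUsers", "CreateAccessKey", "GetObject", "DescribeInstances"]
print(map_techniques(observed))
```

A lookup like this does not replace analyst judgment: the same API call can support different techniques depending on who made it and why.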

In [None]:
# TODO: Add relevant MITRE ATT&CK technique IDs.
# Example (replace with your choices):
# mitre_mapping = ["T1078", "T1098", "T1530"]

mitre_mapping = []  # Replace with your chosen techniques
mitre_mapping

## 9. Triage summary

Write a short narrative summary of what you believe happened in this AWS environment.

Consider including:

- How initial access was obtained (if visible)
- How privileges were escalated or misused
- What resources were accessed or exfiltrated
- Key IOCs and MITRE techniques
- Your assessment of impact and severity

**Your job:**

- Write a concise but clear triage summary in your own words.

In [None]:
# TODO: Write your triage summary here.

triage_summary = ""  # Replace with your summary
triage_summary

## 10. Save SOC output

This cell saves your work in a structured format for automated grading.

**Do not change the keys or filename.**

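You may change the *values* by updating your work in previous cells. The `detection_rule` value is a placeholder string inside the cell below; edit that string in place to describe your detection logic.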
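
Before submitting, it can help to confirm that the output has the expected shape. The sketch below checks for the four keys written by the save cell; the expected value types and the `validate_soc_output` helper are assumptions, since the grader's exact checks are not described here.

```python
import json

# Key names come from the save cell; the expected types are assumptions.
REQUIRED_KEYS = {
    "ioc_list": list,
    "mitre_mapping": list,
    "triage_summary": str,
    "detection_rule": str,
}

def validate_soc_output(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape looks right."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif not isinstance(payload[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems

# Round-trip a made-up payload through JSON, as the saved file would be.
example = json.loads(json.dumps({
    "ioc_list": ["198.51.100.10"],
    "mitre_mapping": ["T1078"],
    "triage_summary": "Example summary.",
    "detection_rule": "Example rule.",
}))
print(validate_soc_output(example))
```

The same function could be applied to the contents of `student_output/soc_output.json` after loading it with `json.load`.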

In [None]:
import os
import json

os.makedirs("student_output", exist_ok=True)

soc_output = {
    "ioc_list": ioc_list,
    "mitre_mapping": mitre_mapping,
    "triage_summary": triage_summary,
    "detection_rule": "Describe your detection logic here (e.g., suspicious IAM events, API patterns, or regions)."
}

with open("student_output/soc_output.json", "w") as f:
    json.dump(soc_output, f, indent=4)

print("SOC output saved to student_output/soc_output.json")