# 🛡️ Intermediate Security Analytics Bootcamp (50 Problems)

**Role:** Junior Security Data Scientist
**Objective:** Transform raw security telemetry into actionable intelligence. You will progress through Log Forensics, Network Traffic Analysis, Cryptography, and Behavioral Analysis.

**Datasets Used:**
- `web_server_logs.csv` (HTTP anomalies)
- `network_flows.csv` (C2 Beaconing)
- `encrypted_messages.csv` (Frequency Analysis)
- `fp26-features.csv` (Behavioral Fingerprinting)

---

## PHASE 1: Web Log Forensics & Anomaly Detection (15 Problems)
**Mission:** Hunt for directory brute-forcing and admin access attempts.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime

### Problem 1
**Task:** Load `../../ds-practice/data/web_server_logs.csv` into `df_logs` and display the first 5 rows.

In [None]:
# YOUR CODE HERE

### Problem 2
**Task:** Check the `timestamp` column. It mixes two formats (`2026-02-24` and `24/Feb/2026`). Standardize it with `pd.to_datetime()`, using `format='mixed'` (available in pandas 2.0 and later) and, optionally, `errors='coerce'` so that unparseable values become `NaT`.

In [None]:
# YOUR CODE HERE
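
As a hedged illustration, the sketch below parses each format explicitly and merges the results, which also works where `format='mixed'` is unavailable. The sample values, the two format strings, and the names `raw`, `iso_part`, `clf_part`, and `parsed` are assumptions for illustration; the real file may use a different time layout.

```python
import pandas as pd

# Made-up sample mixing the two layouts (not taken from web_server_logs.csv).
raw = pd.Series([
    "2026-02-24 10:15:00",
    "24/Feb/2026:10:16:30",
    "2026-02-24 10:17:45",
    "not a timestamp",
])

# Parse each assumed layout on its own; rows that do not match become NaT.
iso_part = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")
clf_part = pd.to_datetime(raw, format="%d/%b/%Y:%H:%M:%S", errors="coerce")

# Keep the ISO result where it exists, otherwise fall back to the other layout.
parsed = iso_part.fillna(clf_part)
print(parsed)
```

Rows that match neither layout stay `NaT`, so `parsed.isna().sum()` gives a quick count of values that still need attention.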

### Problem 3
**Task:** Create a new column `hour` extracted from the `timestamp`.

In [None]:
# YOUR CODE HERE

### Problem 4
**Task:** Count the total number of unique IP addresses in the dataset.

In [None]:
# YOUR CODE HERE

### Problem 5
**Task:** Identify the Top 3 most active IP addresses (Top Talkers).

In [None]:
# YOUR CODE HERE

### Problem 6
**Task:** Filter the logs for only `404` status codes (Not Found).

In [None]:
# YOUR CODE HERE

### Problem 7
**Task:** Which IP address has triggered the most 404 errors? (Potential Scanner).

In [None]:
# YOUR CODE HERE

### Problem 8
**Task:** Filter for any requests made to the `/admin` resource.

In [None]:
# YOUR CODE HERE

### Problem 9
**Task:** Group by `ip_address` and count how many times each IP tried to access `/admin`.

In [None]:
# YOUR CODE HERE

### Problem 10
**Task:** Create a column `is_success` which is `True` if `status_code` is 200, else `False`.

In [None]:
# YOUR CODE HERE

### Problem 11
**Task:** Calculate the overall success rate (%) of all requests in the log.

In [None]:
# YOUR CODE HERE

### Problem 12
**Task:** Identify "High Failure IPs" where the success rate is less than 50%.

In [None]:
# YOUR CODE HERE
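
One possible shape for a per-IP success rate, sketched on made-up data: the mean of a boolean column is the fraction of `True` values, so grouping by IP and taking the mean gives the rate directly. The sample rows and the names `logs`, `success_rate`, and `high_failure` are assumptions for illustration.

```python
import pandas as pd

# Made-up log rows; only the two columns needed for the calculation.
logs = pd.DataFrame({
    "ip_address": ["10.0.0.1", "10.0.0.1", "10.0.0.1",
                   "10.0.0.2", "10.0.0.2", "10.0.0.2", "10.0.0.2",
                   "10.0.0.3"],
    "status_code": [200, 200, 404,
                    404, 404, 200, 403,
                    200],
})

# True where the request succeeded (status 200), as in Problem 10.
logs["is_success"] = logs["status_code"] == 200

# Mean of a boolean column = share of True values; scale to a percentage.
success_rate = logs.groupby("ip_address")["is_success"].mean() * 100

# Keep only the IPs whose success rate is below 50%.
high_failure = success_rate[success_rate < 50]
print(high_failure)
```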

### Problem 13
**Task:** Use `groupby` on `hour` to see which hour has the highest volume of traffic.

In [None]:
# YOUR CODE HERE

### Problem 14
**Task:** Create a bar chart showing the count of each `request_method` (GET, POST, etc.).

In [None]:
# YOUR CODE HERE

### Problem 15
**Task:** Filter the logs for IPs that requested more than 5 unique resources (Potential Crawler).

In [None]:
# YOUR CODE HERE

---
## PHASE 2: Network Traffic & Beaconing Detection (15 Problems)
**Mission:** Detect Command & Control (C2) heartbeats in network flows.

### Problem 16
**Task:** Load `../../ds-practice/data/network_flows.csv` into `df_flows`.

In [None]:
# YOUR CODE HERE

### Problem 17
**Task:** Convert the `timestamp` to datetime and sort the dataframe by time.

In [None]:
# YOUR CODE HERE

### Problem 18
**Task:** Find the Top 5 Source IPs (`src_ip`) by total `bytes` transferred.

In [None]:
# YOUR CODE HERE

### Problem 19
**Task:** Filter for flows where the protocol is `SSH`.

In [None]:
# YOUR CODE HERE

### Problem 20
**Task:** Calculate the total bytes transferred to the destination IP `8.8.8.8`.

In [None]:
# YOUR CODE HERE

### Problem 21
**Task:** Identify any flows where `duration_ms` is 0 but `bytes` is greater than 0.

In [None]:
# YOUR CODE HERE

### Problem 22
**Task:** Calculate the average bytes per protocol using `groupby`.

In [None]:
# YOUR CODE HERE

### Problem 23
**Task:** Create a new column `byte_intensity` defined as `bytes / (duration_ms + 1)`.

In [None]:
# YOUR CODE HERE

### Problem 24
**Task:** Filter for internal-only traffic (where both `src_ip` and `dest_ip` start with `192.168.` or `10.`).

In [None]:
# YOUR CODE HERE
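
A minimal sketch of prefix-based filtering on made-up flows. The helper `is_internal` and the sample addresses are hypothetical. The trailing dot in each prefix matters: without it, `10` would also match an address such as `100.64.0.1`.

```python
import pandas as pd

def is_internal(ips: pd.Series) -> pd.Series:
    """Return a boolean mask: True where the address starts with an internal prefix."""
    return ips.str.startswith("192.168.") | ips.str.startswith("10.")

# Made-up flows for illustration (not taken from network_flows.csv).
flows = pd.DataFrame({
    "src_ip":  ["192.168.1.5", "10.0.0.7", "192.168.1.9", "10.1.1.1"],
    "dest_ip": ["10.0.0.2",    "8.8.8.8",  "192.168.1.1", "100.64.0.1"],
})

# Internal-only traffic: both ends must match an internal prefix.
internal_only = flows[is_internal(flows["src_ip"]) & is_internal(flows["dest_ip"])]
print(internal_only)
```

Negating the mask (`~is_internal(flows["dest_ip"])`) selects external destinations, which is the building block a later exfiltration filter would need.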

### Problem 25
**Task (advanced):** For the most active `src_ip`, calculate the time difference (delta) between consecutive flows using `.diff()`.

In [None]:
# YOUR CODE HERE

### Problem 26
**Task:** What is the average time delta for that IP? If the deltas are very consistent (low standard deviation), the traffic may be a beacon.

In [None]:
# YOUR CODE HERE
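
To make the beaconing idea concrete, here is a sketch on made-up flows: sort by time, take `.diff()` on one IP's timestamps, and compare the mean and standard deviation of the gaps. The sample data and the names `top_ip`, `deltas`, `mean_delta`, and `std_delta` are assumptions for illustration.

```python
import pandas as pd

# Made-up flows: one host calls out every 60 seconds, another is irregular.
flows = pd.DataFrame({
    "src_ip": ["192.168.1.50", "192.168.1.77", "192.168.1.50", "192.168.1.50",
               "192.168.1.77", "192.168.1.50", "192.168.1.50"],
    "timestamp": pd.to_datetime([
        "2026-02-24 10:00:00", "2026-02-24 10:00:20", "2026-02-24 10:01:00",
        "2026-02-24 10:02:00", "2026-02-24 10:02:45", "2026-02-24 10:03:00",
        "2026-02-24 10:04:00",
    ]),
}).sort_values("timestamp")

# Most active source IP by number of flows.
top_ip = flows["src_ip"].value_counts().idxmax()

# Gaps between consecutive flows of that IP, in seconds (the first gap is NaT and is dropped).
deltas = (
    flows.loc[flows["src_ip"] == top_ip, "timestamp"]
    .diff()
    .dt.total_seconds()
    .dropna()
)

mean_delta = deltas.mean()
std_delta = deltas.std()
print(top_ip, mean_delta, std_delta)
```

Real beacons often add jitter, so in practice the standard deviation is usually judged relative to the mean instead of being compared against zero.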

### Problem 27
**Task:** Identify any Source IP that communicates with more than 3 unique Destination IPs.

In [None]:
# YOUR CODE HERE

### Problem 28
**Task:** Create a scatter plot of `duration_ms` vs `bytes`.

In [None]:
# YOUR CODE HERE

### Problem 29
**Task:** Find the protocol that accounts for the highest total duration of network activity.

In [None]:
# YOUR CODE HERE

### Problem 30
**Task:** Filter for any flow where the destination IP is external (not 192.168.x.x or 10.x.x.x) and `bytes` exceeds 5000 (Potential Data Exfiltration).

In [None]:
# YOUR CODE HERE

---
## PHASE 3: Cryptography & Automated Cracking (10 Problems)
**Mission:** Automate the decryption of threat actor communications.

### Problem 31
**Task:** Load `../../ds-practice/data/encrypted_messages.csv` into `df_crypto`.

In [None]:
# YOUR CODE HERE

### Problem 32
**Task:** Filter the dataframe to only show messages using the `Caesar` encryption type.

In [None]:
# YOUR CODE HERE

### Problem 33
**Task:** Write a function `char_frequency(text)` that returns a dictionary of the count of each character.

In [None]:
# YOUR CODE HERE

### Problem 34
**Task:** Run your frequency function on the first `ciphertext` message. Which character is most common?

In [None]:
# YOUR CODE HERE

### Problem 35
**Task:** In English, 'E' is the most common letter. If the most common character in your Caesar message is 'H', what is the likely shift? (Hint: distance from E to H).

In [None]:
# YOUR CODE HERE
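
The hint can be sketched as modular arithmetic on letter positions: the shift is the distance from the assumed plaintext letter to the most common ciphertext letter, wrapped at 26. The function name `estimate_caesar_shift` and the sample ciphertext are hypothetical, and the approach assumes the message is long enough for 'E' to dominate.

```python
from collections import Counter

def estimate_caesar_shift(ciphertext, assumed_plain="E"):
    """Guess the shift by assuming the most common letter decrypts to `assumed_plain`."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    if not letters:
        return None  # nothing to analyse
    most_common = Counter(letters).most_common(1)[0][0]
    # Distance from the assumed plaintext letter, wrapped around the alphabet.
    return (ord(most_common) - ord(assumed_plain)) % 26

# Distance from E to H, as in the hint.
shift_from_h = (ord("H") - ord("E")) % 26

# Made-up ciphertext in which 'H' is the most frequent letter.
guess = estimate_caesar_shift("PHHW PH KHUH")
print(shift_from_h, guess)
```

The `% 26` keeps the result in range when the ciphertext letter comes before 'E' in the alphabet, for example 'B' gives a shift of 23.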

### Problem 36
**Task:** Write a function `caesar_decrypt(text, shift)` that shifts letters back by the given amount.

In [None]:
# YOUR CODE HERE
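
One possible shape for such a function, offered as a sketch and not the only valid answer: shift each letter back within its own case and leave every other character untouched. The handling of lowercase letters and punctuation is an assumption, since the task does not specify it, and the sample strings are made up.

```python
def caesar_decrypt(text, shift):
    """Shift ASCII letters back by `shift` positions; leave other characters as they are."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        else:
            out.append(ch)  # digits, spaces, punctuation
    return "".join(out)

# Made-up ciphertexts for illustration.
print(caesar_decrypt("PHHW PH KHUH", 3))
print(caesar_decrypt("Khoor, Zruog!", 3))
```

Calling the function in a loop over `range(26)` and reading the candidates is a simple way to experiment when the shift is unknown.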

### Problem 37
**Task:** Decrypt the first message using a shift of 3. Does it look like English?

In [None]:
# YOUR CODE HERE

### Problem 38
**Task:** Try decrypting the second message. Experiment with shifts until you find the right one.

In [None]:
# YOUR CODE HERE

### Problem 39
**Task:** Count the number of unique `encryption_type` values present in the dataset.

In [None]:
# YOUR CODE HERE

### Problem 40
**Task:** Calculate the average length (number of characters) of all ciphertexts.

In [None]:
# YOUR CODE HERE

---
## PHASE 4: Behavioral Analysis & Fingerprinting (10 Problems)
**Mission:** Identify automated "Bot" sessions based on UI behavior telemetry.

### Problem 41
**Task:** Load `../../ds-practice/data/fp26-features.csv` into `df_fingerprint`.

In [None]:
# YOUR CODE HERE

### Problem 42
**Task:** Treat `painting_id` as a `session_id`. How many unique sessions are in the data?

In [None]:
# YOUR CODE HERE

### Problem 43
**Task:** Group by `painting_id` and count how many "actions" (rows) were performed in each session.

In [None]:
# YOUR CODE HERE

### Problem 44
**Task:** Find the session with the highest number of unique `color` values used.

In [None]:
# YOUR CODE HERE

### Problem 45
**Task:** Calculate the variance of the `x` and `y` coordinates for the most active session. High variance suggests human movement; zero variance suggests robotic behavior.

In [None]:
# YOUR CODE HERE
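
A sketch of the variance check on made-up telemetry, computed for every session at once so that zero-variance sessions stand out. The sample rows and the names `actions`, `coord_var`, and `zero_var_sessions` are assumptions for illustration.

```python
import pandas as pd

# Made-up actions: session 1 moves around, session 2 repeats the same point.
actions = pd.DataFrame({
    "painting_id": [1, 1, 1, 1, 2, 2, 2],
    "x": [3, 40, 17, 88, 5, 5, 5],
    "y": [10, 72, 35, 4, 9, 9, 9],
})

# Sample variance of x and y per session.
coord_var = actions.groupby("painting_id")[["x", "y"]].var()

# Sessions whose coordinates never change on either axis.
zero_var_sessions = coord_var[
    (coord_var["x"] == 0) & (coord_var["y"] == 0)
].index.tolist()
print(coord_var)
print(zero_var_sessions)
```

A session with a single row has an undefined (`NaN`) sample variance, so it is not flagged by the equality test and may need separate handling.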

### Problem 46
**Task:** Create a pivot table: Index=`painting_id`, Columns=`color`, Values=`height`, Aggfunc=`count`.

In [None]:
# YOUR CODE HERE

### Problem 47
**Task:** Filter for sessions that ONLY used the color `white`.

In [None]:
# YOUR CODE HERE

### Problem 48
**Task:** Identify sessions where the `width` and `height` are identical for every action (Perfectly square behavior).

In [None]:
# YOUR CODE HERE

### Problem 49
**Task:** Calculate the average length of the `rgb` hex values (it should be 7 if every value is a valid `#RRGGBB`).

In [None]:
# YOUR CODE HERE

### Problem 50
**Task:** Final Analyst Summary. Print: "Analysis complete. Scanned [X] sessions. Found [Y] potential bots."

In [None]:
# YOUR CODE HERE