# Project: Expert Digital Forensics & Statistical Steganalysis

**Mission:** A "carrier" image has been recovered from a suspect's machine. Simple extraction might not be enough if the attacker used randomized LSB embedding or encrypted the payload. Your mission is to perform both targeted extraction and statistical steganalysis to build evidence that hidden data is present.

**Objective:** Use pixel-level statistical analysis (Chi-Square Test) to detect anomalies in the LSB distribution and perform bit-plane visualization.

---

## PHASE 1: Forensic Environment & Pixel Ingestion
**Expert Task:** Establish a MySQL connection and load the target image. Verify the pixel dimensions and color depth.

In [None]:
from PIL import Image
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
import matplotlib.pyplot as plt
import scipy.stats as stats
import os

engine = create_engine("mysql+pymysql://app_user:appuserpassword456@localhost/security_lab")
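img = Image.open("../data/suspicious_image.png")
img_array = np.array(img)

print(f"Resolution (width, height): {img.size}")
print(f"Mode: {img.mode}")  # e.g. RGB or RGBA; later phases index channel 0 as Red
print(f"Array shape / dtype: {img_array.shape} / {img_array.dtype}")  # dtype shows bits per channel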

## PHASE 2: Bit-Plane Visualization (Anomaly Hunting)
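**The Logic:** The Least Significant Bit (LSB) plane of a natural photograph usually looks like noise, often with faint traces of the image content in smooth or saturated regions. Structured shapes, text, or sharp boundaries that do not follow the image content (for example, a band of uniform noise that stops partway down the image) are strong indicators of LSB embedding. They are not proof on their own, so treat them as leads to confirm with the statistical test in Phase 3.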

**Expert Task:** Extract the LSB of the Red channel and visualize it as a black-and-white image. Scan for any geometric or high-contrast patterns.

In [None]:
red_channel = img_array[:,:,0]
lsb_plane = red_channel & 1

plt.figure(figsize=(10,10))
plt.imshow(lsb_plane, cmap='gray')
plt.title("Red Channel LSB Plane Visualization")
plt.axis('off')
plt.show()
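
Comparing the LSB plane against the higher bit planes can help judge whether a pattern is just image content showing through. The helper below is a minimal sketch of that idea; `bit_planes` is a hypothetical name, not part of the lab code, and it assumes an 8-bit channel.

```python
import numpy as np

def bit_planes(channel):
    """Return an array of shape (8, H, W); plane k holds bit k (k=0 is the LSB)."""
    channel = np.asarray(channel, dtype=np.uint8)
    # Shift bit k down to position 0, then mask everything else away
    return np.stack([(channel >> k) & 1 for k in range(8)])

# Tiny demo on two hand-picked pixel values
demo = np.array([[0b10110101, 0b00000010]], dtype=np.uint8)
planes = bit_planes(demo)
print("LSB plane:", planes[0].tolist())
print("MSB plane:", planes[7].tolist())
```

In the notebook, `bit_planes(red_channel)` could be passed plane by plane to `plt.imshow(..., cmap='gray')` to view all eight planes side by side.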

## PHASE 3: Statistical Steganalysis (Chi-Square Test)
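**The Logic:** LSB embedding can only turn a pixel value $2i$ into $2i+1$ or back, so each such pair forms a "Pair of Values" (POV). In an unmodified image the two counts in a pair usually differ. Overwriting LSBs with message bits that look random (for example, encrypted data) pushes both counts toward their common mean. The Chi-Square test measures how close the observed counts of the even values are to those means: a $p$-value near 1 means the pairs are almost equalized, which is what sequential LSB embedding produces, while a $p$-value near 0 is consistent with an unmodified channel. The test is most sensitive to sequential, high-capacity embedding; sparse or randomly scattered payloads may not register.

**Expert Task:** Implement the Chi-Square test for the Red channel. Calculate the $p$-value and interpret it as the likelihood that the channel carries embedded data.

In [None]:
def chi_square_steg(channel_data):
    # Histogram of the 256 possible 8-bit values
    counts = np.bincount(channel_data.flatten(), minlength=256).astype(float)
    observed = counts[::2]  # Frequency of each even value 2i
    expected = (counts[::2] + counts[1::2]) / 2  # Mean of each pair (2i, 2i+1)

    # Drop empty pairs to avoid division by zero
    mask = expected > 0
    # Compute the statistic directly: recent SciPy versions of stats.chisquare
    # raise an error when the observed and expected totals differ, as they do here
    chi_stat = np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask])
    dof = max(int(mask.sum()) - 1, 1)
    # Survival function: p near 1 means the pairs are nearly equalized
    return stats.chi2.sf(chi_stat, dof)

p_val = chi_square_steg(red_channel)
print(f"Chi-Square embedding probability (p): {p_val:.4f}")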
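
To see how the test behaves, it can be run on data where the answer is known. The sketch below is an illustration under stated assumptions, not a result from the suspect image: the "cover" is synthetic and deliberately skewed so that even values are more common than odd ones, and the "stego" version overwrites every LSB with random bits to simulate a full-capacity encrypted payload. `pov_chi_square_p` is a hypothetical standalone helper.

```python
import numpy as np
from scipy import stats

def pov_chi_square_p(channel):
    """Chi-square p-value for the Pairs-of-Values test on 8-bit data."""
    values = np.asarray(channel, dtype=np.uint8).ravel()
    counts = np.bincount(values, minlength=256).astype(float)
    observed = counts[::2]                        # counts of even values 2i
    expected = (counts[::2] + counts[1::2]) / 2   # mean of each pair
    mask = expected > 0                           # skip empty pairs
    chi_stat = np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask])
    dof = max(int(mask.sum()) - 1, 1)
    return stats.chi2.sf(chi_stat, dof)

rng = np.random.default_rng(0)
n = 65536

# Synthetic cover (assumption): roughly 70% even and 30% odd within every pair
base = 2 * rng.integers(0, 128, size=n)
cover = (base + (rng.random(n) < 0.3)).astype(np.uint8)

# Simulated embedding: replace every LSB with a random payload bit
payload_bits = rng.integers(0, 2, size=n, dtype=np.uint8)
stego = (cover & 0xFE) | payload_bits

p_cover = pov_chi_square_p(cover)
p_stego = pov_chi_square_p(stego)
print(f"cover p = {p_cover:.4g}")   # expected to be close to 0
print(f"stego p = {p_stego:.4g}")   # expected to be close to 1
```

Real photographs are less extreme than this synthetic cover, so intermediate $p$-values are possible and should be read alongside the bit-plane view from Phase 2.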

## PHASE 4: Targeted Forensic Extraction
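**Expert Task:** Extract the first 1000 bits (125 bytes) of the LSB stream. Check for common file signatures (e.g., `JFIF`, `PNG`, `PK`). If no signature is found, compute the entropy of the extracted bytes: a value close to the maximum is consistent with encrypted or compressed data, but it says nothing about the strength of the encryption.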

In [None]:
# YOUR CODE HERE
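
One possible starting point for this phase is sketched below. It assumes the payload was written sequentially in row-major order, most significant bit first, which may not hold for the actual image. The helper names (`pack_bits`, `identify_header`, `shannon_entropy`) and the `MAGIC` table are hypothetical, and the table is deliberately small.

```python
import math
from collections import Counter

# Magic bytes for the formats named in the task (non-exhaustive)
MAGIC = {
    "PNG": b"\x89PNG\r\n\x1a\n",
    "JPEG/JFIF": b"\xff\xd8\xff",
    "ZIP (PK)": b"PK\x03\x04",
}

def pack_bits(bits):
    """Pack an iterable of 0/1 values into bytes, most significant bit first."""
    bits = [int(b) & 1 for b in bits]
    usable = len(bits) - len(bits) % 8  # drop a trailing partial byte
    out = bytearray()
    for i in range(0, usable, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def identify_header(data):
    """Return the name of the first matching signature, or None."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None

def shannon_entropy(data):
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

# Demo on a hand-made bit stream that encodes a ZIP signature plus text
demo_bits = [int(c) for byte in b"PK\x03\x04hello" for c in format(byte, "08b")]
payload = pack_bits(demo_bits[:1000])
print("Header:", identify_header(payload))
print("Entropy (bits/byte):", round(shannon_entropy(payload), 3))
```

In the notebook, the input could be `lsb_plane.flatten()[:1000]`. With only 125 bytes the entropy estimate cannot exceed $\log_2 125 \approx 6.97$ bits per byte, so a longer extract gives a more reliable reading.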

## PHASE 5: Chain of Custody Report (MySQL)
**Expert Task:** Log your forensic findings to the `forensic_results` table. Include the Chi-Square result, the extracted message snippet, and a timestamp of the analysis.

In [None]:
# YOUR CODE HERE
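
A possible shape for the logging step is sketched below. The schema of `forensic_results` is not given here, so the column names (`image_name`, `chi_square_p`, `message_snippet`, `analyzed_at`) are assumptions to adapt to the real table, and the values passed in the demo are placeholders rather than findings.

```python
from datetime import datetime, timezone

# Hypothetical column names: adjust to the actual forensic_results schema
INSERT_SQL = (
    "INSERT INTO forensic_results "
    "(image_name, chi_square_p, message_snippet, analyzed_at) "
    "VALUES (:image_name, :chi_square_p, :message_snippet, :analyzed_at)"
)

def build_record(image_name, p_value, payload, snippet_len=32):
    """Assemble one row; the snippet is hex-encoded so binary data stores safely."""
    return {
        "image_name": image_name,
        "chi_square_p": float(p_value),
        "message_snippet": payload[:snippet_len].hex(),
        # Naive UTC timestamp, suitable for a DATETIME column
        "analyzed_at": datetime.now(timezone.utc).replace(tzinfo=None),
    }

def log_record(engine, record):
    """Insert one row using a parameterized statement inside a transaction."""
    from sqlalchemy import text  # imported here so build_record runs without SQLAlchemy
    with engine.begin() as conn:  # commits on success, rolls back on error
        conn.execute(text(INSERT_SQL), record)

# Placeholder inputs for illustration only
record = build_record("suspicious_image.png", 0.5, b"PK\x03\x04hello")
print(record)
```

In the notebook, `log_record(engine, record)` would write the row through the `engine` created in Phase 1. Binding parameters instead of formatting them into the SQL string keeps extracted bytes from being interpreted as SQL.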