# Project: Expert Cryptanalysis & Automated Decryption Engines

**Mission:** A threat actor is using multiple encryption layers. You have intercepted a batch of ciphertexts. Simple frequency analysis is a start, but you must now build an automated cryptanalysis engine that can identify the cipher type and break it without manual guessing.

**Objective:** Use the Index of Coincidence (IC) to identify cipher types, and implement an automated "English Scoring" system to crack rotations.

---

## PHASE 1: Forensic Data Ingestion (MySQL)
**Expert Task:** Establish a connection to the `security_lab` database and retrieve all `ciphertext` entries from the `encrypted_messages` table. Perform a preliminary character count to determine whether the data looks structured or randomized.

In [None]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import matplotlib.pyplot as plt
import os
import string

engine = create_engine("mysql+pymysql://app_user:appuserpassword456@localhost/security_lab")
df = pd.read_sql("SELECT * FROM encrypted_messages", engine)
print(f"Loaded {len(df)} encrypted messages.")
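
The preliminary character count from the task above could be sketched as follows. This is a minimal sketch under assumptions: `character_profile` is a hypothetical helper name, and the sample string is made up rather than taken from the `ciphertext` column.

```python
from collections import Counter

def character_profile(text):
    """Count letters, digits, whitespace, and other characters in `text`."""
    profile = Counter()
    for char in text:
        if char.isalpha():
            profile["letters"] += 1
        elif char.isdigit():
            profile["digits"] += 1
        elif char.isspace():
            profile["whitespace"] += 1
        else:
            profile["other"] += 1
    return dict(profile)

sample = "Wkh 3 nhbv!"  # hypothetical ciphertext, not a real database row
print(character_profile(sample))
```

Applied to the loaded frame, this would look like `df['ciphertext'].apply(character_profile)`. A message made almost entirely of letters and spaces suggests a classical cipher, while a heavy share of digits or symbols suggests an encoding or a modern cipher.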

## PHASE 2: Index of Coincidence (IC) Identification
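**The Logic:** The Index of Coincidence is the probability that two letters picked at random from a text are the same, so it measures how uneven the letter distribution is.
- **English Plaintext:** ~0.0667
- **Caesar Cipher:** ~0.0667 (the distribution is only shifted, so the IC is unchanged)
- **Vigenère (long key) / Random:** ~0.0385 (the distribution is flattened toward uniform, 1/26)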
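
To see why a Caesar shift leaves the IC untouched, the sketch below compares a plaintext with its shifted version. The helper names `index_of_coincidence` and `caesar_shift` and the sample sentence are illustrative assumptions, not part of the lab data.

```python
from collections import Counter
import string

def index_of_coincidence(text):
    """IC over the A-Z letters of `text`; 0.0 if fewer than two letters."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

def caesar_shift(text, shift):
    """Shift A-Z letters forward by `shift`; other characters pass through."""
    out = []
    for c in text.upper():
        if c in string.ascii_uppercase:
            out.append(chr((ord(c) - 65 + shift) % 26 + 65))
        else:
            out.append(c)
    return "".join(out)

plain = "DEFEND THE EAST WALL OF THE CASTLE AT DAWN"  # made-up sample
shifted = caesar_shift(plain, 7)

# The shift only relabels the letters, so the multiset of counts is identical.
print(index_of_coincidence(plain) == index_of_coincidence(shifted))
```

Because the formula depends only on how often each letter occurs, not on which letter it is, any monoalphabetic substitution preserves the IC. Short messages give noisy values, so the reference figures above are reliable only for longer texts.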

**Expert Task:** Write a function to calculate the IC for each message. Use this to automatically flag which messages are Caesar ciphers and which require more complex (polyalphabetic) analysis.

In [None]:
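def calculate_ic(text):
    """Return the Index of Coincidence of the A-Z letters in `text`."""
    # Keep A-Z only, so accented letters cannot inflate N without being counted.
    text = ''.join(c for c in text.upper() if c in string.ascii_uppercase)
    N = len(text)
    # The IC is undefined for fewer than two letters; report 0.0 in that case.
    if N <= 1:
        return 0.0
    counts = [text.count(char) for char in string.ascii_uppercase]
    # Probability that two letters drawn without replacement are the same.
    return sum(n * (n - 1) for n in counts) / (N * (N - 1))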

df['IC'] = df['ciphertext'].apply(calculate_ic)
df[['message_id', 'IC', 'encryption_type']]
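
One way to turn the IC column into an automatic flag is a simple cutoff. The sketch below assumes a threshold of 0.053, roughly midway between the two reference values; both the threshold and the name `flag_cipher_family` are hypothetical and should be tuned against messages whose `encryption_type` is known.

```python
def flag_cipher_family(ic, threshold=0.053):
    """Label a message by IC: near-English values suggest a monoalphabetic cipher."""
    if ic >= threshold:
        return "caesar_candidate"
    return "polyalphabetic_candidate"

# Reference values from the list above.
for value in (0.0667, 0.0385):
    print(value, flag_cipher_family(value))
```

With the DataFrame from this phase, the flag could be stored as `df['flag'] = df['IC'].apply(flag_cipher_family)`. A high IC only indicates a monoalphabetic cipher in general, so a flagged message still has to be confirmed by the cracking step.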

## PHASE 3: Automated English Scoring Engine
**The Logic:** We can "score" a piece of text by comparing its character frequencies to standard English (e.g., 'E' = 12.7%, 'T' = 9.1%). 

**Expert Task:** 
1. Define the standard English frequency map.
2. Write a function that scores a string based on how closely it matches that map.
3. Brute-force all 26 Caesar shifts and automatically pick the one with the best score.

In [None]:
ENGLISH_FREQ = {'A': 0.0817, 'B': 0.0150, 'C': 0.0278, 'D': 0.0425, 'E': 0.1270, 'F': 0.0223, 'G': 0.0202, 'H': 0.0609, 'I': 0.0697, 'J': 0.0015, 'K': 0.0077, 'L': 0.0403, 'M': 0.0241, 'N': 0.0675, 'O': 0.0751, 'P': 0.0193, 'Q': 0.0010, 'R': 0.0599, 'S': 0.0633, 'T': 0.0906, 'U': 0.0276, 'V': 0.0098, 'W': 0.0236, 'X': 0.0015, 'Y': 0.0197, 'Z': 0.0007}

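def score_english(text):
    """Average expected English frequency of the letters in `text` (higher is more English-like)."""
    # Score only A-Z so spaces and punctuation do not dilute the average.
    letters = [char for char in text.upper() if char in ENGLISH_FREQ]
    if not letters:
        return 0
    return sum(ENGLISH_FREQ[char] for char in letters) / len(letters)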

def auto_crack_caesar(ciphertext):
    best_shift = 0
    max_score = -1
    # YOUR DECRYPTION LOGIC HERE
    # Loop over shifts 0-25 (all 26 candidates), score each decryption, return the best.
    return "CRACKED_TEXT", best_shift
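
The brute-force loop described in the task could be sketched as below. This is one possible approach, not the expected solution: `shift_text`, `english_score`, and `crack_caesar_sketch` are hypothetical names, the frequency table is copied from `ENGLISH_FREQ` so the block runs on its own, and the returned shift is assumed to mean the amount by which the ciphertext was shifted forward.

```python
import string

# Approximate English letter frequencies in A-Z order (same values as ENGLISH_FREQ).
FREQ = dict(zip(string.ascii_uppercase, [
    0.0817, 0.0150, 0.0278, 0.0425, 0.1270, 0.0223, 0.0202, 0.0609, 0.0697,
    0.0015, 0.0077, 0.0403, 0.0241, 0.0675, 0.0751, 0.0193, 0.0010, 0.0599,
    0.0633, 0.0906, 0.0276, 0.0098, 0.0236, 0.0015, 0.0197, 0.0007,
]))

def shift_text(text, shift):
    """Shift ASCII letters by `shift` positions, preserving case and other characters."""
    out = []
    for c in text:
        if c.isascii() and c.isalpha():
            base = ord('A') if c.isupper() else ord('a')
            out.append(chr((ord(c) - base + shift) % 26 + base))
        else:
            out.append(c)
    return "".join(out)

def english_score(text):
    """Average expected English frequency over the letters of `text`."""
    letters = [c for c in text.upper() if c in FREQ]
    return sum(FREQ[c] for c in letters) / len(letters) if letters else 0.0

def crack_caesar_sketch(ciphertext):
    """Try all 26 shifts and return (best_plaintext, best_shift)."""
    best_shift, best_text, best_score = 0, ciphertext, -1.0
    for shift in range(26):
        candidate = shift_text(ciphertext, -shift)  # undo a forward shift
        score = english_score(candidate)
        if score > best_score:
            best_shift, best_text, best_score = shift, candidate, score
    return best_text, best_shift

message = ("Meet me at the old bridge at midnight and bring the documents with you. "
           "The enemy is watching the station so take the eastern road.")
cracked, shift = crack_caesar_sketch(shift_text(message, 3))
print(shift, cracked)
```

This scoring works best on messages of a few dozen letters or more. On very short ciphertexts a wrong shift can outscore the right one, which is why a second check such as bigram analysis is useful.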

## PHASE 4: Pattern Analysis (Bigrams & Trigrams)
**The Logic:** Even if character frequency is balanced, common pairs (TH, HE, IN, ER) and triplets (THE, AND, ING) will reveal the plaintext structure.

**Expert Task:** Extract all bigrams (2-letter pairs) from a message and compare them to the top 10 most common English bigrams. Use this to verify your automated crack in Phase 3.

In [None]:
# YOUR CODE HERE
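
A possible starting point for the bigram check is sketched below. The set `COMMON_BIGRAMS` is one commonly cited top-10 list; the exact ranking varies by corpus, so treat it as an assumption. The helper names are hypothetical.

```python
from collections import Counter
import re

# Assumed top-10 English bigrams; rankings differ slightly between corpora.
COMMON_BIGRAMS = {"TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND"}

def extract_bigrams(text):
    """Return letter pairs that occur inside words (pairs never span a space)."""
    bigrams = []
    for word in re.findall(r"[A-Z]+", text.upper()):
        bigrams.extend(word[i:i + 2] for i in range(len(word) - 1))
    return bigrams

def common_bigram_ratio(text):
    """Fraction of the bigrams in `text` that belong to COMMON_BIGRAMS."""
    bigrams = extract_bigrams(text)
    if not bigrams:
        return 0.0
    hits = sum(1 for b in bigrams if b in COMMON_BIGRAMS)
    return hits / len(bigrams)

candidate = "THE WEATHER IN THE NORTH"  # made-up candidate plaintext
print(Counter(extract_bigrams(candidate)).most_common(3))
print(common_bigram_ratio(candidate))
```

A correctly cracked message should show a clearly higher ratio than the same message under a wrong shift, so comparing the ratio for the chosen shift against the other 25 gives an independent check on Phase 3.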

## PHASE 5: Decrypted Evidence Repository (MySQL)
**Expert Task:** When the engine's confidence score for a message exceeds 80%, automatically save the result to the `decrypted_evidence` table. Include the `confidence_score`, the `detected_shift`, and the `timestamp` of the forensic operation.

In [None]:
# YOUR CODE HERE
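
The save step might be structured as below. This is a sketch under several assumptions: the confidence score is taken as already computed on a 0 to 1 scale (the notebook does not define how), the `message_id` and `plaintext` columns are guesses at the `decrypted_evidence` schema, and `build_evidence_rows` and `save_evidence` are hypothetical names.

```python
from datetime import datetime, timezone

CONFIDENCE_THRESHOLD = 0.80  # the 80% cutoff from the task

def build_evidence_rows(results, threshold=CONFIDENCE_THRESHOLD):
    """Keep results above the threshold and stamp each with a UTC timestamp."""
    # Naive UTC datetime, assuming the table uses a plain DATETIME column.
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    rows = []
    for r in results:
        if r["confidence_score"] > threshold:
            rows.append({
                "message_id": r["message_id"],    # assumed column name
                "plaintext": r["plaintext"],      # assumed column name
                "detected_shift": r["detected_shift"],
                "confidence_score": r["confidence_score"],
                "timestamp": now,
            })
    return rows

def save_evidence(rows, engine):
    """Append rows to `decrypted_evidence`; `engine` is a SQLAlchemy engine."""
    import pandas as pd  # imported here so the filtering step runs without pandas
    if rows:
        pd.DataFrame(rows).to_sql("decrypted_evidence", engine,
                                  if_exists="append", index=False)

# Made-up results for illustration only.
demo_results = [
    {"message_id": 1, "plaintext": "MEET AT DAWN", "detected_shift": 3, "confidence_score": 0.91},
    {"message_id": 2, "plaintext": "XQZ JKV", "detected_shift": 11, "confidence_score": 0.42},
]
print(len(build_evidence_rows(demo_results)))
```

Calling `save_evidence(build_evidence_rows(results), engine)` with the engine from Phase 1 would append the qualifying rows. Separating the filtering from the database write keeps the threshold logic testable without a live MySQL connection.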