# Automated Threat Detection Using MITRE ATT&CK

## Table of Contents

1. [Project Overview](#1-project-overview)
2. [Prerequisites](#2-prerequisites)
    1. [Python Setup](#21-python-setup)
    2. [Required Libraries](#22-required-libraries)
    3. [Setting Up MITRE ATT&CK Python Module](#23-setting-up-mitre-attack-python-module)
3. [MITRE ATT&CK Framework Overview](#3-mitre-attack-framework-overview)
4. [Fetching Logs and Parsing](#4-fetching-logs-and-parsing)
    1. [Collecting Logs from SIEM](#41-collecting-logs-from-siem)
    2. [Parsing Log Data](#42-parsing-log-data)
5. [Mapping Log Data to MITRE ATT&CK Techniques](#5-mapping-log-data-to-mitre-attack-techniques)
    1. [Data Processing with Pandas](#51-data-processing-with-pandas)
    2. [Mapping Techniques using MITRE ATT&CK](#52-mapping-techniques-using-mitre-attack)
6. [Visualization of Techniques](#6-visualization-of-techniques)
    1. [Creating a Heatmap](#61-creating-a-heatmap)
    2. [Displaying the ATT&CK Matrix](#62-displaying-the-attack-matrix)
7. [Generating Reports](#7-generating-reports)
    1. [Exporting Data as PDF/HTML](#71-exporting-data-as-pdf-html)
    2. [Summary Report of Detected Techniques](#72-summary-report-of-detected-techniques)
8. [Conclusion](#8-conclusion)


##### 1. Project Overview
In this project, we will create a Python-based tool that automates the process of threat detection by mapping log data from security systems to the MITRE ATT&CK framework. This will help identify potential adversary tactics, techniques, and procedures (TTPs) based on log data, providing a better understanding of the threats faced by your organization. We'll also visualize the detected techniques using heatmaps and ATT&CK matrices.

Key Objectives:

- Collect log data from a SIEM system or raw log files.
- Parse and process this log data using Python.
- Map the logs to MITRE ATT&CK techniques using a Python ATT&CK library such as `pyattck` or `mitreattack-python`.
- Visualize the results through heatmaps and ATT&CK matrix visualizations.
- Generate reports with a summary of the detected techniques and visual outputs.

---

##### 2. Prerequisites
2.1 Python Setup
To ensure the environment is ready for the project, you'll need the following:

1. Python 3.x installed (you can check the current version via `python --version`).
   
2. Jupyter Notebook for running the `.ipynb` file.
- If you haven't installed Jupyter yet, you can do so with:

In [None]:
pip install notebook

2.2 Required Libraries
For this project, we will be using the following Python libraries:

- `pandas`: For parsing and analyzing log data.
  
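- `pyattck`: To query MITRE ATT&CK tactics and techniques (the `Attck` class used in the code below).
  
- `mitreattack-python`: MITRE's official library for working with ATT&CK STIX data.
  
- `matplotlib`: For creating visualizations.
  
- `seaborn`: To enhance heatmap visualizations.
  
- `fpdf`: For generating the PDF report.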

You can install all the required libraries with the following command:

In [None]:
pip install pandas pyattck mitreattack-python matplotlib seaborn fpdf

2.3 Setting Up MITRE ATT&CK Python Module

A Python ATT&CK library lets us query the framework programmatically and fetch tactics and techniques for the ATT&CK IDs found in our logs. The code in this notebook uses `pyattck` (its `Attck` class and `attack.enterprise` interface); `mitreattack-python` is MITRE's official library and works from a downloaded STIX bundle.

After installing the libraries (from the command above), we will import and initialize the ATT&CK data in the next steps.

---

##### Python Setup in Jupyter
Let’s set up the Jupyter notebook environment. Here's the first cell to run in your notebook, which installs all necessary libraries:

In [None]:
# Install required libraries
!pip install pandas pyattck mitreattack-python matplotlib seaborn fpdf


After the installation is complete, we can verify by importing the libraries:

In [None]:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pyattck import Attck

# Initialize the MITRE ATT&CK framework via pyattck
attack = Attck()

# Check that the ATT&CK data loaded by counting the Enterprise techniques
print(f"Enterprise techniques loaded: {len(attack.enterprise.techniques)}")

This step initializes the ATT&CK client and confirms that the Enterprise dataset loaded.

---

##### 3. MITRE ATT&CK Framework Overview

The MITRE ATT&CK Framework is a comprehensive knowledge base of adversary tactics and techniques observed in the wild. Each entry in the framework describes how attackers operate at different stages of the attack lifecycle, helping defenders detect and respond to these threats.

The framework is divided into:

- Tactics: The high-level goals adversaries are trying to achieve (e.g., Persistence, Lateral Movement).
- Techniques: Specific methods adversaries use to achieve these goals (e.g., Credential Dumping, Remote File Copy).
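
The relationship between the two is many-to-many: a tactic groups many techniques, and one technique can serve more than one tactic. As a rough illustration, the sketch below inverts a technique-to-tactics table to list techniques per tactic. The `TECHNIQUE_TACTICS` table is a tiny hand-picked sample (not the full framework), and `techniques_by_tactic` is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-picked sample for illustration only; the real framework holds far more techniques.
TECHNIQUE_TACTICS = {
    "T1059": ["Execution"],                      # Command and Scripting Interpreter
    "T1105": ["Command and Control"],            # Ingress Tool Transfer
    "T1078": ["Defense Evasion", "Persistence",  # Valid Accounts spans several tactics
              "Privilege Escalation", "Initial Access"],
}

def techniques_by_tactic(technique_tactics):
    """Invert a technique -> tactics mapping into tactic -> sorted technique IDs."""
    grouped = defaultdict(list)
    for technique_id, tactics in technique_tactics.items():
        for tactic in tactics:
            grouped[tactic].append(technique_id)
    return {tactic: sorted(ids) for tactic, ids in grouped.items()}

for tactic, ids in sorted(techniques_by_tactic(TECHNIQUE_TACTICS).items()):
    print(f"{tactic}: {', '.join(ids)}")
```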


----

##### Option 1: Using `pyattck` (Live Data)
Here’s how to interact with the MITRE ATT&CK framework using the pyattck library for live data:

In [None]:
from pyattck import Attck

# Initialize the MITRE ATT&CK framework
attack = Attck()

# Retrieve and display all tactics from MITRE ATT&CK (Enterprise Matrix)
tactics = attack.enterprise.tactics

print("List of Tactics:")
for tactic in tactics:
    print(f"- {tactic.name} (ID: {tactic.id})")

# Example: Retrieve and display techniques under a specific tactic (e.g., Execution)
execution_tactic = next((t for t in tactics if t.name.lower() == "execution"), None)

if execution_tactic:
    print("\nTechniques under the 'Execution' Tactic:")
    for technique in execution_tactic.techniques:
        print(f"- {technique.name} (ID: {technique.id})")
else:
    print("Execution tactic not found.")


##### Option 2: Using Locally Stored JSON Data (Offline Access)
If you're working offline or with custom datasets, you can load MITRE ATT&CK JSON data directly:

In [None]:
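import json

# Load the locally downloaded MITRE ATT&CK JSON data (a STIX bundle)
with open('C:/path/to/enterprise-attack.json', 'r', encoding='utf-8') as f:
    attack_data = json.load(f)

def get_attack_id(stix_object):
    """Return the ATT&CK ID (e.g., TA0002 or T1059) stored in the object's external references."""
    for ref in stix_object.get('external_references', []):
        if ref.get('source_name') == 'mitre-attack':
            return ref.get('external_id')
    return stix_object['id']  # Fall back to the STIX ID if no ATT&CK ID is present

# Retrieve and display all tactics
tactics = [item for item in attack_data['objects'] if item['type'] == 'x-mitre-tactic']

print("List of Tactics:")
for tactic in tactics:
    print(f"- {tactic['name']} (ID: {get_attack_id(tactic)})")

# Example: Retrieve and display techniques under the 'Execution' tactic
execution_tactic = next((t for t in tactics if t['name'].lower() == "execution"), None)

if execution_tactic:
    # Techniques name their tactics in 'kill_chain_phases'; skip revoked or deprecated entries
    execution_techniques = [item for item in attack_data['objects']
                            if item['type'] == 'attack-pattern'
                            and not item.get('revoked', False)
                            and not item.get('x_mitre_deprecated', False)
                            and any(phase['phase_name'] == 'execution'
                                    for phase in item.get('kill_chain_phases', []))]

    print("\nTechniques under the 'Execution' Tactic:")
    for technique in execution_techniques:
        print(f"- {technique['name']} (ID: {get_attack_id(technique)})")
else:
    print("Execution tactic not found.")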
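
For repeated lookups against the local JSON, it can help to build an index keyed by ATT&CK ID once, rather than scanning `attack_data['objects']` every time. The sketch below shows one way this might look. `SAMPLE_BUNDLE` is a made-up two-object stand-in for the real file (its STIX IDs are placeholders), and `build_technique_index` is a hypothetical helper name.

```python
# Made-up stand-in for the contents of enterprise-attack.json (STIX IDs are placeholders).
SAMPLE_BUNDLE = {
    "objects": [
        {
            "type": "attack-pattern",
            "id": "attack-pattern--placeholder-1",
            "name": "Command and Scripting Interpreter",
            "kill_chain_phases": [{"kill_chain_name": "mitre-attack", "phase_name": "execution"}],
            "external_references": [{"source_name": "mitre-attack", "external_id": "T1059"}],
        },
        {
            "type": "x-mitre-tactic",
            "id": "x-mitre-tactic--placeholder-2",
            "name": "Execution",
            "x_mitre_shortname": "execution",
            "external_references": [{"source_name": "mitre-attack", "external_id": "TA0002"}],
        },
    ]
}

def build_technique_index(bundle):
    """Map ATT&CK technique IDs (e.g., 'T1059') to a small summary dict."""
    index = {}
    for obj in bundle["objects"]:
        if obj.get("type") != "attack-pattern":
            continue  # Only techniques are indexed; tactics and other objects are skipped
        # The ATT&CK ID lives in the external reference whose source is 'mitre-attack'.
        attack_id = next(
            (ref.get("external_id") for ref in obj.get("external_references", [])
             if ref.get("source_name") == "mitre-attack"),
            None,
        )
        if attack_id:
            index[attack_id] = {
                "name": obj["name"],
                "tactics": [phase["phase_name"] for phase in obj.get("kill_chain_phases", [])],
            }
    return index

technique_index = build_technique_index(SAMPLE_BUNDLE)
print(technique_index["T1059"])
```

With the real file loaded as `attack_data`, calling `build_technique_index(attack_data)` would index every technique in the bundle, including revoked ones unless they are filtered out first.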

---

##### 4. Fetching Logs and Parsing

4.1 Collecting Logs from SIEM

For this project, we assume logs are collected from a SIEM (Security Information and Event Management) system such as Splunk or the ELK Stack. Alternatively, you can use a simple log file in a predefined format (e.g., JSON or CSV).

To keep things simple for now, let's use a sample log format like this:

In [None]:
[
  {
    "timestamp": "2024-10-05 12:00:00",
    "source_ip": "192.168.1.10",
    "destination_ip": "192.168.1.50",
    "event_type": "Process Creation",
    "process_name": "cmd.exe",
    "mitre_attack_id": "T1059"
  },
  {
    "timestamp": "2024-10-05 12:05:00",
    "source_ip": "192.168.1.11",
    "destination_ip": "192.168.1.51",
    "event_type": "File Access",
    "file_name": "malware.exe",
    "mitre_attack_id": "T1105"
  }
]

This log format contains:

- `timestamp`: The date and time of the event.
- `source_ip` / `destination_ip`: The involved network addresses.
- `event_type`: The type of event (e.g., process creation, file access).
- `process_name` / `file_name`: The names of processes or files involved.
- `mitre_attack_id`: A mapped MITRE ATT&CK technique ID (e.g., T1059 for Command and Scripting Interpreter).
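
Before mapping, it may be worth checking that each record actually follows this format. The sketch below is one possible check, not part of the original tool. `validate_log_record` is a hypothetical helper, and treating `process_name` and `file_name` as optional is an assumption based on each appearing in only one of the sample records.

```python
import re

# Fields assumed mandatory for every record (process_name / file_name treated as optional).
REQUIRED_FIELDS = ("timestamp", "source_ip", "destination_ip", "event_type", "mitre_attack_id")

# Technique IDs look like T1059; sub-techniques add a three-digit suffix such as T1059.001.
ATTACK_ID_PATTERN = re.compile(r"^T\d{4}(\.\d{3})?$")

def validate_log_record(record):
    """Return a list of problems found in one log record (empty if it looks fine)."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in record]
    attack_id = record.get("mitre_attack_id")
    if attack_id is not None and not ATTACK_ID_PATTERN.match(str(attack_id)):
        problems.append(f"unexpected technique ID format: {attack_id}")
    return problems

good = {
    "timestamp": "2024-10-05 12:00:00",
    "source_ip": "192.168.1.10",
    "destination_ip": "192.168.1.50",
    "event_type": "Process Creation",
    "process_name": "cmd.exe",
    "mitre_attack_id": "T1059",
}
# Deliberately broken record: no IP addresses and a malformed technique ID.
bad = {"timestamp": "2024-10-05 12:10:00", "event_type": "File Access", "mitre_attack_id": "1105"}

print(validate_log_record(good))
print(validate_log_record(bad))
```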
  
4.2 Parsing Log Data

Let’s parse this log data into a DataFrame using pandas for further analysis. Here's the code to load the log data:

In [None]:
import json
import pandas as pd

# Sample log data in JSON format
log_data = '''
[
  {
    "timestamp": "2024-10-05 12:00:00",
    "source_ip": "192.168.1.10",
    "destination_ip": "192.168.1.50",
    "event_type": "Process Creation",
    "process_name": "cmd.exe",
    "mitre_attack_id": "T1059"
  },
  {
    "timestamp": "2024-10-05 12:05:00",
    "source_ip": "192.168.1.11",
    "destination_ip": "192.168.1.51",
    "event_type": "File Access",
    "file_name": "malware.exe",
    "mitre_attack_id": "T1105"
  }
]
'''

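# Load the log data into a pandas DataFrame
# (wrapped in StringIO because recent pandas versions deprecate passing a literal JSON string)
from io import StringIO
log_df = pd.read_json(StringIO(log_data))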

# Display the log DataFrame
log_df.head()

This code will:

1.  Load the JSON log data into a pandas DataFrame.
2.  Display the top rows of the DataFrame to verify the logs were parsed correctly.
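
If the logs arrive as CSV rather than JSON, a similar list of records can be produced with the standard library. The following is a minimal sketch under stated assumptions: `csv_logs` is a hypothetical CSV export with the same columns as the JSON sample, and `read_csv_logs` is an illustrative helper name.

```python
import csv
import io

# Hypothetical CSV export with the same columns as the JSON sample.
csv_logs = """timestamp,source_ip,destination_ip,event_type,process_name,file_name,mitre_attack_id
2024-10-05 12:00:00,192.168.1.10,192.168.1.50,Process Creation,cmd.exe,,T1059
2024-10-05 12:05:00,192.168.1.11,192.168.1.51,File Access,,malware.exe,T1105
"""

def read_csv_logs(text):
    """Parse CSV text into a list of dicts, dropping fields that are empty."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: value for key, value in row.items() if value != ""} for row in reader]

records = read_csv_logs(csv_logs)
print(records[0])
```

The resulting `records` list can then be passed to `pd.DataFrame(records)` to continue with the rest of the notebook.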

---

##### 5. Mapping Log Data to MITRE ATT&CK Techniques

In this step, we will:

- Match the `mitre_attack_id` in our logs with the corresponding MITRE ATT&CK techniques.
  
- Fetch details such as the name of the technique, the tactic it belongs to, and any relevant descriptions.
  
5.1 Data Processing with Pandas
We already have the logs loaded into a pandas DataFrame. Let's extend it by retrieving more information about each `mitre_attack_id`.

Here’s the code to map each `mitre_attack_id` in the logs to its respective MITRE ATT&CK technique:

In [None]:
# Function to fetch technique details from MITRE ATT&CK by technique ID
def get_attack_technique_details(attack_id):
    # Look up the technique; next() returns None instead of raising when there is no match
    technique = next((t for t in attack.enterprise.techniques if t.id == attack_id), None)
    if technique is None:
        return {"technique_name": "Unknown", "tactic_name": "Unknown", "description": "No details found"}
    return {
        "technique_name": technique.name,
        # A technique can belong to several tactics; this keeps only the first one
        "tactic_name": technique.tactics[0].name if technique.tactics else "Unknown",
        "description": technique.description
    }

# Map each MITRE ATT&CK ID in the log data to the corresponding technique details
log_df["technique_details"] = log_df["mitre_attack_id"].apply(get_attack_technique_details)

# Expand the technique details into separate columns
technique_df = pd.json_normalize(log_df["technique_details"])

# Combine the original log DataFrame with the technique details
log_df = pd.concat([log_df, technique_df], axis=1)

# Drop the 'technique_details' column as it's no longer needed
log_df = log_df.drop(columns=["technique_details"])

# Display the updated DataFrame with technique details
log_df.head()

Explanation:

1. `get_attack_technique_details(attack_id)`: This function takes an attack technique ID (e.g., `T1059`) and uses the `attack` object initialized earlier to fetch details about the technique, such as its name, associated tactic, and description.
   
2. Mapping: The `apply()` function maps each `mitre_attack_id` in the DataFrame to its corresponding technique details.
   
3. Expanding the DataFrame: We use `pd.json_normalize()` to expand the nested dictionary of technique details into separate columns (`technique_name`, `tactic_name`, `description`).
   
4. Final DataFrame: We concatenate the original logs with the technique details and clean up unnecessary columns.

At this point, the log data should include additional details about each MITRE ATT&CK technique. The output DataFrame will look something like this:

| timestamp           | source_ip    | destination_ip | event_type      | mitre_attack_id | technique_name               | tactic_name         |
|---------------------|--------------|----------------|-----------------|-----------------|------------------------------|---------------------|
| 2024-10-05 12:00:00 | 192.168.1.10 | 192.168.1.50   | Process Creation | T1059           | Command and Scripting ...     | Execution           |
| 2024-10-05 12:05:00 | 192.168.1.11 | 192.168.1.51   | File Access      | T1105           | Ingress Tool Transfer         | Command and Control  |
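
Logs sometimes carry sub-technique IDs such as `T1059.001` rather than the parent technique ID. If an exact lookup returns "Unknown", falling back to the parent technique is one option. The sketch below only shows the ID handling; `split_attack_id` is a hypothetical helper and is not wired into the lookup function above.

```python
def split_attack_id(attack_id):
    """Split an ATT&CK ID into (parent technique ID, sub-technique suffix or None)."""
    # Normalise whitespace and case first, then split on the first dot.
    parent, _, sub = attack_id.strip().upper().partition(".")
    return parent, (sub or None)

for raw_id in ["T1059", "T1059.001", " t1105 "]:
    parent, sub = split_attack_id(raw_id)
    print(f"{raw_id!r} -> parent={parent}, sub-technique={sub}")
```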



---

##### 6. Visualization of Techniques

Now that we have the MITRE ATT&CK techniques mapped to the log data, let’s move forward with visualizing the techniques using a heatmap and ATT&CK matrix.

6.1 Creating a Heatmap

We will create a heatmap that shows the frequency of MITRE ATT&CK techniques detected in the logs, grouped by tactic.

Here’s the code to generate the heatmap:

In [None]:
# Group the log data by 'tactic_name' and 'technique_name' to count occurrences
technique_counts = log_df.groupby(['tactic_name', 'technique_name']).size().unstack(fill_value=0)

# Plot the heatmap using seaborn
plt.figure(figsize=(12, 8))
sns.heatmap(technique_counts, annot=True, fmt="d", cmap="Blues", cbar=True)

# Set plot labels
plt.title('Heatmap of Detected MITRE ATT&CK Techniques by Tactic')
plt.xlabel('Technique Name')
plt.ylabel('Tactic Name')
plt.xticks(rotation=90)
plt.tight_layout()

# Show the heatmap
plt.show()


Explanation:

1. Grouping the Data: We group the log data by `tactic_name` and `technique_name` to count how many times each technique was detected.

2. Heatmap: We use seaborn to create a heatmap that visualizes the frequency of each technique under its corresponding tactic.

3. Visualization: The heatmap will display the counts of each technique as color intensities, with annotations showing the actual values.
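
To make the grouping step concrete, the same count table that `groupby(...).size().unstack(fill_value=0)` produces can be built by hand. This is only an illustrative sketch in plain Python; `mapped_events` holds sample values, not real detections.

```python
from collections import Counter

# (tactic, technique) pairs as they would appear in the mapped log rows; sample values only.
mapped_events = [
    ("Execution", "Command and Scripting Interpreter"),
    ("Execution", "Command and Scripting Interpreter"),
    ("Command and Control", "Ingress Tool Transfer"),
]

pair_counts = Counter(mapped_events)
tactics = sorted({tactic for tactic, _ in mapped_events})
techniques = sorted({technique for _, technique in mapped_events})

# One row per tactic, one column per technique, zero where the pair never occurred.
matrix = [[pair_counts.get((tactic, technique), 0) for technique in techniques]
          for tactic in tactics]

for tactic, row in zip(tactics, matrix):
    print(tactic, row)
```

Each row corresponds to one row of the heatmap, and the zeros are what `fill_value=0` supplies in the pandas version.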

---

6.2 Displaying the ATT&CK Matrix

An ATT&CK matrix visualization shows which tactics and techniques have been detected. We’ll keep this visualization simple by highlighting the techniques present in our logs.

Here’s the code to create a simplified ATT&CK matrix:

In [None]:
# Generate a simplified ATT&CK matrix based on detected tactics and techniques
tactic_technique_matrix = log_df.pivot_table(index='tactic_name', columns='technique_name', aggfunc='size', fill_value=0)

# Plot the ATT&CK matrix
plt.figure(figsize=(14, 10))
sns.heatmap(tactic_technique_matrix, annot=True, fmt="d", cmap="Greens", cbar=True)

# Set plot labels
plt.title('ATT&CK Matrix - Detected Techniques')
plt.xlabel('Technique')
plt.ylabel('Tactic')
plt.xticks(rotation=90)
plt.tight_layout()

# Show the ATT&CK matrix
plt.show()


This visualization shows a matrix of tactics (rows) and techniques (columns). The values indicate the number of occurrences of each technique, allowing you to see which parts of the ATT&CK matrix appear in your logs.



---

##### 7. Generating Reports

7.1 Exporting Data as PDF/HTML

We can use libraries like matplotlib for visualizations and fpdf or HTML templates for generating and exporting reports. Let's create a summary report that includes:

- The original log data.
  
- The MITRE ATT&CK techniques mapped from the logs.
  
- The visualizations (heatmap and ATT&CK matrix).
  
First, install the fpdf library to help generate PDF reports:

In [None]:
pip install fpdf


7.1.1 Generating a PDF Report

Here’s the Python code to create and export a PDF report summarizing the detected techniques:

In [None]:
from fpdf import FPDF

# Initialize PDF
pdf = FPDF()
pdf.set_auto_page_break(auto=True, margin=15)
pdf.add_page()

# Title
pdf.set_font('Arial', 'B', 16)
pdf.cell(200, 10, txt="MITRE ATT&CK Threat Detection Report", ln=True, align='C')

# Subtitle
pdf.set_font('Arial', 'B', 12)
pdf.cell(200, 10, txt="Summary of Detected Techniques", ln=True, align='C')

# Add a table of log data with technique details
pdf.set_font('Arial', '', 10)
pdf.ln(10)
pdf.cell(200, 10, txt="Log Data and Mapped Techniques:", ln=True)

# Create table with the log data
for index, row in log_df.iterrows():
    text = f"Event: {row['event_type']}, Source IP: {row['source_ip']}, Destination IP: {row['destination_ip']}, Technique: {row['technique_name']} ({row['mitre_attack_id']})"
    pdf.multi_cell(0, 10, txt=text)

# Add the heatmap (we'll save the heatmap as an image first)
plt.figure(figsize=(12, 8))
sns.heatmap(technique_counts, annot=True, fmt="d", cmap="Blues", cbar=True)
plt.title('Heatmap of Detected MITRE ATT&CK Techniques by Tactic')
plt.xlabel('Technique Name')
plt.ylabel('Tactic Name')
plt.xticks(rotation=90)
plt.tight_layout()

# Save heatmap as an image
heatmap_image = 'heatmap.png'
plt.savefig(heatmap_image)
plt.close()

# Add the heatmap image to the PDF
pdf.image(heatmap_image, x=10, y=None, w=180)

# Add the ATT&CK matrix visualization
plt.figure(figsize=(14, 10))
sns.heatmap(tactic_technique_matrix, annot=True, fmt="d", cmap="Greens", cbar=True)
plt.title('ATT&CK Matrix - Detected Techniques')
plt.xlabel('Technique')
plt.ylabel('Tactic')
plt.xticks(rotation=90)
plt.tight_layout()

# Save the matrix as an image
matrix_image = 'attack_matrix.png'
plt.savefig(matrix_image)
plt.close()

# Add the matrix image to the PDF
pdf.add_page()
pdf.image(matrix_image, x=10, y=None, w=180)

# Save the final PDF
report_filename = "mitre_attack_threat_detection_report.pdf"
pdf.output(report_filename)

print(f"Report successfully generated: {report_filename}")

Explanation:

1. fpdf initialization: We initialize an FPDF object to create the report.
   
2. Log Data Table: We loop through the DataFrame (log_df) and add rows of the log data along with mapped techniques.
   
3. Heatmap and Matrix: We generate the heatmap and ATT&CK matrix, save them as images, and embed them into the PDF report.
   
4. Export to PDF: The final report is saved as mitre_attack_threat_detection_report.pdf.
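
One thing to watch for: the core fonts used above are, to our knowledge, limited to Latin-1 text, so a log field containing other characters (for example, a file name with non-Latin characters) may raise an encoding error when the PDF is written. A minimal workaround, assuming that losing those characters is acceptable, is to sanitize each string first. `to_latin1` is a hypothetical helper name.

```python
def to_latin1(text):
    """Replace characters that cannot be encoded as Latin-1 with '?'."""
    return str(text).encode("latin-1", errors="replace").decode("latin-1")

print(to_latin1("Technique: Ingress Tool Transfer (T1105)"))
print(to_latin1("file \u2192 malware.exe"))
```

In the loop above, this would mean passing `to_latin1(text)` to `pdf.multi_cell` instead of `text`.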

---

7.1.2 Generating an HTML Report

If you'd prefer an HTML report, here’s an example of generating one using HTML templates:

In [None]:
# Create a simple HTML template for the report
html_template = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>MITRE ATT&CK Threat Detection Report</title>
    <style>
        body {{ font-family: Arial, sans-serif; }}
        h1, h2 {{ text-align: center; }}
        table {{ width: 100%; border-collapse: collapse; margin: 20px 0; }}
        table, th, td {{ border: 1px solid black; padding: 10px; }}
        th {{ background-color: #f2f2f2; }}
    </style>
</head>
<body>
    <h1>MITRE ATT&CK Threat Detection Report</h1>
    <h2>Summary of Detected Techniques</h2>

    <h3>Log Data and Mapped Techniques:</h3>
    <table>
        <tr>
            <th>Timestamp</th>
            <th>Source IP</th>
            <th>Destination IP</th>
            <th>Event Type</th>
            <th>Technique Name</th>
            <th>Technique ID</th>
        </tr>
        {log_rows}
    </table>

    <h3>Heatmap of Detected Techniques:</h3>
    <img src="heatmap.png" alt="Heatmap" width="700px">

    <h3>ATT&CK Matrix - Detected Techniques:</h3>
    <img src="attack_matrix.png" alt="ATT&CK Matrix" width="700px">

</body>
</html>
'''

# Generate HTML rows for the log data
log_rows = ""
for index, row in log_df.iterrows():
    log_rows += f"<tr><td>{row['timestamp']}</td><td>{row['source_ip']}</td><td>{row['destination_ip']}</td><td>{row['event_type']}</td><td>{row['technique_name']}</td><td>{row['mitre_attack_id']}</td></tr>"

# Insert the log rows into the HTML template
html_report = html_template.format(log_rows=log_rows)

# Save the HTML report (UTF-8 to match the charset declared in the template)
report_filename_html = "mitre_attack_threat_detection_report.html"
with open(report_filename_html, 'w', encoding='utf-8') as file:
    file.write(html_report)

print(f"Report successfully generated: {report_filename_html}")
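
Because log fields such as file names come from outside sources, it is safer to escape them before placing them in HTML. The sketch below shows one way to build a table row with escaped values using the standard library; `build_row` is a hypothetical helper, not part of the original report code.

```python
from html import escape

def build_row(values):
    """Return one <tr> element with every value HTML-escaped."""
    cells = "".join(f"<td>{escape(str(value))}</td>" for value in values)
    return f"<tr>{cells}</tr>"

print(build_row(["2024-10-05 12:05:00", "File Access", "<script>alert(1)</script>.exe"]))
```

The row-building loop could call `build_row` with the same six column values it currently interpolates directly.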



---

##### 7.2 Summary Report of Detected Techniques

Both the PDF and HTML reports contain a detailed summary of:

- Log events: Including timestamp, source_ip, destination_ip, and event type.
  
- Mapped MITRE ATT&CK techniques: Each log is mapped to its corresponding technique name and ID.
  
- Visuals: Heatmap and ATT&CK matrix visualizations.
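
A short text summary can also sit at the top of either report. As a rough sketch (the `detections` list is sample data standing in for the mapped log rows, and `summarize` is a hypothetical helper), the totals and per-technique counts could be produced like this:

```python
from collections import Counter

# Sample (technique ID, technique name) pairs standing in for the mapped log rows.
detections = [
    ("T1059", "Command and Scripting Interpreter"),
    ("T1105", "Ingress Tool Transfer"),
    ("T1059", "Command and Scripting Interpreter"),
]

def summarize(detections):
    """Build summary lines: totals first, then techniques from most to least frequent."""
    counts = Counter(detections)
    lines = [f"Total events: {len(detections)}", f"Distinct techniques: {len(counts)}"]
    for (technique_id, name), count in counts.most_common():
        lines.append(f"{technique_id} {name}: {count}")
    return lines

print("\n".join(summarize(detections)))
```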


---

##### 8. Conclusion

We’ve now successfully built a tool to:

- Collect log data.
  
- Map it to MITRE ATT&CK techniques.
  
- Visualize the results.
  
- Generate reports summarizing the findings.

----

**Author**: Rich Van Buren  
**Date**: October 2024  
**Project**: Automated Threat Detection Using MITRE ATT&CK  

**Contact**:  
- **Email**: [rvanburen.tech@gmail.com](mailto:rvanburen.tech@gmail.com)  
- **GitHub**: [github.com/Ulfvaldr](https://github.com/Ulfvaldr)  
- **LinkedIn**: [linkedin.com/in/rich-van-buren-4762955](https://www.linkedin.com/in/rich-van-buren-4762955)

