This repository contains the implementation of a novel approach to memory forensics using Memory Forensics Knowledge Graphs (MFKGs) and relational Memory Forensics Knowledge Graphs (rMFKGs)

danjethh/LLM-Malware-detection-tool

Memory Forensics Knowledge Graph (MFKG) and Relational MFKG (rMFKG)

This repository contains the implementation of a novel approach to memory forensics using Memory Forensics Knowledge Graphs (MFKGs) and relational Memory Forensics Knowledge Graphs (rMFKGs). The goal is to detect sophisticated malware by analyzing cross-process interactions and leveraging predefined relationships between forensic artifacts.

The project integrates Volatility for memory analysis and uses structured data storage (CSV/JSON) to build knowledge graphs. Additionally, an embedded Large Language Model (LLM) automates threat intelligence queries to contextualize suspicious activity.

Table of Contents

Introduction

Problem Statement

Solution Overview

Project Workflow

Installation and Setup

Code Implementation

Data Structure and Preprocessing

Building MFKG and rMFKG

Future Work

Contributing

License

Introduction

Memory forensics is a critical technique for detecting malware that operates entirely in volatile memory, evading traditional file-based detection methods. However, tools like Volatility produce fragmented outputs that require manual correlation, making the process inefficient and error-prone.

This project introduces:

MFKG : A graph-based representation of forensic artifacts for individual processes.

rMFKG : A relational graph that consolidates cross-process relationships to uncover malicious activity.

LLM Integration : Automates threat intelligence queries to enhance detection.

Problem Statement

Modern malware increasingly leverages techniques such as:

Process Chains : Malware spawns multiple processes to obscure its activity.

In-Memory Execution : Malware avoids writing to disk, making it invisible to file-based scanners.

Traditional memory forensics tools like Volatility are powerful but produce fragmented outputs that require significant manual effort to analyze.

This project addresses these challenges by:

  1. Structuring forensic artifacts into a unified graph format.
  2. Automating the detection of cross-process relationships.
  3. Enhancing analysis with LLM-generated insights.

Solution Overview

The solution consists of three key components:

Artifact Extraction : Use Volatility plugins to extract forensic artifacts from a memory image.

Graph Construction : Build an MFKG for each process, then consolidate the MFKGs into a single rMFKG using predefined relationships (e.g., shared DLLs, parent-child relationships).

Threat Intelligence Automation : Use an LLM to generate context-aware queries for suspicious activity.

Project Workflow

The workflow is divided into the following steps:

Step 1: Extract Artifacts Using Volatility

Use Volatility plugins to extract process attributes such as:

  1. Process ID, Parent Process ID, Command Line
  2. Loaded DLLs, Network Connections, File Handles

Store the extracted data in a structured format (CSV or JSON).

Step 2: Preprocess Data

  1. Normalize timestamps and handle missing data.
  2. Deduplicate entries to ensure consistency.

Step 3: Build MFKG

Represent each process as a directed graph:

  1. Nodes: Forensic artifacts (e.g., Process, DLL, Network Connection).
  2. Edges: Relationships (e.g., Spawn, Injection).
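
A per-process graph of this kind could be sketched as follows. This is a minimal illustration using only the standard library, not the repository's implementation. The MFKG class, the build_mfkg function, and the "Loads" and "Connects" edge labels are hypothetical names chosen for the example; the dictionary keys follow the artifact record built by extract_artifacts.

```python
from dataclasses import dataclass, field


@dataclass
class MFKG:
    """Directed graph of forensic artifacts for one process (hypothetical class)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, relation, target) triples

    def add_edge(self, source, relation, target):
        # Adding an edge also registers both endpoints as nodes.
        self.nodes.update((source, target))
        self.edges.add((source, relation, target))


def build_mfkg(artifacts):
    """Build the graph for a single artifact record."""
    graph = MFKG()
    process = ("Process", artifacts["Process ID"])
    graph.nodes.add(process)

    parent = artifacts.get("Parent Process ID")
    if parent is not None:
        graph.add_edge(("Process", parent), "Spawn", process)

    # "Loads" and "Connects" are assumed labels, not taken from the project.
    for dll in artifacts.get("DLLs Loaded", []):
        graph.add_edge(process, "Loads", ("DLL", dll.lower()))
    for connection in artifacts.get("Network Connections", []):
        graph.add_edge(process, "Connects", ("Network Connection", connection))
    return graph
```

Nodes are typed tuples such as ("Process", 100) so that a process ID and a DLL path can never collide, and DLL paths are lowercased because Windows paths are case-insensitive.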

Step 4: Build rMFKG

  1. Identify cross-process relationships (e.g., shared DLLs, synchronized execution patterns).
  2. Merge individual MFKGs into a single relational graph.
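
The cross-process step could look roughly like the sketch below, which derives only the edges that link one process to another (parent-child and shared DLLs). The function name cross_process_edges, the "SharedDLL" label, and the ignore parameter are assumptions made for illustration; synchronized execution patterns are not covered.

```python
from collections import defaultdict
from itertools import combinations


def cross_process_edges(all_artifacts, ignore=frozenset()):
    """Return (pid_a, relation, pid_b) edges that connect processes to each other."""
    edges = set()
    known_pids = {record["Process ID"] for record in all_artifacts}
    loaders = defaultdict(set)  # DLL path -> PIDs that loaded it

    for record in all_artifacts:
        pid = record["Process ID"]
        parent = record.get("Parent Process ID")
        # Only link to parents that are present in the extracted process list.
        if parent in known_pids:
            edges.add((parent, "Spawn", pid))
        for dll in record.get("DLLs Loaded", []):
            if dll.lower() not in ignore:
                loaders[dll.lower()].add(pid)

    # Every pair of processes that loaded the same DLL gets one edge.
    for pids in loaders.values():
        for first, second in combinations(sorted(pids), 2):
            edges.add((first, "SharedDLL", second))
    return edges
```

System libraries such as ntdll.dll are loaded by nearly every Windows process, so in practice the ignore set would probably need to list common DLLs to keep the shared-DLL edges meaningful.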

Step 5: Automate Threat Intelligence

  1. Use an LLM to generate investigative queries based on detected relationships.
  2. Allow analysts to refine queries for deeper insights.
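
As an illustration of this step, a detected relationship could be turned into an LLM prompt along these lines. The function build_query_prompt, the prompt wording, and the edge format are all hypothetical; the call to an actual model is left out because this README does not say which LLM or API the project uses.

```python
def build_query_prompt(edge, process_names):
    """Turn one (source_pid, relation, target_pid) edge into a prompt string."""
    source, relation, target = edge
    source_name = process_names.get(source, "unknown")
    target_name = process_names.get(target, "unknown")
    return (
        "You are assisting a memory forensics investigation.\n"
        f"Observed relationship: {source_name} (PID {source}) "
        f"-[{relation}]-> {target_name} (PID {target}).\n"
        "Suggest three threat intelligence queries an analyst should run "
        "to decide whether this relationship is malicious."
    )
```

An analyst can refine the result by editing the returned string before it is sent to the model.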

Installation and Setup

Prerequisites

  1. Python 3.x : Ensure Python is installed on your system.
  2. Volatility : Install Volatility for memory analysis.
  3. Google Colab : Optional, for running the code in a cloud environment.

Setup Instructions

Step 1: Install Dependencies Run the following commands to install required libraries:

pip install pandas

The re module is part of the Python standard library and does not need to be installed.

Step 2: Install Volatility

For local setup:

sudo apt-get update
sudo apt-get install -y volatility

For Google Colab:

!apt-get update
!apt-get install -y volatility

Step 3: Clone the Repository

git clone https://github.com/danjethh/LLM-Malware-detection-tool.git
cd LLM-Malware-detection-tool

Code Implementation

Step 1: Define Volatility Plugins

We use Volatility plugins to extract forensic artifacts: pslist enumerates the processes in a memory image, and dlllist lists the DLLs loaded by a given process. Both are invoked through a run_volatility helper, which takes the plugin name, the memory image, the profile, and an optional process ID, and returns the plugin output as a list of records. The extract_artifacts function in Step 2 shows how these calls are used.
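
The run_volatility helper called by extract_artifacts is not shown in this README. The sketch below is one possible shape for it, not the project's actual code. It assumes a Volatility 2 command-line build that accepts --output=json and prints an object with "columns" and "rows" keys; the binary name "volatility" and the rows_to_records helper are also assumptions.

```python
import json
import subprocess


def rows_to_records(payload):
    """Convert {"columns": [...], "rows": [[...], ...]} into a list of dicts."""
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["rows"]]


def run_volatility(plugin, memory_image, profile, pid=None, binary="volatility"):
    """Run one plugin against a memory image and return its rows as dicts.

    Assumes the Volatility 2 CLI; adjust `binary` for your installation.
    """
    command = [binary, "-f", memory_image, f"--profile={profile}",
               plugin, "--output=json"]
    if pid is not None:
        # Restrict plugins such as dlllist to a single process.
        command += ["-p", str(pid)]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return rows_to_records(json.loads(result.stdout))
```

The records returned here are keyed by the plugin's own column names, which may differ from keys such as "Process ID" that extract_artifacts expects, so a renaming step would likely be needed between the two.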

Step 2: Extract Artifacts

The extract_artifacts function runs Volatility plugins and stores the results in a CSV file:

import pandas as pd

def extract_artifacts(memory_image, profile):
    # Enumerate every process in the memory image.
    processes = run_volatility("pslist", memory_image, profile)
    all_artifacts = []

    for process in processes:
        pid = process["Process ID"]
        artifacts = {
            "Process ID": pid,
            "Parent Process ID": process["Parent Process ID"],
            "Process Name": process["Process Name"],
            "Command Line": process["Command Line"],
            "DLLs Loaded": [],
            "Network Connections": [],  # Placeholder for future plugins
            "File Handles": []          # Placeholder for future plugins
        }

        # Collect the DLLs loaded by this process.
        dlls = run_volatility("dlllist", memory_image, profile, pid)
        artifacts["DLLs Loaded"] = [dll["DLL"] for dll in dlls]
        all_artifacts.append(artifacts)

    # Write one row per process to the CSV file.
    df = pd.DataFrame(all_artifacts)
    df.to_csv("memory_artifacts.csv", index=False)
    print("Artifacts saved to memory_artifacts.csv")

Data Structure and Preprocessing

Recommended Data Structures

  1. CSV : Each row represents a process, with columns for attributes like Process ID, Parent Process ID, DLLs Loaded, etc.
  2. JSON : Hierarchical structure for nested relationships.
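
To make the difference between the two formats concrete, the sketch below serializes the same process record both ways. The function names to_json and to_csv_row, the "processes" wrapper key, and the ";" separator are illustrative choices, not part of the project.

```python
import json


def to_json(records):
    """Serialize process records as a nested JSON document (lists stay lists)."""
    return json.dumps({"processes": records}, indent=2)


def to_csv_row(record, separator=";"):
    """Flatten list-valued fields so the record fits in a single CSV row."""
    return {
        key: separator.join(map(str, value)) if isinstance(value, list) else value
        for key, value in record.items()
    }
```

JSON keeps "DLLs Loaded" as a real list, whereas a CSV cell can only hold a string, so list-valued columns have to be joined on write and split again when read back.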

Preprocessing Steps

  1. Normalize Timestamps : Convert all timestamps to ISO 8601 format.
  2. Handle Missing Data : Replace missing values with placeholders.
  3. Deduplicate Entries : Remove duplicate artifacts.
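
These three steps could be sketched with the standard library as shown below. The input timestamp format (for example "2019-03-01 12:34:56 UTC+0000"), the "Start Time" field name, and the "UNKNOWN" placeholder are assumptions for the example; the artifact records produced by extract_artifacts do not currently include a timestamp column.

```python
from datetime import datetime

# Assumed input format, e.g. "2019-03-01 12:34:56 UTC+0000".
INPUT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S UTC%z"
PLACEHOLDER = "UNKNOWN"


def normalize_timestamp(raw):
    """Return an ISO 8601 string, or the placeholder if the value cannot be parsed."""
    try:
        return datetime.strptime(raw.strip(), INPUT_TIME_FORMAT).isoformat()
    except (AttributeError, ValueError):
        return PLACEHOLDER


def preprocess(records, time_field="Start Time"):
    """Normalize timestamps, fill missing values, and drop duplicate records."""
    cleaned, seen = [], set()
    for record in records:
        row = {}
        for key, value in record.items():
            if key == time_field:
                row[key] = normalize_timestamp(value)
            elif value is None or value == "":
                row[key] = PLACEHOLDER
            else:
                row[key] = value
        # Records may contain lists, which are unhashable, so compare by repr.
        fingerprint = repr(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned
```

Deduplication runs after normalization so that two records differing only in how a missing value was written are treated as the same entry.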

Building MFKG and rMFKG

  1. MFKG : Construct a directed graph for each process using extracted artifacts.
  2. rMFKG : Identify cross-process relationships and merge MFKGs into a unified graph.

Future Work

  1. Extend the framework to analyze memory dumps from virtualized environments.
  2. Integrate additional Volatility plugins for richer artifact extraction.
  3. Explore graph databases (e.g., Neo4j) for storing and querying rMFKG.

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeatureName).
  3. Commit your changes (git commit -m "Add YourFeatureName").
  4. Push to the branch (git push origin feature/YourFeatureName).
  5. Open a pull request.
