GitHub - danjethh/LLM-Malware-detection-tool: This repository contains the implementation of a novel approach to memory forensics using Memory Forensics Knowledge Graphs (MFKGs) and relational Memory Forensics Knowledge Graphs (rMFKGs)

Memory Forensics Knowledge Graph (MFKG) and Relational MFKG (rMFKG)

This repository contains the implementation of a novel approach to memory forensics using Memory Forensics Knowledge Graphs (MFKGs) and relational Memory Forensics Knowledge Graphs (rMFKGs). The goal is to detect sophisticated malware by analyzing cross-process interactions and leveraging predefined relationships between forensic artifacts.

The project integrates Volatility for memory analysis and uses structured data storage (CSV/JSON) to build knowledge graphs. Additionally, an embedded Large Language Model (LLM) automates threat intelligence queries to contextualize suspicious activity.

Table of Contents

Introduction

Problem Statement

Solution Overview

Project Workflow

Installation and Setup

Code Implementation

Data Structure and Preprocessing

Building MFKG and rMFKG

Future Work

Contributing

License

Introduction

Memory forensics is a critical technique for detecting malware that operates entirely in volatile memory, evading traditional file-based detection methods. However, tools like Volatility produce fragmented outputs that require manual correlation, making the process inefficient and error-prone.

This project introduces:

MFKG: A graph-based representation of forensic artifacts for individual processes.

rMFKG: A relational graph that consolidates cross-process relationships to uncover malicious activity.

LLM Integration: Automates threat intelligence queries to enhance detection.

Problem Statement

Modern malware increasingly leverages techniques such as:

Process Chains: Malware spawns multiple processes to obscure its activity.

In-Memory Execution: Malware avoids writing to disk, making it invisible to file-based scanners.

Traditional memory forensics tools like Volatility are powerful but produce fragmented outputs that require significant manual effort to analyze.

This project addresses these challenges by:

Structuring forensic artifacts into a unified graph format.

Automating the detection of cross-process relationships.

Enhancing analysis with LLM-generated insights.

Solution Overview

The solution consists of three key components:

Artifact Extraction: Use Volatility plugins to extract forensic artifacts from a memory image.

Graph Construction: Build an MFKG for each process, then consolidate the MFKGs into an rMFKG using predefined relationships (e.g., shared DLLs, parent-child relationships).

Threat Intelligence Automation: Use an LLM to generate context-aware queries for suspicious activity.

Project Workflow

The workflow is divided into the following steps:

Step 1: Extract Artifacts Using Volatility

Use Volatility plugins to extract process attributes such as:

Process ID, Parent Process ID, Command Line
Loaded DLLs, Network Connections, File Handles
Store the extracted data in a structured format (CSV or JSON).

Step 2: Preprocess Data

Normalize timestamps and handle missing data.
Deduplicate entries to ensure consistency.

Step 3: Build MFKG

Represent each process as a directed graph:
Nodes: Forensic artifacts (e.g., Process, DLL, Network Connection).
Edges: Relationships (e.g., Spawn, Injection).
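
As a rough illustration of Step 3, the sketch below builds a per-process directed graph from one artifact record of the kind stored in Step 1. The function name build_mfkg, the (type, value) node layout, and the edge labels loads, connects_to, and opens are assumptions made for this example and are not taken from the project code.

```python
def build_mfkg(artifacts):
    """Build a small directed graph for one process record.

    Nodes are (type, value) tuples; edges are (source, label, target) triples.
    """
    process_node = ("Process", artifacts["Process ID"])
    nodes = {process_node}
    edges = []

    # Hypothetical mapping: list-valued column -> (node type, edge label).
    artifact_kinds = {
        "DLLs Loaded": ("DLL", "loads"),
        "Network Connections": ("Network Connection", "connects_to"),
        "File Handles": ("File Handle", "opens"),
    }
    for column, (node_type, label) in artifact_kinds.items():
        for value in artifacts.get(column, []):
            node = (node_type, value)
            nodes.add(node)
            edges.append((process_node, label, node))

    return {"nodes": nodes, "edges": edges}


# Made-up sample record for illustration only.
record = {
    "Process ID": 1234,
    "Parent Process ID": 4,
    "Process Name": "example.exe",
    "DLLs Loaded": ["ntdll.dll", "kernel32.dll"],
    "Network Connections": [],
    "File Handles": [],
}
mfkg = build_mfkg(record)
print(len(mfkg["nodes"]), "nodes,", len(mfkg["edges"]), "edges")
```

Plain sets and lists are used here to keep the example dependency-free; a graph library could serve the same purpose.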

Step 4: Build rMFKG

Identify cross-process relationships (e.g., shared DLLs, synchronized execution patterns).
Merge individual MFKGs into a single relational graph.
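
One possible way to derive the cross-process edges in Step 4 is sketched below for two of the relationships mentioned in this README: parent-child links and shared DLLs. The function name build_rmfkg, the SharedDLL edge label, and the sample records are hypothetical. Only the Spawn label comes from the edge examples in Step 3.

```python
from collections import defaultdict
from itertools import combinations


def build_rmfkg(records):
    """Return cross-process edges as (source_pid, label, target_pid) triples."""
    edges = []
    known_pids = {r["Process ID"] for r in records}

    # Parent-child: link a parent to each child when the parent is in the image.
    for r in records:
        parent = r["Parent Process ID"]
        if parent in known_pids:
            edges.append((parent, "Spawn", r["Process ID"]))

    # Shared DLLs: group PIDs by (case-insensitive) DLL name, then link each pair.
    pids_by_dll = defaultdict(set)
    for r in records:
        for dll in r.get("DLLs Loaded", []):
            pids_by_dll[dll.lower()].add(r["Process ID"])
    for dll, pids in sorted(pids_by_dll.items()):
        for a, b in combinations(sorted(pids), 2):
            edges.append((a, f"SharedDLL:{dll}", b))

    return edges


# Made-up sample records for illustration only.
records = [
    {"Process ID": 100, "Parent Process ID": 4, "DLLs Loaded": ["ntdll.dll"]},
    {"Process ID": 200, "Parent Process ID": 100, "DLLs Loaded": ["sample.dll"]},
    {"Process ID": 300, "Parent Process ID": 100, "DLLs Loaded": ["SAMPLE.dll"]},
]
edges = build_rmfkg(records)
for edge in edges:
    print(edge)
```

System DLLs such as ntdll.dll are loaded by nearly every Windows process, so a practical version would probably need to filter common libraries before linking processes on shared DLLs.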

Step 5: Automate Threat Intelligence

Use an LLM to generate investigative queries based on detected relationships.
Allow analysts to refine queries for deeper insights.
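
The README does not specify which LLM or API is used, so the sketch below only shows how a detected relationship might be turned into a prompt string. The function name build_query, the prompt wording, and the (source_pid, label, target_pid) edge format are all assumptions. Sending the prompt to a model is left out.

```python
def build_query(relationship):
    """Turn one (source_pid, label, target_pid) edge into an investigative prompt."""
    source, label, target = relationship
    return (
        "You are assisting a memory forensics analyst.\n"
        f"Observed relationship: process {source} --{label}--> process {target}.\n"
        "Is this pattern associated with known malware techniques, "
        "and which artifacts should be examined next?"
    )


# Hypothetical edge, e.g. one detected while building the rMFKG.
prompt = build_query((100, "Spawn", 200))
print(prompt)
```

An analyst could refine the query by editing the returned string before it is submitted.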

Installation and Setup

Prerequisites

Python 3.x: Ensure Python is installed on your system.
Volatility: Install Volatility for memory analysis.
Google Colab: Optional, for running the code in a cloud environment.

Setup Instructions

Step 1: Install Dependencies

Run the following command to install the required library (the re module is part of the Python standard library and does not need to be installed):

pip install pandas

Step 2: Install Volatility

For local setup:

sudo apt-get update
sudo apt-get install -y volatility

For Google Colab:

!apt-get update
!apt-get install -y volatility

Step 3: Clone the Repository

git clone https://github.com/danjethh/LLM-Malware-detection-tool.git
cd LLM-Malware-detection-tool

Code Implementation

Step 1: Define Volatility Plugins

We use Volatility plugins to extract forensic artifacts. Two plugins are used in this example: pslist, which enumerates processes, and dlllist, which lists the DLLs loaded by a given process. Both are invoked through a run_volatility helper that takes the plugin name, the memory image, the profile, and an optional process ID, and returns the parsed output as a list of dictionaries.
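
The run_volatility helper called by extract_artifacts is not defined in this section. A minimal sketch of what such a helper could look like is given below. It assumes a Volatility 2 installation reachable as volatility on the PATH, and it assumes the plugin supports the --output=json renderer, whose output has a "columns" list and a "rows" list. The helper names build_command and parse_json_table are hypothetical.

```python
import json
import subprocess


def build_command(plugin, memory_image, profile, pid=None):
    """Assemble a Volatility 2 command line (executable name is an assumption)."""
    command = [
        "volatility",
        "-f", memory_image,
        f"--profile={profile}",
        plugin,
        "--output=json",
    ]
    if pid is not None:
        command += ["-p", str(pid)]  # restrict the plugin to one process
    return command


def parse_json_table(text):
    """Convert {"columns": [...], "rows": [[...], ...]} into a list of dicts."""
    table = json.loads(text)
    return [dict(zip(table["columns"], row)) for row in table["rows"]]


def run_volatility(plugin, memory_image, profile, pid=None):
    """Run one plugin and return its rows as dictionaries."""
    result = subprocess.run(
        build_command(plugin, memory_image, profile, pid),
        capture_output=True, text=True, check=True,
    )
    return parse_json_table(result.stdout)


# Parsing can be tried without a memory image, using a made-up sample.
sample = '{"columns": ["PID", "Name"], "rows": [[4, "System"]]}'
print(parse_json_table(sample))
```

The dictionaries produced this way are keyed by Volatility's own column names, which may differ from the keys used in extract_artifacts (for example "Process ID"), so a renaming step would likely be needed in between.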

Step 2: Extract Artifacts

The extract_artifacts function runs Volatility plugins and stores the results in a CSV file:

import pandas as pd


def extract_artifacts(memory_image, profile):
    """Collect per-process artifacts and save them to memory_artifacts.csv."""
    # run_volatility is a helper that runs one plugin and returns its parsed
    # output as a list of dictionaries; it must be defined before this call.
    processes = run_volatility("pslist", memory_image, profile)
    all_artifacts = []

    for process in processes:
        pid = process["Process ID"]
        artifacts = {
            "Process ID": pid,
            "Parent Process ID": process["Parent Process ID"],
            "Process Name": process["Process Name"],
            "Command Line": process["Command Line"],
            "DLLs Loaded": [],
            "Network Connections": [],  # Placeholder for future plugins
            "File Handles": []          # Placeholder for future plugins
        }

        # Collect the DLLs loaded by this process.
        dlls = run_volatility("dlllist", memory_image, profile, pid)
        artifacts["DLLs Loaded"] = [dll["DLL"] for dll in dlls]
        all_artifacts.append(artifacts)

    # One row per process; list-valued columns are written as strings.
    df = pd.DataFrame(all_artifacts)
    df.to_csv("memory_artifacts.csv", index=False)
    print("Artifacts saved to memory_artifacts.csv")

Data Structure and Preprocessing

Recommended Data Structures

CSV: Each row represents a process, with columns for attributes like Process ID, Parent Process ID, DLLs Loaded, etc.
JSON: Hierarchical structure for nested relationships.
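
To make the difference between the two formats concrete, the sketch below serializes one process record to JSON and reads it back. The field names follow the columns used in extract_artifacts. The values are made up for illustration.

```python
import json

# Made-up sample record using the column names from extract_artifacts.
process_record = {
    "Process ID": 1234,
    "Parent Process ID": 4,
    "Process Name": "example.exe",
    "Command Line": "example.exe --flag",
    "DLLs Loaded": ["ntdll.dll", "kernel32.dll"],
    "Network Connections": [],
    "File Handles": [],
}

# JSON keeps the nested lists as arrays, so they survive a round trip intact.
json_text = json.dumps([process_record], indent=2)
restored = json.loads(json_text)
print(restored[0]["DLLs Loaded"])
```

In a CSV file the same lists end up as a single string per cell and have to be parsed again when the file is loaded.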

Preprocessing Steps

Normalize Timestamps: Convert all timestamps to ISO 8601 format.
Handle Missing Data: Replace missing values with placeholders.
Deduplicate Entries: Remove duplicate artifacts.
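
The three preprocessing steps above could be sketched as follows, working on a list of dictionaries before it is written to CSV or JSON. The "Create Time" column, its input format, the UTC assumption, and the "UNKNOWN" placeholder are all assumptions for this example. The README does not specify them. The standard library is used here so the example runs without extra packages.

```python
from datetime import datetime, timezone

PLACEHOLDER = "UNKNOWN"  # hypothetical placeholder for missing values


def preprocess(records, timestamp_format="%Y-%m-%d %H:%M:%S"):
    """Fill missing values, normalize timestamps, and drop duplicate records."""
    cleaned, seen = [], set()
    for record in records:
        # Replace missing or empty values with the placeholder.
        row = {
            key: PLACEHOLDER if value is None or value == "" else value
            for key, value in record.items()
        }

        # Convert the timestamp to ISO 8601, assuming it was recorded in UTC.
        if row.get("Create Time", PLACEHOLDER) != PLACEHOLDER:
            parsed = datetime.strptime(row["Create Time"], timestamp_format)
            row["Create Time"] = parsed.replace(tzinfo=timezone.utc).isoformat()

        # Deduplicate on the full normalized record.
        fingerprint = tuple(sorted((k, str(v)) for k, v in row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned


# Made-up sample rows: one duplicate and one row with missing fields.
raw = [
    {"Process ID": 4, "Process Name": "System", "Create Time": "2024-01-05 10:15:00"},
    {"Process ID": 4, "Process Name": "System", "Create Time": "2024-01-05 10:15:00"},
    {"Process ID": 88, "Process Name": None, "Create Time": ""},
]
cleaned = preprocess(raw)
for row in cleaned:
    print(row)
```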

Building MFKG and rMFKG

MFKG: Construct a directed graph for each process using extracted artifacts.
rMFKG: Identify cross-process relationships and merge MFKGs into a unified graph.

Future Work

Extend the framework to analyze memory dumps from virtualized environments.
Integrate additional Volatility plugins for richer artifact extraction.
Explore graph databases (e.g., Neo4j) for storing and querying rMFKG.

Contributing

Contributions are welcome! To contribute:

Fork the repository.
Create a new branch (git checkout -b feature/YourFeatureName).
Commit your changes (git commit -m "Add YourFeatureName").
Push to the branch (git push origin feature/YourFeatureName).
Open a pull request.
