Using PyTorch and CNN to distinguish between benign and malware

anthony-som/Malware-Detection-CNN

Malware Detection CNN Model

Overview

This Convolutional Neural Network (CNN) malware detection model, built with PyTorch, classifies image representations of software binaries as either malware or benign. It runs on either CPU or GPU and achieved a test accuracy of 98.2%.

Understanding Malware and Benign Software

What is Malware?

Malware, short for malicious software, refers to any software intentionally designed to cause damage to a computer, server, client, or computer network. By executing malicious tasks, malware can compromise data, steal information, bypass access controls, and cause harm to the system or network. Common types of malware include viruses, worms, Trojan horses, ransomware, spyware, adware, and scareware.

What is Benign Software?

Benign software refers to any software that is not designed to harm or exploit a computer system or network. It is safe, non-malicious software that performs useful tasks and operates as intended without any hidden harmful functions. In the context of malware detection, "benign" is often used to describe software or files that are not harmful and do not contain any malicious code or intent.

Understanding Malware Texture

What is Malware Texture?

Malware texture refers to the unique visual patterns that emerge when malware binaries are represented as images. This concept is crucial in image-based malware detection, where machine learning models, such as convolutional neural networks, are trained to identify and differentiate between the visual patterns of malware and benign software.

When a malware binary is converted into a grayscale image, its code structure translates into a specific pattern or "texture" in the image. These patterns are not random; they are determined by the sequence of bytes and the binary's structure. The textures can vary significantly between different types of malware and benign files, providing a visual fingerprint that can be used for classification.
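The conversion described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to build this project's dataset: the README does not say how the images were generated, so the function name `bytes_to_grayscale` and the row width are assumptions.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay bytes out row by row; each byte (0-255) becomes one pixel intensity."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad the final row so every row has the same width (an assumed choice).
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Ten example bytes laid out four per row give a 3x4 pixel grid.
pixels = bytes_to_grayscale(bytes(range(10)), width=4)
for row in pixels:
    print(row)
```

For a real file, the bytes could be read with `open(path, "rb").read()`. Because neighboring bytes become neighboring pixels, repeated structures in the binary show up as the visible texture.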

In the context of the CNN Malware Detection Model, the network learns to recognize and interpret these textures, enabling it to classify an unseen binary's image as either malware or benign based on the learned visual cues.

Environment Setup

Prerequisites

  • Python 3.9.0
  • CUDA 11.2 (for GPU support)

Installation Instructions

To set up your environment to run this model, follow these steps:

  1. Ensure Python 3.9.0 is installed on your system or in your environment, along with CUDA 11.2 if you want GPU support.
  2. Install the required Python packages by running:
pip install -r requirements.txt

Verify Installation

To check the installed packages, you can run:

pip list
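As an alternative to scanning the `pip list` output by eye, a short script can report whether specific packages are present. This is a sketch using only the standard library; the package names below are assumptions, so check requirements.txt for the actual list.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed package names; replace them with the entries from requirements.txt.
report = installed_versions(["torch", "torchvision"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```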

Dataset Preparation

Because this program loads the dataset with ImageFolder, the dataset must be organized into separate directories for training and testing, each containing one subdirectory per class. The script applies the necessary transformations to the images to prepare them for the model.
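To make the expected layout concrete, the sketch below builds an empty example tree and derives class labels the way ImageFolder does, by sorting the subdirectory names and numbering them. The directory names `train`, `test`, `benign`, and `malware` are assumptions for illustration, and `find_classes` here is a standalone helper rather than the project's code.

```python
import os
import tempfile


def find_classes(root):
    """Return sorted class-folder names and a name -> index mapping."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"no class folders found in {root}")
    return classes, {name: index for index, name in enumerate(classes)}


# Build a throwaway layout: <root>/train/<class>/ and <root>/test/<class>/
root = tempfile.mkdtemp()
for split in ("train", "test"):
    for cls in ("benign", "malware"):
        os.makedirs(os.path.join(root, split, cls))

classes, class_to_idx = find_classes(os.path.join(root, "train"))
print(class_to_idx)  # {'benign': 0, 'malware': 1}
```

Running a check like this on both directories before training helps catch a class folder that is missing or misspelled in one of the two splits.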

Model Architecture

The NeuralNet class defines the model architecture, which includes:

  • Two convolutional layers with ReLU activations and max pooling.
  • Two fully connected layers with dropout for regularization.
  • A final linear layer for classification.
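One possible layout matching the list above is sketched here. Only the layer types and their order come from this README; the channel counts, kernel sizes, hidden sizes, dropout rate, single-channel 64x64 input, and two output classes are all assumptions, so the actual NeuralNet class in the repository may differ.

```python
import torch
from torch import nn


class NeuralNet(nn.Module):
    """Illustrative layout only; every size below is an assumption."""

    def __init__(self, num_classes: int = 2, image_size: int = 64) -> None:
        super().__init__()
        # Two convolutional layers, each followed by ReLU and 2x2 max pooling.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Two pooling steps halve the height and width twice.
        flat_features = 32 * (image_size // 4) ** 2
        # Two fully connected layers with dropout, then the final linear layer.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat_features, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = NeuralNet()
logits = model(torch.zeros(4, 1, 64, 64))  # batch of four blank grayscale images
print(logits.shape)
```

The output has one row per image and one column per class; the larger value in each row is the predicted class.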

Training and Validation

The script demonstrates how to split the dataset, train the model, and validate its performance. It includes functions for training (train_one_epoch) and validation (validate_one_epoch), along with logging the loss and accuracy metrics.
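The splitting step might look like the sketch below, which uses `torch.utils.data.random_split`. The 80/20 ratio, the fixed seed, and the random stand-in dataset are assumptions; the script's own split may use different proportions or a different method.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the ImageFolder dataset: 100 random 1x64x64 images with binary labels.
dataset = TensorDataset(torch.rand(100, 1, 64, 64), torch.randint(0, 2, (100,)))

# Hold out 20% of the samples for validation (assumed ratio).
val_size = int(0.2 * len(dataset))
train_size = len(dataset) - val_size

# A seeded generator makes the split reproducible between runs.
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(dataset, [train_size, val_size], generator=generator)

print(len(train_set), len(val_set))
```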

Training

To train the model, run the provided training function, which iterates over the training dataset and updates the model's weights.

Validation

After each training epoch, the validation function assesses the model's performance on a separate validation dataset.
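The two functions could be structured roughly as follows. Only the names train_one_epoch and validate_one_epoch come from this README; the parameter lists, the returned (loss, accuracy) pair, the loss function, the optimizer, and the tiny stand-in model and data are assumptions made so the sketch runs on its own.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, loss_fn, optimizer, device):
    """Run one pass over the training data; return (mean loss, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


@torch.no_grad()
def validate_one_epoch(model, loader, loss_fn, device):
    """Evaluate without gradient tracking or weight updates."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += loss_fn(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


# Tiny stand-in data and model; a real run would use separate train and validation loaders.
device = torch.device("cpu")
data = TensorDataset(torch.rand(32, 1, 8, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(data, batch_size=8)
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 2)).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):
    train_loss, train_acc = train_one_epoch(model, loader, loss_fn, optimizer, device)
    val_loss, val_acc = validate_one_epoch(model, loader, loss_fn, device)
    print(f"epoch {epoch}: train loss {train_loss:.4f}, val acc {val_acc:.2%}")
```

Calling `model.eval()` during validation matters for this kind of model because it disables dropout, so the reported metrics reflect the full network.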

Execution

Execute the script from top to bottom to train and validate the model. To benefit from GPU acceleration, make sure a CUDA device is available and selected for training; otherwise the model runs on the CPU.
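A common way to select the device is shown below. This is a general PyTorch pattern rather than a quote from the script, and the variable names are assumptions.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

# The model and each batch must be moved to the same device before use.
batch = torch.zeros(2, 1, 64, 64).to(device)
print(batch.device.type)
```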

Conclusion

This README provides a comprehensive guide to set up, train, and validate the CNN Malware Detection Model. For optimal results, consider tuning the hyperparameters and modifying the model architecture based on your specific dataset and requirements.
