cridin1/malware-classification-CNN

Malware Classification using Convolutional Neural Networks (CNNs)

This repository contains an implementation of a malware classification system based on Convolutional Neural Networks (CNNs). The goal of the project is a model that accurately classifies different types of malware from an image representation of the input executable.

The first implementation, malimg_classifier, is trained on the 25 malware classes of the Malimg dataset.

The second implementation, combined_classifier, adds a benign class to the dataset, extracted from legitimate PE samples in DikeDataset.

The full explanation of the experiments can be found in presentation.pdf.

Introduction

Malware (malicious software) poses a significant threat to computer systems and networks worldwide. It is crucial to detect and classify malware accurately to prevent potential security breaches. This project focuses on leveraging the power of CNNs, a deep learning technique commonly used in computer vision tasks, to classify malware samples into different categories.

Dataset

The Malimg dataset used for this project contains labeled samples of different types of malware. Samples are grouped into one directory per class, with the directory name indicating the malware class.

The benign subset is stored separately in the benign_data folder of this repository, while the Malimg dataset can be found here.

The dataset is organized in the following structure:

malimg_dataset/
├── class1/
│ ├── malware1.png
│ ├── malware2.png
│ ├── ...
├── class2/
│ ├── malware3.png
│ ├── malware4.png
│ ├── ...
├── ...
benign_data/
├── benign_imgs/
│ ├── sample1.png
│ ├── sample2.png
│ ├── ...
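A folder-per-class layout like the one above can be turned into (image path, label) pairs with a few lines of Python. The following is a minimal sketch, not the repository's own loader: the function name `index_dataset` is hypothetical, and it assumes that every class folder directly contains `.png` files.

```python
from pathlib import Path


def index_dataset(root):
    """Return (samples, class_names) for a folder-per-class image dataset.

    samples is a list of (path, label_index) pairs; class_names maps each
    label index back to its folder name. (Hypothetical helper.)
    """
    root = Path(root)
    # Sort the class folders so that label indices are stable across runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        # Only .png files are treated as samples; anything else is ignored.
        for img in sorted((root / name).glob("*.png")):
            samples.append((img, label_of[name]))
    return samples, class_names
```

Calling `index_dataset("malimg_dataset")` would yield one label per malware class folder. For a combined setup, the images under `benign_data/benign_imgs` could be appended with one extra label index.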

Dataset samples for each class

Figure: image samples for each class.

Benign data conversion

Figure: data conversion.

The full conversion code is in utils/data_conversion.ipynb. It was adapted from here and here.
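As a rough illustration of what such a conversion does, the sketch below maps each byte of a file to one grayscale pixel (0 to 255) and wraps the byte stream into rows of a fixed width. It is an assumption-laden sketch, not the notebook's code: the function names, the default width of 256, and the zero-padding of the last row are all illustrative choices. The PGM output format is a standard-library stand-in for the PNG files used in the dataset.

```python
def bytes_to_gray_rows(data, width=256):
    """Reshape raw bytes into rows of `width` pixels, zero-padding the last row.

    Hypothetical helper: the width and padding policy are assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width          # bytes needed to fill the last row
    padded = data + bytes(padding)          # bytes(n) is n zero bytes
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def write_pgm(rows, path):
    """Save the rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        for row in rows:
            f.write(bytes(row))
```

For example, `write_pgm(bytes_to_gray_rows(open(path, "rb").read()), "sample.pgm")` would turn an executable at `path` into an image whose height grows with the file size.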

Model Architecture

The CNN model architecture used in this project consists of several convolutional layers, followed by pooling layers and fully connected layers. The CNN workflow is the following:

Figure: CNN architecture.

Final Training

Confusion matrix on combined classifier

Figure: confusion matrix.

Evaluation metrics on combined classifier

Overall        precision   recall    f1-score   support
accuracy       0.8666      0.8666    0.8666     0.8666
macro avg      0.81705     0.88241   0.83163    2054.0
weighted avg   0.86608     0.8666    0.85959    2054.0
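The gap between the macro and weighted averages comes from class imbalance. Assuming the usual definitions, the macro average is the plain mean of the per-class scores, while the weighted average weights each class by its support. The sketch below shows the difference on three precision/support pairs taken from the per-class metrics reported in this README (Autorun.K, Benign, Yuner.A). The function name `macro_and_weighted` is hypothetical.

```python
def macro_and_weighted(rows):
    """rows: list of (score, support) pairs. Returns (macro_avg, weighted_avg)."""
    total_support = sum(n for _, n in rows)
    # Macro: every class counts equally, regardless of how many samples it has.
    macro = sum(s for s, _ in rows) / len(rows)
    # Weighted: each class contributes in proportion to its number of samples.
    weighted = sum(s * n for s, n in rows) / total_support
    return macro, weighted


# Precision and support for Autorun.K, Benign and Yuner.A.
subset = [(0.11602, 21), (0.98658, 196), (0.0, 160)]
macro, weighted = macro_and_weighted(subset)
print(f"macro={macro:.5f} weighted={weighted:.5f}")
```

Applied to all 26 classes, a small class with a poor score pulls the macro average down much more than the weighted one.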

Evaluation metrics for each class on combined classifier

class            precision   recall    f1-score   support
Adialer.C        0.96        1.0       0.97959    24.0
Agent.FYI        0.95833     1.0       0.97872    23.0
Allaple.A        0.99313     0.98132   0.98719    589.0
Allaple.L        1.0         0.99686   0.99843    318.0
Alueron.gen!J    0.975       1.0       0.98734    39.0
Autorun.K        0.11602     1.0       0.20792    21.0
Benign           0.98658     0.75      0.85217    196.0
C2LOP.P          0.39216     0.68966   0.5        29.0
C2LOP.gen!g      0.63158     0.9       0.74227    40.0
Dialplatform.B   1.0         0.97143   0.98551    35.0
Dontovo.A        0.94118     1.0       0.9697     32.0
Fakerean         0.98611     0.93421   0.95946    76.0
Instantaccess    0.97727     1.0       0.98851    86.0
Lolyda.AA1       0.93333     1.0       0.96552    42.0
Lolyda.AA2       0.91892     0.94444   0.93151    36.0
Lolyda.AA3       0.88462     0.95833   0.92       24.0
Lolyda.AT        0.9375      0.96774   0.95238    31.0
Malex.gen!J      0.96154     0.92593   0.9434     27.0
Obfuscator.AD    1.0         1.0       1.0        28.0
Rbot!gen         0.88571     1.0       0.93939    31.0
Skintrim.N       0.94118     1.0       0.9697     16.0
Swizzor.gen!E    0.60714     0.68      0.64151    25.0
Swizzor.gen!I    0.5         0.30769   0.38095    26.0
VB.AT            0.89888     0.98765   0.94118    81.0
Wintrim.BX       0.85714     0.94737   0.9        19.0
Yuner.A          0.0         0.0       0.0        160.0
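Each row follows from three counts in the confusion matrix: true positives, false positives, and false negatives. The sketch below computes precision, recall, and F1 from those counts. The counts in the example are an assumption inferred from the table, not read from the confusion matrix: 21 true positives, 160 false positives, and 0 false negatives reproduce the Autorun.K row. That would be consistent with all 160 Yuner.A samples being predicted as Autorun.K, which should be checked against the confusion matrix.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed counts (inferred from the table, see above) for Autorun.K.
p, r, f1 = precision_recall_f1(tp=21, fp=160, fn=0)
print(round(p, 5), round(r, 5), round(f1, 5))
```

With `tp=0` and `fn=160`, the same function returns zeros for all three metrics, matching the Yuner.A row.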

References

Gibert, D., Mateu, C., Planes, J. et al. Using convolutional neural networks for classification of malware represented as images.

Gibert, D., Mateu, C., Planes, J. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. Journal of Network and Computer Applications.

Yue, S., Wang, T. Imbalanced Malware Images Classification: a CNN based Approach.

Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B. (2011). Malware Images: Visualization and Automatic Classification. doi: 10.1145/2016904.2016908.

Kalash, M., Rochan, M., Mohammed, N., Bruce, N. D. B., Wang, Y., Iqbal, F. (2018). Malware Classification with Deep Convolutional Neural Networks. 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, pp. 1-5. doi: 10.1109/NTMS.2018.8328749.

Tuan, A. P., Phuong, A. T. H., Thanh, N. V., Van, T. N. (2018). Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset. figshare. Dataset. https://figshare.com/articles/dataset/Malware_Detection_PE-Based_Analysis_Using_Deep_Learning_Algorithm_Dataset/6635642/1
