A deep learning project which uses a method that converts malware .bytes files into gray-scale images and uses a CNN deep learning model to classify the converted malware image and identify the malware family it belongs to.

TanayBhadula/malware-image-detection

🖥️ Image-based Malware Classification using CNN

Introduction

Analyzing large volumes of malware is a major burden for security analysts. Malware developers have been highly successful in evading signature-based detection techniques. Most prevailing static analysis techniques rely on a tool to parse the executable and extract features or signatures. Most dynamic analysis techniques require running the binary in a sandboxed environment to examine its behaviour, which malware can easily thwart by hiding its malicious activity when it detects that it is running inside a virtual environment. Hence, there is a need to explore new approaches that overcome the limitations of static and dynamic analysis, such as time intensity, resource consumption, and limited scalability.

We propose a method for visualizing and classifying malware using image processing techniques. Malware binaries are visualized as gray-scale images, based on the observation that, for many malware families, images belonging to the same family appear very similar in layout and texture. By converting the executable into an image representation, we free our analysis process from the problems faced by standard static and dynamic analyses.

Dataset Used

To train and evaluate our proposed model, we used the Malimg Dataset. The Malimg Dataset contains 9349 malware images belonging to 25 families/classes. Our goal is therefore multi-class classification of malware.

Link - https://drive.google.com/drive/folders/1CnFx26NfWfQchIU85wRNfHjqfk7Up6hl?usp=sharing

A malware sample can belong to one of the following classes:

  • Adialer.C
  • Agent.FYI
  • Allaple.A
  • Allaple.L
  • Alueron.gen!J
  • Autorun.K
  • C2LOP.P
  • C2LOP.gen!g
  • Dialplatform.B
  • Dontovo.A
  • Fakerean
  • Instantaccess
  • Lolyda.AA1
  • Lolyda.AA2
  • Lolyda.AA3
  • Lolyda.AT
  • Malex.gen!J
  • Obfuscator.AD
  • Rbot!gen
  • Skintrim.N
  • Swizzor.gen!E
  • Swizzor.gen!I
  • VB.AT
  • Wintrim.BX
  • Yuner.A
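Because the task is multi-class classification over these 25 families, each family name has to be mapped to an integer label (and usually a one-hot vector) before training. The following is a minimal sketch of that mapping. The names `FAMILIES`, `CLASS_INDEX` and `one_hot` are illustrative and not taken from the project code, and the index order simply follows the list above.

```python
# Family names exactly as listed above; the list position is used as the class index.
FAMILIES = [
    "Adialer.C", "Agent.FYI", "Allaple.A", "Allaple.L", "Alueron.gen!J",
    "Autorun.K", "C2LOP.P", "C2LOP.gen!g", "Dialplatform.B", "Dontovo.A",
    "Fakerean", "Instantaccess", "Lolyda.AA1", "Lolyda.AA2", "Lolyda.AA3",
    "Lolyda.AT", "Malex.gen!J", "Obfuscator.AD", "Rbot!gen", "Skintrim.N",
    "Swizzor.gen!E", "Swizzor.gen!I", "VB.AT", "Wintrim.BX", "Yuner.A",
]

# Hypothetical lookup table: family name -> integer label.
CLASS_INDEX = {name: index for index, name in enumerate(FAMILIES)}


def one_hot(family):
    """Return a one-hot label vector for a family name."""
    vector = [0] * len(FAMILIES)
    vector[CLASS_INDEX[family]] = 1
    return vector


label = one_hot("Allaple.A")
print(CLASS_INDEX["Allaple.A"], len(label))
```

A one-hot vector of this length matches an output layer with one neuron per family.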

Converting malware binaries to gray-scale images

To convert the binary files into gray-scale images, we take the hexadecimal representation of each file's binary content and convert it into a PNG image. For example, the image below is the result of converting the 0ACDbR5M3ZhBJajygTuf.bytes binary file into a PNG.

Figure: binary to gray-scale conversion
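The conversion can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the project's actual script: it assumes each line of a .bytes file starts with an address token followed by two-digit hexadecimal byte values, that `??` marks an unreadable byte (mapped to 0 here), and that a fixed image width is chosen by the caller. The function names are hypothetical.

```python
import struct
import zlib


def parse_bytes_text(text):
    """Parse a hex dump into a flat list of byte values (0-255)."""
    values = []
    for line in text.splitlines():
        tokens = line.split()
        # Assumption: the first token on each line is an address, not data.
        for token in tokens[1:]:
            # Assumption: "??" marks an unreadable byte; map it to 0 (black).
            values.append(0 if token == "??" else int(token, 16))
    return values


def to_rows(values, width):
    """Split byte values into rows of `width` pixels, zero-padding the last row."""
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] = rows[-1] + [0] * (width - len(rows[-1]))
    return rows


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_png(rows):
    """Encode rows of 0-255 values as an 8-bit gray-scale PNG."""
    if not rows:
        raise ValueError("no pixel data to encode")
    height, width = len(rows), len(rows[0])
    # IHDR fields: width, height, bit depth 8, color type 0 (gray-scale),
    # then compression, filter and interlace methods, all 0.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


# Tiny made-up sample in the assumed format (not real malware content).
sample = "00401000 56 8D 44 24\n00401004 08 50 ?? FF\n00401008 10\n"
rows = to_rows(parse_bytes_text(sample), width=4)
png = encode_png(rows)
print(rows)
```

Writing `png` to a file opened in binary mode yields a 4 x 3 pixel image for this sample. For real files, the width would typically be chosen from the file size, and the images would then be resized to the input size the model expects.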

CNN Model Architecture

The CNN model consists of the following layers, which extract features and patterns from the images and classify the malware family.

  • Convolutional Layer : 30 filters, (3 * 3) kernel size
  • Max Pooling Layer : (2 * 2) pool size
  • Convolutional Layer : 15 filters, (3 * 3) kernel size
  • Max Pooling Layer : (2 * 2) pool size
  • Dropout Layer : drops 25% of neurons
  • Flatten Layer
  • Dense/Fully Connected Layer : 128 neurons, ReLU activation function
  • Dropout Layer : drops 50% of neurons
  • Dense/Fully Connected Layer : 50 neurons, Softmax activation function
  • Dense/Fully Connected Layer : num_class neurons, Softmax activation function

The input has a shape of [64 * 64 * 3], i.e. [width * height * depth]. In our case, each malware sample is an RGB image.
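To see how the [64 * 64 * 3] input shrinks through the layers listed above, the shape arithmetic can be worked through in a few lines. This sketch makes several assumptions that the text does not state: convolutions use stride 1 and no padding, pooling uses a stride equal to the pool size, and `num_class` is 25 as in the Malimg Dataset. Dropout and Flatten layers have no trainable parameters, so they do not add to the count. The helper names are illustrative.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count for an unpadded, stride-1 convolution."""
    height, width, channels = shape
    out_shape = (height - kernel + 1, width - kernel + 1, filters)
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return out_shape, params


def max_pool(shape, pool=2):
    """Output shape of max pooling with stride equal to the pool size."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def dense(inputs, units):
    """Output size and parameter count of a fully connected layer."""
    return units, inputs * units + units  # weights + biases


num_class = 25       # assumption: one output neuron per Malimg family
shape = (64, 64, 3)  # input shape from the text
total_params = 0

shape, params = conv2d(shape, filters=30)  # (62, 62, 30)
total_params += params
shape = max_pool(shape)                    # (31, 31, 30)
shape, params = conv2d(shape, filters=15)  # (29, 29, 15)
total_params += params
shape = max_pool(shape)                    # (14, 14, 15)

flat = shape[0] * shape[1] * shape[2]      # Flatten: 14 * 14 * 15 = 2940

units = flat
for layer_units in (128, 50, num_class):   # the three dense layers
    units, params = dense(units, layer_units)
    total_params += params

print(shape, flat, total_params)
```

Under these assumptions, most of the trainable parameters sit in the first dense layer (2940 inputs to 128 neurons), which is why the two pooling layers matter for keeping the model small.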

Block Diagram

Figure: block diagram

Future Work

  • Future work will focus on evaluating more advanced models such as Inception V3, VGG16-Net, ResNet50, CNN-SVM, MLP-SVM, and GRU-SVM.
  • We also want to convert malware images into color RGB images before classification to improve accuracy and precision.
  • We also want to implement a web-based or GUI-based tool that converts malware binary files into images and then classifies them.
