Windows malware detection based on dynamic behaviors (Similarity API calls) using Multilayer Perceptron (MLP)

This project trains an MLP prediction model to detect Windows malware based on the similarity of API call counts.

Description

We collected about 12,000 Windows executable files from public sources, by web scraping and by combining several existing datasets. About 79% of them are malware.
We then ran a local Cuckoo sandbox installation to collect a dynamic behavior report for each file. After organizing the reports, we chose the number of calls to each API as the feature vector fed to the network. Based on that decision, we built a CSV dataset and an SQLite database. The database has 3 tables:

"APIs": list of every API seen across all of our reports (311 APIs, which is the length of the feature vector).
"Reports": list of reports with their md5 and VirusTotal rank (the "positive" column).
"APIs_Reports": a many-to-many relationship between the two tables above, plus a "repetition" column that gives how many times the given API was called in the given report.

The CSV dataset was generated from this database. The "OUTPUT" column is the label and indicates whether the given file is malware. Files with a VirusTotal rank of 10 or higher are labeled 1, and files with a VirusTotal rank of 0 are labeled 0.
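As an illustration of how one dataset row could be derived from the three tables, here is a minimal sketch. It is not taken from model.py. The key columns (`id`, `api_id`, `report_id`), the `name` column, and the sample rows are hypothetical; only the table names and the `md5`, `positive`, and `repetition` columns come from the description above. The source does not say how files with a rank between 1 and 9 are handled, so this sketch assumes they are skipped.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with an assumed schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE APIs (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Reports (id INTEGER PRIMARY KEY, md5 TEXT, positive INTEGER);
        CREATE TABLE APIs_Reports (api_id INTEGER, report_id INTEGER, repetition INTEGER);
    """)
    conn.executemany("INSERT INTO APIs VALUES (?, ?)",
                     [(1, "NtCreateFile"), (2, "RegOpenKeyExW"), (3, "NtClose")])
    conn.executemany("INSERT INTO Reports VALUES (?, ?, ?)",
                     [(1, "aaa", 25), (2, "bbb", 0), (3, "ccc", 4)])
    conn.executemany("INSERT INTO APIs_Reports VALUES (?, ?, ?)",
                     [(1, 1, 7), (3, 1, 2), (2, 2, 4), (1, 3, 1)])
    return conn


def feature_vector(conn, report_id):
    """Return one call count per API (0 when the API never appears in the report)."""
    rows = conn.execute("""
        SELECT a.id, COALESCE(ar.repetition, 0)
        FROM APIs AS a
        LEFT JOIN APIs_Reports AS ar
          ON ar.api_id = a.id AND ar.report_id = ?
        ORDER BY a.id
    """, (report_id,)).fetchall()
    return [count for _, count in rows]


def label_from_rank(positive):
    """Rank >= 10 gives 1, rank == 0 gives 0; anything else is skipped (assumption)."""
    if positive >= 10:
        return 1
    if positive == 0:
        return 0
    return None


def dataset_rows(conn):
    """Build rows of the form [count_api_1, ..., count_api_n, OUTPUT]."""
    rows = []
    reports = conn.execute("SELECT id, positive FROM Reports ORDER BY id").fetchall()
    for report_id, positive in reports:
        label = label_from_rank(positive)
        if label is None:
            continue  # rank between 1 and 9: not covered by the labeling rule
        rows.append(feature_vector(conn, report_id) + [label])
    return rows


conn = build_demo_db()
rows = dataset_rows(conn)
print(rows)
```

With the real database, each row would hold 311 counts followed by the OUTPUT label.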

Result of the trained model

Confusion matrix on unseen test data (20% of the dataset): see `Confusion matrix on unseen test data.png`.
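For readers who want to reproduce this kind of evaluation, the sketch below shows one way to hold out 20% of the samples and tally a binary confusion matrix. The function names, the fixed seed, and the toy labels are hypothetical and are not taken from model.py; the numbers are not results of the trained model.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction of them as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for truth, predicted in zip(y_true, y_pred):
        matrix[truth][predicted] += 1
    return matrix


# Toy labels only, to show the layout of the matrix.
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred)

train_idx, test_idx = split_indices(10)
print(cm, len(train_idx), len(test_idx))
```

Because about 79% of the files are malware, the per-class cells of the matrix are more informative than overall accuracy alone.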

Contributor

Mohsen Ebadpour
Bachelor of Science in Computer Engineering, University of Mohaghegh Ardabili (UMA)
This project was part of my BSc final project.

Reports

For the executable file reports generated by the Cuckoo sandbox (78~93 GB), please contact mohsenebadpour@outlook.com

