Malware Analysis (Binary Classification) using the Kernel Constrained Subspace Method (KCSM)

Overview

This project focuses on malware analysis, specifically malware spoofing and binary classification (malware versus benign files). We implement the Kernel Constrained Subspace Method (KCSM) together with a variant, RFF_CSM, that uses Random Fourier Features (RFF) for efficient and effective malware detection.

The full paper is available at https://ieeexplore.ieee.org/abstract/document/10215631

Enhanced malware detection using KCSM, capable of distinguishing between malware and benign files.
Improved computational efficiency with the integration of RFF, reducing the complexity of kernel calculations.
Suitable for large-scale and real-time malware detection systems.
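
To illustrate how RFF can reduce the cost of kernel calculations, here is a minimal sketch of the standard random Fourier feature map for a Gaussian kernel. It is not taken from the repository's scripts: the function name `make_rff_map`, the kernel width `gamma`, the feature count, and the random test data are all assumptions chosen for illustration.

```python
import numpy as np


def make_rff_map(input_dim, n_features, gamma, seed=0):
    """Return z(.) such that z(x) . z(y) approximates exp(-gamma * ||x - y||^2).

    Hypothetical helper for illustration; not part of the repository.
    """
    rng = np.random.default_rng(seed)
    # Frequencies follow the Fourier transform of the Gaussian kernel: N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(input_dim, n_features))
    # Random phase shifts, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def transform(X):
        X = np.atleast_2d(X)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return transform


# Compare the explicit-feature inner products with the exact kernel values.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 16))      # 5 made-up samples with 16 features each
gamma = 0.05                      # assumed kernel width
z = make_rff_map(input_dim=16, n_features=4000, gamma=gamma)
Z = z(X)

approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
exact = np.exp(-gamma * sq_dists)
max_err = np.abs(approx - exact).max()
print(f"largest absolute deviation from the exact kernel: {max_err:.3f}")
```

Because the mapped samples are ordinary vectors, any linear method can be applied to them directly, and the approximation tightens as `n_features` grows.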

Summary

This paper proposes a novel approach to malware detection based on subspace representation. Malware detection is the task of distinguishing between safe and malicious files. Our solution uses a byte-level visualization (image pattern) of the target software and represents each of the two classes by a low-dimensional subspace of a high-dimensional vector space. We use the kernel constrained subspace method (KCSM) as a classifier, which has shown excellent results in various pattern recognition tasks. However, its computational cost can be high because it relies on the kernel trick, which makes real-time detection difficult. To address this issue, we introduce Random Fourier Features (RFF), which can be handled directly as ordinary vectors, bypassing the kernel trick. This approach reduces execution time by around 99% while retaining a high recognition rate. We conduct extensive experiments on several public malware datasets and demonstrate superior results against several baselines and previous approaches.
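
The idea of representing each class by a low-dimensional subspace can be sketched as follows. This is a simplified illustration of the plain subspace method only: it omits both the constraint projection and the kernel mapping that KCSM adds, and it runs on synthetic data. The helper names, the subspace dimension, and the labels are assumptions, not the repository's code.

```python
import numpy as np


def fit_subspace(samples, dim):
    """Orthonormal basis (n_features x dim) for the principal directions of a class.

    samples has shape (n_samples, n_features); no mean centering is applied,
    so the subspace passes through the origin.
    """
    U, _, _ = np.linalg.svd(samples.T, full_matrices=False)
    return U[:, :dim]


def similarity(x, basis):
    """Squared cosine of the angle between vector x and the subspace."""
    proj = basis.T @ x
    return float(proj @ proj) / float(x @ x)


def classify(x, bases):
    """Return the label whose subspace is closest in angle to x."""
    return max(bases, key=lambda label: similarity(x, bases[label]))


def sample_class(rng, basis, n, noise=0.01):
    """Draw n synthetic points near the subspace spanned by `basis`."""
    coeffs = rng.normal(size=(n, basis.shape[1]))
    return coeffs @ basis.T + noise * rng.normal(size=(n, basis.shape[0]))


# Synthetic stand-in for the two classes: each lies near its own 3-D subspace of R^50.
rng = np.random.default_rng(0)
true_bases = {
    label: np.linalg.qr(rng.normal(size=(50, 3)))[0]
    for label in ("benign", "malware")
}
train = {label: sample_class(rng, B, 200) for label, B in true_bases.items()}
test = [(label, x) for label, B in true_bases.items() for x in sample_class(rng, B, 50)]

bases = {label: fit_subspace(X, dim=3) for label, X in train.items()}
accuracy = np.mean([classify(x, bases) == label for label, x in test])
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

In a pipeline like the one described above, the input vectors would be image patterns (or their RFF-mapped versions) rather than synthetic points.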

The analysis is conducted on three primary malware datasets: BIG2015, Malimg, and Dumpware. In addition, we collect a safe class comprising 2500 clean files from three distinct operating systems: Windows 10 Pro, Windows 11 Home, and Windows 11 Pro.

All datasets are preprocessed for compatibility with the KCSM and RFF_CSM framework.
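
The README does not describe the preprocessing itself. As a rough sketch of what a byte-level visualization could look like, the following turns raw bytes into a grayscale image and then into a unit-length vector. The image width of 64, the zero padding, and both function names are assumptions made for illustration, and a real pipeline would likely also resize images to a common size.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Interpret raw bytes as a 2-D uint8 grayscale image with `width` columns.

    Hypothetical helper; the last row is zero-padded when needed.
    """
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)


def image_to_unit_vector(image: np.ndarray) -> np.ndarray:
    """Flatten the image and scale it to unit Euclidean norm."""
    vec = image.astype(np.float64).ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Made-up byte string standing in for the contents of a binary file.
sample = bytes(range(256)) * 3 + b"\x90\x90"
image = bytes_to_image(sample, width=64)
vector = image_to_unit_vector(image)
print(image.shape, vector.shape)
```

Normalizing to unit length is a common choice for subspace-based classifiers, since they compare directions rather than magnitudes.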

BIG2015 dataset ==> 2015 Microsoft Malware Classification Challenge.
Malimg dataset ==> NA (the '.mat' files are shared in this repository).
Dumpware dataset ==> Dumpware dataset.
Safe dataset ==> Available on request from the author.

Results

Method	Dataset	Accuracy (%)	Computation time (s)
BAT algorithm and CNN (Cui et al.)	Malimg	94.5	NA
Inception Net (Khan et al.)	BIG2015+3000 safe	74.5	NA
ResNet-152 (Khan et al.)	BIG2015+3000 safe	88.36	NA
CSM (ours)	BIG2015+2500 safe	83.25	2.68
CSM (ours)	Malimg+2500 safe	92.87	2.38
CSM (ours)	Dumpware+2500 safe	99.06	2.58
KCSM (ours)	BIG2015+2500 safe	92.89	4189.79
KCSM (ours)	Malimg+2500 safe	95.13	6228.29
KCSM (ours)	Dumpware+2500 safe	99.12	1034.78
RFF_CSM (ours)	BIG2015+2500 safe	93.50	1.59
RFF_CSM (ours)	Malimg+2500 safe	97.15	1.01
RFF_CSM (ours)	Dumpware+2500 safe	99.26	0.78
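
As a quick check of the reported reduction in execution time, the relative saving of RFF_CSM over KCSM can be computed directly from the computation times in the table. The timings below are copied from the table; the function name is ours.

```python
# (KCSM seconds, RFF_CSM seconds) per dataset, copied from the results table.
timings = {
    "BIG2015": (4189.79, 1.59),
    "Malimg": (6228.29, 1.01),
    "Dumpware": (1034.78, 0.78),
}


def reduction_percent(kcsm_seconds, rff_seconds):
    """Percentage of KCSM's computation time saved by RFF_CSM."""
    return 100.0 * (1.0 - rff_seconds / kcsm_seconds)


reductions = {name: reduction_percent(*pair) for name, pair in timings.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}% less computation time")
```

On all three datasets the reduction comes out above 99.9%, consistent with the figure of around 99% quoted in the summary.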

Citing

To cite the paper, please use the following BibTeX entry:

@article{djafer2023malware,
  title={Malware detection using Kernel Constrained Subspace Method},
  author={Djafer-Yahia-Messaoud, Benchadi and Bojan, Batalo and Kazuhiro, Fukui},
  journal={IEICE Proceedings Series},
  volume={78},
  number={P2-22},
  year={2023},
  publisher={The Institute of Electronics, Information and Communication Engineers}
}

Contact

If you have any questions, you can open a GitHub issue or contact me directly at djafer@cvlab.cs.tsukuba.ac.jp.

