This project focuses on malware analysis, specifically targeting malware spoofing and binary classification challenges. We implement the Kernel Constrained Subspace Method (KCSM) and its Random Fourier Features variant (RFF_CSM) for efficient and effective malware detection.
The full paper is available here: https://ieeexplore.ieee.org/abstract/document/10215631.
- Enhanced malware detection using KCSM, capable of distinguishing between malware and benign files.
- Improved computational efficiency with the integration of RFF, reducing the complexity of kernel calculations.
- Suitable for large-scale and real-time malware detection systems.
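The efficiency gain listed above comes from replacing kernel evaluations with an explicit random feature map. Below is a minimal sketch of the standard Random Fourier Features construction (Rahimi and Recht) for the Gaussian (RBF) kernel, assuming NumPy. It is not this repository's code: `make_rff`, `n_features`, `sigma` and `seed` are hypothetical names, and the kernel bandwidth used in the paper is not stated here.

```python
import numpy as np

def make_rff(d, n_features=2048, sigma=1.0, seed=0):
    """Return a map phi with phi(x) . phi(y) ~= exp(-||x - y||^2 / (2 sigma^2)).

    W and b are sampled once and captured by the closure, so the same
    random features are applied to training and test data.
    """
    rng = np.random.default_rng(seed)
    # Frequencies sampled from the Gaussian kernel's spectral density N(0, I / sigma^2)
    W = rng.normal(0.0, 1.0 / sigma, size=(d, n_features))
    # Random phases sampled uniformly from [0, 2*pi)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(X):
        X = np.atleast_2d(X)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return phi

# Compare RFF inner products with the exact Gaussian kernel on toy data
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))
phi = make_rff(d=10, n_features=20000, sigma=3.0)
Z = phi(X)
approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
exact = np.exp(-sq_dists / (2 * 3.0 ** 2))
print("max abs error:", np.abs(approx - exact).max())
```

After this mapping, a linear method can work directly on the `n_features`-dimensional vectors, so it no longer needs an n-by-n kernel (Gram) matrix over the training samples; the approximation error shrinks as `n_features` grows.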
This paper proposes a novel approach based on subspace representation for malware detection, the important task of distinguishing between safe and malicious files. Our solution uses a byte-level visualization (image pattern) of the target software and represents each of the two classes by a low-dimensional subspace in a high-dimensional vector space. We use the kernel constrained subspace method (KCSM) as the classifier, as it has shown excellent results in various pattern recognition tasks. However, its computational cost can be high because of the kernel trick, which makes real-time detection difficult. To address this issue, we introduce Random Fourier Features (RFF), which map the data to an explicit feature space where they can be handled directly like standard vectors, bypassing the kernel trick. This approach reduces execution time by around 99% while retaining a high recognition rate. We conduct extensive experiments on several public malware datasets and demonstrate superior results against several baselines and previous approaches.
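To make the subspace idea concrete, here is a minimal sketch of the classical linear constrained subspace method, assuming NumPy: each class subspace is spanned by the top right singular vectors of its (uncentered) data, the constraint subspace keeps the eigenvectors of the summed class projection matrices after discarding the most shared directions, and an input is assigned to the class whose projected subspace captures the largest squared projection length. This is a textbook formulation, not the repository's implementation; `fit_csm`, `predict_csm`, `subspace_dim` and `n_drop` are hypothetical, and the synthetic data only stands in for image vectors. Under the same assumptions, an RFF_CSM-style variant would run this linear CSM on random Fourier features of the inputs rather than on the raw vectors.

```python
import numpy as np

def class_subspace(X, dim):
    """Orthonormal basis (d, dim) of the dominant directions of X (n, d), no mean-centering."""
    # Right singular vectors of X are eigenvectors of the autocorrelation matrix X^T X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:dim].T

def fit_csm(X_by_class, subspace_dim=10, n_drop=2):
    bases = [class_subspace(X, subspace_dim) for X in X_by_class]
    # Sum of the class projection matrices P_c = U_c U_c^T
    G = sum(U @ U.T for U in bases)
    evals, evecs = np.linalg.eigh(G)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]           # indices in descending order
    k_total = len(bases) * subspace_dim
    # Constraint subspace: discard the n_drop directions most shared by the classes
    C = evecs[:, order[n_drop:k_total]]
    # Project each class basis onto C and re-orthonormalize it
    projected = [np.linalg.qr(C.T @ U)[0] for U in bases]
    return C, projected

def predict_csm(x, C, projected):
    z = C.T @ x
    z = z / np.linalg.norm(z)
    # Similarity: squared length of the projection onto each projected class subspace
    sims = [float(np.sum((Q.T @ z) ** 2)) for Q in projected]
    return int(np.argmax(sims)), sims

# Synthetic example: each class lives near its own random 5-dimensional subspace
rng = np.random.default_rng(0)
d = 50
A0, A1 = rng.normal(size=(d, 5)), rng.normal(size=(d, 5))
X0 = (A0 @ rng.normal(size=(5, 100))).T   # class 0 samples (e.g. safe)
X1 = (A1 @ rng.normal(size=(5, 100))).T   # class 1 samples (e.g. malware)
C, proj = fit_csm([X0, X1], subspace_dim=5, n_drop=1)
label, sims = predict_csm(A1 @ rng.normal(size=5), C, proj)
print(label, sims)
```

Discarding the leading eigenvectors of the summed projection matrix removes the components common to both classes, which is what lets the constrained subspaces separate classes that overlap in the original space.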
The analysis is conducted on three malware datasets: BIG2015, Malimg, and Dumpware. In addition, we collect a safe class comprising 2500 clean (benign) files from three operating systems: Windows 10 Pro, Windows 11 Home, and Windows 11 Pro.
All datasets are preprocessed for compatibility with the KCSM and RFF_CSM framework.
- BIG2015 dataset: 2015 Microsoft Malware Classification Challenge.
- Malimg dataset: NA (the `.mat` files are shared here).
- Dumpware dataset: Dumpware dataset.
- Safe dataset: available on request from the author.
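The preprocessing mentioned above turns each file into a byte-level image pattern before classification. The exact pipeline used in this repository is not described here, so the following is only a sketch under stated assumptions: bytes are laid out row by row as 8-bit grayscale pixels with a fixed width, the image is resized by nearest-neighbour sampling to a fixed shape, and the result is flattened and L2-normalized. `bytes_to_vector`, `width` and `out_shape` are hypothetical.

```python
import numpy as np

def bytes_to_vector(data: bytes, width=256, out_shape=(32, 32)):
    """Map raw file bytes to a fixed-length, L2-normalized grayscale-image vector."""
    arr = np.frombuffer(data, dtype=np.uint8)
    # Zero-pad so the byte stream fills complete rows of `width` pixels
    n_rows = max(1, -(-arr.size // width))  # ceiling division
    padded = np.zeros(n_rows * width, dtype=np.uint8)
    padded[:arr.size] = arr
    img = padded.reshape(n_rows, width).astype(np.float64) / 255.0
    # Nearest-neighbour resize so every file yields the same dimension
    rows = np.linspace(0, n_rows - 1, out_shape[0]).round().astype(int)
    cols = np.linspace(0, width - 1, out_shape[1]).round().astype(int)
    vec = img[np.ix_(rows, cols)].ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Usage with a real file (the path is a placeholder):
# with open("sample.exe", "rb") as f:
#     v = bytes_to_vector(f.read())
v = bytes_to_vector(bytes(range(256)) * 40)
print(v.shape)  # (1024,)
```

Normalizing to unit length matters for subspace methods, since their similarity is the squared projection length of a unit vector onto each class subspace.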
| Method | Dataset | Accuracy (%) | Computation time (s) |
|---|---|---|---|
| BAT algorithm and CNN (Cui et al.) | Malimg | 94.5 | NA |
| Inception Net (Khan et al.) | BIG2015 + 3000 safe | 74.5 | NA |
| ResNet-152 (Khan et al.) | BIG2015 + 3000 safe | 88.36 | NA |
| CSM (OURS) | BIG2015 + 2500 safe | 83.25 | 2.68 |
| | Malimg + 2500 safe | 92.87 | 2.38 |
| | Dumpware + 2500 safe | 99.06 | 2.58 |
| KCSM (OURS) | BIG2015 + 2500 safe | 92.89 | 4189.79 |
| | Malimg + 2500 safe | 95.13 | 6228.29 |
| | Dumpware + 2500 safe | 99.12 | 1034.78 |
| RFF_CSM (OURS) | BIG2015 + 2500 safe | 93.50 | 1.59 |
| | Malimg + 2500 safe | 97.15 | 1.01 |
| | Dumpware + 2500 safe | 99.26 | 0.78 |
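As a quick check of the "around 99%" reduction claimed in the abstract, the KCSM and RFF_CSM timings from the table can be compared directly. The snippet only does arithmetic on the reported numbers; the hardware they were measured on is not stated here.

```python
# Computation times in seconds, copied from the results table
kcsm = {"BIG2015": 4189.79, "Malimg": 6228.29, "Dumpware": 1034.78}
rff_csm = {"BIG2015": 1.59, "Malimg": 1.01, "Dumpware": 0.78}

reductions = {}
for name in kcsm:
    # Percentage of KCSM's running time saved by RFF_CSM
    reductions[name] = 100.0 * (1.0 - rff_csm[name] / kcsm[name])
    speedup = kcsm[name] / rff_csm[name]
    print(f"{name}: {reductions[name]:.2f}% less time ({speedup:.0f}x faster)")
```

On all three datasets the saving is above 99.9%, consistent with the abstract's figure, while RFF_CSM's accuracy matches or exceeds KCSM's.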
To cite the paper, please use the following BibTeX entry:
@article{djafer2023malware,
title={Malware detection using Kernel Constrained Subspace Method},
author={Benchadi, Djafer Yahia Messaoud and Batalo, Bojan and Fukui, Kazuhiro},
journal={IEICE Proceedings Series},
volume={78},
number={P2-22},
year={2023},
publisher={The Institute of Electronics, Information and Communication Engineers}
}
If you have any enquiries or questions, please open a GitHub issue or contact me directly at djafer@cvlab.cs.tsukuba.ac.jp.