Graphene

Graph-based malware detection using machine learning.

Table of Contents
Installation
Usage

Installation

This program uses pip to manage all module dependencies. The easiest way to get started is to initialize a new virtual environment and install all packages using the requirements.txt file.

Windows

$ py -m venv venv
$ .\venv\Scripts\activate
$ pip install -r requirements.txt

Linux

$ python3 -m venv venv
$ source ./venv/bin/activate
$ pip3 install -r requirements.txt

Note that the exact commands may vary slightly depending on your system.

Usage

Broadly, Graphene has two modes: feature extraction and model training. Runs differ mainly in which features are extracted and which model architecture is trained.

Most capabilities of Graphene can be accessed through the Graphene.py Python script.

Feature Extraction

To generate a dataset from executables, pass generate as the mode of operation.

$ py src/Graphene.py --mode generate

Configuration data for the generation process can be found in generate.json.

Graph Traversal

A total of four traversal algorithms are used:

Breadth-First
Depth-First
Beam Traversal
Node2Vec

Beam traversal supports three heuristics for ranking nodes: out-degree, function size, and random weight assignment.
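As a rough illustration of how a beam traversal with an out-degree heuristic might order the functions of a call graph, here is a minimal sketch. The graph representation (a dict mapping each function name to the functions it calls), the names `beam_traversal` and `out_degree`, and the `beam_width` parameter are all assumptions made for this example; Graphene's actual implementation may differ.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]


def out_degree(graph: Graph, node: str) -> float:
    """Heuristic: score a node by the number of edges leaving it."""
    return float(len(graph.get(node, [])))


def beam_traversal(
    graph: Graph,
    start: str,
    beam_width: int,
    score: Callable[[Graph, str], float] = out_degree,
) -> List[str]:
    """Visit nodes level by level, keeping only the best `beam_width` per level."""
    visited = {start}
    order = [start]
    frontier = [start]
    while frontier:
        # Collect unvisited successors of the current level, without duplicates.
        candidates: List[str] = []
        for node in frontier:
            for succ in graph.get(node, []):
                if succ not in visited and succ not in candidates:
                    candidates.append(succ)
        # Keep the highest-scoring candidates; ties keep discovery order.
        candidates.sort(key=lambda n: score(graph, n), reverse=True)
        frontier = candidates[:beam_width]
        visited.update(frontier)
        order.extend(frontier)
    return order


# Hypothetical call graph: function name -> functions it calls.
call_graph = {
    "main": ["parse", "run", "log"],
    "parse": ["read"],
    "run": ["read", "write", "log"],
    "log": [],
    "read": [],
    "write": [],
}
print(beam_traversal(call_graph, "main", beam_width=2))
```

Under this design, switching to a function-size or random-weight heuristic would only require passing a different `score` callable; nodes pruned at one level are not marked as visited, so they can still be reached later through another path.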

Node2Vec generates its own embeddings at runtime. It is currently implemented only for the RNN and DNN, because the RoBERTa model relies on its own tokenizer.
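For readers unfamiliar with Node2Vec, the sketch below shows the biased second-order random walk at its core, using the standard return parameter `p` and in-out parameter `q`. It is illustrative only: the adjacency-dict graph, the function name `node2vec_walk`, and the parameter values are assumptions, and the step that turns walks into embeddings (typically a skip-gram model) is omitted.

```python
import random
from typing import Dict, List

Graph = Dict[str, List[str]]


def node2vec_walk(
    graph: Graph, start: str, length: int, p: float, q: float, rng: random.Random
) -> List[str]:
    """Return one biased random walk containing at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = graph.get(cur, [])
        if not neighbors:
            break  # dead end: no outgoing edges
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph.get(prev, []):
                weights.append(1.0)      # stay within prev's neighborhood
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical undirected graph, stored as symmetric adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}
rng = random.Random(0)  # seeded so repeated runs give the same walk
walk = node2vec_walk(graph, "a", length=10, p=1.0, q=0.5, rng=rng)
print(walk)
```

A small `q` pushes walks outward (closer to depth-first exploration), while a small `p` makes them return to the previous node more often and stay local.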

Machine Learning

Graphene supports the following model architectures.

| Architecture | Attributes | Config File |
| --- | --- | --- |
| Recurrent Neural Network | Multi-layered model with LSTM | rnn.json |
| RNN with Node2Vec | RNN architecture with Node2Vec embeddings | rnn_node2vec.json |
| Deep Neural Network | Six `Linear` layers with ReLU activation | dnn.json |
| DNN with Node2Vec | DNN architecture with Node2Vec embeddings | dnn_node2vec.json |
| Large Language Model | Utilizes RoBERTa as base model | tformer.json |

To train a model, specify the model architecture up front. As with feature extraction, this is done with the -m or --mode command-line argument. To list the available options, run the following command.

$ py src/Graphene.py --help

Current options for model training are:

dnn: Train a DNN using standard traversal algorithms and embeddings.
rnn: Train an RNN using standard traversal algorithms and embeddings.
dnn_node2vec: Train a DNN using embeddings generated by Node2Vec.
rnn_node2vec: Train an RNN using embeddings generated by Node2Vec.
tformer_train: Train a RoBERTa-based binary classifier.
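To make the mode selection concrete, the following sketch shows one way a `-m`/`--mode` argument restricted to the modes listed above could be parsed with `argparse`. How Graphene.py actually parses its arguments is not shown in this README, so the parser layout and the helper `is_training_mode` are assumptions.

```python
import argparse

# Mode names are taken from this README; everything else here is illustrative.
MODES = ["generate", "dnn", "rnn", "dnn_node2vec", "rnn_node2vec", "tformer_train"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts -m/--mode restricted to the known modes."""
    parser = argparse.ArgumentParser(description="Hypothetical Graphene-style CLI")
    parser.add_argument(
        "-m", "--mode",
        choices=MODES,
        required=True,
        help="operation to run: dataset generation or a model to train",
    )
    return parser


def is_training_mode(mode: str) -> bool:
    """Every mode except 'generate' trains a model."""
    return mode != "generate"


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--mode", "rnn_node2vec"])
print(args.mode, is_training_mode(args.mode))
```

Restricting the argument with `choices` means an unknown mode is rejected with a usage message before any work starts.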

Parameters used by the model during training are defined in the corresponding .json file. The configuration file for each model is listed in the table above.
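The mapping from training mode to configuration file in the table above can be sketched as a small lookup followed by a JSON load. This is a minimal sketch under stated assumptions: the function `load_config`, the directory argument, and the keys `epochs` and `learning_rate` are hypothetical, since the README does not document the contents of the configuration files.

```python
import json
import tempfile
from pathlib import Path

# Mode -> config file, following the table in this README.
CONFIG_FILES = {
    "rnn": "rnn.json",
    "rnn_node2vec": "rnn_node2vec.json",
    "dnn": "dnn.json",
    "dnn_node2vec": "dnn_node2vec.json",
    "tformer_train": "tformer.json",
}


def load_config(mode: str, cfg_dir: Path) -> dict:
    """Load the JSON configuration that belongs to a training mode."""
    path = cfg_dir / CONFIG_FILES[mode]
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


# Demo in a throwaway directory; the keys below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    cfg_dir = Path(tmp)
    sample = {"epochs": 10, "learning_rate": 0.001}
    (cfg_dir / "dnn.json").write_text(json.dumps(sample), encoding="utf-8")
    config = load_config("dnn", cfg_dir)
    print(config["epochs"])
```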

Explainability

Explainability for the RoBERTa model is implemented with the Captum library. Captum produces attributions at the token level, which can then be aggregated into word-level attributions. See Explainability.py for an example of how word attributions can be calculated.
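The aggregation step can be sketched independently of the model. The example below assumes that token scores have already been produced by a Captum attribution method and that tokens follow RoBERTa's byte-level BPE convention, where a leading "Ġ" marks a token that begins a new word. Summing sub-word scores is one possible aggregation chosen for illustration; the tokens, scores, and the function name `aggregate_to_words` are made up for this example, and Explainability.py may do this differently.

```python
from typing import List, Tuple


def aggregate_to_words(tokens: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Sum sub-word token attributions into word-level attributions.

    A token starting with "Ġ" (or the very first token) opens a new word;
    any other token is appended to the current word.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    words: List[Tuple[str, float]] = []
    for token, score in zip(tokens, scores):
        starts_word = token.startswith("Ġ") or not words
        text = token.lstrip("Ġ")
        if starts_word:
            words.append((text, score))
        else:
            prev_text, prev_score = words[-1]
            words[-1] = (prev_text + text, prev_score + score)
    return words


# Made-up tokens and attribution scores for a short instruction sequence.
tokens = ["Ġcall", "Ġsub", "_", "401", "000", "Ġret"]
scores = [0.125, 0.25, 0.0625, 0.5, 0.125, -0.25]
for word, attribution in aggregate_to_words(tokens, scores):
    print(f"{word}: {attribution}")
```

Summation preserves the total attribution mass across a word; averaging would instead keep long, heavily split words from dominating.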

Adversarial Attacks

The second portion of this repository supports launching adversarial attacks on a RoBERTa model. In almost every case, the model is trained first and the attack is launched immediately afterward. Attacks are white-box, meaning the attacker has full access to the target model. They build on the explainability mechanisms described in the previous section.
