This project was developed by Carol Hanna and Abdallah Yassin as part of the Project in Computer Security course at the Technion. Project Advisor: Dr. Gabi Nakibly.
The main motivation for this project is to provide a helpful tool for researchers of binary code. We started with binary datasets as input and used angr, a binary analysis and symbolic execution framework, to obtain an intermediate representation of the code. From there came the most extensive step in the project: preprocessing the intermediate code so it can serve as input to a neural network. We used a deep neural network adapted from code2seq, which pursues the same goal but takes source code, rather than binaries, as input.
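For orientation, below is a minimal sketch of how angr can lift a binary to its VEX intermediate representation. The binary path and the printing loop are illustrative only and are not part of this repository's preprocessing scripts.

```python
import angr

# Load the binary without its shared libraries to keep the analysis small.
# "/bin/true" is just an example target.
proj = angr.Project("/bin/true", auto_load_libs=False)

# Recover the control-flow graph so we can walk every discovered function.
cfg = proj.analyses.CFGFast()

for func in cfg.kb.functions.values():
    print(f"Function {func.name} at {hex(func.addr)}")
    for block in func.blocks:
        # block.vex is the lifted VEX IR for this basic block.
        block.vex.pp()
```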
We suggest reading our report on this project here before running the code.
- python3
- rouge package, version 0.3.2 (pip install rouge==0.3.2)
- TensorFlow, version 1.13
cd our_dataset/
tar -xzf <dataset_name>.tar.gz
We provide more than one model for preprocessing the data (the <model_name>_main.py files). First, edit run_exps.sh to run the desired model (the default is the path-with-constraints model). See the sketch of the resulting input format after the command below.
run_exps.sh <dataset name: coreutils_ds|dpdk_linux_ds|gnu_dataset>
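Since the preprocessed data is fed to code2seq, it follows code2seq's standard input format: one example per line, consisting of a target label followed by space-separated path contexts of the form left_terminal,path,right_terminal. The sketch below only illustrates that layout; the concrete tokens are made up and the exact fields emitted by our preprocessing scripts may differ.

```python
# Hypothetical example of writing one training example in code2seq format.
def write_example(path):
    contexts = [
        "t0,Load|Add|Store,t1",        # made-up IR path between two terminals
        "t1,Store|CmpEQ|Exit,const_0",
    ]
    line = "check_input " + " ".join(contexts)   # target label + contexts
    with open(path, "w") as f:
        f.write(line + "\n")

write_example("example.train.c2s")
```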
cd code2seq
./train.sh
We have uploaded our best models along with the preprocessed data. To run them automatically:
cd code2seq
./continue_best_model.sh --dataset=<coreutils|coreutils_dpdk>