FIT: Inspect Vulnerabilities in Cross-Architecture Firmware by Deep Learning and Bipartite Matching

Prerequisites

Make sure the following packages and libraries (and their dependencies, if necessary) are installed in your workspace:

TensorFlow
gensim
scikit-learn
pickle (part of the Python standard library)
IDA Pro
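
A quick way to check the Python prerequisites is to ask the import system whether each module can be found. This is a minimal sketch, not part of FIT: the module names are assumptions (scikit-learn is imported as sklearn), and IDA Pro is a separate application that cannot be checked this way.

```python
import importlib.util

# Import names are assumptions: scikit-learn is imported as "sklearn".
# IDA Pro is a standalone application and is not covered by this check.
REQUIRED_MODULES = ["tensorflow", "gensim", "sklearn", "pickle"]


def missing_modules(names):
    """Return the subset of `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All Python prerequisites found.")
```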

Dataset

OpenSSL in Instruction_Embedding/dataset/filtered_json_inst/
CoreUtils in Dataset/CoreUtils/json/
FindUtils in Dataset/findutils.zip
BusyBox in Dataset/busybox.zip

3LACFG_Generator

In run_ida_preprocess.py, set ida_path to the path of your IDA binary and binary_dir to the directory that contains the binaries.
In my_preprocess.py, set path to the output directory where the generated *.ida files will be stored.
Run run_ida_preprocess.py to generate the corresponding *.ida files. This will take a while.
In 2json.py, set dirpath (which may be the same as path in my_preprocess.py), then run it to get the *.json files in "./json/".
FUTURE WORK: CONTEXT SENSITIVE
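
For orientation, driving IDA over a directory of binaries usually comes down to building one command line per binary. The sketch below is an assumption about what such a driver could look like, not the actual contents of run_ida_preprocess.py. The function name build_ida_commands is hypothetical; -A (autonomous mode) and -S (run a script) are standard IDA command-line switches.

```python
import os


def build_ida_commands(ida_path, binary_dir, script="my_preprocess.py"):
    """Build one IDA command per regular file in binary_dir, sorted by name."""
    commands = []
    for name in sorted(os.listdir(binary_dir)):
        binary = os.path.join(binary_dir, name)
        if os.path.isfile(binary):
            # -A: autonomous mode (no dialogs); -S<script>: run the script on load.
            commands.append([ida_path, "-A", "-S" + script, binary])
    return commands
```

Each returned list can be passed to subprocess.run to process one binary at a time.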

Instruction_Embedding

In instEmbedding.py, set dirPath to your dataset path and filePath to the output path, then run preparing(dirPath, filePath) to collect all instructions for a particular architecture.
In instEmbedding.py, set modelPath to the output model path. Run inputGen(filePath) first, then call training with modelPath and the output of inputGen to train the word2vec (w2v) model.
My trained w2v models can be found in "./myModel/".
FUTURE WORK: ARCHITECTURE FREE
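
A word2vec model is trained on "sentences", that is, lists of tokens. For instruction embedding, one common choice is to treat each basic block as a sentence and each normalized instruction as a token. The sketch below illustrates that idea only. The normalization rule (replacing immediates with IMM) and both function names are hypothetical, and the rules FIT actually applies may differ.

```python
import re

# Hypothetical normalization rule: integer or hex immediates become "IMM".
_HEX_OR_INT = re.compile(r"^-?(0x[0-9a-fA-F]+|\d+)$")


def normalize_instruction(inst):
    """Turn an instruction such as 'mov eax, 0x10' into one token, 'mov_eax_IMM'."""
    parts = inst.replace(",", " ").split()
    tokens = ["IMM" if _HEX_OR_INT.match(p) else p for p in parts]
    return "_".join(tokens)


def block_to_sentence(instructions):
    """Map one basic block (a list of instruction strings) to a list of tokens."""
    return [normalize_instruction(inst) for inst in instructions]
```

A list of such sentences has the shape that gensim's Word2Vec accepts as its sentences argument.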

Block_Embedding & Graph_Embedding

Run __train.py. This will take a long time.
python3 __train.py --save_path ./saved_model/405/ --w2v_model ../Instruction_Embedding/myModel/
The trained model will be stored in "./saved_model/". The AUC of the FIT model is 0.97.
In filter.py, set load_path to the trained model path and top_similar to N, the number of most similar functions to report. Run filter.py to compute the similarity scores between function pairs; the names of the N suspicious, potentially vulnerable functions can be found in check_dir. Note that the vulnerable function must be the last JSON item in the JSON file that stores all the preprocessed functions from the binary under test!
python3 filter.py --load_path ./saved_model/405/graphnn-model_best --w2v_path ../Instruction_Embedding/myModel/ --top_similar 50 --check_dir ./suspicious/
FUTURE WORK: BETTER WAY FOR FEATURE FUSION AND OF COURSE BETTER MODEL
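
The filtering step can be pictured as ranking candidate functions by how similar their embeddings are to the embedding of the known vulnerable function, then keeping the top N. The sketch below assumes cosine similarity and a plain dict of embeddings. Both are assumptions for illustration, and filter.py may score pairs differently.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_similar(target, candidates, n):
    """Rank candidates (name -> embedding) against target; return the n best.

    The result is a list of (name, score) pairs, highest score first.
    """
    scored = ((name, cosine_similarity(target, emb)) for name, emb in candidates.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])
```

With --top_similar 50, for example, only the 50 highest-scoring functions would be handed to the graph matching stage.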

Graph_Match

Run run_graphMatch.py; the names of the vulnerable functions are printed in the terminal.
python3 run_graphMatch.py --sus_dir ../Block_Graph_Embedding/suspicious/ --json_dir ../Instruction_Embedding/dataset/filtered_json_inst/ --threashold 1.5
FUTURE WORK: BETTER BIPARTITE ALGORITHM OR DYNAMIC ANALYSIS
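
Bipartite matching pairs each basic block of one function with a block of the other so that the total pairing cost is minimal. The sketch below shows the idea with an exhaustive search over permutations, which is only practical for tiny inputs. It is not the algorithm used in run_graphMatch.py, and a real implementation would more likely use the Hungarian algorithm (for example scipy.optimize.linear_sum_assignment). How the script's threshold is applied to the resulting score is not stated here, so the sketch stops at the matching cost.

```python
from itertools import permutations


def min_cost_matching(cost):
    """Exhaustive minimum-cost perfect matching for a small square cost matrix.

    cost[i][j] is the cost of pairing row i with column j.
    Returns (total_cost, assignment), where assignment[i] is the column
    matched to row i.
    """
    n = len(cost)
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, list(best_perm)
```

A lower total cost means the two functions' blocks line up more closely.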

Cite

If you use FIT in scientific work, please consider citing our paper, published in Computers & Security (COSE) in 2020:

Bibtex:

@article{LIANG2020102032,
title = {FIT: Inspect vulnerabilities in cross-architecture firmware by deep learning and bipartite matching},
journal = {Computers & Security},
volume = {99},
pages = {102032},
year = {2020},
issn = {0167-4048},
doi = {https://doi.org/10.1016/j.cose.2020.102032},
url = {https://www.sciencedirect.com/science/article/pii/S0167404820303059},
author = {Hongliang Liang and Zhuosi Xie and Yixiu Chen and Hua Ning and Jianli Wang},
keywords = {firmware security, binary code, similarity detection, neural network, bipartite matching},
abstract = {Widely deployed IoT devices expose serious security threats because the firmware in them contains vulnerabilities, which are difficult to detect due to two main factors: 1) The firmware’s code is usually not available; 2) A same vulnerability often exists in multiple firmware with different architectures and/or release versions. In this paper, we propose a novel neural network-based staged approach to inspect vulnerabilities in firmware, which first learns semantics in binary code and utilizes neural network model to screen out the potential vulnerable functions, then performs bipartite graph matching upon three-level features between two binary functions. We implement the approach in a tool called FIT and evaluation results show that FIT outperforms state-of-the-art approaches, i.e., Gemini, CVSSA and discovRE, on both effectiveness and efficiency. FIT also detects vulnerabilities in real-world firmware of IoT devices, such as D-Link routers. Moreover, we make our tool and dataset publicly available in the hope of facilitating further researches in the firmware security field.}
}
