This repo contains an implementation of the BinaryAI file comparison algorithm, along with datasets and metric scripts.
The binaryai_bindiffmatch directory contains the BinaryAI BindiffMatch algorithm. It does not include the BAI-2.0 model or the embedding implementation.
The data directory contains the metric datasets. (You can download them from the release assets.)
data/files contains unstripped files and stripped files.
We use binaries from the coreutils, diffutils and findutils libraries as testcases. These binaries are the experiment data of the DeepBinDiff project; go to the original project to get them.
We manually built some versions of the openssl project and chose two files as the example case. The sources are openssl-1.1.1u and openssl-3.1.1.
data/labeleds contains pre-generated information about the functions in each binary file. The basicinfo, pseudocode, callees and name fields are produced by Ghidra, and the feature embedding vectors are produced by the BinaryAI BAI-2.0 model. Scripts to generate these files are not included in this project.
data/matchresults contains pre-generated match results for the testcases and the example case, produced by the BinaryAI BindiffMatch algorithm and by Diaphora, as well as the ground-truth results.
BinaryAI BindiffMatch results can be generated by running python -m binaryai_bindiffmatch <file1_labeled_doc> <file2_labeled_doc> -o <matchresult> on each pair of files.
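To run the command above over several pairs, a small wrapper can help. The following is a minimal sketch, not part of this repo: the helper names build_match_command, run_match and run_all are hypothetical, and it assumes the package is already installed in the current Python environment. Only the command-line form shown above is used.

```python
import subprocess
import sys
from pathlib import Path


def build_match_command(file1_doc: str, file2_doc: str, output: str) -> list[str]:
    """Build the documented CLI call as an argument list (hypothetical helper)."""
    return [
        sys.executable, "-m", "binaryai_bindiffmatch",
        file1_doc, file2_doc,
        "-o", output,
    ]


def run_match(file1_doc: str, file2_doc: str, output: str) -> None:
    """Run the matcher on one pair; raises CalledProcessError if it fails."""
    # Make sure the directory for the match result exists before the run.
    Path(output).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_match_command(file1_doc, file2_doc, output), check=True)


def run_all(pairs: list[tuple[str, str, str]]) -> None:
    """Run the matcher on every (file1_doc, file2_doc, output) triple."""
    for file1_doc, file2_doc, output in pairs:
        run_match(file1_doc, file2_doc, output)
```

Calling run_all with a list of (file1_labeled_doc, file2_labeled_doc, matchresult) path triples runs one match per pair. The paths are up to you; the file names and layout under data/labeleds are not assumed here.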
Diaphora results are generated by first applying a patch on this commit, then using IDA headless mode to export the .sqlite database. After that, the offline Diaphora script is run to generate the .diaphora results (with relaxed_ratio set to True and all other options left at their defaults), and the output is finally converted to JSON in the same format as the BinaryAI results. Scripts for these steps are not included in this project.
Requires Python >= 3.10.
Run pip install .[lowmem] to install this package and its dependencies.
python scripts/metrics.py testcases binaryai: get the metric result on the full testcases, powered by the BinaryAI BindiffMatch algorithm
python scripts/metrics.py testcases diaphora: get the metric result on the full testcases, powered by Diaphora
python scripts/metrics.py example binaryai: get the metric result on the example case, powered by the BinaryAI BindiffMatch algorithm
python scripts/metrics.py example diaphora: get the metric result on the example case, powered by Diaphora
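The four commands above are the combinations of two datasets and two tools, so they can be run in one go. This is a minimal sketch under two assumptions: it is started from the repo root, and the package has been installed as described. The helper names metric_commands and run_all_metrics are hypothetical and not part of this repo.

```python
import itertools
import subprocess
import sys

# The two positional arguments accepted by scripts/metrics.py, as listed above.
DATASETS = ("testcases", "example")
TOOLS = ("binaryai", "diaphora")


def metric_commands() -> list[list[str]]:
    """Return one argument list per (dataset, tool) combination."""
    return [
        [sys.executable, "scripts/metrics.py", dataset, tool]
        for dataset, tool in itertools.product(DATASETS, TOOLS)
    ]


def run_all_metrics() -> None:
    """Run every metric command in turn, stopping at the first failure."""
    for cmd in metric_commands():
        print("$ python", " ".join(cmd[1:]))
        subprocess.run(cmd, check=True)
```

Calling run_all_metrics() prints each command before running it, so the output of each run can be told apart.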