Source code of BinMoCo: Improving Binary Code Similarity Detection with Hard Sample-Aware Momentum Contrastive Learning

Netsec-SJTU/BinMoCo

BinMoCo

Source code of BinMoCo: Improving Binary Code Similarity Detection with Hard Sample-Aware Momentum Contrastive Learning.

Environment

radare2 5.7.8 + r2pipe 1.7.4
torch-geometric 2.1.0
transformers 4.28.1
pytorch 1.12.1
python 3.10.6
lightning 2.0.2
connectorx 0.3.1
faiss-gpu 1.7.2
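
One way to confirm that an installed environment matches the pinned versions above is to compare them against package metadata. The sketch below is not part of the repository. It assumes the PyPI distribution names shown in `EXPECTED` (for example `torch` for pytorch), and it does not check radare2 or the Python interpreter, since neither is a pip package.

```python
from importlib import metadata

# Pinned versions copied from the Environment list above.
# The distribution names are assumptions (e.g. "torch" for pytorch).
EXPECTED = {
    "r2pipe": "1.7.4",
    "torch-geometric": "2.1.0",
    "transformers": "4.28.1",
    "torch": "1.12.1",
    "lightning": "2.0.2",
    "connectorx": "0.3.1",
    "faiss-gpu": "1.7.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each mismatching package name to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # A local build tag such as "1.12.1+cu113" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the script prints one line per package that is missing or pinned to a different version, and nothing when the environment matches.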

Train and Evaluate (BFS dataset)

The same procedure applies to the BINKIT dataset, which can be downloaded here.

  1. Download the BFS dataset here. It was introduced in the paper "How Machine Learning Is Solving the Binary Function Similarity Problem".

  2. Build BFS Database

python code/build_db.py --db_name sec22
  • This will generate db.sqlite in database/sec22
  3. Build function groups
python code/build_group.py database/sec22
  4. Build vocabs
python code/build_vocab.py database/sec22
python code/build_imp_vocab.py database/sec22
  5. Train BinMoCo
python code/train.py \
    --data_dir database/sec22 \
    --data_repr cfg_cg \
    --num_edge_type 3 \
    --embedding_dims 128 \
    --seq_model transformer \
    --seq_hidden_dims 128 \
    --seq_layers 4 \
    --gnn_name gatedgcn \
    --gnn_hidden_dims 128 \
    --gnn_layers 5 \
    --gnn_out_dims 128 \
    --train_batch_size 30 \
    --batch_k 5 \
    --val_batch_size 30 \
    --train_num_each_epoch 300000 \
    --num_epochs 30 \
    --num_workers 12 \
    --learning_rate 0.001 \
    --miner_type ms \
    --loss_type ms \
    --use_moco \
    --memory_size 16384 \
    --early_stopping 15 \
    --precision 16 \
    --save_name sec22_cfg_cg_trans_gatedgcn_gin_ms_ms_moco
  6. Build test data (four tasks: XO, XC, XA, and XM, each with different pool sizes)
python code/build_testdata.py database/sec22
  7. Test the trained model
python code/test.py results/dml/sec22_cfg_cg_trans_gatedgcn_gin_ms_ms_moco/version_0
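
The steps above (everything after downloading the dataset) can be chained in a small driver script. This is a minimal sketch, not something shipped with the repository: the helper names `pipeline_commands` and `run_pipeline` are hypothetical, the script paths and flags are copied from the commands above, and it assumes the scripts are run from the repository root and accept their flags in any order.

```python
import subprocess
import sys

# Training flags copied from the train.py command above.
# A value of None marks a flag that takes no argument (--use_moco).
TRAIN_ARGS = {
    "data_repr": "cfg_cg", "num_edge_type": 3, "embedding_dims": 128,
    "seq_model": "transformer", "seq_hidden_dims": 128, "seq_layers": 4,
    "gnn_name": "gatedgcn", "gnn_hidden_dims": 128, "gnn_layers": 5,
    "gnn_out_dims": 128, "train_batch_size": 30, "batch_k": 5,
    "val_batch_size": 30, "train_num_each_epoch": 300000, "num_epochs": 30,
    "num_workers": 12, "learning_rate": 0.001, "miner_type": "ms",
    "loss_type": "ms", "use_moco": None, "memory_size": 16384,
    "early_stopping": 15, "precision": 16,
}


def pipeline_commands(db_name="sec22",
                      save_name="sec22_cfg_cg_trans_gatedgcn_gin_ms_ms_moco"):
    """Return the build, train, and test commands as argv lists, in order."""
    py = sys.executable  # same interpreter that runs this driver
    db_dir = f"database/{db_name}"

    train = [py, "code/train.py", "--data_dir", db_dir]
    for key, value in TRAIN_ARGS.items():
        train.append(f"--{key}")
        if value is not None:
            train.append(str(value))
    train += ["--save_name", save_name]

    return [
        [py, "code/build_db.py", "--db_name", db_name],
        [py, "code/build_group.py", db_dir],
        [py, "code/build_vocab.py", db_dir],
        [py, "code/build_imp_vocab.py", db_dir],
        train,
        [py, "code/build_testdata.py", db_dir],
        # "version_0" is taken from the test command above; a repeated
        # training run may write to a different version directory.
        [py, "code/test.py", f"results/dml/{save_name}/version_0"],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        runner(argv, check=True)


if __name__ == "__main__":
    run_pipeline(pipeline_commands())
```

Because `check=True` raises on a non-zero exit code, a failed build step stops the run before training starts. Passing a different `db_name` is one way to reuse the same sequence for another dataset directory, assuming the scripts treat it like `sec22`.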

Reference

We refer to the following repositories during implementation:
