CovBooster 🚀

CovBooster: Coverage Booster for Binary Code Clone Detection by Reduced Signatures.

This work has been accepted for presentation at The 41st ACM/SIGAPP Symposium on Applied Computing (SAC 2026).

This repository contains the implementation and evaluation code for the CovBooster approach, which uses dominating set algorithms to improve binary function detection coverage.

📋 Overview

CovBooster is a novel approach for binary function detection that leverages dominating set algorithms to select optimal binary sets for function matching.
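For readers unfamiliar with dominating sets, the sketch below shows the classic greedy approximation for a minimum dominating set on an undirected graph: repeatedly pick the node that covers the most not-yet-covered nodes. It assumes, for illustration only, that nodes are binaries and that an edge joins two binaries whose functions match closely. It is not taken from ds_algo.py, whose algorithm and graph construction may differ.

```python
from typing import Dict, Hashable, Set


def greedy_dominating_set(graph: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Greedy approximation of a minimum dominating set (illustrative sketch).

    `graph` maps each node to the set of its neighbours (undirected).
    A chosen node dominates itself and all of its neighbours.
    """
    undominated = set(graph)
    chosen: Set[Hashable] = set()
    while undominated:
        # Pick the node whose closed neighbourhood covers the most
        # still-undominated nodes (ties go to the first node in dict order).
        best = max(graph, key=lambda v: len(({v} | graph[v]) & undominated))
        chosen.add(best)
        undominated -= {best} | graph[best]
    return chosen


# Hypothetical similarity graph between five binaries.
example = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": {"e"}, "e": {"d"}}
print(greedy_dominating_set(example))  # {'a', 'd'} (set print order may vary)
```

Every binary is either selected or adjacent to a selected one, so matching against the selected subset still "covers" the whole group in this toy model.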

🚀 Quick Start

Prerequisites

Python 3.7 or higher
Required Python packages (see requirements.txt)

Installation

# Clone the repository
git clone <repository-url>
cd CovBooster-public

# Install dependencies
pip install -r requirements.txt

Data Format

The code expects TLSH hash files organized in the following structure:

<db_root>/
├── <binary_group_1>/
│   ├── <binary_1>/
│   │   ├── <function_1>.tlsh
│   │   ├── <function_2>.tlsh
│   │   └── ...
│   └── <binary_2>/
│       └── ...
└── <binary_group_2>/
    └── ...

Each .tlsh file should contain:

Line 1: TLSH hash value
Line 2: Function size (strand size)
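A minimal loader for the layout above could look like the following. It assumes line 1 holds the TLSH hash and line 2 an integer function size, and keys each entry by (group, binary, function name); the function name `load_tlsh_db` and the return shape are hypothetical and not part of this repository's code.

```python
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical return shape: (group, binary, function) -> (tlsh_hash, size)
Signature = Tuple[str, int]


def load_tlsh_db(db_root: str) -> Dict[Tuple[str, str, str], Signature]:
    """Walk <db_root>/<group>/<binary>/<function>.tlsh and parse each file.

    Assumes line 1 is the TLSH hash and line 2 is an integer function size.
    """
    db: Dict[Tuple[str, str, str], Signature] = {}
    for path in sorted(Path(db_root).glob("*/*/*.tlsh")):
        group, binary = path.parent.parent.name, path.parent.name
        lines = path.read_text().splitlines()
        tlsh_hash = lines[0].strip()
        size = int(lines[1].strip())
        db[(group, binary, path.stem)] = (tlsh_hash, size)
    return db
```

Calling `load_tlsh_db("sample_data")` would then return one entry per .tlsh file found three levels below the root.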

Sample Data: This repository includes a sample_data/ directory containing TLSH hash files for testing. The sample data includes:

5 binary groups: bool, direvent, gmp, libcrypto, libssl
Multiple compiler versions (clang 4.0-7.0, gcc 4.9.4-8.2.0)
Multiple architectures (arm_32, arm_64, x86_32, x86_64)
Multiple optimization levels (O0, O1, O2, O3)
TLSH hash files across multiple binary variants

To use the sample data:

python3 evaluation_dominating.py sample_data 30 test_results

📁 Repository Structure

CovBooster-public/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
├── ds_algo.py                         # Dominating set algorithm implementation
├── dominating_set.py                  # Dominating set construction and evaluation
├── evaluation_dominating.py           # Main evaluation script with dominating set approach
├── threshold_sensitivity_analysis.py   # Threshold sensitivity analysis
└── sample_data/                       # Sample TLSH hash files for testing
    ├── bool/                           # bool binary group
    ├── direvent/                       # direvent binary group
    ├── gmp/                            # gmp binary group
    ├── libcrypto/                      # libcrypto binary group
    └── libssl/                         # libssl binary group

🔧 Usage

1. Dominating Set Evaluation

Run the main evaluation script with dominating set approach:

python3 evaluation_dominating.py <db_root> <base_result_directory>

This will:

Test multiple threshold values (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40)
Generate results for each threshold automatically
Save results in timestamped directories under <base_result_directory>/exp_<timestamp>/threshold_<value>/

2. Threshold Sensitivity Visualization

Generate ROC/PR curves and a detailed analysis from the evaluation results, where <exp_dir> is an <base_result_directory>/exp_<timestamp>/ directory produced in step 1:

python3 threshold_sensitivity_analysis.py <exp_dir>

This generates:

ROC and PR curves
Detailed performance analysis
Threshold sensitivity results CSV
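Because the sweep writes one threshold_<value>/ folder per threshold, any analysis has to order those folders numerically; a plain string sort would place threshold_10 before threshold_2. The sketch below assumes integer thresholds and the exp_<timestamp>/threshold_<value>/ layout from step 1. It is a hypothetical helper, not necessarily how threshold_sensitivity_analysis.py reads its input.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches folder names such as "threshold_0" or "threshold_40".
_THRESHOLD_DIR = re.compile(r"^threshold_(\d+)$")


def find_threshold_dirs(exp_dir: str) -> List[Tuple[int, Path]]:
    """Return (threshold, path) pairs for every threshold_<value>/ folder
    directly under exp_dir, sorted numerically (so 2 < 10 < 40)."""
    found = []
    for child in Path(exp_dir).iterdir():
        m = _THRESHOLD_DIR.match(child.name)
        if child.is_dir() and m:
            found.append((int(m.group(1)), child))
    return sorted(found)
```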


📊 Output Files

Evaluation Results

For each binary group, the following files are generated:

dominating_set_metrics.csv: Main metrics (Precision, Recall, F1-score, etc.)
false_positives.csv: False positive cases
topk_matches.csv: Top-K matching results
grid_search_results.csv: Grid search parameter optimization results
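The metrics reported in dominating_set_metrics.csv follow the standard definitions below. How the evaluation script counts true positives, false positives and false negatives is not described here, so this sketch only illustrates the formulas; returning 0.0 when a ratio is undefined is an assumed convention.

```python
from typing import Dict


def prf1(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


print(prf1(tp=8, fp=2, fn=4))  # precision 0.8, recall ~0.667, f1 ~0.727
```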

Analysis Results

threshold_sensitivity_results.csv: Threshold sensitivity analysis
threshold_roc_pr_curves.png: ROC and PR curves
threshold_detailed_analysis.png: Detailed performance analysis

📝 Citation

If you find CovBooster useful in your research, please cite:

CovBooster: Coverage Booster for Binary Code Clone Detection by Reduced Signatures.
To appear in The 41st ACM/SIGAPP Symposium on Applied Computing (SAC 2026).

🔍 Parameters

Key parameters that can be adjusted:

TLSH_THRESHOLD: TLSH distance threshold; a lower TLSH score means more similar functions (the evaluation sweeps values from 0 to 40)
SIZE_DIFF_THRESHOLD: Maximum size difference ratio (default: 0.3)
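One plausible way the two parameters combine is as a pair filter: two functions match only if their TLSH distance is within TLSH_THRESHOLD and their sizes are close enough. The sketch below assumes that reading and assumes the size difference ratio is |a - b| / max(a, b); both the helper names and that definition are hypothetical. The distance is passed in rather than computed, so the example stays self-contained; in practice it could come from `tlsh.diff(h1, h2)` in the py-tlsh package.

```python
def size_diff_ratio(size_a: int, size_b: int) -> float:
    """Assumed definition: |a - b| / max(a, b), in [0, 1]."""
    largest = max(size_a, size_b)
    return abs(size_a - size_b) / largest if largest else 0.0


def is_match(tlsh_distance: int, size_a: int, size_b: int,
             tlsh_threshold: int, size_diff_threshold: float = 0.3) -> bool:
    """Hypothetical filter: a pair matches when the TLSH distance is within
    the threshold AND the function sizes differ by at most the given ratio."""
    return (tlsh_distance <= tlsh_threshold
            and size_diff_ratio(size_a, size_b) <= size_diff_threshold)


print(is_match(25, 100, 80, tlsh_threshold=30))  # True: ratio 0.2 <= 0.3
print(is_match(25, 100, 60, tlsh_threshold=30))  # False: ratio 0.4 > 0.3
```

Checking the cheap size ratio alongside the distance rejects pairs of very different lengths that happen to land within the TLSH threshold.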

📧 Contact

For questions or issues, please open an issue on the repository or contact me by email (jeongwoo@korea.ac.kr).
