- Tool Name: VulSifter
- Repository / URL: https://github.com/yikun-li/CleanVul
- Main Author(s): Yikun Li, Ting Zhang, Ratnadira Widyasari, Yan Naing Tun, Huu Hung Nguyen, Tan Bui, Ivana Clairine Irsan, Yiran Cheng, Xiang Lan, Han Wei Ang, Frank Liauw, Martin Weyssow, Hong Jin Kang, Eng Lieh Ouh, Lwin Khin Shar, David Lo
- Contact: yikunli@smu.edu.sg
VulSifter identifies function-level vulnerability-fixing changes inside vulnerability-fixing commits by combining (i) an LLM-based semantic analysis of code changes and commit context with (ii) heuristic filtering to remove common non-security changes (notably test-related edits). The tool outputs a vulnerability-fix confidence score and supports thresholding to trade off dataset cleanliness vs. coverage when curating training data.
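As a rough illustration of how the two signals combine, the keep/drop decision can be thought of as a confidence threshold applied only to changes that the heuristics do not flag as test edits. The sketch below is illustrative, not the repository's API; `keep_change`, `is_test_change`, and the default threshold of 3 are assumptions.

```python
def keep_change(llm_score: int, is_test_change: bool, threshold: int = 3) -> bool:
    """Keep a function-level change only if the LLM is confident it fixes a
    vulnerability and the heuristics do not flag it as a test-related edit."""
    return llm_score >= threshold and not is_test_change
```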
Install VulSifter by cloning the repository (or downloading it as a zip archive) and setting up the Python dependencies used for LLM inference or local model execution.
```sh
# Create/activate your Python environment as preferred, then install dependencies.
# (Exact requirements depend on the repo; commonly provided as requirements.txt / environment.yml)
pip install -r requirements.txt
```

A typical workflow is: (1) prepare commit-level inputs (original/revised function code, commit message, and other changed functions as context), (2) run the provided prediction script to score each change, and (3) optionally apply the built-in heuristics to filter test-related changes and keep items above a chosen threshold (e.g., keep score ≥ 3 for a cleaner subset).
Run the provided prediction script:
```sh
# After dependencies are installed:
sh gpu_pred.sh
```

The paper's scoring prompt produces an integer score in {0, 1, 2, 3, 4} representing confidence that the change is vulnerability-fixing; users then select a threshold (e.g., keep only 4 for the strictest subset).
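Once scores are written out, picking a threshold is a simple post-processing step. The snippet below is a sketch that assumes the prediction step produces a JSONL file with a numeric `score` field; the actual output file name, field names, and layout depend on the repository's scripts.

```python
import json

# Load scored changes (assumed JSONL with a "score" field in {0,1,2,3,4}).
with open("scored_changes.jsonl") as f:
    records = [json.loads(line) for line in f]

clean = [r for r in records if r["score"] >= 3]   # cleaner subset
strict = [r for r in records if r["score"] == 4]  # strictest subset
print(f"total: {len(records)}, score>=3: {len(clean)}, score==4: {len(strict)}")
```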
- Input format: For each candidate change (typically mined from vulnerability-fixing commits), provide the following (see the example record after this list):
  - commit message
  - “Original” function code (before change)
  - “Revised” function code (after change)
  - optional context: other functions changed in the same commit
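A candidate record might be assembled like this; the field names below are illustrative assumptions, not the schema required by the repository's scripts.

```python
import json

# One candidate change from a vulnerability-fixing commit (hypothetical fields).
record = {
    "commit_message": "Fix buffer overflow in packet parser",
    "original_function": "int parse(char *buf) { char tmp[8]; strcpy(tmp, buf); return 0; }",
    "revised_function": "int parse(char *buf) { char tmp[8]; strncpy(tmp, buf, sizeof(tmp) - 1); return 0; }",
    "context_functions": ["void log_packet(const char *buf) { /* unchanged helper */ }"],
}

# Append the record to a JSONL file of candidates to be scored.
with open("candidate_changes.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```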
- Output format: For each input change:
  - an LLM score 0–4 indicating confidence that the change is vulnerability-fixing (and/or an optional binary label if configured)
  - optionally, a post-processed result after heuristic filtering (e.g., removing test-related changes detected via regex patterns over filenames/function signatures across languages).
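The test-related heuristic can be pictured as regex checks over file paths and function signatures; the patterns below are illustrative only, and the repository defines its own per-language rules.

```python
import re

# Illustrative patterns for test-looking paths and signatures (not the repo's actual regexes).
TEST_PATH = re.compile(r"(^|/)tests?/|_test\.|Test\w*\.java$", re.IGNORECASE)
TEST_SIGNATURE = re.compile(r"def\s+test_|@Test\b|void\s+test[A-Z]\w*")

def looks_like_test_change(file_path: str, function_code: str) -> bool:
    """Flag a change as test-related if its path or signature matches a test pattern."""
    return bool(TEST_PATH.search(file_path) or TEST_SIGNATURE.search(function_code))
```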