- Tool Name: VulSifter
- Repository / URL: https://github.com/yikun-li/CleanVul
- Main Author(s): Yikun Li, Ting Zhang, Ratnadira Widyasari, Yan Naing Tun, Huu Hung Nguyen, Tan Bui, Ivana Clairine Irsan, Yiran Cheng, Xiang Lan, Han Wei Ang, Frank Liauw, Martin Weyssow, Hong Jin Kang, Eng Lieh Ouh, Lwin Khin Shar, David Lo
- Contact: yikunli@smu.edu.sg
VulSifter identifies function-level vulnerability-fixing changes inside vulnerability-fixing commits by combining (i) an LLM-based semantic analysis of code changes and commit context with (ii) heuristic filtering to remove common non-security changes (notably test-related edits). The tool outputs a vulnerability-fix confidence score and supports thresholding to trade off dataset cleanliness vs. coverage when curating training data.
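As a rough illustration of how the two signals combine, the keep/drop decision can be thought of as a confidence threshold applied only to changes that the heuristics do not flag as test edits. The sketch below is illustrative, not the repository's API; `keep_change`, `is_test_change`, and the default threshold of 3 are assumptions.

```python
def keep_change(llm_score: int, is_test_change: bool, threshold: int = 3) -> bool:
    """Keep a function-level change only if the LLM is confident it fixes a
    vulnerability and the heuristics do not flag it as a test-related edit."""
    return llm_score >= threshold and not is_test_change
```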
Install VulSifter by cloning the repository (or downloading it as a zip archive) and setting up the Python dependencies used for LLM inference or local model execution.
```sh
# Create/activate your Python environment as preferred, then install dependencies.
# (Exact requirements depend on the repo; commonly provided as requirements.txt / environment.yml)
pip install -r requirements.txt
```

A typical workflow is: (1) prepare commit-level inputs (original/revised function code, commit message, and other changed functions as context), (2) run the provided prediction script to score each change, and (3) optionally apply the built-in heuristics to filter test-related changes and keep items above a chosen threshold (e.g., keep score ≥ 3 for a cleaner subset).
Run the provided prediction script:
```sh
# After dependencies are installed:
sh gpu_pred.sh
```

The paper's scoring prompt produces an integer score in {0, 1, 2, 3, 4} representing confidence that the change is vulnerability-fixing; users then select a threshold (e.g., keep only 4 for the strictest subset).
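Once scores are written out, picking a threshold is a simple post-processing step. The snippet below is a sketch that assumes the prediction step produces a JSONL file with a numeric `score` field; the actual output file name, field names, and layout depend on the repository's scripts.

```python
import json

# Load scored changes (assumed JSONL with a "score" field in {0,1,2,3,4}).
with open("scored_changes.jsonl") as f:
    records = [json.loads(line) for line in f]

clean = [r for r in records if r["score"] >= 3]   # cleaner subset
strict = [r for r in records if r["score"] == 4]  # strictest subset
print(f"total: {len(records)}, score>=3: {len(clean)}, score==4: {len(strict)}")
```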
- Input format: For each candidate change (typically mined from vulnerability-fixing commits), provide the following (see the example record after this list):
  - commit message
  - “Original” function code (before change)
  - “Revised” function code (after change)
  - optional context: other functions changed in the same commit
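A candidate record might be assembled like this; the field names below are illustrative assumptions, not the schema required by the repository's scripts.

```python
import json

# One candidate change from a vulnerability-fixing commit (hypothetical fields).
record = {
    "commit_message": "Fix buffer overflow in packet parser",
    "original_function": "int parse(char *buf) { char tmp[8]; strcpy(tmp, buf); return 0; }",
    "revised_function": "int parse(char *buf) { char tmp[8]; strncpy(tmp, buf, sizeof(tmp) - 1); return 0; }",
    "context_functions": ["void log_packet(const char *buf) { /* unchanged helper */ }"],
}

# Append the record to a JSONL file of candidates to be scored.
with open("candidate_changes.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```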
- Output format: For each input change:
  - an LLM score 0–4 indicating confidence that the change is vulnerability-fixing (and/or an optional binary label if configured)
  - optionally, a post-processed result after heuristic filtering (e.g., removing test-related changes detected via regex patterns over filenames/function signatures across languages).
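The test-related heuristic can be pictured as regex checks over file paths and function signatures; the patterns below are illustrative only, and the repository defines its own per-language rules.

```python
import re

# Illustrative patterns for test-looking paths and signatures (not the repo's actual regexes).
TEST_PATH = re.compile(r"(^|/)tests?/|_test\.|Test\w*\.java$", re.IGNORECASE)
TEST_SIGNATURE = re.compile(r"def\s+test_|@Test\b|void\s+test[A-Z]\w*")

def looks_like_test_change(file_path: str, function_code: str) -> bool:
    """Flag a change as test-related if its path or signature matches a test pattern."""
    return bool(TEST_PATH.search(file_path) or TEST_SIGNATURE.search(function_code))
```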