TitanCAProject/VulSifter

1. Tool Information

2. Authors and Contact

  • Main Author(s): Yikun Li, Ting Zhang, Ratnadira Widyasari, Yan Naing Tun, Huu Hung Nguyen, Tan Bui, Ivana Clairine Irsan, Yiran Cheng, Xiang Lan, Han Wei Ang, Frank Liauw, Martin Weyssow, Hong Jin Kang, Eng Lieh Ouh, Lwin Khin Shar, David Lo
  • Contact: yikunli@smu.edu.sg

3. Overview

VulSifter identifies function-level vulnerability-fixing changes inside vulnerability-fixing commits by combining (i) an LLM-based semantic analysis of code changes and commit context with (ii) heuristic filtering to remove common non-security changes (notably test-related edits). The tool outputs a vulnerability-fix confidence score and supports thresholding to trade off dataset cleanliness vs. coverage when curating training data.

4. Installation

Install VulSifter by downloading the repository archive (zip) and installing the Python dependencies used for LLM inference and/or local model execution.

# Create/activate your Python environment as preferred, then install dependencies.
# (Exact requirements depend on the repo; commonly provided as requirements.txt / environment.yml)
pip install -r requirements.txt

5. Usage

A typical workflow is: (1) prepare commit-level inputs (original/revised function code, commit message, and other changed functions as context), (2) run the provided prediction script to score each change, and (3) optionally apply the built-in heuristics to filter test-related changes and keep items above a chosen threshold (e.g., keep score ≥ 3 for a cleaner subset).
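The commit-level input described in step (1) can be sketched as a simple record; the field names below are illustrative assumptions, not the repo's documented schema (check the repository's data loader for the exact format):

```python
import json

# Hypothetical input record for one candidate change (field names are
# assumptions; VulSifter's actual schema may differ).
record = {
    "commit_message": "Fix buffer overflow in parse_header()",
    "original_code": "int parse_header(char *buf) { /* before change */ }",
    "revised_code": "int parse_header(char *buf) { /* after change */ }",
    "context_functions": [],  # other functions changed in the same commit
}

# Such records are commonly serialized one JSON object per line (JSONL).
line = json.dumps(record)
print(json.loads(line)["commit_message"])
```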

To run prediction with the provided GPU script:

# After dependencies are installed:
sh gpu_pred.sh

The paper’s scoring prompt produces an integer score in {0,1,2,3,4} representing confidence that the change is vulnerability-fixing; users then select a threshold (e.g., keep only 4 for the strictest subset).
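Threshold selection on the 0–4 score can be sketched as a simple filter; the record shape below is an assumption for illustration, not the script's actual output format:

```python
# Example scored records (hypothetical IDs and scores).
scored = [
    {"id": 1, "score": 4},
    {"id": 2, "score": 3},
    {"id": 3, "score": 1},
]

def keep_above(records, threshold):
    """Keep records whose LLM confidence score meets the threshold."""
    return [r for r in records if r["score"] >= threshold]

clean = keep_above(scored, 3)   # cleaner subset: scores 3 and 4
strict = keep_above(scored, 4)  # strictest subset: score 4 only
print(len(clean), len(strict))  # 2 1
```

Raising the threshold trades coverage for cleanliness, which is the dataset-curation knob the tool exposes.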

6. Input and Output Format

  • Input format: For each candidate change (typically mined from vulnerability-fixing commits), provide:

    • commit message
    • “Original” function code (before change)
    • “Revised” function code (after change)
    • optional context: other functions changed in the same commit
  • Output format: For each input change:

    • an LLM score 0–4 indicating confidence that the change is vulnerability-fixing (and/or an optional binary label if configured)
    • optionally, a post-processed result after heuristic filtering (e.g., removing test-related changes detected via regex patterns over filenames/function signatures across languages).
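The test-related heuristic filtering can be illustrated with regexes over filenames and function signatures; the patterns below are assumptions modeled on the description above, not the repo's actual rules:

```python
import re

# Hypothetical heuristics: flag a change as test-related if the file path
# or the function signature looks like test code.
TEST_FILE_RE = re.compile(
    r"(^|/)(tests?/|test_[^/]*$)|(_test\.\w+$)|(Test\w*\.java$)", re.IGNORECASE
)
TEST_FUNC_RE = re.compile(r"\b(def|void|func)\s+test\w*", re.IGNORECASE)

def is_test_change(filename, signature):
    """Return True if the change looks test-related by filename or signature."""
    return bool(TEST_FILE_RE.search(filename) or TEST_FUNC_RE.search(signature))

print(is_test_change("src/tests/util.c", "void check_parse(void)"))   # True
print(is_test_change("src/parser.c", "int parse_header(char *buf)"))  # False
```

Changes flagged this way would be dropped before applying the score threshold, yielding the post-processed subset.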
