This is a study submitted to IEEE/ACM ICPC 2025 under the title "Investigating Graphical Representations of Code Changes for Detecting Vulnerability Fixing Changes". In this study, we propose and investigate three graph-based code change representations and compare their effectiveness using our proposed model, VulFixNet.
The primary dependencies of this project are:
- PyTorch Geometric (version 2.3.1)
- PyTorch
- NumPy
- scikit-learn
- Joern (we used version 2.0.107 - many versions may work, but versions which are significantly older than this may not properly parse some samples)
- sent2vec: please follow the setup instructions for sent2vec. Note: sent2vec is only required for training and generating images for VulCNN. You can also use the pretrained sent2vec models from VulCNN, available on baidu or Google Drive.
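
Before running the pipeline, it can help to confirm which of the Python dependencies listed above are importable. The following is a minimal sketch, not part of the project: the helper name `check_dependencies` is hypothetical, and the import names (`torch`, `torch_geometric`, `numpy`, `sklearn`) are the usual import names of the listed packages. Joern is a separate command-line tool and sent2vec is optional, so neither is checked here.

```python
import importlib

# Usual import names of the Python dependencies listed above (assumption:
# the packages are installed under their standard import names).
DEFAULT_MODULES = ["torch", "torch_geometric", "numpy", "sklearn"]


def check_dependencies(module_names=DEFAULT_MODULES):
    """Map each module name to its version string, or None if it cannot be imported."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is missing (or one of its own imports is missing).
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to a marker.
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling `check_dependencies()` returns a dictionary such as `{"torch": ..., "torch_geometric": ..., ...}`, where a `None` value marks a package that still needs to be installed. The reported PyTorch Geometric version can then be compared against 2.3.1 by hand.
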
This part extracts the source code from the BigVul dataset, generates Code Property Graphs (CPGs), and combines the related vulnerable and fixed versions of each function into a single file, from which the graph representations are generated later. The related Python scripts can be found under the data_preparation folder.
- Extract the functions from the BigVul dataset by running the Python script below, after updating the BigVul dataset path:

python bigvul_parser.py

- Generate CPGs from the extracted source files. This step generates a CPG in *.dot format for each vulnerable and fixed function.

python generate_cpgs.py

- Combine the vulnerable and fixed versions of each function into one *.dot file.

python combined_dots.py

There are three graph-based code change representations: Terminal-2-Root, Root-2-Root Terminal-2-Terminal, and Naive Matched. Each representation has two Python scripts: one generates the vulnerability-fixing representation and the other generates the vulnerability-inducing representation, which we also refer to as the inverse. The scripts can be found under the graph_rep_generation folder.
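
Going by the script names used in this README, each vulnerability-inducing script appears to share the name of its vulnerability-fixing counterpart with an added `_inverse` suffix. The sketch below relies on that assumption to derive the pairs and run all six scripts in order. The helper names (`inverse_script`, `build_commands`, `run_all`) are hypothetical, and the sketch assumes the scripts take no command-line arguments and that any paths they need are already set inside them.

```python
import subprocess
import sys
from pathlib import Path

# Vulnerability-fixing scripts as named in this README.
FIXING_SCRIPTS = [
    "graph_match_terminal.py",
    "graph_match_root_terminal.py",
    "graph_match_similar.py",
]


def inverse_script(fixing_script):
    """Derive the vulnerability-inducing script name by adding `_inverse` before the extension."""
    path = Path(fixing_script)
    return f"{path.stem}_inverse{path.suffix}"


def build_commands(fixing_scripts=FIXING_SCRIPTS):
    """Return one command per script: each fixing script followed by its inverse."""
    commands = []
    for script in fixing_scripts:
        commands.append([sys.executable, script])
        commands.append([sys.executable, inverse_script(script)])
    return commands


def run_all(folder="graph_rep_generation"):
    """Run every command inside `folder`, stopping at the first failing script."""
    for command in build_commands():
        subprocess.run(command, cwd=folder, check=True)
```

Running `run_all()` from the repository root is equivalent to invoking the six scripts one after another by hand; `check=True` makes the run stop as soon as one script exits with an error.
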
Run the Python script below to generate the vulnerability-fixing representation.

python graph_match_terminal.py

Run the Python script below to generate the vulnerability-inducing representation.

python graph_match_terminal_inverse.py

Run the Python script below to generate the vulnerability-fixing representation.

python graph_match_root_terminal.py

Run the Python script below to generate the vulnerability-inducing representation.

python graph_match_root_terminal_inverse.py

Run the Python script below to generate the vulnerability-fixing representation.

python graph_match_similar.py

Run the Python script below to generate the vulnerability-inducing representation.

python graph_match_similar_inverse.py

To train and test VulFixNet, run the Python script below. Before running the script, please make sure that you have set your folder to the generated pickle files (*.pkl). The scripts related to training and testing VulFixNet can be found under the VulFixNet folder.

python main.py
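
Since training depends on the generated pickle files, a quick check that the configured folder actually contains *.pkl files can save a failed run. The following is a minimal sketch of such a check; the helper names `find_pickles` and `require_pickles` are hypothetical and not part of the project, and the folder path is whatever you have configured for VulFixNet.

```python
from pathlib import Path


def find_pickles(folder):
    """Return the sorted *.pkl files located directly inside `folder`."""
    directory = Path(folder)
    if not directory.is_dir():
        raise FileNotFoundError(f"Not a directory: {directory}")
    return sorted(directory.glob("*.pkl"))


def require_pickles(folder):
    """Return the *.pkl files in `folder`, raising if there are none."""
    files = find_pickles(folder)
    if not files:
        raise RuntimeError(f"No *.pkl files found in {folder}")
    return files
```

For example, calling `require_pickles(...)` with your pickle folder before launching `main.py` raises an error early if the graph representation step has not produced any output there.
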