VGX

VGX: Large-Scale Sample Generation for Boosting Learning-Based Software Vulnerability Analyses


Original artifact	https://figshare.com/s/de1a7ca036bdc38d6a19
Imported from	the publications page
Tool	`pubs2github`

├── baselines
│   ├── getafix
│   │   ├── AST.py
│   │   ├── cluster.py
│   │   ├── const.py
│   │   ├── Hierarchical.py
│   │   ├── parse.py
│   │   ├── Pattern.py
│   │   └── testApplying.py
│   ├── graph2edit
│   │   ├── asdl
│   │   ├── common
│   │   ├── datasets
│   │   ├── edit_components
│   │   ├── scripts
│   │   ├── source_data
│   │   ├── __init__.py
│   │   └── exp_githubedits.py
│   ├── t5
│   │   ├── T5_vulgen_test_translate_final_tokenized2
│   │   ├── T5_beam1.py
│   │   ├── test.sh
│   │   └── train.sh
│   └── vulgen
│       ├── AST.py
│       ├── cluster.py
│       ├── CodeT5_raw_preds_final2_beam1.pkl
│       ├── const.py
│       ├── Hierarchical.py
│       ├── parse.py
│       ├── Pattern.py
│       └── testApplying.py
├── downstream_tasks
│   ├── detection
│   │   ├── devign
│   │   ├── ivdetect
│   │   └── linevul
│   ├── localization
│   │   ├── linevd
│   │   └── linevul
│   └── repair
│       ├── vrepair
│       └── vulrepair
├── VGX
│   ├── Contextualization
│   │   ├── checkpoint
│   │   ├── data
│   │   ├── dataset
│   │   ├── model
│   │   ├── trainer
│   │   └── main.py
│   └── Human-Knowledge-Enhanced-Edit-Pattern
│       ├── .Hierarchical.py.un~
│       ├── AST.py
│       ├── cluster.py
│       ├── const.py
│       ├── Contextualization_raw_preds_final2_beam1.pkl
│       ├── Contextualization_raw_preds_final2_beam1_no_ast.pkl
│       ├── Contextualization_raw_preds_final2_beam1_no_aug.pkl
│       ├── Contextualization_raw_preds_final2_beam1_no_flow.pkl
│       … (18 more items)
… (566 more items)

Original `README.md` (from the upstream artifact)

VGX: Large-Scale Sample Generation for Boosting Learning-Based Software Vulnerability Analyses

VGX is a new technique for large-scale generation of high-quality vulnerability datasets. Given a normal program, VGX first identifies the code contexts in which vulnerabilities can be injected, using a customized source-code Transformer that features a new value-flow-based position encoding and is pre-trained against new objectives designed for learning code structure and context. VGX then materializes vulnerability-injecting code edits in the identified contexts, using edit patterns obtained from both historical fixes and human knowledge about real-world vulnerabilities. This artifact provides the source code of VGX, the baselines it is compared against, the generated dataset, and the downstream task tools augmented by the generated dataset.
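
The two-step flow described above can be pictured with a deliberately tiny sketch. It is not VGX's implementation: the real Step 1 is a trained Transformer and the real Step 2 draws on mined and hand-written edit patterns. Here `locate_context`, `apply_pattern`, `inject`, and the `GUARD` regular expression are all hypothetical names introduced only for illustration, and the single pattern shown (dropping a one-line null-pointer guard) is an assumed example of an edit pattern.

```python
import re

# Hypothetical pattern: a one-line guard such as `if (p == NULL) return;`
GUARD = re.compile(r"^\s*if\s*\(\s*\w+\s*==\s*NULL\s*\)\s*return\b[^;]*;\s*$")


def locate_context(lines):
    """Stand-in for Step 1: return the index of the first guard line, or None."""
    for i, line in enumerate(lines):
        if GUARD.match(line):
            return i
    return None


def apply_pattern(lines, index):
    """Stand-in for Step 2: drop the line at `index`, returning a new list."""
    return lines[:index] + lines[index + 1:]


def inject(source):
    """Run both stand-in steps; return the edited source, or None if no context fits."""
    lines = source.splitlines()
    index = locate_context(lines)
    if index is None:
        return None
    return "\n".join(apply_pattern(lines, index))
```

The point of the split is that locating where an edit is plausible and deciding what the edit looks like are separate problems, which is why the artifact ships them as two directories.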

Package Structure

VGX.zip: The source code, evaluation data, and results for VGX and its ablation study experiments.
- Contextualization/: The source code, evaluation data, and results for VGX Step 1 Contextualization.
  - data/: The data used for Contextualization.
  - checkpoint/: The trained models used for Contextualization.
  - main.py: The main function for running contextualization.
- Human-Knowledge-Enhanced-Edit-Pattern/: The source code, evaluation data, and results for VGX Step 2 Edit Pattern formation and vulnerability production.
  - Contextualization_raw_preds_final2_beam1.pkl: The contextualization results used for vulnerability production.
  - Contextualization_raw_preds_final2_beam1_no_*.pkl: The contextualization ablation study results used for vulnerability production.
  - res_reg4_mutation.txt: The experiment results on VGX vulnerability production.
  - res_reg4_mutation_no_*.txt: The ablation study experiment results on VGX vulnerability production.
  - testApplying.py: The main function for VGX vulnerability production.
  - vulgen_test_final2.pkl: The testing data for VGX vulnerability production.
baseline.zip: The source code, evaluation data, and comparison results for VGX's baselines.
- vulgen/: The source code, evaluation data, and results for VulGen evaluation.
  - CodeT5_raw_preds_final_beam1.pkl: The experiment output from the VulGen injection localization model on the testing set (used to localize the statement to inject vulnerability).
  - testApplying.py: The script to test VulGen using the testing data and generate (possible) vulnerable functions.
  - vulgen_test_final2.pkl: The testing data for vulnerability production.
- getafix/: The source code, evaluation data, and results for Getafix evaluation.
  - testApplying.py: The script to test Getafix using the testing data and generate (possible) vulnerable functions.
  - vulgen_test_final2.pkl: The testing data for vulnerability production.
- T5/: The source code, evaluation data, and results for Transformer-based injection localization and translation experiments.
  - T5_vulgen_test_translate_final_tokenized/: The testing data for the Transformer-based vulnerability generation baseline approach.
  - T5_beam1.py: The source code for T5 relevant models training and testing.
  - train.sh: The script to start training the T5 models.
  - test.sh: The script to test the trained T5 models for injection localization or Transformer-based vulnerability generation.
- graph2edit/: The source code, evaluation data, and results for GNN-based vulnerability injection approach Graph2Edit.
  - scripts/githubedits/: The scripts to start training and testing the Graph2Edit model.
  - source_data/githubedits/: The training, validation, and testing data for Graph2Edit.
  - exp_githubedits_runs/: The training and testing outputs for Graph2Edit.
vgx_generated_full.zip: The full dataset generated by VGX ready for use.
downstream_tasks.zip: The downstream task tools augmented by the generated dataset and the respective experiments.
- detection/: The source code, evaluation data, and augmentation results for DL-based vulnerability detection approaches.
  - devign/: The source code, evaluation data, and results for DL-based vulnerability detection approach Devign.
    - devign-ori/: The source code, data, and results before the augmentation.
    - devign-aug/: The source code, data, and results after the augmentation.
  - linevul/: The source code, evaluation data, and results for DL-based vulnerability detection approach LineVul.
    - linevul-ori/: The source code, data, and results before the augmentation.
    - linevul-aug/: The source code, data, and results after the augmentation.
    - data/: The testing data used for evaluation.
  - ivdetect/: The source code, evaluation data, and results for DL-based vulnerability detection approach IVDetect.
    - ivdetect-ori/: The source code, data, and results before the augmentation.
    - ivdetect-aug/: The source code, data, and results after the augmentation.
    - reveal_ivdetect.csv: The testing data used for evaluation.
- localization/: The source code, evaluation data, and augmentation results for DL-based vulnerability localization approaches.
  - linevul/: The source code, evaluation data, and results for DL-based vulnerability localization approach LineVul.
    - linevul-ori/: The source code, data, and results before the augmentation.
    - linevul-aug/: The source code, data, and results after the augmentation.
    - data/: The testing data used for evaluation.
  - linevd/: The source code, evaluation data, and results for DL-based vulnerability localization approach LineVD.
    - linevd-ori/: The source code, data, and results before the augmentation.
    - linevd-aug/: The source code, data, and results after the augmentation.
- repair/: The source code, evaluation data, and augmentation results for DL-based vulnerability repair approaches.
  - vrepair/: The source code, evaluation data, and results for DL-based vulnerability repair approach VRepair.
    - vrepair-ori/: The source code, data, and results before the augmentation.
    - vrepair-aug/: The source code, data, and results after the augmentation.
    - data/: The testing data used for evaluation.
  - vulrepair/: The source code, evaluation data, and results for DL-based vulnerability repair approach VulRepair.
    - vulrepair-ori/: The source code, data, and results before the augmentation.
    - vulrepair-aug/: The source code, data, and results after the augmentation.
    - data/: The testing data used for evaluation.
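
Several entries in the structure above are `.pkl` files whose internal layout this README does not describe. A cautious first step is to load one and look only at its top-level type and size. The helper below is a minimal sketch; `summarize_pickle` is a hypothetical name, and it makes no assumption about what the artifact's pickles actually contain.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object.

    Unpickling can execute arbitrary code, so only call this on files
    obtained from a source you trust.
    """
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)
    summary = {"type": type(obj).__name__}
    # Lists, dicts, tuples and similar containers expose a length.
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    return summary
```

For example, `summarize_pickle("vulgen_test_final2.pkl")` would report the container type and entry count of the testing data, assuming the file is in the current directory. If a file was pickled with custom classes, loading it may also require the corresponding source directory to be importable.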

How to use

Use the package structure above to locate the source code, evaluation data, and results that correspond to each experiment described in the original paper.
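
Because each downstream tool is described as having a `-ori` directory (before augmentation) and a `-aug` directory (after augmentation), one way to navigate the package is to enumerate those pairs. The sketch below assumes the archives have been extracted so that the on-disk layout matches the listing in Package Structure; `find_ori_aug_pairs` is a hypothetical helper, not part of the artifact.

```python
from pathlib import Path


def find_ori_aug_pairs(root):
    """Map each tool directory to its (ori_dir, aug_dir) pair.

    Only tools that have both a `<name>-ori` and a `<name>-aug`
    subdirectory are included.
    """
    pairs = {}
    for ori in sorted(Path(root).rglob("*-ori")):
        if not ori.is_dir():
            continue
        # Swap the "-ori" suffix for "-aug" to find the sibling directory.
        aug = ori.with_name(ori.name[: -len("-ori")] + "-aug")
        if aug.is_dir():
            pairs[ori.parent] = (ori, aug)
    return pairs
```

Keying the result by the parent path keeps tools that appear under more than one task, such as `linevul` under both `detection/` and `localization/`, as separate entries.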
