- Mission: Automate vulnerability detection to reduce manual effort.
- Objectives:
- Automation: Reduce manual effort in detection.
- Efficiency: Quickly identify vulnerabilities in code.
- Accuracy: Minimize false positives and false negatives.
- Scalability: Integrate easily with multiple programming languages.
- Objectives:
- Must quantify vulnerability likelihood at the line level.
- Must be differentiable for accurate training.
- Line probability, the sum of its tokens' probabilities, indicates how likely a code line is to be vulnerable.
- Each token probability is read from the output distribution of a large code model whose input is a convex combination of token embeddings weighted by the prior distribution, together with the preceding code lines (formalized below).
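A minimal formalization of this definition (notation is ours, inferred from the two bullets above): let a line consist of tokens t_1, ..., t_n, let E_v be the base model's embedding of vocabulary token v, and let c denote the preceding code lines.

```latex
P_{\mathrm{line}}(\ell) = \sum_{i=1}^{n} p_\theta\big(t_i \mid e_1,\dots,e_{i-1},\, c\big),
\qquad
e_i = \sum_{v \in V} p_\theta\big(v \mid e_1,\dots,e_{i-1},\, c\big)\, E_v .
```

Because the weights of each e_i form a probability distribution (nonnegative, summing to one), e_i is a convex combination of all token embeddings, and the line probability is a smooth function of model outputs, which satisfies the differentiability requirement.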
- Objectives:
- Must optimize detection performance as measured by line probability.
- Must tune the model directly on line probabilities.
- Prompt tuning is a mechanism for learning soft prompts, enabling models to perform specific downstream tasks [1].
- The large code model, a prompt-tuned variant of a large language model, detects likely vulnerabilities via line probabilities (a minimal sketch follows).
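A minimal sketch of soft-prompt tuning in PyTorch, assuming a Hugging Face-style causal LM; class and variable names here are ours, not from the source:

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Prepends a learnable soft prompt to a frozen base model's inputs."""

    def __init__(self, base_model, prompt_length=64):
        super().__init__()
        self.base = base_model
        for p in self.base.parameters():        # the large code model stays frozen
            p.requires_grad = False
        dim = self.base.get_input_embeddings().embedding_dim
        # The soft prompt is the only trainable tensor.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, dim) * 0.02)

    def forward(self, input_ids, attention_mask=None):
        tok_emb = self.base.get_input_embeddings()(input_ids)       # (B, T, D)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)         # prepend prompt
        if attention_mask is not None:
            pad = torch.ones(tok_emb.size(0), prompt.size(1),
                             dtype=attention_mask.dtype,
                             device=attention_mask.device)
            attention_mask = torch.cat([pad, attention_mask], dim=1)
        return self.base(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
```

Only `soft_prompt` receives gradients, so tuning updates a few thousand parameters rather than the billions in the base model.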
- Base Model: CodeLlama 7B
- Vulnerable Lines: 336
- Benign Lines: 3,207
- Validation Split: 0.2
- Length of Soft Prompt: 64
- Epochs: 8
- Batch Size: 16
- Learning Rate: 0.0005
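One way to wire these hyperparameters together, sketched with Hugging Face PEFT (the library choice is our assumption; only the values above come from the source):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, TaskType, get_peft_model

# Base model and soft-prompt length from the configuration above.
base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=64,               # length of soft prompt
)
model = get_peft_model(base, peft_config)

# Training hyperparameters from the list above: 8 epochs, batch size 16,
# learning rate 5e-4, with 20% of the 3,543 labeled lines held out for validation.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-4)
```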
- Line probability emerges as a potent metric for vulnerability detection.
- Prompt tuning makes the model more sensitive to vulnerable lines.
- High Cost of Calculating Line Probabilities.
- The convex-combined token embedding vector is constructed over all available tokens of the base model's vocabulary.
- Each line probability must be computed sequentially, token by token. Together, these two factors make the computation very slow (see the sketch after this list).
- Exploring ways to calculate line probabilities in parallel could be beneficial.
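A sketch of why the computation is slow, assuming a Hugging Face-style causal LM (function and variable names are ours): each soft input embedding depends on the previous step's output distribution, so tokens cannot be processed in parallel, and every step mixes all vocabulary embeddings.

```python
import torch

def line_probability(model, context_ids, line_ids):
    """Sum of token probabilities for one line, computed step by step."""
    embed = model.get_input_embeddings()
    embed_matrix = embed.weight                               # (|V|, D)
    inputs = embed(context_ids)                               # (1, T, D) prior lines
    total = inputs.new_zeros(())
    for tok in line_ids.squeeze(0):
        logits = model(inputs_embeds=inputs).logits[:, -1, :]
        dist = torch.softmax(logits, dim=-1)                  # next-token distribution
        total = total + dist[0, tok]                          # tensor, so grads flow
        # Convex combination over ALL vocabulary tokens: O(|V| * D) per step,
        # and the result feeds the next step, forcing sequential execution.
        soft_emb = dist @ embed_matrix                        # (1, D)
        inputs = torch.cat([inputs, soft_emb.unsqueeze(1)], dim=1)
    return total
```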
- Lack of Data for Training and Testing.
- Our dataset was constructed from CodeQL's public repository and from sampled DARPA challenges.
- Some data did not fit well during tuning.
- The dataset was too small for effective tuning.
- Explore self-supervised learning methods that require no labeled data but call for designing auxiliary tasks suited to vulnerability detection.
[1] Lester, Brian, Rami Al-Rfou, and Noah Constant. "The power of scale for parameter-efficient prompt tuning." arXiv preprint arXiv:2104.08691 (2021).