🪪 Overview | 📐 Structure | 🪄️ Installation | 🔬 Experiments | 📊 Evaluation | 🔗 Citation | 📝 Paper
This repository contains the code for the paper: False Alarm Reduction Method for Weakness Static Analysis Using BERT Model
- Static analysis tools inspect source code and generate diagnostic messages ("warnings") that indicate the location and contextual characteristics of potential security vulnerabilities. Since each static analysis tool differs in the types of vulnerabilities it can detect and its analysis performance, it is common to use multiple tools during software development. However, this approach often produces a large number of alarms, including many false positives.
- In many cases, it is difficult to analyze syntax or semantics accurately across diverse source code written by developers. To address this, we aim to enhance vulnerability detection accuracy using the BERT model, which is based on the Transformer architecture and is capable of capturing sequential and semantic relationships in source code.
- In this research, we propose a system that uses BERT to compute a vulnerability score for each line of code, then uses a decision tree model to classify the reliability of alarms generated by multiple static analysis tools, thereby reducing the false positive rate.
- This approach combines the strengths of multiple static analysis tools with the advantages of deep learning to accurately detect software vulnerabilities. Ultimately, we propose a method that enables comprehensive vulnerability analysis while significantly reducing false positives, leading to cost and time savings during software development and code review processes.
- The architecture of the proposed false positive reduction model using BERT-based static vulnerability analysis is shown below.
BWA (Bug Warning Analyzer): This is a line-level vulnerability analysis model using a BERT-based architecture. It tokenizes and embeds input C/C++ source code and analyzes it by learning vulnerability patterns.
Tools configuration: Multiple static analysis tools are used to detect potential vulnerabilities in the source code.
ACM (Alarm Classification Model): This model takes the line-level vulnerability scores produced by the BWA and the alarms generated by multiple static analysis tools as input, and classifies the alerts using a decision tree model.
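The way the ACM input combines BWA scores with tool alarms can be sketched as follows. This is a minimal illustration, not the repository's actual code: the `Alarm` class, the `build_features` function, the tool names in `TOOLS`, and the default score of `0.0` for lines without a BWA score are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tool names; this README does not list the analyzers used.
TOOLS = ("tool_a", "tool_b", "tool_c")


@dataclass(frozen=True)
class Alarm:
    """One warning reported by one static analysis tool (hypothetical shape)."""
    file: str
    line: int
    tool: str


def build_features(alarms, bwa_scores, tools=TOOLS):
    """Map each alarmed (file, line) to [bwa_score, flag_tool_1, flag_tool_2, ...].

    bwa_scores maps (file, line) to the line-level score produced by the BWA.
    """
    # Group the reporting tools by source location.
    flagged = {}
    for alarm in alarms:
        flagged.setdefault((alarm.file, alarm.line), set()).add(alarm.tool)

    features = {}
    for location, reporting in sorted(flagged.items()):
        # Assumption: a line with no BWA score is treated as score 0.0.
        score = bwa_scores.get(location, 0.0)
        flags = [1.0 if tool in reporting else 0.0 for tool in tools]
        features[location] = [score] + flags
    return features


alarms = [
    Alarm("a.c", 10, "tool_a"),
    Alarm("a.c", 10, "tool_c"),
    Alarm("b.c", 3, "tool_b"),
]
scores = {("a.c", 10): 0.91}
features = build_features(alarms, scores)
```

Each feature row, paired with a label saying whether the alarm was a true or false positive, could then be passed to a decision tree implementation such as scikit-learn's `DecisionTreeClassifier`.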
- This project uses the Juliet test suite, first released in December 2010 by the Center for Assured Software (CAS) of the U.S. National Security Agency (NSA). The suite consists of relatively short code snippets with distinct control flow, data flow, or data structure characteristics. Version 1.3 is used in this project and includes 118 classes of security weaknesses.
- The official C/C++ version includes a total of 64,099 test cases, distributed across the following files:
- C source files: 53,476
- C++ source files: 46,276
- Header files: 4,422
- Total files: 104,174
- After downloading, store the dataset in the dataset folder.
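After downloading, a quick sanity check is to count the files by extension and compare the result with the figures above. The sketch below assumes the usual `.c`, `.cpp`, and `.h` extensions; `count_source_files` is a hypothetical helper, not part of this repository.

```python
from collections import Counter
from pathlib import Path


def count_source_files(root):
    """Count the files under `root`, grouped by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # "dataset" is the folder named in the instructions above.
    counts = count_source_files("dataset")
    for suffix in (".c", ".cpp", ".h"):
        print(suffix, counts[suffix])
```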
- Development environment: Anaconda, Python
- For detailed installation instructions, please refer to the README.md file within the module folder.
- Development environment: Python
- For detailed installation instructions, please refer to the README.md file within the module folder.
- Since this research involves two deep learning models, the Juliet test suite dataset must first be split appropriately. As shown in Table 3-6, 60% or 80% of the dataset is used for training each model, while the remaining 20% is used for validation and testing.
- The following figure summarizes the specifications of the system used to conduct experiments and the primary Python packages utilized.

System Specifications and Major Package Information
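The dataset split described above can be sketched with the standard library alone. This is an illustration under stated assumptions: `split_dataset` is a hypothetical helper, and dividing the held-out 20% evenly between validation and testing is an assumption, since this README does not say how that portion is divided.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle `items` reproducibly and split into train, validation, and test lists.

    Whatever remains after the train and validation portions becomes the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical test-case identifiers standing in for Juliet test cases.
cases = [f"case_{i:03d}" for i in range(100)]
train, val, test = split_dataset(cases)
```

Splitting by test case rather than by file would help keep all files of one test case in the same partition.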
- To evaluate the performance of the proposed models, we measured Precision, Accuracy, F1-Score, and the ROC Curve. For the BWA model, representative evaluation metrics for deep learning were selected, while for the ACM model, commonly used metrics for classification tasks were employed.

Evaluation metrics for each model
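For reference, the threshold-based metrics named above follow directly from the confusion-matrix counts. The sketch below uses the standard definitions for binary labels (1 = real weakness, 0 = false alarm); `classification_metrics` is a hypothetical helper, not the evaluation code of this repository, and the ROC curve is omitted because it needs scores rather than hard predictions.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```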
- The following figure presents the results of applying the proposed models to detect vulnerabilities across different CWE (Common Weakness Enumeration) categories.

Evaluation results by CWE category
If you use this code for your research, please cite the following paper.
Title: False Alarm Reduction Method for Weakness Static Analysis Using BERT Model
Journal: Applied Sciences
DOI: 10.3390/app13063502
@Article{nguyen2023FalseAlarmReduction,
AUTHOR = {Nguyen, Dinh Huong and Seo, Aria and Nnamdi, Nnubia Pascal and Son, Yunsik},
TITLE = {False Alarm Reduction Method for Weakness Static Analysis Using BERT Model},
JOURNAL = {Applied Sciences},
VOLUME = {13},
YEAR = {2023},
NUMBER = {6},
ARTICLE-NUMBER = {3502},
URL = {https://www.mdpi.com/2076-3417/13/6/3502},
ISSN = {2076-3417},
DOI = {10.3390/app13063502}
}