SophieHYe/PreciseBugs

PreciseBugs Dataset and Source Code

@INPROCEEDINGS {10298528,
author = {Y. He and Z. Chen and C. Le Goues},
booktitle = {2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
title = {PreciseBugCollector: Extensible, Executable and Precise Bug-Fix Collection: Solution for Challenge 8: Automating Precise Data Collection for Code Snippets with Bugs, Fixes, Locations, and Types},
year = {2023},
volume = {},
issn = {},
pages = {1899-1910},
abstract = {Bug datasets are vital for enabling deep learning techniques to address software maintenance tasks related to bugs. However, existing bug datasets suffer from precise and scale limitations: they are either small-scale but precise with manual validation or large-scale but imprecise with simple commit message processing. In this paper, we introduce PreciseBugCollector, a precise, multi-language bug collection approach that overcomes these two limitations. PreciseBugCollector is based on two novel components: a) A bug tracker to map the codebase repositories with external bug repositories to trace bug type information, and b) A bug injector to generate project-specific bugs by injecting noise into the correct codebases and then executing them against their test suites to obtain test failure messages. We implement PreciseBugCollector against three sources: 1) A bug tracker that links to the national vulnerability data set (NVD) to collect general-wise vulnerabilities, 2) A bug tracker that links to OSS-Fuzz to collect general-wise bugs, and 3) A bug injector based on 16 injection rules to generate project-wise bugs. To date, PreciseBugCollector comprises 1057818 bugs extracted from 2968 open-source projects. Of these, 12602 bugs are sourced from bug repositories (NVD and OSS-Fuzz), while the remaining 1045216 project-specific bugs are generated by the bug injector. Considering the challenge objectives, we argue that a bug injection approach is highly valuable for the industrial setting, since project-specific bugs align with domain knowledge, share the same codebase, and adhere to the coding style employed in industrial projects.},
keywords = {industries;deep learning;training;location awareness;software maintenance;computer bugs;manuals},
doi = {10.1109/ASE56229.2023.00163},
url = {https://doi.ieeecomputersociety.org/10.1109/ASE56229.2023.00163},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {sep}
}

All data can be found on Zenodo: https://zenodo.org/record/8218280

Folder Structure

├── CVEs: Bugs collected by tracking CVEs. Each BugEntry.json contains the buggy file, the fixed file, the bug location, the bug type, etc.
│
├── OSS-Fuzz: Bugs collected by tracking OSS-Fuzz.
│
└── BugInjection: Bugs generated by injecting noise into correct source code.
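
Loading the Data (sketch)

The snippet below is a minimal sketch of how the entries in a folder such as CVEs might be read once the Zenodo archive is unpacked. It assumes that each bug is stored in a file literally named BugEntry.json somewhere below the folder; the nesting depth and the keys inside each JSON file are not documented here, so the code makes no assumption about them and simply returns the parsed content. The function name iter_bug_entries is hypothetical and not part of this repository.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_bug_entries(root: Path) -> Iterator[Tuple[Path, Any]]:
    """Yield (file path, parsed JSON) for every BugEntry.json below root.

    Assumption: entries are files named exactly "BugEntry.json".
    The JSON schema is not assumed; the parsed value is returned as-is.
    """
    for entry_path in sorted(root.rglob("BugEntry.json")):
        with entry_path.open(encoding="utf-8") as fh:
            yield entry_path, json.load(fh)
```

For example, `for path, entry in iter_bug_entries(Path("CVEs")): print(path)` would list every entry file found under a local CVEs folder. Inspecting one parsed entry is a reasonable first step for finding out which keys hold the buggy file, the fixed file, the location, and the type.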
