Replication Package for "The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization" (Under Review)

Abstract: IR-based bug localization techniques assist developers in locating buggy source code entities based on the content of a bug report. Prior bug localization research makes extensive use of Information Retrieval (IR) classifiers. However, such IR-based classifiers have various parameters that can be configured differently (e.g., the choice of entity representation). Recent research has shown that the choice of a classifier configuration impacts the performance of classifiers that are used to locate bugs at the file level. Moreover, our recent work has shown that locating buggy source code entities at the method level requires less effort than at the file level. However, little is known about the impact that the choice of a classifier configuration has on classifiers that are used to locate bugs at the method level. In this paper, we investigate the impact that the choice of the IR-based classifier configuration has on the top-k performance and on the effort required to examine source code entities before locating a bug at the method level. We also analyze how sensitive the classifiers are to changes in parameter values. In total, we explore a large space of 3,172 classifier configurations.
Through a case study of 5,266 bug reports from two software systems (i.e., Eclipse and Mozilla), we find that (1) depending on the choice of classifier configuration, the top-k performance varies from 0.44% to 36% and the required effort varies from 4,395 to 50,000 LOC, suggesting that using inappropriate configurations could result in poor top-k performance and wasted effort; (2) classifier configurations with similar top-k performance might require different amounts of effort, suggesting that practitioners should take the effort required to locate bugs into consideration when comparing classifier configurations using top-k metrics; (3) VSM achieves both the best top-k performance and the least required effort for method-level bug localization; (4) the likelihood of randomly picking a configuration that performs within 20% of the best top-k classifier configuration is on average 5.4%, and that of picking one within 20% of the least-effort configuration is on average 1%, suggesting that finding the best configuration is difficult; and (5) configurations related to the entity representation of the analyzed data have the most impact on both the top-k performance and the required effort, suggesting that practitioners would benefit from guidance on which configuration parameters matter the most.
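
To make the two measures in the abstract concrete, the following is a minimal Python sketch, not the code shipped in this package. It assumes that top-k performance is the percentage of bug reports with at least one buggy method among the first k ranked methods, that effort is the cumulative LOC read up to and including the first buggy method, and that "within 20% of the best" means a score of at least 80% of the best score. The function names, the toy ranked lists, and the LOC values are all hypothetical.

```python
from typing import Dict, List, Set


def top_k_performance(results: List[dict], k: int) -> float:
    """Percentage of bug reports with at least one buggy method in the top k.

    Assumed definition; each result holds a ranked list and a set of buggy methods.
    """
    hits = sum(
        any(method in r["buggy"] for method in r["ranked"][:k]) for r in results
    )
    return 100.0 * hits / len(results)


def effort_loc(ranked: List[str], buggy: Set[str], loc: Dict[str, int]) -> int:
    """Cumulative LOC read up to and including the first buggy method (assumed)."""
    total = 0
    for method in ranked:
        total += loc[method]
        if method in buggy:
            return total
    return total  # no buggy method was ranked, so the whole list was read


def share_within(scores: List[float], tolerance: float = 0.2) -> float:
    """Fraction of configurations scoring at least (1 - tolerance) * best."""
    best = max(scores)
    return sum(s >= (1 - tolerance) * best for s in scores) / len(scores)


# Hypothetical toy data: four methods and three bug reports.
LOC = {"a": 10, "b": 40, "c": 5, "d": 25}
RESULTS = [
    {"ranked": ["a", "b", "c", "d"], "buggy": {"b"}},
    {"ranked": ["c", "d", "a", "b"], "buggy": {"c"}},
    {"ranked": ["d", "a", "b", "c"], "buggy": {"c"}},
]

if __name__ == "__main__":
    print("top-1 performance (%):", top_k_performance(RESULTS, 1))
    print("top-2 performance (%):", top_k_performance(RESULTS, 2))
    print("effort (LOC):", [effort_loc(r["ranked"], r["buggy"], LOC) for r in RESULTS])
    print("share within 20% of best:", share_within([36.0, 30.0, 10.0, 0.44]))
```

In the toy data, the first and third reports both miss at top-1, yet they differ considerably in effort. This is the kind of gap finding (2) describes.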

Data and Scripts

You can download the latest version here
