AmadeusBugProject/PredictiveRerankingUsingCodeSmellsForIRFL

Predictive Reranking using Code Smells for Information Retrieval Fault Localization

This repository contains the source code and results for the paper "Predictive Reranking using Code Smells for Information Retrieval Fault Localization" by Thomas Hirsch and Birgit Hofer, 2023.

Preliminaries

This repository contains the Python code of our experiments and all results used in our paper. The underlying dataset and intermediate files are available on Zenodo, or can be created from scratch by importing Bench4BL (see details below).

Python environment

  • python=3.8
  • pandas
  • numpy
  • joblib
  • scikit-learn==1.0.2
  • keras
  • tensorflow
  • nltk
  • sentence-transformers
  • matplotlib
  • seaborn

The Conda environment file, conda_from_history.yml, is located in the root directory of the repository.
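As a quick sanity check after setting up the environment, the package list above can be compared against what is actually installed. The following is a minimal sketch and not part of the repository; the helper names `find_problems` and `installed_versions` are made up for illustration, and the only pins checked are the two stated above (Python 3.8 and scikit-learn 1.0.2).

```python
import sys
from importlib import metadata

# Pins taken from the package list above; only scikit-learn carries a version pin.
PINNED = {"scikit-learn": "1.0.2"}
UNPINNED = ["pandas", "numpy", "joblib", "keras", "tensorflow", "nltk",
            "sentence-transformers", "matplotlib", "seaborn"]


def find_problems(installed, pinned=PINNED, unpinned=UNPINNED):
    """Return human-readable problems for a {package: version} mapping."""
    problems = []
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name} is missing (expected {wanted})")
        elif found != wanted:
            problems.append(f"{name} is {found} (expected {wanted})")
    for name in unpinned:
        if name not in installed:
            problems.append(f"{name} is missing")
    return problems


def installed_versions(names):
    """Look up installed distribution versions; packages not found are omitted."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return versions


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"Python is {sys.version_info.major}.{sys.version_info.minor} (expected 3.8)")
    for problem in find_problems(installed_versions(list(PINNED) + UNPINNED)):
        print(problem)
```

Running the script inside the activated environment prints one line per mismatch and nothing if everything matches.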

Bench4BL

The Bench4BL dataset was used in our evaluation. All data necessary for our machine learning and localization experiments is available on Zenodo.

To re-import and recalculate the data directly from Bench4BL instead, download Bench4BL and set the path to the benchmark root in paths.py. Then run BugLocator, BRTracer, and BLIA on the Bench4BL dataset using the scripts provided by the benchmark. Finally, install PMD version 6.45.0 and set the path to PMD in paths.py.
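Before starting a full re-import, it can help to confirm that the configured locations exist. The sketch below is illustrative only: the variable names `BENCH4BL_ROOT` and `PMD_ROOT` and the placeholder paths are assumptions, and the actual names used in paths.py may differ.

```python
from pathlib import Path

# Hypothetical names and placeholder locations; check paths.py for the real ones.
BENCH4BL_ROOT = Path("/path/to/Bench4BL")
PMD_ROOT = Path("/path/to/pmd")


def missing_paths(paths):
    """Return the names of all configured paths that do not exist on disk."""
    return [name for name, path in paths.items() if not Path(path).exists()]


if __name__ == "__main__":
    configured = {"BENCH4BL_ROOT": BENCH4BL_ROOT, "PMD_ROOT": PMD_ROOT}
    for name in missing_paths(configured):
        print(f"{name} does not exist: {configured[name]}")
```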

Structure of this repository

General utility functions

Experiment

Dataset setup and preparation

The following scripts create, import, and set up the data used in our experiments. The resulting data is already part of this repository, so these scripts only need to be run when the data is to be re-imported from the Bench4BL repository.

Preliminary experiments and dataset splitting

The following scripts perform dataset splitting and the preliminary experiments used for feature selection, as discussed in Section VI of the paper.

Localization experiments

The following scripts perform our actual localization experiments. These scripts are applied to bootstrapped dataset splits. Results for the full Bench4BL dataset are stored in p_FINAL_Bench4BL; results for the single-project experiments are stored in p_FINAL_CAMEL, p_FINAL_HBASE, and p_FINAL_ROO. For the detailed experiment setup, we refer to our paper.
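For readers unfamiliar with bootstrapped splits, the sketch below shows one common generic form: indices are sampled with replacement for training, and the items never drawn (out-of-bag) form the test set. This is an assumption for illustration only; the splitting procedure actually used in the experiments is defined by the scripts and the paper, and the function name `bootstrap_split` is hypothetical.

```python
import numpy as np


def bootstrap_split(n_items, seed):
    """Draw a bootstrap sample of indices; out-of-bag indices form the test set."""
    rng = np.random.default_rng(seed)
    train = rng.integers(0, n_items, size=n_items)  # sampling with replacement
    test = np.setdiff1d(np.arange(n_items), train)  # items never drawn
    return train, test


if __name__ == "__main__":
    # One split per iteration, each with its own seed for reproducibility.
    for iteration in range(3):
        train, test = bootstrap_split(1000, seed=iteration)
        print(iteration, len(np.unique(train)), len(test))
```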

Result collection and evaluation

The following scripts calculate final scores and statistics from the 20 bootstrap iterations of the previous block of scripts.
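As an illustration of this kind of aggregation, per-iteration scores can be condensed into a mean, a standard deviation, and a percentile interval. This is a minimal sketch, not the repository's evaluation code: the function name `summarize` and the example scores are made up, and the statistics reported in the paper may differ.

```python
import numpy as np


def summarize(scores):
    """Mean, sample standard deviation, and 2.5/97.5 percentiles of per-iteration scores."""
    values = np.asarray(scores, dtype=float)
    low, high = np.percentile(values, [2.5, 97.5])
    return {"mean": values.mean(), "std": values.std(ddof=1), "low": low, "high": high}


if __name__ == "__main__":
    # Made-up scores standing in for one metric over 20 bootstrap iterations.
    rng = np.random.default_rng(0)
    example_scores = rng.uniform(0.3, 0.5, size=20)
    print(summarize(example_scores))
```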

Further analysis

The following scripts collect statistics and results to create the LaTeX tables and additional analyses used in our paper.

Results

Results for preliminary experiments for feature selection:

Results of our localization experiments:

Licence

All code and results are licensed under AGPL v3; see the LICENSE file. Other licences may apply to some tools and datasets contained in this repository: cloc-1.92.pl is under GPL v2, and data originating from Bench4BL is under CCA 4.0.
