ApacheBugLearning

ApacheBugLearning is a Java project, organized into Maven modules, that retrieves all commits of any Java project from the Apache Software Foundation, links them to the related Jira tickets, and uses machine learning to predict which classes are affected by bugs.

git Module

The git module uses the Eclipse JGit project to retrieve all commits, the related releases, and the files touched by each commit of each release.

The source repository can either already exist locally, in which case it is identified by an absolute path, or be cloned automatically; the choice is made in the .properties configuration file. The projects that can be cloned automatically are: TAJO, STORM, SYNCOPE, ZOOKEEPER, OPENJPA, AVRO and BOOKKEEPER.

While the repository is read, several metrics are calculated for each file and written to a .csv file. The metrics used are:

Metric	Description
`LOC`	Number of lines of code
`LOC ADDED`	Sum of LOCs added
`LOC MAX ADDED`	Maximum LOCs added
`LOC TOUCHED`	Sum of LOCs added and LOCs removed
`NUMBER OF REVISION`	Number of revisions of the class
`AVERAGE LOC ADDED`	Average number of LOCs added
`NUMBER OF AUTHORS`	Number of distinct authors who changed the class
`CHURN`	|LOC added - LOC removed|
`MAX CHURN`	Maximum CHURN
`AVERAGE CHURN`	Average CHURN
`NPM`	Number of public methods in the class
`NPVM`	Number of private methods in the class
`NSM`	Number of static methods in the class
`NAM`	Number of methods in the class
`NLOCM`	Number of commented lines
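
As a rough illustration of how the change-based metrics in the table relate to one another, the sketch below derives them from per-revision line counts. The names `ChangeMetrics` and `Revision` are hypothetical and do not come from the project; churn follows the absolute-value definition given in the table.

```java
import java.util.List;

/** Hypothetical helper: derives change metrics from per-revision line counts. */
final class ChangeMetrics {

    /** Lines added and removed by one revision of a file (illustrative type). */
    static final class Revision {
        final int added;
        final int removed;

        Revision(int added, int removed) {
            this.added = added;
            this.removed = removed;
        }

        /** CHURN as defined in the table: |LOC added - LOC removed|. */
        int churn() {
            return Math.abs(added - removed);
        }
    }

    /** LOC ADDED: sum of added lines over all revisions. */
    static int locAdded(List<Revision> revs) {
        return revs.stream().mapToInt(r -> r.added).sum();
    }

    /** LOC MAX ADDED: largest number of lines added by a single revision. */
    static int maxLocAdded(List<Revision> revs) {
        return revs.stream().mapToInt(r -> r.added).max().orElse(0);
    }

    /** LOC TOUCHED: added plus removed lines, summed over all revisions. */
    static int locTouched(List<Revision> revs) {
        return revs.stream().mapToInt(r -> r.added + r.removed).sum();
    }

    /** MAX CHURN: largest churn of a single revision. */
    static int maxChurn(List<Revision> revs) {
        return revs.stream().mapToInt(Revision::churn).max().orElse(0);
    }

    /** AVERAGE CHURN: mean churn per revision (0 when there are no revisions). */
    static double averageChurn(List<Revision> revs) {
        return revs.stream().mapToInt(Revision::churn).average().orElse(0.0);
    }
}
```

`NUMBER OF REVISION` is simply the size of the revision list, and `AVERAGE LOC ADDED` is `locAdded` divided by that size.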

Project test files are discarded by default; to include them, set the `get_test_classes` flag in the configuration file to 'true'.

jira Module

The jira module manages the data retrieved by the git module. In particular, it associates each commit (an object of type RevCommit) with the related Jira ticket, associates commits and calculated metrics with the related files, and associates files with their reference release.
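
The README does not say how a commit is matched to its ticket. One common approach, shown here purely as an assumption, is to look for the ticket key (for example `AVRO-123`) in the commit message. `TicketKeyMatcher` is a hypothetical name, and the sketch works on plain message strings rather than on RevCommit objects.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts Jira ticket keys mentioned in a commit message. */
final class TicketKeyMatcher {

    /**
     * Returns the keys of the given project found in the message, in order of
     * appearance and normalized to upper case (e.g. "avro-45" becomes "AVRO-45").
     */
    static Set<String> ticketKeys(String projectName, String commitMessage) {
        String project = projectName.toUpperCase(Locale.ROOT);
        // Word boundaries avoid matching keys of other projects such as "XAVRO-9".
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(project) + "-(\\d+)\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(commitMessage);
        Set<String> keys = new LinkedHashSet<>();
        while (matcher.find()) {
            keys.add(project + "-" + matcher.group(1));
        }
        return keys;
    }
}
```

A commit whose message yields no key would stay unlinked under this scheme; whether the project handles such commits differently is not stated.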

Ticket information is retrieved in groups of 1000 tickets, via the Jira REST API, using the query:

"https://issues.apache.org/jira/rest/api/2/search?jql=project%20%3D%20" + projectName +
"%20AND%20issuetype%20%3D%20Bug%20AND%20(%22status%22%20%3D%22resolved%22%20OR%20%22status" +
"%22%20%3D%20%22closed%22)%20AND%20%20%22resolution%22%20%3D%20%22fixed%22%20" + "%20ORDER%20BY%20key%20ASC" +
"&fields=key,resolutiondate,versions,created,fixVersions&startAt=" + i + "&maxResults=" + j

All tickets that refer to bugs and that were closed as a result of a bug fix are considered. The variables i and j are used to retrieve tickets in groups of 1000 items (the maximum number of elements returned by a single Jira REST API call).
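
The paging described above can be sketched as follows. The class and method names are hypothetical; the JQL text and field list mirror the query shown earlier, and the page offsets assume the total number of matching tickets is known (the search response reports it in its `total` field). No HTTP call is made here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds paged Jira search URLs for fixed bug tickets. */
final class JiraSearchPaging {

    static final String BASE = "https://issues.apache.org/jira/rest/api/2/search";

    /** Builds the search URL for one page of results. */
    static String searchUrl(String projectName, int startAt, int maxResults) {
        String jql = "project = " + projectName
                + " AND issuetype = Bug"
                + " AND (\"status\" = \"resolved\" OR \"status\" = \"closed\")"
                + " AND \"resolution\" = \"fixed\""
                + " ORDER BY key ASC";
        // URLEncoder emits '+' for spaces; use %20 to match the original query.
        String encoded = URLEncoder.encode(jql, StandardCharsets.UTF_8).replace("+", "%20");
        return BASE + "?jql=" + encoded
                + "&fields=key,resolutiondate,versions,created,fixVersions"
                + "&startAt=" + startAt + "&maxResults=" + maxResults;
    }

    /** Returns the startAt offsets needed to cover `total` tickets. */
    static List<Integer> pageStarts(int total, int pageSize) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < total; i += pageSize) {
            starts.add(i);
        }
        return starts;
    }
}
```

For 2500 matching tickets and a page size of 1000, `pageStarts` yields the offsets 0, 1000 and 2000, so three requests are needed.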

A separate entity has been created for each of these items:

commit → Bug Entity

release → Release Entity

file → RepoFile Entity

repository → Repo Entity

Inconsistent tickets are discarded. If a ticket has no injected version, which is needed for class labeling, it is estimated using proportion techniques (link paper). Below a threshold set in the configuration file, Proportion Cold Start is used; otherwise, Proportion Increment is used.
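
A minimal sketch of the proportion estimate, under stated assumptions: releases are numbered from 1, IV, OV and FV are the indices of the injected, opening and fixed versions, and the formula is the commonly cited P = (FV - IV) / (FV - OV). Treating FV = OV as a distance of 1 and clamping the result between 1 and OV are illustrative choices, not confirmed by the README. The `roundUp` flag plays the role of the `aproximate` option.

```java
/** Hypothetical helper: estimates a missing injected version via proportion. */
final class Proportion {

    /** P for one ticket whose injected version is known. */
    static double proportion(int iv, int ov, int fv) {
        // Assumption: when FV == OV the distance is treated as 1 to avoid dividing by zero.
        return (double) (fv - iv) / Math.max(1, fv - ov);
    }

    /** Estimates IV = FV - (FV - OV) * P, rounded up or down, kept within [1, OV]. */
    static int injectedVersion(int ov, int fv, double p, boolean roundUp) {
        double estimate = fv - Math.max(1, fv - ov) * p;
        int rounded = (int) (roundUp ? Math.ceil(estimate) : Math.floor(estimate));
        return Math.max(1, Math.min(rounded, ov));
    }

    /** Cold Start is used while fewer items than the threshold are available. */
    static boolean useColdStart(int available, int threshold) {
        return available < threshold;
    }
}
```

With OV = 5, FV = 7 and P = 1.25 the raw estimate is 4.5, so the result is 5 when rounding up and 4 when rounding down.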

dataset Module

The dataset module predicts the bugginess of classes, using the training set built by the previous modules and WEKA as the machine learning tool.

Prediction used combinations of different techniques and classifiers. The classifiers were Random Forest, Naive Bayes and IBk, combined with Best First feature selection (Backward Search and Forward Search), sampling techniques (Oversampling, Undersampling and SMOTE), and cost-sensitive techniques (Sensitive Threshold and Sensitive Learning).

For the evaluation of the classifiers, the Walk Forward technique was used: the data were divided into k releases, ordered chronologically, and one run was performed per release. In the k-th run, the k-th release was used as the testing set and all previous releases as the training set.
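
The Walk Forward scheme can be sketched as below. `WalkForward` and `Split` are hypothetical names and releases are represented by their labels only. The sketch starts from the second release, on the assumption that the first one has no earlier data to train on; the project may treat the first run differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds walk-forward training/testing splits. */
final class WalkForward {

    /** One run: all earlier releases for training, one release for testing. */
    static final class Split {
        final List<String> training;
        final String testing;

        Split(List<String> training, String testing) {
            this.training = training;
            this.testing = testing;
        }
    }

    /** Releases must already be in chronological order. */
    static List<Split> splits(List<String> releases) {
        List<Split> runs = new ArrayList<>();
        for (int k = 1; k < releases.size(); k++) {
            // Copy the prefix so each split owns its training list.
            List<String> training = new ArrayList<>(releases.subList(0, k));
            runs.add(new Split(training, releases.get(k)));
        }
        return runs;
    }
}
```

Because training data always precedes the testing release, no information from the future leaks into a run.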

For each classifier, a .csv file is generated containing, for each combination of techniques, the following prediction results:

Metric	Description
`PRECISION`	Fraction of instances classified as positive that are actually positive
`RECALL`	Fraction of actual positives that were classified as positive
`AUC`	Area under the ROC curve
`KAPPA`	How much more accurate the classifier is than a dummy (chance) classifier
`ACCURACY`	Percentage of correct predictions out of all predictions
`TRUE-NEGATIVE`	Predicted negative and actually negative
`TRUE-POSITIVE`	Predicted positive and actually positive
`FALSE-POSITIVE`	Predicted positive and actually negative
`FALSE-NEGATIVE`	Predicted negative and actually positive
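
To make the definitions in the table concrete, the sketch below computes precision, recall, accuracy and Cohen's kappa from the four confusion-matrix counts. In the project these values come from WEKA; `ConfusionMetrics` is a hypothetical class for illustration only, and returning 0 when a denominator is zero is an arbitrary convention.

```java
/** Hypothetical helper: prediction metrics from confusion-matrix counts. */
final class ConfusionMetrics {
    final long tp;
    final long fp;
    final long tn;
    final long fn;

    ConfusionMetrics(long tp, long fp, long tn, long fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    long total() {
        return tp + fp + tn + fn;
    }

    /** TP / (TP + FP). */
    double precision() {
        return ratio(tp, tp + fp);
    }

    /** TP / (TP + FN). */
    double recall() {
        return ratio(tp, tp + fn);
    }

    /** (TP + TN) / total. */
    double accuracy() {
        return ratio(tp + tn, total());
    }

    /** Cohen's kappa: (observed - expected agreement) / (1 - expected agreement). */
    double kappa() {
        double n = total();
        if (n == 0) {
            return 0.0;
        }
        double observed = (tp + tn) / n;
        double expected = ((double) (tp + fp) * (tp + fn)
                + (double) (tn + fn) * (tn + fp)) / (n * n);
        return expected == 1.0 ? 0.0 : (observed - expected) / (1.0 - expected);
    }

    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }
}
```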

Configuration File

The configuration.properties file is used for configuration. The parameters are:


Parameter	Description
`project`	Name of the project to be analyzed
`use_local`	Path of the cloned project locally
`project_path`	Path of the cloned project locally
`coldstart_project`	Set to 'true' to use an Apache repository already cloned locally, or to 'false' to clone the repository automatically
`threshold`	Minimum number of commits required to use Proportion Increment; below it, Cold Start is used
`aproximate`	If set to 'true', the Injected Version calculated via proportion is rounded up; otherwise it is rounded down
`get_test_classes`	If set to 'true', the project test files are included
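
As an illustration of the file format, the sketch below parses a sample configuration with `java.util.Properties`. The values and the path are hypothetical, and only keys whose meaning is unambiguous in the table are shown. How the project itself reads the file is not stated in the README.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical example: parsing a sample configuration.properties. */
final class ConfigExample {

    /** Sample content; every value here is made up for illustration. */
    static final String SAMPLE = String.join("\n",
            "project=AVRO",
            "project_path=/home/user/repos/avro",
            "threshold=10",
            "aproximate=true",
            "get_test_classes=false");

    /** Parses properties text; an in-memory reader should never fail. */
    static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Reads a boolean flag, treating a missing key as false. */
    static boolean flag(Properties properties, String key) {
        return Boolean.parseBoolean(properties.getProperty(key, "false"));
    }

    /** Reads the threshold as an integer. */
    static int threshold(Properties properties) {
        return Integer.parseInt(properties.getProperty("threshold").trim());
    }
}
```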

Project Vulnerabilities

The project was analyzed on the SonarCloud platform (🔗 analysis), which reported 0 code smells, 0 bugs and 0 vulnerabilities.

Presentation 🇮🇹

A presentation of the project is available in Italian.
