ChristianBirchler/ticket-tagger-analysis

Machine Learning Model Evaluations for GitHub Issue Classification

Collaborators

Goal of this Repository

Issue labeling on GitHub is usually done manually by developers. To automate this process, a tool named Ticket Tagger was developed. Ticket Tagger is a machine learning-driven issue classification bot that classifies GitHub issues with a fastText classifier. It was written by Rafael Kallis in the scope of a project similar to this one. Once installed in a GitHub repository, Ticket Tagger classifies incoming issues automatically. Small repositories may not gain much value from it, but larger ones do, since they receive more issues per unit of time.

The paper by R. Kallis et al. (2019)

The main limitations of Ticket Tagger

  • It uses only the fastText classifier developed by Facebook
  • The evaluation was done on issues from different repositories
  • No preprocessing of the data was done
  • The data sets used for training come from many different repositories, fetched via the GitHub Archive on Google BigQuery
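To make the preprocessing limitation concrete, the following is a minimal sketch of what cleaning an issue before training could look like. The cleaning steps (dropping fenced code blocks and URLs, lowercasing, stripping punctuation), the function names `preprocess` and `to_fasttext_line`, and the example label `bug` are illustrative assumptions, not the methods used in this repository. The `__label__` prefix is the default label marker of fastText's supervised input format.

```python
import re

# Hypothetical cleaning rules; the repository's actual preprocessing may differ.
CODE_FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
URL = re.compile(r"https?://\S+")                   # http(s) links
NON_WORD = re.compile(r"[^a-z0-9\s]")               # punctuation and symbols


def preprocess(text: str) -> str:
    """Return a lowercased, whitespace-normalized version of an issue text."""
    text = CODE_FENCE.sub(" ", text)
    text = URL.sub(" ", text)
    text = text.lower()
    text = NON_WORD.sub(" ", text)
    return " ".join(text.split())


def to_fasttext_line(label: str, title: str, body: str) -> str:
    """Format one issue as a line of fastText supervised training data."""
    return f"__label__{label} {preprocess(title + ' ' + body)}"
```

Writing one such line per issue to a text file would give a training file in the shape fastText expects. Which cleaning steps actually help is an empirical question.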

The main problems or questions we address in this repository

Can we increase the classification performance with different classifiers?

What changes in the data have an impact on the classifications?

Does training on a single repository increase prediction performance? (We use the Pandas repository in our case.)

This repository contains all data, scripts and evaluations to explore those problems and questions.

Extension Points

Extend the original data set with another balanced real-world data set

Extend the original ML pipeline

Summary of Findings

  • Preprocessing affects performance considerably, but it introduces variance depending on the method used
  • Using the Pandas repository for training and testing leads to higher performance when cross-validating
  • fastText may not produce the best-performing model for issue classification
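As an illustration of how different classifiers could be compared under cross-validation, here is a minimal, standard-library-only sketch. The helper names (`stratified_folds`, `cross_validate`, `majority_baseline`), the use of plain accuracy as the metric, and the majority-class baseline are assumptions made for this example. They are not the evaluation code or the metrics used in this repository.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Sequence

Predictor = Callable[[str], str]
Trainer = Callable[[List[str], List[str]], Predictor]


def stratified_folds(labels: Sequence[str], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, spreading each label evenly."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validate(texts: Sequence[str], labels: Sequence[str],
                   train: Trainer, k: int = 5, seed: int = 0) -> float:
    """Return mean accuracy over k folds (assumes every fold is non-empty)."""
    scores = []
    for held_out in stratified_folds(labels, k, seed):
        test_idx = set(held_out)
        train_x = [t for i, t in enumerate(texts) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        predict = train(train_x, train_y)
        correct = sum(predict(texts[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def majority_baseline(train_x: List[str], train_y: List[str]) -> Predictor:
    """Toy trainer: always predict the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda text: most_common
```

Any classifier can be plugged in by wrapping it in a function with the same shape as `majority_baseline`. That keeps the splits identical across models, so score differences reflect the model and not the data split.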

A more detailed description and discussion can be found in the results folder.

License

This repository contains derivative work of Ticket Tagger, which is published under the GPL-3 license. This repository is also published under the GPL-3 license.

You are required to cite this repository if you use any of our work:

Moser T., Steiger D., Birchler C., Fried L., Panichella S., Kallis R., 2020. Machine Learning Model Evaluations for GitHub Issue Classification, https://github.com/ChristianBirchler/ticket-tagger-analysis
Machine Learning Model Evaluations for GitHub Issue Classification
Copyright (C) 2020 Tim Moser, David Steiger, Christian Birchler, Lara Fried, Sebastiano Panichella, Rafael Kallis

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

References

A Project in the Context of the University of Zurich Course Software Maintenance & Evolution
