GitHub Issues Labeler

ML.NET version: v1.3.1
API type: Dynamic API
Status: Up-to-date
App type: Console app
Data sources: .csv file and GitHub issues
Scenario: Issues classification
ML task: Multi-class classification
Algorithms: SDCA multi-class classifier, AveragedPerceptronTrainer

This is a simple prototype application that demonstrates how to use the ML.NET APIs. The main focus is on creating, training, and using an ML (machine learning) model, which is implemented in the Predictor.cs class.

Overview

GitHubLabeler is a .NET Core console application that:

  • trains an ML model on your labeled GitHub issues so that it learns which label to assign to a new issue. (As an example, you can use the corefx-issues-train.tsv file, which contains issues from the public corefx repository.)
  • labels new issues. The application gets all unlabeled open issues from the GitHub repository specified in the appsettings.json file and labels them using the ML model trained in the previous step.

The model uses a multi-class classification algorithm from ML.NET: the SDCA trainer (SdcaMultiClassTrainer in early ML.NET releases, exposed as SdcaMaximumEntropy in the 1.x API).

Enter your GitHub configuration data

  1. Provide your GitHub data in the appsettings.json file:

    To allow the app to label issues in your GitHub repository, you need to provide the following data in the appsettings.json file.

        {
          "GitHubToken": "YOUR-GUID-GITHUB-TOKEN",
          "GitHubRepoOwner": "YOUR-REPO-USER-OWNER-OR-ORGANIZATION",
          "GitHubRepoName": "YOUR-REPO-SINGLE-NAME"
        }

    The user account that owns the token (GitHubToken) must have write access to the repository (GitHubRepoName).

    See GitHub's documentation for how to create a GitHub token.

    GitHubRepoOwner can be a GitHub user ID (e.g. "MyUser") or a GitHub organization (e.g. "dotnet").

  2. Provide a training file

    a. You can use the existing corefx-issues-train.tsv data file to experiment with the program. In this case the predicted labels are chosen from the labels of the corefx repository. No changes are required.

    b. To work with labels from your own GitHub repository, you need to train the model on your data. To do so, export the GitHub issues from your repository to a .tsv file with the following columns:

    • ID - issue's ID
    • Area - issue's label (named this way to avoid confusion with the Label concept in ML.NET)
    • Title - issue's title
    • Description - issue's description

    Then add the file to the Data folder and update the DataSetLocation field to match your file's name:

private static string DataSetLocation = $"{BaseDatasetsLocation}/corefx-issues-train.tsv";
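Before training on an exported file, it can help to check that the file has the columns listed above. The following is a minimal sketch, not part of the sample: the TrainingFileCheck class and its methods are hypothetical names, and it assumes the first line of the .tsv file is a tab-separated header row.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the GitHubLabeler sample.
public static class TrainingFileCheck
{
    // Column names taken from the list above.
    public static readonly string[] RequiredColumns = { "ID", "Area", "Title", "Description" };

    // Returns the required columns that are absent from a tab-separated header line.
    public static IReadOnlyList<string> MissingColumns(string headerLine)
    {
        var present = new HashSet<string>(
            headerLine.Split('\t').Select(c => c.Trim()),
            StringComparer.OrdinalIgnoreCase);
        return RequiredColumns.Where(c => !present.Contains(c)).ToList();
    }

    // Reads only the first line of the file (assumed to be the header) and checks it.
    public static IReadOnlyList<string> MissingColumnsInFile(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return MissingColumns(reader.ReadLine() ?? string.Empty);
        }
    }
}
```

Calling TrainingFileCheck.MissingColumnsInFile with the path of your file returns an empty list when all four columns are present.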

Training

Training is the process of running an ML model through known examples (in our case, issues with labels) so that it learns how to label new issues. In this sample, training is done by calling this method from the console app:

BuildAndTrainModel(DataSetLocation, ModelFilePathName);

After the training is completed, the model is saved as a .zip file in MLModels\GitHubLabelerModel.zip.
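To make the training step more concrete, here is a rough sketch of what a training method of this kind might look like with the ML.NET 1.x API. It is an illustration under stated assumptions, not the sample's actual code: the TrainingSketch class, the TrainAndSave method, the column order in GitHubIssue, and the intermediate column names are all assumptions, and the code needs the Microsoft.ML NuGet package.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Input schema; the column order (ID, Area, Title, Description) is an assumption
// based on the column list in this README.
public class GitHubIssue
{
    [LoadColumn(0)] public string ID { get; set; }
    [LoadColumn(1)] public string Area { get; set; }
    [LoadColumn(2)] public string Title { get; set; }
    [LoadColumn(3)] public string Description { get; set; }
}

// Hypothetical class and method names, not the sample's own.
public static class TrainingSketch
{
    public static void TrainAndSave(string dataSetLocation, string modelPath)
    {
        var mlContext = new MLContext(seed: 1);

        // Load the tab-separated training file (assumes a header row).
        IDataView trainingData = mlContext.Data.LoadFromTextFile<GitHubIssue>(
            dataSetLocation, separatorChar: '\t', hasHeader: true);

        var pipeline = mlContext.Transforms.Conversion
            // Map the Area text to a key column named "Label".
            .MapValueToKey(outputColumnName: "Label", inputColumnName: nameof(GitHubIssue.Area))
            // Turn the title and description text into numeric feature vectors.
            .Append(mlContext.Transforms.Text.FeaturizeText("TitleFeaturized", nameof(GitHubIssue.Title)))
            .Append(mlContext.Transforms.Text.FeaturizeText("DescriptionFeaturized", nameof(GitHubIssue.Description)))
            .Append(mlContext.Transforms.Concatenate("Features", "TitleFeaturized", "DescriptionFeaturized"))
            // SDCA multi-class trainer.
            .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
            // Convert the predicted key back to the original label text.
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        ITransformer model = pipeline.Fit(trainingData);

        // Persist the trained model as a .zip file.
        mlContext.Model.Save(model, trainingData.Schema, modelPath);
    }
}
```

The final MapKeyToValue step is what lets a prediction expose the label as readable text rather than as an internal key index.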

Labeling

Once the model is trained, it can be used to predict the label of a new issue.

For a single test/demo without connecting to a real GitHub repo, call this method from the console app:

TestSingleLabelPrediction(ModelFilePathName);

To label the real issues of a GitHub repo, call this method from the console app instead:

await PredictLabelsAndUpdateGitHub(ModelFilePathName);

For testing convenience, when reading issues from your GitHub repo the app loads only unlabeled issues that were created in the past 10 minutes. This time window is set by the following value:

Since = DateTime.Now.AddMinutes(-10)

You can change this value to widen or narrow the window. After predicting the label, the program updates the issue on your GitHub repo with the predicted label.
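The selection rule described above (unlabeled issues created within a recent time window) can be sketched in memory as follows. This README does not show how the sample applies the Since value when querying GitHub, so the IssueSummary type and the IssueFilter helper below are hypothetical and only illustrate the rule itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of an issue; not a type from the sample.
public class IssueSummary
{
    public int Number { get; set; }
    public DateTimeOffset CreatedAt { get; set; }
    public List<string> Labels { get; set; } = new List<string>();
}

public static class IssueFilter
{
    // Keeps issues that have no labels and were created within `window` before `now`.
    public static List<IssueSummary> SelectForLabeling(
        IEnumerable<IssueSummary> issues, DateTimeOffset now, TimeSpan window)
    {
        DateTimeOffset since = now - window;
        return issues
            .Where(i => i.CreatedAt >= since && i.Labels.Count == 0)
            .ToList();
    }
}
```

Passing TimeSpan.FromMinutes(10) as the window mirrors the default of 10 minutes; a larger window picks up older unlabeled issues as well.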
