Multiclass Classification Samples Update #3322

Merged
artidoro merged 3 commits into dotnet:master from artidoro:multiclassamples on Apr 16, 2019

artidoro commented Apr 12, 2019

Tracked under #2522

This PR adds samples for LbfgsMaximumEntropy and SdcaNonCalibrated trainers.

This PR also removes the dependency on Samples Utils from the other multiclass classification samples and adds .tt files for all multiclass classification samples.

Notice that this PR does not take care of Naive Bayes as it is in progress in #3246.

artidoro requested review from wschin, shmoradims, ganik and zeahmed Apr 12, 2019

artidoro self-assigned this Apr 12, 2019

artidoro force-pushed the artidoro:multiclassamples branch from 40725ca to 2bf1485 Apr 13, 2019

// Expected output:
// Micro Accuracy: 0.91
// Macro Accuracy: 0.91
// Log Loss: 0.00

artidoro Apr 13, 2019

Author Contributor

Notice that there is no EvaluateNonCalibrated method in the multiclass classification catalog.
The LogLoss metric does not make sense in this case, because a non-calibrated trainer does not output probabilities.

I opened an issue #3323 to track this problem.
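
To make the point about LogLoss concrete, here is a minimal sketch in plain C# (no ML.NET calls). The class name `MetricsSketch`, the toy labels, and the made-up raw scores are all assumptions for illustration; the formulas follow the usual definitions of micro accuracy, macro accuracy, and log-loss, and are not taken from the ML.NET evaluator source.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only; not part of ML.NET.
public static class MetricsSketch
{
    // Micro accuracy: fraction of all examples whose predicted class is correct.
    public static double MicroAccuracy(int[] actual, int[] predicted) =>
        actual.Zip(predicted, (a, p) => a == p ? 1.0 : 0.0).Average();

    // Macro accuracy: unweighted mean of the per-class accuracies.
    public static double MacroAccuracy(int[] actual, int[] predicted) =>
        actual.Zip(predicted, (a, p) => new { Actual = a, Correct = a == p ? 1.0 : 0.0 })
              .GroupBy(x => x.Actual)
              .Select(g => g.Average(x => x.Correct))
              .Average();

    // Log-loss: mean of -log(score given to the true class).
    // Only meaningful when each score is a probability in (0, 1].
    public static double LogLoss(int[] actual, double[][] scores) =>
        actual.Select((a, i) => -Math.Log(scores[i][a])).Average();

    public static void Main()
    {
        int[] actual = { 0, 0, 0, 1 };
        int[] predicted = { 0, 0, 0, 0 };

        Console.WriteLine(MicroAccuracy(actual, predicted)); // 3 of 4 correct, i.e. 0.75
        Console.WriteLine(MacroAccuracy(actual, predicted)); // mean of 1.0 and 0.0, i.e. 0.5

        // Calibrated output: every row is a probability distribution.
        double[][] probabilities =
        {
            new[] { 0.9, 0.1 }, new[] { 0.8, 0.2 }, new[] { 0.7, 0.3 }, new[] { 0.6, 0.4 }
        };
        Console.WriteLine(LogLoss(actual, probabilities));

        // Made-up raw scores, as a non-calibrated trainer might produce.
        // The last true-class score is negative, so its logarithm is undefined.
        double[][] rawScores =
        {
            new[] { 2.3, -1.1 }, new[] { 1.7, -0.4 }, new[] { 0.9, 0.2 }, new[] { 1.5, -2.0 }
        };
        Console.WriteLine(double.IsNaN(LogLoss(actual, rawScores))); // True
    }
}
```

The two accuracy metrics only need the predicted class, so they remain valid for non-calibrated trainers; only the log-loss depends on the scores being probabilities.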

artidoro force-pushed the artidoro:multiclassamples branch from 2bf1485 to 5975856 Apr 13, 2019

@@ -44,7 +51,12 @@ namespace Samples.Dynamic.Trainers.MulticlassClassification
 var options = new <#=TrainerOptions#>;

 // Define the trainer.
-var pipeline = mlContext.MulticlassClassification.Trainers.<#=Trainer#>(options);
+var pipeline =
+// Convert the string labels into key types.

wschin Apr 13, 2019

Member

This line is not aligned. #Resolved

artidoro Apr 15, 2019

Author Contributor

It's done on purpose, so that we can put a comment before each of the two estimators added to the pipeline.



wschin approved these changes Apr 13, 2019

codecov bot commented Apr 13, 2019

Codecov Report

Merging #3322 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3322   +/-   ##
=======================================
  Coverage   72.69%   72.69%           
=======================================
  Files         807      807           
  Lines      145172   145172           
  Branches    16225    16225           
=======================================
  Hits       105537   105537           
  Misses      35220    35220           
  Partials     4415     4415

Flag           Coverage Δ
#Debug         72.69% <ø> (ø) ⬆️
#production    68.23% <ø> (ø) ⬆️
#test          88.97% <ø> (ø) ⬆️

Impacted Files                                           Coverage Δ
...oft.ML.StandardTrainers/StandardTrainersCatalog.cs    92.34% <ø> (ø) ⬆️

wschin commented Apr 13, 2019

                mlContext.Transforms.Conversion.MapValueToKey("Label")

Use nameof(DataPoint.Label) instead of the "Label" string literal. #Resolved

Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/MulticlassClassification.ttinclude:39 in 5975856.

// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply LightGbm multiclass trainer.
.Append(mlContext.MulticlassClassification.Trainers.LightGbm());

wschin Apr 13, 2019

Member

LightGbm()

Better to specify column names using nameof. #Resolved
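
A minimal sketch of what this suggestion could look like follows. The `DataPoint` class shown here is a hypothetical stand-in (its exact shape in the samples is not shown in this thread), and `NameofPipelineSketch` and `BuildPipeline` are illustrative names. The LightGbm trainer is assumed to come from the Microsoft.ML.LightGbm package, and `labelColumnName` and `featureColumnName` are its named parameters.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Illustrative only; requires the Microsoft.ML and Microsoft.ML.LightGbm packages.
public static class NameofPipelineSketch
{
    // Hypothetical data class, assumed for this sketch.
    public class DataPoint
    {
        public string Label { get; set; }

        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
    {
        return
            // Convert the string labels into key types.
            mlContext.Transforms.Conversion.MapValueToKey(nameof(DataPoint.Label))
            // Apply the LightGbm multiclass trainer, naming both columns via nameof
            // so that renaming a property also updates the pipeline.
            .Append(mlContext.MulticlassClassification.Trainers.LightGbm(
                labelColumnName: nameof(DataPoint.Label),
                featureColumnName: nameof(DataPoint.Features)));
    }
}
```

With `nameof`, a typo in a column name becomes a compile-time error rather than a runtime schema mismatch.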

var model = pipeline.Fit(trainingData);

// Create testing data. Use different random seed to make it different from training data.
var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed:123));

yaeldekel Apr 15, 2019

Member

nit: add a space here. #Resolved

// Define the trainer.
var pipeline =
// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")

yaeldekel Apr 15, 2019

Member

MapValueToKey

I'm not sure if this is an issue, and there may be other samples for this, but would it make sense to pass the IDataView keyData argument to this method, to show how the user can avoid a pass over the data to get the labels when the set of labels is known? #Resolved

artidoro Apr 15, 2019

Author Contributor

I think it's important to show that use of the method, and I hope that there is a sample for it under MapValueToKey. However, I don't think this is the right place to include such a sample.
What I could do instead is load a key type directly, instead of using MapValueToKey. Would that be better?
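
For readers unfamiliar with the keyData argument discussed above, here is a minimal sketch of how it might be used. The class names `KeyDataSketch`, `DataPoint`, and `KnownLabel`, the output column name "LabelKey", and the label values are all assumptions made for illustration. The sketch relies on `MapValueToKey` accepting an optional single-column `IDataView keyData` that lists the known values.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Illustrative only; requires the Microsoft.ML package.
public static class KeyDataSketch
{
    // Hypothetical row types for this sketch.
    public class DataPoint { public string Label { get; set; } }
    public class KnownLabel { public string Value { get; set; } }

    public static uint[] MapLabels(MLContext mlContext, string[] knownLabels, string[] labels)
    {
        // Single-column view holding the set of labels known in advance.
        var keyData = mlContext.Data.LoadFromEnumerable(
            knownLabels.Select(v => new KnownLabel { Value = v }));

        var data = mlContext.Data.LoadFromEnumerable(
            labels.Select(l => new DataPoint { Label = l }));

        // The value-to-key mapping is taken from keyData instead of being
        // learned from the labels that occur in the training data.
        var estimator = mlContext.Transforms.Conversion.MapValueToKey(
            outputColumnName: "LabelKey",
            inputColumnName: nameof(DataPoint.Label),
            keyData: keyData);

        var transformed = estimator.Fit(data).Transform(data);

        // Key columns can be read back as their underlying uint values.
        return transformed.GetColumn<uint>("LabelKey").ToArray();
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        var keys = MapLabels(
            mlContext,
            knownLabels: new[] { "AA", "BB", "CC" },
            labels: new[] { "BB", "AA", "CC" });
        Console.WriteLine(string.Join(", ", keys));
    }
}
```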



shmoradims left a comment

:shipit:

yaeldekel left a comment

:shipit:

artidoro force-pushed the artidoro:multiclassamples branch from ec560ed to 7682d7f Apr 16, 2019

artidoro force-pushed the artidoro:multiclassamples branch from 7682d7f to 41f062f Apr 16, 2019

artidoro merged commit 8644b3b into dotnet:master Apr 16, 2019

2 of 3 checks passed

MachineLearning-CodeCoverage #20190416.25 failed
MachineLearning-CI #20190416.43 succeeded
license/cla All CLA requirements met.