Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing ModelParameter discrepancies #2968

Merged
merged 11 commits into from Mar 19, 2019

Conversation

Projects
None yet
6 participants
@abgoswam
Copy link
Member

abgoswam commented Mar 14, 2019

Fixes #2938

  • Fixes to the ~7 odd ModelParameter types which were inconsistent with the rest of the ModelParameter types

Couple of notes:

  1. PR follows naming convention used by other ModelParameter types in the codebase

    • {AlgoName}(optional){TypeOfTask}ModelParameters
    • {TypeOfTask} is added only when needed to distinguish between Binary , Regression or Multiclass
  2. ModelParameter types do not use the word Classification in the {TypeOfTask} . PR follows that convention.

EDIT : MulticlassLogisticRegressionModelParameters is being refactored by separate issue #1100 . SO not fixing that in this PR

@abgoswam abgoswam requested review from sfilipi and artidoro Mar 14, 2019

@@ -66,7 +66,7 @@ public Arguments()
// estimator, as opposed to a regular trainer.
var trainerEstimator = new LogisticRegressionMulticlassClassificationTrainer(env, LabelColumnName, FeatureColumnName);
return TrainerUtils.MapTrainerEstimatorToTrainer<LogisticRegressionMulticlassClassificationTrainer,
MulticlassLogisticRegressionModelParameters, MulticlassLogisticRegressionModelParameters>(env, trainerEstimator);
LogisticRegressionMulticlassModelParameters, LogisticRegressionMulticlassModelParameters>(env, trainerEstimator);

This comment has been minimized.

@wschin

wschin Mar 14, 2019

Member

We can NOT swap them. Logistic regression is never a multiclass classification model while multi-class logistic regression is an alternative name of multinomial logistic regression. Can you revert this change? I will handle this in my issue #1100. I am refactorizing it. #Resolved

This comment has been minimized.

@abgoswam

abgoswam Mar 15, 2019

Author Member

One alternative name we can use is SoftmaxRegression (also mentioned in wikipedia link above).

While refactoring, could you fix the name of the trainer estimator for this as well.

I will revert this, and wait for your refactoring PR. Sounds good ?


In reply to: 265806976 [](ancestors = 265806976)

@codecov

This comment has been minimized.

Copy link

codecov bot commented Mar 15, 2019

Codecov Report

Merging #2968 into master will increase coverage by <.01%.
The diff coverage is 86.41%.

@@            Coverage Diff             @@
##           master    #2968      +/-   ##
==========================================
+ Coverage   72.35%   72.35%   +<.01%     
==========================================
  Files         803      803              
  Lines      143296   143296              
  Branches    16155    16155              
==========================================
+ Hits       103675   103679       +4     
+ Misses      35194    35191       -3     
+ Partials     4427     4426       -1
Flag Coverage Δ
#Debug 72.35% <86.41%> (ø) ⬆️
#production 68.06% <72.15%> (ø) ⬆️
#test 88.52% <100%> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.LightGbm/LightGbmArguments.cs 89.63% <ø> (ø) ⬆️
...LogisticRegression/MulticlassLogisticRegression.cs 67.46% <ø> (ø) ⬆️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 72.68% <ø> (ø) ⬆️
...rs/Standard/PoissonRegression/PoissonRegression.cs 88.57% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/GamModelParameters.cs 46.51% <0%> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeArguments.cs 85.38% <0%> (ø) ⬆️
src/Microsoft.ML.FastTree/GamTrainer.cs 90.38% <0%> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/Training.cs 100% <100%> (ø) ⬆️
...est/Microsoft.ML.Predictor.Tests/TestPredictors.cs 63.8% <100%> (ø) ⬆️
...icrosoft.ML.Functional.Tests/DataTransformation.cs 100% <100%> (ø) ⬆️
... and 46 more
@wschin

wschin approved these changes Mar 15, 2019

Copy link
Member

wschin left a comment

LGTM.

@@ -24,16 +24,16 @@
OlsTrainer.LoadNameValue,
OlsTrainer.ShortName)]

[assembly: LoadableClass(typeof(OrdinaryLeastSquaresRegressionModelParameters), null, typeof(SignatureLoadModel),
[assembly: LoadableClass(typeof(OlsModelParameters), null, typeof(SignatureLoadModel),

This comment has been minimized.

@TomFinley

TomFinley Mar 15, 2019

Contributor

OlsModelParameters [](start = 32, length = 18)

So, general .NET naming guidelines seem to suggest that we avoid acronyms. That is, OrdinaryLeastSquaresRegressionModelParameters is right, and OlsTrainer is wrong. Similar with PCA. Have we made a conscious decision to discard the .NET guidelines in this case? If we have, could I see where that discussion happened so I can better understand the rationale? It is not in the linked issue, but may have occurred somewhere else.Thanks. #Resolved

This comment has been minimized.

@artidoro

artidoro Mar 15, 2019

Contributor

I think the discussion is in this thread: #2762 #Resolved

This comment has been minimized.

@abgoswam

abgoswam Mar 15, 2019

Author Member

The summary is here : #2762 (comment)

This came out from suggestion from @sfilipi and @eerhardt .

"For some trainers estimators it is OK to use acronyms for {AlgoName} . This is typically when (a) the expanded name doesn't provide more value to typical users than the acronym or (b) the names becomes too long (c) the acronym is unique in the field." #Resolved

This comment has been minimized.

@TomFinley

TomFinley Mar 18, 2019

Contributor

Ah, all right, thanks for linking the relevant issues @artidoro and @abgoswam. #Resolved


[assembly: LoadableClass(typeof(void), typeof(NaiveBayesTrainer), null, typeof(SignatureEntryPointModule), NaiveBayesTrainer.LoadName)]

namespace Microsoft.ML.Trainers
{
public sealed class NaiveBayesTrainer : TrainerEstimatorBase<MulticlassPredictionTransformer<MulticlassNaiveBayesModelParameters>, MulticlassNaiveBayesModelParameters>
public sealed class NaiveBayesTrainer : TrainerEstimatorBase<MulticlassPredictionTransformer<NaiveBayesModelParameters>, NaiveBayesModelParameters>

This comment has been minimized.

@artidoro

artidoro Mar 15, 2019

Contributor

NaiveBayesTrainer [](start = 24, length = 17)

Why does NaiveBayesTrainer not have Multiclass or MulticlassClassification in its name? #Resolved

This comment has been minimized.

@artidoro

artidoro Mar 15, 2019

Contributor

NaiveBayes is a very general probabilistic technique, and could be applied to many different tasks, so I don't understand the rational for removing the task from its name?


In reply to: 266106131 [](ancestors = 266106131)

This comment has been minimized.

@abgoswam

abgoswam Mar 15, 2019

Author Member

#2762 (comment)

  • {TypeOfTask} is added only only when the algorithm supports multiple kinds of tasks . When added, we will spell out each task fully. #Resolved

This comment has been minimized.

@abgoswam

abgoswam Mar 15, 2019

Author Member

We renamed it as per the discussion here.

#2762 (comment)

• {TypeOfTask} is added only only when the algorithm supports multiple kinds of tasks . When added, we will spell out each task fully.

Do u have any suggestions ?


In reply to: 266107249 [](ancestors = 266107249,266106131)

This comment has been minimized.

@artidoro

artidoro Mar 15, 2019

Contributor

My concern is that if we add this algorithm for binaryclassification it would require a breaking change to add Multiclass to the name NaiveBayes. I think that the above guideline works well when the algorithm can only be applied to one learning task. But this algorithm can be applied at least to binary and multiclass so I would suggest to add MulticlassClassification to its name, similarly for the ModelParameters.


In reply to: 266136420 [](ancestors = 266136420,266107249,266106131)

This comment has been minimized.

@abgoswam

abgoswam Mar 15, 2019

Author Member

I see. Currently in ML.NET we only have NaiveBayes under the MulticlassClassification MLContext

So your concern is about a possible breaking change in the future if we want to add NaiveBayes inside the BinaryClassification MLContext ? Would we ever want to do that ?


In reply to: 266138507 [](ancestors = 266138507,266136420,266107249,266106131)

This comment has been minimized.

@artidoro

artidoro Mar 15, 2019

Contributor

Yes that was my concern, maybe @tfinley@gmail.com has an opinion about this?


In reply to: 266143059 [](ancestors = 266143059,266138507,266136420,266107249,266106131)

This comment has been minimized.

@abgoswam

abgoswam Mar 15, 2019

Author Member

@TomFinley . Would be great to get your thoughts on this. #Resolved

This comment has been minimized.

@sfilipi

sfilipi Mar 18, 2019

Member

I would keep multiclass on NaiveBayes too.


In reply to: 266143921 [](ancestors = 266143921,266143059,266138507,266136420,266107249,266106131)

This comment has been minimized.

@abgoswam

abgoswam Mar 18, 2019

Author Member

Looks like we favor naming it NaiveBayesMulticlass. Updating accordingly.


In reply to: 266568218 [](ancestors = 266568218,266143921,266143059,266138507,266136420,266107249,266106131)

@@ -49,7 +49,7 @@ internal FastForestOptionsBase()
}
}

public sealed class FastForestClassificationModelParameters :
public sealed class FastForestMulticlassModelParameters :

This comment has been minimized.

@sfilipi

sfilipi Mar 18, 2019

Member

FastForestMulticlassModelParameters [](start = 24, length = 35)

thank you for not making it FastForestMulticlassClassificationModelParameters :) #WontFix

@sfilipi
Copy link
Member

sfilipi left a comment

:shipit:

@TomFinley
Copy link
Contributor

TomFinley left a comment

Thanks @abgoswam !

@@ -171,7 +171,7 @@ private protected override SchemaShape.Column[] GetOutputColumnsCore(SchemaShape
/// <summary>
/// The model parameters class for Binary Classification GAMs
/// </summary>
public sealed class BinaryClassificationGamModelParameters : GamModelParametersBase, IPredictorProducing<float>
public sealed class GamBinaryModelParameters : GamModelParametersBase, IPredictorProducing<float>

This comment has been minimized.

@eerhardt

eerhardt Mar 18, 2019

Member

BinaryClassification?

Do we need to put Classification in the name? We do everywhere else. #Resolved

This comment has been minimized.

@abgoswam

abgoswam Mar 18, 2019

Author Member

For ModelParameters we do not. Please see the other comment.


In reply to: 266626783 [](ancestors = 266626783)


namespace Microsoft.ML.Trainers
{
public sealed class NaiveBayesTrainer : TrainerEstimatorBase<MulticlassPredictionTransformer<MulticlassNaiveBayesModelParameters>, MulticlassNaiveBayesModelParameters>
public sealed class NaiveBayesMulticlassTrainer : TrainerEstimatorBase<MulticlassPredictionTransformer<NaiveBayesMulticlassModelParameters>, NaiveBayesMulticlassModelParameters>

This comment has been minimized.

@eerhardt

eerhardt Mar 18, 2019

Member

NaiveBayesMulticlassTrainer => NaiveBayesMulticlassClassificationTrainer? Why are we dropping Classification from these names? #Resolved

This comment has been minimized.

@abgoswam

abgoswam Mar 18, 2019

Author Member

as noted in the bug description, it seems for ModelParameters we are consistent in not using the word "Classification" . I am following the same convention in this PR.

Example:

  • LinearBinaryModelParameters
  • FastTreeBinaryModelParameters

I don't think we have solved the "Classification" debate completely yet #2623

Here is my observation :

  • For Trainers, we use "Classification"
  • For ModelParameters, we do not use "Classification"

In reply to: 266627123 [](ancestors = 266627123)

This comment has been minimized.

@eerhardt

eerhardt Mar 18, 2019

Member

For Trainers, we use "Classification"
For ModelParameters, we do not use "Classification"

Why is this a valid policy? #Resolved

This comment has been minimized.

@abgoswam

abgoswam Mar 18, 2019

Author Member

I cannot claim its a "valid" policy, but one observation I have is that

  • we do not use "Classification" anywhere for ModelParameters
    • CalibratedModelParametersBase
    • LinearBinaryModelParameters
  • in some cases it leads to long names e.g. FastForestMulticlassClassificationModelParameter

We have 2 options:

A. Follow the naming convention used by Trainers i.e. "<>BinaryClassification" and "MulticlassClassification" uniformly.
B. Not use "Classification" for ModelParameters .

Which one you favor ? @eerhardt @sfilipi #Resolved

@eerhardt
Copy link
Member

eerhardt left a comment

:shipit:

@abgoswam abgoswam merged commit 0831865 into dotnet:master Mar 19, 2019

3 checks passed

MachineLearning-CI #20190318.48 succeeded
Details
MachineLearning-CodeCoverage #20190318.49 succeeded
Details
license/cla All CLA requirements met.
Details

@abgoswam abgoswam deleted the abgoswam:abgoswam/modelparameters branch Mar 20, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.