Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor GAM Predictor to be Public-ly Creatable #2142

Merged
merged 5 commits into from Jan 22, 2019

Conversation

rogancarr
Copy link
Contributor

@rogancarr rogancarr commented Jan 14, 2019

This PR refactors the GAM trainer and predictor such that all training information remains in the trainer, and the GAM predictor is for any generic (binned) GAM.

Fixes #1948

Update: Two small bug fixes included in this PR

  1. Fixed a serialization error when reading in GAM models written prior to 0.6. Now we issue a warning with a workaround.
  2. Fixed a scoring error for sparse datasets.


namespace Microsoft.ML.Trainers.FastTree
{
public abstract class GamModelParametersBase : ModelParametersBase<float>, IValueMapper, ICalculateFeatureContribution,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 4, length = 6)

a line of XML in anything public..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

verReadableCur: 0x00010001,
// verWrittenCur: 0x00010001, // Initial
// verWrittenCur: 0x00010001, // Added Intercept but collided from release 0.6-0.9
verWrittenCur: 0x00020001, // Added Intercept (version revved to address collisions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x00020001 [](start = 31, length = 10)

I think in general we upgrade version by last digit, i.e. 0x0010002 instead of 0x00020001 #Closed

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Closed

internal BinaryClassificationGamModelParameters(IHostEnvironment env, int inputLength, Dataset trainset,
double meanEffect, double[][] binEffects, int[] featureMap)
: base(env, LoaderSignature, inputLength, trainset, meanEffect, binEffects, featureMap) { }
public BinaryClassificationGamModelParameters(IHostEnvironment env,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinaryClassificationGamModelParameters [](start = 15, length = 38)

Since you are making this public, could you also add XML documentation of what it does and what the parameters are?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// For Generalized Additive Models (GAM), the contribution of a feature is equal to the shape function for the given feature evaluated at
/// the feature value.
/// </summary>
public FeatureContributionCalculator FeatureContributionClaculator => new FeatureContributionCalculator(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FeatureContributionCalculator [](start = 15, length = 29)

this class has one internal property and one internal method.
I advice to make it internal instead of public (as class in general and as property of Gam predictor)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thing gonna be visible to any user. In same time if you look on content of this class, it has only internal objects.
Basically during debugging user will see FeatureContributionCalculator, but it will have zero public methods and properties.
We have that thing for a reason, but I doubt it should be public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@artidoro Does this method need to be public to work?

/// <returns>The bin upper bounds. May be null if this feature has no bins.</returns>
public double[] GetFeatureBinUpperBounds(int featureIndex)
{
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert [](start = 22, length = 6)

use Check for public methods.
Also give it proper message. #Closed

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Closed

private readonly double[][] _binUpperBounds;
private readonly double[][] _binEffects;
public readonly double Intercept;
private readonly int _numFeatures;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_numFeatures [](start = 29, length = 12)

I don't see any way for user to access this, but in same time it's used as constrain in public method.
Can we expose it with proper description? #Closed

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Closed

/// <returns>The binned weights for each feature.</returns>
public double[] GetFeatureWeights(int featureIndex)
{
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert [](start = 22, length = 6)

check and proper message. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// </summary>
/// <param name="featureIndex">The index of the feature (in the training vector) to get.</param>
/// <returns>The bin upper bounds. May be null if this feature has no bins.</returns>
public double[] GetFeatureBinUpperBounds(int featureIndex)
Copy link
Contributor

@artidoro artidoro Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetFeatureBinUpperBounds [](start = 24, length = 24)

Is there a reason why we don't want to return the upper bounds for all features with an array of arrays?
Same question for the binned weights.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real reason. Let's add this as a separate issue.

double[] featureWeights;
if (_inputFeatureToDatasetFeatureMap.TryGetValue(featureIndex, out int j))
{
featureWeights = new double[_binUpperBounds[j].Length];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_binUpperBounds [](start = 44, length = 15)

Just checking that this is the same length as _binEffects[j]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Should be the same and checked when defined, but not good practice. Changing to use _binEffects[j].Length.

}

/// <summary>
/// The GAM model visualization command. Because the data access commands must access private members of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must access private members [](start = 82, length = 27)

I think it's a sign what we haven't properly design predictor.
From what I see it needs:
_inputType.ValueCount which is same as _numFeatures which I think we need to expose.
_inputFeatureToDatasetFeatureMap , _binEffects and _featureMap First two I think you use in GetFeatureBinUpperBounds and it can probably be rewritten. last one can be made as internal.

I found it's strange what we make command line thing which has more functionality than user can achieve by himself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point of confusion. Currently, there can be fewer shape functions built than input features because of flocking (I think this is a bug and will file one separately.)

_inputType.ValueCount is different than _numFeatures. I've rename numFeatures to NumShapeFunctions to clarify this.

inputFeaturesToDatasetFeatureMap has been renamed to _inputFeatureToShapeFunctionMap

featureMap has been renamed to shapeToInputMap. (and is private).

I found it's strange what we make command line thing which has more functionality than user can achieve by himself.

You mean the visualization command? With the new public interfaces, the user should be able to hand-edit models and create versions with the modified shapes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_inputType.ValueCount is different than _numFeatures.

If featureToInputMap is null, I think they should be same, otherwise it's a size of featureToInputMap. Sorry, didn't properly check constructor of predictor.

You mean the visualization command? With the new public interfaces, the user should be able to hand-edit models and create versions with the modified shapes.

I just find it strange what we have nested internal command in predictor. And only reason why it's nested is because we using some internal guts of predictor. I understand what you can do hand-edits, but can user visualize model same way how our visualize command work? If not, it's probably sign what we didn't expose enough.

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you are saying. The visualization command is for an internal tool that may not be released. I am not sure what the plan for visualization in the .NET world is, but for now we are "visualizing" by pretty-printing, as we do in the docs. Obviously not as nice as fully interactive visualizations!

/// <returns>The bin upper bounds. May be null if this feature has no bins.</returns>
public double[] GetFeatureBinUpperBounds(int featureIndex)
{
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contracts [](start = 12, length = 9)

You have Host. #Closed

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Closed

/// <returns>The binned weights for each feature.</returns>
public double[] GetFeatureWeights(int featureIndex)
{
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contracts [](start = 12, length = 9)

Please use Host. #Closed

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Closed

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jan 16, 2019

using System;

no header???!!!11 #Closed


Refers to: src/Microsoft.ML.FastTree/GamModelParameters.cs:1 in add0cc8. [](commit_id = add0cc8, deletion_comment = False)


private double GetBinEffect(int featureIndex, double featureValue)
{
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contracts [](start = 12, length = 9)

Please change Contracts to Host where it possible. #Closed

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. #Closed

gtChild.Insert(0, right);
return --internalNodeId;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line #Closed

Copy link
Contributor Author

@rogancarr rogancarr Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done #Closed

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley
Copy link
Contributor

Hi @rogancarr I'm having trouble getting started reading this, since I can't quite find where the predictors have been changed to use generics?

@rogancarr rogancarr changed the title Refactor GAM Predictor to be Generic Refactor GAM Predictor to be Public-ly Creatable Jan 17, 2019
@rogancarr
Copy link
Contributor Author

Hi @rogancarr I'm having trouble getting started reading this, since I can't quite find where the predictors have been changed to use generics?

Hi @TomFinley! Apologies for the confusion. I renamed this PR to make more sense. This PR changes the GAM predictor s.t. it can be created with the output of any GAM training (with binned results). This is part of the move to create public constructors for our ModelParameters classes.

It does not have anything to do with generics in the programming language sense.

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@rogancarr rogancarr merged commit 240faeb into dotnet:master Jan 22, 2019
@rogancarr rogancarr deleted the gam_generic_modelparameters branch January 22, 2019 21:21
@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Public constructors for GamModelParameters
5 participants