New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor GAM Predictor to be Public-ly Creatable #2142
Refactor GAM Predictor to be Public-ly Creatable #2142
Conversation
|
||
namespace Microsoft.ML.Trainers.FastTree | ||
{ | ||
public abstract class GamModelParametersBase : ModelParametersBase<float>, IValueMapper, ICalculateFeatureContribution, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 4, length = 6)
a line of XML in anything public..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
verReadableCur: 0x00010001, | ||
// verWrittenCur: 0x00010001, // Initial | ||
// verWrittenCur: 0x00010001, // Added Intercept but collided from release 0.6-0.9 | ||
verWrittenCur: 0x00020001, // Added Intercept (version revved to address collisions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0x00020001 [](start = 31, length = 10)
I think in general we upgrade version by last digit, i.e. 0x0010002
instead of 0x00020001
#Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. #Closed
internal BinaryClassificationGamModelParameters(IHostEnvironment env, int inputLength, Dataset trainset, | ||
double meanEffect, double[][] binEffects, int[] featureMap) | ||
: base(env, LoaderSignature, inputLength, trainset, meanEffect, binEffects, featureMap) { } | ||
public BinaryClassificationGamModelParameters(IHostEnvironment env, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BinaryClassificationGamModelParameters [](start = 15, length = 38)
Since you are making this public, could you also add XML documentation of what it does and what the parameters are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// For Generalized Additive Models (GAM), the contribution of a feature is equal to the shape function for the given feature evaluated at | ||
/// the feature value. | ||
/// </summary> | ||
public FeatureContributionCalculator FeatureContributionClaculator => new FeatureContributionCalculator(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FeatureContributionCalculator [](start = 15, length = 29)
this class has one internal property and one internal method.
I advice to make it internal instead of public (as class in general and as property of Gam predictor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thing gonna be visible to any user. In same time if you look on content of this class, it has only internal objects.
Basically during debugging user will see FeatureContributionCalculator, but it will have zero public methods and properties.
We have that thing for a reason, but I doubt it should be public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@artidoro Does this method need to be public
to work?
/// <returns>The bin upper bounds. May be null if this feature has no bins.</returns> | ||
public double[] GetFeatureBinUpperBounds(int featureIndex) | ||
{ | ||
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert [](start = 22, length = 6)
use Check for public methods.
Also give it proper message. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. #Closed
private readonly double[][] _binUpperBounds; | ||
private readonly double[][] _binEffects; | ||
public readonly double Intercept; | ||
private readonly int _numFeatures; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_numFeatures [](start = 29, length = 12)
I don't see any way for user to access this, but in same time it's used as constrain in public method.
Can we expose it with proper description? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. #Closed
/// <returns>The binned weights for each feature.</returns> | ||
public double[] GetFeatureWeights(int featureIndex) | ||
{ | ||
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert [](start = 22, length = 6)
check and proper message. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// </summary> | ||
/// <param name="featureIndex">The index of the feature (in the training vector) to get.</param> | ||
/// <returns>The bin upper bounds. May be null if this feature has no bins.</returns> | ||
public double[] GetFeatureBinUpperBounds(int featureIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetFeatureBinUpperBounds [](start = 24, length = 24)
Is there a reason why we don't want to return the upper bounds for all features with an array of arrays?
Same question for the binned weights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No real reason. Let's add this as a separate issue.
double[] featureWeights; | ||
if (_inputFeatureToDatasetFeatureMap.TryGetValue(featureIndex, out int j)) | ||
{ | ||
featureWeights = new double[_binUpperBounds[j].Length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_binUpperBounds [](start = 44, length = 15)
Just checking that this is the same length as _binEffects[j]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Should be the same and checked when defined, but not good practice. Changing to use _binEffects[j].Length.
} | ||
|
||
/// <summary> | ||
/// The GAM model visualization command. Because the data access commands must access private members of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must access private members [](start = 82, length = 27)
I think it's a sign what we haven't properly design predictor.
From what I see it needs:
_inputType.ValueCount
which is same as _numFeatures
which I think we need to expose.
_inputFeatureToDatasetFeatureMap
, _binEffects
and _featureMap
First two I think you use in GetFeatureBinUpperBounds
and it can probably be rewritten. last one can be made as internal.
I found it's strange what we make command line thing which has more functionality than user can achieve by himself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point of confusion. Currently, there can be fewer shape functions built than input features because of flocking (I think this is a bug and will file one separately.)
_inputType.ValueCount
is different than _numFeatures
. I've rename numFeatures
to NumShapeFunctions
to clarify this.
inputFeaturesToDatasetFeatureMap
has been renamed to _inputFeatureToShapeFunctionMap
featureMap
has been renamed to shapeToInputMap
. (and is private
).
I found it's strange what we make command line thing which has more functionality than user can achieve by himself.
You mean the visualization command? With the new public interfaces, the user should be able to hand-edit models and create versions with the modified shapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_inputType.ValueCount is different than _numFeatures.
If featureToInputMap
is null, I think they should be same, otherwise it's a size of featureToInputMap
. Sorry, didn't properly check constructor of predictor.
You mean the visualization command? With the new public interfaces, the user should be able to hand-edit models and create versions with the modified shapes.
I just find it strange what we have nested internal command in predictor. And only reason why it's nested is because we using some internal guts of predictor. I understand what you can do hand-edits, but can user visualize model same way how our visualize command work? If not, it's probably sign what we didn't expose enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you are saying. The visualization command is for an internal tool that may not be released. I am not sure what the plan for visualization in the .NET world is, but for now we are "visualizing" by pretty-printing, as we do in the docs. Obviously not as nice as fully interactive visualizations!
/// <returns>The bin upper bounds. May be null if this feature has no bins.</returns> | ||
public double[] GetFeatureBinUpperBounds(int featureIndex) | ||
{ | ||
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contracts [](start = 12, length = 9)
You have Host. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. #Closed
/// <returns>The binned weights for each feature.</returns> | ||
public double[] GetFeatureWeights(int featureIndex) | ||
{ | ||
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contracts [](start = 12, length = 9)
Please use Host. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. #Closed
|
||
private double GetBinEffect(int featureIndex, double featureValue) | ||
{ | ||
Contracts.Assert(0 <= featureIndex && featureIndex < _numFeatures); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contracts [](start = 12, length = 9)
Please change Contracts
to Host
where it possible. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. #Closed
gtChild.Insert(0, right); | ||
return --internalNodeId; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done #Closed
add0cc8
to
1f65d26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ion into the trainer, and make the GAM predictor generic.
…ing classes; fixing a serialization error; fixing an error in the sparsity calculation.
1f65d26
to
4179af2
Compare
Hi @rogancarr I'm having trouble getting started reading this, since I can't quite find where the predictors have been changed to use generics? |
Hi @TomFinley! Apologies for the confusion. I renamed this PR to make more sense. This PR changes the GAM predictor s.t. it can be created with the output of any GAM training (with binned results). This is part of the move to create public constructors for our It does not have anything to do with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR refactors the GAM trainer and predictor such that all training information remains in the trainer, and the GAM predictor is for any generic (binned) GAM.
Fixes #1948
Update: Two small bug fixes included in this PR