-
Notifications
You must be signed in to change notification settings - Fork 1.9k
making GetCoefficientStatistics public #1979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// </summary> | ||
| public readonly struct CoefficientStatistics | ||
| { | ||
| public readonly string Name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public readonly string Name; [](start = 8, length = 28)
As discussed, if this is to be part of our public API we should probably as part of this PR make this entire structure a bit less goofy. If I were to trace back the root of what makes this structure a confusing structure, I think this choice here -- having the name of the slot instead of just the slot index, is the original sin. (There are lots of other really strange choices, but I think much of the evil and confusion stems from here.)
This should have just been The fact that this is a Nameinstead of just aSlotIndex` is the cause for an enormous amount of confusion in this structure. This requires that we link it with the schema column for features, but from an API perspective this is just something that could have easily (in fact, more easily) be done by the user themselves. As it is, the structure is offering "convenience" but in a way that inevitably leads to a more complicated API.
Once you do that, restructuring the odd API below (which I'll comment more on in the test file) should become more approachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not to say that it might be convenient to have an internal utility method that makes saving as text more convenient for the LinearModelParameters and whatnot, but confusing the needs of the text exporter (which is not part of our public API) with what we actually do in the public API was a misstep. (Not your misstep to be clear, since this code has existed, I think, more or less forever, but still something we ought to correct now that we are making it part of the publci API.)
In reply to: 244374085 [](ancestors = 244374085)
| var linearModel = transformerChain.LastTransformer.Model.SubPredictor as LinearBinaryModelParameters; | ||
| var linearModel = transformer.LastTransformer.Model.SubPredictor as LinearBinaryModelParameters; | ||
| var stats = linearModel.Statistics; | ||
| LinearModelStatistics.TryGetBiasStatistics(stats, 2, out float stdError, out float zScore, out float pValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LinearModelStatistics.TryGetBiasStatistics(stats, 2, out float stdError, out float zScore, out float pValue); [](start = 12, length = 109)
This is the first thing that does not make much sense to me. It seems like linearModel.Statistics should offer similar accessors through an instance method. (Why out parameters? Why not just properties on the original structure?) This is just really complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
As Tom pointed out, the API for getting statistics needs some cleanup. linearModel.Statistics or some other simple 1-liner should be the only code a user needs.
In reply to: 244374354 [](ancestors = 244374354)
|
|
||
| var scoredData = transformer.Transform(dataView); | ||
|
|
||
| var coeffcients = stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100); [](start = 31, length = 80)
This is the second thing that does not make sense to me. Again should just be an instance method. As discussed also, the complication that this code itself is responsible for extracting out the slot names was a mistake. Anticipating some people's reaction, if the reaction from some people would be "well it's more convenient," then the correct remediation is to just make slot names easier to access (somehow), definitely not to "solve" the problem here as is done.
|
|
||
| var coeffcients = stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100); | ||
|
|
||
| Assert.Equal(19, coeffcients.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coeffcients [](start = 29, length = 11)
Not a big deal at all since it's just a test, but while you're at it, this is a typo. coeffcients => coefficients. #Resolved
|
@TomFinley @shmoradims I am in the process of doing the suggested refactoring, and figured that i should go a bit further: break down this class in tow parts: one that gives basic stats - used by MultiClass and LR in the case when the extended cals are not required, and another class that offers the extended stats in top of the previous. This will be a bit more complex of a change, since it will require loading modes backwards etc; i am not confident that if i publish the changes today/tomorrow the next round of review will be final and also it will be a larger change to cherry pick. Are you guys ok with checking in this PR, since |
TomFinley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you guys ok with checking in this PR, since GetCoefficientStatistics needs to be public, and getting the rest of the changes on the next PR, but in 0.10?
Hmmmm. OK. 😄 But if you could write an issue to capture the problems with current public surface so someone (maybe you?) can clean up later that would make me feel better.
|
thanks Tom. Created #2010 with full description of the other issues with it. Assigned to self. |
Ivanidzo4ka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
| // The .NET Foundation licenses this file to you under the MIT license. | ||
| // See the LICENSE file in the project root for more information. | ||
|
|
||
| using System.Linq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need Linq here?
|
|
||
| /// <summary> | ||
| /// Gets the coefficient statistics as an object. | ||
| /// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need better summary than this one.
| /// Gets the coefficient statistics as an object. | ||
| /// </summary> | ||
| internal CoefficientStatistics[] GetCoefficientStatistics(LinearBinaryModelParameters parent, RoleMappedSchema schema, int paramCountCap) | ||
| public CoefficientStatistics[] GetCoefficientStatistics(LinearBinaryModelParameters parent, Schema.Column featureColumn, int paramCountCap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paramCountCap [](start = 133, length = 13)
this isn't great name. Param is such a generic term, I have no idea what it's capping.
shmoradims
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
| public long TrainingExampleCount => _trainingExampleCount; | ||
|
|
||
| public Single Deviance => _deviance; | ||
| public float Deviance => _deviance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_deviance [](start = 33, length = 9)
why we need _deviance?
| public float Deviance => _deviance; | ||
|
|
||
| public Single NullDeviance => _nullDeviance; | ||
| public float NullDeviance => _nullDeviance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_nullDeviance [](start = 37, length = 13)
I don't see any reason for _nullDeviance to exist.
| public float Deviance => _deviance; | ||
|
|
||
| public Single NullDeviance => _nullDeviance; | ||
| public float NullDeviance => _nullDeviance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullDeviance [](start = 21, length = 12)
Need proper comment
| public long TrainingExampleCount => _trainingExampleCount; | ||
|
|
||
| public Single Deviance => _deviance; | ||
| public float Deviance => _deviance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deviance [](start = 21, length = 8)
need proper comment.
| _paramCount = paramCount; | ||
| _trainingExampleCount = trainingExampleCount; | ||
| _deviance = deviance; | ||
| _nullDeviance = nullDeviance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none of _variable from that block should exist.
Just make public readonly version of it, and use it. #Resolved
|
|
||
| var scoredData = transformer.Transform(dataView); | ||
|
|
||
| var coeffcients = stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetCoefficientStatistics [](start = 37, length = 24)
I find it strange to have
TryGetBiasStatistics which accept LinearModelStatistics object
and GetCoefficientStatistics which are part of LinearModelStatistics object.
a) relationship to LinearModelStatistics object
b) why one is Try, and other is just Get?
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #1977 by making method GetCoefficientStatistics public.