Skip to content

Conversation

@sfilipi
Copy link
Member

@sfilipi sfilipi commented Dec 28, 2018

Fixes #1977 by making method GetCoefficientStatistics public.

@sfilipi sfilipi self-assigned this Dec 28, 2018
@sfilipi sfilipi added API Issues pertaining the friendly API enhancement New feature or request and removed API Issues pertaining the friendly API labels Dec 28, 2018
@sfilipi sfilipi added this to the 1218 milestone Dec 28, 2018
/// </summary>
public readonly struct CoefficientStatistics
{
public readonly string Name;
Copy link
Contributor

@TomFinley TomFinley Dec 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public readonly string Name; [](start = 8, length = 28)

As discussed, if this is to be part of our public API we should probably as part of this PR make this entire structure a bit less goofy. If I were to trace back the root of what makes this structure a confusing structure, I think this choice here -- having the name of the slot instead of just the slot index, is the original sin. (There are lots of other really strange choices, but I think much of the evil and confusion stems from here.)

This should have just been The fact that this is a Nameinstead of just aSlotIndex` is the cause for an enormous amount of confusion in this structure. This requires that we link it with the schema column for features, but from an API perspective this is just something that could have easily (in fact, more easily) be done by the user themselves. As it is, the structure is offering "convenience" but in a way that inevitably leads to a more complicated API.

Once you do that, restructuring the odd API below (which I'll comment more on in the test file) should become more approachable.

Copy link
Contributor

@TomFinley TomFinley Dec 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not to say that it might be convenient to have an internal utility method that makes saving as text more convenient for the LinearModelParameters and whatnot, but confusing the needs of the text exporter (which is not part of our public API) with what we actually do in the public API was a misstep. (Not your misstep to be clear, since this code has existed, I think, more or less forever, but still something we ought to correct now that we are making it part of the publci API.)


In reply to: 244374085 [](ancestors = 244374085)

var linearModel = transformerChain.LastTransformer.Model.SubPredictor as LinearBinaryModelParameters;
var linearModel = transformer.LastTransformer.Model.SubPredictor as LinearBinaryModelParameters;
var stats = linearModel.Statistics;
LinearModelStatistics.TryGetBiasStatistics(stats, 2, out float stdError, out float zScore, out float pValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LinearModelStatistics.TryGetBiasStatistics(stats, 2, out float stdError, out float zScore, out float pValue); [](start = 12, length = 109)

This is the first thing that does not make much sense to me. It seems like linearModel.Statistics should offer similar accessors through an instance method. (Why out parameters? Why not just properties on the original structure?) This is just really complicated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

As Tom pointed out, the API for getting statistics needs some cleanup. linearModel.Statistics or some other simple 1-liner should be the only code a user needs.


In reply to: 244374354 [](ancestors = 244374354)


var scoredData = transformer.Transform(dataView);

var coeffcients = stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100); [](start = 31, length = 80)

This is the second thing that does not make sense to me. Again should just be an instance method. As discussed also, the complication that this code itself is responsible for extracting out the slot names was a mistake. Anticipating some people's reaction, if the reaction from some people would be "well it's more convenient," then the correct remediation is to just make slot names easier to access (somehow), definitely not to "solve" the problem here as is done.


var coeffcients = stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100);

Assert.Equal(19, coeffcients.Length);
Copy link
Contributor

@TomFinley TomFinley Dec 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coeffcients [](start = 29, length = 11)

Not a big deal at all since it's just a test, but while you're at it, this is a typo. coeffcients => coefficients. #Resolved

@sfilipi
Copy link
Member Author

sfilipi commented Jan 3, 2019

@TomFinley @shmoradims I am in the process of doing the suggested refactoring, and figured that i should go a bit further: break down this class in tow parts: one that gives basic stats - used by MultiClass and LR in the case when the extended cals are not required, and another class that offers the extended stats in top of the previous.

This will be a bit more complex of a change, since it will require loading modes backwards etc; i am not confident that if i publish the changes today/tomorrow the next round of review will be final and also it will be a larger change to cherry pick.

Are you guys ok with checking in this PR, since GetCoefficientStatistics needs to be public, and getting the rest of the changes on the next PR, but in 0.10?

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you guys ok with checking in this PR, since GetCoefficientStatistics needs to be public, and getting the rest of the changes on the next PR, but in 0.10?

Hmmmm. OK. 😄 But if you could write an issue to capture the problems with current public surface so someone (maybe you?) can clean up later that would make me feel better.

@sfilipi
Copy link
Member Author

sfilipi commented Jan 3, 2019

thanks Tom. Created #2010 with full description of the other issues with it. Assigned to self.

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Linq;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need Linq here?


/// <summary>
/// Gets the coefficient statistics as an object.
/// </summary>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need better summary than this one.

/// Gets the coefficient statistics as an object.
/// </summary>
internal CoefficientStatistics[] GetCoefficientStatistics(LinearBinaryModelParameters parent, RoleMappedSchema schema, int paramCountCap)
public CoefficientStatistics[] GetCoefficientStatistics(LinearBinaryModelParameters parent, Schema.Column featureColumn, int paramCountCap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paramCountCap [](start = 133, length = 13)

this isn't great name. Param is such a generic term, I have no idea what it's capping.

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

public long TrainingExampleCount => _trainingExampleCount;

public Single Deviance => _deviance;
public float Deviance => _deviance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_deviance [](start = 33, length = 9)

why we need _deviance?

public float Deviance => _deviance;

public Single NullDeviance => _nullDeviance;
public float NullDeviance => _nullDeviance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_nullDeviance [](start = 37, length = 13)

I don't see any reason for _nullDeviance to exist.

public float Deviance => _deviance;

public Single NullDeviance => _nullDeviance;
public float NullDeviance => _nullDeviance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NullDeviance [](start = 21, length = 12)

Need proper comment

public long TrainingExampleCount => _trainingExampleCount;

public Single Deviance => _deviance;
public float Deviance => _deviance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deviance [](start = 21, length = 8)

need proper comment.

@sfilipi sfilipi merged commit 85db282 into dotnet:master Jan 3, 2019
@sfilipi sfilipi deleted the publicStats branch January 3, 2019 22:23
_paramCount = paramCount;
_trainingExampleCount = trainingExampleCount;
_deviance = deviance;
_nullDeviance = nullDeviance;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jan 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of _variable from that block should exist.
Just make public readonly version of it, and use it. #Resolved


var scoredData = transformer.Transform(dataView);

var coeffcients = stats.GetCoefficientStatistics(linearModel, scoredData.Schema["Features"], 100);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jan 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetCoefficientStatistics [](start = 37, length = 24)

I find it strange to have
TryGetBiasStatistics which accept LinearModelStatistics object
and GetCoefficientStatistics which are part of LinearModelStatistics object.
a) relationship to LinearModelStatistics object
b) why one is Try, and other is just Get?
#Resolved

Copy link
Member Author

@sfilipi sfilipi Jan 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, ugly. I am fixing all of it now, working on #2010

Markign as resolved since #2010 exists.


In reply to: 245155368 [](ancestors = 245155368)

@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants