Skip to content

Conversation

@shmoradims
Copy link

@shmoradims shmoradims commented Dec 19, 2018

Add support for saving GAM models as ini format. Parity results with maml are included in the tests.

Shahab Moradi added 14 commits December 18, 2018 07:44
OVERALL RESULTS
---------------------------------------
L1(avg):            0.090945 (0.0000)
L2(avg):            0.024789 (0.0000)
RMS(avg):           0.157445 (0.0000)
Loss-fn(avg):       0.024789 (0.0000)
R Squared:          0.891027 (0.0000)
Model creation:
F:\Git\Tlc3\bin\Debug>maml.exe ini ini=F:\temp\ini\model1.ini out=F:\temp\ini\model1.ini.zip kind=Regression data=F:\temp\ini\data\breast-cancer-noNan.txt  loader=TextLoader{col=Label:R4:0 col=F1:R4:1 col=F3:R4:3 col=F6:R4:6 col=F7:R4:7 col
=F9:R4:9} xf=Concat{col=Features:F1,F9,F7,F6}

Parity Results:
OVERALL RESULTS
---------------------------------------
L1(avg):            0.120008 (0.0000)
L2(avg):            0.036569 (0.0000)
RMS(avg):           0.191230 (0.0000)
Loss-fn(avg):       0.036569 (0.0000)
R Squared:          0.839242 (0.0000)
OVERALL RESULTS
---------------------------------------
L1(avg):            0.120008 (0.0000)
L2(avg):            0.036569 (0.0000)
RMS(avg):           0.191230 (0.0000)
Loss-fn(avg):       0.036569 (0.0000)
R Squared:          0.839242 (0.0000)
// OVERALL RESULTS
// ---------------------------------------
// L1(avg):            0.093257 (0.0000)
// L2(avg):            0.025707 (0.0000)
// RMS(avg):           0.160336 (0.0000)
// Loss-fn(avg):       0.025707 (0.0000)
// R Squared:          0.886203 (0.0000)
OVERALL RESULTS
---------------------------------------
AUC:                0.995452 (0.0000)
Accuracy:           0.969957 (0.0000)
Positive precision: 0.950820 (0.0000)
Positive recall:    0.962656 (0.0000)
Negative precision: 0.980220 (0.0000)
Negative recall:    0.973799 (0.0000)
Log-loss:           0.115940 (0.0000)
Log-loss reduction: 87.524160 (0.0000)
F1 Score:           0.956701 (0.0000)
AUPRC:              0.989809 (0.0000)
public Dictionary<string, object> KeyValues { get; }
}
}
internal static class FastTreeIniFormatUtils
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this as a separate file in FastTree/Utils? I'm not sure we need it in FastTree.cs #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to utils


In reply to: 243139101 [](ancestors = 243139101)

void ICanSaveInIniFormat.SaveAsIni(TextWriter writer, RoleMappedSchema schema, ICalibrator calibrator)
{
Host.CheckValue(writer, nameof(writer));
Host.CheckValue(schema, nameof(schema));
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to check here as you don't use it in this function. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


In reply to: 243139197 [](ancestors = 243139197)

{
Host.CheckValue(writer, nameof(writer));
Host.CheckValue(schema, nameof(schema));
Host.CheckValueOrNull(calibrator);
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to check here, as you don't use it in this function. (And you check below.) #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed


In reply to: 243139235 [](ancestors = 243139235)

{
host.CheckValue(ensemble, nameof(ensemble));
host.CheckValue(schema, nameof(schema));
host.CheckValueOrNull(calibrator);
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to check here, as you don't use it here. And you check in the accessory function. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


In reply to: 243139398 [](ancestors = 243139398)

public void SaveAsIni(TextWriter writer, RoleMappedSchema schema, ICalibrator calibrator = null)
{
Host.CheckValue(writer, nameof(writer));
Host.CheckValue(schema, nameof(schema));
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to check here, as you don't use it, and it is asserted when it is. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed


In reply to: 243139968 [](ancestors = 243139968)

{
Host.CheckValue(writer, nameof(writer));
Host.CheckValue(schema, nameof(schema));
Host.CheckValueOrNull(calibrator);
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to check here, as you don't use it, and it is asserted when it is. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


In reply to: 243139976 [](ancestors = 243139976)

// GAM bins should be converted to balanced trees / binary search trees
// so that scoring takes O(log(n)) instead of O(n). The following utility
// creates a balanced tree.
private (float[], int[], int[]) CreateBalancedTree(int numInternalNodes, double[] binThresholds)
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does returning a list limit what C# or .NET versions we can target? If so, we can switch this to an out pattern. #ByDesign

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuples are heavily used as return type in Static API, so using it here doesn't create more version restrictions than what we already have in the rest of the code. I think this way it fine.


In reply to: 243140927 [](ancestors = 243140927)


// Tried adding the intercept as the bias term in the final ini aggregator,
// but that didn't seem to have any effects during testing.
// Adding the intercept as a dummy tree with the output values being the model intercept,
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good comment for review, but for the final check-in, just keep this line. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the top 2 lines of comments.


In reply to: 243141367 [](ancestors = 243141367)


// Getting parity results from maml.exe:
// maml.exe ini ini=model.ini out=model_ini.zip data=breast-cancer.txt loader=TextLoader{col=Label:R4:0 col=Features:R4:1-9} xf=NAHandleTransform{col=Features slot=- ind=-} kind=Regression
Assert.Equal(0.093256807643323947, results.L1);
Copy link
Contributor

@rogancarr rogancarr Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this test. On one hand, it's good to make sure that it can produce the score. On the other hand, it's a new baseline test for GAMs. I'd say remove it from ML.NET. #ByDesign

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, do you want me to remove both TestGamRegressionIni and TestGamBinaryClassificationIni tests?


In reply to: 243141824 [](ancestors = 243141824)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need a test that checks that the new Ini file produces the same results as the GAM. It would be ideal if you could compare the the outputs with a full file compare instead of using Assert.Equal. Is it possible to do that?


In reply to: 244387146 [](ancestors = 244387146,243141824)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's better to keep these parity tests and also prevent regressions cause by refactoring in fasttree code.

The problem with ini file comparison is the same as testing baseline comparison, which needs to extract numbers in the file and compare them with some tolerance, because verbatim comparison will fail due to changes in the higher decimal places. We already have those facilities but they're fitted for train-test / CV type tasks that are common in ML. I didn't want to modify those files for this one-off ini-file comparison. Also half of this testing is done in TLC, b/c the ini parsing code is only in TLC. The baseline metrics actually come from TLC.

More importantly, the ini-format is not an open-source format and is Microsoft IP. If we want to check-in an ini file for baseline, we should have LCA clear that.

For those reasons, I think this simple test is good enough for now.


In reply to: 244822621 [](ancestors = 244822621,244387146,243141824)

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with comments.

{
// Base case: we've reached a leaf node
Host.Assert(lower == upper + 1);
return ~lower;
Copy link
Member

@singlis singlis Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why negate lower? #Resolved

Copy link
Author

@shmoradims shmoradims Dec 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leaf values are negated in the ini format (to distinguish from the internal nodes). The base case happens when we reach leaf nodes.

I didn't want to add too much comments about the ini format details, because frankly the rest of the code is the only documentation we have for that format.


In reply to: 243443992 [](ancestors = 243443992)

// This is postorder traversal algorithm and populating the internalNodeIndices/lte/gt lists in reverse.
// Preorder is the only option, because we need the results of both left/right recursions for populating the lists.
// As a result, lists are populated in reverse, because the root node should be the first item on the lists.
var mid = (lower + upper) / 2;
Copy link
Member

@singlis singlis Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this is good information - but for me its missing a key point, which is you are building a balanced tree by dividing the array in half, then taking each half and dividing that half where each midway point becomes a node in the tree.
#Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I had mentioned binary-search-tree in the function above, which is exactly the key point you mentioned. I'll re-mention it here.


In reply to: 243444717 [](ancestors = 243444717)

var right = CreateBalancedTreeRecursive(
mid + 1, upper, internalNodeIndices, lteChild, gtChild, ref internalNodeId);
internalNodeIndices.Insert(0, mid);
lteChild.Insert(0, left);
Copy link
Contributor

@rogancarr rogancarr Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like we have one depth too many. Won't the final Right and Left children be the same? #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RegressionTree has internal nodes and leaf nodes. That extra depth ends up in the leaf nodes which ends the recursion. The final left and right nodes will be two leaf nodes, OR 1 leaf node and 1 internal node (which will subsequently end up with 2 leaf nodes).


In reply to: 243455772 [](ancestors = 243455772)

@artidoro
Copy link
Contributor

artidoro commented Jan 2, 2019

using System;

Can you add the license declaration? #Resolved


Refers to: src/Microsoft.ML.FastTree/Utils/FastTreeIniFileUtils.cs:1 in 8022e33. [](commit_id = 8022e33, deletion_comment = False)

// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
Copy link
Contributor

@artidoro artidoro Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can add back the line after the declaration it would be better. #Resolved

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


In reply to: 244804207 [](ancestors = 244804207)

@shmoradims
Copy link
Author

using System;

done


In reply to: 450932989 [](ancestors = 450932989)


Refers to: src/Microsoft.ML.FastTree/Utils/FastTreeIniFileUtils.cs:1 in 8022e33. [](commit_id = 8022e33, deletion_comment = False)

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@shmoradims shmoradims merged commit f5a029f into dotnet:master Jan 2, 2019
@shmoradims shmoradims deleted the gam_ini branch March 11, 2019 20:53
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants