-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Implement ICanSaveInIniFormat interface for GamPredictor #1929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
OVERALL RESULTS --------------------------------------- L1(avg): 0.090945 (0.0000) L2(avg): 0.024789 (0.0000) RMS(avg): 0.157445 (0.0000) Loss-fn(avg): 0.024789 (0.0000) R Squared: 0.891027 (0.0000)
Model creation:
F:\Git\Tlc3\bin\Debug>maml.exe ini ini=F:\temp\ini\model1.ini out=F:\temp\ini\model1.ini.zip kind=Regression data=F:\temp\ini\data\breast-cancer-noNan.txt loader=TextLoader{col=Label:R4:0 col=F1:R4:1 col=F3:R4:3 col=F6:R4:6 col=F7:R4:7 col
=F9:R4:9} xf=Concat{col=Features:F1,F9,F7,F6}
Parity Results:
OVERALL RESULTS
---------------------------------------
L1(avg): 0.120008 (0.0000)
L2(avg): 0.036569 (0.0000)
RMS(avg): 0.191230 (0.0000)
Loss-fn(avg): 0.036569 (0.0000)
R Squared: 0.839242 (0.0000)
OVERALL RESULTS --------------------------------------- L1(avg): 0.120008 (0.0000) L2(avg): 0.036569 (0.0000) RMS(avg): 0.191230 (0.0000) Loss-fn(avg): 0.036569 (0.0000) R Squared: 0.839242 (0.0000)
// OVERALL RESULTS // --------------------------------------- // L1(avg): 0.093257 (0.0000) // L2(avg): 0.025707 (0.0000) // RMS(avg): 0.160336 (0.0000) // Loss-fn(avg): 0.025707 (0.0000) // R Squared: 0.886203 (0.0000)
OVERALL RESULTS --------------------------------------- AUC: 0.995452 (0.0000) Accuracy: 0.969957 (0.0000) Positive precision: 0.950820 (0.0000) Positive recall: 0.962656 (0.0000) Negative precision: 0.980220 (0.0000) Negative recall: 0.973799 (0.0000) Log-loss: 0.115940 (0.0000) Log-loss reduction: 87.524160 (0.0000) F1 Score: 0.956701 (0.0000) AUPRC: 0.989809 (0.0000)
| public Dictionary<string, object> KeyValues { get; } | ||
| } | ||
| } | ||
| internal static class FastTreeIniFormatUtils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this as a separate file in FastTree/Utils? I'm not sure we need it in FastTree.cs #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| void ICanSaveInIniFormat.SaveAsIni(TextWriter writer, RoleMappedSchema schema, ICalibrator calibrator) | ||
| { | ||
| Host.CheckValue(writer, nameof(writer)); | ||
| Host.CheckValue(schema, nameof(schema)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to check here as you don't use it in this function. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| { | ||
| Host.CheckValue(writer, nameof(writer)); | ||
| Host.CheckValue(schema, nameof(schema)); | ||
| Host.CheckValueOrNull(calibrator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to check here, as you don't use it in this function. (And you check below.) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| { | ||
| host.CheckValue(ensemble, nameof(ensemble)); | ||
| host.CheckValue(schema, nameof(schema)); | ||
| host.CheckValueOrNull(calibrator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to check here, as you don't use it here. And you check in the accessory function. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public void SaveAsIni(TextWriter writer, RoleMappedSchema schema, ICalibrator calibrator = null) | ||
| { | ||
| Host.CheckValue(writer, nameof(writer)); | ||
| Host.CheckValue(schema, nameof(schema)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to check here, as you don't use it, and it is asserted when it is. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| { | ||
| Host.CheckValue(writer, nameof(writer)); | ||
| Host.CheckValue(schema, nameof(schema)); | ||
| Host.CheckValueOrNull(calibrator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to check here, as you don't use it, and it is asserted when it is. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // GAM bins should be converted to balanced trees / binary search trees | ||
| // so that scoring takes O(log(n)) instead of O(n). The following utility | ||
| // creates a balanced tree. | ||
| private (float[], int[], int[]) CreateBalancedTree(int numInternalNodes, double[] binThresholds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does returning a list limit what C# or .NET versions we can target? If so, we can switch this to an out pattern. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuples are heavily used as return type in Static API, so using it here doesn't create more version restrictions than what we already have in the rest of the code. I think this way it fine.
In reply to: 243140927 [](ancestors = 243140927)
|
|
||
| // Tried adding the intercept as the bias term in the final ini aggregator, | ||
| // but that didn't seem to have any effects during testing. | ||
| // Adding the intercept as a dummy tree with the output values being the model intercept, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good comment for review, but for the final check-in, just keep this line. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| // Getting parity results from maml.exe: | ||
| // maml.exe ini ini=model.ini out=model_ini.zip data=breast-cancer.txt loader=TextLoader{col=Label:R4:0 col=Features:R4:1-9} xf=NAHandleTransform{col=Features slot=- ind=-} kind=Regression | ||
| Assert.Equal(0.093256807643323947, results.L1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this test. On one hand, it's good to make sure that it can produce the score. On the other hand, it's a new baseline test for GAMs. I'd say remove it from ML.NET. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do you want me to remove both TestGamRegressionIni and TestGamBinaryClassificationIni tests?
In reply to: 243141824 [](ancestors = 243141824)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need a test that checks that the new Ini file produces the same results as the GAM. It would be ideal if you could compare the the outputs with a full file compare instead of using Assert.Equal. Is it possible to do that?
In reply to: 244387146 [](ancestors = 244387146,243141824)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's better to keep these parity tests and also prevent regressions cause by refactoring in fasttree code.
The problem with ini file comparison is the same as testing baseline comparison, which needs to extract numbers in the file and compare them with some tolerance, because verbatim comparison will fail due to changes in the higher decimal places. We already have those facilities but they're fitted for train-test / CV type tasks that are common in ML. I didn't want to modify those files for this one-off ini-file comparison. Also half of this testing is done in TLC, b/c the ini parsing code is only in TLC. The baseline metrics actually come from TLC.
More importantly, the ini-format is not an open-source format and is Microsoft IP. If we want to check-in an ini file for baseline, we should have LCA clear that.
For those reasons, I think this simple test is good enough for now.
In reply to: 244822621 [](ancestors = 244822621,244387146,243141824)
rogancarr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with comments.
| { | ||
| // Base case: we've reached a leaf node | ||
| Host.Assert(lower == upper + 1); | ||
| return ~lower; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why negate lower? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The leaf values are negated in the ini format (to distinguish from the internal nodes). The base case happens when we reach leaf nodes.
I didn't want to add too much comments about the ini format details, because frankly the rest of the code is the only documentation we have for that format.
In reply to: 243443992 [](ancestors = 243443992)
| // This is postorder traversal algorithm and populating the internalNodeIndices/lte/gt lists in reverse. | ||
| // Preorder is the only option, because we need the results of both left/right recursions for populating the lists. | ||
| // As a result, lists are populated in reverse, because the root node should be the first item on the lists. | ||
| var mid = (lower + upper) / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this is good information - but for me its missing a key point, which is you are building a balanced tree by dividing the array in half, then taking each half and dividing that half where each midway point becomes a node in the tree.
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I had mentioned binary-search-tree in the function above, which is exactly the key point you mentioned. I'll re-mention it here.
In reply to: 243444717 [](ancestors = 243444717)
| var right = CreateBalancedTreeRecursive( | ||
| mid + 1, upper, internalNodeIndices, lteChild, gtChild, ref internalNodeId); | ||
| internalNodeIndices.Insert(0, mid); | ||
| lteChild.Insert(0, left); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like we have one depth too many. Won't the final Right and Left children be the same? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RegressionTree has internal nodes and leaf nodes. That extra depth ends up in the leaf nodes which ends the recursion. The final left and right nodes will be two leaf nodes, OR 1 leaf node and 1 internal node (which will subsequently end up with 2 leaf nodes).
In reply to: 243455772 [](ancestors = 243455772)
| // The .NET Foundation licenses this file to you under the MIT license. | ||
| // See the LICENSE file in the project root for more information. | ||
|
|
||
| using System; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can add back the line after the declaration it would be better. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
artidoro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
Add support for saving GAM models as ini format. Parity results with maml are included in the tests.