
Updated xml docs for tree-based trainers. #2970

Merged
merged 2 commits into dotnet:master from shmoradims:tree_docs4 on Mar 15, 2019

@shmoradims (Contributor) commented Mar 15, 2019

Updated XML documentation for tree-based trainers (FastTree, FastForest, GAM, etc). Related to #2522.

Samples to come in a separate PR.

@shmoradims requested review from sfilipi, rogancarr, and zeahmed and removed the request for sfilipi on Mar 15, 2019

@@ -24,7 +24,7 @@
The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.
</para>
<list type='bullet'>
<item><description>In case of a binary classification problem, the output is converted to a probability by using some form of calibration.</description></item>

@zeahmed (Member) commented Mar 15, 2019

> case

Is the deletion of the word "case" a mistake? #Resolved

@shmoradims (Author, Contributor) commented Mar 15, 2019

good catch


(In reply to comment 265813438)
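As context for the calibration sentence in the hunk above (not part of the PR diff): a minimal sketch, assuming made-up tree outputs and calibrator parameters, of how a sum-of-trees raw score can be turned into a probability with a Platt-style sigmoid. The values below are illustrative only, not what ML.NET would actually learn.

```csharp
using System;
using System.Linq;

class CalibrationSketch
{
    static void Main()
    {
        // Hypothetical per-tree outputs for one instance; the ensemble's raw
        // score is simply their sum.
        double[] treeOutputs = { 0.3, -0.1, 0.25, 0.4 };
        double rawScore = treeOutputs.Sum();

        // Hypothetical calibrator parameters; a real calibrator estimates
        // these from held-out data.
        double slope = -1.0, offset = 0.0;

        // Platt-style sigmoid calibration maps the raw score into [0, 1].
        double probability = 1.0 / (1.0 + Math.Exp(slope * rawScore + offset));
        Console.WriteLine($"raw score = {rawScore}, probability = {probability:F3}");
    }
}
```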

This learner is a generalization of Poisson, compound Poisson, and gamma regression.
</summary>
<!--
The following text describes the FastForest algorithm details.

@zeahmed (Member) commented Mar 15, 2019

> FastForest

GAMs??? #Resolved

@shmoradims (Author, Contributor) commented Mar 15, 2019

right, copy-paste error


(In reply to comment 265813553)
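Background on the Tweedie summary sentence quoted above ("This learner is a generalization of Poisson, compound Poisson, and gamma regression"), stated from general knowledge of Tweedie regression rather than from the PR itself: the generalization is usually expressed through the variance-power parameter of the Tweedie family,

$$\operatorname{Var}(Y) = \phi\,\mu^{p}, \qquad p = 1:\ \text{Poisson}, \quad 1 < p < 2:\ \text{compound Poisson-gamma}, \quad p = 2:\ \text{gamma}.$$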

@@ -594,7 +643,7 @@ public abstract class BoostedTreeOptions : TreeOptions
public bool BestStepRankingRegressionTrees = false;

/// <summary>
/// Should we use line search for a step size.
/// Determines whether to we use line search for a step size.

@zeahmed (Member) commented Mar 15, 2019

> to we use

"to we use" -> "to use" #Resolved

@shmoradims (Author, Contributor) commented Mar 15, 2019

fixed


(In reply to comment 265814516)

@@ -611,11 +660,17 @@ public abstract class BoostedTreeOptions : TreeOptions
[Argument(ArgumentType.LastOccurenceWins, HelpText = "Minimum line search step size", ShortName = "minstep")]
public Double MinimumStepSize;

/// <summary>
/// The type of optimizer algorithm for setting <see cref="OptimizationAlgorithm"/>.

@zeahmed (Member) commented Mar 15, 2019

> The type of optimizer algorithm for setting <see cref="OptimizationAlgorithm"/>.

I think only "Types of optimization algorithms." would make more sense here. #Resolved

@codecov bot commented Mar 15, 2019

Codecov Report

Merging #2970 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2970      +/-   ##
==========================================
+ Coverage   72.29%   72.29%   +<.01%     
==========================================
  Files         796      796              
  Lines      142349   142349              
  Branches    16051    16051              
==========================================
+ Hits       102905   102909       +4     
+ Misses      35063    35061       -2     
+ Partials     4381     4379       -2
Flag Coverage Δ
#Debug 72.29% <100%> (ø) ⬆️
#production 68.01% <100%> (ø) ⬆️
#test 88.48% <ø> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.FastTree/GamClassification.cs 89% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeArguments.cs 85.38% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeRegression.cs 54.5% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 94.18% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeRanking.cs 48.19% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeTweedie.cs 56.29% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/GamTrainer.cs 90.38% <ø> (ø) ⬆️
...rc/Microsoft.ML.FastTree/FastTreeClassification.cs 78.19% <ø> (ø) ⬆️
...rc/Microsoft.ML.FastTree/RandomForestRegression.cs 59.9% <ø> (ø) ⬆️
src/Microsoft.ML.FastTree/GamRegression.cs 89.09% <ø> (ø) ⬆️
... and 5 more
@zeahmed (Member) left a comment

Overall it LGTM. I have left a few comments. Hope you will address those before merging.

@singlis (Member) commented Mar 15, 2019

/// The base class for all unsupervised learner inputs that support a weight column.

Is it worth changing all "learner" instances to "trainer" in this file?
#Resolved


Refers to: src/Microsoft.ML.Data/Training/TrainerInputBase.cs:85 in 0966dca.

namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression
{
public static class FastTree
{

@singlis (Member) commented Mar 15, 2019

> {

So do we need Binary and Ranking examples for FastTree? #Resolved

@shmoradims (Author, Contributor) commented Mar 15, 2019

All the samples will come in my next PR. I added this one as a template for discussion. Please see my email about in-memory samples.


(In reply to comment 265818226)

@singlis (Member) commented Mar 15, 2019

    /// <param name="featureToInputMap">A map from the feature shape functions (as described by the binUpperBounds and BinEffects)

Consider using crefs for binUpperBounds and BinEffects. I know this is internal, but a cref would stay bound to the variable names if they ever change. #Resolved


Refers to: src/Microsoft.ML.FastTree/GamClassification.cs:189 in 0966dca.
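To illustrate the cref suggestion above: a hypothetical sketch (the class and member names below are placeholders, not the actual identifiers in GamClassification.cs) of how the parameter doc could reference the fields so that renames are caught at build time.

```csharp
// Illustrative only: GamDocExample and its members are hypothetical stand-ins
// for the real internal fields in GamClassification.cs.
internal sealed class GamDocExample
{
    internal double[][] BinUpperBounds;
    internal double[][] BinEffects;

    /// <param name="featureToInputMap">A map from the feature shape functions
    /// (as described by <see cref="BinUpperBounds"/> and <see cref="BinEffects"/>)
    /// to the input feature indices.</param>
    internal void SetFeatureMap(int[] featureToInputMap)
    {
        // No-op; only the XML doc comment above matters for this sketch.
    }
}
```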

@singlis (Member) left a comment

:shipit:

@shmoradims (Author, Contributor) commented Mar 15, 2019

    /// <param name="featureToInputMap">A map from the feature shape functions (as described by the binUpperBounds and BinEffects)

done


(In reply to comment 473121003)


Refers to: src/Microsoft.ML.FastTree/GamClassification.cs:189 in 0966dca.

@shmoradims (Author, Contributor) commented Mar 15, 2019

/// The base class for all unsupervised learner inputs that support a weight column.

done


(In reply to comment 473118846)


Refers to: src/Microsoft.ML.Data/Training/TrainerInputBase.cs:85 in 0966dca.

var pipeline = mlContext.BinaryClassification.Trainers.FastTree();

// Train the model.
var model = pipeline.Fit(data);

@wschin (Member) commented Mar 15, 2019

This is a minimal version without prediction. I personally would like to see in-memory prediction, which is what will happen immediately in production.

Why is in-memory prediction important?
(1) Users have no idea about the IDataView produced by the trained model. If we don't tell them how to extract data into a C# data structure, they will have to look for tutorials on IDataView, ITransformer, and the IDataView-C# bridge.
(2) The prediction format varies across models and is ML.NET-specific, so it's also hard to figure out which one should be used.
(3) Prediction is how the trained model will be used. One might think scikit-learn doesn't do this, so we shouldn't. My suggestion is we should! Here is my reason: scikit-learn produces numpy data structures and everyone knows how to manipulate them (by Googling for numpy), but IDataView is not at that stage yet.
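A minimal sketch of the in-memory prediction step suggested above, continuing the FastTree sample quoted earlier and assuming its DataPoint class; the Prediction class below uses the standard binary-classification output columns, but everything here is illustrative rather than the sample's actual code.

```csharp
using System;
using Microsoft.ML;

// Hypothetical output class: PredictedLabel/Probability/Score are the usual
// binary-classification output columns.
public class Prediction
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

// ... after `var model = pipeline.Fit(data);` in the sample above:
var engine = mlContext.Model.CreatePredictionEngine<DataPoint, Prediction>(model);

// Score a single in-memory example (the sample's 50 features would be filled in here).
var prediction = engine.Predict(new DataPoint());
Console.WriteLine($"Predicted label: {prediction.PredictedLabel}, " +
                  $"probability: {prediction.Probability:F4}");
```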

}
}

private class DataPoint

@wschin (Member) commented Mar 15, 2019

This is really good! An example with 50 features!

@shmoradims shmoradims merged commit b6f94bc into dotnet:master Mar 15, 2019

2 of 3 checks passed

MachineLearning-CodeCoverage #20190315.1 failed
MachineLearning-CI #20190315.1 succeeded
license/cla: All CLA requirements met.

@shauheen shauheen added this to the 0319 milestone Mar 15, 2019

@shmoradims shmoradims deleted the shmoradims:tree_docs4 branch Mar 15, 2019
