Converting KMeans++ trainer to estimator #979
}

/// <summary>
/// FastTreeBinaryClassification TrainerEstimator test
FastTreeBinaryClassification (start = 12, length = 28)
fix comment #Resolved
return Math.Max(1, maxThreads);
}

private static SchemaShape.Column MakeWeightColumn(string weightColumn)
MakeWeightColumn (start = 42, length = 16)
Reuse the one from TrainerUtils. #Resolved
/// Base class for the <see cref="ISingleFeaturePredictionTransformer{TModel}"/> working on clustering tasks.
/// </summary>
/// <typeparam name="TModel">An implementation of the <see cref="IPredictorProducing{TResult}"/></typeparam>
public sealed class ClusteringPredictionTransformer<TModel> : SingleFeaturePredictionTransformerBase<TModel>
ClusteringPredictionTransformer (start = 24, length = 31)
I still don't like not making those generic on the scorer as well... #Resolved
using Microsoft.ML.Runtime.Training;
using Microsoft.ML.Runtime.Internal.Internallearn;
using Microsoft.ML.Runtime.EntryPoints;
using Microsoft.ML.Core.Data;
Fix the sort order of the using directives. #Resolved
// *** Binary format ***
// <base info>
// id of string: scorer threshold column
// id of string: scorer threshold column (start = 11, length = 41)
doesn't look right #Resolved
…inelearning-1 into KmeansPCAEstimators
{
Trainer = ComponentFactoryUtils.CreateFromFunction(host =>
new KMeansPlusPlusTrainer(host, new KMeansPlusPlusTrainer.Arguments()
new KMeansPlusPlusTrainer(host, "Features", advancedSettings: s=>
advancedSettings (start = 68, length = 16)
I think K needs to be elevated out of 'advanced settings'. #Closed
name = "mnistTiny28",
trainFilename = @"..\MNIST\Train-Tiny-28x28.txt",
testFilename = @"..\MNIST\Test-Tiny-28x28.txt"
trainFilename = @"Train-Tiny-28x28.txt",
Train (start = 30, length = 5)
intentional? #Closed
Yes. The old path comes from the TLC solution structure.
In reply to: 219997779
advancedSettings: s => { s.InitAlgorithm = KMeansPlusPlusTrainer.InitAlgorithm.KMeansParallel; });

TestEstimatorCore(pipeline, data);
}
} (start = 8, length = 1)
Call Done() at the end #Closed
/// </summary>
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
/// <param name="featureColumn">The name of the feature column.</param>
/// <param name="weightColumn">The name for the column containing the initial weight.</param>
initial (start = 78, length = 7)
This should say example weights, not initial weight. #Closed
Host.CheckUserArg(args.K > 0, nameof(args.K), "Must be positive");

// is this even necessary, if there is only one column, for example
is this even necessary, if there is only one column, for example (start = 15, length = 64)
It checks that the string is non-empty. #Closed
This calls for a
{
Host.CheckValue(args, nameof(args));
if (args == null)
args = new Arguments();
The net effect of this change is that the internal constructor now tolerates null arguments. Even though the constructor is internal, I'd prefer to keep it clean and maintain the invariants we have everywhere else. Note that the Host.CheckValue(args, nameof(args)); check was deleted.
If you had kept the check and just passed in new Arguments() from the constructor above, it would have taken no more lines of code. #Resolved
Elevating K to a constructor parameter.
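A minimal sketch of the constructor shape the thread converges on, assuming hypothetical stand-in types (`ToyKMeansTrainer`, `Arguments`) rather than the real ML.NET classes: the public constructor takes K as a first-class parameter, builds a fresh `Arguments`, applies the optional `advancedSettings` delegate, and passes the result to an internal constructor that still rejects null and re-validates K and the feature column name. It illustrates the pattern only; it is not the code merged in this PR.

```csharp
using System;

// Hypothetical stand-in for the trainer's argument class.
public sealed class Arguments
{
    public int K = 5;                         // number of clusters
    public string FeatureColumn = "Features"; // name of the feature column
}

// Hypothetical trainer illustrating the constructor pattern from the review.
public sealed class ToyKMeansTrainer
{
    public Arguments Args { get; }

    // Public constructor: K is elevated out of the advanced settings.
    public ToyKMeansTrainer(string featureColumn, int clustersCount, Action<Arguments> advancedSettings = null)
        : this(BuildArguments(featureColumn, clustersCount, advancedSettings))
    {
    }

    // Internal constructor keeps the usual invariant: args is never null.
    internal ToyKMeansTrainer(Arguments args)
    {
        if (args == null)
            throw new ArgumentNullException(nameof(args));
        if (args.K <= 0)
            throw new ArgumentOutOfRangeException(nameof(args.K), "Must be positive");
        if (string.IsNullOrWhiteSpace(args.FeatureColumn))
            throw new ArgumentException("Must be a non-empty column name", nameof(args.FeatureColumn));
        Args = args;
    }

    private static Arguments BuildArguments(string featureColumn, int clustersCount, Action<Arguments> advancedSettings)
    {
        var args = new Arguments();
        advancedSettings?.Invoke(args);
        // Explicit parameters are applied last, so they take precedence over
        // anything the advanced-settings delegate set (a choice of this sketch).
        args.FeatureColumn = featureColumn;
        args.K = clustersCount;
        return args;
    }
}
```

With this shape, `new ToyKMeansTrainer("Features", 3, s => s.K = 10)` ends up with K = 3, and `new ToyKMeansTrainer("Features", 0)` fails validation in the internal constructor instead of silently proceeding.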
…d, the KMeans extension on the clustering context and a test for it.
/// <param name="predictedLabel">The name of the predicted label column in <paramref name="data"/>.</param>
/// <param name="features">The name of the feature column in <paramref name="data"/>.</param>
/// <returns>The evaluation results.</returns>
public Result Evaluate(IDataView data, string label, string score, string predictedLabel, string features = null)
label (start = 54, length = 5)
what is 'label' and why is it required? Or 'predictedLabel' for that matter? #Resolved
OK, I double-checked. label is required only for NMI; if it isn't provided, NMI is not calculated. That makes sense, so let's make it optional.
predictedLabel is never required, as I suspected. Let's remove it.
In reply to: 220990067
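To illustrate why the label column can be optional, here is a minimal sketch that assumes plain integer arrays instead of an IDataView and uses hypothetical names (`ClusteringEvaluatorSketch`, `ClusteringResultSketch`). NMI compares predicted clusters with ground-truth labels, so it is computed only when labels are supplied, and no predicted-label argument is needed at all. The normalization used here, 2·I(Y;C) / (H(Y) + H(C)), is one common choice; the ML.NET evaluator may normalize differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type for illustration only.
public sealed class ClusteringResultSketch
{
    public int RowCount;
    public double? Nmi; // null when no label column was supplied
}

public static class ClusteringEvaluatorSketch
{
    // labels is optional: NMI is computed only when it is provided.
    public static ClusteringResultSketch Evaluate(int[] predictedClusters, int[] labels = null)
    {
        if (predictedClusters == null)
            throw new ArgumentNullException(nameof(predictedClusters));
        var result = new ClusteringResultSketch { RowCount = predictedClusters.Length };
        if (labels != null)
        {
            if (labels.Length != predictedClusters.Length)
                throw new ArgumentException("Labels and clusters must have the same length", nameof(labels));
            result.Nmi = NormalizedMutualInformation(labels, predictedClusters);
        }
        return result;
    }

    // NMI = 2 * I(Y;C) / (H(Y) + H(C)), using natural logarithms.
    private static double NormalizedMutualInformation(int[] y, int[] c)
    {
        int n = y.Length;
        if (n == 0)
            return double.NaN;

        var joint = new Dictionary<(int, int), int>();
        var yCounts = new Dictionary<int, int>();
        var cCounts = new Dictionary<int, int>();
        for (int i = 0; i < n; i++)
        {
            Increment(joint, (y[i], c[i]));
            Increment(yCounts, y[i]);
            Increment(cCounts, c[i]);
        }

        double mi = 0;
        foreach (var kv in joint)
        {
            double pxy = (double)kv.Value / n;
            double px = (double)yCounts[kv.Key.Item1] / n;
            double pc = (double)cCounts[kv.Key.Item2] / n;
            mi += pxy * Math.Log(pxy / (px * pc));
        }

        double denom = Entropy(yCounts.Values, n) + Entropy(cCounts.Values, n);
        // Both partitions trivial (one label, one cluster): treat as perfect agreement.
        return denom == 0 ? 1.0 : 2 * mi / denom;
    }

    private static double Entropy(IEnumerable<int> counts, int n)
    {
        double h = 0;
        foreach (int count in counts)
        {
            double p = (double)count / n;
            h -= p * Math.Log(p);
        }
        return h;
    }

    private static void Increment<TKey>(Dictionary<TKey, int> counts, TKey key)
    {
        counts.TryGetValue(key, out int current);
        counts[key] = current + 1;
    }
}
```

Calling `Evaluate(clusters)` without labels still returns the row count but leaves `Nmi` null, which matches the behavior the reply describes: no label, no NMI.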
Turning the label column into an optional one.
Ongoing work to address #754