New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of value tuples in TrainTest and CrossValidation #2507

Merged
merged 3 commits into from Feb 12, 2019

Conversation

Projects
None yet
3 participants
@Ivanidzo4ka
Copy link
Member

Ivanidzo4ka commented Feb 11, 2019

Fixes #2501

@Ivanidzo4ka Ivanidzo4ka requested review from rogancarr , sfilipi and TomFinley Feb 11, 2019

}
}

public class CrossValidationResult<T> where T : class

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

CrossValidationResult [](start = 21, length = 21)

Summary #Closed

/// <summary>
/// Metrics for cross validation fold.
/// </summary>
public readonly T Metrics;

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

T [](start = 28, length = 1)

Part of me want to create interface and mark with it all Metric classes. Another part of me remembers what we try get rid of all empty interfaces.
Maybe empty base class?
Would like to hear your comments as well.

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

+1 for some way of making this not generic.


In reply to: 255714622 [](ancestors = 255714622)

/// </summary>
public readonly ITransformer Model;
/// <summary>
/// <see cref="IDataView"/> for scored fold.

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

for scored fold [](start = 40, length = 15)

Scored test fold? Any one has good ideas? #Resolved

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

"The scored hold-out set for this fold"


In reply to: 255714995 [](ancestors = 255714995)

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

ScoredHoldOutSet is literal, but long.


In reply to: 255726140 [](ancestors = 255726140,255714995)

/// </summary>
public readonly int Fold;

public CrossValidationResult(ITransformer model, T metrics, IDataView scores, int fold)

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

CrossValidationResult [](start = 19, length = 21)

summary or internal, probably internal. #Closed

@Ivanidzo4ka

This comment has been minimized.

Copy link
Member Author

Ivanidzo4ka commented Feb 11, 2019

    public (BinaryClassificationMetrics metrics, ITransformer model, IDataView scoredTestData)[] CrossValidateNonCalibrated(

miss this one. #Closed


Refers to: src/Microsoft.ML.Data/TrainCatalog.cs:344 in 3d9b174. [](commit_id = 3d9b174, deletion_comment = False)

/// A pair of datasets, for the train and test set.
/// </summary>
public struct TrainTestData
{

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

Why struct? #Resolved

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

Why class?
it's a small collection of immutable objects.


In reply to: 255723990 [](ancestors = 255723990)

}
}
/// <summary>
/// Results of running crossvalidation.

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

crossvalidation [](start = 31, length = 15)

two words #Resolved

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

hyphenated


In reply to: 255724522 [](ancestors = 255724522)

/// A pair of datasets, for the train and test set.
/// </summary>
public struct TrainTestData
{

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

I could see us wanting to expand this in the future to also include a validation set, so it could be prudent to keep the name a bit vague. PartitionedData? #Resolved

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

What would be summary for PartitionedData?
This one used as output for TrainTestSplit function, if we ever introduce TrainTestValidationSplit I would prefer to create another object for that.


In reply to: 255725077 [](ancestors = 255725077)

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

Good point!


In reply to: 255725959 [](ancestors = 255725959,255725077)

}

/// <summary>
/// Results for specific cross validation fold.

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

cross validation [](start = 33, length = 16)

Hyphenated #Resolved

/// Results of running crossvalidation.
/// </summary>
/// <typeparam name="T">Type of metric class</typeparam>
public class CrossValidationResult<T> where T : class

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

CrossValidationResult [](start = 21, length = 21)

sealed. #Resolved

/// <summary>
/// Fold number.
/// </summary>
public readonly int Fold;

This comment has been minimized.

@rogancarr

rogancarr Feb 11, 2019

Contributor

public readonly int Fold [](start = 12, length = 24)

Is this necessary, since they will be returned in an (ordered) array? And it'll be confusing if the order of the array doesn't match the fold number. #Resolved

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

Is it necessary ?- no, is it nice to have? - I think so. As soon as you tear them away from array, and let's say you want to report specific folds and do some stuff on them, you need to pass around fold anyway.

it'll be confusing if the order of the array doesn't match the fold number.
Not the case right now, but I would prefer to know real fold than index in array.


In reply to: 255727531 [](ancestors = 255727531)

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 11, 2019

    public static (DataView<T> trainSet, DataView<T> testSet) TrainTestSplit<T>(this TrainCatalogBase catalog,

This should return the TrainTestSplit object too.


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:34 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 11, 2019

    /// <typeparam name="T">The tuple describing the data schema.</typeparam>

Another Tuple we will want to get rid of.


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 11, 2019

    public static (RegressionMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidate<TInShape, TOutShape, TTransformer>(

Tuple to Object


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:76 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 11, 2019

    public static (MultiClassClassifierMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidate<TInShape, TOutShape, TTransformer>(

Tuple => Object.


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:134 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 11, 2019

    public static (BinaryClassificationMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidateNonCalibrated<TInShape, TOutShape, TTransformer>(

Tuple to Object


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:192 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@Ivanidzo4ka

This comment has been minimized.

Copy link
Member Author

Ivanidzo4ka commented Feb 11, 2019

    /// <typeparam name="T">The tuple describing the data schema.</typeparam>

If I correctly understand @TomFinley, he is fine with tuples in StaticPipe project. Can be wrong. Would prefer to hear words of wisdom from him.


In reply to: 462528351 [](ancestors = 462528351)


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 11, 2019

    public static (CalibratedBinaryClassificationMetrics metrics, Transformer<TInShape, TOutShape, TTransformer> model, DataView<TOutShape> scoredTestData)[] CrossValidate<TInShape, TOutShape, TTransformer>(

Tuple to Object


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:250 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr
Copy link
Contributor

rogancarr left a comment

Overall, looks good! Just a few comments.

Question: Does this compile? There are quite a few samples that use this API that you haven't bumped yet.

@codecov

This comment has been minimized.

Copy link

codecov bot commented Feb 12, 2019

Codecov Report

Merging #2507 into master will increase coverage by <.01%.
The diff coverage is 76.81%.

@@            Coverage Diff             @@
##           master    #2507      +/-   ##
==========================================
+ Coverage   71.24%   71.24%   +<.01%     
==========================================
  Files         798      798              
  Lines      141231   141252      +21     
  Branches    16112    16112              
==========================================
+ Hits       100623   100641      +18     
- Misses      36142    36145       +3     
  Partials     4466     4466
Flag Coverage Δ
#Debug 71.24% <76.81%> (ø) ⬆️
#production 67.57% <66.66%> (ø) ⬆️
#test 85.35% <100%> (ø) ⬆️
@rogancarr

This comment has been minimized.

Copy link
Contributor

rogancarr commented Feb 12, 2019

    /// <typeparam name="T">The tuple describing the data schema.</typeparam>

Got it. No worries, if that's the case.


In reply to: 462529091 [](ancestors = 462529091,462528351)


Refers to: src/Microsoft.ML.StaticPipe/TrainingStaticExtensions.cs:23 in 0790005. [](commit_id = 0790005, deletion_comment = False)

@rogancarr
Copy link
Contributor

rogancarr left a comment

:shipit:

@Ivanidzo4ka Ivanidzo4ka requested a review from yaeldekel Feb 12, 2019

@@ -263,13 +344,14 @@ internal BinaryClassificationTrainers(BinaryClassificationCatalog catalog)
/// If the <paramref name="stratificationColumn"/> is not provided, the random numbers generated to create it, will use this seed as value.
/// And if it is not provided, the default value will be used.</param>
/// <returns>Per-fold results: metrics, models, scored datasets.</returns>

This comment has been minimized.

@sfilipi

sfilipi Feb 12, 2019

Member

Per-fold results [](start = 21, length = 16)

maybe<see cref="CrossValidationResult"

@sfilipi
Copy link
Member

sfilipi left a comment

:shipit:

@Ivanidzo4ka Ivanidzo4ka merged commit 56607ba into dotnet:master Feb 12, 2019

3 checks passed

MachineLearning-CI #20190211.18 succeeded
Details
MachineLearning-CodeCoverage #20190211.18 succeeded
Details
license/cla All CLA requirements met.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment