Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning TrainCatalog and RecommenderCatalog #2973

Merged
merged 5 commits into from Mar 19, 2019

Conversation

@artidoro
Copy link
Contributor

artidoro commented Mar 15, 2019

Fixes #2972.

In this PR I clean TrainCatalog (commit 1) and RecommenderCatalog (commit 2).
The other commits are simply the adjustments in the code that need to be done to compile.

@artidoro artidoro added the api label Mar 15, 2019

@artidoro artidoro added this to the 0319 milestone Mar 15, 2019

@artidoro artidoro self-assigned this Mar 15, 2019

@artidoro artidoro requested review from Ivanidzo4ka , wschin , abgoswam and sfilipi Mar 15, 2019

@codecov

This comment has been minimized.

Copy link

codecov bot commented Mar 15, 2019

Codecov Report

Merging #2973 into master will increase coverage by 0.36%.
The diff coverage is 76.81%.

@@            Coverage Diff             @@
##           master    #2973      +/-   ##
==========================================
+ Coverage   72.35%   72.71%   +0.36%     
==========================================
  Files         803      803              
  Lines      143290   145370    +2080     
  Branches    16155    16766     +611     
==========================================
+ Hits       103673   105710    +2037     
- Misses      35191    35231      +40     
- Partials     4426     4429       +3
Flag Coverage Δ
#Debug 72.71% <76.81%> (+0.36%) ⬆️
#production 68.43% <71.42%> (+0.36%) ⬆️
#test 88.8% <100%> (+0.28%) ⬆️
Impacted Files Coverage Δ
...cenariosWithDirectInstantiation/TensorflowTests.cs 90.81% <100%> (ø) ⬆️
test/Microsoft.ML.Tests/Scenarios/Api/TestApi.cs 97.61% <100%> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/Validation.cs 100% <100%> (ø) ⬆️
...ests/TrainerEstimators/MatrixFactorizationTests.cs 97.17% <100%> (+0.25%) ⬆️
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 93.12% <100%> (-0.41%) ⬇️
...soft.ML.Data/DataLoadSave/DataOperationsCatalog.cs 71.11% <100%> (-2.13%) ⬇️
test/Microsoft.ML.Functional.Tests/Evaluation.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Recommender/RecommenderCatalog.cs 70.83% <42.85%> (ø) ⬆️
src/Microsoft.ML.Data/TrainCatalog.cs 84.11% <72.72%> (ø) ⬆️
...rosoft.ML.Data/DataLoadSave/CompositeDataLoader.cs 73.23% <0%> (-16.35%) ⬇️
... and 23 more
@abgoswam
Copy link
Member

abgoswam left a comment

:shipit:

IDataView data, IEstimator<ITransformer> estimator, int numFolds = 5, string labelColumn = DefaultColumnNames.Label,
string stratificationColumn = null, int? seed = null)
IDataView data, IEstimator<ITransformer> estimator, int numberOfFolds = 5, string labelColumnName = DefaultColumnNames.Label,
string samplingKeyColumn = null, int? seed = null)

This comment has been minimized.

@wschin

wschin Mar 15, 2019

Member

samplingKeyColumn [](start = 19, length = 17)

Would you consider FoldColumnName? #Resolved

This comment has been minimized.

@artidoro

artidoro Mar 18, 2019

Author Contributor

Wei-Sheng suggested partitionColumnName`, which I think reflects the role of the column better.


In reply to: 266177816 [](ancestors = 266177816)

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Mar 18, 2019

Member

#2537
#2536
We had discussion regarding name of this column.
If you want to change it again, I would advice ask opinion of @rogancarr and @TomFinley

This comment has been minimized.

@TomFinley

TomFinley Mar 19, 2019

Contributor

So, I also considered in the issues @Ivanidzo4ka linked the merits of "partition" in my own thinking, though I did not write it down (as I had internally rejected it). I will now write down why I rejected it here, and you can see if you agree or disagree.

The major objection is that, if I look forward, I anticipate a future where this same sort of key is useful in any sampling utilities. (There are multiple ways to sample a dataset than simple partitionings!) For the sake of that more forward looking perspective, I wanted it named something else. And we ultimately settled on "sampling key."

So I would prefer if we did not act on this suggestion.

This comment has been minimized.

@TomFinley

TomFinley Mar 19, 2019

Contributor

Incidentally I wouldn't consider it a disaster if we did this. (Keeping it as stratification would have been a disaster! 😄) It merely is slightly misleading I think. But of course I also see @wschin's point.

This comment has been minimized.

@artidoro

artidoro Mar 19, 2019

Author Contributor

Thank you @Ivanidzo4ka, and @TomFinley for your comments on this, I am reverting the commit to leave the name of samplingKeyColumnName as is.

@wschin

wschin approved these changes Mar 15, 2019

Copy link
Member

wschin left a comment

Only have a comment to naming. Please let know if it doesn't make sense to you.

@artidoro artidoro force-pushed the artidoro:scrubbing branch 2 times, most recently from 822686e to 9021a94 Mar 18, 2019

@artidoro artidoro merged commit 71693b3 into dotnet:master Mar 19, 2019

2 of 3 checks passed

MachineLearning-CodeCoverage in progress
Details
MachineLearning-CI #20190318.54 succeeded
Details
license/cla All CLA requirements met.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.