Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename CV and TrainTest "stratification" parameter #2537

Merged

Conversation

rogancarr
Copy link
Contributor

@rogancarr rogancarr commented Feb 13, 2019

This PR changes the CrossValidation and TrainTest parameter StratificationColumn to be GroupPreservationColumn and updates the docstrings to give a clearer explanation.

Fixes #2536

See conversation in #2536

Related to #1204, but that issue might be asking for further coverage.

@rogancarr
Copy link
Contributor Author

Something has gone weird with my master branch, and I'm seeing every merge from upstream as a commit. I'll look into this, but let's work with it for now ;)

@codecov
Copy link

codecov bot commented Feb 13, 2019

Codecov Report

Merging #2537 into master will increase coverage by 0.17%.
The diff coverage is 76.19%.

@@            Coverage Diff             @@
##           master    #2537      +/-   ##
==========================================
+ Coverage   71.26%   71.44%   +0.17%     
==========================================
  Files         797      801       +4     
  Lines      141292   141855     +563     
  Branches    16118    16141      +23     
==========================================
+ Hits       100692   101346     +654     
+ Misses      36138    36041      -97     
- Partials     4462     4468       +6
Flag Coverage Δ
#Debug 71.44% <76.19%> (+0.17%) ⬆️
#production 67.73% <73.68%> (+0.14%) ⬆️
#test 85.53% <100%> (+0.17%) ⬆️

/// If the <paramref name="stratificationColumn"/> is not provided, the random numbers generated to create it, will use this seed as value.
/// And if it is not provided, the default value will be used.</param>
public TrainTestData TrainTestSplit(IDataView data, double testFraction = 0.1, string stratificationColumn = null, uint? seed = null)
/// <param name="groupPreservationColumn">Name of a column to use as an ID for grouping rows. If two examples share the same value of the <paramref name="groupPreservationColumn"/>,
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you left that description from previous iteration with idColumn.
Do you want to rephrase it, since ID looks weird. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also ID is already a thing on IDs, naming something different that would be super confusing.

public abstract ValueGetter<DataViewRowId> GetIdGetter();

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I @rogancarr , thanks for looking at this.

Group column is already an established name used to describe actual things that are intended to form a group. This is more general than this. So I do not like the name "group preservation." It's far too close. "Hey I specified this group column, how come my ranking metrics are all funky?" "Ah, you did not specify the group column, but the group preservation column!" Looks silly.

I'd almost prefer a random name drawn out of a dictionary to this. That would just look weird, but this will just be consistently confusing. I'll post suggestions in issue.

@TomFinley
Copy link
Contributor

Something has gone weird with my master branch, and I'm seeing every merge from upstream as a commit. I'll look into this, but let's work with it for now ;)

You commited directly into master branch on your own fork about a week ago, somehow didn't catch it, pushed that cahnge, and you've been doing pulls ever since.

Once you're ready to restore your master branch, the easiest way to do this would be to git reset --hard upstream/master (assuming you set up the original remote name as upstream, as most people here seem to, if the original source repo remote you gave some other name, obviously replace that), which will update your local branches. Then git push -f to force push that update the master on your origin branch. Thenceforth remember that commiting directly to master is a really bad idea.

@rogancarr
Copy link
Contributor Author

You commited directly into master branch on your own fork about a week ago, somehow didn't catch it, pushed that cahnge, and you've been doing pulls ever since.

Once you're ready to restore your master branch, the easiest way to do this would be to git reset --hard upstream/master (assuming you set up the original remote name as upstream, as most people here seem to, if the original source repo remote you gave some other name, obviously replace that), which will update your local branches. Then git push -f to force push that update the master on your origin branch. Thenceforth remember that commiting directly to master is a really bad idea.

Yes. I fixed that a couple days ago, but this PR still has the remnants :)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rogancarr , this makes me a bit more comfortable.

Copy link
Member

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@rogancarr rogancarr merged commit 832ecad into dotnet:master Feb 15, 2019
@rogancarr rogancarr deleted the 2536_rename_stratification_parameter branch February 15, 2019 21:56
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants