-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streamline the results file for cross-validation task #372
Conversation
- Include folds file path as one of the return values so that it can be included in the results file.
- Include the path to the folds file when specified. - Do not include stratified folds information when folds file is specified. - Modify number of CV folds to be `K via folds file` if a folds file is specified. - Modify grid search folds to be `K via folds file` if they also end up using the folds file. - Show the value of `use_folds_file_for_grid_search` when appropriate.
- Modify tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I think I found a couple typo bugs. Not really sure.
if lrd['task'] == 'cross_validate': | ||
print('Number of Folds: {}'.format(lrd['cv_folds']), | ||
file=output_file) | ||
print('Stratified Folds: {}'.format(lrd['stratified_folds']), | ||
file=output_file) | ||
if not lrd['cv_folds'].endswith('folds file'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "folds file" supposed to be "folds_file" here? Can we just make it check if the string is equal to "folds file"/"folds_file" rather than using endswith or is there some reason that might not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it's trying to match the string on the right hand side of the colon in the results file which should say 3 via folds file
so it will always ends with "folds file" not "folds_file".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see. Misread that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. If you want to see some example output for various conditions, you can play around with cross_validate.cfg
in the titanic example and look at the results files.
file=output_file) | ||
if (lrd['task'] == 'cross_validate' and | ||
lrd['grid_search'] and | ||
lrd['cv_folds'].endswith('folds file')): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See reply above.
Matt didn't actually mean to request changes :)
tests/test_output.py
Outdated
@@ -170,7 +171,7 @@ def check_summary_score(use_feature_hashing=False): | |||
# the learner results dictionaries should have 29 rows, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment needs to be updated with the new total number of rows.
The
.results
file is produced for theevaluate
and thecross_validate
tasks. Currently, for thecross_validate
task, the.results
file (a) prints out the entire folds dictionary iffolds_file
is specified (b) includes details that aren't relevant to the experiment.In this branch, I streamline the results file in the following way:
_parse_config_file
to include the folds file path.<n> via folds file
if a folds file is specified.<n> via folds file
if the grid search ends up using the folds file.use_folds_file_for_grid_search
when appropriate.grid_search_folds
andgrid_objective
) only when we are actually doing grid search.To test this, I also included an entirely new test in
test_output.py
. Additional changes in the branch include:.results
and, hence,.results.json
files.