Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline the results file for cross-validation task #372

Merged
merged 14 commits into from
Oct 18, 2017

Conversation

desilinguist
Copy link
Member

@desilinguist desilinguist commented Oct 18, 2017

The .results file is produced for the evaluate and the cross_validate tasks. Currently, for the cross_validate task, the .results file (a) prints out the entire folds dictionary if folds_file is specified (b) includes details that aren't relevant to the experiment.

In this branch, I streamline the results file in the following way:

  • Include the path to the folds file when specified. This required the modification of the return tuple from _parse_config_file to include the folds file path.
  • Do not include stratified folds information when a folds file is specified since it's not used in that context.
  • Modify number of CV folds to be <n> via folds file if a folds file is specified.
  • Modify grid search folds to be <n> via folds file if the grid search ends up using the folds file.
  • Show the value of use_folds_file_for_grid_search when appropriate.
  • Show grid search related information (grid_search_folds and grid_objective) only when we are actually doing grid search.

To test this, I also included an entirely new test in test_output.py. Additional changes in the branch include:

  • Modify some additional tests to deal with the modifications to the .results and, hence, .results.json files.
  • Minor changes to docstrings and documentation.

- Include folds file path as one of the return values so that it can be included in the results file.
- Include the path to the folds file when specified.
- Do not include stratified folds information when folds file is specified.
- Modify number of CV folds to be `K via folds file` if a folds file is specified.
- Modify grid search folds to be `K via folds file` if they also end up using the folds file.
- Show the value of `use_folds_file_for_grid_search` when appropriate.
@coveralls
Copy link

coveralls commented Oct 18, 2017

Coverage Status

Coverage increased (+0.04%) to 92.026% when pulling 7c8e800 on fix/streamline-fancy-results-file into 0052077 on master.

mulhod
mulhod previously requested changes Oct 18, 2017
Copy link
Contributor

@mulhod mulhod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think I found a couple typo bugs. Not really sure.

if lrd['task'] == 'cross_validate':
print('Number of Folds: {}'.format(lrd['cv_folds']),
file=output_file)
print('Stratified Folds: {}'.format(lrd['stratified_folds']),
file=output_file)
if not lrd['cv_folds'].endswith('folds file'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "folds file" supposed to be "folds_file" here? Can we just make it check if the string is equal to "folds file"/"folds_file" rather than using endswith or is there some reason that might not work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's trying to match the string on the right hand side of the colon in the results file which should say 3 via folds file so it will always ends with "folds file" not "folds_file".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. Misread that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. If you want to see some example output for various conditions, you can play around with cross_validate.cfg in the titanic example and look at the results files.

file=output_file)
if (lrd['task'] == 'cross_validate' and
lrd['grid_search'] and
lrd['cv_folds'].endswith('folds file')):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See reply above.

@desilinguist desilinguist dismissed mulhod’s stale review October 18, 2017 19:45

Matt didn't actually mean to request changes :)

@@ -170,7 +171,7 @@ def check_summary_score(use_feature_hashing=False):
# the learner results dictionaries should have 29 rows,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment needs to be updated with the new total number of rows.

@desilinguist desilinguist merged commit 4a1cc23 into master Oct 18, 2017
@desilinguist desilinguist deleted the fix/streamline-fancy-results-file branch October 18, 2017 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants