Add warnings for conflicts between fixed params and param grids #297

mulhod
Contributor

@mulhod mulhod commented Feb 13, 2016

Added warnings for conflicts between fixed parameters and parameter grids, and for the case where grid search is set to False but parameter grids are specified.

…arnings in config.py so that user will be warned of conflicts and if grid_search is False and param_grids/fixed_parameters are specified (or at least one of them)
skll/config.py Outdated
'cases.'.format(learner,
', '.join(overlap_params)))
else:
if param_grid_list or fixed_parameter_list:
Member

What's the problem with specifying fixed_parameter_list even if we aren't doing grid search?

Contributor Author

Fixed it

@desilinguist desilinguist changed the title from "Feature/warning needed for conflicts between fixed params and param grids 185" to "Add warnings for conflicts between fixed params and param grids" on Feb 15, 2016
@desilinguist
Member

Okay, @aoifecahill and I talked about this a bit and we think this is a little bit trickier:

  1. If users specify a parameter in fixed_params but also specified the same parameter as part of their parameter grid to search, that should actually be an error since it's not clear what their intent is.
  2. However, if users specify a parameter in fixed_params but have only specified grid_search=True and the default parameter grid happens to include said parameter, there should be a warning saying that the fixed parameter will be ignored. But that warning should also tell the user how to avoid this if they want - by specifying a parameter grid explicitly which does not include this parameter.
  3. We should also update the documentation for fixed_params to indicate all of the above.

Thoughts?
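A minimal sketch of the first rule above (an explicit grid that also searches over a fixed parameter is an error). It assumes fixed parameters are a dict and a parameter grid is a list of dicts mapping parameter names to candidate values; the helper names `find_overlap` and `check_explicit_conflict` are hypothetical and are not taken from the SKLL code.

```python
def find_overlap(fixed_params, param_grid):
    """Return the sorted parameter names present in both inputs.

    fixed_params: dict of parameter name -> fixed value (assumed shape).
    param_grid: list of dicts of parameter name -> list of values (assumed shape).
    """
    grid_names = set()
    for grid in param_grid:
        grid_names.update(grid)
    return sorted(grid_names & set(fixed_params))


def check_explicit_conflict(learner, fixed_params, param_grid):
    """Raise ValueError if a user-specified grid includes a fixed parameter."""
    overlap = find_overlap(fixed_params, param_grid)
    if overlap:
        raise ValueError(
            "Learner {}: parameter(s) {} appear in both the fixed parameters "
            "and the parameter grid.".format(learner, ", ".join(overlap))
        )
```

Raising here, rather than silently picking a winner, reflects the point that the user's intent is ambiguous when both are specified explicitly.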

@mulhod
Contributor Author

mulhod commented Feb 18, 2016

Seems pretty clear to me. I'll update this shortly.

@dan-blanchard
Contributor

> However, if users specify a parameter in fixed_params but have only specified grid_search=True and the default parameter grid happens to include said parameter, there should be a warning saying that the fixed parameter will be ignored. But that warning should also tell the user how to avoid this if they want - by specifying a parameter grid explicitly which does not include this parameter.

To me it would make the most sense in this case for the fixed parameter to win out over the default parameter grid. We could just modify the grid for them automatically and likely get them what they wanted.

@desilinguist
Member

Hmm, yeah I agree. That's a better way of doing it but we should still output a warning saying that we did that.

@dan-blanchard
Contributor

Yeah, I agree about the warning.
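The behavior agreed on here (the fixed parameter wins over the default grid, with a warning) might look roughly like the sketch below. The function name `prune_default_grid`, the logger name, and the warning wording are all hypothetical; the grid is assumed to be a list of dicts mapping parameter names to candidate values.

```python
import logging

logger = logging.getLogger("grid_conflict_sketch")  # hypothetical logger name


def prune_default_grid(learner, fixed_params, default_grid):
    """Drop fixed parameters from a default grid and warn about each removal.

    Returns a new list of grid dicts; the inputs are left unmodified.
    """
    pruned = []
    removed = set()
    for grid in default_grid:
        kept = {name: values for name, values in grid.items()
                if name not in fixed_params}
        removed.update(set(grid) - set(kept))
        if kept:  # skip grid dicts that became empty after pruning
            pruned.append(kept)
    if removed:
        logger.warning(
            "Learner %s: removed %s from the default parameter grid because "
            "fixed values were specified for them.",
            learner, ", ".join(sorted(removed)),
        )
    return pruned
```

For example, fixing `C` for a learner whose default grid covers `C` and `gamma` would leave only `gamma` in the search space.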

@mulhod
Contributor Author

mulhod commented Feb 19, 2016

Couple observations:

  • First, I noticed an error in the check the way I implemented it and it's been fixed: I was zipping some things together and got the sequence of the zipping wrong. So, I just wanted to mention this in case it's been noticed.
  • More importantly, I have realized that the original way I was doing this wasn't correct. I was treating it as if param_grid_list would be populated with the default values before the point when it would get to the check at the center of this PR. However, the default values only get substituted in later on, if param_grid_list either wasn't specified at all or if some of the parameter grids in it are equal to an empty list, in which case it is my understanding that the default parameter grid is used. This was written this way so that users could specify some, but not all, parameter grids. Having an empty list was the same as saying, "Use the default grid". Given that the value of param_grid_list will either be None (if not specified) or a list of parameter grid dictionary lists (which might contain some empty lists), my check here will have to be a lot more involved -- if fixed_parameters and grid_search are specified. My check will basically need to look at the default values and see if there are conflicts and, in some situations, actually override values. And, since that would have happened after this point the way things were, it will now happen here.
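The substitution described above (None means no grids were given; an empty list for a learner means "use the default grid") can be sketched as follows. The table `HYPOTHETICAL_DEFAULT_GRIDS` and its values are invented for illustration, and `resolve_param_grids` is not the name of any SKLL function.

```python
# Invented defaults for illustration only; real defaults live with each learner.
HYPOTHETICAL_DEFAULT_GRIDS = {
    "LogisticRegression": [{"C": [0.01, 0.1, 1.0, 10.0]}],
    "LinearSVC": [{"C": [0.01, 0.1, 1.0, 10.0]}],
}


def resolve_param_grids(learners, param_grid_list):
    """Return one grid per learner, filling in defaults where needed.

    param_grid_list is either None (nothing specified) or a list with one
    entry per learner, where an empty entry means "use the default grid".
    """
    if param_grid_list is None:
        param_grid_list = [[] for _ in learners]
    if len(param_grid_list) != len(learners):
        raise ValueError("Expected one parameter grid per learner.")
    resolved = []
    # Pair each learner with its own grid; the order of the two sequences matters.
    for learner, grid in zip(learners, param_grid_list):
        if grid:
            resolved.append(grid)
        else:
            resolved.append(HYPOTHETICAL_DEFAULT_GRIDS.get(learner, []))
    return resolved
```

Resolving the grids first means any conflict check against fixed parameters sees the same grid that the search would eventually use.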

…lues; make new unit test to test for conflicts when both are specified in the configuration file (more unit tests to come soon)
… parameter grids can be attained and add a new unit test in (more to come later)
@desilinguist desilinguist changed the title from "Add warnings for conflicts between fixed params and param grids" to "[WIP] Add warnings for conflicts between fixed params and param grids" on Feb 19, 2016
@mulhod
Contributor Author

mulhod commented Feb 19, 2016

Ok, I've made a bunch of changes and I think this is finished -- or, at least, I don't want to work on this further until somebody else takes a look because I had to change quite a few things and I want to know first if this is what you had in mind. As I mentioned earlier, in order to intelligently resolve conflicts, logging warnings sometimes, raising exceptions others, or doing some things silently, the approach had to be much more involved than it was initially. Partially this is due to the scope of the PR changing a little after further discussion. Anyway, tests pass locally, so I'm fairly certain that it works as is.

(and the defaults will be used).
(and the defaults will be used). If there is a conflict between parameters
specified in :ref:`param_grids` and :ref:`fixed_parameters`, :ref:`param_grids`
values will take precedence.
Member

Isn't this the opposite of what we finally agreed on (after @dan-blanchard's comment)?

Contributor Author

I haven't dealt with updating the documentation yet after making the most recent changes and I'd prefer to get the greenlight on what I have so far before making those changes.

Member

Ah, okay. I just wanted to make sure that the code has been updated to do the right thing and it's just the documentation that's out of sync.

@mulhod
Contributor Author

mulhod commented Feb 19, 2016

So, it seems that some of my tests are failing, but not the ones that are really for the core of this PR. Specifically, they are the tests for the learner names in the config file. I thought that I could simply test all the input names and then raise an exception if they weren't in the list of learner classes. But I wasn't accounting for custom learners. What I'm wondering is: did you not have this type of check specifically to allow custom learners? I could change my exception to a warning that simply says that the user is using an unrecognized/custom learner. Or I could take it out. Also, the reason I had added that in there was because I needed to be able to refer to a learner's corresponding class (without using eval) to get its default parameter grid. I could change that check below to look for the parameter grid and, if none is found (i.e., because it's a custom learner), skip the usual check, since there will be no parameter grid.

@desilinguist
Member

Yeah, I think we weren't checking the learners' identity because people could write custom learners. I think this PR requires a bit of careful reviewing since the scope has expanded. I would like all of us to think about this carefully and reach a consensus. So, this can probably go into either 1.2.1 or 1.3, whatever release we do next.

@mulhod
Contributor Author

mulhod commented Feb 23, 2016

The check for a valid learner type, which my changes fail on due to the fact that users can provide custom learners, could actually be resolved just simply by checking if the learner ends with ".py", couldn't it? I think the error that I have put in could be kept without much modification.

I see now that it's a little more complicated. I thought that the learner name itself ended in .py for some reason, but that's only true for the custom_learner_path field in the config file. In any case, there might still be a way to check if that field is specified and just skip the check for invalid learners. Then, later on when we're trying to get the default parameter grids, we can just skip the check for any learners that aren't found in the list of possible learner types under the assumption that they are custom learners.
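One way to read the idea above, as a rough sketch: reject unknown learner names only when no custom learner path is configured, and otherwise treat an unknown name as a custom learner with no default grid to check. `KNOWN_DEFAULT_GRIDS` and `default_grid_or_none` are hypothetical names, and the grid values are invented.

```python
# Invented lookup table for illustration; not the SKLL data structure.
KNOWN_DEFAULT_GRIDS = {
    "LogisticRegression": [{"C": [0.01, 0.1, 1.0, 10.0]}],
}


def default_grid_or_none(learner, custom_learner_path=None):
    """Return the default grid for a known learner, or None for a custom one.

    An unknown name is only an error when no custom learner path was given.
    """
    grid = KNOWN_DEFAULT_GRIDS.get(learner)
    if grid is None and not custom_learner_path:
        raise ValueError("Unrecognized learner: {}".format(learner))
    return grid
```

A caller would then skip the fixed-parameter conflict check whenever this returns None.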

@mulhod mulhod changed the title from "Add warnings for conflicts between fixed params and param grids" to "[WIP] Add warnings for conflicts between fixed params and param grids" on Nov 1, 2017
@coveralls

coveralls commented Nov 27, 2017

Coverage Status

Coverage decreased (-0.1%) to 92.236% when pulling eed801c on feature/warning_needed_for_conflicts_between_fixed_params_and_param_grids_185 into 112cd33 on master.

@mulhod mulhod changed the title from "[WIP] Add warnings for conflicts between fixed params and param grids" to "Add warnings for conflicts between fixed params and param grids" on Nov 27, 2017
@mulhod mulhod changed the title from "Add warnings for conflicts between fixed params and param grids" to "[WIP] Add warnings for conflicts between fixed params and param grids" on Nov 27, 2017
@mulhod mulhod force-pushed the feature/warning_needed_for_conflicts_between_fixed_params_and_param_grids_185 branch from eed801c to 6d82e56 Compare November 27, 2017 16:36
@coveralls

coveralls commented Nov 27, 2017

Coverage Status

Coverage decreased (-0.03%) to 92.345% when pulling 6d82e56 on feature/warning_needed_for_conflicts_between_fixed_params_and_param_grids_185 into 112cd33 on master.

@mulhod mulhod changed the title from "[WIP] Add warnings for conflicts between fixed params and param grids" to "Add warnings for conflicts between fixed params and param grids" on Nov 27, 2017
@mulhod
Contributor Author

mulhod commented Nov 27, 2017

The coverage decreased, but all that was added (now that some things have been removed) are basically logging statements. Tests could be added to check that the statements are logged, but I'm wondering if it's necessary or more trouble than it's worth in this case.

@aoifecahill
Collaborator

Wouldn't we want to check that the correct warnings are being logged for the various scenarios?

@mulhod
Contributor Author

mulhod commented Nov 27, 2017

It would be good, sure, but there are no checks for any of the other logging statements in similar situations. Also, it seems to me that it would be better if all of the warnings were actual warnings (that could be written to the log as well). That would be easier to test.

@aoifecahill
Collaborator

Ah yes, true. But since PR #380 has been merged, shouldn't that now be possible? i.e. using proper logging and verifying the logging messages?

@mulhod
Contributor Author

mulhod commented Nov 27, 2017

Yeah, I see the case where it was used to verify the number of folds, etc. I think it's possible. I just wonder about 1) how worthwhile it is and 2) whether the warnings should be more like actual warnings. I can get around to adding some tests using the log statements for now.

@aoifecahill
Collaborator

Yes, I agree that the warnings should be more like actual warnings, following the logging changes that @desilinguist implemented recently.

@coveralls

Coverage Status

Coverage increased (+0.01%) to 92.382% when pulling c347269 on feature/warning_needed_for_conflicts_between_fixed_params_and_param_grids_185 into 112cd33 on master.

@coveralls

coveralls commented Nov 28, 2017

Coverage Status

Coverage increased (+0.01%) to 92.382% when pulling c347269 on feature/warning_needed_for_conflicts_between_fixed_params_and_param_grids_185 into 112cd33 on master.

@mulhod
Contributor Author

mulhod commented Nov 28, 2017

I have now added some tests that search for the warning in the log file. What I mean about using actual warnings, though, was about using the warnings module directly. I guess it doesn't matter since it's not extremely different in the end.

I think this is ready to go again.
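A test that searches a log file for a warning could be structured roughly like this. It is only a sketch using the standard `logging` and `tempfile` modules; `capture_log_text` and the logger name are hypothetical and do not reflect how the SKLL test suite is actually organized.

```python
import logging
import os
import tempfile


def capture_log_text(action, logger_name="warning_sketch"):
    """Run action(logger) with a temporary log file attached; return its text."""
    logger = logging.getLogger(logger_name)
    file_descriptor, log_path = tempfile.mkstemp(suffix=".log")
    os.close(file_descriptor)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    logger.addHandler(handler)
    try:
        action(logger)
    finally:
        # Detach and close the handler so the file is flushed before reading.
        logger.removeHandler(handler)
        handler.close()
    try:
        with open(log_path, encoding="utf-8") as log_file:
            return log_file.read()
    finally:
        os.remove(log_path)
```

A test can then assert that the expected warning text appears in the returned string. The standard library's `unittest.TestCase.assertLogs` is an alternative that avoids touching the file system.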

Member

@desilinguist desilinguist left a comment

Just some minor rephrasing of the warning text which would require changing the tests as well. Thanks again for taking care of this!

skll/config.py Outdated
@@ -582,6 +581,23 @@ def _parse_config_file(config_path, log_level=logging.INFO):
# using the defaults?
do_grid_search = config.getboolean("Tuning", "grid_search")

# Check if `param_grids` is specified, but `grid_search` is False
if param_grid_list and not do_grid_search:
logger.warning('"param_grids" was specified despite the fact that '
Member

I'd rephrase this to state simply: Since "grid_search" is set to False, the specified "param_grids" will be ignored.

skll/config.py Outdated
# `param_grid_list` (or values passed in by default) if
# `do_grid_search` is True
if do_grid_search and fixed_parameter_list:
logger.warning('"grid_search" is set to True and "fixed_parameters"'
Member

Minor rephrasing: Note that "grid_search" is set to True and "fixed_parameters" is also specified. If there is a conflict between the grid search parameter space and the fixed parameter values, the fixed parameter values will take precedence.
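Put together, the two checks with the suggested wording might look like the sketch below. The conditions mirror the diff fragments quoted above, but the function `warn_about_tuning_options`, the logger name, and the returned list are assumptions made so the example is self-contained and testable.

```python
import logging

logger = logging.getLogger("config_warning_sketch")  # hypothetical logger name

PARAM_GRIDS_IGNORED = (
    'Since "grid_search" is set to False, the specified "param_grids" '
    "will be ignored."
)
FIXED_PARAMETERS_PRECEDENCE = (
    'Note that "grid_search" is set to True and "fixed_parameters" is also '
    "specified. If there is a conflict between the grid search parameter "
    "space and the fixed parameter values, the fixed parameter values will "
    "take precedence."
)


def warn_about_tuning_options(do_grid_search, param_grid_list,
                              fixed_parameter_list):
    """Log the applicable warnings and return them for inspection."""
    messages = []
    if param_grid_list and not do_grid_search:
        messages.append(PARAM_GRIDS_IGNORED)
    if do_grid_search and fixed_parameter_list:
        messages.append(FIXED_PARAMETERS_PRECEDENCE)
    for message in messages:
        logger.warning(message)
    return messages
```

Returning the messages is purely a convenience for this sketch; the quoted code only logs them.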

@coveralls

coveralls commented Dec 6, 2017

Coverage Status

Coverage increased (+0.01%) to 92.382% when pulling 1061cc9 on feature/warning_needed_for_conflicts_between_fixed_params_and_param_grids_185 into 112cd33 on master.

@desilinguist desilinguist merged commit 4111440 into master Dec 7, 2017
@desilinguist desilinguist deleted the feature/warning_needed_for_conflicts_between_fixed_params_and_param_grids_185 branch December 7, 2017 14:31