Implement RandomSearch and GridSearch Tuners #240

Merged
merged 39 commits into from Mar 24, 2020

Contributor

@christopherbunn christopherbunn commented Nov 22, 2019

Resolves #230


codecov bot commented Nov 25, 2019

Codecov Report

Merging #240 into master will not change coverage by %.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #240   +/-   ##
=======================================
  Coverage   98.58%   98.58%           
=======================================
  Files         111      111           
  Lines        3744     3744           
=======================================
  Hits         3691     3691           
  Misses         53       53           


Contributor

@jeremyliweishih jeremyliweishih left a comment

Looks like a good start! Can you also add in the AutoBase changes and some test cases for the next iteration?

evalml/tuners/grid_search_tuner.py (5 outdated, resolved threads)
evalml/tuners/random_search_tuner.py (5 outdated, resolved threads)
Contributor Author

@christopherbunn christopherbunn left a comment

Some additional questions:

evalml/tuners/grid_search_tuner.py (5 outdated, resolved threads)
evalml/tuners/random_search_tuner.py (2 outdated, resolved threads)
Contributor

@jeremyliweishih jeremyliweishih left a comment

Looking good, just need a little bit more on testing and a more graceful handling of the exception on the AutoML side. Adding a tutorial to the docs and API reference can come next as well.

evalml/models/auto_base.py (5 outdated, resolved threads)
evalml/tests/tuners_tests/test_tuners.py (2 outdated, resolved threads)
Contributor

@jeremyliweishih jeremyliweishih left a comment

Looking good, just need tests on AutoML side taking in new tuners and handling the exceptions.

@angela97lin angela97lin mentioned this pull request Dec 10, 2019
Contributor Author

christopherbunn commented Dec 16, 2019

I discussed this issue with @jeremyliweishih last Friday: the current structure makes it difficult to pass in parameters for the tuner. For example, the GridSearchTuner has a parameter that determines how many points to sample from the search space. Here are two ways that we could pass in these parameters:

  • Rather than pass in the class type for the tuner, pass in a tuner object with the class defined
    • This requires changing the way we pass in the space for each pipeline into the tuner
  • Create another parameter for AutoBase that takes in a dictionary of parameters for the tuner

The current implementation here hard codes these parameters for now. If we want to, we could merge this in to show that we have multiple tuners and then refactor them once we solidify #272.
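
To make the two options above concrete, here is a minimal sketch. Everything in it is hypothetical: `ToyGridTuner`, `ToyAutoSearch`, `make_tuner`, and the `tuner_kwargs` argument are illustrative stand-ins, not evalml's actual classes or signatures. Because the search space is only known per pipeline, the first option is shown as a pre-configured factory rather than a fully constructed tuner object.

```python
from functools import partial


class ToyGridTuner:
    """Hypothetical stand-in for a tuner; not evalml's GridSearchTuner."""

    def __init__(self, space, n_points=10, random_state=None):
        self.space = space
        self.n_points = n_points
        self.random_state = random_state


class ToyAutoSearch:
    """Hypothetical stand-in for AutoBase that builds one tuner per pipeline."""

    def __init__(self, tuner, tuner_kwargs=None):
        # `tuner` is any callable that takes the search space as its first argument.
        self.tuner = tuner
        self.tuner_kwargs = tuner_kwargs or {}

    def make_tuner(self, space):
        return self.tuner(space, **self.tuner_kwargs)


space = [(0.0, 1.0), (1, 5)]

# Option 1 (variant): pass something already configured. A factory is used
# here so the per-pipeline space can still be supplied later.
via_factory = ToyAutoSearch(partial(ToyGridTuner, n_points=5)).make_tuner(space)

# Option 2: keep passing the class and add a dictionary of tuner parameters.
via_kwargs = ToyAutoSearch(ToyGridTuner, tuner_kwargs={"n_points": 5}).make_tuner(space)
```

Both routes end with a tuner configured with `n_points=5`; the difference is only in who owns the configuration, the caller or the search class.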

Contributor

jeremyliweishih commented Jan 7, 2020

@dsherry aside from these issues, this PR should be ready to be merged. We should discuss how we want to proceed in the next planning meeting.

except Exception as e:
    self.parameter_exception = True
    self.logger.log('\n✘ ' + str(e) + '\n')
    return

Collaborator

@dsherry dsherry Jan 7, 2020

Does this mean an error would show up in the terminal output, if we're using tqdm?

@christopherbunn christopherbunn requested a review from dsherry Jan 21, 2020
@christopherbunn christopherbunn changed the title from "[WIP] Implement RandomSearch and GridSearch Tuners" to "Implement RandomSearch and GridSearch Tuners" Jan 28, 2020
@christopherbunn christopherbunn force-pushed the tuners branch 3 times, most recently from 7e635e7 to 3fdb7ac Compare Jan 31, 2020
try:
    parameters = self._propose_parameters(pipeline_class)
except Exception as e:
    raise e

Collaborator

@dsherry dsherry Jan 31, 2020

Why do this?

Collaborator

@dsherry dsherry Mar 11, 2020

We can't rethrow here. If the tuner says the search space has been exhausted, we need automl to handle that. I think we need to have that in this PR in order to merge.

I see two options to do that:

  1. Update the Tuner API. Add a Tuner.is_search_space_exhausted() member (name could be better). Have AutoBase._check_stopping_condition() call this and return true if it's true.
  2. Just update the AutoBase code. Add an instance var _search_space_exhausted to AutoBase, and set that to true here when NoParamsException comes up the stack. Update AutoBase._check_stopping_condition() to return true if self._search_space_exhausted is set.

Option 1 feels cleaner. It does make each tuner's implementation more complex. But that's what I'd do. @christopherbunn lmk if you'd like to discuss this / talk about how to implement it.
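
A minimal sketch of option 1 follows, assuming a grid tuner that walks a lazy iterator of combinations. `ToyGridTuner` and `run_search` are hypothetical names for illustration and are not evalml's implementation. The tuner peeks one combination ahead so the caller can ask whether anything is left before calling `propose()`.

```python
import itertools


class NoParamsException(Exception):
    """Raised when a tuner has no more parameters to propose."""


class ToyGridTuner:
    """Hypothetical tuner; each dimension is given as an explicit list of values."""

    def __init__(self, dimensions):
        self._grid_points = itertools.product(*dimensions)
        self._peeked = None  # one combination fetched ahead of time, if any

    def is_search_space_exhausted(self):
        # Peek one item ahead so the caller can ask before proposing.
        if self._peeked is not None:
            return False
        try:
            self._peeked = next(self._grid_points)
            return False
        except StopIteration:
            return True

    def propose(self):
        if self.is_search_space_exhausted():
            raise NoParamsException("Grid search has exhausted all possible parameters.")
        params, self._peeked = self._peeked, None
        return params


def run_search(tuner, max_iterations):
    """Hypothetical search loop: stop on the iteration cap or an exhausted tuner."""
    tried = []
    while len(tried) < max_iterations and not tuner.is_search_space_exhausted():
        tried.append(tuner.propose())
    return tried


results = run_search(ToyGridTuner([[0.1, 0.2], ["a", "b"]]), max_iterations=10)
```

With this shape the search loop never has to catch an exception in the normal case; `NoParamsException` remains only as a guard for callers that skip the check.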

Contributor

@jeremyliweishih jeremyliweishih Mar 16, 2020

I like option 1!

evalml/tuners/grid_search_tuner.py (1 outdated, resolved thread)

Arguments:
    points: The number of points to uniformly sample from \
        Real dimensions.
    random_state: Not used in grid search, kept for compatibility

Collaborator

@dsherry dsherry Jan 31, 2020

This is missing a description of the space parameter

class GridSearchTuner:
    """Grid Search Optimizer"""

    def __init__(self, space, points=10, random_state=None):

Collaborator

@dsherry dsherry Jan 31, 2020

This is a nit-pick, but could we call points n_points instead? To me, the "n" prefix implies that this is a number, rather than a list of points or something


Arguments:
    points: The number of points to uniformly sample from \
        Real dimensions.

Collaborator

@dsherry dsherry Jan 31, 2020

What is meant by "Real dimensions"?

Contributor Author

@christopherbunn christopherbunn Feb 7, 2020

"Real dimensions" refers to a search space dimension that can take on any real value. Maybe it would be better to reword this to "The number of points to uniformly sample from Real search spaces"

Collaborator

@dsherry dsherry Feb 7, 2020

Ah ok, thanks. Personally I find sklearn's docstring there to be unclear/cryptic. How about "the number of points to sample from along each dimension defined in the space argument" ?
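
As a small illustration of what "the number of points to sample along each dimension" means, the sketch below builds a grid over two real-valued dimensions. The `linspace` helper, the `(low, high)` tuples, and the dimension names are assumptions made for this example; whether the tuner in this PR includes both endpoints is not stated in the thread.

```python
import itertools


def linspace(low, high, n_points):
    """Return n_points evenly spaced values from low to high, inclusive."""
    if n_points == 1:
        return [low]
    step = (high - low) / (n_points - 1)
    return [low + i * step for i in range(n_points)]


# Hypothetical search space: two real-valued dimensions given as (low, high).
space = {"learning_rate": (0.0, 1.0), "subsample": (0.5, 1.0)}
n_points = 3

# One axis of n_points values per dimension.
axes = [linspace(low, high, n_points) for low, high in space.values()]

# The grid is the Cartesian product of the axes, so its size grows as
# n_points ** number_of_dimensions.
grid_size = sum(1 for _ in itertools.product(*axes))
```

Here each axis holds 3 values, so the grid has 3 ** 2 = 9 combinations; adding a third real dimension would raise that to 27.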

Collaborator

@dsherry dsherry Mar 10, 2020

Yeah I stand by what I said, would be nice to update this docstring a bit

Arguments:
    points: The number of points to uniformly sample from \
        Real dimensions.
    random_state: Not used in grid search, kept for compatibility

Collaborator

@dsherry dsherry Jan 31, 2020

I think you can just say "unused in this class"

        else:
            return Exception("Invalid dimension type in tuner")
        raw_dimensions.append(range_values)
    self.grid_points = list(itertools.product(*raw_dimensions))

Collaborator

@dsherry dsherry Jan 31, 2020

By calling list() here, you're coercing the itertools.product generator into a list. That will cause all the combinations to be generated here, which could be harmful performance-wise as we increase n_points.

Fortunately, iterators are awesome! Here you can just say

self._grid_points = itertools.product(*raw_dimensions)

And then:

def propose(self):
    try:
        return next(self._grid_points)
    except StopIteration:
        raise NoParamsException("Grid search has exhausted all possible parameters.")

Here's a page from the Python docs which shows the approximate code used to implement itertools.product. Note the use of yield, which makes the function a generator: each combination is produced only when it's requested, rather than all of the output being generated up front. Lazy computation ftw :)
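
A quick way to see the difference is to build a product that would be far too large to hold in a list and pull only a couple of items from it. This is a standalone sketch using only the standard library; the variable names are illustrative. One caveat: `itertools.product` does read each input iterable fully up front, but it does not build the combinations until they are requested.

```python
import itertools

values = range(1000)

# Three dimensions of 1000 values each: a billion combinations in total.
# Creating the iterator is cheap because no combination is built yet.
combos = itertools.product(values, values, values)

first = next(combos)   # computed on demand
second = next(combos)

# Once an iterator is fully consumed, the next call raises StopIteration,
# which is what a propose() method can translate into NoParamsException.
small = itertools.product([1, 2], ["x"])
seen = list(small)
try:
    next(small)
    exhausted = False
except StopIteration:
    exhausted = True
```

Wrapping `combos` in `list()` instead would try to allocate all billion tuples at once, which is the cost the review comment warns about.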

        self.grid_points = list(itertools.product(*raw_dimensions))

    def add(self, parameters, score):
        # Since this is a grid search, we don't need to store the results.

Collaborator

@dsherry dsherry Jan 31, 2020

My personal preference is to delete this comment. In general I think code comments (distinct from docstrings) should only exist if they explain something which would be nearly impossible to discern from just reading the code. But that's just my opinion and I respect if you feel differently.

Contributor Author

christopherbunn commented Mar 20, 2020

Hey @dsherry and @jeremyliweishih, thanks for taking a look a few days ago. I incorporated most of the changes you all suggested. A few notes:

Re:

We can't rethrow here. If the tuner says the search space has been exhausted, we need automl to handle that. I think we need to have that in this PR in order to merge.

I see two options to do that:

Update Tuner API. Add a Tuner.is_search_space_exhausted() member (name could be better). Have AutoBase._check_stopping_condition call this and return true if its true

I gave updating the Tuner API a shot so that it handles this stopping condition more gracefully. I'm not sure it's the most efficient approach, especially since it requires selecting the next pipeline class earlier, inside AutoBase._check_stopping_condition(), so I'm definitely looking for feedback on this section.

Re: pipeline mocking, I didn't really push much on this front. While it would definitely speed up the tests, I think it would be better to have a consistent mocking strategy with a broader implementation. The current tests are fairly minimal in terms of search and training. IMO, mocking fits better in a separate PR, but again I'm open to thoughts/opinions.

Contributor

@jeremyliweishih jeremyliweishih left a comment

Looks good from the questions I had!

# get new pipeline and check tuner
self._current_pipeline_class = self._select_pipeline()
if self.tuners[self._current_pipeline_class.name].is_search_space_exhausted():
    return False

Collaborator

@dsherry dsherry Mar 24, 2020

This looks great for now :) good going

@@ -178,6 +178,11 @@ def search(self, X, y, feature_types=None, raise_errors=False, show_iteration_pl
            pbar.close()

    def _check_stopping_condition(self, start):
        # get new pipeline and check tuner
        self._current_pipeline_class = self._select_pipeline()

Collaborator

@dsherry dsherry Mar 24, 2020

Great. Let's add this to __init__:

self._current_pipeline_class = None

@@ -178,6 +178,11 @@ def search(self, X, y, feature_types=None, raise_errors=False, show_iteration_pl
            pbar.close()

    def _check_stopping_condition(self, start):
        # get new pipeline and check tuner
        self._current_pipeline_class = self._select_pipeline()

Collaborator

@dsherry dsherry Mar 24, 2020

And can we rename this to _next_pipeline_class? I feel like that's a better description because it's the next class we'll be using in _do_iteration

Collaborator

@dsherry dsherry Mar 24, 2020

But I acknowledge this is really just a nit-pick haha

Collaborator

@dsherry dsherry left a comment

This looks great! Awesome work, particularly on updating auto_base.py

I agree with @jeremyliweishih's suggestion about mocking in the tests. And I left a couple small comments. Otherwise, 🚢 :)

@christopherbunn christopherbunn merged commit 99a8ffd into master Mar 24, 2020
2 checks passed
@dsherry dsherry deleted the tuners branch Oct 29, 2020
Labels
new feature Features which don't yet exist.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Random Search and Grid Search Tuners
3 participants