Add problem_configuration parameter to AutoMLSearch #1457
Conversation
Codecov Report
@@           Coverage Diff            @@
##             main    #1457    +/-   ##
=========================================
+ Coverage   100.0%   100.0%    +0.1%
=========================================
  Files         223      223
  Lines       14930    15001      +71
=========================================
+ Hits        14923    14994      +71
  Misses          7        7
Continue to review full report at Codecov.
33a3faa to 5560011
@@ -119,6 +122,8 @@ def add_result(self, score_to_minimize, pipeline):
    def _transform_parameters(self, pipeline_class, proposed_parameters):
        """Given a pipeline parameters dict, make sure n_jobs and number_features are set."""
        parameters = {}
        if self._pipeline_params:
Need the if check here so that the parameters are only passed to the pipeline if they are needed.
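As a rough, self-contained sketch of that intent (the inspect-based signature check and the component traversal below are illustrative assumptions, not the actual evalml implementation):

import inspect

def transform_parameters(pipeline_params, component_classes, proposed_parameters):
    """Merge user-supplied pipeline params (e.g. {'gap': 1, 'max_delay': 2}) into each
    component's parameters, but only for components whose __init__ accepts them."""
    parameters = {}
    if pipeline_params:  # the guard under discussion: do nothing when no params were configured
        for component_class in component_classes:
            init_params = inspect.signature(component_class.__init__).parameters
            component_parameters = dict(proposed_parameters.get(component_class.__name__, {}))
            for param_name, value in pipeline_params.items():
                if param_name in init_params:
                    component_parameters[param_name] = value
            parameters[component_class.__name__] = component_parameters
    return parameters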
@@ -32,7 +32,7 @@ def __init__(self, problem_type):
    Arguments:
        problem_type (str): The problem type that is being validated. Can be regression, binary, or multiclass.
    """
    if handle_problem_types(problem_type) == ProblemTypes.REGRESSION:
    if handle_problem_types(problem_type) in [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]:
This should have been done in #1378 😬
Good catch!
We must've been missing unit test coverage for the default data checks for the time series regression problem type. Could we add that? It should just be a matter of cloning an existing test and ensuring the right data checks show up, just like for regression.
I see you added that. Champion! 🏅 🤣
@@ -21,6 +21,10 @@ def __str__(self):
                             ProblemTypes.TIME_SERIES_REGRESSION.name: "time series regression"}
        return problem_type_dict[self.name]

    @classproperty
Need this so that users can specify the problem type as "time series regression" (which matches the enum value) as opposed to "time_series_regression".
Ah got it. Where were the underscores coming from previously?
Oh, is this because ProblemTypes.TIME_SERIES_REGRESSION.name is "time_series_regression", whereas ProblemTypes.TIME_SERIES_REGRESSION.value is "time series regression"?
Yes, exactly! By default, you can't look up the enum by .value, only the .name, but we prefer that users not have to use underscores, to keep it consistent with the .value!
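A quick stdlib illustration of the .name/.value distinction; the alias mapping at the end is only a guess at what the new classproperty exposes:

from enum import Enum

class ProblemTypes(Enum):
    TIME_SERIES_REGRESSION = "time series regression"

print(ProblemTypes.TIME_SERIES_REGRESSION.name)    # TIME_SERIES_REGRESSION (underscores)
print(ProblemTypes.TIME_SERIES_REGRESSION.value)   # time series regression (spaces)
print(ProblemTypes["TIME_SERIES_REGRESSION"])      # name-based lookup needs the underscore form

# Hypothetical alias table so users can pass the space-separated value instead:
aliases = {t.value: t for t in ProblemTypes}
print(aliases["time series regression"])           # ProblemTypes.TIME_SERIES_REGRESSION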
cc1c7df to 7c06b74
def __init__(self, dummy_parameter='default', random_state=0):
    super().__init__(parameters={'dummy_parameter': dummy_parameter}, component_obj=None, random_state=random_state)

def __init__(self, dummy_parameter='default', random_state=0, **kwargs):
    super().__init__(parameters={'dummy_parameter': dummy_parameter, **kwargs},
I need this to accept kwargs for one of my tests but we should do this anyway because our convention is to allow kwargs to estimators.
7c06b74 to c6fdecf
Great!! I didn't have any suggestions other than deleting a test, 🚢 !
# Pass the pipeline params to the components that need them
for param_name, value in self._pipeline_params.items():
    if param_name in init_params:
        component_parameters[param_name] = value
@freddyaboulton got it, looks good to me!
@@ -163,6 +167,9 @@ def __init__(self,
        max_batches (int): The maximum number of batches of pipelines to search. Parameters max_time, and
            max_iterations have precedence over stopping the search.

        problem_configuration (dict, None): Additional parameters needed to configure the search. For example,
            in time series problems, values should be passed in for the gap and max_delay variables.
👍
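A usage sketch built from the docstring above (the import path and toy data are assumptions; the gap/max_delay names come from this PR):

import pandas as pd
from evalml.automl import AutoMLSearch

X = pd.DataFrame({"feature": range(20)})   # placeholder time-ordered features
y = pd.Series(range(20))                   # placeholder target

automl = AutoMLSearch(problem_type="time series regression",
                      problem_configuration={"gap": 1, "max_delay": 2},
                      max_iterations=1)
automl.search(X, y)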
def _validate_problem_configuration(self, problem_configuration=None):
    if self.problem_type in [ProblemTypes.TIME_SERIES_REGRESSION]:
        required_parameters = {'gap', 'max_delay'}
        if not problem_configuration or not all(p in problem_configuration for p in required_parameters):
Ooh, fancy usage of all. This validation logic lgtm!
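The hunk above cuts off before the error is raised; a standalone sketch of the same check (the exception type and message are assumptions) might look like:

def validate_problem_configuration(problem_type, problem_configuration=None):
    if problem_type == "time series regression":
        required_parameters = {"gap", "max_delay"}
        if not problem_configuration or not all(p in problem_configuration for p in required_parameters):
            raise ValueError("problem_configuration must contain 'gap' and 'max_delay' "
                             "for time series regression problems.")
    return problem_configuration or {}

validate_problem_configuration("time series regression", {"gap": 1, "max_delay": 2})  # passes
# validate_problem_configuration("time series regression", {"gap": 1})                # raises ValueError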
@@ -593,7 +613,7 @@ def _add_baseline_pipelines(self, X, y):
        baseline = ModeBaselineBinaryPipeline(parameters={})
    elif self.problem_type == ProblemTypes.MULTICLASS:
        baseline = ModeBaselineMulticlassPipeline(parameters={})
    elif self.problem_type == ProblemTypes.REGRESSION:
    else:
Ah, got it. This is great. I do wonder if we'll want to update our time series "baseline" to a weighted moving average or something. We can wait and see!
Yep, lots of options here! Another naive thing we could do is just use the previous target value for "today's" prediction.
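For example, a last-value (lag-1) baseline is just the target shifted by one step; a minimal pandas sketch of that idea (not the actual baseline pipeline):

import pandas as pd

y = pd.Series([10, 12, 11, 13, 15])
naive_prediction = y.shift(1)       # "today's" prediction is yesterday's observed value
print(naive_prediction.tolist())    # [nan, 10.0, 12.0, 11.0, 13.0]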
@@ -363,6 +379,9 @@ def _set_data_split(self, X):
        default_data_split = KFold(n_splits=3, random_state=self.random_state, shuffle=True)
    elif self.problem_type in [ProblemTypes.BINARY, ProblemTypes.MULTICLASS]:
        default_data_split = StratifiedKFold(n_splits=3, random_state=self.random_state, shuffle=True)
    elif self.problem_type in [ProblemTypes.TIME_SERIES_REGRESSION]:
        default_data_split = TimeSeriesSplit(n_splits=3, gap=self.problem_configuration['gap'],
                                             max_delay=self.problem_configuration['max_delay'])
👍
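To make the gap concrete, here is a toy, index-only illustration of time-ordered folds with a gap between the training and testing windows (it mimics the idea only, not evalml's TimeSeriesSplit implementation):

# Expanding-window folds: train on everything up to a cutoff, skip `gap` periods, then test.
n_samples, n_splits, gap = 12, 3, 2
fold_size = n_samples // (n_splits + 1)
for i in range(1, n_splits + 1):
    train_end = i * fold_size
    test_start = train_end + gap
    test_end = min(test_start + fold_size, n_samples)
    print("train:", list(range(train_end)), "test:", list(range(test_start, test_end)))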
problem_params = {"gap": 3, "max_delay": 2, "extra": "foo"}
automl = AutoMLSearch(problem_type=problem_type, problem_configuration=problem_params, max_iterations=1)
automl.search(X, y)
assert automl._automl_algorithm._pipeline_params == problem_params
@freddyaboulton this is great. But I think the real test would be, do the pipelines created by the automl algo contain the correct parameters?
Oh lol I see that's your next test. Cool!
So in that case, between the iterative algo test and the test below this, is this test necessary?
No I don't think we need it! Good catch. I added this test before realizing we needed something more thorough. I'll delete!
05f3d61 to 830058f
Pull Request Description
Fixes #1382
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.