Merged
Commits (118)
393052b
init
angela97lin Dec 28, 2020
2746e74
updated imputer init, starting to update tests...
angela97lin Jan 8, 2021
7260503
Merge branch '1406_components_return_ww' of github.com:alteryx/evalml…
angela97lin Jan 8, 2021
f1d74c2
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 8, 2021
1aee7f4
what a mess! messing around with simpleimputer logic and type inference
angela97lin Jan 8, 2021
9abf008
Merge branch '1406_components_return_ww' of github.com:alteryx/evalml…
angela97lin Jan 8, 2021
d311015
clean up imputer tests
angela97lin Jan 8, 2021
3c27d68
update datetime featurizer
angela97lin Jan 8, 2021
482d2f1
update per_column_imputer
angela97lin Jan 9, 2021
2ad0526
fix per col imputer tests
angela97lin Jan 9, 2021
001dc59
fix drop null cols tests
angela97lin Jan 9, 2021
a24468d
fix ohe tests
angela97lin Jan 9, 2021
f4eb8c0
fix pca
angela97lin Jan 10, 2021
49c8400
fix lda
angela97lin Jan 10, 2021
ab8789f
fix lsa and text featurizer
angela97lin Jan 10, 2021
0865d32
update featuretools
angela97lin Jan 11, 2021
c99808b
update col selector transformer
angela97lin Jan 11, 2021
704c1c4
update baseline tests
angela97lin Jan 11, 2021
c348d79
update baseline regressor
angela97lin Jan 11, 2021
231523b
update target encoder
angela97lin Jan 11, 2021
a93c8b3
update delated feature transformer
angela97lin Jan 12, 2021
4b519f5
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 12, 2021
094698d
fix estimator tests
angela97lin Jan 12, 2021
559fa05
Merge branch '1406_components_return_ww' of github.com:alteryx/evalml…
angela97lin Jan 12, 2021
72a0b9d
fix some component tests, more to go
angela97lin Jan 12, 2021
86f4dbc
continue fixing tests, more to go
angela97lin Jan 12, 2021
350aa69
fix one more test
angela97lin Jan 12, 2021
fc45967
fix component tests
angela97lin Jan 12, 2021
d619154
fix more pipeline tests
angela97lin Jan 12, 2021
90a4190
fix stacked ensemble component tests
angela97lin Jan 12, 2021
fa9f098
fix more tests in automl
angela97lin Jan 12, 2021
8e22a48
fix component graph and regression pipeline tests
angela97lin Jan 13, 2021
bb10ab1
fix time series pipeline tests
angela97lin Jan 13, 2021
8699d37
fix more component tests
angela97lin Jan 13, 2021
94e4655
fix some more tests
angela97lin Jan 13, 2021
93dcc79
fix baseline classification test
angela97lin Jan 13, 2021
f247141
fixing more automl and pipeline test
angela97lin Jan 13, 2021
2e7cfe6
fix time series baseline regressor tests
angela97lin Jan 13, 2021
4c2666d
fix baseline regression pipeline tests and cbm and component graph tests
angela97lin Jan 13, 2021
b2244b9
fix prediction explanation algo tests
angela97lin Jan 13, 2021
da5c82d
fix explainer tests and pipeline misc tests
angela97lin Jan 13, 2021
18a7ee4
holy potato fix partial dependence tests
angela97lin Jan 14, 2021
8cc1e52
remove unnecessary try/finally block
angela97lin Jan 14, 2021
318d785
update regression test to use OHE instead of target
angela97lin Jan 14, 2021
b27eca8
push for tests
angela97lin Jan 14, 2021
20a8116
hmmm... adding code to component graph to handle carrying original lo…
angela97lin Jan 15, 2021
f43d4d1
add check for data column and data check in component graph
angela97lin Jan 15, 2021
5430ebc
update component graph to handle naming
angela97lin Jan 15, 2021
16f58a9
fix docs
angela97lin Jan 15, 2021
b241e39
uncomment test
angela97lin Jan 15, 2021
12c30ae
fix pipelines docs
angela97lin Jan 15, 2021
3e0fe81
merging
angela97lin Jan 15, 2021
2d4a0e6
mini cleanup here
angela97lin Jan 15, 2021
b90f915
fix tests
angela97lin Jan 15, 2021
03662c6
a bit of cleanup
angela97lin Jan 15, 2021
1c4d1cf
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 15, 2021
4425ad7
fix tests
angela97lin Jan 15, 2021
646c985
Merge branch '1406_components_return_ww' of github.com:alteryx/evalml…
angela97lin Jan 15, 2021
0295348
remove catboost changes from this branch
angela97lin Jan 15, 2021
326b550
cleanup some comments
angela97lin Jan 15, 2021
d6fec28
clean up some estimators
angela97lin Jan 15, 2021
95f11f7
more minor cleanup
angela97lin Jan 16, 2021
cc95be1
a little more cleanup
angela97lin Jan 16, 2021
a3fd671
even more cleanup
angela97lin Jan 16, 2021
37f8028
fix feature selector
angela97lin Jan 16, 2021
a713574
clean up and add durations flag to pytest
angela97lin Jan 19, 2021
f18b64d
oops fix typos
angela97lin Jan 19, 2021
1ccb152
update partial dependence impl
angela97lin Jan 19, 2021
b0dd92e
merging and fixing tests
angela97lin Jan 20, 2021
afe0741
fix knn
angela97lin Jan 20, 2021
69c83ca
clean up graphs
angela97lin Jan 20, 2021
7c38b62
some more cleanup
angela97lin Jan 20, 2021
8ce1c11
cleanup gen utils
angela97lin Jan 20, 2021
1171abf
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 21, 2021
4dcddb9
fix tests
angela97lin Jan 21, 2021
0bc7136
Merge branch '1406_components_return_ww' of github.com:alteryx/evalml…
angela97lin Jan 21, 2021
7b4e4e5
major cleanup, condense component graph
angela97lin Jan 21, 2021
c21c3aa
oops fix test
angela97lin Jan 21, 2021
316d7d2
fix tests
angela97lin Jan 21, 2021
0c69dfb
fix more tests
angela97lin Jan 21, 2021
152d1ce
rename helper and add docstrings
angela97lin Jan 21, 2021
0ac7bb9
cleaning up docstrings and linting
angela97lin Jan 21, 2021
23edbce
more docstring updates
angela97lin Jan 21, 2021
35d5972
more cleanup of impl and docstrings
angela97lin Jan 21, 2021
96505ec
more cleanup
angela97lin Jan 21, 2021
17ed8be
some cleanup of unnecessary code in standard scaler
angela97lin Jan 21, 2021
1164572
make classification and time series classification predict same
angela97lin Jan 21, 2021
f88392d
fix tests and more cleanup
angela97lin Jan 21, 2021
8ec9d01
oops fix test
angela97lin Jan 21, 2021
7a4aae5
oops fix imputer
angela97lin Jan 21, 2021
37594b3
actually fixing tests
angela97lin Jan 21, 2021
66c04f4
fix delayed feature transformer not returning
angela97lin Jan 21, 2021
61d0ac0
clean up mock
angela97lin Jan 21, 2021
e0a839b
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 21, 2021
9394045
combine prediction compution to one function
angela97lin Jan 21, 2021
e7233f7
oops, fix typo
angela97lin Jan 22, 2021
6af7dfb
some final touchups
angela97lin Jan 22, 2021
c163d98
docstring update
angela97lin Jan 22, 2021
9ecbace
updating component graph impl and adding test
angela97lin Jan 22, 2021
ed9cbd6
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 23, 2021
bb1337d
merging
angela97lin Jan 25, 2021
6c1ef89
fix docs
angela97lin Jan 25, 2021
94ad165
fi
angela97lin Jan 26, 2021
e144a0a
lint and document
angela97lin Jan 26, 2021
6e41ee7
fix some tests
angela97lin Jan 26, 2021
0d8d540
merging and cleanup
angela97lin Jan 26, 2021
ade7fb5
fix docstr
angela97lin Jan 26, 2021
c084c4d
update tests
angela97lin Jan 26, 2021
a8149f7
test docstr update
angela97lin Jan 26, 2021
bd182e1
more cleanup, update partial dep impl
angela97lin Jan 26, 2021
11dd1af
some more cleanup of feature selector and baseline tests
angela97lin Jan 26, 2021
ecd99b1
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 26, 2021
8692830
clean up components notebook
angela97lin Jan 26, 2021
253ced5
Merge branch '1406_components_return_ww' of github.com:alteryx/evalml…
angela97lin Jan 26, 2021
f8eb2b1
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 26, 2021
0babaca
the tinest of docstring caps cleanup
angela97lin Jan 26, 2021
72389b7
Merge branch '1406_components_return_ww' of github.com:alteryx/evalml…
angela97lin Jan 26, 2021
f6d343f
Merge branch 'main' into 1406_components_return_ww
angela97lin Jan 27, 2021
2 changes: 2 additions & 0 deletions docs/source/release_notes.rst
@@ -49,6 +49,7 @@ Release Notes
* Changes
* Added labeling to ``graph_confusion_matrix`` :pr:`1632`
* Rerunning search for ``AutoMLSearch`` results in a message thrown rather than failing the search, and removed ``has_searched`` property :pr:`1647`
* Updated components and pipelines to return ``Woodwork`` data structures :pr:`1668`
* Changed tuner class to allow and ignore single parameter values as input :pr:`1686`
* Capped LightGBM version limit to remove bug in docs :pr:`1711`
* Removed support for `np.random.RandomState` in EvalML :pr:`1727`
@@ -64,6 +65,7 @@ Release Notes

**Breaking Changes**
* Removed ``has_searched`` property from ``AutoMLSearch`` :pr:`1647`
* Components and pipelines return ``Woodwork`` data structures instead of ``pandas`` data structures :pr:`1668`
* Removed support for `np.random.RandomState` in EvalML. Rather than passing ``np.random.RandomState`` as component and pipeline random_state values, we use int random_seed :pr:`1727`


150 changes: 37 additions & 113 deletions docs/source/user_guide/components.ipynb
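The notebook changes below swap ad-hoc pandas coercion for two private evalml helpers, `_convert_to_woodwork_structure` and `_convert_woodwork_types_wrapper`. A minimal sketch of the round-trip pattern they enable, with illustrative data (this mirrors the `fit()` hunks below rather than adding anything new):

```python
import pandas as pd
from evalml.utils.gen_utils import (
    _convert_to_woodwork_structure,
    _convert_woodwork_types_wrapper
)

# Illustrative input: mixed types with some missing values
X = pd.DataFrame({"a": [1, 2, None], "b": ["x", None, None]})

# Standardize list/numpy/pandas input into a Woodwork DataTable with inferred logical types
X_ww = _convert_to_woodwork_structure(X)

# Export back to plain pandas dtypes when pandas operations (e.g. isnull) are needed
X_pd = _convert_woodwork_types_wrapper(X_ww.to_dataframe())
print(X_pd.isnull().mean())
```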
@@ -148,8 +148,11 @@
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from evalml.pipelines.components import Transformer\n",
"from evalml.utils.gen_utils import (\n",
" _convert_to_woodwork_structure,\n",
" _convert_woodwork_types_wrapper\n",
")\n",
"\n",
"class DropNullColumns(Transformer):\n",
" \"\"\"Transformer to drop features whose percentage of NaN values exceeds a specified threshold\"\"\"\n",
@@ -175,10 +178,19 @@
" random_state=random_state)\n",
"\n",
" def fit(self, X, y=None):\n",
" \"\"\"Fits DropNullColumns component to data\n",
"\n",
" Arguments:\n",
" X (list, ww.DataTable, pd.DataFrame): The input training data of shape [n_samples, n_features]\n",
" y (list, ww.DataColumn, pd.Series, np.ndarray, optional): The target training data of length [n_samples]\n",
"\n",
" Returns:\n",
" self\n",
" \"\"\"\n",
" pct_null_threshold = self.parameters[\"pct_null_threshold\"]\n",
" if not isinstance(X, pd.DataFrame):\n",
" X = pd.DataFrame(X)\n",
" percent_null = X.isnull().mean()\n",
" X_t = _convert_to_woodwork_structure(X)\n",
" X_t = _convert_woodwork_types_wrapper(X_t.to_dataframe())\n",
" percent_null = X_t.isnull().mean()\n",
" if pct_null_threshold == 0.0:\n",
" null_cols = percent_null[percent_null > 0]\n",
" else:\n",
@@ -188,16 +200,16 @@
"\n",
" def transform(self, X, y=None):\n",
" \"\"\"Transforms data X by dropping columns that exceed the threshold of null values.\n",
"\n",
" Arguments:\n",
" X (pd.DataFrame): Data to transform\n",
" y (pd.Series, optional): Targets\n",
" X (ww.DataTable, pd.DataFrame): Data to transform\n",
" y (ww.DataColumn, pd.Series, optional): Ignored.\n",
"\n",
" Returns:\n",
" pd.DataFrame: Transformed X\n",
" ww.DataTable: Transformed X\n",
" \"\"\"\n",
"\n",
" if not isinstance(X, pd.DataFrame):\n",
" X = pd.DataFrame(X)\n",
" return X.drop(columns=self._cols_to_drop, axis=1)"
" X_t = _convert_to_woodwork_structure(X)\n",
" return X_t.drop(self._cols_to_drop)"
]
},
{
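Before the summary of required methods below, here is a minimal usage sketch of the transformer just defined (the data and threshold are illustrative; per this PR, `transform` returns a `ww.DataTable`):

```python
import pandas as pd

# Column "b" is entirely null, so it meets the threshold and is dropped
X = pd.DataFrame({"a": [1, 2, 3], "b": [None, None, None]})

drop_null = DropNullColumns(pct_null_threshold=1.0)
drop_null.fit(X)
X_t = drop_null.transform(X)       # ww.DataTable
print(X_t.to_dataframe().columns)  # only "a" remains
```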
@@ -214,9 +226,9 @@
"\n",
"- `__init__()` - the `__init__()` method of your transformer will need to call `super().__init__()` and pass three parameters in: a `parameters` dictionary holding the parameters to the component, the `component_obj`, and the `random_state` value. You can see that `component_obj` is set to `None` above and we will discuss `component_obj` in depth later on.\n",
"\n",
"- `fit()` - the `fit()` method is responsible for fitting your component on training data.\n",
"- `fit()` - the `fit()` method is responsible for fitting your component on training data. It should return the component object.\n",
"\n",
"- `transform()` - after fitting a component, the `transform()` method will take in new data and transform accordingly. Note: a component must call `fit()` before `transform()`.\n",
"- `transform()` - after fitting a component, the `transform()` method will take in new data and transform accordingly. It should return a Woodwork DataTable. Note: a component must call `fit()` before `transform()`.\n",
"\n",
"You can also call or override `fit_transform()` that combines `fit()` and `transform()` into one method."
]
@@ -252,14 +264,14 @@
" name = \"Baseline Regressor\"\n",
" hyperparameter_ranges = {}\n",
" model_family = ModelFamily.BASELINE\n",
" supported_problem_types = [ProblemTypes.REGRESSION]\n",
" supported_problem_types = [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]\n",
"\n",
" def __init__(self, strategy=\"mean\", random_state=0, **kwargs):\n",
" \"\"\"Baseline regressor that uses a simple strategy to make predictions.\n",
"\n",
" Arguments:\n",
" strategy (str): Method used to predict. Valid options are \"mean\", \"median\". Defaults to \"mean\".\n",
" random_state (int): Seed for the random number generator\n",
" random_state (int): Seed for the random number generator. Defaults to 0.\n",
"\n",
" \"\"\"\n",
" if strategy not in [\"mean\", \"median\"]:\n",
@@ -276,9 +288,9 @@
" def fit(self, X, y=None):\n",
" if y is None:\n",
" raise ValueError(\"Cannot fit Baseline regressor if y is None\")\n",
"\n",
" if not isinstance(y, pd.Series):\n",
" y = pd.Series(y)\n",
" X = _convert_to_woodwork_structure(X)\n",
" y = _convert_to_woodwork_structure(y)\n",
" y = _convert_woodwork_types_wrapper(y.to_series())\n",
"\n",
" if self.parameters[\"strategy\"] == \"mean\":\n",
" self._prediction_value = y.mean()\n",
@@ -288,7 +300,9 @@
" return self\n",
"\n",
" def predict(self, X):\n",
" return pd.Series([self._prediction_value] * len(X))\n",
" X = _convert_to_woodwork_structure(X)\n",
" predictions = pd.Series([self._prediction_value] * len(X))\n",
" return _convert_to_woodwork_structure(predictions)\n",
"\n",
" @property\n",
" def feature_importance(self):\n",
@@ -298,7 +312,7 @@
" np.ndarray (float): An array of zeroes\n",
"\n",
" \"\"\"\n",
" return np.zeros(self._num_features)"
" return np.zeros(self._num_features)\n"
]
},
{
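A quick, hedged usage sketch of the custom BaselineRegressor defined above (toy data; as elsewhere in this PR, `predict` returns a Woodwork data structure):

```python
import pandas as pd

X = pd.DataFrame({"feature": [1, 2, 3, 4]})
y = pd.Series([10, 20, 30, 40])

baseline = BaselineRegressor(strategy="mean")
baseline.fit(X, y)

preds = baseline.predict(X)        # ww.DataColumn holding the mean target value
print(preds.to_series().tolist())  # [25.0, 25.0, 25.0, 25.0]
```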
@@ -402,45 +416,6 @@
"AutoML will perform a search over the allowed ranges for each parameter to select models which produce optimal performance within those ranges. AutoML gets the allowed ranges for each component from the component's `hyperparameter_ranges` class attribute. Any component parameter you add an entry for in `hyperparameter_ranges` will be included in the AutoML search. If parameters are omitted, AutoML will use the default value in all pipelines. "
]
},
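As a hedged illustration of the `hyperparameter_ranges` mechanism described above (the subclass, range, and use of a skopt `Real` space here are illustrative, not part of this PR), a component can opt a parameter into the AutoML search like this:

```python
from skopt.space import Real

class TunableDropNullColumns(DropNullColumns):
    """DropNullColumns variant whose threshold AutoML is allowed to tune."""
    name = "Tunable Drop Null Columns Transformer"
    hyperparameter_ranges = {
        # AutoML will sample pct_null_threshold from this range;
        # parameters omitted here keep their default values in every pipeline.
        "pct_null_threshold": Real(0.5, 1.0),
    }
```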
{
"cell_type": "code",
Contributor Author (angela97lin) commented: I think this is accidental duplicate code, deleting 😱

"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression as SKLinearRegression\n",
"\n",
"from evalml.model_family import ModelFamily\n",
"from evalml.pipelines.components.estimators import Estimator\n",
"from evalml.problem_types import ProblemTypes\n",
"\n",
"class LinearRegressor(Estimator):\n",
" \"\"\"Linear Regressor.\"\"\"\n",
" name = \"Linear Regressor\"\n",
" hyperparameter_ranges = {\n",
" 'fit_intercept': [True, False],\n",
" 'normalize': [True, False]\n",
" }\n",
" model_family = ModelFamily.LINEAR_MODEL\n",
" supported_problem_types = [ProblemTypes.REGRESSION]\n",
"\n",
" def __init__(self, fit_intercept=True, normalize=False, n_jobs=-1, random_state=0, **kwargs):\n",
" parameters = {\n",
" 'fit_intercept': fit_intercept,\n",
" 'normalize': normalize,\n",
" 'n_jobs': n_jobs\n",
" }\n",
" parameters.update(kwargs)\n",
" linear_regressor = SKLinearRegression(**parameters)\n",
" super().__init__(parameters=parameters,\n",
" component_obj=linear_regressor,\n",
" random_state=random_state)\n",
"\n",
" @property\n",
" def feature_importance(self):\n",
" return self._component_obj.coef_"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -471,8 +446,7 @@
"outputs": [],
"source": [
"# this string can then be copy and pasted into a separate window and executed as python code\n",
"exec(code)\n",
"logisticRegressionClassifier"
"exec(code)"
]
},
{
@@ -481,60 +455,10 @@
"metadata": {},
"outputs": [],
"source": [
"# custom component\n",
"from evalml.pipelines.components import Transformer\n",
"import pandas as pd\n",
"# We can also do this for custom components\n",
"from evalml.pipelines.components.utils import generate_component_code\n",
"\n",
"class MyDropNullColumns(Transformer):\n",
Contributor Author (angela97lin) commented: No need to repeat this, I believe the only difference is the name and there's nothing special about this necessary for code gen so deleting!

" \"\"\"Transformer to drop features whose percentage of NaN values exceeds a specified threshold\"\"\"\n",
" name = \"My Drop Null Columns Transformer\"\n",
" hyperparameter_ranges = {}\n",
"\n",
" def __init__(self, pct_null_threshold=1.0, random_state=0, **kwargs):\n",
" \"\"\"Initalizes an transformer to drop features whose percentage of NaN values exceeds a specified threshold.\n",
"\n",
" Arguments:\n",
" pct_null_threshold(float): The percentage of NaN values in an input feature to drop.\n",
" Must be a value between [0, 1] inclusive. If equal to 0.0, will drop columns with any null values.\n",
" If equal to 1.0, will drop columns with all null values. Defaults to 0.95.\n",
" \"\"\"\n",
" if pct_null_threshold < 0 or pct_null_threshold > 1:\n",
" raise ValueError(\"pct_null_threshold must be a float between 0 and 1, inclusive.\")\n",
" parameters = {\"pct_null_threshold\": pct_null_threshold}\n",
" parameters.update(kwargs)\n",
"\n",
" self._cols_to_drop = None\n",
" super().__init__(parameters=parameters,\n",
" component_obj=None,\n",
" random_state=random_state)\n",
"\n",
" def fit(self, X, y=None):\n",
" pct_null_threshold = self.parameters[\"pct_null_threshold\"]\n",
" if not isinstance(X, pd.DataFrame):\n",
" X = pd.DataFrame(X)\n",
" percent_null = X.isnull().mean()\n",
" if pct_null_threshold == 0.0:\n",
" null_cols = percent_null[percent_null > 0]\n",
" else:\n",
" null_cols = percent_null[percent_null >= pct_null_threshold]\n",
" self._cols_to_drop = list(null_cols.index)\n",
" return self\n",
"\n",
" def transform(self, X, y=None):\n",
" \"\"\"Transforms data X by dropping columns that exceed the threshold of null values.\n",
" Arguments:\n",
" X (pd.DataFrame): Data to transform\n",
" y (pd.Series, optional): Targets\n",
" Returns:\n",
" pd.DataFrame: Transformed X\n",
" \"\"\"\n",
"\n",
" if not isinstance(X, pd.DataFrame):\n",
" X = pd.DataFrame(X)\n",
" return X.drop(columns=self._cols_to_drop, axis=1)\n",
" \n",
"myDropNull = MyDropNullColumns()\n",
"myDropNull = DropNullColumns()\n",
"print(generate_component_code(myDropNull))"
]
},
11 changes: 6 additions & 5 deletions docs/source/user_guide/pipelines.ipynb
@@ -389,6 +389,7 @@
"from evalml.pipelines.utils import generate_pipeline_code\n",
"from evalml.pipelines import MulticlassClassificationPipeline\n",
"import pandas as pd\n",
"from evalml.utils import _convert_to_woodwork_structure, _convert_woodwork_types_wrapper\n",
"\n",
"class MyDropNullColumns(Transformer):\n",
" \"\"\"Transformer to drop features whose percentage of NaN values exceeds a specified threshold\"\"\"\n",
@@ -415,8 +416,8 @@
"\n",
" def fit(self, X, y=None):\n",
Contributor Author (angela97lin) commented: Updating demo code to be a transformer that handles woodwork :') necessary for fitting of pipeline below!

" pct_null_threshold = self.parameters[\"pct_null_threshold\"]\n",
" if not isinstance(X, pd.DataFrame):\n",
" X = pd.DataFrame(X)\n",
" X = _convert_to_woodwork_structure(X)\n",
" X = _convert_woodwork_types_wrapper(X.to_dataframe())\n",
" percent_null = X.isnull().mean()\n",
" if pct_null_threshold == 0.0:\n",
" null_cols = percent_null[percent_null > 0]\n",
@@ -434,9 +435,9 @@
" pd.DataFrame: Transformed X\n",
" \"\"\"\n",
"\n",
" if not isinstance(X, pd.DataFrame):\n",
" X = pd.DataFrame(X)\n",
" return X.drop(columns=self._cols_to_drop, axis=1)\n",
" X = _convert_to_woodwork_structure(X)\n",
" return X.drop(columns=self._cols_to_drop)\n",
"\n",
"\n",
"class CustomPipeline(MulticlassClassificationPipeline):\n",
" name = \"Custom Pipeline\"\n",
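For context, a hedged sketch of how the Woodwork-aware transformer above might be wired into the custom pipeline referenced in the notebook; the component graph, data, and targets shown here are illustrative, since the actual `CustomPipeline` body is not part of this diff:

```python
import pandas as pd
from evalml.pipelines import MulticlassClassificationPipeline

class IllustrativeCustomPipeline(MulticlassClassificationPipeline):
    name = "Illustrative Custom Pipeline"
    # Custom components can be listed by class, built-in components by name
    component_graph = [MyDropNullColumns, "Imputer", "Random Forest Classifier"]

# Illustrative data: column "b" is entirely null and will be dropped by the custom transformer
X = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                  "b": [None] * 6,
                  "c": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]})
y = pd.Series([0, 1, 2, 0, 1, 2])

pipeline = IllustrativeCustomPipeline(parameters={})
pipeline.fit(X, y)
print(pipeline.predict(X).to_series().head())  # predictions come back as a ww.DataColumn
```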
2 changes: 1 addition & 1 deletion evalml/automl/automl_algorithm/automl_algorithm.py
@@ -25,7 +25,7 @@ def __init__(self,
allowed_pipelines (list(class)): A list of PipelineBase subclasses indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed.
max_iterations (int): The maximum number of iterations to be evaluated.
tuner_class (class): A subclass of Tuner, to be used to find parameters for each pipeline. The default of None indicates the SKOptTuner will be used.
random_state (int): The random seed. Defaults to 0.
random_state (int): Seed for the random number generator. Defaults to 0.
"""
self.random_state = get_random_seed(random_state)
self.allowed_pipelines = allowed_pipelines or []
2 changes: 1 addition & 1 deletion evalml/automl/automl_algorithm/iterative_algorithm.py
@@ -31,7 +31,7 @@ def __init__(self,
allowed_pipelines (list(class)): A list of PipelineBase subclasses indicating the pipelines allowed in the search. The default of None indicates all pipelines for this problem type are allowed.
max_iterations (int): The maximum number of iterations to be evaluated.
tuner_class (class): A subclass of Tuner, to be used to find parameters for each pipeline. The default of None indicates the SKOptTuner will be used.
random_state (int): The random seed. Defaults to 0.
random_state (int): Seed for the random number generator. Defaults to 0.
pipelines_per_batch (int): The number of pipelines to be evaluated in each batch, after the first batch.
n_jobs (int or None): Non-negative integer describing level of parallelism used for pipelines.
number_features (int): The number of columns in the input features.
13 changes: 5 additions & 8 deletions evalml/automl/automl_search.py
@@ -157,7 +157,7 @@ def __init__(self,
additional_objectives (list): Custom set of objectives to score on.
Will override default objectives for problem type if not empty.

random_state (int): The random seed. Defaults to 0.
random_state (int): Seed for the random number generator. Defaults to 0.

n_jobs (int or None): Non-negative integer describing level of parallelism used for pipelines.
None and 1 are equivalent. If set to -1, all CPUs are used. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used.
@@ -566,8 +566,8 @@ def _tune_binary_threshold(self, pipeline, X_threshold_tuning, y_threshold_tunin

Arguments:
pipeline (Pipeline): Pipeline instance to threshold
X_threshold_tuning (ww DataTable): X data to tune pipeline to
y_threshold_tuning (ww DataColumn): Target data to tune pipeline to
X_threshold_tuning (ww.DataTable): X data to tune pipeline to
y_threshold_tuning (ww.DataColumn): Target data to tune pipeline to

Returns:
Trained pipeline instance
@@ -576,10 +576,7 @@ def _tune_binary_threshold(self, pipeline, X_threshold_tuning, y_threshold_tunin
pipeline.threshold = 0.5
if X_threshold_tuning:
y_predict_proba = pipeline.predict_proba(X_threshold_tuning)
if isinstance(y_predict_proba, pd.DataFrame):
Contributor Author (angela97lin) commented: Cleanup: we no longer need to worry about whether y_predict_proba is returning np/pd, so no need for this check :D

y_predict_proba = y_predict_proba.iloc[:, 1]
else:
y_predict_proba = y_predict_proba[:, 1]
y_predict_proba = y_predict_proba.iloc[:, 1]
pipeline.threshold = self.objective.optimize_threshold(y_predict_proba, y_threshold_tuning, X=X_threshold_tuning)
return pipeline

@@ -849,7 +846,7 @@ def get_pipeline(self, pipeline_id, random_state=0):

Arguments:
pipeline_id (int): pipeline to retrieve
random_state (int): The random seed. Defaults to 0.
random_state (int): Seed for the random number generator. Defaults to 0.

Returns:
PipelineBase: untrained pipeline instance associated with the provided ID
16 changes: 8 additions & 8 deletions evalml/automl/utils.py
@@ -39,17 +39,17 @@ def make_data_splitter(X, y, problem_type, problem_configuration=None, n_splits=
"""Given the training data and ML problem parameters, compute a data splitting method to use during AutoML search.

Arguments:
X (pd.DataFrame, ww.DataTable): The input training data of shape [n_samples, n_features].
y (pd.Series, ww.DataColumn): The target training data of length [n_samples].
problem_type (ProblemType): the type of machine learning problem.
X (ww.DataTable, pd.DataFrame): The input training data of shape [n_samples, n_features].
y (ww.DataColumn, pd.Series): The target training data of length [n_samples].
problem_type (ProblemType): The type of machine learning problem.
problem_configuration (dict, None): Additional parameters needed to configure the search. For example,
in time series problems, values should be passed in for the gap and max_delay variables.
n_splits (int, None): the number of CV splits, if applicable. Default 3.
shuffle (bool): whether or not to shuffle the data before splitting, if applicable. Default True.
random_state (int): The random seed. Defaults to 0.
in time series problems, values should be passed in for the gap and max_delay variables. Defaults to None.
n_splits (int, None): The number of CV splits, if applicable. Defaults to 3.
shuffle (bool): Whether or not to shuffle the data before splitting, if applicable. Defaults to True.
random_state (int): Seed for the random number generator. Defaults to 0.

Returns:
sklearn.model_selection.BaseCrossValidator: data splitting method.
sklearn.model_selection.BaseCrossValidator: Data splitting method.
"""
problem_type = handle_problem_types(problem_type)
data_splitter = None
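A hedged usage sketch of `make_data_splitter` (the data and parameter values are illustrative):

```python
import pandas as pd
from evalml.automl.utils import make_data_splitter

X = pd.DataFrame({"a": range(20), "b": range(20, 40)})
y = pd.Series([0, 1] * 10)

# Returns an sklearn-style cross-validator appropriate for the problem type
splitter = make_data_splitter(X, y, problem_type="binary",
                              n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in splitter.split(X, y):
    print(len(train_idx), len(test_idx))
```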