
Remove _needs_fitting attribute from Components #398

Merged Feb 26, 2020 · 31 commits
5231bd6
remove _needs_fitting
jeremyliweishih Feb 19, 2020
f560ae8
Edit changelog
jeremyliweishih Feb 19, 2020
2e10e1e
Switch to RunTimeError from Transformer
jeremyliweishih Feb 19, 2020
f079c83
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 19, 2020
6d139f6
Fit then transform
jeremyliweishih Feb 19, 2020
266e869
Add y to transform
jeremyliweishih Feb 19, 2020
44a0515
Remmove last _needs_fitting
jeremyliweishih Feb 19, 2020
978fc26
Merge branch '366_needs_fitting' of https://github.com/FeatureLabs/ev…
jeremyliweishih Feb 19, 2020
d00fc98
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 19, 2020
f0cc076
Refactor all transformers to have optional y
jeremyliweishih Feb 20, 2020
3686cbe
Merge branch '366_needs_fitting' of https://github.com/FeatureLabs/ev…
jeremyliweishih Feb 20, 2020
47a4639
Cap xgboost for now
jeremyliweishih Feb 20, 2020
af324ab
Add test for fit_transform
jeremyliweishih Feb 20, 2020
f1a171b
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 20, 2020
8b49718
Add specialized error for missing method/attributes
jeremyliweishih Feb 24, 2020
2a666de
Merge branch '366_needs_fitting' of https://github.com/FeatureLabs/ev…
jeremyliweishih Feb 24, 2020
3e59341
Reword y param docstring
jeremyliweishih Feb 24, 2020
22b224d
Edit test to also use Y
jeremyliweishih Feb 24, 2020
604b39e
Try both fit_transform and fit then transform
jeremyliweishih Feb 24, 2020
6d09a2c
lint
jeremyliweishih Feb 24, 2020
6a8d24e
Fix tests with fit than transform
jeremyliweishih Feb 24, 2020
f2908a0
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 24, 2020
9635d59
lint
jeremyliweishih Feb 24, 2020
431995c
Merge branch '366_needs_fitting' of https://github.com/FeatureLabs/ev…
jeremyliweishih Feb 24, 2020
52024b9
Add docstring to suggestions
jeremyliweishih Feb 25, 2020
ad0d50d
Add all cases for fit_transform tests
jeremyliweishih Feb 25, 2020
2cb5e98
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 25, 2020
b568531
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 25, 2020
3750fc2
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 26, 2020
5ed2f49
Edit docstring
jeremyliweishih Feb 26, 2020
e4e161d
Merge branch 'master' into 366_needs_fitting
jeremyliweishih Feb 26, 2020
1 change: 1 addition & 0 deletions docs/source/changelog.rst
@@ -15,6 +15,7 @@ Changelog
* Remove unused parameter ObjectiveBase.fit_needs_proba :pr:`320`
* Remove extraneous parameter component_type from all components :pr:`361`
* Remove unused rankings.csv file :pr:`397`
* Remove `_needs_fitting` attribute from Components :pr:`398`
* Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all :pr:`413`
* Documentation Changes
* Update release.md with instructions to release to internal license key :pr:`354`
2 changes: 2 additions & 0 deletions evalml/exceptions/__init__.py
@@ -0,0 +1,2 @@
# flake8:noqa
from .exceptions import MethodPropertyNotFoundError
3 changes: 3 additions & 0 deletions evalml/exceptions/exceptions.py
@@ -0,0 +1,3 @@
class MethodPropertyNotFoundError(Exception):
"""Exception to raise when a class does not have an expected method or property."""
pass
5 changes: 3 additions & 2 deletions evalml/pipelines/components/component_base.py
@@ -1,3 +1,4 @@
from evalml.exceptions import MethodPropertyNotFoundError
from evalml.utils import Logger


@@ -8,7 +9,7 @@ def __init__(self, parameters, component_obj, random_state):
self.parameters = parameters
self.logger = Logger()

attributes_to_check = ['_needs_fitting', "name"]
attributes_to_check = ["name"]

for attribute in attributes_to_check:
if not hasattr(self, attribute):
@@ -28,7 +29,7 @@ def fit(self, X, y=None):
self._component_obj.fit(X, y)
return self
except AttributeError:
raise RuntimeError("Component requires a fit method or a component_obj that implements fit")
raise MethodPropertyNotFoundError("Component requires a fit method or a component_obj that implements fit")

def describe(self, print_name=False, return_dict=False):
"""Describe a component and its parameters
@@ -17,7 +17,6 @@ class CatBoostClassifier(Estimator):
For more information, check out https://catboost.ai/
"""
name = "CatBoost Classifier"
_needs_fitting = True
hyperparameter_ranges = {
"n_estimators": Integer(10, 1000),
"eta": Real(0, 1),
@@ -12,7 +12,6 @@ class LogisticRegressionClassifier(Estimator):
Logistic Regression Classifier
"""
name = "Logistic Regression Classifier"
_needs_fitting = True
hyperparameter_ranges = {
"penalty": ["l2"],
"C": Real(.01, 10),
@@ -9,7 +9,6 @@
class RandomForestClassifier(Estimator):
"""Random Forest Classifier"""
name = "Random Forest Classifier"
_needs_fitting = True
hyperparameter_ranges = {
"n_estimators": Integer(10, 1000),
"max_depth": Integer(1, 32),
@@ -9,7 +9,6 @@
class XGBoostClassifier(Estimator):
"""XGBoost Classifier"""
name = "XGBoost Classifier"
_needs_fitting = True
hyperparameter_ranges = {
"eta": Real(0, 1),
"max_depth": Integer(1, 20),
7 changes: 4 additions & 3 deletions evalml/pipelines/components/estimators/estimator.py
@@ -1,3 +1,4 @@
from evalml.exceptions import MethodPropertyNotFoundError
from evalml.pipelines.components import ComponentBase


@@ -16,7 +17,7 @@ def predict(self, X):
try:
return self._component_obj.predict(X)
except AttributeError:
raise RuntimeError("Estimator requires a predict method or a component_obj that implements predict")
raise MethodPropertyNotFoundError("Estimator requires a predict method or a component_obj that implements predict")

def predict_proba(self, X):
"""Make probability estimates for labels.
@@ -30,11 +31,11 @@ def predict_proba(self, X):
try:
return self._component_obj.predict_proba(X)
except AttributeError:
raise RuntimeError("Estimator requires a predict_proba method or a component_obj that implements predict_proba")
raise MethodPropertyNotFoundError("Estimator requires a predict_proba method or a component_obj that implements predict_proba")

@property
def feature_importances(self):
try:
return self._component_obj.feature_importances_
except AttributeError:
raise RuntimeError("Estimator requires a feature_importances property or a component_obj that implements feature_importances_")
raise MethodPropertyNotFoundError("Estimator requires a feature_importances property or a component_obj that implements feature_importances_")
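The delegation-and-raise pattern this PR introduces across `ComponentBase` and `Estimator` can be sketched in isolation. `ComponentSketch` and `NoFit` below are hypothetical names used only for illustration and are not part of evalml:

```python
class MethodPropertyNotFoundError(Exception):
    """Raised when a class does not have an expected method or property."""


class ComponentSketch:
    """Hypothetical stand-in for ComponentBase: delegates to a wrapped component_obj."""

    def __init__(self, component_obj):
        self._component_obj = component_obj

    def fit(self, X, y=None):
        try:
            self._component_obj.fit(X, y)
            return self
        except AttributeError:
            # The wrapped object has no fit, so surface the specialized error
            # instead of a generic RuntimeError
            raise MethodPropertyNotFoundError(
                "Component requires a fit method or a component_obj that implements fit")


class NoFit:
    """Hypothetical component_obj that deliberately lacks a fit method."""


caught = False
try:
    ComponentSketch(NoFit()).fit([[1], [2]], [0, 1])
except MethodPropertyNotFoundError:
    caught = True
```

The specialized exception lets callers (such as `fit_transform` below) distinguish "the method is missing" from a method that exists but fails at runtime.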
@@ -14,7 +14,6 @@ class CatBoostRegressor(Estimator):
For more information, check out https://catboost.ai/
"""
name = "CatBoost Regressor"
_needs_fitting = True
hyperparameter_ranges = {
"n_estimators": Integer(10, 1000),
"eta": Real(0, 1),
@@ -8,7 +8,6 @@
class LinearRegressor(Estimator):
"""Linear Regressor"""
name = "Linear Regressor"
_needs_fitting = True
hyperparameter_ranges = {
'fit_intercept': [True, False],
'normalize': [True, False]
@@ -9,7 +9,6 @@
class RandomForestRegressor(Estimator):
"""Random Forest Regressor"""
name = "Random Forest Regressor"
_needs_fitting = True
hyperparameter_ranges = {
"n_estimators": Integer(10, 1000),
"max_depth": Integer(1, 32),
@@ -7,7 +7,6 @@ class OneHotEncoder(CategoricalEncoder):

"""Creates one-hot encoding for non-numeric data"""
name = 'One Hot Encoder'
_needs_fitting = True
hyperparameter_ranges = {}

def __init__(self):
@@ -24,12 +24,12 @@ def get_names(self):
selected_masks = self._component_obj.get_support()
return [feature_name for (selected, feature_name) in zip(selected_masks, self.input_feature_names) if selected]

def transform(self, X):
"""Transforms data X
def transform(self, X, y=None):
"""Transforms data X by selecting features

Arguments:
X (pd.DataFrame): Data to transform

y (pd.Series, optional): Input Labels
Returns:
pd.DataFrame: Transformed X
"""
@@ -50,11 +50,11 @@ def transform(self, X):
raise RuntimeError("Transformer requires a transform method or a component_obj that implements transform")

def fit_transform(self, X, y=None):
"""Fits on X and transforms X
"""Fits feature selector on data X then transforms X by selecting features

Arguments:
X (pd.DataFrame): Data to fit and transform

y (pd.Series): Labels to fit and transform
Returns:
pd.DataFrame: Transformed X
"""
@@ -9,7 +9,6 @@
class RFClassifierSelectFromModel(FeatureSelector):
"""Selects top features based on importance weights using a Random Forest classifier"""
name = 'RF Classifier Select From Model'
_needs_fitting = True
hyperparameter_ranges = {
"percent_features": Real(.01, 1),
"threshold": ['mean', -np.inf]
@@ -9,7 +9,6 @@
class RFRegressorSelectFromModel(FeatureSelector):
"""Selects top features based on importance weights using a Random Forest regressor"""
name = 'RF Regressor Select From Model'
_needs_fitting = True
hyperparameter_ranges = {
"percent_features": Real(.01, 1),
"threshold": ['mean', -np.inf]
@@ -7,7 +7,6 @@
class SimpleImputer(Transformer):
"""Imputes missing data using mean, median, or most_frequent for numerical data, and most_frequent for categorical data"""
name = 'Simple Imputer'
_needs_fitting = True
hyperparameter_ranges = {"impute_strategy": ["mean", "median", "most_frequent"]}

def __init__(self, impute_strategy="most_frequent"):
@@ -17,14 +16,30 @@ def __init__(self, impute_strategy="most_frequent"):
component_obj=imputer,
random_state=0)

def transform(self, X):
def transform(self, X, y=None):
"""Transforms data X by imputing missing values

Arguments:
X (pd.DataFrame): Data to transform
y (pd.Series, optional): Input Labels
[Review thread]
Contributor: I think the optional should go somewhere after the colon?
jeremyliweishih (Author): @dsherry optional inside the parentheses is currently our convention in the codebase. It would be easier to keep this as is and keep track of the convention in a separate issue if we want to change it.
Contributor: Ah cool, didn't know.
Returns:
pd.DataFrame: Transformed X
"""
X_t = self._component_obj.transform(X)
if not isinstance(X_t, pd.DataFrame) and isinstance(X, pd.DataFrame):
# skLearn's SimpleImputer loses track of column type, so we need to restore
X_t = pd.DataFrame(X_t, columns=X.columns, index=X.index).astype(X.dtypes.to_dict())
return X_t

def fit_transform(self, X, y=None):
"""Fits imputer on data X then imputes missing values in X

Arguments:
X (pd.DataFrame): Data to fit and transform
y (pd.Series): Labels to fit and transform
Returns:
pd.DataFrame: Transformed X
"""
X_t = self._component_obj.fit_transform(X, y)
if not isinstance(X_t, pd.DataFrame) and isinstance(X, pd.DataFrame):
# skLearn's SimpleImputer loses track of column type, so we need to restore
Expand Down
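The re-wrapping above exists because scikit-learn's `SimpleImputer` returns a plain NumPy array, discarding column names, index, and dtypes. A minimal sketch of the restoration (the sample data is illustrative, not from the PR):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

X = pd.DataFrame({"a": pd.Series([1.0, np.nan, 3.0], dtype="float64"),
                  "b": pd.Series([0, 1, 1], dtype="int64")})

# sklearn returns an ndarray: column names, index, and dtypes are lost
raw = SimpleImputer(strategy="most_frequent").fit_transform(X)

# Restore DataFrame structure and the original column dtypes, as the component does
X_t = pd.DataFrame(raw, columns=X.columns, index=X.index).astype(X.dtypes.to_dict())
```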
@@ -6,7 +6,6 @@
class StandardScaler(Transformer):
"""Standardize features: removes mean and scales to unit variance"""
name = "Standard Scaler"
_needs_fitting = True
hyperparameter_ranges = {}

def __init__(self):
22 changes: 14 additions & 8 deletions evalml/pipelines/components/transformers/transformer.py
@@ -1,5 +1,6 @@
import pandas as pd

from evalml.exceptions import MethodPropertyNotFoundError
from evalml.pipelines.components import ComponentBase


@@ -8,12 +9,12 @@ class Transformer(ComponentBase):
These components are used before an estimator.
"""

def transform(self, X):
def transform(self, X, y=None):
"""Transforms data X

Arguments:
X (pd.DataFrame): Data to transform

y (pd.Series, optional): Input Labels
Returns:
pd.DataFrame: Transformed X
"""
@@ -23,21 +24,26 @@ def transform(self, X):
X_t = pd.DataFrame(X_t, columns=X.columns, index=X.index)
return X_t
except AttributeError:
raise RuntimeError("Transformer requires a transform method or a component_obj that implements transform")
raise MethodPropertyNotFoundError("Transformer requires a transform method or a component_obj that implements transform")

def fit_transform(self, X, y=None):
"""Fits on X and transforms X

Arguments:
X (pd.DataFrame): Data to fit and transform

y (pd.DataFrame): Labels to fit and transform
Returns:
pd.DataFrame: Transformed X
"""
try:
X_t = self._component_obj.fit_transform(X, y)
if not isinstance(X_t, pd.DataFrame) and isinstance(X, pd.DataFrame):
X_t = pd.DataFrame(X_t, columns=X.columns, index=X.index)
return X_t
except AttributeError:
raise RuntimeError("Transformer requires a fit_transform method or a component_obj that implements fit_transform")
try:
self.fit(X, y)
X_t = self.transform(X, y)
except MethodPropertyNotFoundError as e:
raise e
[Review thread]
Contributor: +1 to this. Let's make sure we have test coverage for:
  • Component has valid fit_transform method
  • Component has fit_transform method but it throws an exception, like a RuntimeError
  • Component has no fit_transform method, but fit and transform work
  • Component has no fit_transform method, and fit or transform throw
if not isinstance(X_t, pd.DataFrame) and isinstance(X, pd.DataFrame):
X_t = pd.DataFrame(X_t, columns=X.columns, index=X.index)
return X_t
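The new `fit_transform` first tries the wrapped object's own `fit_transform`, then falls back to `fit` followed by `transform`. A simplified sketch of that fallback (the real method also re-raises `MethodPropertyNotFoundError` from the fallback path and re-wraps DataFrames; `FallbackTransformer` and `FitAndTransformOnly` are hypothetical illustrations):

```python
class FallbackTransformer:
    """Sketch of Transformer.fit_transform's fallback behavior."""

    def __init__(self, component_obj):
        self._component_obj = component_obj

    def fit(self, X, y=None):
        self._component_obj.fit(X, y)
        return self

    def transform(self, X, y=None):
        return self._component_obj.transform(X)

    def fit_transform(self, X, y=None):
        try:
            # Preferred path: the wrapped object implements fit_transform itself
            return self._component_obj.fit_transform(X, y)
        except AttributeError:
            # Fallback path: fit, then transform
            self.fit(X, y)
            return self.transform(X, y)


class FitAndTransformOnly:
    """Hypothetical component_obj with fit and transform but no fit_transform."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [row + ["t"] for row in X]


out = FallbackTransformer(FitAndTransformOnly()).fit_transform([["a"], ["b"]])
```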
6 changes: 2 additions & 4 deletions evalml/pipelines/pipeline_base.py
@@ -134,10 +134,8 @@ def _fit(self, X, y):
y_t = y
for component in self.component_list[:-1]:
self.input_feature_names.update({component.name: list(pd.DataFrame(X_t))})
if component._needs_fitting:
X_t = component.fit_transform(X_t, y_t)
else:
X_t = component.transform(X_t, y_t)
X_t = component.fit_transform(X_t, y_t)

self.input_feature_names.update({self.estimator.name: list(pd.DataFrame(X_t))})
self.estimator.fit(X_t, y_t)

Expand Down
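With `_needs_fitting` gone, `_fit` can call `fit_transform` unconditionally on every intermediate component instead of branching. A toy sketch of the simplified loop (`AddOne`, `Double`, and `SumEstimator` are hypothetical components, not evalml classes):

```python
class AddOne:
    def fit_transform(self, X, y=None):
        return [x + 1 for x in X]


class Double:
    def fit_transform(self, X, y=None):
        return [x * 2 for x in X]


class SumEstimator:
    def fit(self, X, y=None):
        self.total_ = sum(X)
        return self


component_list = [AddOne(), Double(), SumEstimator()]
X_t, y_t = [1, 2, 3], None

# Every transformer is fit via fit_transform; no _needs_fitting branch remains
for component in component_list[:-1]:
    X_t = component.fit_transform(X_t, y_t)

estimator = component_list[-1]
estimator.fit(X_t, y_t)
```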