Refactor for model family #507

Merged
29 commits merged on Mar 19, 2020

Changes from 21 commits

Commits (29)
ed433e5
Add model_family'
jeremyliweishih Mar 18, 2020
e23607d
Update pipeline utils
jeremyliweishih Mar 18, 2020
201dfa0
Add ABC and classproperty
jeremyliweishih Mar 18, 2020
920a1d2
Remove class property'
jeremyliweishih Mar 18, 2020
a767529
Revert "Remove class property'"
jeremyliweishih Mar 18, 2020
695d725
Fix component and add to pipelinebase
jeremyliweishih Mar 18, 2020
f5a477c
Remove model type from pipelines
jeremyliweishih Mar 18, 2020
22401fd
Add model family to estimators
jeremyliweishih Mar 18, 2020
38e672a
Add none model_family to transformer
jeremyliweishih Mar 18, 2020
8279452
Refactor tests
jeremyliweishih Mar 18, 2020
d9e8461
Add model_family for autobase
jeremyliweishih Mar 18, 2020
2110c32
lint
jeremyliweishih Mar 18, 2020
98ca1e1
Handle compoent graph
jeremyliweishih Mar 18, 2020
4169812
Fix describe'
jeremyliweishih Mar 18, 2020
76b9b6b
Rename to allowed_model_families
jeremyliweishih Mar 18, 2020
b99a2de
fix docs
jeremyliweishih Mar 18, 2020
b087991
changelog
jeremyliweishih Mar 18, 2020
e53b179
Merge branch 'master' of https://github.com/FeatureLabs/evalml into j…
jeremyliweishih Mar 18, 2020
f989d1f
Fix describe test
jeremyliweishih Mar 18, 2020
1a1fbbc
Fix api reference
jeremyliweishih Mar 18, 2020
86a86fd
Merge branch 'master' into js_379_model_family
jeremyliweishih Mar 18, 2020
6e14373
Merge branch 'master' into js_379_model_family
jeremyliweishih Mar 18, 2020
270a980
add breaking changes
jeremyliweishih Mar 19, 2020
6b16e51
Move return statement
jeremyliweishih Mar 19, 2020
b6f67eb
Add none modelfamily
jeremyliweishih Mar 19, 2020
13f586e
Make none explicit in transformer
jeremyliweishih Mar 19, 2020
3fd74fc
Add unit tests for component model_family
jeremyliweishih Mar 19, 2020
560a2bd
Merge branch 'js_379_model_family' of https://github.com/FeatureLabs/…
jeremyliweishih Mar 19, 2020
da0e987
Merge branch 'master' of https://github.com/FeatureLabs/evalml into j…
jeremyliweishih Mar 19, 2020
14 changes: 11 additions & 3 deletions docs/source/api_reference.rst
@@ -60,16 +60,24 @@ Plotting
AutoClassificationSearch.plot.normalize_confusion_matrix


Model Types
.. currentmodule:: evalml.model_family

Model Family
===========

.. autosummary::
:toctree: generated
:template: class.rst
:nosignatures:

ModelFamily

.. autosummary::
:toctree: generated
:nosignatures:

list_model_types
list_model_families

.. currentmodule:: evalml.pipelines.components

Components
==========
10 changes: 5 additions & 5 deletions docs/source/automl/pipeline_search.ipynb
@@ -134,7 +134,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, all model types are considered. You can control which model types to search with the `model_types` parameters"
"By default, all model types are considered. You can control which model types to search with the `allowed_model_families` parameters"
]
},
{
@@ -144,7 +144,7 @@
"outputs": [],
"source": [
"automl = AutoClassificationSearch(objective=\"f1\",\n",
" model_types=[\"random_forest\"])"
" allowed_model_families=[\"random_forest\"])"
]
},
{
@@ -176,7 +176,7 @@
"metadata": {},
"outputs": [],
"source": [
"evalml.list_model_types(\"binary\") # `binary` for binary classification and `multiclass` for multiclass classification"
"evalml.list_model_families(\"binary\") # `binary` for binary classification and `multiclass` for multiclass classification"
]
},
{
@@ -185,7 +185,7 @@
"metadata": {},
"outputs": [],
"source": [
"evalml.list_model_types(\"regression\")"
"evalml.list_model_families(\"regression\")"
]
},
{
@@ -319,4 +319,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
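For readers skimming this diff, the user-facing rename amounts to the calls below — a minimal sketch assembled from the notebook cells above; the surrounding setup (data, objective choice) is assumed and not part of this PR:

import evalml

# list_model_types(...) is renamed to list_model_families(...)
evalml.list_model_families("binary")

# the model_types argument is renamed to allowed_model_families
automl = evalml.AutoClassificationSearch(objective="f1",
                                         allowed_model_families=["random_forest"])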
1 change: 1 addition & 0 deletions docs/source/changelog.rst
@@ -11,6 +11,7 @@ Changelog
* Undo version cap in XGBoost placed in :pr:`402` and allowed all released of XGBoost :pr:`407`
* Support pandas 1.0.0 :pr:`486`
* Made all references to the logger static :pr:`503`
* Refactored `model_type` parameter for components and pipelines to `model_family` :pr:`507`
* Documentation Changes
* Updated API reference to remove PipelinePlot and added moved PipelineBase plotting methods :pr:`483`
* Add code style and github issue guides :pr:`463`
2 changes: 1 addition & 1 deletion docs/source/guardrails/overfitting.ipynb
@@ -49,7 +49,7 @@
"\n",
"automl = evalml.AutoClassificationSearch(\n",
" max_pipelines=1,\n",
" model_types=[\"linear_model\"],\n",
" allowed_model_families=[\"linear_model\"],\n",
")\n",
"\n",
"automl.search(X, y)"
4 changes: 2 additions & 2 deletions evalml/__init__.py
@@ -12,15 +12,15 @@
import skopt

import evalml.demos
import evalml.model_types
import evalml.model_family
import evalml.objectives
import evalml.pipelines
import evalml.preprocessing
import evalml.problem_types
import evalml.utils
import evalml.guardrails

from evalml.pipelines import list_model_types, save_pipeline, load_pipeline
from evalml.pipelines import list_model_families, save_pipeline, load_pipeline
from evalml.automl import AutoClassificationSearch, AutoRegressionSearch

warnings.filterwarnings("ignore", category=FutureWarning)
14 changes: 8 additions & 6 deletions evalml/automl/auto_base.py
@@ -27,22 +27,24 @@ class AutoBase:
plot = PipelineSearchPlots

def __init__(self, problem_type, tuner, cv, objective, max_pipelines, max_time,
patience, tolerance, model_types, detect_label_leakage, start_iteration_callback,
patience, tolerance, allowed_model_families, detect_label_leakage, start_iteration_callback,
add_result_callback, additional_objectives, random_state, n_jobs, verbose):
if tuner is None:
tuner = SKOptTuner
self.objective = get_objective(objective)
self.problem_type = problem_type
self.max_pipelines = max_pipelines
self.model_types = model_types
self.allowed_model_families = allowed_model_families
self.detect_label_leakage = detect_label_leakage
self.start_iteration_callback = start_iteration_callback
self.add_result_callback = add_result_callback
self.cv = cv
self.verbose = verbose
logger.verbose = verbose
self.possible_pipelines = get_pipelines(problem_type=self.problem_type, model_types=model_types)
self.possible_pipelines = get_pipelines(problem_type=self.problem_type, model_families=allowed_model_families)
self.objective = get_objective(objective)

logger.verbose = verbose

if self.problem_type not in self.objective.problem_types:
raise ValueError("Given objective {} is not compatible with a {} problem.".format(self.objective.name, self.problem_type.value))

@@ -82,7 +84,7 @@ def __init__(self, problem_type, tuner, cv, objective, max_pipelines, max_time,
np.random.seed(seed=self.random_state)

self.n_jobs = n_jobs
self.possible_model_types = list(set([p.model_type for p in self.possible_pipelines]))
self.possible_model_families = list(set([p.model_family for p in self.possible_pipelines]))

self.tuners = {}
self.search_spaces = {}
@@ -149,7 +151,7 @@ def search(self, X, y, feature_types=None, raise_errors=False, show_iteration_pl
logger.log("Searching up to %s pipelines. " % self.max_pipelines)
if self.max_time:
logger.log("Will stop searching for new pipelines after %d seconds.\n" % self.max_time)
logger.log("Possible model types: %s\n" % ", ".join([model.value for model in self.possible_model_types]))
logger.log("Possible model families: %s\n" % ", ".join([model.value for model in self.possible_model_families]))

if self.detect_label_leakage:
leaked = guardrails.detect_label_leakage(X, y)
8 changes: 4 additions & 4 deletions evalml/automl/auto_classification_search.py
@@ -17,7 +17,7 @@ def __init__(self,
max_time=None,
patience=None,
tolerance=None,
model_types=None,
allowed_model_families=None,
cv=None,
tuner=None,
detect_label_leakage=True,
@@ -48,8 +48,8 @@ def __init__(self,
tolerance (float): Minimum percentage difference to qualify as score improvement for early stopping.
Only applicable if patience is not None. Defaults to None.

model_types (list): The model types to search. By default searches over all
model_types. Run evalml.list_model_types("classification") to see options.
allowed_model_families (list): The model families to search. By default searches over all
model families. Run evalml.list_model_families("classification") to see options.

cv: cross validation method to use. By default StratifiedKFold

@@ -96,7 +96,7 @@ def __init__(self,
max_time=max_time,
patience=patience,
tolerance=tolerance,
model_types=model_types,
allowed_model_families=allowed_model_families,
problem_type=problem_type,
detect_label_leakage=detect_label_leakage,
start_iteration_callback=start_iteration_callback,
8 changes: 4 additions & 4 deletions evalml/automl/auto_regression_search.py
@@ -16,7 +16,7 @@ def __init__(self,
max_time=None,
patience=None,
tolerance=None,
model_types=None,
allowed_model_families=None,
cv=None,
tuner=None,
detect_label_leakage=True,
@@ -39,8 +39,8 @@
has elapsed. If it is an integer, then the time will be in seconds.
For strings, time can be specified as seconds, minutes, or hours.

model_types (list): The model types to search. By default searches over all
model_types. Run evalml.list_model_types("regression") to see options.
allowed_model_families (list): The model families to search. By default searches over all
model families. Run evalml.list_model_families("regression") to see options.

patience (int): Number of iterations without improvement to stop search early. Must be positive.
If None, early stopping is disabled. Defaults to None.
@@ -88,7 +88,7 @@ def __init__(self,
max_time=max_time,
patience=patience,
tolerance=tolerance,
model_types=model_types,
allowed_model_families=allowed_model_families,
problem_type=problem_type,
detect_label_leakage=detect_label_leakage,
start_iteration_callback=start_iteration_callback,
3 changes: 3 additions & 0 deletions evalml/model_family/__init__.py
@@ -0,0 +1,3 @@
# flake8:noqas
from .model_family import ModelFamily
from .utils import handle_model_family
16 changes: 16 additions & 0 deletions evalml/model_family/model_family.py
@@ -0,0 +1,16 @@
from enum import Enum


class ModelFamily(Enum):
"""Enum for family of machine learning models."""
RANDOM_FOREST = 'random_forest'
XGBOOST = 'xgboost'
LINEAR_MODEL = 'linear_model'
CATBOOST = 'catboost'

def __str__(self):
model_family_dict = {ModelFamily.RANDOM_FOREST.name: "Random Forest",
ModelFamily.XGBOOST.name: "XGBoost Classifier",
ModelFamily.LINEAR_MODEL.name: "Linear Model",
ModelFamily.CATBOOST.name: "CatBoost Classifier"}
return model_family_dict[self.name]
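To make the new enum concrete, here is a small illustration inferred only from the definition above (not part of the PR itself):

from evalml.model_family import ModelFamily

assert ModelFamily.RANDOM_FOREST.value == 'random_forest'
# __str__ maps each member to its display name via model_family_dict
assert str(ModelFamily.RANDOM_FOREST) == "Random Forest"
assert str(ModelFamily.XGBOOST) == "XGBoost Classifier"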
20 changes: 20 additions & 0 deletions evalml/model_family/utils.py
@@ -0,0 +1,20 @@
from .model_family import ModelFamily


def handle_model_family(model_family):
"""Handles model_family by either returning the ModelFamily or converting from a str
Args:
model_family (str or ModelFamily) : model type that needs to be handled
Returns:
ModelFamily
"""

if isinstance(model_family, str):
try:
tpe = ModelFamily[model_family.upper()]
except KeyError:
raise KeyError('Model family \'{}\' does not exist'.format(model_family))
return tpe
if isinstance(model_family, ModelFamily):
return model_family
raise ValueError('`handle_model_family` was not passed a str or ModelFamily object')
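Based on the code above, the helper should behave roughly as follows (a sketch for illustration, not an exhaustive test):

from evalml.model_family import ModelFamily, handle_model_family

assert handle_model_family('random_forest') is ModelFamily.RANDOM_FOREST   # str is converted
assert handle_model_family(ModelFamily.XGBOOST) is ModelFamily.XGBOOST     # enum passes through
# handle_model_family('not_a_family')  # would raise KeyError
# handle_model_family(42)              # would raise ValueError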
3 changes: 0 additions & 3 deletions evalml/model_types/__init__.py

This file was deleted.

16 changes: 0 additions & 16 deletions evalml/model_types/model_types.py

This file was deleted.

22 changes: 0 additions & 22 deletions evalml/model_types/utils.py

This file was deleted.

2 changes: 1 addition & 1 deletion evalml/pipelines/__init__.py
@@ -33,7 +33,7 @@
)
from .utils import (
get_pipelines,
list_model_types,
list_model_families,
load_pipeline,
save_pipeline
)
4 changes: 2 additions & 2 deletions evalml/pipelines/classification/catboost.py
@@ -1,6 +1,6 @@
from skopt.space import Integer, Real

from evalml.model_types import ModelTypes
from evalml.model_family import ModelFamily
from evalml.pipelines import PipelineBase


@@ -13,7 +13,7 @@ class CatBoostClassificationPipeline(PipelineBase):
Note: impute_strategy must support both string and numeric data
"""
name = "CatBoost Classifier w/ Simple Imputer"
model_type = ModelTypes.CATBOOST
model_family = ModelFamily.CATBOOST
component_graph = ['Simple Imputer', 'CatBoost Classifier']
problem_types = ['binary', 'multiclass']
hyperparameters = {
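After this refactor, pipeline classes expose a model_family class attribute in place of model_type. A minimal check, assuming CatBoost is installed and using the module path shown in this diff:

from evalml.model_family import ModelFamily
from evalml.pipelines.classification.catboost import CatBoostClassificationPipeline

assert CatBoostClassificationPipeline.model_family == ModelFamily.CATBOOST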
2 changes: 0 additions & 2 deletions evalml/pipelines/classification/logistic_regression.py
@@ -1,13 +1,11 @@
from skopt.space import Real

from evalml.model_types import ModelTypes
from evalml.pipelines import PipelineBase


class LogisticRegressionPipeline(PipelineBase):
"""Logistic Regression Pipeline for both binary and multiclass classification"""
name = "Logistic Regression Classifier w/ One Hot Encoder + Simple Imputer + Standard Scaler"
model_type = ModelTypes.LINEAR_MODEL
component_graph = ['One Hot Encoder', 'Simple Imputer', 'Standard Scaler', 'Logistic Regression Classifier']
problem_types = ['binary', 'multiclass']

2 changes: 0 additions & 2 deletions evalml/pipelines/classification/random_forest.py
@@ -1,13 +1,11 @@
from skopt.space import Integer, Real

from evalml.model_types import ModelTypes
from evalml.pipelines import PipelineBase


class RFClassificationPipeline(PipelineBase):
"""Random Forest Pipeline for both binary and multiclass classification"""
name = "Random Forest Classifier w/ One Hot Encoder + Simple Imputer + RF Classifier Select From Model"
model_type = ModelTypes.RANDOM_FOREST
component_graph = ['One Hot Encoder', 'Simple Imputer', 'RF Classifier Select From Model', 'Random Forest Classifier']
problem_types = ['binary', 'multiclass']

2 changes: 0 additions & 2 deletions evalml/pipelines/classification/xgboost.py
@@ -1,14 +1,12 @@
from skopt.space import Integer, Real

from evalml.model_types import ModelTypes
from evalml.pipelines import PipelineBase
from evalml.problem_types import ProblemTypes


class XGBoostPipeline(PipelineBase):
"""XGBoost Pipeline for both binary and multiclass classification"""
name = "XGBoost Classifier w/ One Hot Encoder + Simple Imputer + RF Classifier Select From Model"
model_type = ModelTypes.XGBOOST
component_graph = ['One Hot Encoder', 'Simple Imputer', 'RF Classifier Select From Model', 'XGBoost Classifier']
problem_types = [ProblemTypes.BINARY, ProblemTypes.MULTICLASS]
