diff --git a/docs/api/regressor/alphas.rst b/docs/api/regressor/alphas.rst
index d8a55eebb..7f8c22403 100644
--- a/docs/api/regressor/alphas.rst
+++ b/docs/api/regressor/alphas.rst
@@ -5,7 +5,7 @@ Alpha Selection
 
 Regularization is designed to penalize model complexity, therefore the higher the alpha, the less complex the model, decreasing the error due to variance (overfit). Alphas that are too high on the other hand increase the error due to bias (underfit). It is important, therefore to choose an optimal alpha such that the error is minimized in both directions.
 
-The AlphaSelection Visualizer demonstrates how different values of alpha influence model selection during the regularization of linear models. Generally speaking, alpha increases the affect of regularization, e.g. if alpha is zero there is no regularization and the higher the alpha, the more the regularization parameter influences the final model.
+The ``AlphaSelection`` Visualizer demonstrates how different values of alpha influence model selection during the regularization of linear models. Generally speaking, alpha increases the effect of regularization, e.g. if alpha is zero there is no regularization and the higher the alpha, the more the regularization parameter influences the final model.
 
 ================= ==============================
 Visualizer        :class:`~yellowbrick.regressor.alphas.AlphaSelection`
@@ -14,6 +14,22 @@ Models            Regression
 Workflow          Model selection, Hyperparameter tuning
 ================= ==============================
 
+For Estimators *with* Built-in Cross-Validation
+-----------------------------------------------
+
+The ``AlphaSelection`` visualizer wraps a "RegressorCV" model and
+visualizes the alpha/error curve. Use this visualization to detect
+whether the model is responding to regularization, e.g. whether the
+error decreases as alpha is increased or decreased. If the
+visualization shows a jagged or random plot, then potentially the model
+is not sensitive to that type of regularization and another is required
+(e.g. L1 or ``Lasso`` regularization).
+
+.. NOTE::
+    The ``AlphaSelection`` visualizer requires a "RegressorCV" model, e.g.
+    a specialized class that performs cross-validated alpha-selection
+    on behalf of the model. See the ``ManualAlphaSelection`` visualizer if
+    your regression model does not include cross-validation.
 
 .. plot::
     :context: close-figs
@@ -22,8 +38,8 @@ Workflow          Model selection, Hyperparameter tuning
     import numpy as np
 
     from sklearn.linear_model import LassoCV
-    from yellowbrick.regressor import AlphaSelection
     from yellowbrick.datasets import load_concrete
+    from yellowbrick.regressor import AlphaSelection
 
     # Load the regression dataset
     X, y = load_concrete()
@@ -37,9 +53,46 @@ Workflow          Model selection, Hyperparameter tuning
     visualizer.fit(X, y)
     visualizer.show()
 
+For Estimators *without* Built-in Cross-Validation
+--------------------------------------------------
+
+Most scikit-learn ``Estimators`` with ``alpha`` parameters
+have a version with built-in cross-validation. However, if the
+regressor you wish to use doesn't have an associated "CV" estimator,
+or for some reason you would like to specify more control over the
+alpha selection process, then you can use the ``ManualAlphaSelection``
+visualizer. This visualizer is essentially a wrapper for scikit-learn's
+``cross_val_score`` method, fitting a model for each alpha specified.
+
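+Conceptually, the visualizer fits the model once per alpha and records
+the mean cross-validated score for each one. A minimal sketch of that
+loop (an illustration only, not the visualizer's actual implementation;
+it assumes ``X`` and ``y`` are the concrete dataset loaded above):
+
+.. code-block:: python
+
+    import numpy as np
+
+    from sklearn.linear_model import Ridge
+    from sklearn.model_selection import cross_val_score
+
+    alphas = np.logspace(-10, -2, 200)
+    errors = []
+
+    for alpha in alphas:
+        # Refit the regressor with each alpha and cross-validate it
+        model = Ridge(alpha=alpha)
+        scores = cross_val_score(
+            model, X, y, cv=3, scoring="neg_mean_squared_error"
+        )
+        # Negate the "neg_*" score so that lower values mean lower error
+        errors.append(-scores.mean())
+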
+.. plot::
+    :context: close-figs
+    :alt: Manual alpha selection on the concrete data set
+
+    import numpy as np
+
+    from sklearn.linear_model import Ridge
+    from yellowbrick.datasets import load_concrete
+    from yellowbrick.regressor import ManualAlphaSelection
+
+    # Load the regression dataset
+    X, y = load_concrete()
+
+    # Create a list of alphas to cross-validate against
+    alphas = np.logspace(1, 4, 50)
+
+    # Instantiate the visualizer
+    visualizer = ManualAlphaSelection(
+        Ridge(),
+        alphas=alphas,
+        cv=12,
+        scoring="neg_mean_squared_error"
+    )
 
-Quick Method
-------------
+    visualizer.fit(X, y)
+    visualizer.show()
+
+Quick Methods
+-------------
 The same functionality above can be achieved with the associated quick method `alphas`. This method will build the ``AlphaSelection`` Visualizer object with the associated arguments, fit it, then (optionally) immediately show it.
@@ -60,10 +113,31 @@ The same functionality above can be achieved with the associated quick method `a
     alphas(LassoCV(random_state=0), X, y)
 
+The ``ManualAlphaSelection`` visualizer can also be used as a one-liner:
+
+.. plot::
+    :context: close-figs
+    :alt: manual alphas on the energy dataset
+
+    from sklearn.linear_model import ElasticNet
+    from yellowbrick.regressor.alphas import manual_alphas
+
+    from yellowbrick.datasets import load_energy
+
+    # Load dataset
+    X, y = load_energy()
+
+    # Instantiate a model
+    model = ElasticNet(tol=0.01, max_iter=10000)
+
+    # Use the quick method and immediately show the figure
+    manual_alphas(model, X, y, cv=6)
+
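+Like the other quick methods, ``manual_alphas`` returns the fitted
+visualizer, so the figure can be saved instead of rendered. A small
+sketch of that pattern, reusing ``model``, ``X`` and ``y`` from the
+example above (``show=False`` finalizes the plot without displaying it,
+and the ``outpath`` argument to ``show`` writes the figure to disk):
+
+.. code-block:: python
+
+    # Fit and finalize the plot, then save it rather than display it
+    viz = manual_alphas(model, X, y, cv=6, show=False)
+    viz.show(outpath="manual_alphas.png")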
+
 API Reference
 -------------
 
 .. automodule:: yellowbrick.regressor.alphas
-    :members: AlphaSelection, ManualAlphaSelection, alphas
+    :members: AlphaSelection, ManualAlphaSelection, alphas, manual_alphas
     :undoc-members:
     :show-inheritance:
diff --git a/tests/baseline_images/test_regressor/test_alphas/test_quick_method_manual.png b/tests/baseline_images/test_regressor/test_alphas/test_quick_method_manual.png
new file mode 100644
index 000000000..5c2022842
Binary files /dev/null and b/tests/baseline_images/test_regressor/test_alphas/test_quick_method_manual.png differ
diff --git a/tests/baseline_images/test_regressor/test_alphas/test_similar_image_manual.png b/tests/baseline_images/test_regressor/test_alphas/test_similar_image_manual.png
new file mode 100644
index 000000000..0c0941b38
Binary files /dev/null and b/tests/baseline_images/test_regressor/test_alphas/test_similar_image_manual.png differ
diff --git a/tests/test_regressor/test_alphas.py b/tests/test_regressor/test_alphas.py
index 4d0e586b1..54bdddf38 100644
--- a/tests/test_regressor/test_alphas.py
+++ b/tests/test_regressor/test_alphas.py
@@ -28,6 +28,8 @@
 from yellowbrick.exceptions import YellowbrickTypeError
 from yellowbrick.exceptions import YellowbrickValueError
 from yellowbrick.regressor.alphas import AlphaSelection, alphas
+from yellowbrick.regressor.alphas import ManualAlphaSelection, manual_alphas
+
 from sklearn.svm import SVR, SVC
 from sklearn.cluster import KMeans
@@ -167,3 +169,52 @@ def test_quick_method(self):
         )
         assert isinstance(visualizer, AlphaSelection)
         self.assert_images_similar(visualizer)
+
+
+class TestManualAlphaSelection(VisualTestCase):
+    """
+    Test the ManualAlphaSelection visualizer
+    """
+
+    def test_similar_image_manual(self):
+        """
+        Integration test with image similarity comparison
+        """
+        visualizer = ManualAlphaSelection(Lasso(random_state=0), cv=5)
+
+        X, y = make_regression(random_state=0)
+        visualizer.fit(X, y)
+        visualizer.finalize()
+
+        # Image comparison fails on Appveyor with RMS 0.024
+        self.assert_images_similar(visualizer, tol=0.1)
+
+    @pytest.mark.parametrize("model", [RidgeCV, LassoCV, LassoLarsCV, ElasticNetCV])
+    def test_manual_with_cv(self, model):
+        """
+        Ensure that CV regressors raise a type error
+        """
+        with pytest.raises(YellowbrickTypeError):
+            ManualAlphaSelection(model())
+
+    @pytest.mark.parametrize("model", [SVR, Ridge, Lasso, LassoLars, ElasticNet])
+    def test_manual_no_cv(self, model):
+        """
+        Ensure non-CV regressors are allowed
+        """
+        try:
+            ManualAlphaSelection(model())
+        except YellowbrickTypeError:
+            pytest.fail("could not instantiate regressor for alpha selection")
+
+    def test_quick_method_manual(self):
+        """
+        Test the manual alphas quick method producing a valid visualization
+        """
+        X, y = load_energy(return_dataset=True).to_numpy()
+
+        visualizer = manual_alphas(
+            ElasticNet(random_state=0), X, y, cv=3, is_fitted=False, show=False
+        )
+        assert isinstance(visualizer, ManualAlphaSelection)
+        self.assert_images_similar(visualizer)
diff --git a/yellowbrick/regressor/alphas.py b/yellowbrick/regressor/alphas.py
index 97aac33af..0ae25293d 100644
--- a/yellowbrick/regressor/alphas.py
+++ b/yellowbrick/regressor/alphas.py
@@ -311,10 +311,13 @@ def __init__(self, model, ax=None, alphas=None, cv=None, scoring=None, **kwargs)
         )
 
         # Call super to initialize the class
-        super(ManualAlphaSelection, self).__init__(model, ax=ax, **kwargs)
+        super(AlphaSelection, self).__init__(model, ax=ax, **kwargs)
 
         # Set manual alpha selection parameters
-        self.alphas = alphas or np.logspace(-10, -2, 200)
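+        # An ``alphas or np.logspace(...)`` default cannot be used here: the
+        # truth value of a multi-element numpy array is ambiguous and raises
+        # a ValueError, so the ndarray is compared against None explicitly.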
+        if alphas is not None:
+            self.alphas = alphas
+        else:
+            self.alphas = np.logspace(-10, -2, 200)
         self.errors = None
         self.score_method = partial(cross_val_score, cv=cv, scoring=scoring)
@@ -361,7 +364,7 @@
 
 
 ##########################################################################
-## Quick Method
+## Quick Methods
 ##########################################################################
@@ -426,3 +429,84 @@ def alphas(model, X, y=None, ax=None, is_fitted="auto", show=True, **kwargs):
 
     # Return the visualizer
     return visualizer
+
+
+def manual_alphas(
+    model,
+    X,
+    y=None,
+    ax=None,
+    alphas=None,
+    cv=None,
+    scoring=None,
+    show=True,
+    **kwargs
+):
+    """Quick Method:
+    The Manual Alpha Selection Visualizer demonstrates how different values of
+    alpha influence model selection during the regularization of linear models.
+    Generally speaking, alpha increases the effect of regularization, e.g. if
+    alpha is zero there is no regularization and the higher the alpha, the
+    more the regularization parameter influences the final model.
+
+    Parameters
+    ----------
+    model : an unfitted Scikit-Learn regressor
+        Should be an instance of an unfitted regressor, and specifically one
+        whose name doesn't end with "CV". The regressor must support a call to
+        ``set_params(alpha=alpha)`` and be fit multiple times. If the
+        regressor name ends with "CV" a ``YellowbrickTypeError`` is raised.
+
+    X : ndarray or DataFrame of shape n x m
+        A matrix of n instances with m features.
+
+    y : ndarray or Series of length n
+        An array or series of target values.
+
+    ax : matplotlib Axes, default: None
+        The axes to plot the figure on. If None is passed in the current axes
+        will be used (or generated if required).
+
+    alphas : ndarray or Series, default: np.logspace(-10, -2, 200)
+        An array of alphas to fit each model with.
+
+    cv : int, cross-validation generator or an iterable, optional
+        Determines the cross-validation splitting strategy.
+        Possible inputs for cv are:
+
+        - None, to use the default 3-fold cross-validation,
+        - integer, to specify the number of folds in a ``(Stratified)KFold``,
+        - an object to be used as a cross-validation generator,
+        - an iterable yielding (train, test) splits.
+
+        This argument is passed to the
+        ``sklearn.model_selection.cross_val_score`` method to produce the
+        cross validated score for each alpha.
+
+    scoring : string, callable or None, optional, default: None
+        A string (see model evaluation documentation) or a scorer callable
+        object / function with signature ``scorer(estimator, X, y)``.
+
+        This argument is passed to the
+        ``sklearn.model_selection.cross_val_score`` method to produce the
+        cross validated score for each alpha.
+
+    show : bool, default: True
+        If True, calls ``show()``, which in turn calls ``plt.show()``.
+        If False, ``finalize()`` is called instead, leaving the figure
+        available for further modification or saving.
+
+    kwargs : dict
+        Keyword arguments that are passed to the base class and may influence
+        the visualization as defined in other Visualizers.
+
+    Returns
+    -------
+    visualizer : ManualAlphaSelection
+        Returns the fitted manual alpha selection visualizer.
+    """
+    # Instantiate the visualizer
+    visualizer = ManualAlphaSelection(
+        model, ax, alphas=alphas, scoring=scoring, cv=cv, **kwargs
+    )
+
+    visualizer.fit(X, y)
+
+    if show:
+        visualizer.show()
+    else:
+        visualizer.finalize()
+
+    # Return the visualizer
+    return visualizer