Model Selection Visualizers

Yellowbrick visualizers are intended to steer the model selection process. Generally, model selection is a search problem defined as follows: given N instances described by numeric properties and (optionally) a target for estimation, find a model described by a triple composed of features, an algorithm, and hyperparameters that best fits the data. For most purposes, the "best" triple is the one that receives the highest cross-validated score for the model type.
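
As a rough illustration of that search, here is a minimal sketch in plain scikit-learn; the candidate estimators, hyperparameters, and toy dataset are illustrative choices, not part of Yellowbrick:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # A toy dataset standing in for N instances with numeric properties and a target
    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    # Candidate (algorithm, hyperparameters) pairs; the feature set is held fixed here
    candidates = {
        "logistic C=0.1": LogisticRegression(C=0.1, max_iter=1000),
        "logistic C=1.0": LogisticRegression(C=1.0, max_iter=1000),
        "forest depth=3": RandomForestClassifier(max_depth=3, random_state=42),
        "forest depth=None": RandomForestClassifier(random_state=42),
    }

    # The "best" candidate is the one with the highest mean cross-validated score
    scores = {name: cross_val_score(est, X, y, cv=5).mean() for name, est in candidates.items()}
    best = max(scores, key=scores.get)
    print(f"best model: {best} (accuracy {scores[best]:.3f})")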

The yellowbrick.model_selection package provides visualizers for inspecting the performance of cross validation and hyperparameter tuning. Many visualizers wrap functionality found in sklearn.model_selection, and others build upon it to perform multi-model comparisons.
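
For example, the LearningCurve visualizer wraps the underlying scikit-learn computation and draws the result directly. The following is a minimal sketch; the estimator, dataset, and settings are illustrative assumptions, so check the individual visualizer pages for the exact arguments:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.naive_bayes import GaussianNB
    from yellowbrick.model_selection import LearningCurve

    X, y = load_iris(return_X_y=True)

    viz = LearningCurve(
        GaussianNB(),
        train_sizes=np.linspace(0.3, 1.0, 5),  # fractions of the training data to use
        cv=5,
        scoring="accuracy",
    )
    viz.fit(X, y)   # fits and cross-validates the model at each training size
    viz.show()      # renders the learning curve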

The currently implemented model selection visualizers are as follows:

  • validation_curve: visualizes how adjusting a single hyperparameter influences the training and test scores, to help tune the bias/variance trade-off (a usage sketch follows this list).
  • learning_curve: shows how the size of the training data influences the model, to diagnose whether a model suffers more from variance error or from bias error.
  • cross_validation: displays cross-validated scores as a bar chart, with the average score drawn as a horizontal line.
  • importances: ranks features by their relative importance in a fitted model.
  • rfecv: selects a subset of features by recursive feature elimination with cross validation.
  • dropping_curve: selects random subsets of features and shows how the training and cross-validation scores change as features are dropped.
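
A minimal sketch of the validation curve mentioned in the first item above (the estimator, hyperparameter, and parameter range are illustrative assumptions):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from yellowbrick.model_selection import ValidationCurve

    X, y = load_iris(return_X_y=True)

    viz = ValidationCurve(
        DecisionTreeClassifier(random_state=42),
        param_name="max_depth",        # the hyperparameter to sweep
        param_range=np.arange(1, 11),  # candidate values for max_depth
        cv=5,
        scoring="accuracy",
    )
    viz.fit(X, y)   # trains and cross-validates the model at each max_depth value
    viz.show()      # plots training vs. cross-validation scores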

Model selection makes heavy use of cross validation to measure the performance of an estimator. Cross validation splits a dataset into a training set and a test set; the model is fit on the training set and evaluated on the test set. This helps avoid a common pitfall, overfitting, where the model simply memorizes the training data and does not generalize well to new or unseen input.
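
That mechanic looks roughly like this when performed by hand with scikit-learn's KFold (the estimator and split settings here are arbitrary examples):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Each fold holds out a different test set that the model never sees during fitting
    splitter = KFold(n_splits=5, shuffle=True, random_state=0)
    for fold, (train_idx, test_idx) in enumerate(splitter.split(X)):
        model.fit(X[train_idx], y[train_idx])          # fit on the training split only
        score = model.score(X[test_idx], y[test_idx])  # evaluate on the held-out split
        print(f"fold {fold}: test accuracy = {score:.3f}")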

There are many ways to define how to split a dataset for cross validation. For more information on how scikit-learn implements these mechanisms, please review "Cross-validation: evaluating estimator performance" in the scikit-learn documentation.
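
For instance, any of the splitting strategies described there can be passed through the cv parameter accepted by cross_val_score and by the visualizers above. A brief sketch, where the strategies and settings chosen are just examples:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, ShuffleSplit, StratifiedKFold, cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Three different ways to split the same data for cross validation
    strategies = {
        "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
        "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
        "ShuffleSplit": ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
    }
    for name, cv in strategies.items():
        scores = cross_val_score(model, X, y, cv=cv)
        print(f"{name}: mean accuracy = {scores.mean():.3f}")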
