A suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning.
Image by Quatro Cinco, used with permission, Flickr Creative Commons.
What is Yellowbrick?
Yellowbrick is a suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning. All visualizations are generated in Matplotlib. Custom Yellowbrick visualization tools include:
Tools for feature analysis and selection
- Boxplots (box-and-whisker plots)
- Scatter plot matrices (sploms)
- Radial visualizations (radviz)
- Parallel coordinates
- Rank 1D
- Rank 2D
Tools for model evaluation
- ROC-AUC curves
- Classification heatmaps
- Class balance chart
- Prediction error plots
- Residual plots
- Most informative features
- Density measures
Tools for parameter tuning
- Validation curves
- Gridsearch heatmaps
The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:
In this example, Rank2D scores every pair of features in the data set with a chosen metric or algorithm (here, covariance), then displays the ranked scores as a lower-left triangle diagram.
from yellowbrick.features import Rank2D

visualizer = Rank2D(features=features, algorithm='covariance')
visualizer.fit(X, y)        # Fit the data to the visualizer
visualizer.transform(X)     # Transform the data
visualizer.poof()           # Draw/show/poof the data
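To make the "pairwise comparison, lower-left triangle" idea concrete, here is a minimal sketch in plain Python. It is not Yellowbrick's implementation: the function names, the dictionary input format, and the toy data are all assumptions made for illustration, and masking the diagonal along with the upper triangle is a choice made for this sketch.

```python
from itertools import combinations


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def rank2d_lower_triangle(columns):
    """Score every feature pair once and lay the scores out as a lower-left triangle.

    `columns` maps a feature name to its list of values (hypothetical input format).
    Cells on or above the diagonal stay None, standing in for the masked part of a plot.
    """
    names = list(columns)
    grid = [[None] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        # combinations yields i < j, so row j / column i falls below the diagonal
        grid[j][i] = covariance(columns[names[i]], columns[names[j]])
    return names, grid


# Toy data, invented for illustration only
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
names, grid = rank2d_lower_triangle(data)
for name, row in zip(names, grid):
    print(name, row)
```

Each pair is scored only once because covariance is symmetric, which is why a triangle is enough to show every comparison.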
In this example, we instantiate a Scikit-Learn classifier, and then we use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.
from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC

model = LinearSVC()
model.fit(X, y)
visualizer = ROCAUC(model)
visualizer.score(X, y)
visualizer.poof()
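For readers who want to see what a ROC curve plots, the sketch below computes the curve's points and the area under it in plain Python. It is an illustration under stated assumptions, not what ROCAUC does internally: the helper names and the toy labels and scores are invented here, and tied scores are not handled. Sensitivity is the true positive rate, and specificity is one minus the false positive rate.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    `labels` are 0/1 ground truths; `scores` are classifier outputs where a
    higher score means "more likely positive". Tied scores are not merged,
    which is acceptable for a sketch but not for real use.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Visit samples from highest to lowest score; each step lowers the threshold
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy labels and scores, invented for illustration only
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(auc(points))
```

Lowering the threshold can only move the curve up (more true positives) or right (more false positives), which is the tradeoff the plot makes visible.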
For additional information on getting started with Yellowbrick, check out our examples notebook.
We also have a quick start guide.
Contributing to Yellowbrick
Yellowbrick is an open source tool designed to enable more informed machine learning through visualizations. If you would like to contribute, you can do so in the following ways:
- Add issues or bugs to the bug tracker: https://github.com/DistrictDataLabs/yellowbrick/issues
- Work on a card on the dev board: https://waffle.io/DistrictDataLabs/yellowbrick
- Create a pull request in GitHub: https://github.com/DistrictDataLabs/yellowbrick/pulls
This repository is set up in a typical production/release/development cycle as described in A Successful Git Branching Model. A typical workflow is as follows:
Select a card from the dev board, preferably one that is "ready", then move it to "in-progress".
Create a branch off of develop called "feature-[feature name]", work and commit into that branch.
~$ git checkout -b feature-myfeature develop
Once you are done working (and everything is tested), merge your feature into develop.
~$ git checkout develop
~$ git merge --no-ff feature-myfeature
~$ git branch -d feature-myfeature
~$ git push origin develop
Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.