Yellowbrick


A suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning.

Image: "Follow the yellow brick road" by Quatro Cinco, used with permission, Flickr Creative Commons.

What is Yellowbrick?

Yellowbrick is a suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning. All visualizations are generated in Matplotlib. Custom Yellowbrick visualization tools include:

Tools for feature analysis and selection

  • Boxplots (box-and-whisker plots)
  • Violinplots
  • Histograms
  • Scatter plot matrices (sploms)
  • Radial visualizations (radviz)
  • Parallel coordinates
  • Jointplots
  • Rank 1D
  • Rank 2D

Tools for model evaluation

Classification

  • ROC-AUC curves
  • Classification heatmaps
  • Class balance chart

Regression

  • Prediction error plots
  • Residual plots
  • Most informative features

Clustering

  • Silhouettes
  • Density measures

Tools for parameter tuning

  • Validation curves
  • Gridsearch heatmaps

Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:

Feature Visualization

In this example, Rank2D performs a pairwise comparison of every feature in the data set using a specific metric or algorithm, then displays the ranked results as a lower-left triangle diagram.

from yellowbrick.features import Rank2D

# X, y, and features (the feature names) are assumed to be defined already
visualizer = Rank2D(features=features, algorithm='covariance')
visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.transform(X)             # Transform the data
visualizer.poof()                   # Draw/show/poof the data
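To make the pairwise ranking idea concrete, here is a minimal pure-Python sketch of scoring every pair of features with covariance. It is not Yellowbrick's implementation: the helper names `covariance` and `rank_pairs` and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def rank_pairs(columns, metric=covariance):
    """Score every unordered pair of features with `metric`.

    `columns` maps a feature name to its list of values. Each pair is
    scored once, which is why a plot of the scores only needs one
    triangle of the feature-by-feature grid.
    """
    scores = {}
    for a, b in combinations(columns, 2):
        scores[(a, b)] = metric(columns[a], columns[b])
    return scores


# Hypothetical toy data: three features, four observations each.
columns = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "humidity": [2.0, 4.0, 6.0, 8.0],
    "wind": [4.0, 3.0, 2.0, 1.0],
}

scores = rank_pairs(columns)

# Print the pairs from the highest score to the lowest.
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(score, 3))
```

Because the metric is passed in as a function, the same loop would work for another pairwise measure such as a correlation coefficient.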

Model Visualization

In this example, we instantiate a Scikit-Learn classifier, and then we use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.

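from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC

model = LinearSVC()
model.fit(X, y)                     # Fit the Scikit-Learn classifier
visualizer = ROCAUC(model)          # Wrap the fitted model in the visualizer
visualizer.score(X, y)              # Score the model on the data
visualizer.poof()                   # Draw/show/poof the plot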
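For readers who want to see what sits behind a ROC-AUC plot, the following pure-Python sketch sweeps a decision threshold over classifier scores and records the true positive rate (sensitivity) against the false positive rate (1 - specificity). It is an illustration under stated assumptions, not how Yellowbrick or Scikit-Learn compute the curve: the function names `roc_points` and `auc` and the toy labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points.

    `y_true` holds 0/1 labels and `scores` holds classifier outputs where
    a higher value means "more likely positive". Assumes both classes
    are present in `y_true`.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit examples from highest to lowest score, i.e. lower the threshold.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all examples sharing this score are counted.
        tied = rank + 1 < len(order) and scores[order[rank + 1]] == scores[i]
        if not tied:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores for four examples.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

A curve that hugs the top-left corner gives an area close to 1, while a classifier that guesses at random gives an area near 0.5.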

For additional information on getting started with Yellowbrick, check out our examples notebook.

We also have a quick start guide.

Contributing to Yellowbrick

Yellowbrick is an open source tool designed to enable more informed machine learning through visualizations. If you would like to contribute, you can do so in the following ways:

This repository is set up in a typical production/release/development cycle as described in A Successful Git Branching Model. A typical workflow is as follows:

  1. Select a card from the dev board, preferably one that is "ready", then move it to "in-progress".

  2. Create a branch off of develop called "feature-[feature name]", then work and commit in that branch.

    ~$ git checkout -b feature-myfeature develop
    
  3. Once you are done working (and everything is tested), merge your feature into develop.

    ~$ git checkout develop
    ~$ git merge --no-ff feature-myfeature
    ~$ git branch -d feature-myfeature
    ~$ git push origin develop
    
  4. Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.