Deployed: Wednesday, November 14, 2018
Contributors: @rebeccabilbro, @bbengfort, @zjpoh, @Kautumn06, @ndanielsen, @drwaterman, @lwgray, @pdamodaran, @Juan0001, @abatula, @peterespinosa, @jlinGG, @rlshuhart, @archaeocharlie, @dschoenleber, @black-tea, @iguk1987, @mohfadhil, @lacanlale, @agodbehere, @sivu1, @gokriznastic
- Target module added for visualizing dependent variable in supervised models.
- Added a prototype for a missing values visualizer to the
- BalancedBinningReference visualizer for thresholding unbalanced data (undocumented).
- CVScores visualizer to instrument cross-validation.
- FeatureCorrelation visualizer to compare relationship between a single independent variable and the target.
- ICDM visualizer, intercluster distance mapping using projections similar to those used in pyLDAVis.
- PrecisionRecallCurve visualizer showing the relationship of precision and recall in a threshold-based classifier.
- FeatureImportance now supports multi-target and multi-coefficient models (e.g. probabilistic models) and allows a stacked bar chart.
- Adds option to plot PDF to
- Adds document boundaries option to DispersionPlot and uses colored markers to depict class.
- Added alpha parameter for opacity to the scatter plot visualizer.
- KElbowVisualizer now accepts a list of k values.
- ROCAUC bugfix to allow binary classifiers that only have a decision function.
- TSNE bugfix so that title and size params are respected.
- ConfusionMatrix bugfix to correct percentage displays adding to 100.
- ResidualsPlot bugfix to ensure specified colors are both in histogram and scatterplot.
- Fixed unicode decode error on Py2 compatible Windows using Hobbies corpus.
- Require matplotlib 1.5.1 or matplotlib 2.0 (matplotlib 3.0 not supported yet).
- Yellowbrick now depends on SciPy 1.0 and scikit-learn 0.20.
sample_weight arguments to
- Removed hardcoding of SilhouetteVisualizer axes dimensions.
- Audit classifiers to ensure they conform to score API.
- Fix for Manifold import bug.
- Started reworking datasets API for easier loading of examples.
- Added Timer utility for keeping track of fit times.
- Added slides to documentation for teachers teaching ML/Yellowbrick.
- Added an FAQ to the documentation.
- Manual legend drawing utility.
- New examples notebooks for Regression and Clustering.
- Example of interactive classification visualization using ipywidgets.
- Example of using Yellowbrick with PyTorch.
- Repairs to ROCAUC tests and binary/multiclass ROCAUC construction.
- Rename tests/random.py to tests/rand.py to prevent NumPy errors.
- Fixed visual display bug in
- Fixed image in
- Clear figure option to poof.
- Fix color plotting error in residuals plot quick method.
- Fixed bugs in FeatureImportance, Index, and Datasets documentation.
- Use LGTM for code quality analysis (replacing Landscape).
- Updated contributing docs for better PR workflow.
- Submitted JOSS paper.
Deployed: Thursday, July 12, 2018
Contributors: @bbengfort, @ndanielsen, @rebeccabilbro, @lwgray, @RaulPL, @Kautumn06, @ariley1472, @ralle123, @thekylesaurus, @lumega, @pdamodaran, @chrisfs, @mitevpi, @sayali-sonawane
- Added Support to ClassificationReport - @ariley1472
- We have an updated Image Gallery - @ralle123
- Improved performance of ParallelCoordinates Visualizer @thekylesaurus
- Added Alpha Transparency to RadViz Visualizer @lumega
- CVScores Visualizer - @pdamodaran
- Added fast and alpha parameters to ParallelCoordinates visualizer @bbengfort
- Make support an optional parameter for
- Bug Fix for Usage of multidimensional arrays in FeatureImportance visualizer @rebeccabilbro
- Moved ScatterVisualizer to contrib @bbengfort
- Implements histogram alongside
- Adds biplot to the PCADecomposition visualizer @RaulPL
- Adds Datasaurus Dataset to show importance of visualizing data @lwgray
- DispersionPlot Plot @lwgray
- Fix grammar in tutorial.rst - @chrisfs
- Added Note to tutorial indicating subtle differences when working in Jupyter notebook - @chrisfs
- Update Issue template @bbengfort
- Added Test to check for NLTK postag data availability - @sayali-sonawane
- Clarify quick start documentation @mitevpi
- Threshold Visualization aliases deprecated
- New Feature! Manifold visualizers implement high-dimensional visualization for non-linear structural feature analysis.
- New Feature! There is now a
- New Feature! The RFECV (recursive feature elimination) visualizer with cross-validation visualizes how removing the least performing features improves the overall model.
- New Feature! The VisualizerGrid is an implementation of the MultipleVisualizer that creates axes for each visualizer using plt.subplots, laying the visualizers out as a grid.
- New Feature! Added yellowbrick.datasets to load example datasets.
- New Experimental Feature! An experimental StatsModelsWrapper was added to yellowbrick.contrib.statsmodels that will allow users to use StatsModels estimators with visualizers.
- ClassificationReport documentation to include more details about how to interpret each of the metrics and compare the reports against each other.
- Enhancement! Modifies scoring mechanism for regressor visualizers to include the R2 value in the plot itself with the legend.
- Enhancement! Updated and renamed the ThreshViz to be defined as DiscriminationThreshold, which implements a few more discrimination features such as F1 score, maximizing arguments and annotations.
- Enhancement! Update clustering visualizers and corresponding distortion_score to handle sparse matrices.
- Added code of conduct to meet the GitHub community guidelines as part of our contributing documentation.
- Added is_probabilistic type checker and converted the type checking tests to pytest.
- Added a
DecisionBoundaries visualizer has been moved to it until further work is completed.
- Numerous fixes and improvements to documentation and tests. Add academic citation example and Zenodo DOI to the Readme.
RandomVisualizer for testing and add it to the
- Fix / update tests in tests.test_classifier.test_class_prediction_error.py to remove hardcoded data.
- ScatterPlotVisualizer is being moved to contrib in 0.8
- DecisionBoundaryVisualizer is being moved to contrib in 0.8
- ThreshViz is renamed to
NOTE: These deprecation warnings originally mentioned deprecation in 0.7, but their life was extended by an additional version.
- New Feature! The FeatureImportances Visualizer enables the user to visualize the most informative (relative and absolute) features in their model, plotting a bar graph of
- New Feature! The ExplainedVariance Visualizer produces a plot of the explained variance resulting from a dimensionality reduction to help identify the best tradeoff between number of dimensions and amount of information retained from the data.
- New Feature! The GridSearchVisualizer creates a color plot showing the best grid search scores across two parameters.
- New Feature! The ClassPredictionError Visualizer is a heatmap implementation of the class balance visualizer, which provides a way to quickly understand how successfully your classifier is predicting the correct classes.
- New Feature! The ThresholdVisualizer allows the user to visualize the bounds of precision, recall and queue rate at different thresholds for binary targets after a given number of trials.
- MultiFeatureVisualizer helper class to provide base functionality for getting the names of features for use in plot annotation.
- Adds font size param to the confusion matrix to adjust its visibility.
- Add quick method to the confusion matrix
- Tests: In this version, we've switched from using nose to pytest. Image comparison tests have been added and the visual tests are updated to matplotlib 2.2.0. Test coverage has also been improved for a number of visualizers, including
- Documentation updates, including discussion of Image Comparison Tests for contributors.
- Fixes the resolve_colors function. You can now pass in a number of colors and a colormap and get back the correct number of colors.
- Fixes the TSNEVisualizer Value Error when no classes are specified.
- Adds the circle back to RadViz! This visualizer has also been updated to ensure there's a visualization even when there are missing values.
- RocAuc now correctly checks the number of classes.
- Switch from converting structured arrays to ndarrays using np.tolist to avoid NumPy deprecation warning.
- DataVisualizer updated to remove np.nan values and warn the user that nans are not plotted.
- ClassificationReport no longer has lines that run through the numbers, is more grid-like
- ScatterPlotVisualizer is being moved to contrib in 0.7
- DecisionBoundaryVisualizer is being moved to contrib in 0.7
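The ThresholdVisualizer entry above mentions precision, recall and queue rate at different thresholds. As a rough illustration of what those three quantities mean for a binary target, here is a minimal sketch in plain Python. It is not Yellowbrick's implementation: the `threshold_metrics` helper and the toy data are hypothetical, queue rate is assumed to mean the fraction of instances flagged as positive, and returning 0.0 when a denominator is empty is simply a convention chosen for this sketch.

```python
def threshold_metrics(y_true, scores, threshold):
    """Return (precision, recall, queue_rate) for one decision threshold.

    Hypothetical helper for illustration only; an instance is flagged as
    positive when its score is greater than or equal to the threshold.
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f)
    fp = sum(1 for t, f in zip(y_true, flagged) if t == 0 and f)
    fn = sum(1 for t, f in zip(y_true, flagged) if t == 1 and not f)

    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumed definition: share of all instances that were flagged.
    queue_rate = sum(flagged) / len(flagged)
    return precision, recall, queue_rate


# Toy binary labels and classifier scores (made up for this example).
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]

for threshold in (0.3, 0.5, 0.7):
    p, r, q = threshold_metrics(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f} queue_rate={q:.2f}")
```

Raising the threshold in this toy data trades recall for precision while shrinking the queue, which is the kind of tradeoff the visualizer is meant to expose.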
- Added VisualTestCase.
- New PCADecomposition Visualizer, which decomposes high dimensional data into two or three dimensions so that each instance can be plotted in a scatter plot.
- New and improved ROCAUC Visualizer, which now supports multiclass classification.
- Prototype Decision Boundary Visualizer, which is a bivariate data visualization algorithm that plots the decision boundaries of each class.
- Added Rank1D Visualizer, which is a one dimensional ranking of features that utilizes the Shapiro-Wilk ranking that takes into account only a single feature at a time (e.g. histogram analysis).
- Improved Prediction Error Plot with identity line, shared limits, and r squared.
- Updated FreqDist Visualizer to make word features a hyperparameter.
- Added normalization and scaling to Parallel Coordinates.
- Added Learning Curve Visualizer, which displays a learning curve based on the number of samples versus the training and cross validation scores to show how a model learns and improves with experience.
- Added data downloader module to the yellowbrick library.
- Complete overhaul of the yellowbrick documentation; categories of methods are located in separate pages to make it easier to read and contribute to the documentation.
- Added a new color palette inspired by ANN-generated colors
- Repairs to PCA, RadViz, FreqDist unit tests
- Repair to matplotlib version check in JointPlot Visualizer
This release is an intermediate version bump in anticipation of the PyCon 2017 sprints.
The primary goals of this version were to (1) update the Yellowbrick dependencies, (2) enhance the Yellowbrick documentation to help orient new users and contributors, and (3) make several small additions and upgrades (e.g. pulling the Yellowbrick utils into a standalone module).
We have updated the Scikit-Learn and SciPy dependencies from version 0.17.1 or later to 0.18 or later. This primarily entails moving from
from sklearn.cross_validation import train_test_split to
from sklearn.model_selection import train_test_split.
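For code that still has to run against both older and newer Scikit-Learn versions, the import change above can be wrapped in a fallback. This is only a sketch of one common pattern, not something prescribed by the release; the toy data is made up for illustration.

```python
# Prefer the newer module location; fall back only for scikit-learn < 0.18.
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split

# Toy data (hypothetical): ten samples with a single feature each.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))
```

The function is called the same way in both modules, so only the import line needs to change.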
The updates to the documentation include new Quickstart and Installation guides as well as updates to the Contributors documentation, which is modeled on the Scikit-Learn contributing documentation.
This version also included upgrades to the KMeans visualizer, which now supports not only silhouette_score but also distortion_score and calinski_harabaz_score. The distortion_score computes the mean distortion of all samples as the sum of the squared distances between each observation and its closest centroid. This is the metric that K-Means attempts to minimize as it is fitting the model. The calinski_harabaz_score is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion.
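To make the distortion description above concrete, here is a minimal sketch in plain Python that sums the squared distances between each observation and its closest centroid. The helper names and the toy points are hypothetical, and this is not the library's distortion_score function.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points, centroids):
    """Sum of squared distances from each point to its closest centroid."""
    return sum(
        min(squared_distance(p, c) for c in centroids)
        for p in points
    )


# Two well-separated groups of toy points (made up for this example).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_centroid = [(5.0, 1.0)]
two_centroids = [(0.0, 1.0), (10.0, 1.0)]

print(distortion(points, one_centroid))   # 104.0
print(distortion(points, two_centroids))  # 4.0
```

The sharp drop from one centroid to two is the kind of change an elbow plot over k is meant to reveal.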
Finally, this release includes a prototype of the VisualPipeline, which extends Scikit-Learn's Pipeline class, allowing multiple Visualizers to be chained or sequenced together.
- Score and model visualizers now wrap estimators as proxies so that all methods on the estimator can be directly accessed from the visualizer
- Updated Scikit-learn dependency from >=0.17.1 to >=0.18
- Updated SciPy dependency from >=0.17.1 to >=0.18
- ScoreVisualizer now subclasses ModelVisualizer; towards allowing both fitted and unfitted models passed to Visualizers
- Added CI tests for Python 3.6 compatibility
- Added new quickstart guide and install instructions
- Updates to the contributors documentation
- Added calinski_harabaz_score computations and visualizations to KMeans visualizer.
- Replaced the self.ax property on all of the individual draw methods with a new property on the Visualizer class that ensures all visualizers automatically have axes.
- Refactored the utils module into a package
- Continuing to update the docstrings to conform to Sphinx
- Added a prototype visual pipeline class that extends the Scikit-learn pipeline class to ensure that visualizers get called correctly.
- Fixed title bug in Rank2D FeatureVisualizer
This release is the culmination of the Spring 2017 DDL Research Labs that focused on developing Yellowbrick as a community effort guided by a sprint/agile workflow. We added several more visualizers, did a lot of user testing and bug fixes, updated the documentation, and generally discovered how best to make Yellowbrick a friendly project to contribute to.
Notable in this release is the inclusion of two new feature visualizers that use few, simple dimensions to visualize features against the target. The JointPlotVisualizer graphs a scatter plot of two dimensions in the data set and plots a best fit line across it. The ScatterVisualizer also uses two features, but also colors the graph by the target variable, adding a third dimension to the visualization.
This release also adds support for clustering visualizations, namely the elbow method for selecting K, KElbowVisualizer, and a visualization of cluster size and density using the SilhouetteVisualizer. The release also adds support for regularization analysis using the AlphaSelection visualizer. Both the text and classification modules were also improved with the inclusion of the PosTagVisualizer and the ConfusionMatrix visualizer respectively.
This release also added an Anaconda repository and distribution so that users can conda install yellowbrick. Even more notable, we got yellowbrick stickers! We've also updated the documentation to make it more friendly and a bit more visual, fixing the API rendering errors. All in all, this was a big release with a lot of contributions and we thank everyone that participated in the lab!
- Part of speech tags visualizer --
- Alpha selection visualizer for regularized regression --
- Confusion Matrix Visualizer --
- Elbow method for selecting K vis --
- Silhouette score cluster visualization --
- Joint plot visualizer with best fit --
- Scatter visualization of features --
- Added three more example datasets: mushroom, game, and bike share
- Contributor's documentation and style guide
- Maintainers listing and contacts
- Light/Dark background color selection utility
- Structured array detection utility
- Updated classification report to use colormesh
- Added Anaconda packaging and distribution
- Refactoring of the regression, cluster, and classification modules
- Image based testing methodology
- Docstrings updated to a uniform style and rendering
- Submission of several more user studies
Intermediate sprint to demonstrate prototype implementations of text visualizers for NLP models. Primary contributions were the FreqDistVisualizer and the TSNEVisualizer.
The TSNEVisualizer displays a projection of a vectorized corpus in two dimensions using TSNE, a nonlinear dimensionality reduction method that is particularly well suited to embedding in two or three dimensions for visualization as a scatter plot. TSNE is widely used in text analysis to show clusters or groups of documents or utterances and their relative proximities.
The FreqDistVisualizer implements a frequency distribution plot that tells us the frequency of each vocabulary item in the text. In general, it could count any kind of observable event. It is a distribution because it tells us how the total number of word tokens in the text is distributed across the vocabulary items.
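The idea behind a frequency distribution can be sketched with the standard library alone. This is only an illustration of counting tokens per vocabulary item, not how the FreqDistVisualizer is implemented; the sample sentence and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter

# Made-up sample text; a real corpus would be tokenized more carefully.
text = "the quick brown fox jumps over the lazy dog the end"
tokens = text.lower().split()

# Map each vocabulary item to the number of times it occurs.
freq = Counter(tokens)

# Show the most frequent vocabulary items first.
for token, count in freq.most_common(3):
    print(token, count)

# The counts partition the tokens: they always sum to the total token count.
print(sum(freq.values()) == len(tokens))  # True
```

The final check reflects why this is called a distribution: every token in the text is accounted for by exactly one vocabulary item.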
- TSNEVisualizer for 2D projections of vectorized documents
- FreqDistVisualizer for token frequency of text in a corpus
- Added the user testing evaluation to the documentation
- Created scikit-yb.org and host documentation there with RTD (Read the Docs)
- Created a sample corpus and text examples notebook
- Created a base class for text,
- Model selection tutorial using Mushroom Dataset
- Created a text examples notebook but have not added to documentation.
Hardened the Yellowbrick API to elevate the idea of a Visualizer to a first principle. This included reconciling shifts in the development of the preliminary versions to the new API, formalizing Visualizer methods like finalize(), and adding utilities that revolve around Scikit-Learn. To that end we also performed administrative tasks like refreshing the documentation and preparing the repository for more and varied open source contributions.
- Converted Mkdocs documentation to Sphinx documentation
- Updated docstrings for all Visualizers and functions
- Created a DataVisualizer base class for dataset visualization
- Single call functions for simple visualizer interaction
- Added yellowbrick specific color sequences and palettes and env handling
- More robust examples with downloader from DDL host
- Better axes handling in visualizer, matplotlib/sklearn integration
- Added a finalize method to complete drawing before render
- Improved testing on real data sets from examples
- Score visualizer renders in notebook but not in Python scripts.
- Tests updated to support new API