Feature analysis visualizers are designed to visualize instances in data space in order to detect features or targets that might impact downstream fitting. Because ML operates on high-dimensional data sets (usually at least 35), the visualizers focus on aggregation, optimization, and other techniques to give overviews of the data. It is our intent that the steering process will allow the data scientist to zoom and filter and explore the relationships between their instances and between dimensions.
At the moment we have the following feature analysis visualizers implemented:
rankd
: rank single and pairs of features to detect covarianceradviz
: plot data points along axes ordered around a circle to detect separabilitypcoords
: plot instances as lines along vertical axes to detect classes or clusterspca
: project higher dimensions into a visual space using PCAmanifold
: visualize high dimensional data using manifold learningjointplot
: (aka Jointplots) plot 2D correlation between features and target
Feature analysis visualizers implement the Transformer
API from scikit-learn, meaning they can be used as intermediate transform steps in a Pipeline
(particularly a VisualPipeline
). They are instantiated in the same way, and then fit and transform are called on them, which draws the instances correctly. Finally show
is called which finalizes and displays the image.
# Feature Analysis Imports
# NOTE that all these are available for import directly from the ``yellowbrick.features`` module
from yellowbrick.features.rankd import Rank1D, Rank2D
from yellowbrick.features.radviz import RadViz
from yellowbrick.features.pcoords import ParallelCoordinates
from yellowbrick.features.jointplot import JointPlotVisualizer
from yellowbrick.features.pca import PCADecomposition
from yellowbrick.features.manifold import Manifold
radviz rankd pcoords pca manifold jointplot