## Visualizing high-dimensional data with Bokeh and PCA/t-SNE

Using the toy digits dataset that ships with scikit-learn, this notebook demonstrates how to use PCA and t-SNE to project high-dimensional data (here, the 64 pixel values of each 8×8 digit image) down to two dimensions, so that we can visualize and explore how the data is distributed (**assuming the projections actually worked the way we expected them to**). It also uses Bokeh, a library that lets us interact with the visualizations from within the notebook.

In [2]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import HoverTool, BoxSelectTool, BoxZoomTool, WheelZoomTool
import numpy as np
from sklearn import datasets
output_notebook()


### Loading Data

In [31]:
# load digits
digits = datasets.load_digits()
raw_X = digits['data']
target = digits['target']
images = digits['images']


### Using PCA & t-SNE to reduce the dimensionality of the data

In [19]:
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA

In [27]:
raw_X.shape

(1797, 64)

In [28]:
# First reduce the 64 pixel features to 25 principal components
pca = PCA(n_components=25)
pca_X = pca.fit_transform(raw_X)

In [29]:
pca_X.shape

(1797, 25)
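
The choice of 25 components above is not justified in the notebook. One way to sanity-check it is to look at how much variance the retained components explain. The sketch below is only an illustration: the 95% threshold and the variable names `cumulative` and `n_95` are assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_digits()["data"]

# Fraction of total variance captured by the first 25 components
pca = PCA(n_components=25).fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"variance retained by 25 components: {cumulative[-1]:.3f}")

# Smallest number of components reaching an (assumed) 95% threshold
full = PCA().fit(X)
full_cumulative = np.cumsum(full.explained_variance_ratio_)
n_95 = int(np.searchsorted(full_cumulative, 0.95) + 1)
print(f"components needed for 95% of the variance: {n_95}")
```

If the retained variance looks too low for your purposes, raising `n_components` before running t-SNE is a cheap adjustment.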

In [21]:
# TSNE defaults to n_components=2, so this maps the 25 PCA features to a 2-D embedding
tsne = TSNE()
tsne_X = tsne.fit_transform(pca_X)

In [30]:
tsne_X.shape

(1797, 2)
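
t-SNE is stochastic, so repeated runs can give different layouts, and a 2-D embedding is only useful if it preserves local neighborhoods. The following is a minimal sketch, not part of the original notebook, that fixes `random_state` for repeatability and uses scikit-learn's `trustworthiness` score as one rough check. The 500-sample subset, `perplexity=30`, and `n_neighbors=5` are assumptions chosen to keep the example quick.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Use a subset so the sketch runs quickly (assumption, not the full dataset)
X = datasets.load_digits()["data"][:500]

# Same pipeline as in the notebook, with fixed seeds for repeatable results
X_pca = PCA(n_components=25, random_state=0).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)

# Score in [0, 1]; values closer to 1 mean local neighborhoods are better preserved
score = trustworthiness(X_pca, X_tsne, n_neighbors=5)
print(X_tsne.shape, round(float(score), 3))
```

A high score does not guarantee that distances between clusters are meaningful; it only says that points close together in the plot tend to be close in the PCA space too.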

### Using Bokeh to plot the low-dimensional data

In [38]:
# Choose a set of colors that lets us distinguish between the 10 different classes in our data
colormap = np.array([
    "#1f77b4", "#aec7e8", "#ff7f0e", "#ffbb78", "#2ca02c", 
    "#98df8a", "#d62728", "#ff9896", "#9467bd", "#ffff00"])

## N.B. This was an attempt to add the images to the tooltip as well, but I don't have the patience to figure it out
#hover = HoverTool(tooltips="""
#        <div>
#            <span style="font-size: 17px; font-weight: bold; width:400px; display:block;">@target</span>
#            <span style="font-size: 15px; color: #966; width:400px; display:block;">[$index]</span>
#        </div>
#        <div>
#            <span style="font-size: 15px;">Image</span>
#            <span style="font-size: 8px; color: #696; display:block; width:400px; word-break: break-all;"><img>(@image)</img></span>
#        </div>
#""")
hover = HoverTool(tooltips="""
        <div>
            <span style="font-size: 17px; font-weight: bold; display:block;">@target</span>
            <span style="font-size: 15px; color: #966; display:block;">[$index]</span>
        </div>
""")

# Like in matplotlib, instantiate a figure to do our plotting on
# (newer Bokeh releases name these arguments width= and height=)
p = figure(plot_width=1000, plot_height=1000, tools=[hover, BoxZoomTool(), WheelZoomTool()])

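# Make a scatter plot with our two-dimensional data. Every column lives in one
# ColumnDataSource (newer Bokeh versions reject mixing raw arrays with a source),
# and the hover tooltip looks up @target from it.
source = ColumnDataSource(data={
    "x": tsne_X[:, 0],
    "y": tsne_X[:, 1],
    "color": colormap[target],
    "target": target
    #"image": [p.image(image=[img]) for img in images]
})
p.scatter(x="x", y="y", color="color", source=source)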
# Show the results
show(p) 

<bokeh.io._CommsHandle at 0x110542490>
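
The commented-out tooltip above tried to show each digit image on hover. One possible approach, sketched here under assumptions, is to encode every 8×8 image as a PNG data URI and store those strings in a column of the data source. The helper names `png_chunk` and `gray_to_data_uri` are hypothetical, and the tiny PNG writer uses only the standard library and NumPy to avoid an extra imaging dependency.

```python
import base64
import struct
import zlib

import numpy as np


def png_chunk(tag, data):
    """Build one PNG chunk: length, tag, payload, CRC over tag + payload."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def gray_to_data_uri(img):
    """Encode a 2-D array as an 8-bit grayscale PNG wrapped in a data URI."""
    arr = np.asarray(img, dtype=float)
    peak = arr.max()
    # Rescale to 0..255; an all-zero image stays black
    scaled = (arr / peak * 255 if peak > 0 else arr).astype(np.uint8)
    height, width = scaled.shape
    # Each scanline is prefixed with filter type 0 (no filtering)
    raw = b"".join(b"\x00" + row.tobytes() for row in scaled)
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    png = (b"\x89PNG\r\n\x1a\n"
           + png_chunk(b"IHDR", header)
           + png_chunk(b"IDAT", zlib.compress(raw))
           + png_chunk(b"IEND", b""))
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


# Stand-in for one entry of digits['images'] (pixel values 0..16)
example = np.arange(64).reshape(8, 8) % 17
uri = gray_to_data_uri(example)
print(uri[:40])
```

With this in place, an `"image"` column built as `[gray_to_data_uri(img) for img in images]` could be added to the `ColumnDataSource`, and the tooltip template could reference it with something like `<img src="@image" width="64" height="64">`. This has not been run against the notebook's Bokeh version, so treat it as a starting point.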