## Matplotlib limitations

Let's start by importing the tools we need: 

In [15]:
'''
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(0xdeadbeef)
'''

Then we create a sample of (x,y) points. In this sample, 100 points are drawn from a Gaussian distribution centred at (0,0) with a variance of 1 along each axis. On top of this, we add 100 points drawn from a narrower Gaussian distribution centred at (1,1), with a variance of 0.05 along each axis (a standard deviation of about 0.22).
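
Note that the diagonal entries of the covariance matrix passed to `multivariate_normal` are variances, so the spread of the points is their square root. Here is a quick check, as a sketch only: the `rng` generator, its seed and the sample size of 100000 are choices made for this illustration and are not part of the tutorial.

```python
import numpy as np

# Assumption: a dedicated Generator with an arbitrary seed, used only for this check.
rng = np.random.default_rng(0)

variance = 0.05
cov = [[variance, 0], [0, variance]]  # diagonal entries are variances, not widths
narrow = rng.multivariate_normal([1, 1], cov, 100_000)

expected_std = np.sqrt(variance)   # square root of the variance
measured_std = narrow.std(axis=0)  # one value per coordinate
print(expected_std, measured_std)
```

The two printed values should agree closely, which confirms that a covariance of 0.05 corresponds to a spread of roughly 0.22 rather than 0.05.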

In [16]:
'''
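# The second argument is the covariance matrix: its diagonal holds the variances
sample1 = np.random.multivariate_normal([0,0], [[1,0],[0,1]], 100)
sample2 = np.random.multivariate_normal([1,1], [[0.05,0],[0,0.05]], 100)
# Stack both samples into a single (200, 2) array and plot y against x
sample = np.concatenate([sample1, sample2])
plt.scatter(sample[:,0],sample[:,1])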
'''

With only 200 points, we can clearly see the two distributions. But let's add more points now, 5000 for each distribution:

In [17]:
'''
sample1 = np.random.multivariate_normal([0,0], [[1,0],[0,1]], 5000)
sample2 = np.random.multivariate_normal([1,1], [[0.05,0],[0,0.05]], 5000)
sample = np.concatenate([sample1, sample2])
plt.scatter(sample[:,0],sample[:,1])
'''

Now we just get a messy blob, in which the two distributions can no longer be told apart. Of course, it's possible to tune our plotting options:

In [18]:
'''
plt.figure(figsize=(10,10))
plt.scatter(sample[:,0],sample[:,1], alpha=0.5, marker='.')
'''

That's much nicer! 

Still, the plot is static: there is no way to zoom in, or to get information about individual points. That's where bokeh will really help.

## First visualization with bokeh

Let's import some tools from bokeh and initialize it: 

In [19]:
#!pip install bokeh
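
The import and initialization step could look like the sketch below. It assumes a Jupyter notebook session, where `output_notebook()` makes `show()` render plots inline. The `get_ipython` guard is an addition for this example only, so that the snippet also runs as a plain script.

```python
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

# Assumption: we are in a Jupyter notebook. The name get_ipython only exists
# inside IPython/Jupyter, so the initialization is skipped in a plain script.
try:
    get_ipython
except NameError:
    pass
else:
    output_notebook()
```

In a notebook you can drop the guard and call `output_notebook()` directly.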

Then, we can do a simple plot with the following code. 
What is very nice is that we can now hover over the data points to get some information, and use the box zoom to focus on part of the data. 

In [20]:
'''
# Toolbar tools are given as a comma-separated string
tools = "hover, box_zoom, undo, crosshair"
p = figure(tools=tools)
p.scatter(sample[:,0], sample[:,1], alpha=0.5)
# Render the interactive plot
show(p)
'''

## Bokeh and pandas

The integration between bokeh and pandas works very well. In this section, we will use pandas to add another value to each data point, and we will see how to modify the bokeh tooltip to show this value while hovering. 

First we're going to import: 

* pandas: we will create a pandas dataframe from the numpy array holding our sample, so that we can add a new value to each point. 
* the bokeh ColumnDataSource: it will act as a convenient interface between bokeh and the dataframe. 
* the bokeh HoverTool: we'll need it to change the format of the tooltip
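
The three imports listed above can be written as follows. The `pd` alias is the usual pandas convention, assumed here.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
```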

Then, we create the dataframe from our sample, and we print the first rows:
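
A minimal sketch of this step is shown below. The sample is rebuilt so that the snippet runs on its own, and the column names `x` and `y` are assumed from the later `p.scatter('x','y', ...)` call.

```python
import numpy as np
import pandas as pd

# Rebuild the sample so that this snippet runs on its own
# (same recipe as the cells above).
np.random.seed(0xdeadbeef)
sample1 = np.random.multivariate_normal([0, 0], [[1, 0], [0, 1]], 5000)
sample2 = np.random.multivariate_normal([1, 1], [[0.05, 0], [0, 0.05]], 5000)
sample = np.concatenate([sample1, sample2])

# Assumption: columns named 'x' and 'y', matching the scatter call used later.
df = pd.DataFrame(sample, columns=['x', 'y'])
print(df.head())
```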

Let's now add a new value to each point, and print again. As an example, I use here the distance of the point from the origin, but this value could be anything; it does not have to be a function of x and y. 
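
One possible way to add this column is sketched below. The small stand-in dataframe is there only to make the snippet runnable by itself, and the column name `value` is assumed from the `@value` field used in the tooltip further down.

```python
import numpy as np
import pandas as pd

# Stand-in data so that this snippet runs on its own;
# in the notebook, df already exists.
np.random.seed(0xdeadbeef)
points = np.random.multivariate_normal([0, 0], [[1, 0], [0, 1]], 5)
df = pd.DataFrame(points, columns=['x', 'y'])

# Distance of each point from the origin, stored in a new 'value' column.
df['value'] = np.sqrt(df['x']**2 + df['y']**2)
print(df.head())
```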

And finally, we do another plot, with a custom HoverTool: 

In [21]:
'''
source = ColumnDataSource(df)
tools = "box_zoom, undo, crosshair"
p = figure(tools=tools)
p.scatter('x','y', source=source, alpha=0.5)
p.add_tools(
    HoverTool(
        tooltips=[('value','@value{0.00}'), 
                  ('index', '@index')]
    )
)
show(p)
'''

Using the index given by the tooltip, we can locate the corresponding row in the dataframe: 
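
For instance, the lookup can be done with `df.loc`, as in this sketch. The tiny dataframe and the name `hovered_index` are hypothetical stand-ins for the real dataframe and for the index read off the tooltip.

```python
import pandas as pd

# Stand-in dataframe; in the notebook, df is the one built earlier.
df = pd.DataFrame({'x': [0.0, 3.0, 1.0], 'y': [0.0, 4.0, 1.0]})
df['value'] = (df['x']**2 + df['y']**2) ** 0.5

# Hypothetical index, as displayed by the tooltip when hovering over a point.
hovered_index = 1
row = df.loc[hovered_index]
print(row)
```

This works because a `ColumnDataSource` built from a dataframe also carries the dataframe index, which is what the `@index` field displays.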

In [22]:
#!jupyter labextension install @jupyter-widgets/jupyterlab-manager

In [23]:
#!jupyter labextension install @bokeh/jupyter_bokeh