## Lesson 16 - Interactive Visualization with Bokeh

From the Bokeh [homepage](http://bokeh.pydata.org/en/latest/):

Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

To get started using Bokeh to make your visualizations, see the [User Guide](http://bokeh.pydata.org/en/latest/docs/user_guide.html).

A complete API reference of Bokeh is at [Reference Guide](http://bokeh.pydata.org/en/latest/docs/reference.html).

To see examples of how you might use Bokeh with your own data, check out the [Gallery](http://bokeh.pydata.org/en/latest/docs/gallery.html) and the [Notebook Viewer Gallery](http://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/index.ipynb).

### Getting started

We are using Python 2 because, at the time this notebook was made, Bokeh was very slow on Python 3.

First make sure you have bokeh installed:

```
conda install bokeh
```

### Different interfaces for Bokeh

1. `bokeh.models` -- low-level, for application developers
2. `bokeh.plotting` -- intermediate-level, still plenty of control
3. `bokeh.charts` -- high-level, fast but less control

### Example: bokeh.charts interface

Import the required packages:

In [1]:
import pandas as pd
import numpy as np
from bokeh.charts import Scatter, output_file, show 

# note: output_file and show are re-exported by bokeh.charts; they can also be imported from bokeh.plotting or bokeh.io

Import a DataFrame:

In [2]:
# format the data: convert to datetime, average precipitation per month, get month and year, reset index
df = pd.read_csv('la_jolla_precip.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.groupby('DATE').mean()
df['MONTH'] = [x.month for x in df.index]
df['YEAR'] = [x.year for x in df.index]
df.reset_index(inplace=True)
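
The same preparation steps can be tried without the CSV file. The sketch below is only an illustration: the rows are made up (two hypothetical station readings per month, not values from `la_jolla_precip.csv`), and it uses the `.dt` accessor as one possible alternative to the list comprehensions.

```python
import pandas as pd

# Hypothetical data: two stations reporting for each of two months
raw = pd.DataFrame({
    'DATE': ['2009-01-01', '2009-01-01', '2009-02-01', '2009-02-01'],
    'PRCP': [6.0, 8.0, 70.0, 75.0],
})

raw['DATE'] = pd.to_datetime(raw['DATE'])      # strings -> datetime64
monthly = raw.groupby('DATE').mean()           # average the stations per date
monthly = monthly.reset_index()                # DATE becomes a column again
monthly['MONTH'] = monthly['DATE'].dt.month    # vectorized month extraction
monthly['YEAR'] = monthly['DATE'].dt.year      # vectorized year extraction

print(monthly)
```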

In [3]:
df.head()

| | DATE | LATITUDE | LONGITUDE | ELEVATION | PRCP | MONTH | YEAR |
|---|---|---|---|---|---|---|---|
| 0 | 2008-12-01 | 32.8254 | -117.2397 | 154.8 | 112.4 | 12 | 2008 |
| 1 | 2009-01-01 | 32.8254 | -117.2397 | 154.8 | 6.9 | 1 | 2009 |
| 2 | 2009-02-01 | 32.8254 | -117.2397 | 154.8 | 72.6 | 2 | 2009 |
| 3 | 2009-03-01 | 32.82555 | -117.2449 | 152.25 | 6.1 | 3 | 2009 |
| 4 | 2009-04-01 | 32.82555 | -117.2449 | 152.25 | 2.05 | 4 | 2009 |


Create a simple scatter plot:

In [4]:
p = Scatter(df, x='DATE', y='PRCP', title='Precipitation in La Jolla 2008-16', 
            xlabel='Date', ylabel='Precipitation (mm)')

Output to a named HTML file with `output_file('FILE.html')`:

In [5]:
output_file('scatter_charts.html')

Show the HTML file:

In [6]:
show(p)

### Example: bokeh.plotting interface

Use the figure object, within which we create "glyphs":

In [7]:
from bokeh.plotting import figure, output_file, show

Create a figure object, then create some glyphs, then customize it:

In [8]:
p = figure(plot_width=500, plot_height=400) # can also leave figure() empty

p.circle([0, 1, 2, 3, 4], [112, 7, 73, 6, 2], size=12, color='red')
p.triangle([0, 1, 2, 3, 4], [100, 12, 68, 15, 10], size=[5, 10, 15, 20, 25], color='blue', alpha=0.5)

p.title.text = 'plotting example'
p.title.text_color = 'orange'

p.xaxis.axis_label = 'X-axis label'
p.yaxis.axis_label = 'Y-axis label'

p.xaxis.minor_tick_line_color = 'red'
p.yaxis.minor_tick_line_color = None

output_file('scatter_plotting.html', mode='cdn')

show(p)

INFO:bokeh.core.state:Session output file 'scatter_plotting.html' already exists, will be overwritten.


Create a new plot with more glyphs and elements:

In [9]:
# prepare some data
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y0 = [i**2 for i in x]
y1 = [10**i for i in x]
y2 = [10**(i**2) for i in x]

# output to static HTML file
output_file("log_lines.html")

# create a new plot
p = figure(
   tools="pan,box_zoom,reset,save",
   y_axis_type="log", y_range=[0.001, 10**11], title="log axis example",
   x_axis_label='sections', y_axis_label='particles'
)

# add some renderers
p.line(x, x, legend="y=x")
p.circle(x, x, legend="y=x", fill_color="white", size=8)
p.line(x, y0, legend="y=x^2", line_width=3)
p.line(x, y1, legend="y=10^x", line_color="red")
p.circle(x, y1, legend="y=10^x", fill_color="red", line_color="red", size=6)
p.line(x, y2, legend="y=10^x^2", line_color="orange", line_dash="4 4")

# show the results
show(p)

INFO:bokeh.core.state:Session output file 'log_lines.html' already exists, will be overwritten.


### About the HTML file

Open up the HTML source code in a text editor.

The HTML file is composed of HTML tags and code. The code is composed of JavaScript and CSS. JavaScript is the scripting language that makes interaction between the elements in the browser possible. CSS controls the styling of text and everything else.

The style sheet is stored on a remote server. Every time you open a graph, the browser pulls the BokehJS CSS and JavaScript files from a server (pydata.org). This is the default mode, called CDN (content delivery network). There are several options for where to host these files:

* `mode='cdn'` -- online (default)
* `mode='relative'` -- locally with relative path
* `mode='absolute'` -- locally with absolute path
* `mode='inline'` -- locally in the HTML file (stand-alone option)
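
One way to see the difference between the online and stand-alone options is to render the same plot to HTML strings and compare their sizes. This is a minimal sketch, assuming `bokeh.embed.file_html` and the `CDN` and `INLINE` resource objects from `bokeh.resources` are available in your Bokeh version; `make_plot` is a hypothetical helper used only here.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN, INLINE


def make_plot():
    """Hypothetical helper: build a small line plot."""
    p = figure(width=300, height=300)
    p.line([0, 1, 2, 3, 4], [112, 7, 73, 6, 2], line_width=2)
    return p


# CDN: the page only links to the BokehJS files on a remote server
cdn_html = file_html(make_plot(), CDN, "cdn example")

# INLINE: the BokehJS code is embedded in the page itself
inline_html = file_html(make_plot(), INLINE, "inline example")

print(len(cdn_html), len(inline_html))
```

The inline page is much larger because it carries the whole library, which is the price of being viewable offline.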

### Tools

* Customization: `logo`, `tools`, `toolbar_location`.
* Lots more about tools [here](http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html).
* By the way, another useful glyph is `line`.

In [10]:
p = figure(plot_width=500, plot_height=400, logo=None, tools='pan,wheel_zoom,box_zoom,lasso_select,reset,save',
          toolbar_location='below', toolbar_sticky=False) 

p.circle([0, 1, 2, 3, 4], [112, 7, 73, 6, 2], size=12, color='red')
p.line([0, 1, 2, 3, 4], [112, 7, 73, 6, 2], line_width=2, color='blue')

output_file('scatter_tools.html', mode='inline')

show(p)

INFO:bokeh.core.state:Session output file 'scatter_tools.html' already exists, will be overwritten.


### Time-series graphs

Notice a couple of new things:
    
1. We can read a csv straight from the web, in this case stock data (see [here](https://greenido.wordpress.com/2009/12/22/work-like-a-pro-with-yahoo-finance-hidden-api/), [here](https://support.klipfolio.com/hc/en-us/articles/215546368-Use-Yahoo-Finance-as-a-data-source-), and [here](http://community.jaspersoft.com/wiki/building-custom-datasource-yahoo-finance-data)).
2. We can parse dates directly when we read the CSV file, either by passing `parse_dates=[COL1, COL2]` to parse the listed columns or `parse_dates=True` to parse the index.
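
Both forms of `parse_dates` can be checked offline. The sketch below assumes a made-up CSV string in place of a downloaded file (the rows are hypothetical sample values) and reads it through `io.StringIO`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a downloaded file
csv_text = """Date,Close
2016-11-28,111.57
2016-11-25,111.79
2016-11-23,111.23
"""

# parse_dates=[0] converts the first column while reading
by_column = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])

# parse_dates=True converts the index, which is set here with index_col
by_index = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(by_column['Date'].dtype)
print(type(by_index.index).__name__)
```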

In [11]:
df2 = pd.read_csv("http://ichart.yahoo.com/table.csv?s=AAPL&a=01&b=20&c=2000&d=11&e=28&f=2016&g=d&ignore=.csv", 
                 parse_dates=[0])

In [12]:
df2.head()

| | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | 2016-11-28 | 111.43 | 112.470001 | 111.389999 | 111.57 | 27026600 | 111.57 |
| 1 | 2016-11-25 | 111.129997 | 111.870003 | 110.949997 | 111.790001 | 11424400 | 111.790001 |
| 2 | 2016-11-23 | 111.360001 | 111.510002 | 110.330002 | 111.230003 | 27387900 | 111.230003 |
| 3 | 2016-11-22 | 111.949997 | 112.419998 | 111.400002 | 111.800003 | 25922600 | 111.800003 |
| 4 | 2016-11-21 | 110.120003 | 111.989998 | 110.010002 | 111.730003 | 29119100 | 111.730003 |


In [13]:
df2['Date'].head()

0   2016-11-28
1   2016-11-25
2   2016-11-23
3   2016-11-22
4   2016-11-21
Name: Date, dtype: datetime64[ns]

The key part of a time-series graph is `x_axis_type='datetime'`.

In [14]:
p = figure(width=1000, height=500, x_axis_type='datetime')

p.line(df2.Date, df2.Close, color='maroon', alpha=0.5)

p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Price'

p.ygrid.band_fill_color="olive"
p.ygrid.band_fill_alpha = 0.1

output_file('timeseries.html')

show(p)

INFO:bokeh.core.state:Session output file 'timeseries.html' already exists, will be overwritten.


Let's look at the precipitation data again:

In [15]:
p = figure(width=1000, height=500, x_axis_type='datetime', tools='pan,wheel_zoom,box_zoom,lasso_select,hover,reset,save')

p.circle(df.DATE, df.PRCP, color='blue', size=10, alpha=0.5)
p.line(df.DATE, df.PRCP, color='blue')

output_file('precip.html')

show(p)

INFO:bokeh.core.state:Session output file 'precip.html' already exists, will be overwritten.


### Vectorized colors and sizes

This example shows how it is possible to provide sequences of data values for glyph attributes like `fill_color` and `radius`. Other things to look out for in this example:

* supplying an explicit list of tool names to `figure()`
* fetching BokehJS resources from CDN using the mode argument
* setting the `x_range` and `y_range` explicitly
* turning a line off (by setting its value to `None`)
* using NumPy arrays for supplying data

In [16]:
# prepare some data
N = 4000
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
colors = [
    "#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)
]

# output to static HTML file (with CDN resources)
output_file("color_scatter.html", title="color scatter example", mode="cdn")

TOOLS="resize,crosshair,pan,wheel_zoom,box_zoom,reset,box_select,lasso_select"

# create a new plot with the tools above, and explicit ranges
p = figure(tools=TOOLS, x_range=(0,100), y_range=(0,100))

# add a circle renderer with vectorized colors and sizes
p.circle(x,y, radius=radii, fill_color=colors, fill_alpha=0.6, line_color=None)

# show the results
show(p)

INFO:bokeh.core.state:Session output file 'color_scatter.html' already exists, will be overwritten.
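
The `colors` list maps each point's coordinates to a hex color string. As a small illustration of that formatting step, using three made-up points instead of the random data:

```python
# Three sample points (hypothetical values in the 0-100 range)
xs = [0, 50, 100]
ys = [0, 25, 100]

# Red grows with x, green grows with y, blue is fixed at 150
colors = [
    "#%02x%02x%02x" % (int(50 + 2 * x), int(30 + 2 * y), 150)
    for x, y in zip(xs, ys)
]

print(colors)  # ['#321e96', '#965096', '#fae696']
```

Each `%02x` writes one color channel as two hexadecimal digits, so points further right look redder and points higher up look greener.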


### Linked panning and brushing

Linking together various aspects of different plots can be a useful technique for data visualization. In Bokeh, such linkages are typically accomplished by sharing some plot component between plots. Below is an example that demonstrates **linked panning** (where changing the range of one plot causes others to update) by sharing range objects between the plots. Some other things to look out for in this example:

* calling `figure()` multiple times to create multiple plots
* using `gridplot()` to arrange several plots in an array
* using different glyph methods, `Figure.triangle` and `Figure.square`
* hiding the toolbar by setting `toolbar_location` to `None`
* setting convenience arguments `color` (sets both `line_color` and `fill_color`) and `alpha` (sets both `line_alpha` and `fill_alpha`)

In [17]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, output_file, show

# prepare some data
N = 100
x = np.linspace(0, 4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)
y2 = np.sin(x) + np.cos(x)

# output to static HTML file
output_file("linked_panning.html")

# create a new plot
s1 = figure(width=250, height=250, title=None)
s1.circle(x, y0, size=10, color="navy", alpha=0.5)

# NEW: create a new plot and share both ranges
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range, title=None)
s2.triangle(x, y1, size=10, color="firebrick", alpha=0.5)

# NEW: create a new plot and share only one range
s3 = figure(width=250, height=250, x_range=s1.x_range, title=None)
s3.square(x, y2, size=10, color="olive", alpha=0.5)

# NEW: put the subplots in a gridplot
p = gridplot([[s1, s2, s3]], toolbar_location=None)

# show the results
show(p)

INFO:bokeh.core.state:Session output file 'linked_panning.html' already exists, will be overwritten.


Although the toolbar is hidden, the pan tool is still present and active. Click and drag the above plots to pan them, and see how their ranges are linked together.
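
The linkage works because the plots hold the very same range object rather than copies of it. A minimal sketch that checks this directly, with no rendering involved:

```python
from bokeh.plotting import figure

s1 = figure(width=250, height=250)

# Share both ranges with s1
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range)

# Share only the x range with s1
s3 = figure(width=250, height=250, x_range=s1.x_range)

print(s2.x_range is s1.x_range)  # shared object, so panning s1 moves s2
print(s3.y_range is s1.y_range)  # separate object, so y stays independent
```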

Another linkage that is often useful is **linked brushing** (where a selection on one plot causes a selection to update on other plots). Below is an example that demonstrates linked brushing by sharing a ColumnDataSource between two plots:

In [18]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource

# prepare some data
N = 300
x = np.linspace(0, 4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)

# output to static HTML file
output_file("linked_brushing.html")

# NEW: create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select,lasso_select"

# create a new plot and add a renderer
left = figure(tools=TOOLS, width=350, height=350, title=None)
left.circle('x', 'y0', source=source)

# create another new plot and add a renderer
right = figure(tools=TOOLS, width=350, height=350, title=None)
right.circle('x', 'y1', source=source)

# put the subplots in a gridplot
p = gridplot([[left, right]])

# show the results
show(p)

INFO:bokeh.core.state:Session output file 'linked_brushing.html' already exists, will be overwritten.
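
Linked brushing depends on both renderers pointing at one shared data source, so a selection made through either plot is visible to the other. The sketch below checks that sharing without rendering anything. It uses the generic `scatter` glyph method in place of `circle`, which is an assumption made here for compatibility across Bokeh versions.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

x = np.linspace(0, 4 * np.pi, 50)
source = ColumnDataSource(data=dict(x=x, y0=np.sin(x), y1=np.cos(x)))

TOOLS = "box_select,lasso_select,reset"
left = figure(tools=TOOLS, width=300, height=300)
right = figure(tools=TOOLS, width=300, height=300)

# Column names are looked up in the shared source
r_left = left.scatter('x', 'y0', source=source)
r_right = right.scatter('x', 'y1', source=source)

print(r_left.data_source is r_right.data_source)
```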


### Interact example

This demo shows off an interactive visualization using Bokeh for plotting, and Jupyter interactors for widgets. The demo runs entirely inside the Jupyter notebook, with no Bokeh server required.

The dropdown offers a choice of trigonometric functions to plot, and the sliders control the frequency, amplitude, and phase.
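
The computation behind the widgets can be checked outside the notebook. This is a sketch only: `waveform` and `FUNCS` are hypothetical names introduced here, and the function returns the new y values instead of updating a plot.

```python
import numpy as np

# Hypothetical lookup table for the dropdown choices
FUNCS = {"sin": np.sin, "cos": np.cos, "tan": np.tan}


def waveform(x, f="sin", w=1, A=1, phi=0):
    """Return A * f(w * x + phi) for the named trigonometric function."""
    return A * FUNCS[f](w * x + phi)


x = np.linspace(0, 2 * np.pi, 5)
print(waveform(x, f="cos", w=2, A=3, phi=0))
```

A dictionary lookup raises a `KeyError` for an unknown name, which makes a mistyped dropdown value easy to spot.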

In [19]:
from bokeh.models import ColumnDataSource
from bokeh.io import push_notebook
from bokeh.plotting import figure, show, output_notebook, reset_output

In [20]:
x = np.linspace(0, 2*np.pi, 2000)
y = np.sin(x)

#### Rendering in the notebook instead of HTML

Here are two important functions for rendering Bokeh plots in a Jupyter notebook:

* `reset_output()` resets how the output will be rendered, e.g. cancels a previous `output_file()` command
* `output_notebook()` tells Bokeh to display the plot in the notebook rather than in an HTML file

In [21]:
reset_output()

In [22]:
output_notebook()

In [23]:
source = ColumnDataSource(data=dict(x=x, y=y))

p = figure(title="simple line example", plot_height=300, plot_width=600)
# pass column names (not arrays) when a data source is supplied
p.line('x', 'y', color="#2222aa", line_width=3, source=source, name="foo")


In [24]:
def update(f, w=1, A=1, phi=0):
    if   f == "sin": func = np.sin
    elif f == "cos": func = np.cos
    elif f == "tan": func = np.tan
    source.data['y'] = A * func(w * x + phi)
    push_notebook(handle=handle)  # handle is returned by show() in the next cell

In [25]:
handle = show(p, notebook_handle=True)

In [26]:
from ipywidgets import interact
interact(update, f=["sin", "cos", "tan"], w=(0,100), A=(1,10), phi=(0, 10, 0.1))


<function __main__.update>

## Addendum: Pandas Panels

Panel is a container for 3-dimensional data. The term panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. The names for the 3 axes give some semantic meaning to describing operations involving panel data and, in particular, econometric analysis of panel data:

* **items** -- axis 0, each item corresponds to a DataFrame contained inside
* **major_axis** -- axis 1, it is the index (rows) of each of the DataFrames
* **minor_axis** -- axis 2, it is the columns of each of the DataFrames
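
The axis layout can be sketched with plain array slicing. This sketch assumes only NumPy, because `Panel` was removed from pandas in version 0.25. The transposes are chosen so the results have the same orientation as the DataFrames that `major_xs` and `minor_xs` return.

```python
import numpy as np

# Same layout as a 2-item panel: (items, major_axis, minor_axis)
data = np.arange(2 * 5 * 4).reshape(2, 5, 4)

item0 = data[0]            # one item: rows = major_axis, columns = minor_axis
major2 = data[:, 2, :].T   # one major_axis label: rows = minor_axis, columns = items
minor2 = data[:, :, 2].T   # one minor_axis label: rows = major_axis, columns = items

print(item0.shape, major2.shape, minor2.shape)  # (5, 4) (4, 2) (5, 2)
```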

### From 3D ndarray with optional axis labels

In [27]:
data = np.random.randn(2, 5, 4)

In [28]:
data.shape

(2, 5, 4)

In [29]:
data

array([[[-0.32732004,  0.52198088, -0.08541109,  1.40061787],
        [-0.11048481,  0.7104301 , -1.1756112 , -0.01256531],
        [ 1.0716235 , -0.81957425,  0.04205134, -0.89155302],
        [-0.30904165,  0.19599535, -0.65679501,  0.29274351],
        [ 1.2712758 , -1.49189049, -1.60204896,  1.30482555]],

       [[-0.20408336,  0.75895731, -1.15239453,  0.41954575],
        [-0.23725067, -0.53382512,  0.45700558, -0.01536385],
        [-0.50719874, -0.67434529,  0.92010788, -0.72343731],
        [-0.05091193,  0.32762418, -0.71183395,  0.02003949],
        [ 0.07144614, -0.67919117, -0.51325948,  1.33576169]]])

In [30]:
panel1 = pd.Panel(data, items=['Item1', 'Item2'],
    major_axis=pd.date_range('2/25/1852', periods=data.shape[1]),
    minor_axis=['A', 'B', 'C', 'D'])

In [31]:
panel1

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 1852-02-25 00:00:00 to 1852-02-29 00:00:00
Minor_axis axis: A to D

In [32]:
panel1.Item1

| | A | B | C | D |
|---|---|---|---|---|
| 1852-02-25 00:00:00 | -0.32732 | 0.521981 | -0.085411 | 1.400618 |
| 1852-02-26 00:00:00 | -0.110485 | 0.71043 | -1.175611 | -0.012565 |
| 1852-02-27 00:00:00 | 1.071624 | -0.819574 | 0.042051 | -0.891553 |
| 1852-02-28 00:00:00 | -0.309042 | 0.195995 | -0.656795 | 0.292744 |
| 1852-02-29 00:00:00 | 1.271276 | -1.49189 | -1.602049 | 1.304826 |


In [33]:
panel1.Item2

| | A | B | C | D |
|---|---|---|---|---|
| 1852-02-25 00:00:00 | -0.204083 | 0.758957 | -1.152395 | 0.419546 |
| 1852-02-26 00:00:00 | -0.237251 | -0.533825 | 0.457006 | -0.015364 |
| 1852-02-27 00:00:00 | -0.507199 | -0.674345 | 0.920108 | -0.723437 |
| 1852-02-28 00:00:00 | -0.050912 | 0.327624 | -0.711834 | 0.020039 |
| 1852-02-29 00:00:00 | 0.071446 | -0.679191 | -0.513259 | 1.335762 |


### From dict of DataFrame objects

In [34]:
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
panel2 = pd.Panel(data)

In [35]:
panel2.Item1

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | -1.188125 | 0.909175 | 0.816839 |
| 1 | 1.813973 | -1.109639 | 0.745423 |
| 2 | -1.210121 | 0.625583 | -0.559557 |
| 3 | 2.370146 | 1.74407 | 0.079267 |


In [36]:
panel2.Item2

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.645357 | 0.085816 | NaN |
| 1 | 0.101677 | -1.209668 | NaN |
| 2 | -1.059708 | -0.185651 | NaN |
| 3 | -1.771849 | 0.506035 | NaN |


### Indexing and selection

Operation                     | Syntax             | Result
------------------------------|--------------------|----------
Select item                   | `wp[item]`         | DataFrame
Get slice at major_axis label | `wp.major_xs(val)` | DataFrame
Get slice at minor_axis label | `wp.minor_xs(val)` | DataFrame

In [37]:
panel1['Item1']

| | A | B | C | D |
|---|---|---|---|---|
| 1852-02-25 00:00:00 | -0.32732 | 0.521981 | -0.085411 | 1.400618 |
| 1852-02-26 00:00:00 | -0.110485 | 0.71043 | -1.175611 | -0.012565 |
| 1852-02-27 00:00:00 | 1.071624 | -0.819574 | 0.042051 | -0.891553 |
| 1852-02-28 00:00:00 | -0.309042 | 0.195995 | -0.656795 | 0.292744 |
| 1852-02-29 00:00:00 | 1.271276 | -1.49189 | -1.602049 | 1.304826 |


In [38]:
panel1.major_axis[2]

Timestamp('1852-02-27 00:00:00', freq='D')

In [39]:
panel1.major_xs(panel1.major_axis[2])

| | Item1 | Item2 |
|---|---|---|
| A | 1.071624 | -0.507199 |
| B | -0.819574 | -0.674345 |
| C | 0.042051 | 0.920108 |
| D | -0.891553 | -0.723437 |


In [40]:
panel1.minor_axis

Index([u'A', u'B', u'C', u'D'], dtype='object')

In [41]:
panel1.minor_xs('C')

| | Item1 | Item2 |
|---|---|---|
| 1852-02-25 00:00:00 | -0.085411 | -1.152395 |
| 1852-02-26 00:00:00 | -1.175611 | 0.457006 |
| 1852-02-27 00:00:00 | 0.042051 | 0.920108 |
| 1852-02-28 00:00:00 | -0.656795 | -0.711834 |
| 1852-02-29 00:00:00 | -1.602049 | -0.513259 |


### Item operations

In [42]:
panel1['Item3'] = panel1['Item1']/panel1['Item2']

In [43]:
panel1['Item3']

| | A | B | C | D |
|---|---|---|---|---|
| 1852-02-25 00:00:00 | 1.603855 | 0.687761 | 0.074116 | 3.338415 |
| 1852-02-26 00:00:00 | 0.465688 | -1.330829 | -2.572422 | 0.817849 |
| 1852-02-27 00:00:00 | -2.112828 | 1.215363 | 0.045703 | 1.232385 |
| 1852-02-28 00:00:00 | 6.070123 | 0.598232 | 0.92268 | 14.608334 |
| 1852-02-29 00:00:00 | 17.793483 | 2.196569 | 3.121324 | 0.97684 |


See additional documentation at <http://pandas.pydata.org/pandas-docs/stable/dsintro.html#panel>.