## Lesson 18 - Interactive Visualization with Bokeh

From the Bokeh [homepage](http://bokeh.pydata.org/en/latest/):

Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

To get started using Bokeh to make your visualizations, see the [User Guide](http://bokeh.pydata.org/en/latest/docs/user_guide.html).

A complete API reference of Bokeh is at [Reference Guide](http://bokeh.pydata.org/en/latest/docs/reference.html).

To see examples of how you might use Bokeh with your own data, check out the [Gallery](http://bokeh.pydata.org/en/latest/docs/gallery.html) and the [Notebook Viewer Gallery](http://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/index.ipynb).

### Getting started

#### Installation

First make sure you have `bokeh` installed. We will also use `pandas-datareader` to download stock quote data.

```
conda install bokeh
pip install pandas-datareader
```

#### Two interface levels for Bokeh

1. `bokeh.models` -- a low-level interface that provides the most flexibility to application developers
2. `bokeh.plotting` -- a higher-level interface centered around composing visual glyphs
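
As a rough sketch of the difference between the two levels, the same line plot can be assembled both ways. The data values are made up for illustration, and the low-level half is only one plausible way to wire the pieces together with `bokeh.models`.

```python
from bokeh.models import ColumnDataSource, DataRange1d, Line, LinearAxis, Plot
from bokeh.plotting import figure

# illustrative data (not from the lesson)
xs, ys = [0, 1, 2, 3], [1, 4, 2, 5]

# low-level: create the plot, data source, glyph and axes by hand
source = ColumnDataSource(data=dict(x=xs, y=ys))
plot = Plot(x_range=DataRange1d(), y_range=DataRange1d())
glyph_renderer = plot.add_glyph(source, Line(x="x", y="y"))
plot.add_layout(LinearAxis(), "below")
plot.add_layout(LinearAxis(), "left")

# high-level: figure() sets up ranges, axes, grids and tools for us
p = figure()
line_renderer = p.line(xs, ys)

# either object could now be passed to show()
```

The high-level `figure` is itself built on the low-level `Plot` model, so the two styles can be mixed when a plot needs customization that the glyph methods do not expose.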

### Example: bokeh.plotting interface

Use the figure object, within which we create "glyphs":

In [1]:
from bokeh.plotting import figure, output_file, show

Create a figure object, then create some glyphs, then customize it:

In [2]:
p = figure(plot_width=500, plot_height=400) # can also leave figure() empty

p.circle([0, 1, 2, 3, 4], [112, 7, 73, 6, 2], size=12, color='red')
p.triangle([0, 1, 2, 3, 4], [100, 12, 68, 15, 10], size=[5, 10, 15, 20, 25], 
           color='blue', alpha=0.5)

p.title.text = 'plotting example'
p.title.text_color = 'orange'

p.xaxis.axis_label = 'X-axis label'
p.yaxis.axis_label = 'Y-axis label'

p.xaxis.minor_tick_line_color = 'red'
p.yaxis.minor_tick_line_color = None

output_file('scatter_plotting.html', mode='cdn') # cdn = content delivery network

show(p)

Create a new plot with more glyphs and elements:

In [3]:
# prepare some data
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y0 = [i**2 for i in x]
y1 = [10**i for i in x]
y2 = [10**(i**2) for i in x]

# output to static HTML file
output_file("log_lines.html")

# create a new plot
p = figure(
   tools="pan,box_zoom,reset,save",
   y_axis_type="log", y_range=[0.001, 10**11], title="log axis example",
   x_axis_label='sections', y_axis_label='particles'
)

# add some renderers
p.line(x, x, legend="y=x")
p.circle(x, x, legend="y=x", fill_color="white", size=8)
p.line(x, y0, legend="y=x^2", line_width=3)
p.line(x, y1, legend="y=10^x", line_color="red")
p.circle(x, y1, legend="y=10^x", fill_color="red", line_color="red", size=6)
p.line(x, y2, legend="y=10^x^2", line_color="orange", line_dash="4 4")

# show the results
show(p)
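
To see why a log axis helps with the data in the cell above, a quick check of how many orders of magnitude each series spans can be sketched in plain Python. The `series` dictionary and its labels are just a convenience for this illustration.

```python
import math

x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
series = {
    "y=x": x,
    "y=x^2": [i**2 for i in x],
    "y=10^x": [10**i for i in x],
    "y=10^x^2": [10**(i**2) for i in x],
}

# report the range of each series and the number of decades it covers
for name, values in series.items():
    lo, hi = min(values), max(values)
    decades = math.log10(hi / lo)
    print(f"{name:10s} min={lo:.3g} max={hi:.3g} spans {decades:.1f} decades")
```

The steepest series reaches roughly 10^9 while the flattest stays below 10, so on a linear axis all but one curve would be squashed against the bottom of the plot.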

### About the HTML file

Open up the HTML source code in a text editor.

The HTML file is composed of HTML tags plus embedded code: JavaScript and CSS. JavaScript is the scripting language that makes interaction between elements in the browser possible. CSS controls the styling of text and everything else on the page.

The style sheets and scripts are stored on a remote server. By default, each generated page pulls the BokehJS CSS and JavaScript files from a server (pydata.org) when it is opened. This default mode is called CDN (content delivery network). There are several options for where to host these files:

* `mode='cdn'` -- online (default)
* `mode='relative'` -- locally with relative path
* `mode='absolute'` -- locally with absolute path
* `mode='inline'` -- locally in the HTML file (stand-alone option)
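
The practical difference between the CDN and inline modes can be sketched by rendering the same plot to an HTML string twice and comparing sizes. This uses `bokeh.embed.file_html` with `bokeh.resources.Resources` rather than `output_file`, so that nothing is written to disk; the helper `make_plot` and its data are hypothetical.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import Resources


def make_plot():
    """Build a small throwaway plot (illustrative data only)."""
    p = figure(width=300, height=200)
    p.line([0, 1, 2], [1, 3, 2])
    return p


# CDN mode: the page only links to the BokehJS files on a remote server
cdn_html = file_html(make_plot(), Resources(mode="cdn"), "cdn example")

# inline mode: the BokehJS code is copied into the page itself
inline_html = file_html(make_plot(), Resources(mode="inline"), "inline example")

print("cdn page size:   ", len(cdn_html), "characters")
print("inline page size:", len(inline_html), "characters")
```

The inline page should be much larger, which is the price of a file that still works without a network connection.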

### Tools

* Customization: `logo`, `tools`, `toolbar_location`.
* Lots more about tools [here](http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html).
* By the way, another useful glyph is `line`.

In [4]:
p = figure(plot_width=500, plot_height=400, logo=None, 
           tools='pan,wheel_zoom,box_zoom,lasso_select,reset,save',
           toolbar_location='below', toolbar_sticky=False) 

p.circle([0, 1, 2, 3, 4], [112, 7, 73, 6, 2], size=12, color='red')
p.line([0, 1, 2, 3, 4], [112, 7, 73, 6, 2], line_width=2, color='blue')

output_file('scatter_tools.html', mode='inline')

show(p)

### Time-series graphs

#### Stock data

In [5]:
import pandas_datareader as pdr
import pandas as pd

In [6]:
start = pd.Timestamp(2007, 6, 29) # the date the original iPhone was released
end = pd.Timestamp.today()
aapl = pdr.data.DataReader('AAPL', 'google', start, end)
# if this fails, check https://pydata.github.io/pandas-datareader/stable/remote_data.html for alternate sources

The Google Finance API has not been stable since late 2017. Requests seem
to fail at random. Failure is especially common when bulk downloading.



In [7]:
aapl.head()

| Date       | Open  | High  | Low   | Close | Volume    |
|------------|-------|-------|-------|-------|-----------|
| 2007-06-29 | 17.42 | 17.71 | 17.3  | 17.43 | 284032539 |
| 2007-07-02 | 17.29 | 17.44 | 17.04 | 17.32 | 249049955 |
| 2007-07-03 | 17.43 | 18.2  | 17.36 | 18.17 | 290620330 |
| 2007-07-05 | 18.4  | 19.0  | 18.38 | 18.96 | 363262732 |
| 2007-07-06 | 19.02 | 19.05 | 18.63 | 18.9  | 218623258 |


The key part of a time-series graph is `x_axis_type='datetime'`.

In [8]:
p = figure(width=1000, height=500, x_axis_type='datetime')

p.line(aapl.index, aapl.Close, color='maroon', alpha=0.5)

p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Price'

p.ygrid.band_fill_color="olive"
p.ygrid.band_fill_alpha = 0.1

output_file('timeseries.html')

show(p)

#### Precipitation data

In [9]:
# format the data: convert to datetime, average precipitation per month, get month and year, reset index
df = pd.read_csv('../data/la_jolla_precip_monthly.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.groupby('DATE').mean()
df['MONTH'] = [x.month for x in df.index]
df['YEAR'] = [x.year for x in df.index]
df.reset_index(inplace=True)
df.head()

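|   | DATE       | LATITUDE | LONGITUDE | ELEVATION | PRCP  | MONTH | YEAR |
|---|------------|----------|-----------|-----------|-------|-------|------|
| 0 | 2008-12-01 | 32.8254  | -117.2397 | 154.8     | 112.4 | 12    | 2008 |
| 1 | 2009-01-01 | 32.8254  | -117.2397 | 154.8     | 6.9   | 1     | 2009 |
| 2 | 2009-02-01 | 32.8254  | -117.2397 | 154.8     | 72.6  | 2     | 2009 |
| 3 | 2009-03-01 | 32.82555 | -117.2449 | 152.25    | 6.1   | 3     | 2009 |
| 4 | 2009-04-01 | 32.82555 | -117.2449 | 152.25    | 2.05  | 4     | 2009 |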
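
The formatting steps in the cell above can be tried on a tiny stand-in table. The values below are hypothetical (two readings for the same month, as if from two stations) and only illustrate why grouping by `DATE` and averaging collapses them into one row per month.

```python
import pandas as pd

# hypothetical raw readings: two rows share the same month
raw = pd.DataFrame({
    "DATE": ["2009-03-01", "2009-03-01", "2009-04-01"],
    "PRCP": [6.0, 6.2, 2.0],
})

raw["DATE"] = pd.to_datetime(raw["DATE"])  # strings -> datetimes
monthly = raw.groupby("DATE").mean()       # one averaged row per date
monthly["MONTH"] = monthly.index.month     # vectorized month lookup
monthly["YEAR"] = monthly.index.year       # vectorized year lookup
monthly = monthly.reset_index()            # DATE becomes a column again

print(monthly)
```

Reading `.month` and `.year` directly off the `DatetimeIndex` is an alternative to the list comprehensions used in the lesson and should give the same columns.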


In [10]:
p = figure(width=1000, height=500, x_axis_type='datetime', 
           tools='pan,wheel_zoom,box_zoom,lasso_select,hover,reset,save')

p.circle(df.DATE, df.PRCP, color='blue', size=10, alpha=0.5)
p.line(df.DATE, df.PRCP, color='blue')

output_file('precip.html')

show(p)

### Vectorized colors and sizes

This example shows how it is possible to provide sequences of data values for glyph attributes like `fill_color` and `radius`. Other things to look out for in this example:

* supplying an explicit list of tool names to `figure()` ([documentation](http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#configuring-plot-tools))
* fetching BokehJS resources from the CDN using the `mode` argument
* setting the `x_range` and `y_range` explicitly
* turning the circle outlines off (by setting `line_color` to `None`)
* using NumPy arrays for supplying data

In [11]:
import numpy as np

In [12]:
# prepare some data
N = 4000
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
colors = [
    "#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)
]

# output to static HTML file (with CDN resources)
output_file("color_scatter.html", title="color scatter example", mode="cdn")

TOOLS = 'box_select, box_zoom, lasso_select, pan, poly_select, tap, wheel_zoom, undo, redo, reset, save, zoom_in, zoom_out, crosshair, hover'

# create a new plot with the tools above, and explicit ranges
p = figure(tools=TOOLS, x_range=(0,100), y_range=(0,100))

# add a circle renderer with vectorized colors and sizes
p.circle(x,y, radius=radii, fill_color=colors, fill_alpha=0.6, line_color=None)

# show the results
show(p)
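
The color strings in the cell above are ordinary `#rrggbb` hex codes computed from each point's position. A minimal sketch of the same formula for a single point, using a hypothetical helper named `point_color`:

```python
def point_color(x, y):
    """Map a position in [0, 100] x [0, 100] to a '#rrggbb' string."""
    r = 50 + 2 * x   # red grows with x (50 .. 250)
    g = 30 + 2 * y   # green grows with y (30 .. 230)
    b = 150          # blue is held constant
    return "#%02x%02x%02x" % (int(r), int(g), b)


print(point_color(0, 0))      # #321e96
print(point_color(100, 100))  # #fae696
```

Because `x` and `y` never exceed 100, each channel stays below 255 and always formats as exactly two hex digits.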

### Linked panning and brushing

Linking together various aspects of different plots can be a useful technique for data visualization. In Bokeh, such linkages are typically accomplished by sharing some plot component between plots. Below is an example that demonstrates **linked panning** (where changing the range of one plot causes others to update) by sharing range objects between the plots. Some other things to look out for in this example:

* calling `figure()` multiple times to create multiple plots
* using `gridplot()` to arrange several plots in an array
* using different glyph methods: `Figure.triangle` and `Figure.square`
* hiding the toolbar by setting `toolbar_location` to `None`
* setting convenience arguments `color` (sets both `line_color` and `fill_color`) and `alpha` (sets both `line_alpha` and `fill_alpha`)

In [13]:
from bokeh.layouts import gridplot
from bokeh.plotting import figure, output_file, show

# prepare some data
N = 100
x = np.linspace(0, 4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)
y2 = np.sin(x) + np.cos(x)

# output to static HTML file
output_file("linked_panning.html")

# create a new plot
s1 = figure(width=250, height=250, title=None)
s1.circle(x, y0, size=10, color="navy", alpha=0.5)

# NEW: create a new plot and share both ranges
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range, title=None)
s2.triangle(x, y1, size=10, color="firebrick", alpha=0.5)

# NEW: create a new plot and share only one range
s3 = figure(width=250, height=250, x_range=s1.x_range, title=None)
s3.square(x, y2, size=10, color="olive", alpha=0.5)

# NEW: put the subplots in a gridplot
p = gridplot([[s1, s2, s3]], toolbar_location=None)

# show the results
show(p)

Although the toolbar is hidden, the pan tool is still present and active. Click and drag the above plots to pan them, and see how their ranges are linked together.
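
What "sharing range objects" means can be checked directly: the linked figures hold the very same Python object. A small sketch with empty figures, reusing the names `s1`, `s2` and `s3` from the example:

```python
from bokeh.plotting import figure

s1 = figure(width=250, height=250)
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range)
s3 = figure(width=250, height=250, x_range=s1.x_range)

# s2 shares both ranges with s1, so panning either axis moves both plots
print(s2.x_range is s1.x_range, s2.y_range is s1.y_range)

# s3 shares only the x range; it gets its own independent y range
print(s3.x_range is s1.x_range, s3.y_range is s1.y_range)
```

Any tool that changes a shared range (pan, zoom, reset) therefore affects every plot that holds it.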

Another linkage that is often useful is **linked brushing** (where a selection on one plot causes a selection to update on other plots). Below is an example that demonstrates linked brushing by sharing a ColumnDataSource between two plots:

In [14]:
import numpy as np
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

# prepare some data
N = 300
x = np.linspace(0, 4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)

# output to static HTML file
output_file("linked_brushing.html")

# NEW: create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select,lasso_select,hover"

# create a new plot and add a renderer
left = figure(tools=TOOLS, width=350, height=350, title=None)
left.circle('x', 'y0', source=source)

# create another new plot and add a renderer
right = figure(tools=TOOLS, width=350, height=350, title=None)
right.circle('x', 'y1', source=source)

# put the subplots in a gridplot
p = gridplot([[left, right]])

# show the results
show(p)
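
Linked brushing works because both renderers read from one data source, so a selection is stored once and drawn twice. A minimal sketch with made-up data, using `scatter` as a stand-in for the `circle` glyphs above:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# illustrative data shared by both plots
source = ColumnDataSource(data=dict(x=[0, 1, 2], y0=[0, 1, 4], y1=[4, 1, 0]))

left = figure(width=250, height=250, tools="box_select")
right = figure(width=250, height=250, tools="box_select")

# column names are looked up in the shared source
r_left = left.scatter("x", "y0", source=source)
r_right = right.scatter("x", "y1", source=source)

print(r_left.data_source is r_right.data_source)
```

If each plot were given its own `ColumnDataSource`, even one built from identical arrays, the selections would no longer be linked.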

### Interact example

This demo shows off an interactive visualization using Bokeh for plotting, and Jupyter interactors for widgets. The demo runs entirely inside the Jupyter notebook, with no Bokeh server required.

The dropdown offers a choice of trigonometry functions to plot, and the sliders control the frequency, amplitude, and phase.

In [15]:
from ipywidgets import interact
import numpy as np

from bokeh.io import push_notebook, show, output_notebook
from bokeh.plotting import figure
output_notebook()

In [16]:
x = np.linspace(0, 2*np.pi, 2000)
y = np.sin(x)

In [17]:
p = figure(title="simple line example", plot_height=300, plot_width=600, y_range=(-5,5))
r = p.line(x, y, color="#2222aa", line_width=3)

In [18]:
def update(f, w=1, A=1, phi=0):
    if   f == "sin": func = np.sin
    elif f == "cos": func = np.cos
    elif f == "tan": func = np.tan
    r.data_source.data['y'] = A * func(w * x + phi)
    push_notebook()
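
The computation inside `update` can be checked on its own, without widgets or a notebook. The sketch below is a hypothetical variant named `waveform` that swaps the `if`/`elif` chain for a dictionary lookup and raises an error for unknown names; it is not part of the Bokeh demo.

```python
import numpy as np

# map the dropdown labels to NumPy functions
FUNCS = {"sin": np.sin, "cos": np.cos, "tan": np.tan}


def waveform(x, f="sin", w=1, A=1, phi=0):
    """Return A * f(w * x + phi) for the named trigonometric function."""
    try:
        func = FUNCS[f]
    except KeyError:
        raise ValueError(f"unknown function name: {f!r}")
    return A * func(w * x + phi)


x = np.linspace(0, 2 * np.pi, 5)
print(waveform(x, "sin", w=1, A=2))
```

Inside the notebook, the result of such a function would be assigned to `r.data_source.data['y']` before calling `push_notebook()`, as in the cell above.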

In [19]:
show(p, notebook_handle=True)

In [20]:
interact(update, f=["sin", "cos", "tan"], w=(0,100), A=(1,5), phi=(0, 20, 0.1))

<function __main__.update>

For what this is supposed to look like, see <https://bokeh.pydata.org/en/latest/docs/user_guide/notebook.html#jupyter-interactors>.

## Addendum: Pandas Panels

Panel is a container for 3-dimensional data. The term panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. The names for the 3 axes give some semantic meaning to describing operations involving panel data and, in particular, econometric analysis of panel data:

* **items** -- axis 0, each item corresponds to a DataFrame contained inside
* **major_axis** -- axis 1, it is the index (rows) of each of the DataFrames
* **minor_axis** -- axis 2, it is the columns of each of the DataFrames

### From 3D ndarray with optional axis labels

In [21]:
import numpy as np
import pandas as pd

In [22]:
data = np.random.randn(2, 5, 4)

In [23]:
data.shape

(2, 5, 4)

In [24]:
data

array([[[ 1.75096624,  0.65556618, -0.10007329,  0.37582542],
        [-2.00852604,  0.16424924, -0.08516157, -0.14118034],
        [ 0.87047175, -0.95286499,  0.13297243, -1.34238875],
        [ 0.72938221,  0.46707014,  0.19111334,  1.01504314],
        [ 0.30068285, -0.66560047, -0.47830819, -0.32585423]],

       [[ 1.35961042,  0.5711701 , -0.93532469,  1.03414193],
        [-1.018831  , -0.6969649 , -1.09888308,  0.11786802],
        [-1.32336306, -0.17684415,  2.59423053,  0.58195235],
        [-0.88314767, -0.87869987,  0.08630181,  0.11684831],
        [ 1.96734997,  0.94845456, -0.40561842, -0.39463591]]])

In [25]:
panel1 = pd.Panel(data, items=['Item1', 'Item2'],
    major_axis=pd.date_range('2/25/1852', periods=data.shape[1]),
    minor_axis=['A', 'B', 'C', 'D'])

In [26]:
panel1

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 1852-02-25 00:00:00 to 1852-02-29 00:00:00
Minor_axis axis: A to D

In [27]:
panel1.Item1

|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 1852-02-25 | 1.750966  | 0.655566  | -0.100073 | 0.375825  |
| 1852-02-26 | -2.008526 | 0.164249  | -0.085162 | -0.14118  |
| 1852-02-27 | 0.870472  | -0.952865 | 0.132972  | -1.342389 |
| 1852-02-28 | 0.729382  | 0.46707   | 0.191113  | 1.015043  |
| 1852-02-29 | 0.300683  | -0.6656   | -0.478308 | -0.325854 |


In [28]:
panel1.Item2

|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 1852-02-25 | 1.35961   | 0.57117   | -0.935325 | 1.034142  |
| 1852-02-26 | -1.018831 | -0.696965 | -1.098883 | 0.117868  |
| 1852-02-27 | -1.323363 | -0.176844 | 2.594231  | 0.581952  |
| 1852-02-28 | -0.883148 | -0.8787   | 0.086302  | 0.116848  |
| 1852-02-29 | 1.96735   | 0.948455  | -0.405618 | -0.394636 |


### From dict of DataFrame objects

In [29]:
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
        'Item2' : pd.DataFrame(np.random.randn(4, 2))}
panel2 = pd.Panel(data)

In [30]:
panel2.Item1

|   | 0         | 1         | 2         |
|---|-----------|-----------|-----------|
| 0 | 0.34347   | -0.581592 | -0.221928 |
| 1 | 0.116908  | 0.321611  | 0.211397  |
| 2 | -0.779497 | 0.658704  | 0.006724  |
| 3 | -0.400529 | -0.301046 | 0.159996  |


In [31]:
panel2.Item2

|   | 0         | 1         | 2   |
|---|-----------|-----------|-----|
| 0 | 1.064061  | 0.862293  | NaN |
| 1 | -1.377671 | 0.076045  | NaN |
| 2 | -1.271046 | -0.942552 | NaN |
| 3 | -0.118975 | -2.196566 | NaN |


### Indexing and selection

Operation                     | Syntax             | Result
------------------------------|--------------------|----------
Select item                   | `wp[item]`         | DataFrame
Get slice at major_axis label | `wp.major_xs(val)` | DataFrame
Get slice at minor_axis label | `wp.minor_xs(val)` | DataFrame

In [32]:
panel1['Item1']

|            | A         | B         | C         | D         |
|------------|-----------|-----------|-----------|-----------|
| 1852-02-25 | 1.750966  | 0.655566  | -0.100073 | 0.375825  |
| 1852-02-26 | -2.008526 | 0.164249  | -0.085162 | -0.14118  |
| 1852-02-27 | 0.870472  | -0.952865 | 0.132972  | -1.342389 |
| 1852-02-28 | 0.729382  | 0.46707   | 0.191113  | 1.015043  |
| 1852-02-29 | 0.300683  | -0.6656   | -0.478308 | -0.325854 |


In [33]:
panel1.major_axis

DatetimeIndex(['1852-02-25', '1852-02-26', '1852-02-27', '1852-02-28',
               '1852-02-29'],
              dtype='datetime64[ns]', freq='D')

In [34]:
panel1.major_xs('1852-02-25')

|   | Item1     | Item2     |
|---|-----------|-----------|
| A | 1.750966  | 1.35961   |
| B | 0.655566  | 0.57117   |
| C | -0.100073 | -0.935325 |
| D | 0.375825  | 1.034142  |


In [35]:
panel1.major_xs(panel1.major_axis[2])

|   | Item1     | Item2     |
|---|-----------|-----------|
| A | 0.870472  | -1.323363 |
| B | -0.952865 | -0.176844 |
| C | 0.132972  | 2.594231  |
| D | -1.342389 | 0.581952  |


In [36]:
panel1.minor_axis

Index(['A', 'B', 'C', 'D'], dtype='object')

In [37]:
panel1.minor_xs('C')

|            | Item1     | Item2     |
|------------|-----------|-----------|
| 1852-02-25 | -0.100073 | -0.935325 |
| 1852-02-26 | -0.085162 | -1.098883 |
| 1852-02-27 | 0.132972  | 2.594231  |
| 1852-02-28 | 0.191113  | 0.086302  |
| 1852-02-29 | -0.478308 | -0.405618 |
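
The three selections above can also be sketched without `Panel`, which was deprecated in pandas 0.20 and removed in 0.25. The version below stacks a dict of DataFrames into one frame with a two-level row index; the random values, the seed, and the level names `item` and `date` are choices made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
dates = pd.date_range("1852-02-25", periods=5, name="date")
cols = ["A", "B", "C", "D"]

frames = {
    "Item1": pd.DataFrame(rng.standard_normal((5, 4)), index=dates, columns=cols),
    "Item2": pd.DataFrame(rng.standard_normal((5, 4)), index=dates, columns=cols),
}

# rows are indexed by (item, date); columns play the role of the minor axis
stacked = pd.concat(frames, names=["item"])

item1 = stacked.loc["Item1"]                      # like wp['Item1']
major = stacked.xs(dates[0], level="date").T      # like wp.major_xs(dates[0])
minor = stacked["C"].unstack("item")              # like wp.minor_xs('C')

print(major)
print(minor)
```

Each result has the same orientation as the corresponding `Panel` output: `major` has the columns A to D as rows and the items as columns, and `minor` has the dates as rows and the items as columns.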


### Item operations

In [38]:
panel1['Item3'] = panel1['Item1']/panel1['Item2']

In [39]:
panel1['Item3']

|            | A         | B         | C        | D         |
|------------|-----------|-----------|----------|-----------|
| 1852-02-25 | 1.287844  | 1.14776   | 0.106993 | 0.363418  |
| 1852-02-26 | 1.971403  | -0.235664 | 0.077498 | -1.197783 |
| 1852-02-27 | -0.657772 | 5.388162  | 0.051257 | -2.306699 |
| 1852-02-28 | -0.825889 | -0.531547 | 2.214477 | 8.686845  |
| 1852-02-29 | 0.152836  | -0.701774 | 1.179207 | 0.825709  |


See additional documentation at <http://pandas.pydata.org/pandas-docs/stable/dsintro.html#panel>.