# Bokeh: Interactive Web Visualizations in Python

## About Me

* PhD Candidate in the Interdisciplinary Quantitative Biology program and at the Institute for Behavioral Genetics at CU
* I work on addiction genomics and phenotyping methods that can scale to millions of subjects.
* <https://github.com/dbrazel>

## Why Bokeh?

* D3-like interactive web visualizations in Python, R, and Julia
    * "We write the JavaScript, so you don't have to!"
* Plays nicely with notebooks and produces portable, embeddable HTML
* Bokeh Server enables streaming updates

## Bokeh Architecture
![Bokeh Architecture](architecture.jpg)
(Image credit: Peter Wang)

## Interfaces

* `bokeh.charts` - High-level, provides common statistical charts (bar, box, histogram, heat map, chord, etc.)
* `bokeh.plotting` - Mid-level, focused on placing glyphs (boxes, lines, circles, etc.)
* `bokeh.models` - Low-level, uses classes that map directly to BokehJS models

## Setup

In [2]:
import bokeh.charts as bkc
import bokeh.plotting as bkp

bkc.output_notebook()

# Load the seaborn exercise dataset
import pandas as pd

exercise = pd.read_csv('exercise.csv')
# Drop the first column, an unnamed row index stored in the CSV
exercise = exercise.drop(exercise.columns[0], axis=1)
exercise.head()

| | id | diet | pulse | time | kind |
|---|---|---|---|---|---|
| 0 | 1 | low fat | 85 | 1 min | rest |
| 1 | 1 | low fat | 85 | 15 min | rest |
| 2 | 1 | low fat | 88 | 30 min | rest |
| 3 | 2 | low fat | 90 | 1 min | rest |
| 4 | 2 | low fat | 92 | 15 min | rest |


## `bokeh.charts`

Standard plots are simple to make, have reasonable defaults, and work well with pandas DataFrames.

In [3]:
p = bkc.BoxPlot(exercise, values='pulse', label='kind', color='kind', 
                title='Pulse in BPM, grouped by exercise type')
bkc.show(p)

It's easy to group by a column and to produce a self-contained HTML file that packages the required data.

In [4]:
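# Uncomment to also save the chart as a self-contained HTML file
# (mode='inline' embeds the BokehJS resources in the file)
#bkc.output_file('exercise_bar.html', mode='inline')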
p = bkc.Bar(exercise, values='pulse', label='kind', color='time', agg='median', 
            group='time', title='Median BPM by kind of exercise, grouped by duration')
bkc.show(p)
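
For intuition about what the grouped bar chart displays, the sketch below computes the same kind of summary directly in pandas. It assumes the chart takes one median of `pulse` per (`kind`, `time`) pair, which is what the `agg` and `group` arguments suggest. To stay self-contained, it rebuilds only the five rows shown by `exercise.head()` above; the name `sample` is introduced here for illustration.

```python
import pandas as pd

# The five rows displayed by exercise.head() earlier in this notebook
sample = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'diet': ['low fat'] * 5,
    'pulse': [85, 85, 88, 90, 92],
    'time': ['1 min', '15 min', '30 min', '1 min', '15 min'],
    'kind': ['rest'] * 5,
})

# One median per (kind, time) pair; each value would be the height of one bar
medians = sample.groupby(['kind', 'time'])['pulse'].median()
print(medians)
```

Running the same `groupby` on the full `exercise` DataFrame should give the numbers behind every bar, which is a handy sanity check when a chart looks surprising.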

Now, let's load a dataset with more continuous variables so we can show off how easy it is to create tooltips on a scatter plot.

In [5]:
from bokeh.sampledata.autompg import autompg
autompg.head()

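| | mpg | cyl | displ | hp | weight | accel | yr | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |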


In [12]:
tooltips = [
    ('Car Model', '@name'),
    ('Cylinders', '@cyl'),
    ('Horsepower', '@hp'),
    ('Acceleration', '@accel'),
    ('Model Year', '@yr')
]

p = bkc.Scatter(autompg, x='weight', y='mpg', xlabel='Weight (Pounds)',
                ylabel='Miles Per Gallon', tooltips=tooltips,
                title='Weight vs Fuel Efficiency')
bkc.show(p)

## `bokeh.plotting`

The plotting interface offers finer control than `bokeh.charts`. For example, it is easy to link panning and brushing (selection) across plots and to choose which tools are available.

In [19]:
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource

# Figures that share a data source will have linked brushing
source = ColumnDataSource(autompg)

TOOLS = "pan,wheel_zoom,box_select,lasso_select,reset"

f1 = bkp.figure(width=500, height=500, tools=TOOLS, title='Weight vs MPG')
f1.square('weight', 'mpg', alpha=0.8, size=10, source=source)

# Figures that share a range will have linked panning
f2 = bkp.figure(width=500, height=500, x_range=f1.x_range, tools=TOOLS,
                title='Weight vs Acceleration')
f2.triangle('weight', 'accel', color='red', alpha=0.8, size=10, source=source)

p = gridplot([[f1, f2]])
bkp.show(p)

## Using Map Tiles

Here, I'll use some geolocation test data I collected to demonstrate Bokeh's handling of map data. I've already converted the coordinates to Web Mercator and done some other processing to make things easier.

In [22]:
from bokeh.tile_providers import STAMEN_TONER
from bokeh.models import HoverTool

locations = pd.read_csv('dmb_vancouver_locs.csv')

hover = HoverTool(tooltips=[('Sample Time', '@sample_time'), ('Accuracy', '@accuracy{int} meters'), 
                            ('Offset', '@sample_timezone')])
p = bkp.figure(width=700, height=500, tools=[hover, 'pan', 'wheel_zoom', 'reset', 'box_zoom'])
source = ColumnDataSource(locations)
p.circle(x='longitude', y='latitude', alpha=0.9, size=10, source=source)
p.add_tile(STAMEN_TONER)
bkp.show(p)
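
The map tiles are laid out in Web Mercator, which is why the coordinates were converted ahead of time. As a minimal sketch of that preprocessing step (not the code actually used to prepare `dmb_vancouver_locs.csv`), the standard spherical Web Mercator formulas can be applied with only the standard library. The function name `lonlat_to_web_mercator` and the sample coordinates are assumptions made for illustration.

```python
import math

# Sphere radius used by Web Mercator (the WGS 84 semi-major axis), in metres
EARTH_RADIUS_M = 6378137.0

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y

# Approximate coordinates for downtown Vancouver (illustrative, not from the dataset)
x, y = lonlat_to_web_mercator(-123.12, 49.28)
print(x, y)
```

Applied to every row, this would yield the projected values that the cell above reads from the `longitude` and `latitude` columns.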