# Interactive Data Visualization with Bokeh
- William Surles
- 2017-11-14
- Datacamp class
- [https://www.datacamp.com/courses/interactive-data-visualization-with-bokeh](https://www.datacamp.com/courses/interactive-data-visualization-with-bokeh)

## What's Covered

- Basic Plotting with Bokeh
  - Plotting with glyphs
  - Additional glyphs
  - Data formats
  - Customizing glyphs
- Layouts, Interactions, and Annotations
  - Introduction to layouts
  - Advanced layouts
  - Linking plots together
  - Annotations and guides
- Building interactive apps with Bokeh
  - Introducing the Bokeh Server
  - Connecting sliders to plots
  - Updating plots from dropdowns
  - Buttons
  - Hosting applications for wider audiences
- Putting it all together! A case study
  - Time to put it all together!
  - Starting the app
  - Adding more interactivity to the app
  - Congratulations!

## Additional Resources

- General Documentation
  - [bokeh.io reference guide](https://bokeh.pydata.org/en/latest/docs/reference/io.html)
- Markers and glyphs
  - [bokeh markers](http://bokeh.pydata.org/en/latest/docs/gallery/markers.html)
  - [Plotting with basic glyphs](https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html#)
  - [full list of glyphs](https://bokeh.pydata.org/en/latest/docs/reference/models/glyphs.html)

## Libraries and Data

In [56]:
import pandas as pd
import numpy as np
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import HoverTool

# Basic Plotting with Bokeh

## Plotting with glyphs

#### What are Glyphs
- Visual shapes
  - circles, squares, triangles
  - rectangles, lines, wedges
- With properties attached to data
  - coordinates (x,y)
  - size, color, transparency

#### Typical usage
- It's common to use output_file to write these charts out as HTML files to be viewed in a browser,
- or to use output_notebook to display them inline in a notebook
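
The notes below only use output_notebook, so here is a minimal sketch of the file-based route, assuming a Bokeh install where output_file and save can be imported from bokeh.io. The file name example_plot.html and the temporary directory are made up for illustration, and scatter with marker="circle" stands in for a circle glyph.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

# Build a small figure with three circular markers
plot = figure(title="Standalone HTML example")
plot.scatter([1, 2, 3], [4, 6, 5], marker="circle", size=10)

# Hypothetical output location: a throwaway temporary directory
html_path = os.path.join(tempfile.mkdtemp(), "example_plot.html")

# Route output to the HTML file, then write it to disk
output_file(html_path)
saved_path = save(plot)
print(saved_path)
```

save writes the file without opening a browser tab; calling show(plot) instead would write the same file and then try to open it.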

In [2]:
output_notebook()

In [4]:
plot = figure(plot_width=400, tools = 'pan, box_zoom')
plot.circle([1,2,3,4,5], [8,6,5,2,3])
show(plot)

#### Glyph properties
- Lists, arrays, and other sequences of values all work
- Single fixed values also work (e.g. for things like color)


In [11]:
plot = figure()
plot.circle(x=10, y=[2,5,8,12], size=[10,20,30,40], color = 'green')
show(plot)
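
To see where the two kinds of values end up, here is a small sketch that inspects the renderer returned by the glyph call. It assumes that sequence-valued properties are stored as columns on the automatically created data source, and it uses scatter with marker="circle" in place of circle.

```python
from bokeh.plotting import figure

plot = figure()

# x, y and size are sequences; color is a single fixed value
renderer = plot.scatter(
    x=[1, 2, 3, 4],
    y=[2, 5, 8, 12],
    marker="circle",
    size=[10, 20, 30, 40],
    color="green",
)

# Sequence-valued properties become columns in the data source
columns = sorted(renderer.data_source.data.keys())
print(columns)

# The fixed color lives on the glyph itself rather than in a column
print(renderer.glyph.fill_color)
```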

#### Markers
- We will just use circles here but there are many marker types
- [bokeh markers](http://bokeh.pydata.org/en/latest/docs/gallery/markers.html)

#### Another example I found on the web
- Oooooo, pretty

In [54]:
N = 2000
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
colors = ["#%02x%02x%02x" % (r, g, 150) for r,g in zip(np.floor(50+2*x).astype(int), np.floor(30+2*y).astype(int))]

hover = HoverTool(tooltips = [
    ("index", "$index"),
    ("(x,y)", "($x, $y)"),
    ("radius", "@radius"),
    ("fill color", "$color[hex, swatch]:fill_color")
])

p = figure(title = "Just an example plot", tools=[hover, 'wheel_zoom', 'reset'])
p.circle(x, y, radius = radii, fill_color = colors, fill_alpha = 0.6, line_color = None)
show(p)

### A simple scatter plot

#### Load the fertility data first
- This data has 182 data points, but most countries are missing data. Some of the commas in the file are off too. It's quite a mess.
- I did not find a clean, similar file on the web with a quick search, so I am just going to use this one for now.
- The point is to practice the charting, and I can still do that.

In [70]:
file = 'https://assets.datacamp.com/production/course_1392/datasets/literacy_birth_rate.csv'
female = pd.read_csv(file)
print(female.shape)
female.head()

(182, 5)


Unnamed: 0,Country,Continent,female literacy,fertility,population
0,Chine,ASI,90.5,1.769,1324655000.0
1,Inde,ASI,50.8,2.682,1139965000.0
2,USA,NAM,99.0,2.077,304060000.0
3,Indonésie,ASI,88.8,2.132,227345100.0
4,Brésil,LAT,90.2,1.827,191971500.0


In [71]:
fertility = female.fertility
female_literacy = female['female literacy']

In [72]:
# Create the figure: p
p = figure(x_axis_label='fertility (children per woman)', y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility, female_literacy)

# Display the plot
show(p)



### A scatter plot with different shapes

In [75]:
print(female.columns)
female['Continent'].unique()

Index(['Country ', 'Continent', 'female literacy', 'fertility', 'population'], dtype='object')


array(['ASI', 'NAM', 'LAT', 'AF', 'EUR', 'OCE', nan, 'Continent', 'WORLD'], dtype=object)
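
The output above hints at the mess: 'Country ' has a trailing space, and the Continent column contains NaN plus the literal string 'Continent', which looks like a repeated header row. Below is a rough cleaning sketch with pandas. The small raw frame is a made-up stand-in that reuses two rows shown earlier and adds one stray header row and one empty row; I have not run this against the real file.

```python
import pandas as pd

# Made-up stand-in for the messy file: two real-looking rows,
# one repeated header row, and one empty row
raw = pd.DataFrame({
    "Country ": ["Chine", "Country", "Brésil", None],
    "Continent": ["ASI", "Continent", "LAT", None],
    "female literacy": ["90.5", "female literacy", "90.2", None],
    "fertility": ["1.769", "fertility", "1.827", None],
})

# Strip stray whitespace from the column names
clean = raw.rename(columns=str.strip)

# Drop empty rows and rows that just repeat the header
keep = clean["Continent"].notna() & (clean["Continent"] != "Continent")
clean = clean[keep].copy()

# Convert the measurement columns to numbers
for col in ["female literacy", "fertility"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

print(clean)
```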

In [76]:
fertility_latinamerica = female['fertility'][female['Continent'] == 'LAT']
female_literacy_latinamerica = female['female literacy'][female['Continent'] == 'LAT']
print(fertility_latinamerica.shape)
print(female_literacy_latinamerica.shape)

(24,)
(24,)


In [78]:
fertility_africa = female['fertility'][female['Continent'] == 'AF']
female_literacy_africa = female['female literacy'][female['Continent'] == 'AF']
print(fertility_africa.shape)
print(female_literacy_africa.shape)

(49,)
(49,)


In [80]:
# Create the figure: p
p = figure(x_axis_label='fertility', 
    y_axis_label='female_literacy (% population)')

# Add a circle glyph to the figure p
p.circle(fertility_latinamerica, female_literacy_latinamerica)

# Add an x glyph to the figure p
p.x(fertility_africa, female_literacy_africa)

# Display the plot
show(p)


### Customizing your scatter plots
- The three most important arguments to customize scatter glyphs are color, size, and alpha.
- Bokeh accepts colors as hexadecimal strings, tuples of RGB values between 0 and 255, and any of the 147 CSS color names.
- Size values are supplied in screen units (pixels), so markers keep the same on-screen size when you zoom.
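
The hex and RGB forms describe the same color. As a quick plain-Python illustration (rgb_to_hex is a hypothetical helper, not part of Bokeh), this converts an RGB tuple into the matching hexadecimal string:

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0-255 integers to a '#rrggbb' string."""
    r, g, b = rgb
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


print(rgb_to_hex((255, 0, 0)))    # same color as the CSS name 'red'
print(rgb_to_hex((0, 0, 255)))    # same color as the CSS name 'blue'
```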

In [81]:
# Create the figure: p
p = figure(
    x_axis_label='fertility (children per woman)', 
    y_axis_label='female_literacy (% population)')

# Add a blue circle glyph to the figure p
p.circle(
    fertility_latinamerica, 
    female_literacy_latinamerica, 
    color = 'blue', 
    size=10, 
    alpha = 0.8)

# Add a red circle glyph to the figure p
p.circle(
    fertility_africa, 
    female_literacy_africa, 
    color = 'red', 
    size=10, 
    alpha = 0.8)

# Display the plot
show(p)


## Additional glyphs

#### Lines

In [83]:
x = [1,2,3,4,5]
y = [8,6,5,2,3]
plot = figure()
plot.line(x,y, line_width=3)
show(plot)

#### Lines and Markers together

In [85]:
plot = figure()
plot.line(x, y, line_width=2)
plot.circle(x, y, fill_color = 'white', size = 10)
show(plot)

#### Patches
- Useful for showing geographic regions
- Data given as lists of lists
  - one list of lists for the x coordinates and one for the y coordinates, with one inner list per patch

In [87]:
xs = [[1,1,2,2],[2,2,4],[2,2,3,3]]
ys = [[2,5,5,2],[3,5,5],[2,3,4,2]]
plot = figure()
plot.patches(xs, ys, 
            fill_color = ['red','blue','green'],
            line_color = 'white')

show(plot)

#### Other glyphs
- annulus(), wedge(), rect(), hbar(), etc, etc
- [Plotting with basic glyphs](https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html#)
- [full list of glyphs](https://bokeh.pydata.org/en/latest/docs/reference/models/glyphs.html)

### Lines

In [98]:
file = 'https://assets.datacamp.com/production/course_1392/datasets/aapl.csv'
stock = pd.read_csv(file, parse_dates = ['date'], index_col = 0)
stock.head()

Unnamed: 0,adj_close,close,date,high,low,open,volume
0,31.68,130.31,2000-03-01,132.06,118.5,118.56,38478000
1,29.66,122.0,2000-03-02,127.94,120.69,127.0,11136800
2,31.12,128.0,2000-03-03,128.23,120.0,124.87,11565200
3,30.56,125.69,2000-03-06,129.13,125.0,126.0,7520000
4,29.87,122.87,2000-03-07,127.44,121.12,126.44,9767600


In [99]:
# Create a figure with x_axis_type="datetime": p
p = figure(
    x_axis_type = 'datetime', 
    x_axis_label='Date', 
    y_axis_label='US Dollars')

# Plot date along the x axis and price along the y axis
p.line(stock.date, stock.close)

# show the result
show(p)
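
The datetime axis relies on the date column holding real datetimes, which is what parse_dates takes care of when reading the file. Here is a small sketch that checks this on an inline copy of the first three rows shown above, so it runs without downloading anything. The inline CSV text is my reconstruction of those rows, not the real file.

```python
import io

import pandas as pd

# First three rows of the stock table, re-typed as inline CSV text
csv_text = """,adj_close,close,date,high,low,open,volume
0,31.68,130.31,2000-03-01,132.06,118.5,118.56,38478000
1,29.66,122.0,2000-03-02,127.94,120.69,127.0,11136800
2,31.12,128.0,2000-03-03,128.23,120.0,124.87,11565200
"""

# Same read_csv arguments as in the cell above
stock = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col=0)

# The date column should now be a datetime dtype rather than plain strings
print(stock["date"].dtype)
print(pd.api.types.is_datetime64_any_dtype(stock["date"]))
```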


### Lines and markers

In [102]:
stock2 = stock[:365]
stock2.shape

(365, 7)

In [103]:
# Create a figure with x_axis_type='datetime': p
p = figure(x_axis_type='datetime', x_axis_label='Date', y_axis_label='US Dollars')

# Plot date along the x-axis and price along the y-axis
p.line(stock2.date, stock2.close)

# With date on the x-axis and price on the y-axis, add a white circle glyph of size 4
p.circle(stock2.date, stock2.close, fill_color='white', size=4)

# Show the result
show(p)

### Patches

In [104]:
%run state_geometry.py
co_lats[:6]

[38.215, 38.40118, 38.60929, 38.81393, 38.95788, 39.11656]

In [125]:
p = figure()

# Create a list of az_lons, co_lons, nm_lons and ut_lons: x
x = [az_lons, co_lons, nm_lons, ut_lons]

# Create a list of az_lats, co_lats, nm_lats and ut_lats: y
y = [az_lats, co_lats, nm_lats, ut_lats]

In [126]:
# Add patches to figure p with line_color='black' for x and y
p.patches(x, y, line_color = 'black')

# Show the result
show(p)

#### I'm flattening the x and y lists and plotting them as circles just to see how it's drawing the shapes

In [130]:
flatten = lambda l: [item for sublist in l for item in sublist]

x_flat = flatten(x)
print(x_flat[:6])

y_flat = flatten(y)
print(y_flat[:6])

# I'm adding this just to see how it's drawing the points
p.circle(x_flat, y_flat, fill_color = 'white', color = 'black', size = 5)
show(p)

[-114.63332, -114.63349, -114.63423, -114.60899, -114.63064, -114.57354]
[34.87057, 35.00186, 35.00332, 35.07971, 35.11791, 35.14231]
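
The flatten lambda above can also be written with itertools from the standard library, which some people find easier to read. A minimal sketch, using made-up numbers in place of the per-state longitude lists:

```python
from itertools import chain


def flatten(list_of_lists):
    """Join the inner lists into one flat list, keeping their order."""
    return list(chain.from_iterable(list_of_lists))


# Made-up stand-ins for the per-state longitude lists
x = [[-114.6, -114.5], [-109.0], [-103.0, -103.1]]

x_flat = flatten(x)
print(x_flat)
```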


## Data formats

#### Lists
- We used this above

In [137]:
x = [1,2,3,4,5]
y = [8,6,5,2,3]

plot = figure()
plot.line(x,y, line_width = 3)
plot.circle(x, y, fill_color='white', size = 10)
show(plot)

#### Numpy Arrays
- the foundation of the Python science stack... yada yada
- enter the standard random array [stage left]

In [136]:
x = np.linspace(0,10,1000)
y = np.sin(x) + np.random.random(1000) * 0.2

plot = figure()
plot.line(x, y)
show(plot)

#### Pandas
- The R data frame copied into Python

In [139]:
from bokeh.sampledata.iris import flowers
flowers.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [140]:
plot = figure()
plot.circle(flowers['petal_length'], flowers['sepal_length'], size = 10)
show(plot)

#### Column Data Sources
- The common fundamental data structure of Bokeh
- Maps string column names to sequences of data
- Often created automatically for you
- Can be shared between glyphs to link selections
- Extra columns can be used with hover tooltips
- Other points of note
  - needs to be imported from bokeh.models
  - needs string keys and array values
  - all the columns must be the same length
  - easy to make from a pandas DataFrame
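
The string-key and same-length rules are easy to check before building a source. Here is a plain-Python sketch; check_columns is a hypothetical helper of my own, not something Bokeh provides.

```python
def check_columns(data):
    """Check that keys are strings and all columns have the same length.

    Returns a dict mapping each column name to its length.
    """
    for key in data:
        if not isinstance(key, str):
            raise TypeError("column names must be strings, got %r" % (key,))

    lengths = {key: len(values) for key, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError("columns differ in length: %r" % (lengths,))
    return lengths


# Same data as the ColumnDataSource example in these notes
print(check_columns({"x": [1, 2, 3, 4, 5], "y": [8, 6, 5, 2, 3]}))
```

A dict that passes this check can be handed to ColumnDataSource(data=...) as in the next cell.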

In [141]:
from bokeh.models import ColumnDataSource
source = ColumnDataSource(
    data = {'x': [1,2,3,4,5],
            'y': [8,6,5,2,3]})
source.data

{'x': [1, 2, 3, 4, 5], 'y': [8, 6, 5, 2, 3]}

In [150]:
source = ColumnDataSource(flowers)
source
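
The bare source output is not very informative, so here is a small sketch for peeking inside a source built from a DataFrame. It uses a three-row frame re-typed from the head of the flowers table rather than the full sample data, and it assumes that the DataFrame's unnamed index shows up as a column called 'index'.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Three rows re-typed from the head of the flowers table
df = pd.DataFrame({
    "petal_length": [1.4, 1.4, 1.3],
    "sepal_length": [5.1, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})

source = ColumnDataSource(df)

# Each DataFrame column becomes a named column in the source
names = sorted(source.column_names)
print(names)

# Every column has one entry per DataFrame row
print(len(source.data["petal_length"]))
```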

### Plotting data from NumPy arrays

In [152]:
# Create array using np.linspace: x
x = np.linspace(0,5,100)

# Create array using np.cos: y
y = np.cos(x)

# Add circles at x and y
p = figure()
p.circle(x, y)
show(p)

### Plotting data from Pandas DataFrames

In [153]:
file = 'https://assets.datacamp.com/production/course_1392/datasets/auto-mpg.csv'
df = pd.read_csv(file)
df.head()

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name,color,size
0,18.0,6,250.0,88,3139,14.5,71,US,ford mustang,blue,15.0
1,9.0,8,304.0,193,4732,18.5,70,US,hi 1200d,blue,20.0
2,36.1,4,91.0,60,1800,16.4,78,Asia,honda civic cvcc,red,10.0
3,18.5,6,250.0,98,3525,19.0,77,US,ford granada,blue,15.0
4,34.3,4,97.0,78,2188,15.8,80,Europe,audi 4000,green,10.0


In [154]:
# Create the figure: p
p = figure(x_axis_label='HP', y_axis_label='MPG')

# Plot mpg vs hp by color
p.circle(df.hp, df.mpg, color = df.color, size = 10)

show(p)


### The Bokeh ColumnDataSource

In [155]:
file = 'https://assets.datacamp.com/production/course_1392/datasets/sprint.csv'
df = pd.read_csv(file)
df.head()

Unnamed: 0,Name,Country,Medal,Time,Year,color
0,Usain Bolt,JAM,GOLD,9.63,2012,goldenrod
1,Yohan Blake,JAM,SILVER,9.75,2012,silver
2,Justin Gatlin,USA,BRONZE,9.79,2012,saddlebrown
3,Usain Bolt,JAM,GOLD,9.69,2008,goldenrod
4,Richard Thompson,TRI,SILVER,9.89,2008,silver


In [157]:
p = figure()

# Create a ColumnDataSource from df: source
source = ColumnDataSource(df)

# Add circle glyphs to the figure p
p.circle('Year', 'Time', source = source, color = 'color', size = 8)

# Show plot
show(p)
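
Since the source carries every column of the DataFrame, the extra ones can feed a hover tooltip. A minimal sketch, assuming the '@column' tooltip syntax used earlier in these notes. The DataFrame is re-typed from the first three rows of the sprint table, and scatter with marker="circle" stands in for circle.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# First three rows of the sprint table, re-typed by hand
df = pd.DataFrame({
    "Name": ["Usain Bolt", "Yohan Blake", "Justin Gatlin"],
    "Country": ["JAM", "JAM", "USA"],
    "Time": [9.63, 9.75, 9.79],
    "Year": [2012, 2012, 2012],
    "color": ["goldenrod", "silver", "saddlebrown"],
})
source = ColumnDataSource(df)

# '@Name' and friends look up the value in the matching source column
hover = HoverTool(tooltips=[
    ("Name", "@Name"),
    ("Country", "@Country"),
    ("Time", "@Time"),
])

p = figure(tools=[hover, "pan", "wheel_zoom"])
p.scatter("Year", "Time", source=source, marker="circle", color="color", size=8)
```

Calling show(p) would display the chart with Name, Country, and Time in the tooltip for each point.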


## Customizing glyphs

#### Selection appearance

In [161]:
plot = figure(tools='box_select, lasso_select')

plot.circle(flowers.petal_length, flowers.sepal_length)

show(plot)

In [164]:
plot = figure(tools='box_select, lasso_select')

plot.circle(flowers.petal_length, flowers.sepal_length,
           selection_color = 'red',
           nonselection_fill_alpha = 0.4,
           nonselection_fill_color = 'grey')

show(plot)

#### Hover appearance

In [166]:
from bokeh.models import HoverTool
hover = HoverTool(tooltips=None, mode='hline')

plot = figure(tools= [hover, 'crosshair'])
plot.circle(flowers.petal_length, flowers.sepal_length,
           size = 5, hover_color = 'red')
show(plot)

#### Color mapping

In [172]:
source = ColumnDataSource(flowers)
flowers.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [177]:
from bokeh.models import CategoricalColorMapper

mapper = CategoricalColorMapper(
    factors = ['setosa','virginica','versicolor'],
    palette = ['orange','lightgreen','skyblue'])

plot = figure(
    x_axis_label = 'petal_length',
    y_axis_label = 'sepal_length')

plot.circle(
    x = 'petal_length',
    y = 'sepal_length',
    size = 10, 
    source = source,
    color = {'field': 'species', 'transform': mapper})

show(plot)
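
Conceptually, the mapper pairs each factor with the palette entry at the same position. The plain-Python sketch below only illustrates that pairing; it is not how Bokeh implements the mapper.

```python
# Same factors and palette as the CategoricalColorMapper above
factors = ["setosa", "virginica", "versicolor"]
palette = ["orange", "lightgreen", "skyblue"]

# Pair each factor with the color at the same position
color_for = dict(zip(factors, palette))

# A few species values, as they might appear in the species column
species = ["setosa", "setosa", "versicolor", "virginica"]
colors = [color_for[name] for name in species]
print(colors)
```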

### Selection and non-selection glyphs

### Hover glyphs

### Colormapping

# Layouts, Interactions, and Annotations

## Introduction to layouts

### Creating rows of plots

### Creating columns of plots

### Nesting rows and columns of plots

## Advanced layouts

### Investigating the layout API

### Creating gridded layouts

### Starting tabbed layouts

### Displaying tabbed layouts

## Linking plots together

### Linked axes

### Linked brushing

## Annotations and guides

### How to create legends

### Positioning and styling legends

### Hover tooltips for exposing details

### Adding a hover tooltip

# Building interactive apps with Bokeh

## Introducing the Bokeh Server

### Understanding Bokeh apps

### Using the current document

### Add a single slider

### Multiple sliders in one document

## Connecting sliders to plots

### Adding callbacks to sliders

### How to combine Bokeh models into layouts

### Learn about widget callbacks

## Updating plots from dropdowns

### Updating data sources from dropdown callbacks

### Synchronize two dropdowns

## Buttons

### Button widgets

### Button styles

## Hosting applications for wider audiences

# Putting It All Together! A Case Study

## Time to put it all together!

### Introducing the project dataset

### Some exploratory plots of the data

## Starting the app

### Beginning with just a plot

### Enhancing the plot with some shading

### Adding a slider to vary the year

### Customizing based on user input

## Adding more interactivity to the app

### Adding a hover tool

### Adding dropdowns to the app

## Congratulations!