# Introduction to bokeh

## Top 5 fun and entertaining plots
---

<img src='./images/logos.3.600.wide.png' height='250' width='300' style="float:right">

Next, we will take a look at five sample plots that show the diverse ways bokeh can be used for awesome data visualizations:
   0. **scatterplot**
   1. **histogram** with pdf and cdf curves
   2. graph with **interactive sliders** to set parameters
   3. **visualization tools**: pan, zoom, hover, crosshairs, lasso select, box select, poly select, save, undo, redo, tap
   4. **geographical map**
 

This material is based largely upon the sample graphs shown here:
http://bokeh.pydata.org/en/latest/docs/gallery.html
 

# Sample data for your amusement is available:



Bokeh comes preinstalled with various types of sample data: look in the site-packages folder found in your virtual environment folder...

`<path_to_where_you_installed_miniconda>/miniconda3/envs/bokeh_tut/lib/python3.5/site-packages/bokeh/sampledata/`

Bokeh provides ready access to some of the smaller datasets via import, as we will see.

Other, much larger datasets are also available. We will download them later, using:

`bokeh.sampledata.download()`


# Simple scatterplot
---

<img src='./images/220px-Iris_versicolor_3.jpg' height='250' width='300' style="float:right">

In [1]:
# Let's start by importing some of the data

from bokeh.plotting import figure, show, output_notebook
from bokeh.sampledata.iris import flowers

# When flowers is imported, it comes in as a pandas DataFrame
# DataFrames have a method called sample() that returns a random
#     selection of rows... let's look at 10 rows.

print(type(flowers))
print(len(flowers))
flowers.sample(10)


<class 'pandas.core.frame.DataFrame'>
150


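| | sepal_length | sepal_width | petal_length | petal_width | species |
|---:|---:|---:|---:|---:|:---|
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 125 | 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 100 | 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 39 | 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 118 | 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 126 | 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 33 | 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 52 | 6.9 | 3.1 | 4.9 | 1.5 | versicolor |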


In [2]:
# Next, we will prep some enrichments that we can apply to our flower data
# We want to map the species of iris to a color so we create a dict.

colormap = {'setosa': 'red',
            'versicolor': 'green',
            'virginica': 'blue'}

# Then we create a list of colors... one color for each row in the flowers DataFrame.
# Based on the species in the species column, we insert the appropriate color
# into our list.

colors = [colormap[x] for x in flowers['species']]

# To see the results of our work, let's look at a sampling of the colors list.

print(len(colors))
colors[::10]

150


['red',
 'red',
 'red',
 'red',
 'red',
 'green',
 'green',
 'green',
 'green',
 'green',
 'blue',
 'blue',
 'blue',
 'blue',
 'blue']
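
The list comprehension above raises a `KeyError` if a species is missing from `colormap`. Below is a minimal sketch of a more forgiving variant, assuming a fallback color is acceptable. The helper name `species_to_colors` and the `'gray'` default are illustrative choices, not part of bokeh.

```python
colormap = {'setosa': 'red',
            'versicolor': 'green',
            'virginica': 'blue'}

def species_to_colors(species, default='gray'):
    """Map each species name to a color, using `default` for unknown names."""
    return [colormap.get(name, default) for name in species]

# 'unknown' is not a key in colormap, so it falls back to the default color.
sample_species = ['setosa', 'virginica', 'unknown', 'versicolor']
sample_colors = species_to_colors(sample_species)
print(sample_colors)
```

For the iris data every species is present in the dict, so both approaches give the same list.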

In [3]:

p = figure(title = "Iris Morphology")

p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Petal Width'

p.circle(flowers["petal_length"],      # x values
         flowers["petal_width"],       # y values
         color=colors, 
         fill_alpha=0.2, 
         size=10)

In [4]:
output_notebook()
show(p)

# http://bokeh.pydata.org/en/latest/docs/gallery/iris.html

# Workflow
---

As we look at the remaining examples, and as you create your own, the generic workflow listed below is a useful guide. Following a workflow like this helps you focus on one task at a time and create robust, easy-to-understand scripts.

1. Import libraries
1. Generate a figure
1. Create or load data and enrichments
1. Generate glyphs
1. Add attributes, annotations, interactions
1. Create outputs

**NOTES**:

* You may not need some steps in the above workflow, especially for barebones charts.
* Conversely, your charts may be very sophisticated and you may need to expand on the above workflow. 
* This is intended as a starting point. 
* Your mileage may vary (YMMV)!


# Histogram: with pdf and cdf curves
---

In [5]:
# IMPORTS ---------------------------------------

import numpy as np
import scipy.special

from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_notebook

In [6]:
# GENERATE FIGURE -----------------------------------------

h = figure(title="Normal Distribution (μ=0, σ=0.5)",
            tools="save",
            background_fill_color="whitesmoke")

# Notice this creates a figure() object

h

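In IPython or Jupyter, a single `?` shows the signature and docstring of an object:

```
h?
```

versus `??`, which also shows the source code when it is available:

```
h??
```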

In [8]:
h??

In [9]:
# CREATE OR LOAD DATA/ENRICHMENTS -------------------------

# set the midpoint and spread of the data
mu, sigma = 0, 0.5

# draw 1000 random samples from a normal distribution
measured = np.random.normal(mu, sigma, 1000)

# compute the histogram of a set of data.
#     hist is an array of the y-values of the histogram points
#     edges is an array of the 'sides' of the rectangles as 
#     defined across the x-axis
hist, edges = np.histogram(measured, density=True, bins=50)

# create an array of x-values: 1000 values between -2 and 2
x = np.linspace(-2, 2, 1000)

# calculate the points for a pdf and a cdf based on:
#     * mu
#     * sigma
#     * the values of x
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2

print('MEASURED: ', measured[:10])
print()
print('HIST: ', len(hist), hist[:10])
print()
print('EDGES: ', len(edges), edges[:10])
print()
print('-' * 60, end='\n\n')
print('PDF: ', pdf[:10])
print()
print('CDF: ', cdf[:10])


MEASURED:  [-0.17877429 -0.37133468  0.04950594 -0.66680592  0.74042919  0.16143088
 -0.10899642  0.32055966 -0.08447807 -0.06383199]

HIST:  50 [ 0.01507805  0.          0.          0.          0.01507805  0.01507805
  0.03015609  0.01507805  0.01507805  0.        ]

EDGES:  51 [-1.87491651 -1.80859491 -1.74227332 -1.67595172 -1.60963013 -1.54330853
 -1.47698694 -1.41066534 -1.34434374 -1.27802215]

------------------------------------------------------------

PDF:  [ 0.00026766  0.00027636  0.00028533  0.00029457  0.00030409  0.0003139
  0.000324    0.00033441  0.00034513  0.00035617]

CDF:  [  3.16712418e-05   3.27602929e-05   3.38847210e-05   3.50456008e-05
   3.62440370e-05   3.74811655e-05   3.87581535e-05   4.00762007e-05
   4.14365401e-05   4.28404387e-05]
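
As a quick sanity check on the two formulas, here is a small sketch that uses only the standard library `math` module rather than NumPy and SciPy. The function names `normal_pdf` and `normal_cdf` are illustrative. It checks that the cdf is 0.5 at the mean and that the area under the pdf between -2 and 2 agrees with the difference of the cdf at those two points.

```python
import math

MU, SIGMA = 0.0, 0.5   # same parameters as the cell above

def normal_pdf(x, mu=MU, sigma=SIGMA):
    """Normal probability density, same formula as `pdf` above."""
    return 1 / (sigma * math.sqrt(2 * math.pi)) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

def normal_cdf(x, mu=MU, sigma=SIGMA):
    """Normal cumulative distribution, same formula as `cdf` above."""
    return (1 + math.erf((x - mu) / math.sqrt(2 * sigma ** 2))) / 2

# Half of the probability mass lies below the mean.
cdf_at_mean = normal_cdf(MU)

# Trapezoid-rule area under the pdf between -2 and 2 (that is, +/- 4 sigma).
n = 1000
xs = [-2 + 4 * i / (n - 1) for i in range(n)]
area = sum((normal_pdf(a) + normal_pdf(b)) / 2 * (b - a) for a, b in zip(xs, xs[1:]))

print(cdf_at_mean)
print(area, normal_cdf(2) - normal_cdf(-2))
```

The first pdf value printed earlier corresponds to `normal_pdf(-2)`, since the x-values start at -2.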


# Experience Points
---

1. In the code cell below, use `?` to explore the `quad()` method associated with the figure `h`
1. What is the default fill_color?
1. What does the property `alpha` do?
1. How does `alpha` contrast with `line_alpha`?

In [None]:
# Let's look into the quad() method of our figure:



In [13]:
# GENERATE GLYPHS -----------------------------------------

h.quad(top=hist,
       bottom=0,
       left=edges[:-1],
       right=edges[1:],
       fill_color='yellow', alpha=0.9, line_color="orange")
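
The slices `edges[:-1]` and `edges[1:]` pair each bin with its left and right side. The sketch below illustrates this with plain Python lists and made-up counts, and also shows why the bar heights of a density histogram multiplied by the bin widths add up to 1. It is a hand-rolled illustration of the idea, not a replacement for `np.histogram`.

```python
edges = [0.0, 0.5, 1.0, 1.5]   # 4 edges describe 3 bins
counts = [2, 5, 3]             # made-up number of samples per bin

left = edges[:-1]              # every edge except the last
right = edges[1:]              # every edge except the first
widths = [r - l for l, r in zip(left, right)]

# Density: scale the counts so that the total area of the bars is 1.
total = sum(counts)
density = [c / (total * w) for c, w in zip(counts, widths)]
area = sum(d * w for d, w in zip(density, widths))

print(list(zip(left, right)))
print(density, area)
```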


In [14]:
# GENERATE GLYPHS (cont.)-----------------------------------

h.line(x, pdf, line_color="black", line_width=3, alpha=0.7, legend="PDF")
h.line(x, cdf, line_color="red", line_width=2, alpha=0.7, legend="CDF")

In [15]:
# ADD ATTRIBUTES, ANNOTATIONS, INTERACTIONS ---------------

h.legend.location = "top_left"
h.xaxis.axis_label = 'x'
h.yaxis.axis_label = 'Pr(x)'

In [16]:
# CREATE OUTPUTS ------------------------------------------
output_notebook()
show(h)

# http://bokeh.pydata.org/en/latest/docs/gallery/histogram.html

# Using interactive sliders
---

In [26]:
# IMPORTS -------------------------------------------------

import numpy as np

from bokeh.layouts import row, widgetbox

from bokeh.models import CustomJS, Slider

from bokeh.plotting import figure, output_notebook, show, ColumnDataSource

# Experience Points
---

Try to identify the purpose of these two classes:

1. CustomJS
1. Slider

In [27]:
Slider?

In [28]:
# GENERATE FIGURE -----------------------------------------
plot = figure(y_range=(-10, 10), plot_width=400, plot_height=400)

In [29]:
# CREATE OR LOAD DATA/ENRICHMENTS -------------------------
x = np.linspace(0, 10, 500)
y = np.sin(x)
source = ColumnDataSource(data=dict(x=x, y=y))

In [30]:
# GENERATE GLYPHS -----------------------------------------
plot.line('x', 'y', source=source, line_width=3, line_alpha=0.6)

In [31]:
# ADD ATTRIBUTES, ANNOTATIONS, INTERACTIONS ---------------
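# The callback below is JavaScript that runs in the browser whenever
# the slider moves. I am not a JavaScript guru, so you are on your own
# when it comes to writing more elaborate callbacks.


callback = CustomJS(args=dict(source=source), code="""
    // read the current slider value and rescale y in place
    var data = source.data;
    var A = amp.value;
    var x = data['x'];
    var y = data['y'];
    for (var i = 0; i < x.length; i++) {
        y[i] = A*Math.sin(x[i]);
    }
    // notify Bokeh that the data changed
    // (newer Bokeh releases use source.change.emit() instead)
    source.trigger('change');
    """)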

amp_slider = Slider(start=0.1, end=10, value=1, step=.1,
                    title="Amplitude",
                    callback=callback)

callback.args["amp"] = amp_slider

layout = row(plot,
             widgetbox(amp_slider))
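
If the JavaScript is hard to follow, the sketch below mirrors the loop inside the callback in plain Python. It only illustrates the logic: the real callback must stay in JavaScript because it runs in the browser, and the names `rescale_sine` and `demo` are hypothetical.

```python
import math

def rescale_sine(data, amplitude):
    """Overwrite data['y'] with amplitude * sin(x), as the JS loop does."""
    x = data['x']
    y = data['y']
    for i in range(len(x)):
        y[i] = amplitude * math.sin(x[i])
    return data

# A tiny stand-in for source.data: a dict of equal-length columns.
demo = {'x': [0.0, math.pi / 2, math.pi], 'y': [0.0, 0.0, 0.0]}
rescale_sine(demo, amplitude=3)
print(demo['y'])
```

Note that only the `y` column changes; the `x` column is read but never modified.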

In [32]:
# CREATE OUTPUTS ------------------------------------------
output_notebook()
show(layout)

# http://bokeh.pydata.org/en/latest/docs/gallery/slider.html

# Using visualization tools
---

In [34]:
# IMPORTS -------------------------------------------------

import numpy as np
from bokeh.plotting import figure, show, output_notebook

In [37]:
# CREATE OR LOAD DATA/ENRICHMENTS -------------------------

number = 4000
x = np.random.random(size=number) * 100     # random yields [0.0, 1.0) OR 0.0 <= x < 1.0
y = np.random.random(size=number) * 100

radii = np.random.random(size=number) * 1.5

# The following string formatting produces a hex number:
# %02x yields a zero padded hex number when you provide an integer
#     "#%02x" % (int(10)) would yield #0a
#     "#%02x%02x%02x" % (int(10), int(11), 150) would yield #0a0b96

colors = ["#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)]
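
With `x` and `y` in the range [0, 100), the expressions `50+2*x` and `30+2*y` stay below 256, so each channel fits in two hex digits. If you change those formulas, a value outside 0 to 255 would produce a malformed color string. Below is a small sketch of a helper that clamps each channel first; the name `rgb_to_hex` is illustrative, not a bokeh function.

```python
def rgb_to_hex(r, g, b):
    """Format three channel values as '#rrggbb', clamping each to 0..255."""
    channels = [min(255, max(0, int(c))) for c in (r, g, b)]
    return "#%02x%02x%02x" % tuple(channels)

print(rgb_to_hex(10, 11, 150))     # same inputs as the comment above
print(rgb_to_hex(300, -5, 150))    # out-of-range values are clamped
```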

In [40]:
# GENERATE FIGURE -----------------------------------------

TOOLS="crosshair,hover,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select,lasso_select"

p = figure(tools=TOOLS)

# GENERATE GLYPHS -----------------------------------------

p.scatter(x, y, radius=radii,
          fill_color=colors, fill_alpha=0.6,
          line_color=None)


# CREATE OUTPUTS ------------------------------------------

output_notebook()
show(p)  # display the plot inline in the notebook

# http://bokeh.pydata.org/en/latest/docs/gallery/color_scatter.html

# Experience Points
---

Edit the code cell above to experiment with tool settings:

1. Add an **additional** `pan` tool to the TOOLS string and execute the code
   * Consider the warning you get
   * Look at the toolbar, anyway
   * Remove the extra `pan` tool
1. Add the tool: `poly_select` to the TOOLS string and execute the code
1. Add the tool: `lasso_select` to the TOOLS string and execute the code
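
Bokeh itself reports repeated tools when the figure is created. If you want to check a tools string yourself before passing it to `figure()`, a minimal sketch using only the standard library could look like this (the helper name `duplicate_tools` is hypothetical):

```python
from collections import Counter

def duplicate_tools(tools):
    """Return the sorted tool names that appear more than once in a tools string."""
    names = [name.strip() for name in tools.split(',') if name.strip()]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)

TOOLS = "crosshair,hover,pan,wheel_zoom,pan,save"
print(duplicate_tools(TOOLS))
```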

# Geographical map
---

In [None]:
# For larger datasets, we will need to download the data 

# The data will be stored in the following location:
# $home_directory/.bokeh/data

import bokeh
bokeh.sampledata.download()

In [41]:
# IMPORTS -------------------------------------------------

from bokeh.io import show
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LogColorMapper
)

from bokeh.palettes import Viridis6 as palette
from bokeh.plotting import figure, output_notebook


from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment


In [42]:
# GENERATE FIGURE -----------------------------------------

TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"

texas = figure(title="Texas Unemployment, 2009", 
               tools=TOOLS, 
               x_axis_location=None,
               y_axis_location=None)

texas.grid.grid_line_color = None


In [43]:
# Let's understand the data:
# Each key in the counties data is a tuple: (state_id, county_id).
# The first 20 sorted keys all belong to state 1, and the county ids
# are odd numbers. Texas counties have a state_id of 48.

print(sorted(counties.keys())[:20])       
print()

# Looking at the record for the county keyed by the tuple (48, 3),
# we see that it holds a state, lists of lats and lons, and some names.

print(counties[(48,3)])

[(1, 1), (1, 3), (1, 5), (1, 7), (1, 9), (1, 11), (1, 13), (1, 15), (1, 17), (1, 19), (1, 21), (1, 23), (1, 25), (1, 27), (1, 29), (1, 31), (1, 33), (1, 35), (1, 37), (1, 39)]

{'lats': [32.52315, 32.52309, 32.52308, 32.52307, 32.52304, 32.52311, 32.52328, 32.52293, 32.52298, 32.52305, 32.52333, 32.52334, 32.52325, 32.52316, 32.52319, 32.52324, 32.52223, 32.49588, 32.46638, 32.42405, 32.36164, 32.32856, 32.32684, 32.32369, 32.30704, 32.29004, 32.26794, 32.25861, 32.25666, 32.25145, 32.24007, 32.23176, 32.22406, 32.22033, 32.2129, 32.20843, 32.19757, 32.18818, 32.17704, 32.14543, 32.09943, 32.08746, 32.0868, 32.0868, 32.0868, 32.0868, 32.0868, 32.0868, 32.0868, 32.0868, 32.08681, 32.08682, 32.08683, 32.08685, 32.08685, 32.08685, 32.08686, 32.08686, 32.08687, 32.08691, 32.08692, 32.08693, 32.08694, 32.08694, 32.08695, 32.08696, 32.08698, 32.08699, 32.08699, 32.08699, 32.08709, 32.08717, 32.08717, 32.08696, 32.08702, 32.08697, 32.08694, 32.08696, 32.08699, 32.087, 32.08701, 32.08702, 32.0

In [44]:
# Looking at the data for unemployment:
# Each key is a tuple, some of which match the tuples for the counties
# Each tuple is paired with an unemployment value.

for key, value in sorted(unemployment.items()):
    if key[0] == 48:
        print('{}: {}'.format(key, value))


(48, 1): 9.4
(48, 3): 7.6
(48, 5): 8.9
(48, 7): 7.3
(48, 9): 6.5
(48, 11): 5.1
(48, 13): 7.9
(48, 15): 8.4
(48, 17): 5.2
(48, 19): 6.8
(48, 21): 8.0
(48, 23): 5.1
(48, 25): 10.3
(48, 27): 7.1
(48, 29): 7.2
(48, 31): 5.8
(48, 33): 6.2
(48, 35): 8.6
(48, 37): 7.8
(48, 39): 8.9
(48, 41): 6.2
(48, 43): 5.1
(48, 45): 5.9
(48, 47): 9.9
(48, 49): 7.2
(48, 51): 7.4
(48, 53): 6.3
(48, 55): 8.2
(48, 57): 9.5
(48, 59): 6.3
(48, 61): 10.8
(48, 63): 9.6
(48, 65): 6.7
(48, 67): 12.5
(48, 69): 5.3
(48, 71): 10.7
(48, 73): 9.7
(48, 75): 6.4
(48, 77): 7.4
(48, 79): 7.2
(48, 81): 8.6
(48, 83): 6.9
(48, 85): 7.8
(48, 87): 5.9
(48, 89): 6.6
(48, 91): 6.6
(48, 93): 6.1
(48, 95): 8.2
(48, 97): 6.5
(48, 99): 8.7
(48, 101): 5.6
(48, 103): 9.5
(48, 105): 9.7
(48, 107): 7.0
(48, 109): 4.5
(48, 111): 4.4
(48, 113): 8.7
(48, 115): 8.7
(48, 117): 5.8
(48, 119): 8.4
(48, 121): 7.7
(48, 123): 8.3
(48, 125): 6.0
(48, 127): 11.0
(48, 129): 7.0
(48, 131): 12.5
(48, 133): 8.4
(48, 135): 9.2
(48, 137): 6.7
(48, 139): 8.6

In [45]:
# CREATE OR LOAD DATA/ENRICHMENTS -------------------------

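# Use a reversed copy instead of reversing in place: this leaves the
# library's own palette untouched and also works when it is a tuple.
palette = palette[::-1]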

counties = {
    code: county for code, county in counties.items() if county["state"] == "tx"
}

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

county_names = [county['name'] for county in counties.values()]
county_rates = [unemployment[county_id] for county_id in counties]


# LogColorMapper will map numbers in a range [low, high] against a
# sequence of colors (i.e. a palette) on a natural logarithm scale.

color_mapper = LogColorMapper(palette=palette)

# ColumnDataSource() maps the names of columns to sequences or arrays,
# often using a dictionary OR DataFrame. Here we map all the values 
# of county_xs to a variable called x, etc.

source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates,
))
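
The lookup `unemployment[county_id]` assumes every Texas county has a matching unemployment entry. The sketch below shows the same filter-and-join pattern on toy data, with a fallback for missing keys. All names and values here are made up for illustration, and using NaN as the placeholder is an assumption; how a missing rate should be drawn is left open.

```python
import math

# Toy stand-ins for the real datasets (names and values are made up).
toy_counties = {
    (48, 1): {'name': 'County A', 'state': 'tx'},
    (48, 3): {'name': 'County B', 'state': 'tx'},
    (1, 1): {'name': 'County C', 'state': 'al'},
}
toy_unemployment = {(48, 1): 9.4, (1, 1): 5.0}   # (48, 3) is deliberately missing

# Keep only the Texas records, as in the cell above.
texas_only = {code: c for code, c in toy_counties.items() if c['state'] == 'tx'}

# Build parallel lists; iterating the same dict keeps names and rates aligned.
names = [c['name'] for c in texas_only.values()]
rates = [toy_unemployment.get(code, float('nan')) for code in texas_only]

print(names)
print(rates)
```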


In [46]:
# GENERATE GLYPHS -----------------------------------------

# Here, we are referring to the 'x' and 'y' data found in the 
# ColumnDataSource we just created.
# Note that we use the string 'x' NOT x

texas.patches('x', 'y', source=source,
              fill_color={'field': 'rate', 'transform': color_mapper},
              fill_alpha=0.7, line_color="white", line_width=0.5)
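
To build some intuition for what mapping numbers to colors on a log scale means, here is a rough standalone sketch. It is not Bokeh's implementation of `LogColorMapper`, and the function name, the toy palette labels, and the `low`/`high` values are all assumptions chosen for illustration.

```python
import math

def log_color_index(value, low, high, n_colors):
    """Return a palette index for `value` on a log scale between low and high."""
    # Fraction of the way from low to high, measured in log space.
    fraction = (math.log(value) - math.log(low)) / (math.log(high) - math.log(low))
    # Clamp so out-of-range values land on the first or last color.
    fraction = min(max(fraction, 0.0), 1.0)
    return min(int(fraction * n_colors), n_colors - 1)

toy_palette = ['c0', 'c1', 'c2', 'c3', 'c4', 'c5']   # placeholder color labels
toy_rates = [2, 3, 5, 12, 16, 100]

toy_colors = [
    toy_palette[log_color_index(r, low=2, high=16, n_colors=len(toy_palette))]
    for r in toy_rates
]
print(toy_colors)
```

On a log scale, small values are spread across more of the palette than they would be on a linear scale, which helps when most rates cluster near the low end.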


In [47]:
# ADD ATTRIBUTES, ANNOTATIONS, INTERACTIONS ---------------

hover = texas.select_one(HoverTool)

hover.point_policy = "follow_mouse"

hover.tooltips = [("Name", "@name"),
                  ("Unemployment rate", "@rate%"),
                  ("(Long, Lat)", "($x, $y)"),]


In [48]:
# CREATE OUTPUTS ------------------------------------------

output_notebook()
show(texas)

# http://bokeh.pydata.org/en/latest/docs/gallery/texas.html

# Experience Points
---

1. Search online (the Bokeh user guide is a good place to start) for the syntax used to define hover tooltips.


## Navigation
---

| Previous | Up | Next |
|:-----|:-----:|-----:|
| <<< [First Graph](./first_graph.ipynb) | [Table of Contents](./README.md) | [What Went Wrong](./what_went_wrong.ipynb) >>> |