# Auxiliary tutorial 2: Introduction to Bokeh

(c) 2016 Justin Bois. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).

*This tutorial was generated from a Jupyter notebook.  You can download the notebook [here](aux2_intro_to_bokeh.ipynb).*

In [1]:
# Our numerical workhorses
import numpy as np
import pandas as pd

# Import Bokeh modules for interactive plotting
import bokeh.charts
import bokeh.charts.utils
import bokeh.io
import bokeh.models
import bokeh.palettes
import bokeh.plotting

# Display graphics in this notebook
bokeh.io.output_notebook()

In this tutorial, we will explore browser-based interactive plotting using [Bokeh](http://bokeh.pydata.org/).  It is important that you are using the latest version of Bokeh, v. 0.12.2.  After importing, verify that this is the case.

In [2]:
bokeh.__version__

'0.12.2'

If you do not have the most recent version, you can update it:

    conda update bokeh
    

Why is it so important to use the most recent version?  Bokeh is currently in very active development.  It is certainly not feature-full yet, and there are lots and lots of features slated to be added.

For browser-based interactive data visualization, [D3.js](http://d3js.org) is the most widely used and feature-full package.  However, it is a lower level package, and requires writing JavaScript.  Bokeh, like [Shiny](http://shiny.rstudio.com) for R, and others, is an attempt to bring the type of functionality D3 offers to high level languages like Python.  In other words, the goal is that you can achieve browser-based interactive data visualizations with a few lines of code.

[Datashader](http://datashader.readthedocs.io/en/latest/#) is a great add-on on top of Bokeh that enables visualization of very large data sets.

### Why browser-based interactive data visualization?

I think the interactive part is easy to answer.  The more you can interact with your data, particularly during the exploratory phase of data analysis, the more you can learn.  When doing exploratory data analysis, we typically make lots and lots of plots to see patterns.  If we can expedite this process, we can be more efficient and effective in our analysis.

Why browser-based?  There are two simple answers to this.  First, everyone has a browser, and browsers are relatively standardized.  This makes your graphics *very* portable.  Second, there are lots of tools for efficiently rendering graphics in browsers.  Bokeh uses [HTML5 canvas elements](https://en.wikipedia.org/wiki/Canvas_element) to accomplish this.  These tools are mature and stable, thereby making backend rendering of the graphics easy.

## Data for this tutorial

We will use the tidy `DataFrame`s from the first couple weeks of class as we explore Bokeh's features and do some interactive visualizations.  So, let's load in the `DataFrame`s now.

In [3]:
# The frog data from tutorial 1a
df_frog = pd.read_csv('../data/frog_tongue_adhesion.csv', comment='#')

# The MT catastrophe data
df_mt = pd.read_csv(
    '../data/gardner_et_al_2011_time_to_catastrophe_dic.csv',
    comment='#')

# These were generated in tutorial 2a
df_fish = pd.read_csv('../data/130315_10_minute_intervals.csv')

Before moving on, we'll go ahead and tidy the MT catastrophe `DataFrame`.

In [4]:
# Tidy MT_catastrophe DataFrame
df_mt.columns = ['labeled', 'unlabeled']
df_mt = pd.melt(df_mt, var_name='fluor', value_name='tau').dropna()
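
To make the reshaping concrete, here is a minimal sketch of the same `pd.melt(...).dropna()` pattern applied to a made-up two-column `DataFrame`.  The numbers are invented for illustration and are not taken from the catastrophe data set; the `NaN` padding stands in for the case where the two columns have different numbers of measurements.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one column per condition, padded with NaN
df_wide = pd.DataFrame({'labeled': [470.0, 1415.0, 130.0],
                        'unlabeled': [355.0, 425.0, np.nan]})

# Melt to tidy format (one row per measurement), then drop the NaN padding
df_tidy = pd.melt(df_wide, var_name='fluor', value_name='tau').dropna()

print(df_tidy)
```

Each row of the result is a single measurement, with the `fluor` column recording which condition it came from.  This is the layout that lets us pass a column name such as `'fluor'` to a `color` kwarg later on.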

## High-level charts

Perhaps the easiest way to get started with Bokeh is to use its high-level charts.  These allow for rapid plotting of data coming from Pandas `DataFrame`s, much like the plotting utilities in Pandas itself.

### Line plot
We'll start with a simple line plot of zebrafish sleep data.

In [5]:
# Pull out fish record
df_fish2 = df_fish[df_fish['fish']==2]

# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish2, x='zeit', y='activity', height=300, 
                      color='dodgerblue')

# Display it
bokeh.io.show(p)

There are many things to note here.  First, and most obviously, you can play with the various tools.  You can select the tools in the upper right corner of the plot. Hovering over an icon will reveal what the tool does.

When we instantiate the `bokeh.charts.Line` object, a plot is returned, which we assigned to the variable `p`.  We can further modify or add attributes to this object.  Importantly, the `bokeh.io.show()` function displays the object.  Because we called `bokeh.io.output_notebook()` after our import statements, the graphics are shown in the current notebook.  We can also export the plot as its own standalone HTML document.  We won't do it here, but simply put

    bokeh.plotting.output_file('filename.html')
    
before the `bokeh.io.show(p)` function call.

Note also that we chose a color of `dodgerblue`.  We can choose any of the [named CSS colors](http://www.w3schools.com/cssref/css_colornames.asp), or specify a hexadecimal color.  Further, we specified the height of the plot in pixels using the `height` kwarg.  We could also specify the width using the `width` kwarg, but left it as the default here.  Notice also that the axes were automatically labeled with the column headings of the `DataFrame`.  We can specify the axis labels with keyword arguments as well.
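
If you have a color as RGB values rather than as a name, a small helper can build the hexadecimal string.  This is only a sketch: `rgb_to_hex()` is a hypothetical helper written for this example, not a Bokeh function.

```python
def rgb_to_hex(r, g, b):
    """Convert 0-255 integer RGB values to a '#rrggbb' hexadecimal string."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError('RGB values must be between 0 and 255.')
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)

# The CSS named color dodgerblue is defined as RGB (30, 144, 255)
dodgerblue_hex = rgb_to_hex(30, 144, 255)
print(dodgerblue_hex)
```

The resulting string can then be used in place of the color name, e.g., as the value of the `color` kwarg.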

In [6]:
# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish2, x='zeit', y='activity', height=300,
                      color='dodgerblue', xlabel='time (h)', 
                      ylabel='sec of activity / 10 min')

# Display it
bokeh.io.show(p)

We can also put multiple lines on the same plot.

In [7]:
# Select three fish to plot
df_fish_multi = df_fish[df_fish['fish'].isin([1, 12, 23])]

# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish_multi, x='zeit', y='activity', height=300,
                      color='fish', xlabel='time (h)', 
                      ylabel='sec of activity / 10 min', legend="top_left")

# Display it
bokeh.io.show(p)

### Box plots

Bokeh's high-level charts interface also allows for easy construction of box plots.  As an example, we'll make box plots of the striking force of the frog tongues.

In [8]:
# Use Bokeh chart to make plot
p = bokeh.charts.BoxPlot(df_frog, values='impact force (mN)', label='ID',
                        color='ID', height=400, xlabel='frog', 
                        ylabel='impact force (mN)', legend=None)

# Display it
bokeh.io.show(p)

Pretty slick, just like Seaborn. There is currently no support for beeswarm plots in Bokeh, but we can make jitter plots, as I demonstrate below.
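
For reference, the summary statistics that a conventional box plot encodes can be computed directly with NumPy.  This is a sketch under the usual Tukey convention (box at the quartiles, whiskers reaching the most extreme points within 1.5 times the interquartile range).  I am assuming, not asserting, that `bokeh.charts.BoxPlot` follows the same convention, and the forces below are invented values.

```python
import numpy as np

# Hypothetical impact forces in mN (invented for illustration)
forces = np.array([100., 200., 300., 400., 500., 600., 700., 800., 2500.])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(forces, [25, 50, 75])
iqr = q3 - q1

# Whiskers extend to the most extreme points within 1.5 IQR of the box
in_range = forces[(forces >= q1 - 1.5 * iqr) & (forces <= q3 + 1.5 * iqr)]
whisker_low, whisker_high = in_range.min(), in_range.max()

# Anything beyond the whiskers is drawn as an individual outlier
outliers = forces[(forces < whisker_low) | (forces > whisker_high)]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```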

### Scatter plots

We can also make scatter plots.  As a useful feature, we can color the points in the scatter plot according to values in the `DataFrame`.

In [9]:
# Use Bokeh chart to make plot
p = bokeh.charts.Scatter(df_frog, x='impact force (mN)', y='adhesive force (mN)',
                         color='ID', height=400, width=500,
                         ylabel='adhesive force (mN)', xlabel='impact force (mN)',
                         legend='top_right')

# Display it
bokeh.io.show(p)

### Histograms
And, of course, we can do histograms.  We'll use the microtubule catastrophe data to do that.

In [10]:
# Use Bokeh chart to make plot
p = bokeh.charts.Histogram(df_mt, values='tau', color='fluor',
                           bins=20, height=400, width=500, 
                           xlabel='τ (seconds)', ylabel='count',
                           legend='top_right')

# Display it
bokeh.io.show(p)
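
The `bins=20` kwarg sets the number of bins.  As a rough sketch of what such a binning looks like, here is the equivalent computation with `np.histogram()`, which uses equal-width bins spanning the range of the data.  Whether `bokeh.charts.Histogram` chooses its bin edges in exactly the same way is an assumption on my part, and the data here are synthetic, not the real catastrophe times.

```python
import numpy as np

# Synthetic stand-in for catastrophe times in seconds (not the real data)
rng = np.random.RandomState(42)
tau = rng.exponential(scale=400.0, size=200)

# 20 equal-width bins between the minimum and maximum of the data
counts, edges = np.histogram(tau, bins=20)

print(len(counts), len(edges))   # 20 counts, 21 bin edges
```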

## More control with the `plotting` interface

Bokeh's `charts` interface is useful for quickly making plots from `DataFrame`s, but the lower level `bokeh.plotting` interface allows more control over the plots.  For example, we'll use my favorite background fill with white grid for our plot.

In [11]:
# Set up the figure (this is like a canvas you will paint on)
p = bokeh.plotting.figure(plot_width=650, plot_height=450, 
                          x_axis_label='Impact force (mN)',
                          y_axis_label='Adhesive force (mN)')

# Specify the glyphs
p.circle(df_frog['impact force (mN)'], df_frog['adhesive force (mN)'], size=7,
         alpha=0.5)

bokeh.io.show(p)

We can also add multiple glyphs to the same plot.

In [12]:
def ecdf(data):
    return np.sort(data), np.arange(1, len(data)+1) / len(data)

# Compute ECDFs
x_lab, y_lab = ecdf(df_mt.loc[df_mt.fluor=='labeled','tau'])
x_unlab, y_unlab = ecdf(df_mt.loc[df_mt.fluor=='unlabeled','tau'])

# Set up our figure to paint the data on
p = bokeh.plotting.figure(width=650, height=450, x_axis_label='τ (s)',
                         y_axis_label='ECDF')

# Specify the glyphs
p.circle(x_lab, y_lab, size=7, alpha=0.75, legend='labeled',
         color='dodgerblue')
p.circle(x_unlab, y_unlab, size=7, alpha=0.75, legend='unlabeled',
         color='tomato')
p.legend.location = 'bottom_right'

bokeh.io.show(p)
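
To see what the `ecdf()` function above returns, here is a tiny worked example on four made-up values.  The function is repeated so the snippet runs on its own; it assumes Python 3, where `/` is true division.

```python
import numpy as np

def ecdf(data):
    """Return the sorted data and the corresponding ECDF values."""
    return np.sort(data), np.arange(1, len(data) + 1) / len(data)

# Four made-up measurements, deliberately out of order
x, y = ecdf(np.array([3.0, 1.0, 2.0, 4.0]))

print(x)
print(y)
```

The `x` values come back sorted, and the `y` values climb in steps of $1/n$ up to one, so plotting `y` against `x` gives the fraction of measurements less than or equal to each value.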

We can also exercise this increased control with the fish activity data.  First, we'll write a small function to get the starting and ending points of nights.

In [13]:
def nights(df):
    """
    Takes light series from a single fish and gives the start and end of nights.
    """
    lefts = df.zeit[np.where(np.diff(df.light.astype(int)) == -1)[0]].values
    rights = df.zeit[np.where(np.diff(df.light.astype(int)) == 1)[0]].values
    return lefts, rights
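
To see what `nights()` picks out, here is a small sketch on a made-up light record with a single night.  It uses positional indexing on the underlying NumPy arrays, which is my variation so that the snippet does not depend on the `DataFrame` index starting at zero; otherwise the logic is the same as in the function above.

```python
import numpy as np
import pandas as pd

# Toy record sampled every 2 hours; lights are off for 14 <= zeit < 24
df_toy = pd.DataFrame({'zeit': np.arange(10.0, 28.0, 2.0)})
df_toy['light'] = (df_toy['zeit'] < 14) | (df_toy['zeit'] >= 24)

# Entry i of step is light[i+1] - light[i]: -1 is lights off, +1 is lights on
step = np.diff(df_toy['light'].astype(int).values)

# Times of the last sample before each transition
lefts = df_toy['zeit'].values[np.where(step == -1)[0]]
rights = df_toy['zeit'].values[np.where(step == 1)[0]]

print(lefts, rights)
```

Note that because `np.diff()` compares each sample to the next one, the reported times are those of the last sample *before* each light change, so the shaded region is offset by at most one sampling interval.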

Now that we have this function, we can proceed to make our nicely shaded plot.

In [15]:
# Create figure
p = bokeh.plotting.figure(width=650, height=450, x_axis_label='time (hours)',
                          y_axis_label='sec. of activity / 10 min.')

# Specify colors
colors = ['dodgerblue', 'tomato', 'indigo']

# Populate glyphs
for i, fish in enumerate([11, 12, 23]):
    source = bokeh.models.ColumnDataSource(df_fish[df_fish['fish']==fish])
    p.line(source=source, x='zeit', y='activity', line_width=0.5, alpha=0.75,
           color=colors[i], line_join='round')

# Determine when nights start and end
lefts, rights = nights(df_fish[df_fish.fish==1])

# Make shaded boxes for nights
night_boxes = []
for i, left in enumerate(lefts):
    night_boxes.append(
            bokeh.models.BoxAnnotation(plot=p, left=left, right=rights[i], 
                                       fill_alpha=0.6, fill_color='gray'))
p.renderers.extend(night_boxes)

bokeh.io.show(p)

Notice how we used the kwarg `line_join='round'`.  By default, `bokeh.charts.Line()` joins line segments with mitered corners, giving the sharp points, some of which dip below zero, that you saw before.  I prefer `line_join='round'`, which does not have this problem.

#### Specifying tools
Using the `bokeh.plotting` interface, we can also specify which tools we want available.  For example, we can add a `HoverTool` that will give information about each data point if we hover the mouse over it.

In [17]:
# Eliminate spaces from column headings to allow tooltip to work
df_frog = df_frog.rename(columns={'impact force (mN)': 'impf',
                                  'adhesive force (mN)': 'adhf'})

# Specify data source
source = bokeh.models.ColumnDataSource(df_frog)

# What pops up on hover?
tooltips = [('frog', '@ID'),
           ('imp', '@impf'),
           ('adh', '@adhf')]

# Make the hover tool
hover = bokeh.models.HoverTool(tooltips=tooltips)

# Create figure
p = bokeh.plotting.figure(plot_width=650, plot_height=450, 
                          x_axis_label='Impact force (mN)',
                          y_axis_label='Adhesive force (mN)')

# Add the hover tool
p.add_tools(hover)

# Populate glyphs
p.circle(x='impf', y='adhf', size=7, alpha=0.5, source=source)

bokeh.io.show(p)

#### Linking subplots
Bokeh also has the wonderful capability of linking subplots.  The key here is to specify that the plots have the same ranges of the $x$ and $y$ variables. To do this, we just have to specify the `x_range` and `y_range` properties of plots to be the same.

In [26]:
# Determine when nights start and end
lefts, rights = nights(df_fish[df_fish.fish==1])

# Specify colors
colors = ['dodgerblue', 'tomato', 'indigo']

# Create figures
ps = [bokeh.plotting.figure(plot_width=650, plot_height=250, 
                            x_axis_label='time (h)',
                            y_axis_label = 'sec of activity / 10 min',
                            tools='pan,box_zoom,wheel_zoom') 
          for i in range(3)] 

# Link ranges (enable linked panning/zooming)
for i in (1, 2):
    ps[i].x_range = ps[0].x_range
    ps[i].y_range = ps[0].y_range
        
# Populate glyphs
for i, fish in enumerate([11, 12, 23]):
    # Put in line
    x = df_fish.loc[df_fish.fish==fish, 'zeit']
    y = df_fish.loc[df_fish.fish==fish, 'activity']
    ps[i].line(x=x, y=y, line_width=1, color=colors[i])
    
    # Label with title
    ps[i].title.text = 'Fish ' + str(fish)
        
    # Make shaded boxes for nights
    night_boxes = []
    for j, left in enumerate(lefts):
        night_boxes.append(
                bokeh.models.BoxAnnotation(plot=ps[i], left=left, right=rights[j], 
                                           fill_alpha=0.3, fill_color='gray'))
    ps[i].renderers.extend(night_boxes)
        
my_plot = bokeh.models.layouts.Column(*tuple(ps))
        
bokeh.io.show(my_plot)

## Visualizing large data sets with Datashader