# The Bokeh Library

## Introduction

Analyzing data is an integral part of data science, but an even more important aspect is effectively conveying your findings to others! To that end, static graphical visualizations like bar charts, line graphs, and scatter plots are a very common means of conveying observations. Interactive graphical visualizations add further dimensions for explaining findings: they can put a static visualization in context, and they let the explanation of the data be personalized for each audience.

For example, consider the all too familiar scenario in which a student is choosing between two classes for the following semester, class A and class B, where each class has two possible sections. The student's options are class A with professor X, class A with professor Y, class B with professor Z, or class B with professor W. That leaves the student with several combinations of classes and professors to compare before making a final decision. Useful individual feedback may be available for each class or professor, like so:
<img src="https://coedmagazine.files.wordpress.com/2014/10/ratemyprofessor-top-rankings-h.jpg" width="250">
However, it would be more helpful for the student to compare classes or professors side by side. To make such a visualization useful to all students, each student should be able to choose which options to compare, which is a form of personalized interactive visualization.

Here is another example of the usefulness of interactive visualization. Consider the following graph: 
<img src="http://minimaxir.com/img/online-class-charts/class-attendance.png" width="450">
Although this graph contains plenty of useful information about student attendance, it would be helpful to see how these attendance statistics have changed over the years in order to investigate what may be driving them. An interactive graph makes this possible.

## Table of Contents

1. [Installation](#installation)
2. [The Plot](#the-plot)
3. [Glyphs](#glyphs)
4. [Properties](#properties)
5. [Basic Interactivity](#interactivity)
6. [Bokeh Server](#server)
7. [Example Application: Chocolate Bar Ratings](#app1)
8. [Example Application: Zomato Restaurants](#app2)
9. [Summary](#summary)
10. [Resources](#resources)

## Installation
<a id='installation'></a>

Before we get started with interactive graphics, we need to install the Bokeh library. To install Bokeh and its dependencies using Anaconda, run the following command in a terminal: ```conda install bokeh```

If you do not use Anaconda, the Bokeh library and its dependencies can also be installed using the Python package manager (pip) via: ```pip install bokeh```
 
To ensure that your installation process completed successfully, make sure that the following import runs without errors:

In [6]:
from bokeh.plotting import figure, output_notebook, show
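
Beyond checking that the import succeeds, it can help to confirm which Bokeh release is installed, since some keyword arguments have changed between releases. The snippet below is a small sketch of that check; the version string it prints depends entirely on your environment.

```python
import bokeh

# Print the installed Bokeh version (the exact value depends on your setup).
print(bokeh.__version__)
```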

## The Plot
<a id='the-plot'></a>

Now that the library is set up, we can start making some graphs. The first step in creating a Bokeh graph is specifying where the graph should be output. To output the graph to a file, import ```output_file``` from ```bokeh.plotting```; this tutorial displays its graphs within the notebook and thus uses ```output_notebook```. Next comes the most important part of a Bokeh graph, the "plot" or figure, which is essentially a container for all the components that make up a graph. The plot container can be initialized using the ```figure``` constructor as shown below:

In [7]:
output_notebook()
p = figure()
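
For the file output option mentioned above, a minimal sketch might look like the following. The file name, the temporary directory, and the sample data are assumptions made for illustration; ```save``` is used here instead of ```show``` so that the file is written without opening a browser tab.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# Hypothetical output location inside the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "bokeh_file_output_example.html")

# Direct output to an HTML file instead of the notebook.
output_file(out_path, title="File Output Example")

p_file = figure(title="File Output Example", x_axis_label="x", y_axis_label="y")
p_file.line([1, 2, 3, 4], [1, 4, 9, 16])

# Write the standalone HTML file to out_path.
save(p_file)
```

The resulting HTML file can be opened in any browser and shared without a running Python session.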

The title and axes labels are often the first components of a graph that the audience looks to for guidance in understanding the contents of the graph. The title, axes labels, and other guiding text can be directly applied on the plot container during its initialization like so:

In [13]:
p = figure(
    title="Title Placeholder",
    x_axis_label="X Axis Label Placeholder",
    y_axis_label="Y Axis Label Placeholder"
)
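
The same text can also be set after the figure has been created, which is handy when the labels depend on data loaded later. This is a small sketch of that alternative using the figure's ```title```, ```xaxis```, and ```yaxis``` attributes; the variable name ```p_labels``` is just for illustration.

```python
from bokeh.plotting import figure

# Create an empty plot container first.
p_labels = figure()

# Then fill in the guiding text on the existing figure.
p_labels.title.text = "Title Placeholder"
p_labels.xaxis.axis_label = "X Axis Label Placeholder"
p_labels.yaxis.axis_label = "Y Axis Label Placeholder"
```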

Although this properly sets the title and axes labels, the Bokeh library doesn't render the plot properly since it doesn't contain any data to show so far. So, let's take a look at how to add in the data component of the graph into the plot.

## Glyphs
<a id='glyphs'></a>

To insert data into the plot, Bokeh uses glyphs, the basic visual components of a graph. There are many kinds of glyphs`**` that determine how the data is represented within the graph, but we will cover some of the most commonly used ones here: line, circle, and vbar/hbar. For all of these glyphs, the x and/or y coordinate data should be specified as list(s) of values as shown below. After adding the glyphs with the appropriate data, the plot can be viewed using the ```show``` function as demonstrated below.

#### The Line Glyph

This glyph works like a line graph in that every consecutive pair of (x,y) coordinates is connected by a line. For example, take a look at the ```x^2``` function plotted below:

In [22]:
# Generate random data
import numpy as np
x = np.random.randint(low=1, high=1000, size=100)
x.sort()
y = [i**2 for i in x]

# Line Glyph
output_notebook()
p_line = figure(
    title="Line Glyph Example",
    x_axis_label="Random Sorted X",
    y_axis_label="X^2"
)
p_line.line(x,y)
show(p_line)

#### The Circle Glyph

This glyph allows creation of scatter plots with the data specified by (x,y) coordinates marked by circles. For example, see the random data scatter plot shown below:

In [23]:
# Generate random data
x = np.random.randint(low=1, high=1000, size=100)
y = np.random.randint(low=1, high=1000, size=100)

# Circle Glyph
output_notebook()
p_circle = figure(
    title="Circle Glyph Example",
    x_axis_label="Random X",
    y_axis_label="Random Y"
)
p_circle.circle(x,y)
show(p_circle)

#### The Vertical Bar (vbar) and Horizontal Bar (hbar) Glyphs

As the names imply, the vertical bar glyph is used to create a vertical bar chart with the data values determined by the x-coordinates and ```top``` coordinates specifying the height of the bars. Similarly, the horizontal bar glyph is used to create a horizontal bar chart with the data values given by the y-coordinates and ```right``` coordinates specifying the length of the bars. Both vbar and hbar also require a specification of the width of each bar using the ```width``` component for vbar and the ```height``` component for hbar. For example, see the bar charts created below.

In [29]:
# Generate random data
x = [i+1 for i in range(20)]
y = [i*2 for i in x]
top = np.random.randint(low=1, high=1000, size=20)
right = np.random.randint(low=1, high=1000, size=20)

# Vertical Bar (vbar) Glyph
output_notebook()
p_vbar = figure(
    title="VBar Glyph Example",
    x_axis_label="X",
    y_axis_label="Random Height"
)
p_vbar.vbar(x, top=top, width=0.5)
show(p_vbar)

# Horizontal Bar (hbar) Glyph
output_notebook()
p_hbar = figure(
    title="HBar Glyph Example",
    x_axis_label="Random Length",
    y_axis_label="Y"
)
p_hbar.hbar(y, right=right, height=0.5)
show(p_hbar)

Bokeh also makes it possible to overlay multiple glyphs in the same plot! This is useful when plotting multiple data series together, or even just for marking the individual data points on an existing line graph. When combining glyphs, it is also important to have a legend specifying what each glyph represents. Bokeh makes this simple: each glyph takes a legend parameter that associates it with a legend label, and the labels are automatically compiled into a legend for the plot.

In [37]:
# Generate multiple data sets for layering
x = np.random.randint(low=1, high=1000, size=100)
x.sort()
y1 = [i for i in x]
y2 = [4*i + 3 for i in x]
y3 = [(-2*i) + 4 for i in x]
y_bar = [i+1 for i in range(20)]
x_bar1 = [i*2 for i in y_bar]
x_bar2 = [4*i - 3 for i in y_bar]

output_notebook()
p_layered = figure(
    title="Combination of Line & Circle Glyphs",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
p_layered.line(x, y1, legend="y=x")
p_layered.line(x, y2, legend="y=4x+3")
p_layered.circle(x, y2, legend="y=4x+3")
p_layered.circle(x, y3, legend="y=-2x+4")
show(p_layered)

output_notebook()
p_multibar = figure(
    title="Combination of Bar Glyphs",
    x_axis_label="Bar Length",
    y_axis_label="Y Coordinate"
)
p_multibar.hbar(y_bar, right=x_bar1, legend="x = 2y", height = 0.5)
p_multibar.hbar(y_bar, right=[-1*i for i in x_bar2], legend="x = -(4y-3)", height = 0.5)
show(p_multibar)
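
Once a legend exists, it can be adjusted through the plot's ```legend``` attribute. The sketch below moves the legend and lets readers hide a glyph by clicking its legend entry. It assumes a reasonably recent Bokeh release, where the per-glyph keyword is ```legend_label``` rather than the ```legend``` keyword used in the cells above; the data and the variable name ```p_legend``` are made up for illustration.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]

p_legend = figure(
    title="Legend Placement Sketch",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
# Each distinct label becomes one legend entry.
p_legend.line(x, [i for i in x], legend_label="y=x")
p_legend.line(x, [4 * i + 3 for i in x], legend_label="y=4x+3")

# Move the legend out of the way of the data.
p_legend.legend.location = "top_left"
# Clicking a legend entry hides or shows the matching glyph.
p_legend.legend.click_policy = "hide"
```

Calling ```show(p_legend)``` displays the plot with the legend in the top-left corner.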

`**`If you are interested in the other kinds of possible glyphs, feel free to explore the Bokeh documentation given in the additional resources.

## Properties
<a id='properties'></a>

When looking at a complex data set, it is often helpful to know the underlying groupings or other categorical features of the data being observed. Maybe a particular feature of the data points makes them clustered together. Maybe the outliers in the data come from a particular source. Thus, to convey this additional information in a plot, we can use dimensions such as color of the glyph, size of the data markers, shape of the data markers, and solid vs. dashed lines. In addition to conveying more information, these customizations can also make it easier to differentiate glyphs that are combined into the same plot.

To set the color of a particular glyph, one option is to apply a single color to the entire glyph, which makes it easy to differentiate multiple overlaid glyphs. This can be done by specifying a color parameter like so:

In [43]:
# Glyph color - bars
y_bar = [i+1 for i in range(20)]
x_bar1 = [i*2 for i in y_bar]
x_bar2 = [4*i - 3 for i in y_bar]
output_notebook()
p_multibar = figure(
    title="Combination of Bar Glyphs",
    x_axis_label="Bar Length",
    y_axis_label="Y Coordinate"
)
p_multibar.hbar(y_bar, right=x_bar1, legend="x = 2y", height = 0.5, color="orange")
p_multibar.hbar(y_bar, right=[-1*i for i in x_bar2], legend="x = -(4y-3)", height = 0.5)
show(p_multibar)

x = np.random.randint(low=1, high=1000, size=100)
x.sort()
y1 = [i for i in x]
y2 = [4*i + 3 for i in x]
y3 = [(-2*i) + 4 for i in x]
output_notebook()
p_layered = figure(
    title="Combination of Line & Circle Glyphs",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
p_layered.line(x, y1, legend="y=x", color="red")
p_layered.line(x, y2, legend="y=4x+3", color="purple")
p_layered.circle(x, y2, legend="y=4x+3", color="purple")
p_layered.circle(x, y3, legend="y=-2x+4")
show(p_layered)

Another option for applying color is to choose the color based on a categorical component of the data. To do so, we use a palette provided in Bokeh (```bokeh.palettes```) along with a factor mapper provided in Bokeh (```factor_cmap```) that maps each possible factor of the categorical data field to a color in the palette. Finally, to apply the factor mapping across the entire data set, all the data must be compiled into a Column Data Source model, which is essentially a container that maps each column name to a sequence of values, so that every data entry has one value in each column. For instance, take a look at the example provided below:

In [53]:
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6
from bokeh.transform import factor_cmap

# Generate data with categorical component
x = np.random.randint(low=1, high=1000, size=100)
y = [2*i + 3 for i in x]
category = [str(i) for i in np.random.randint(low=0, high=5, size=100)]
source = ColumnDataSource(data=dict(x=x, y=y, category=category))

# Glyph color - circles
output_notebook()
p_colmark = figure(
    title="Colored Circle Glyphs",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
p_colmark.circle('x', 'y', source=source, legend='category',
                 fill_color=factor_cmap('category', palette=Spectral6, factors=[str(i) for i in range(6)]))
show(p_colmark)
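
To make the pieces above more concrete, the following sketch inspects a small Column Data Source and reproduces by hand, in plain Python, the kind of factor-to-color lookup that a factor mapper performs: the i-th factor is paired with the i-th palette color. The tiny data set and the name ```manual_colors``` are assumptions for illustration only.

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6

# A tiny data set with one categorical column.
x = [5, 12, 40, 73]
y = [2 * i + 3 for i in x]
category = ["0", "2", "2", "4"]
source = ColumnDataSource(data=dict(x=x, y=y, category=category))

# The source stores one sequence of values per column name.
print(sorted(source.column_names))

# Pair the i-th factor with the i-th palette color, then look up each row.
factors = [str(i) for i in range(6)]
manual_colors = [Spectral6[factors.index(c)] for c in source.data["category"]]
print(manual_colors)
```

Rows that share a category end up with the same color, which is the effect seen in the plot.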

Similar to applying color, it is possible to convey ordinal information through the data marker size by mapping particular factors to sizes. Unlike color, however, size must be specified as numerical values in screen space units, so the factors have to be mapped to numbers, for example as shown below. Note that since the sizing varies within a single glyph, specifying the legend keyword will just create a legend with all the circles the same size. Since modifying the sizes of individual glyphs within a custom legend is currently not possible, it is necessary to create a manual annotation in some other way.

In [63]:
# Generate data with categorical component
x = np.random.randint(low=1, high=1000, size=100)
y = [2*i + 3 for i in x]
categories = ["Very Low", "Low", "Moderate", "High", "Very High"]
category = [categories[i] for i in np.random.randint(low=0, high=5, size=100)]

# Manipulating data marker size (rank + 1, scaled, so "Very Low" is not drawn at size 0)
source = ColumnDataSource(data=dict(x=x, y=y, size=[4 * (categories.index(c) + 1) for c in category]))
output_notebook()
p_sizmark = figure(
    title="Sized Circle Glyphs",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
p_sizmark.circle('x', 'y', source=source, size='size')
show(p_sizmark)
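
One possible workaround for the legend limitation is to add a separate glyph per category, each with a single fixed size and its own legend entry. This is only a sketch under some assumptions: it uses ```scatter``` with the ```legend_label``` keyword from recent Bokeh releases, and the sample points and the size formula are invented for illustration.

```python
from bokeh.plotting import figure

categories = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Hypothetical sample points as (x, y, category) tuples.
points = [
    (10, 23, "Very Low"), (40, 83, "Low"), (55, 113, "Moderate"),
    (70, 143, "High"), (90, 183, "Very High"), (25, 53, "Low"),
]

p_sizelegend = figure(
    title="Sized Circle Glyphs with Legend",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)

# One glyph per category, so each legend entry is drawn at its own size.
for rank, name in enumerate(categories):
    xs = [x for x, _, c in points if c == name]
    ys = [y for _, y, c in points if c == name]
    p_sizelegend.scatter(xs, ys, size=4 * (rank + 1), legend_label=name)
```

The trade-off is that the data is split across several glyphs instead of living in one Column Data Source.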

Unlike the coloring of glyphs, data marker shapes are specified by which glyph is added to the plot. Thus, the shape can only be used to differentiate multiple glyphs plotted in the same plot container and cannot be changed dynamically using categorical data. There are many possible glyph shapes including: circle, asterisk, cross, diamond, oval, square, and triangle.

In [70]:
# Generate data with categorical feature
x = np.random.randint(low=1, high=1000, size=100)
x.sort()
y1 = [2*i + 3 for i in x]
y2 = [2*i + np.random.randint(low=-10, high=15) for i in x]
category = [str(i) for i in np.random.randint(low=0, high=5, size=100)]
source2 = ColumnDataSource(data=dict(x=x, y=y2, category=category))

# Multiple glyphs marker shape
output_notebook()
p_multishape = figure(
    title="Multiple Shaped Glyphs",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
p_multishape.diamond(x, y1, legend="y1")
p_multishape.cross(x, y2, legend="y2")
show(p_multishape)

Another commonly used way to convey categorical information within a plot is the solidness vs. dashedness and the thickness of the line glyph when making line graphs. Since these properties are set for the entire line and not for a particular data point, they can only be used to differentiate multiple glyphs on the same plot, as shown below. When setting the ```line_dash``` property, we use a space-separated string of two numbers like "i j" to specify a pattern of an i-length solid segment followed by a j-length gap.

In [75]:
# Solid vs. dashed lines
x = np.random.randint(low=1, high=1000, size=100)
x.sort()
y1 = [i for i in x]
y2 = [4*i + 3 for i in x]
y3 = [(-2*i) + 4 for i in x]
output_notebook()
p_multiline = figure(
    title="Dashedness of Line Glyphs",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
p_multiline.line(x, y1, legend="y=x", color="red", line_width=2)
p_multiline.line(x, y2, legend="y=4x+3", color="purple")
p_multiline.circle(x, y2, legend="y=4x+3", color="purple")
p_multiline.line(x, y3, legend="y=-2x+4", line_dash="3 1")
show(p_multiline)
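
Besides the "i j" string form, ```line_dash``` also accepts named patterns such as "solid" and "dashed", as well as a list of integers giving alternating dash and gap lengths. The sketch below shows the three forms side by side; the data, labels, and the variable name ```p_dash``` are illustrative, and ```legend_label``` assumes a recent Bokeh release.

```python
from bokeh.plotting import figure

x = list(range(10))

p_dash = figure(
    title="Line Dash Patterns",
    x_axis_label="X Coordinate",
    y_axis_label="Y Coordinate"
)
# Named pattern: a plain solid line.
p_dash.line(x, [i for i in x], legend_label="solid", line_dash="solid")
# Named pattern: evenly spaced dashes.
p_dash.line(x, [i + 2 for i in x], legend_label="dashed", line_dash="dashed")
# Explicit pattern: long dash, gap, short dash, gap.
p_dash.line(x, [i + 4 for i in x], legend_label="8 2 2 2", line_dash=[8, 2, 2, 2], line_width=2)
```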

## Basic Interactivity
<a id='interactivity'></a>

Now that we know how to create basic, static graphs using the Bokeh library, let's see how to add interactive effects to these graphs to help convey more useful information within the same graph. Although Bokeh provides lots of interactive components that can be used in various different settings, this tutorial will cover some of the basic and more commonly used graph interactions.

##### Select/Zoom Area within Graph

As the name implies, this interactive feature allows us to zoom in on a particular range of the data or select the data points in a particular range of the plot. Selecting/zooming on a particular range is a useful interactive feature because it helps direct the audience's attention to a particular area of the graph. 

As you may have noticed in the plots we created above, the 'Box Zoom' tool is already included by default in the toolbar at the right-hand side! Thus you can zoom in on a particular region by selecting this tool and then drawing a box around that region. Once you are zoomed in, it is often helpful to pan the view in order to see other parts of the data or to go back to the original view. For panning, the default toolbar also provides a 'Pan' tool: select it and then simply click and drag on the plot area. To go back to the original view of the plot, the default toolbar provides a 'Reset' tool which resets the plot view when clicked!

Although the tools for selecting a particular area of the plot are not included by default in the toolbar, these can be easily added to the plot toolbar as demonstrated below. There are two possible tools for selection: the box selection tool and the lasso selection tool. As the names imply, the lasso selection allows selecting an arbitrary region while the box selection is restricted to a rectangular area. A really helpful feature of the selection tools is that you can make multiple selections using the SHIFT key and clear selection with the ESC key! Try it out on the sample plot below:

In [79]:
# Generate random data
x = np.random.randint(low=1, high=1000, size=100)
y = np.random.randint(low=1, high=1000, size=100)

# Region Select
output_notebook()
p_select = figure(
    title="Graph Area Select/Zoom Example",
    x_axis_label="Random X",
    y_axis_label="Random Y",
    tools="pan,box_select,lasso_select,box_zoom,reset"
)
p_select.circle(x,y)
show(p_select)
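
Tools can also be attached to a figure after it has been created, using tool objects instead of the comma-separated string. The following sketch adds the two selection tools to a plot that starts with only pan, box zoom, and reset; the sample data and variable names are assumptions for illustration.

```python
from bokeh.models import BoxSelectTool, LassoSelectTool
from bokeh.plotting import figure

p_tools = figure(
    title="Adding Selection Tools Later",
    x_axis_label="X",
    y_axis_label="Y",
    tools="pan,box_zoom,reset"
)
p_tools.line([1, 2, 3, 4], [3, 1, 4, 2])

# Append the selection tools to the existing toolbar.
p_tools.add_tools(BoxSelectTool(), LassoSelectTool())

# List the tool types now present on the toolbar.
tool_names = [type(t).__name__ for t in p_tools.toolbar.tools]
print(tool_names)
```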

Bokeh also makes it possible to simultaneously apply the selection and zooming tools to multiple linked graphs. Why might this be useful? Consider the case of plotting some stock transaction data. You may have a plot for the number of stocks owned by the person on any given day and you may also have the cost of the stock on those days. However since these data features have different units, they cannot be effectively combined into the same plot. But at the same time when we're focusing on a particular region in the number of stocks plot, we would like to focus on the same region in the stock prices graph. 

The process of linking plots for zooming and selecting involves linking both the ranges of the plots (for zooming) and the underlying data via a Column Data Source (for selection). For example with the stock scenario explained earlier, we would want the day data range to be linked between the graphs as shown below with randomly generated data:

In [90]:
from bokeh.layouts import gridplot

# Generate data
day = [i for i in range(100)]
num_stocks = np.random.randint(low=1, high=50, size=100)
price = np.random.randint(low=1, high=500, size=100)
source = ColumnDataSource(dict(day=day,num_stocks=num_stocks,price=price))

# Linked plots
output_notebook()
p_link1 = figure(
    title="Number of Stocks Owned Example",
    x_axis_label="Day",
    y_axis_label="Number of Stocks",
    tools="pan,box_select,lasso_select,box_zoom,reset",
    width=450, 
    height=450
)
p_link1.circle('day','num_stocks',source=source)

p_link2 = figure(
    title="Price of Stocks Example",
    x_axis_label="Day",
    y_axis_label="Price of Stocks",
    tools="pan,box_select,lasso_select,box_zoom,reset",
    x_range=p_link1.x_range,
    width=450, 
    height=450
)
p_link2.circle('day','price',source=source)
show(gridplot([[p_link1,p_link2]]))
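
The linking above comes down to object sharing: both plots hold the same range object and the same data source. The condensed sketch below makes that explicit and also sets a selection from Python, which then applies to every glyph drawn from the shared source. The data values are made up, and ```scatter``` is used here as a stand-in for the circle glyph.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One shared source for both plots (hypothetical values).
shared = ColumnDataSource(dict(
    day=[0, 1, 2, 3],
    num_stocks=[5, 7, 6, 9],
    price=[100, 120, 90, 110],
))
tools = "pan,box_select,box_zoom,reset"

p_a = figure(title="Stocks Owned", tools=tools)
p_a.scatter("day", "num_stocks", source=shared)

# Reusing p_a.x_range links panning and zooming along the day axis.
p_b = figure(title="Price", tools=tools, x_range=p_a.x_range)
p_b.scatter("day", "price", source=shared)

grid = gridplot([[p_a, p_b]])

# Selecting rows 1 and 2 on the source marks them in both plots.
shared.selected.indices = [1, 2]
```

Only the x range is shared here, so the two y axes still zoom independently.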

##### Hover Effects

Another very useful tool for conveying information, and one that is relatively easy to add to the Bokeh toolbar, is the hover tool. Once selected, the hover tool displays additional, user-specified information about each data point as the mouse hovers over it. The actual text displayed by the hover tool can be customized as illustrated below. Within the hover tool specification, the '$' character can be used to refer to internal values of the plot like the x-coordinates and y-coordinates, while the '@' character can be used to refer to a column within a Column Data Source.

In [82]:
from bokeh.models import HoverTool

# Generate random data
x = np.random.randint(low=1, high=1000, size=100)
y = np.random.randint(low=1, high=1000, size=100)
categories = ['apple', 'banana', 'pear', 'guava', 'peach', 'mango']
category = [categories[i] for i in np.random.randint(low=0, high=len(categories), size=100)]
source = ColumnDataSource(dict(x=x,y=y,category=category))

# Hover Tool
output_notebook()
hover_text = HoverTool(tooltips=[
    ("X", "@x"),
    ("Y", "@y"),
    ("Category", "@category")
])
p_hover = figure(
    title="Hover Text Example",
    x_axis_label="Random X",
    y_axis_label="Random Y",
    tools=[hover_text,"pan","box_zoom","reset"]
)
p_hover.circle('x','y',source=source)
show(p_hover)
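
To illustrate the difference between the two prefixes, the sketch below mixes '$' fields (the hovered point's index and the cursor position in data coordinates) with '@' fields (columns of the source), including a number format in curly braces. The data and variable names are made up for illustration.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

fruit = ColumnDataSource(dict(
    x=[1, 2, 3],
    y=[10.5, 20.25, 15.125],
    category=["apple", "pear", "mango"],
))

hover_mixed = HoverTool(tooltips=[
    ("Index", "$index"),             # position of the point in the source
    ("Cursor (x, y)", "($x, $y)"),   # mouse position in data coordinates
    ("Y, two decimals", "@y{0.00}"), # column value with a number format
    ("Category", "@category"),       # plain column value
])

p_hover_mixed = figure(
    title="Hover Fields Sketch",
    x_axis_label="X",
    y_axis_label="Y",
    tools=[hover_mixed, "pan", "reset"]
)
p_hover_mixed.scatter("x", "y", source=fruit, size=10)
```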

## Bokeh Server
<a id='server'></a>

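The following interactive features let a control outside the plot change what the plot renders, which in Bokeh is typically handled with a client and a server side.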

##### Sliding Bars

This feature is an integral component of graph interactivity because it allows the audience to see the same graph across different values of a particular dimension. For example, consider looking at the poverty rate of countries relative to population size across years. A sliding bar would let the audience step through the years manually and observe how the relation between poverty rate and population changes. Since this feature involves an interactive component (the sliding bar) that affects the rendering of the plot, it is often implemented in Bokeh with a client and a server side, but it is also possible to create it using the Jupyter notebook's ```interact``` function in combination with Bokeh's ```push_notebook``` function. For example, using randomly generated data, we will display the data associated with one particular year at a time. This involves filtering the data behind the plot, which the example below does by swapping in a new Column Data Source that holds only the selected year's rows (Bokeh's column data source view, CDSView, is another way to filter a source):

In [117]:
from bokeh.io import push_notebook, show
from ipywidgets import interact
import pandas as pd

# Generate Data
x = np.random.randint(low=1, high=1000, size=500)
y = np.random.randint(low=1, high=100, size=500) 
years = [1900+(5*i) for i in range(10)]
data = pd.DataFrame(dict(x=x,y=y,year=[years[np.random.randint(low=0, high=len(years))] for i in range(500)]))
source = ColumnDataSource(data[data['year'] == 1900])

# Plot with Sliding Bar
output_notebook()
p_slide = figure(
    title="Sliding Bar Example",
    x_axis_label="Random X",
    y_axis_label="Random Y"
)
circles = p_slide.circle('x', 'y', source=source)

def update(year=1900):
    circles.data_source = ColumnDataSource(data[data['year'] == year])
    push_notebook()

show(p_slide, notebook_handle=True)
interact(update, year=(1900, 1945, 5))
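
An alternative to replacing the whole data source is to keep one source and assign new columns to its ```data``` attribute on every slider move. The sketch below shows just that update step without the notebook widgets; the helper ```rows_for_year``` and the small data set are hypothetical, and in a notebook the ```update``` function would still be followed by ```push_notebook()```.

```python
from bokeh.models import ColumnDataSource

# Hypothetical full data set, stored column by column.
all_rows = {
    "x": [10, 20, 30, 40, 50],
    "y": [1, 4, 2, 8, 5],
    "year": [1900, 1905, 1900, 1905, 1910],
}

def rows_for_year(year):
    """Return only the rows of all_rows whose year matches."""
    keep = [i for i, value in enumerate(all_rows["year"]) if value == year]
    return {name: [column[i] for i in keep] for name, column in all_rows.items()}

# The glyph would be drawn from this single source.
year_source = ColumnDataSource(rows_for_year(1900))

def update(year=1900):
    # Swap the columns in place; glyphs using year_source follow along.
    year_source.data = rows_for_year(year)

update(1905)
print(dict(year_source.data))
```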

##### Checkboxes

##### Numerical Input Fields

## Example Application: Chocolate Bar Ratings
<a id='app1'></a>

"Change over time, specify cocoa percentage range"

In [4]:
import pandas

## Example Application: Zomato Restaurants
<a id='app2'></a>

"Change over price range, specify average cost for two range, Checkbox for particular cuisines/currency/etc., Side-by-side comparison, geo-data"

## Summary
<a id='summary'></a>

As you have seen, this tutorial provides an overview of the basic functionalities of the Bokeh library in Python for interactive graphical visualization as well as some real-world applications of the Bokeh library. For more detailed information regarding the Bokeh library, the datasets used in this tutorial, or other interactive graph libraries to explore, please see the links provided in the resources below.

## Resources
<a id='resources'></a>

* Bokeh: [https://bokeh.pydata.org/en/latest/docs/user_guide.html](https://bokeh.pydata.org/en/latest/docs/user_guide.html)
* Chocolate Bar Ratings Data: [https://www.kaggle.com/rtatman/chocolate-bar-ratings/](https://www.kaggle.com/rtatman/chocolate-bar-ratings/)
* Zomato Restaurants: [https://www.kaggle.com/shrutimehta/zomato-restaurants-data/](https://www.kaggle.com/shrutimehta/zomato-restaurants-data/)
* mpld3: [http://mpld3.github.io/](http://mpld3.github.io/)
* pygal: [http://pygal.org/](http://pygal.org/)
* Plotly: [https://plot.ly/python/](https://plot.ly/python/)