## Intro to Bokeh

Bokeh is another plotting library for Python. Unlike `matplotlib` or `seaborn`, which produce static images, `bokeh` generates JavaScript that you can display in a notebook or serve with Flask. Because the result isn't just a static image, the end user can still interact with it.

### Pros of Bokeh over D3:
- Allows us to quickly make standard charts in Python, without having to write JavaScript.
- Has sensible 'defaults', so we don't have to manually specify everything when making a chart.

### Pros of D3 over Bokeh:
- Much more customizable. Can also be used for interactive graphics and "scrollytelling". See examples of D3 in [this visualization of decision trees](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/), [this article about p-hacking](https://fivethirtyeight.com/features/science-isnt-broken/#part1), the ["paths to victory"](https://www.nytimes.com/elections/2012/results/president/scenarios.html), or even the smiley face in the movie sentiment application.

**Pick standard visualizations over custom ones, unless the custom ones truly add value**

For each custom visualization, the user will have to spend time deciphering what you are communicating. You should make sure that custom visualizations communicate enough that they warrant the extra 'processing power' required of the user. For example, I am not a fan of the [NYT graphs on how the recession changed the economy](https://www.nytimes.com/interactive/2014/06/05/upshot/how-the-recession-reshaped-the-economy-in-255-charts.html).

In [None]:
# imports
from bokeh.plotting import figure, show
from bokeh.embed import components
from bokeh.models import CategoricalColorMapper, HoverTool

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

In [None]:
from bokeh.io import output_notebook
output_notebook()

## 1. Making a simple plot

Before we start using dataframes, let's create a very simple scatter plot using Bokeh to understand the basic syntax. The steps are:

1. Create a plot object `simpleScatterPlot` using `figure`
2. Add glyphs to that plot object (e.g. `simpleScatterPlot.<glyph>`, where `<glyph>` is `circle`, `square`, `line`, etc.)
3. Edit the axis labels
4. Show the plot with `show(simpleScatterPlot)`

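The `show` command displays the plot. Because we called `output_notebook()` above, it renders inline in the notebook; without that call, it opens a new browser tab with the plot in it. Note that you can zoom the graph, as well as pan the viewport.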

In [None]:
# simple plot
simpleScatterPlot = figure(plot_width=600, plot_height=600)

x = [1,2,3,4,5,6]
y = [5,4,3,2,1,0]

simpleScatterPlot.circle(x, y, size=10, color='firebrick', alpha=0.7)

simpleScatterPlot.xaxis.axis_label='x'
simpleScatterPlot.yaxis.axis_label='y'

# generate above in an html file, make sure pop-up blockers aren't hindering the render
show(simpleScatterPlot)

We can also see that glyphs are additive: we can add multiple glyphs to the same graph.

In [None]:
# new simple plot
simpleScatterPlot = figure(plot_width=600, plot_height=600)

simpleScatterPlot.circle(x, y, size=10, color='firebrick', alpha=0.7)
simpleScatterPlot.square(x, y, size=6, color='yellow', alpha=0.7)
simpleScatterPlot.circle(x, y, size=3, color='black', alpha=0.7)
simpleScatterPlot.line(x, y, color='purple', alpha=0.7)


simpleScatterPlot.xaxis.axis_label='x'
simpleScatterPlot.yaxis.axis_label='y'

# generate above in an html file, make sure pop-up blockers aren't hindering the render
show(simpleScatterPlot)

## 2. Making a scatter plot from the iris set

The iris dataset needs some unpacking (i.e. we have to extract the columns from the numpy array before we plot). The work can be done on the dataframe itself, or within Bokeh with raw data. Of course, if we used our local `iris.csv` file, we would have clean headers and labels, but doing the unpacking ourselves shows how we can tailor our graphs.
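
As a rough sketch of what that unpacking looks like, the snippet below slices columns out of a small 2-D array shaped like `iris_data['data']`. The names `toy_data` and `toy_target` and all of their values are made up for illustration; they are not real iris measurements.

```python
import numpy as np

# Toy stand-in for iris_data['data']: rows are flowers, columns are features.
# These numbers are illustrative only, not real iris measurements.
toy_data = np.array([
    [5.0, 3.5, 1.5, 0.2],
    [6.0, 3.0, 4.5, 1.5],
    [6.5, 3.0, 5.5, 2.0],
    [5.1, 3.4, 1.4, 0.3],
])
toy_target = np.array([0, 1, 2, 0])  # hypothetical species codes

# Column 2 plays the role of 'petal length (cm)', column 3 'petal width (cm)'
petal_length = toy_data[:, 2]
petal_width = toy_data[:, 3]

# A boolean mask keeps only the rows that belong to one species
petal_length_species0 = toy_data[toy_target == 0, 2]
```

The same boolean-mask indexing is what lets us pull out one species at a time when we build histograms later on.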

The basic plotting syntax is:

```python
plotName.markerGlyph( X, Y, ... formatting options)
```

We already used this syntax for the simple scatter plot in Section 1; now we apply it to the iris dataset.

### First do our standard loading and converting into a dataframe

In [None]:
iris_data = load_iris()

In [None]:
iris_df = pd.DataFrame(iris_data['data'], columns=iris_data['feature_names'])
iris_df['target'] = iris_data['target']
iris_df['target_names'] = iris_df['target'].apply(lambda x: iris_data['target_names'][x] )
iris_df.head()

### 2.1 Make a basic scatter plot

In [None]:
# now in the iris dataset:
scatterPlot = figure(plot_width=600, plot_height=600)

color_mapper = CategoricalColorMapper(factors = iris_data['target_names'], 
                                     palette=['#449D66', '#D46666', '#6688EC']) # pick our colors with hex, rgba, etc.

scatterPlot.circle('petal length (cm)', 'petal width (cm)', 
                   # legend_field builds one legend entry per value in the column
                   source=iris_df, legend_field='target_names', size=10,
                   color={
                       'field': 'target_names',
                       'transform': color_mapper
                   })

scatterPlot.xaxis.axis_label = iris_data['feature_names'][2]
scatterPlot.yaxis.axis_label = iris_data['feature_names'][3]

scatterPlot.legend.location = 'bottom_right'
scatterPlot.legend.background_fill_color = '#f9e9b6'

show(scatterPlot)

**Basic idea with Bokeh:**
1. Make a plot by calling `figure`. You can set some overall parameters here (e.g. plot size, and x and y axes)
2. On your figure, add glyphs (i.e., circles, squares, triangles, etc) that represent your data. You can do this either in "matplotlib style":
```python
# first series
figure.circle(X1, Y1, .......formatting options for series 1 .....)
# second series
figure.circle(X2, Y2, .......formatting options for series 2 .....)
```
or "seaborn style":
```python
figure.circle('col1', 'col2', source=df, ......)
```
3. Add interactive elements (not seen yet)
4. Add final touches to overall plot (e.g. legend location, setting axis titles)
5. Display to screen (seen), or make JavaScript and HTML divs (shown when we use Flask)

Some of the formatting options are a little complicated. In the example above, the color of the circle was a dictionary:
```python
scatterPlot.circle('petal length (cm)', 'petal width (cm)', 
                   source=iris_df, legend_field='target_names', size=10,
                   color={
                       'field': 'target_names',
                       'transform': color_mapper
                   })
```
The value of `'field'` is the name of the column that we are going to use to determine the color (in this case, the `target_names` column of `iris_df`). The `color_mapper` takes in the category values from that column (here, the species names listed in `factors`) and returns the color at the matching position in `palette`.
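
To make the mapping idea concrete, here is a plain-Python sketch of what a categorical color mapping amounts to. It is not Bokeh's implementation; the helper `map_color` and its fallback argument are hypothetical names used only for this illustration.

```python
# Plain-Python sketch of a categorical color mapping (not Bokeh's own code)
factors = ['setosa', 'versicolor', 'virginica']
palette = ['#449D66', '#D46666', '#6688EC']

# Pair each factor with the palette entry at the same position
factor_to_color = dict(zip(factors, palette))

def map_color(category, fallback='gray'):
    """Return the color for a category, or a fallback for unknown values."""
    return factor_to_color.get(category, fallback)

colors = [map_color(name) for name in ['setosa', 'virginica', 'unknown']]
print(colors)
```

The order of `factors` and `palette` is what ties a species to its color, so reordering one list without the other changes the coloring.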

### 2.2 Let's make a tooltip!

Tooltips show up when you hover over a data point. We can make them pretty easily; our tooltips are a list of tuples of the form `( string for title of tooltip, string for accessing information )`. The title string can be whatever we want. The accessor string takes one of two formats:
- Getting information from the graph, usually `$x` (get the x coordinate) or `$y` (get the y coordinate)
- Getting information from the dataframe passed as `source` when adding the glyph. To get the data from column `'colname'`, use `@colname`. If the column name contains spaces, wrap it in braces: `@{col name}`.

For example, with a car dataset that has `mpg`, `weight`, and `length` columns, hovering over a datapoint would display a box showing mpg, weight, and length:

```python
tooltips = [
    ('Miles per gallon (mpg)', '@mpg'),
    ('Weight (lbs)', '@weight'),
    ('Length', '@length')
]
```

In [None]:
# Note that we could add sepal information as well. Because those column
# names contain spaces, they need to be wrapped in braces, e.g.
# ('Sepal length', '@{sepal length (cm)}')

tooltips = [
    ("Petal width", "$y"),
    ("Petal length", "$x"),
    ("Flower type", "@target_names")
]

hover = HoverTool(tooltips=tooltips)
scatterPlot.add_tools(hover)

show(scatterPlot)
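
Since a tooltip list is just a list of tuples, it can be built programmatically. The sketch below is one way to do that; `make_tooltips` is a hypothetical helper (not part of Bokeh) that wraps any column name that is not a simple identifier in braces, which is the form Bokeh expects for names containing spaces.

```python
def make_tooltips(columns):
    """Build (label, accessor) tuples for a HoverTool from column names."""
    tooltips = []
    for col in columns:
        # Wrap anything that is not a simple identifier in braces,
        # e.g. 'sepal length (cm)' -> '@{sepal length (cm)}'
        accessor = f"@{col}" if col.isidentifier() else f"@{{{col}}}"
        tooltips.append((col.capitalize(), accessor))
    return tooltips

sepal_tooltips = make_tooltips(['sepal length (cm)', 'target_names'])
print(sepal_tooltips)
```

The resulting list can be passed to `HoverTool(tooltips=...)` in the same way as the hand-written list above.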

## 3. Making a line chart (using a different data set)

The iris dataset isn't great for line graphs, as the irises don't have any particular order. Time data, such as the rainfall in Chicago, is a much better example of where we would use a line chart. Let's plot the Chicago rainfall per day.

In [None]:
chicago_df = pd.read_csv('chicago-weather.csv', parse_dates=['DATE'])
chicago_df.head()

In [None]:
rain_plot = figure(plot_width = 600, plot_height=600, 
                   x_axis_label="Date", y_axis_label="mm of rain",
                   x_axis_type="datetime")

rain_plot.line('DATE', 'RAIN_mm', source=chicago_df)

rain_plot.title.text = "Chicago Rainfall (2017)"
rain_plot.title.align = "center"
rain_plot.title.text_color = "#4343FD"
rain_plot.title.text_font_size = "25px"
rain_plot.title.background_fill_color = "#ADADAD"

show(rain_plot)

## 4. Graphs on Maps

This is a fun example taken verbatim from the Bokeh documentation. Just as `sklearn` has built-in datasets, `bokeh` does as well (including a map of the US, US counties, and US airports). Let's start by downloading the sample data (not included with a default install).

In [None]:
from bokeh import sampledata

sampledata.download() # this may take up to a minute

In [None]:
# create US unemployment map

from bokeh.sampledata import us_states, us_counties, unemployment

us_states = us_states.data.copy()
us_counties = us_counties.data.copy()
unemployment = unemployment.data

del us_states["HI"]
del us_states["AK"]

state_xs = [us_states[code]["lons"] for code in us_states]
state_ys = [us_states[code]["lats"] for code in us_states]

county_xs=[us_counties[code]["lons"] for code in us_counties if us_counties[code]["state"] not in ["ak", "hi", "pr", "gu", "vi", "mp", "as"]]
county_ys=[us_counties[code]["lats"] for code in us_counties if us_counties[code]["state"] not in ["ak", "hi", "pr", "gu", "vi", "mp", "as"]]

colors = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]

county_colors = []
for county_id in us_counties:
    if us_counties[county_id]["state"] in ["ak", "hi", "pr", "gu", "vi", "mp", "as"]:
        continue
    try:
        rate = unemployment[county_id]
        idx = min(int(rate/2), 5)
        county_colors.append(colors[idx])
    except KeyError:
        county_colors.append("black")

p = figure(title="US Unemployment 2009", toolbar_location="left",
    plot_width=1100, plot_height=700)

p.patches(county_xs, county_ys, fill_color=county_colors, fill_alpha=0.7,
    line_color="white", line_width=0.5)
p.patches(state_xs, state_ys, fill_alpha=0.0,
    line_color="#884444", line_width=2)

show(p)
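
The color lookup inside the loop above buckets each unemployment rate into 2-percentage-point bins and caps the index at the last (darkest) color. Pulled out on its own, the logic looks roughly like this; `rate_to_color` and `bin_width` are hypothetical names introduced only for this sketch.

```python
colors = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]

def rate_to_color(rate, palette=colors, bin_width=2):
    """Map an unemployment rate (in percent) to a palette color."""
    # int() truncates, so rates in [0, 2) give index 0, [2, 4) give 1, ...
    # min() caps the index so very high rates reuse the last color
    idx = min(int(rate / bin_width), len(palette) - 1)
    return palette[idx]

for rate in (0.5, 3.9, 10.0, 25.0):
    print(rate, rate_to_color(rate))
```

Any rate of 10% or more ends up in the darkest color, which is worth keeping in mind when reading the map.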

## 5. Make a histogram from the iris dataset

Making a histogram with Bokeh is more complicated than the other graphs. Bokeh doesn't have binning logic built into it to make sensible divisions, but it can draw rectangles given arrays for the values of the `top`, `bottom`, `left` and `right` sides of the rectangles. 

We will use numpy's `np.histogram` command to generate the locations of the rectangle sides, and then build the rectangles with the `quad` (short for quadrilateral) glyph.
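
Before applying this to the iris data, here is a small sketch of what `np.histogram` returns, using a made-up array called `values`. The point to notice is that there is one more edge than there are bins, which is why the left and right sides of the rectangles come from slicing the same `edges` array.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3])  # made-up data for illustration
counts, edges = np.histogram(values, bins=3)

# One more edge than bins: consecutive edges bound each bin
left, right = edges[:-1], edges[1:]

print(counts)                   # number of values falling in each bin
print(len(edges), len(counts))  # edges has one extra entry
```

Here `counts` gives the `top` of each rectangle, while `left` and `right` give its sides.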

In [None]:
# Create the figure with only the save tool showing
fig = figure(title='Plot of iris data', 
             tools="save", background_fill_color="#E8DDCB")

# Process the iris data by species, and assign colors
fill_color = ['#036564', '#650364', '#656503']
line_color = ['#033649', '#033649', '#033649']

for index, species in enumerate(iris_data['target_names']):
    # select the petal length column for all petals of this species
    petal_length = iris_data['data'][iris_data['target'] == index, 2]
    my_hist, my_edges = np.histogram(petal_length, bins = 20)
    fig.quad(top=my_hist, bottom=0, left=my_edges[:-1], right=my_edges[1:],
             fill_color=fill_color[index], line_color=line_color[index],
             legend_label=iris_data['target_names'][index])

In [None]:
fig.xaxis.axis_label = 'Petal length (cm)'
fig.yaxis.axis_label = 'Count'

In [None]:
show(fig)

## 6. Make a histogram from generated data

The histogram is certainly a lot harder to create than the pandas `df['col_name'].hist()`!

So far, we have seen that Bokeh has the following benefits:
- the generated graph is interactive (which can be a plus for exploration, but a negative for static reporting)
- for non-histograms, it has generally been pretty intuitive to "add" different glyph types

The big use case is being able to generate the HTML needed to display the graph, so that when new data comes in, we are able to update the graph live in a webapp.

We will first show how to generate a histogram from randomly generated data (simulating new data we collected); in the next section we will show the piece we have to add to get the HTML we can use in Flask.


In [None]:
import numpy as np

measured = np.random.normal(0, 1, 1000)

hist, edges = np.histogram(measured, density=True, bins=50)

fig_random = figure(title='Plot of random data', 
                    tools="save",
                    background_fill_color="#E8DDCB")

fig_random.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
                fill_color="#036564", line_color="#033649")

In [None]:
show(fig_random)
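
Note the `density=True` argument passed to `np.histogram` above: instead of raw counts, the bar heights are scaled so that the total area of the bars is 1. A quick sketch to check this, using a seeded generator (an addition here, purely so the check is repeatable):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the sketch is repeatable
samples = rng.normal(0, 1, 1000)

density, edges = np.histogram(samples, density=True, bins=50)

# Area of each bar is height * width; the areas should sum to 1
bar_widths = np.diff(edges)
total_area = float(np.sum(density * bar_widths))
print(total_area)
```

This makes histograms built from different sample sizes directly comparable, at the cost of losing the raw counts on the y-axis.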

## Optional: Using with Flask

OK, so we have seen how to generate scatter plots, line charts, maps, and histograms with Bokeh.

We will use the `components` function from `bokeh.embed` to actually generate the JavaScript and div that we need to put on a webpage inside `app.py`. The general syntax is
```python
javascript, div = components(figure_object)
```
where the returned `javascript` and `div` are both strings that should get copied into the HTML (we will do this using templates).
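
A minimal sketch of that call outside of Flask, assuming Bokeh is installed. The figure `demo_fig` and its data are made up for illustration and are not taken from `app.py`.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# A throwaway figure, just so there is something to embed
demo_fig = figure(title="components demo")
demo_fig.line([1, 2, 3], [4, 6, 5])

# components() returns two strings: a <script> block and a placeholder <div>
javascript, div = components(demo_fig)

print(type(javascript).__name__, type(div).__name__)
print(div.strip()[:60])
```

The `div` is only an empty placeholder; the `javascript` string is what actually draws the plot into it once the page loads, so both need to end up in the HTML.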

If you look inside `app.py`, you will see that the `index` function contains the code for generating a random histogram, as well as the `javascript, div = components(fig)` line.

In [None]:
# Look at app.py!
!cat app.py

We can look inside the `templates/index.html` file to see how `javascript` and `div` are being used:

In [None]:
!cat templates/index.html

This will make a little more sense after the Flask lecture =). The basic idea is that the `{{div | safe}}` code says "take what is in the variable `div`, and paste its value here". This is similar to an f-string in Python, which uses a single pair of curly braces, i.e.

Python's `f"{div}"` is similar to `{{div | safe}}` in a Flask template. The `safe` filter tells the template engine (Jinja2, which Flask uses) not to escape the HTML and JavaScript in the string, because we trust that code (it is "safe" to insert as-is).
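
To get a feel for what would happen without `safe`, the sketch below uses `html.escape` from the standard library as a stand-in for the template engine's autoescaping (Jinja2 has its own escaping routine, so treat this as an approximation). The `div` string is a made-up example, not real Bokeh output.

```python
import html

# Hypothetical div string, similar in shape to what components() returns
div = '<div id="plot-1"></div>'

# Roughly what autoescaping does to the string when `safe` is omitted
escaped = html.escape(div)

print(escaped)
```

With the angle brackets turned into `&lt;` and `&gt;`, the browser would show the markup as literal text instead of creating the element that the plot is drawn into.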

To see the app in action, run `python app.py` from the command line in this directory.