Find the notebook here: https://github.com/DmitriyLeybel/Bokeh_Overview_Tutorial

By Dmitriy Leybel - dmleybel@gmail.com


![bokeh-eh](bokeh.jpg)
*(ha-ha)*

## Why visualization?

![xkcd](xkcd.png)


# A Bouquet of Bokeh

The Python interpreter builds the models that describe each plot. BokehJS then renders those models in the browser.

![architecture](architecture.png)

![models](https://bokeh.pydata.org/en/latest/_images/document.svg)
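
To make the "models" idea concrete, here is a minimal sketch that lists the kinds of models an empty `figure()` already contains. It assumes a reasonably recent Bokeh install, and `demo_plot` and `model_types` are names made up for this example. Each entry is an ordinary Python object that gets serialized and handed to BokehJS.

```python
from bokeh.plotting import figure

# An empty figure is already a small graph of models (axes, ranges, tools, ...)
demo_plot = figure(title="empty")

# references() walks the object graph and returns every model reachable from the plot
model_types = sorted({type(m).__name__ for m in demo_plot.references()})
print(model_types)
```

Everything done later in this notebook (adding tools, changing tick formatters, restyling glyphs) is just editing properties on objects like these.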

### **RUN CELLS SEQUENTIALLY IF YOU VALUE YOUR SANITY**

In [1]:
# whoa that's a big import
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

from bokeh.models import (ColumnDataSource, DataRange1d,
                          FactorRange, CategoricalScale, CategoricalTicker,
                          BasicTicker, FixedTicker, BoxAnnotation,
                          LinearColorMapper)
from bokeh.models.tools import HoverTool, WheelZoomTool, ResetTool
from bokeh.models.formatters import NumeralTickFormatter
from bokeh.models.widgets import Slider, Div
from bokeh.models.callbacks import CustomJS
from bokeh.models.glyphs import DiamondCross
from bokeh.plotting import figure, show
from bokeh.embed import components
from bokeh.io import output_notebook, push_notebook, curdoc
from bokeh.transform import jitter, factor_cmap
from bokeh.palettes import RdYlGn3, Inferno256
from bokeh.layouts import column
from bokeh.client import push_session
from bokeh.document import Document
from bokeh.events import Reset

Don't be scared.
> I must not fear.
> Fear is the mind-killer.

## Airbnb Dataset

In [2]:
df = pd.read_csv('airbnb_data.csv')
df = df[['neighborhood','bedrooms', 'price', 'reviews', 'accommodates', 'room_type', 'city']]
df.head()

|   | neighborhood | bedrooms | price | reviews | accommodates | room_type | city |
|---|---|---|---|---|---|---|---|
| 0 | Downtown | 1.0 | 150.0 | 1 | 4 | Shared room | Los Angeles |
| 1 | Long Beach | 1.0 | 130.0 | 0 | 1 | Shared room | Los Angeles |
| 2 | Long Beach | 1.0 | 130.0 | 0 | 2 | Shared room | Los Angeles |
| 3 | Glendale | 1.0 | 125.0 | 0 | 1 | Shared room | Los Angeles |
| 4 | Koreatown | 1.0 | 120.0 | 0 | 1 | Shared room | Los Angeles |


In [3]:
# Sample down the data
np.random.seed(5)  # pd.np was removed from pandas; seed NumPy directly
df_ds = df.sample(n=2500)
df_ds.shape

(2500, 7)

In [4]:
# Loads plots directly into the notebook when show() is called
output_notebook()

In [5]:
# Dataframe is put into a ColumnDataSource -- Used by Bokeh internals
# Internally, all Bokeh data is stored in a CDS
cds = ColumnDataSource(df_ds)

# A plot is created
p = figure()

# Circle glyph renderer is instantiated within the plot
circ = p.circle(x='bedrooms', y='price', source=cds)

show(p)
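
For a feel of what a ColumnDataSource holds, here is a small sketch with a made-up three-row frame (`toy_df` and `toy_cds` are illustrative names, not part of the notebook). It assumes pandas and Bokeh are installed.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Toy stand-in for the Airbnb frame (values are made up)
toy_df = pd.DataFrame({"bedrooms": [1.0, 2.0, 3.0], "price": [90.0, 150.0, 400.0]})
toy_cds = ColumnDataSource(toy_df)

# .data is a dict mapping column name -> array; the DataFrame index becomes a column too
print(sorted(toy_cds.data.keys()))
print(list(toy_cds.data["price"]))
```

Glyph calls such as `p.circle(x='bedrooms', y='price', source=cds)` refer to these columns by name, which is why strings are enough there.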

## Building a plot, object by object, model by model

In [6]:
# HoverTool object is instantiated with its default properties
p.add_tools(HoverTool())

# No variable assignment in previous step, so we must find the HoverTool instance with select()
ht = p.select(HoverTool)

# Edit property to display the desired values in the hover tip
ht.tooltips = [
    ("Index", '$index'),
    ("Room type", '@room_type'),
    ("City", "@city"),
    ("Neighborhood", "@neighborhood"),
    ("Review", "@reviews"),
    ("Price", "$@price")
]

# Access the circle glyph's properties and modify them
circ.glyph.size = 11
circ.glyph.line_color = (12,255,23,0.4)
circ.glyph.line_width = 1

# Replace the x axis 'bedrooms' with a jittered 'bedrooms' 
circ.glyph.x = jitter('bedrooms', width=0.5)

# Tick values on the x-axis changed to the discrete number of bedrooms
p.xaxis.ticker = FixedTicker(ticks=list(set(cds.data['bedrooms'])))

# Y ticks are formatted to display currency -- formats available in Bokeh documentation
p.yaxis.formatter = NumeralTickFormatter(format='$ 0,0[.]00')

# Annotate the plot
p.xaxis.axis_label = "# of Bedrooms"
p.yaxis.axis_label = "Price of room per day"
p.title.text = "Beds vs Price"
p.title.align = "center"

# Modifies the grid lines
p.ygrid.grid_line_color = 'gray'
p.xgrid.visible = False
p.ygrid.minor_grid_line_color = 'blue'
p.ygrid.minor_grid_line_alpha = 0.1

# Specify the initial values of the y range
p.y_range.start = -100
p.y_range.end = 5000

# Limit how far the user can pan or zoom along each axis
p.x_range.bounds = (-2, 11)
p.y_range.bounds = (-5000, 30000)


show(p)
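
Bokeh applies the jitter in the browser, so the source data is never modified. As a rough mental model only, the sketch below mimics what I understand the default behaviour to be: uniform noise centred on zero and spanning `width`. The `jitter_sketch` helper is hypothetical and not part of Bokeh.

```python
import random

def jitter_sketch(values, width=0.5, seed=None):
    """Add uniform noise in [-width/2, +width/2] to each value (illustrative only)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width / 2, width / 2) for v in values]

bedrooms = [1, 1, 1, 2, 2, 3]
jittered = jitter_sketch(bedrooms, width=0.5, seed=5)

# Every point stays within a quarter bedroom of its true value
print(all(abs(j - b) <= 0.25 for j, b in zip(jittered, bedrooms)))
```

With `width=0.5` the clouds around neighbouring integer bedroom counts cannot overlap, which keeps the columns visually distinct.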

In [7]:
p.width = 950

p.min_border = 45

p.background_fill_color = p.border_fill_color = (64, 48, 117, 0.3)

p.border_fill_alpha = 0.2

p.axis.axis_label_text_font_size = p.title.text_font_size = '20px'

p.axis.major_label_text_font_size = '14px'

p.toolbar_location = 'above'

p.add_layout(BoxAnnotation(bottom=3000, fill_alpha=.3, fill_color='crimson'))

p.select_one(WheelZoomTool).dimensions = 'height'


show(p)

### Embedding

In [8]:
script, div = components(p)
print('Script: \n' + '*'*50 + script[0:1000])
print('\n\nDiv: \n' + '*'*50 + div)

Script: 
**************************************************
<script type="text/javascript">
    (function() {
  var fn = function() {
    Bokeh.safely(function() {
      (function(root) {
        function embed_document(root) {
          var docs_json = {"ab118276-a6c9-4cba-82df-ad47e740b143":{"roots":{"references":[{"attributes":{},"id":"e7d1dd77-83ef-4965-a7e8-e48e32f10374","type":"LinearScale"},{"attributes":{},"id":"3e02d31d-66b8-4f23-b2cc-b073f2ac169a","type":"HelpTool"},{"attributes":{},"id":"9b1a0dd2-2720-42c5-93eb-c840ad6d9d64","type":"LinearScale"},{"attributes":{"align":"center","plot":null,"text":"Beds vs Price","text_font_size":{"value":"20px"}},"id":"a2922599-47fa-4cd3-b1db-dfcc68642d42","type":"Title"},{"attributes":{"ticks":[0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]},"id":"c061a2ee-c396-4448-b031-63bcf70ce617","type":"FixedTicker"},{"attributes":{"axis_label":"# of Bedrooms","axis_label_text_font_size":{"value":"20px"},"formatter":{"id":"87e0f609-82db-4bc0-ad30-89cde

#### Resources

``` html
<link
    href="http://cdn.pydata.org/bokeh/release/bokeh-0.12.9.min.css"
    rel="stylesheet" type="text/css">
<link
    href="http://cdn.pydata.org/bokeh/release/bokeh-widgets-0.12.9.min.css"
    rel="stylesheet" type="text/css">
<link
    href="http://cdn.pydata.org/bokeh/release/bokeh-tables-0.12.9.min.css"
    rel="stylesheet" type="text/css">

<script src="http://cdn.pydata.org/bokeh/release/bokeh-0.12.9.min.js"></script>
<script src="http://cdn.pydata.org/bokeh/release/bokeh-widgets-0.12.9.min.js"></script>
<script src="http://cdn.pydata.org/bokeh/release/bokeh-tables-0.12.9.min.js"></script>
```
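
Putting the pieces together: the resource tags go in the page head, the script from `components()` follows them, and the div goes wherever the plot should appear. The sketch below uses only the standard library; `PAGE`, `render_page`, and the placeholder fragments are assumptions made up for illustration.

```python
from string import Template

# Hypothetical page skeleton; the placeholder names are made up for this sketch
PAGE = Template("""<!DOCTYPE html>
<html>
  <head>
    $resources
    $script
  </head>
  <body>
    $div
  </body>
</html>""")

def render_page(resources, script, div):
    """Drop the three HTML fragments into the skeleton and return the full page."""
    return PAGE.substitute(resources=resources, script=script, div=div)

html = render_page(
    resources='<script src="bokeh.min.js"></script>',  # stand-in for the tags above
    script="<script>/* from components(p) */</script>",
    div='<div class="bk-root"></div>',
)
print(html)
```

In a real page you would pass the actual `script` and `div` strings and a resources block whose version matches the installed Bokeh.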

### Autoloading - for the smart and lazy

``` python
from bokeh.resources import CDN
from bokeh.embed import autoload_static

js, tag = autoload_static(p, CDN, "some/path")
```

* The returned `js` is the JavaScript to save at *some/path*.

* The returned `tag` is the `<script>` element to paste into your page:

``` html
<script
    src="some/path"
    id="c5339dfd-a354-4e09-bba4-466f58a574f1"
    async="true"
    data-bokeh-data="static"
    data-bokeh-modelid="7b226555-8e16-4c29-ba2a-df2d308588dc"
    data-bokeh-modeltype="Plot"
    data-bokeh-loglevel="info"
></script>
```


## Bar graph with groups
The number of listings with high, medium, and low prices in the five neighborhoods with the most listings.

In [39]:
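# The five neighborhoods with the most listings
hoods = df.groupby('neighborhood')['neighborhood'].count().sort_values(ascending=False)[0:5].index

# Keep only the listings in those neighborhoods (copy so we can add a column safely)
df_h = df.loc[df.neighborhood.isin(hoods)].copy()

# Bin prices into terciles: the cheapest third is 'Low', the priciest third is 'High'
df_h.loc[:,'price_q'] = pd.qcut(df_h['price'], 3, labels=['Low', 'Medium', 'High'])

# One group per (neighborhood, price tercile) pair
group = df_h.groupby(['neighborhood', 'price_q'])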
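
If `pd.qcut` is new to you, this tiny sketch shows what the tercile binning does on six made-up prices. Only pandas is assumed; the values are not from the Airbnb data.

```python
import pandas as pd

# Made-up prices, already sorted so the tercile boundaries are easy to see
prices = pd.Series([50, 60, 80, 100, 150, 300])

# q=3 splits at the 1/3 and 2/3 quantiles, so each label gets about a third of the rows
terciles = pd.qcut(prices, 3, labels=["Low", "Medium", "High"])
print(list(terciles))
```

Because the cut points are quantiles of the data itself, "High" here means high relative to the selected neighborhoods, not an absolute dollar threshold.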

In [15]:
source = ColumnDataSource(group)

# Maps red, yellow, and green to the price quantiles high, medium, low
colors =  factor_cmap('neighborhood_price_q', palette=RdYlGn3, factors=list(df_h['price_q'].unique().sort_values()), start=1)

f = figure(plot_width=950, title="Daily price by Neighborhood", x_range=group, background_fill_color='grey', tools='')

f.vbar(x='neighborhood_price_q', top='price_count', width=1, fill_color=colors, line_color='white', source=source)

f.x_range.range_padding = 0.1

f.xgrid.visible = False

def sfunc(x):
    if x[1] == 'High':
        return 0
    elif x[1] == 'Medium':
        return 1
    elif x[1] == 'Low':
        return 2

f.x_range.factors.sort(key=sfunc)
f.x_range.factors.sort(key=lambda tup: tup[0])


show(f)
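
The two `sort` calls above lean on Python's sort being stable: sort by the secondary key first, then by the primary key, and ties keep their earlier order. The sketch below, with made-up neighborhood names, shows that this is equivalent to one sort on a composite key.

```python
# (neighborhood, price tercile) pairs, like those in f.x_range.factors; names are made up
factors = [("Venice", "Low"), ("Downtown", "Medium"), ("Venice", "High"),
           ("Downtown", "High"), ("Venice", "Medium"), ("Downtown", "Low")]

price_rank = {"High": 0, "Medium": 1, "Low": 2}

# Two stable sorts: secondary key first, primary key second
two_pass = list(factors)
two_pass.sort(key=lambda pair: price_rank[pair[1]])
two_pass.sort(key=lambda pair: pair[0])

# Equivalent single sort on a composite key
one_pass = sorted(factors, key=lambda pair: (pair[0], price_rank[pair[1]]))

print(two_pass == one_pass)
print(one_pass)
```

Either way, each neighborhood's bars end up ordered High, Medium, Low from left to right.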

## Unlimited power!
Event-driven workflows

In [11]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/PAGjm4BMKlk?start=1340" frameborder="0" allowfullscreen></iframe>

## Interactive Widgets. No server required.

### Filtering the Bedrooms vs Price plot by # of reviews

In [12]:
slider = Slider(start=0, end=cds.data['reviews'].max(), value=0, step=5, title='At least this # of reviews:(Slide me pls)')
slider.bar_color = 'pink'
ccds = ColumnDataSource(cds.to_df())
slider.callback = CustomJS(args=dict(source=cds, ccds=ccds), code=
                          """
                          var cdata = ccds.data;
                          var slider_val = cb_obj.value;
                          var len = cdata['reviews'].length;

                          // Start every column empty so the hover fields survive filtering
                          var new_data = {};
                          for (var key in cdata) {
                            new_data[key] = [];
                          }

                          // Copy over only the rows with enough reviews
                          for (var i = 0; i < len; i++) {
                            if (cdata['reviews'][i] >= slider_val) {
                              for (var key in cdata) {
                                new_data[key].push(cdata[key][i]);
                              }
                            }
                          }

                          source.data = new_data;
                          source.change.emit();
                          """
                          )
layout = column(slider, p)
show(layout)
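
The callback runs in the browser, but the logic is easy to read in Python. This sketch mirrors the same idea with a hypothetical `filter_by_reviews` helper and made-up data: always filter from an untouched copy, then replace the plotted data wholesale.

```python
def filter_by_reviews(columns, min_reviews):
    """Return a new column dict keeping only rows with reviews >= min_reviews."""
    keep = [i for i, r in enumerate(columns["reviews"]) if r >= min_reviews]
    return {name: [values[i] for i in keep] for name, values in columns.items()}

# Toy data standing in for ccds.data (values are made up)
full = {"bedrooms": [1, 2, 3, 1], "price": [80, 120, 300, 95], "reviews": [0, 12, 40, 5]}

print(filter_by_reviews(full, 10))
```

Filtering from the copy rather than from the plotted source is what lets the points come back when the slider moves down again.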

#### Other Widgets
![widgets](widgets.png)

### Dynamic Linear Regression of a Selection - Using the Bokeh Server and bokeh.client interface

#### Number of Bedrooms vs Accommodations
We expect the number of people a listing accommodates to grow roughly linearly with its number of bedrooms.
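
As a sanity check on what a linear fit returns, here is ordinary least squares for a single predictor written out by hand. The `fit_line` helper and the four data points are made up for illustration; the notebook itself uses scikit-learn's `LinearRegression`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up listings: two guests per bedroom, plus one
bedrooms = [1, 2, 3, 4]
accommodates = [3, 5, 7, 9]

slope, intercept = fit_line(bedrooms, accommodates)
print(slope, intercept)

# Endpoints of the fitted line at x = 0 and x = 8
print([intercept, slope * 8 + intercept])
```

Note that the fit is undefined when every selected point has the same number of bedrooms (`sxx` is zero), so a selection needs at least two distinct x values to give a meaningful line.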

To run the following cell, a Bokeh server must already be running.
Start one from a terminal with *bokeh serve*.

*Warning: The callback code should be impeccable, unless you want to dig through a mountain-sized traceback*

In [13]:
import math

sdf = cds.to_df().sample(n=1000)
scds = ColumnDataSource(sdf)

n = figure(tools=['lasso_select','box_select','reset'], title='Bedrooms vs Accommodated', x_axis_label='# of Bedrooms',
          y_axis_label='# of People Accommodated')

mapper = LinearColorMapper(palette=Inferno256, low=sdf['accommodates'].min(), high=sdf['accommodates'].max())

dc = n.diamond_cross('bedrooms', 'accommodates', source=scds, size=20, line_alpha=.3, fill_alpha=0.4, 
                 fill_color={'field':'accommodates', 'transform':mapper})

dc.selection_glyph = DiamondCross(fill_alpha=.7, line_width=3, line_color='crimson', fill_color='crimson')
dc.nonselection_glyph = DiamondCross(size=10, fill_alpha=0.1, line_alpha=.03, fill_color={'field':'accommodates',
                                                                                        'transform':mapper})

line = n.line(x=[], y=[], line_width=6, line_alpha=0.8, line_dash=(7,1))

div = Div(text='Equation goes here.', width=600)

def update(attr, old, new):
    indices = new['1d']['indices']
    
    if len(indices) > 0:
        bed_array = np.array(sdf['bedrooms'].iloc[indices]).reshape(-1,1)
        accom_array = np.array(sdf['accommodates'].iloc[indices]).reshape(-1,1)

        regression = LinearRegression()
        regression.fit(bed_array, accom_array)

        # predict() expects a 2-D array of shape (n_samples, n_features)
        start, end = regression.predict([[0], [8]]).ravel().tolist()

        # Assign both columns at once so their lengths never disagree
        line.data_source.data = {'x': [0, 8], 'y': [start, end]}
        
        coef = float(regression.coef_)
        intercept = float(regression.intercept_)
        div.text = f'<b>Equation:</b> y = {coef:0.02f}x + {intercept:0.02f}'
        
        # Rotate the markers to follow the fitted line: atan(slope) is the line's
        # angle, plus the quarter-turn offset used for the flat (zero-slope) case.
        # atan handles a zero slope and a zero intercept without special cases.
        dc.glyph.angle = math.atan(coef) + math.pi / 2

# bokeh.client has trouble with the on_event method, so this Python callback stays disabled
# def update_reset(event):
#     line.data_source.data['x'] = []
#     line.data_source.data['y'] = []
#     div.text = 'Equation goes here.'

# n.select_one(ResetTool).on_event(Reset, update_reset)

rjs = CustomJS(args=dict(line_data=line.data_source, div=div, dc=dc.glyph), code='''
    dc.angle = 0   
    line_data.data['x'] = []
    line_data.data['y'] = []
    line_data.change.emit()
    div.text = 'Equation goes here.'
    div.change.emit()
    ''')
n.js_on_event(Reset, rjs)
dc.data_source.on_change('selected', update)

col = column(div, n)

doc = Document()
doc.add_root(col)

session = push_session(doc)
session.show()
session.loop_until_closed() # run forever


    !!!! PLEASE NOTE !!!!

The use of `session.loop_until_closed` and `push_session` to run Bokeh
application code outside a Bokeh server is **HIGHLY DISCOURAGED** for any real
use.

Running application code outside a Bokeh server with bokeh.client in this way
has (and always will have) several intrinsic drawbacks:

* Fast binary array transport is NOT available! Base64 fallback is much slower
* All network traffic is DOUBLED due to extra hop between client and server
* Server *and* client process must be running at ALL TIMES for callbacks to work
* App code run outside the Bokeh server is NOT SCALABLE behind a load balancer

The bokeh.client API is recommended to use ONLY for testing, or for customizing
individual sessions running in a full Bokeh server, before passing on to viewers.

For information about different ways of running apps in a Bokeh server, see:

    http://bokeh.pydata.org/en/latest/docs/user_guide/server.html
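
The warning above points toward running the code inside the server instead. A minimal sketch of that shape follows, under loud assumptions: it targets the Bokeh 1.0+ selection API (`source.selected.on_change('indices', ...)`), the file name `app.py` is hypothetical, and the data is made up. It is a skeleton, not a port of the regression app.

```python
# app.py -- hypothetical file name; launch with:  bokeh serve --show app.py
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Div
from bokeh.plotting import figure

source = ColumnDataSource(data=dict(bedrooms=[1, 2, 3, 4], accommodates=[2, 4, 6, 9]))
status = Div(text="Nothing selected.")

plot = figure(tools="box_select,lasso_select,reset")
plot.scatter("bedrooms", "accommodates", source=source, size=12)

def on_selection(attr, old, new):
    # `new` is the list of selected row indices (Bokeh 1.0+ selection API)
    status.text = f"{len(new)} point(s) selected."

source.selected.on_change("indices", on_selection)

# Inside `bokeh serve`, curdoc() is the session's document; no push_session needed
curdoc().add_root(column(status, plot))
```

The server creates a fresh document per browser session and keeps Python callbacks in the same process, which sidesteps the extra network hop the warning describes.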



### More types of glyphs
![glyphs](glyphs.png)

### Network Graphs
![network](network.png)

### Geodata
![choropleth](chloropleth.png)

#### With tile providers
![tile](tile.png)

## Big data
**USE DATASHADER**

Dask + Datashader + Bokeh allow you to explore massive datasets
![datashader](datashader.png)

## Holoviews
``` python
# Declare
import holoviews as hv
from bokeh.sampledata.iris import flowers
from holoviews.operation import gridmatrix

hv.extension('bokeh')  # render with the Bokeh backend

ds = hv.Dataset(flowers)

grouped_by_species = ds.groupby('species', container_type=hv.NdOverlay)
grid = gridmatrix(grouped_by_species, diagonal_type=hv.Scatter)

# Plot
plot_opts = dict(tools=['hover', 'box_select'], bgcolor='#efe8e2')
style = dict(fill_alpha=0.2, size=4)

grid({'Scatter': {'plot': plot_opts, 'style': style}})
```

![holoviews](holoviews.png)

# That's it.

### Documentation:
https://bokeh.pydata.org/en/latest/

### This notebook/presentation can be found here:
https://github.com/DmitriyLeybel/Bokeh_Overview_Tutorial

*Be sure to download the entire repo if you want the pictures to show, and let's be real, you do.*
