# How to make interactive visualizations with Bokeh

## Introduction

This tutorial introduces interactive visualization with Bokeh. Bokeh is a Python library that lets you build high-quality interactive visualizations that render in a web browser, in the style of D3.js, without needing to know D3.js. It offers two interfaces: `bokeh.models`, a lower-level interface aimed at developers, and `bokeh.plotting`, a higher-level interface for general use. This tutorial focuses on `bokeh.plotting`.

After reading this tutorial, you will be able to:
- draw basic plots with lines, circles, and bars, and add interactive tools and widgets to them


## Tutorial content

The following topics will be covered in this tutorial:
- Installing Bokeh
- Bokeh basics: make your first plot (a line chart)
- Hover data labels
- Adjusting the date range
- Selecting different categories of data with a drop-down menu
- Example: a crossfilter chart you can adjust yourself: https://github.com/bokeh/bokeh/tree/branch-2.4/examples/app/crossfilter
- Example: an animated chart: https://github.com/bokeh/bokeh/tree/branch-2.4/examples/app/gapminder#gapminder-example

## Installing Bokeh

You can install Bokeh with either pip or conda:

In [None]:
#pip install
pip install bokeh

In [None]:
#conda install
conda install bokeh

## Bokeh basics

Bokeh is quite similar to ggplot in R. It is built around two important concepts: figures and glyphs.

- A `figure` is like a canvas on which you plot the data and add titles, axis labels, and so on. Every visualization starts by creating a figure.
- A `glyph` is a visual mark that represents the data on the figure. There are various kinds of glyphs, including lines, points, patches, arcs, and bars. You can layer multiple glyphs on the same figure.

After placing the glyphs on the figure, call the `show()` function to display it. This tutorial displays the visualizations inside the notebook, so it calls `output_notebook()` first. Call `output_file()` instead if you want to save a figure to a separate HTML file.
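
As a minimal sketch of the figure, glyph, and output workflow described above, the following writes a plot to a standalone HTML file. The output path is an assumption chosen for illustration (a file in the system temporary directory), and `save()` is used rather than `show()` so that no browser window opens.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "first_plot.html")
output_file(out_path, title="First plot")

# 1. create the figure (the canvas)
fig = figure(title="Saved to HTML", width=400, height=400)

# 2. add a glyph (here, a line)
fig.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)

# 3. write the HTML file; show(fig) would write it and also open it
saved_path = save(fig)
print(saved_path)
```

Opening the resulting file in a browser should give the same interactive figure that `show()` renders inside the notebook.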

In [16]:
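# Note: this dataset is not used in the rest of the tutorial, and
# load_boston() was removed in scikit-learn 1.2, so this cell fails on recent versions.
from sklearn import datasets
boston = datasets.load_boston()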

In [1]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import bokeh

In [4]:
# show the visualization in the notebook
output_notebook()

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# create a figure and set its title, size, and axis labels
fig = figure(title="Make your first plot", width=400, height=400,
             x_axis_label="x", y_axis_label="y")

# add a line with a legend label and set the line width
fig.line(x, y, legend_label="Temp.", line_width=2)

# add a circle renderer with a size, color, and alpha;
# per-point sizes can encode an additional data series
circle_size = [3, 2, 1, 4, 5]
fig.circle(x, y, size=circle_size, color="navy", alpha=0.5, legend_label="circle")

# add bars to the same figure
z = [5, 6, 2, 3, 1]
fig.vbar(x=x, top=z, width=0.3, bottom=0, color="blue")

# show the results
show(fig)

A toolbar with the default tools appears on the right-hand side of the figure:
- Pan (the cross-shaped arrows icon): drag the plot to move the displayed area.
- Box zoom (the magnifying glass icon): select an area of the plot to enlarge it and inspect its details.
- Wheel zoom (the second magnifying glass icon): scroll up or down to change the zoom level.
- Save: download a PNG file of the figure.
- Reset: restore the figure to its original view.
- Help (the question mark): get more information about how to use the plot.
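
The toolbar does not have to stay at its defaults. As a small sketch, the `tools` argument of `figure()` accepts a comma-separated string of tool names, and `toolbar_location` controls where the toolbar sits. The particular selection below simply repeats the six tools listed above; the data values are made up.

```python
from bokeh.plotting import figure

# Request the tools explicitly, in the order they should appear.
p = figure(width=400, height=400,
           tools="pan,box_zoom,wheel_zoom,save,reset,help",
           toolbar_location="right")
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)

# Inspect which tool objects were attached to the figure.
tool_names = [type(tool).__name__ for tool in p.toolbar.tools]
print(tool_names)
```

Leaving a name out of the string removes that button, for example `tools="pan,reset"` for a plot that should only be draggable.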


In [4]:
# prepare some data
x = [1, 2, 3, 4, 5]
y1 = [6, 7, 2, 4, 5]
y2 = [2, 3, 4, 5, 6]
y3 = [4, 5, 5, 7, 2]

# create a new plot with a title and axis labels
p = figure(title="Multiple line example", x_axis_label="x", y_axis_label="y")

# add multiple renderers
p.line(x, y1, legend_label="Temp.", line_color="blue", line_width=2)
p.line(x, y2, legend_label="Rate", line_color="red", line_width=2)
p.line(x, y3, legend_label="Objects", line_color="green", line_width=2)

# show the results
show(p)

In [5]:
from numpy import cos, linspace
x = linspace(-6, 6, 100)
y = cos(x)
p = figure(width=500, height=500)
p.circle(x, y, size=7, color="firebrick", alpha=0.5)
show(p)

In [6]:
bokeh.sampledata.download()

Creating /Users/bytedance/.bokeh directory
Creating /Users/bytedance/.bokeh/data directory
Using data directory: /Users/bytedance/.bokeh/data
Downloading: CGM.csv (1589982 bytes)
   1589982 [100.00%]
Downloading: US_Counties.zip (3171836 bytes)
   3171836 [100.00%]
Unpacking: US_Counties.csv
Downloading: us_cities.json (713565 bytes)
    713565 [100.00%]
Downloading: unemployment09.csv (253301 bytes)
    253301 [100.00%]
Downloading: AAPL.csv (166698 bytes)
    166698 [100.00%]
Downloading: FB.csv (9706 bytes)
      9706 [100.00%]
Downloading: GOOG.csv (113894 bytes)
    113894 [100.00%]
Downloading: IBM.csv (165625 bytes)
    165625 [100.00%]
Downloading: MSFT.csv (161614 bytes)
    161614 [100.00%]
Downloading: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.zip (4816256 bytes)
   4816256 [100.00%]
Unpacking: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.csv
Downloading: gapminder_fertility.csv (64346 bytes)
     64346 [100.00%]
Downloading: gapminder_population.csv (94509 bytes)
     94509 [100.0

In [8]:
import pandas as pd

In [9]:
from bokeh.sampledata.stocks import AAPL
#from bokeh.models import (PanTool, WheelZoomTool)

df = pd.DataFrame(AAPL)
df['date'] = pd.to_datetime(df['date'])
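
A frame like this one, with a datetime `date` column, is a natural fit for a time-series line with hover labels and a restricted date range. The sketch below uses synthetic prices so that it runs without the sample-data download; the `close` column name, the date window, and the random values are assumptions for illustration, not the AAPL data itself.

```python
from datetime import datetime

import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Synthetic stand-in for a stock frame: one row per business day.
dates = pd.date_range("2010-01-01", periods=250, freq="B")
rng = np.random.default_rng(0)
prices = pd.DataFrame({
    "date": dates,
    "close": 100 + np.cumsum(rng.normal(0, 1, len(dates))),
})
source = ColumnDataSource(prices)

# x_axis_type="datetime" formats the ticks as dates;
# x_range limits the initially displayed window.
p = figure(title="Closing price", x_axis_type="datetime",
           width=700, height=350,
           x_range=(datetime(2010, 3, 1), datetime(2010, 9, 1)))
p.line("date", "close", source=source, line_width=2)

# Tooltips reference columns with @name; the formatter tells Bokeh
# to render the date column as a date rather than a raw number.
hover = HoverTool(
    tooltips=[("date", "@date{%F}"), ("close", "@close{0.00}")],
    formatters={"@date": "datetime"},
    mode="vline",
)
p.add_tools(hover)
```

Calling `show(p)` displays the plot; panning or zooming still reaches the data outside the initial date window.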


In [11]:
# Crossfilter example, adapted from the Bokeh examples directory.
# It builds on curdoc(), so it is meant to run as a Bokeh server app
# (with `bokeh serve`) rather than through show() in a notebook cell.
import numpy as np
import pandas as pd

from bokeh.layouts import column, row
from bokeh.models import Select
from bokeh.palettes import Spectral5
from bokeh.plotting import curdoc, figure
from bokeh.sampledata.autompg import autompg_clean as df

df = df.copy()

SIZES = list(range(6, 22, 3))
COLORS = Spectral5
N_SIZES = len(SIZES)
N_COLORS = len(COLORS)

# data cleanup
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
del df['name']

columns = sorted(df.columns)
discrete = [x for x in columns if df[x].dtype == object]
continuous = [x for x in columns if x not in discrete]

def create_figure():
    xs = df[x.value].values
    ys = df[y.value].values
    x_title = x.value.title()
    y_title = y.value.title()

    kw = dict()
    if x.value in discrete:
        kw['x_range'] = sorted(set(xs))
    if y.value in discrete:
        kw['y_range'] = sorted(set(ys))
    kw['title'] = "%s vs %s" % (x_title, y_title)

    p = figure(height=600, width=800, tools='pan,box_zoom,hover,reset', **kw)
    p.xaxis.axis_label = x_title
    p.yaxis.axis_label = y_title

    if x.value in discrete:
        # pd.np was removed in pandas 2.0; use numpy directly
        p.xaxis.major_label_orientation = np.pi / 4

    sz = 9
    if size.value != 'None':
        if len(set(df[size.value])) > N_SIZES:
            groups = pd.qcut(df[size.value].values, N_SIZES, duplicates='drop')
        else:
            groups = pd.Categorical(df[size.value])
        sz = [SIZES[xx] for xx in groups.codes]

    c = "#31AADE"
    if color.value != 'None':
        if len(set(df[color.value])) > N_COLORS:
            groups = pd.qcut(df[color.value].values, N_COLORS, duplicates='drop')
        else:
            groups = pd.Categorical(df[color.value])
        c = [COLORS[xx] for xx in groups.codes]

    p.circle(x=xs, y=ys, color=c, size=sz, line_color="white", alpha=0.6, hover_color='white', hover_alpha=0.5)

    return p


def update(attr, old, new):
    layout.children[1] = create_figure()


x = Select(title='X-Axis', value='mpg', options=columns)
x.on_change('value', update)

y = Select(title='Y-Axis', value='hp', options=columns)
y.on_change('value', update)

size = Select(title='Size', value='None', options=['None'] + continuous)
size.on_change('value', update)

color = Select(title='Color', value='None', options=['None'] + continuous)
color.on_change('value', update)

controls = column(x, y, color, size, width=200)
layout = row(controls, create_figure())

curdoc().add_root(layout)
curdoc().title = "Crossfilter"

In [38]:
import pandas as pd
import numpy as np

from bokeh.io import show, output_notebook
from bokeh.plotting import figure

from bokeh.models import HoverTool, ColumnDataSource
from bokeh.models import CheckboxGroup, Slider, RangeSlider

from bokeh.layouts import column, row
from bokeh.palettes import Category20_16

from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application
import bokeh
output_notebook()

In [39]:
# load the vehicle data and collect the distinct destinations
data = pd.read_csv('vehicle.csv')
all_des = list(set(data['des']))
data.head()

| Unnamed: 0 | tmstmp1 | vid | tmstmp | lat | lon | hdg | pid | rt | des | pdist | spd | tablockid | tatripid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2016-08-11 10:56:00 | 5549 | 2016-08-11 10:56:00 | 40.439504 | -79.996981 | 114 | 4521 | 61A | Swissvale | 1106 | 0 | 061A-164 | 6691 |
| 1 | 2016-08-11 10:56:00 | 5287 | 2016-08-11 10:56:00 | 40.438016 | -79.92738 | 83 | 4521 | 61A | Swissvale | 22921 | 20 | 061A-163 | 6687 |
| 2 | 2016-08-11 10:56:00 | 6114 | 2016-08-11 10:56:00 | 40.418897 | -79.88397 | 128 | 4521 | 61A | Swissvale | 48014 | 12 | 061A-162 | 6683 |
| 3 | 2016-08-11 10:56:00 | 5646 | 2016-08-11 10:56:00 | 40.441155 | -79.89299 | 274 | 4663 | 61A | Downtown | 15953 | 23 | 061A-166 | 6433 |
| 4 | 2016-08-11 10:56:00 | 5443 | 2016-08-11 10:56:00 | 40.43637 | -79.968362 | 269 | 4663 | 61A | Downtown | 40770 | 30 | 061A-165 | 6430 |
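
The `make_dataset` function further down turns the `spd` column into histogram bars by calling `np.histogram` and keeping the bin edges as `left` and `right` columns. The following sketch isolates that step; the speed values and the bin settings are made up for illustration and are not taken from `vehicle.csv`.

```python
import numpy as np
import pandas as pd

# Made-up speeds for a single destination (illustrative values only).
speeds = pd.Series([0, 3, 7, 12, 12, 18, 23, 30, 41, 48])

lower_bound, upper_bound, bin_width = 0, 50, 10
n_bins = int((upper_bound - lower_bound) / bin_width)

# counts has one entry per bin; edges has n_bins + 1 entries.
counts, edges = np.histogram(speeds, bins=n_bins,
                             range=[lower_bound, upper_bound])

# Each row describes one bar: its height and its left/right x positions.
hist_df = pd.DataFrame({
    "proportion": counts / counts.sum(),
    "left": edges[:-1],
    "right": edges[1:],
})
print(hist_df)
```

A frame in this shape maps directly onto a `quad` glyph, with `top='proportion'`, `left='left'`, and `right='right'`.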


In [40]:

def modify_doc(doc):
    def make_dataset(df, des_list, upper_bound=50, lower_bound=0, bin_width=5):
        columns = ['proportion', 'vehicles_cnt', 'left', 'right', 'destination', 'color']
        data_range = upper_bound - lower_bound

        if data_range <= 0:
            return None

        frames = []
        # iterate through all the destinations
        for i, des in enumerate(des_list):
            # select the rows for this destination
            sub_df = df[df['des'] == des]

            # histogram of speeds; use at least one bin
            arr_hist, edges = np.histogram(sub_df['spd'],
                                           bins=max(1, int(data_range / bin_width)),
                                           range=[lower_bound, upper_bound])

            # convert counts to proportions, avoiding division by zero
            total = np.sum(arr_hist)
            des_df = pd.DataFrame({'proportion': arr_hist / total if total else arr_hist.astype(float),
                                   'left': edges[:-1], 'right': edges[1:]})

            # number of vehicles heading to this destination within the speed range
            des_df['vehicles_cnt'] = total

            # destination name, used for the legend and tooltips
            des_df['destination'] = des

            # color each destination differently (the palette has 16 colors)
            des_df['color'] = Category20_16[i % len(Category20_16)]

            frames.append(des_df)

        # DataFrame.append was removed in pandas 2.0, so combine with pd.concat
        each_des = pd.concat(frames) if frames else pd.DataFrame(columns=columns)
        each_des = each_des.sort_values(['destination', 'left'])
        return ColumnDataSource(each_des)

 
    def set_style(fig):
        '''
        Input: a figure object
        Output: a figure object with desired style
        '''
        # Title 
        fig.title.align = 'center'
        fig.title.text_font_size = '20pt'
        fig.title.text_font = 'serif'

        # Axis titles
        fig.xaxis.axis_label_text_font_size = '14pt'
        fig.xaxis.axis_label_text_font_style = 'bold'
        fig.yaxis.axis_label_text_font_size = '14pt'
        fig.yaxis.axis_label_text_font_style = 'bold'

        # Tick labels
        fig.xaxis.major_label_text_font_size = '12pt'
        fig.yaxis.major_label_text_font_size = '12pt'

        return fig

    def make_plot(ColumnDataSource_obj):
        '''
        Input: a ColumnDataSource object
        Output: a figure with quad glyphs
        '''
        # Blank plot with correct labels
        fig = figure(width=600, height=600,
                     title="Histogram of Vehicle Speeds by Destination",
                     x_axis_label='Speed', y_axis_label='Proportion')

        # Quad glyphs to create a histogram
        fig.quad(source = ColumnDataSource_obj, bottom = 0, top = 'proportion', left = 'left', right = 'right',
               color = 'color', fill_alpha = 0.7, hover_fill_color = 'color', legend_field = 'destination',
               hover_fill_alpha = 1.0, line_color = 'black')

        # Hover tool with vline mode
        hover = HoverTool(tooltips=[('Destination', '@destination'), 
                                    ('# Vehicles', '@vehicles_cnt'),
                                    ('Proportion', '@proportion')],
                          mode='vline')

        fig.add_tools(hover)

        # Styling
        fig = set_style(fig)

        return fig   

    # Bokeh calls this callback with three arguments: attr, old, and new
    def update(attr, old, new):
        # destinations currently ticked in the checkbox group
        plot_selected = [select_des.labels[i] for i in select_des.active]

        # rebuild the dataset from the current widget values
        new_des = make_dataset(data, plot_selected,
                               lower_bound=range_select.value[0],
                               upper_bound=range_select.value[1],
                               bin_width=binwidth_select.value)

        # make_dataset returns None for an empty speed range; keep the old data then
        if new_des is None:
            return

        # update the source used by the quad glyphs
        data_des.data.update(new_des.data)

        
    select_des = CheckboxGroup(labels=all_des, active = [0, 1,3,4])
    select_des.on_change('active', update)

    binwidth_select = Slider(start=1, end=30,
                             step=1, value=5,
                             title='Bin Width')
    binwidth_select.on_change('value', update)

    range_select = RangeSlider(start=0, end=100, value=(0, 100),
                               step=5, title='Speed Range')
    range_select.on_change('value', update)
    
    controls = column(select_des, binwidth_select, range_select)
    
    des_ = [select_des.labels[i] for i in select_des.active]
    
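    # build the initial dataset from the widgets' starting values
    data_des = make_dataset(data, des_,
                            lower_bound=range_select.value[0],
                            upper_bound=range_select.value[1],
                            bin_width=binwidth_select.value)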

    fig_des = make_plot(data_des)
    
    layout = row(fig_des,controls)
    doc.add_root(layout)
    
    
# Set up an application
handler = FunctionHandler(modify_doc)
app = Application(handler)
show(app)

In [29]:
def make_dataset(df, des_list, upper_bound=50, lower_bound=0, bin_width=5):
    columns = ['proportion', 'vehicles_cnt', 'left', 'right', 'destination', 'color']
    data_range = upper_bound - lower_bound

    if data_range <= 0:
        return None

    frames = []
    # iterate through all the destinations
    for i, des in enumerate(des_list):
        # select the rows for this destination
        sub_df = df[df['des'] == des]

        # histogram of speeds; use at least one bin
        arr_hist, edges = np.histogram(sub_df['spd'],
                                       bins=max(1, int(data_range / bin_width)),
                                       range=[lower_bound, upper_bound])

        # convert counts to proportions, avoiding division by zero
        total = np.sum(arr_hist)
        des_df = pd.DataFrame({'proportion': arr_hist / total if total else arr_hist.astype(float),
                               'left': edges[:-1], 'right': edges[1:]})

        # number of vehicles heading to this destination within the speed range
        des_df['vehicles_cnt'] = total

        # destination name, used for the legend and tooltips
        des_df['destination'] = des

        # color each destination differently (the palette has 16 colors)
        des_df['color'] = Category20_16[i % len(Category20_16)]

        frames.append(des_df)

    # DataFrame.append was removed in pandas 2.0, so combine with pd.concat
    each_des = pd.concat(frames) if frames else pd.DataFrame(columns=columns)
    each_des = each_des.sort_values(['destination', 'left'])
    # this standalone version returns the DataFrame so it can be inspected directly
    return each_des

In [41]:
def set_style(fig):
    '''
    Input: a figure object
    Output: a figure object with desired style
    '''
    # Title 
    fig.title.align = 'center'
    fig.title.text_font_size = '20pt'
    fig.title.text_font = 'serif'

    # Axis titles
    fig.xaxis.axis_label_text_font_size = '14pt'
    fig.xaxis.axis_label_text_font_style = 'bold'
    fig.yaxis.axis_label_text_font_size = '14pt'
    fig.yaxis.axis_label_text_font_style = 'bold'

    # Tick labels
    fig.xaxis.major_label_text_font_size = '12pt'
    fig.yaxis.major_label_text_font_size = '12pt'

    return fig


def make_plot(ColumnDataSource_obj):
    '''
    Input: a ColumnDataSource object
    Output: a figure with quad glyphs
    '''
    # Blank plot with correct labels
    fig = figure(width=700, height=700,
                 title="Histogram of Vehicle Speeds by Destination",
                 x_axis_label='Speed', y_axis_label='Proportion')

    # Quad glyphs to create a histogram; legend_field groups the legend
    # by the values in the 'destination' column
    fig.quad(source=ColumnDataSource_obj, bottom=0, top='proportion', left='left', right='right',
             color='color', fill_alpha=0.7, hover_fill_color='color', legend_field='destination',
             hover_fill_alpha=1.0, line_color='black')

    # Hover tool with vline mode
    hover = HoverTool(tooltips=[('Destination', '@destination'), 
                                ('# Vehicles', '@vehicles_cnt'),
                                ('Proportion', '@proportion')],
                      mode='vline')

    fig.add_tools(hover)

    # Styling
    fig = set_style(fig)

    return fig

In [42]:
show(make_plot(ColumnDataSource(make_dataset(data, all_des))))

In [14]:
select_des = CheckboxGroup(labels=all_des, active=[0, 1])
show(select_des)

[select_des.labels[i] for i in select_des.active]

['Murray-Waterfront', 'Downtown']

In [None]:
# Bokeh calls the update callback with three arguments: attr, old, and new
def update(attr, old, new):
    # Get the list of selected destinations for the graph
    plot_selected = [select_des.labels[i] for i in select_des.active]

    # Make a new dataset based on the selected destinations and the
    # make_dataset function defined earlier
    updated = make_dataset(data, plot_selected,
                           lower_bound=0,
                           upper_bound=30,
                           bin_width=5)

    # Convert the dataframe to a ColumnDataSource
    updated = ColumnDataSource(updated)

    # Update the source used by the quad glyphs
    # (src is assumed to be the ColumnDataSource that was passed to make_plot)
    src.data.update(updated.data)
