<a href="https://colab.research.google.com/github/GabeMaldonado/JupyterNotebooks/blob/master/Bokeh_II.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bokeh Basics - I

Bokeh is a Python library that enables users to create beautiful, dynamic, and interactive visualizations.

To learn more about Bokeh, [visit its documentation](https://docs.bokeh.org/en/latest/index.html).

Want to create visualizations? Let's get started!


In [1]:
# import required libraries

from bokeh.io import output_notebook, show, reset_output, output_file

import bokeh
from bokeh.plotting import figure

import numpy as np
import pandas as pd

# import library for toy datasets
from vega_datasets import data as vds

## Load Data
We need data to plot! Bokeh provides example datasets we can use.

In [2]:
from bokeh.sampledata import iris
# load iris dataset
df_iris = iris.flowers
# display first five rows in the df
df_iris.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [3]:
# To display the graphs/output we need to run `output_notebook()` once if using JupyterLab
# and in every cell that would return a graph if using Colab
output_notebook()

## Creating Plots

To create a plot, follow this workflow:
*   Create a figure with `figure()`.
*   Add a glyph to the figure. There are several options, such as line, bar, and scatter.
*   Show the plot with `show()`.

## Bokeh's Data Structure
Bokeh uses the `ColumnDataSource` as its main data structure. It is created automatically most of the time, for example when we pass lists or arrays to a glyph method, but it can also be created explicitly. Its `data` attribute is a dictionary that maps column names to sequences of values.

In [4]:
# to create the ColumnDataSource

from bokeh.models import ColumnDataSource

df = ColumnDataSource({'A': [1, 2, 3, 4, 5],
                       'B': [5, 4, 3, 2, 1],
                       'C': [1, 3, 5, 1, 2]})
df.data

{'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1], 'C': [1, 3, 5, 1, 2]}
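
The automatic case mentioned above can be checked directly. The following is a minimal sketch, not part of the original notebook: the names `toy_df`, `implicit_source`, and `explicit_source` are illustrative, and it assumes a Bokeh version in which `figure` accepts `width` and `height`. It inspects the source Bokeh builds when plain lists are passed to a glyph method, and then wraps a pandas DataFrame explicitly.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Implicit: passing plain sequences to a glyph method makes Bokeh
# build a ColumnDataSource behind the scenes.
p = figure(width=300, height=300)
renderer = p.line([1, 2, 3], [4, 6, 5])
implicit_source = renderer.data_source
print(type(implicit_source).__name__)
print(sorted(implicit_source.data))

# Explicit: wrapping a DataFrame. Each column becomes a key in `.data`,
# and the DataFrame index is stored as an extra column.
toy_df = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 2, 1]})
explicit_source = ColumnDataSource(toy_df)
print(sorted(explicit_source.column_names))
```

Creating the source explicitly is mostly useful when several glyphs or plots should share the same data.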

## Create a Line Plot

We can create some random data to pass as our x and y values.

In [5]:
# plot a linear graph

from bokeh.models import HoverTool

# create toy data

x_ax = np.arange(10)
y_ax = np.random.rand(10)

# Create plot

line_plot = figure(width=600, height=425, title='Line Plot', x_axis_label='X', y_axis_label='Y')
line_plot.line(x_ax, y_ax, legend_label='line', line_width=2)

# add hover tool
line_plot.add_tools(HoverTool())
output_file('/line_chart.html')

show(line_plot)

### Creating a multi-variable line plot

In [6]:
output_notebook()

# create some random data
x_multi = np.arange(10)
y1_multi = np.random.rand(10)
y2_multi = np.random.rand(10)
y3_multi = np.random.rand(10)

# create an instance of the plot

multi_var_plot = figure(width=600, height=400, toolbar_location='below')
multi_var_plot.line(x_multi, y1_multi, color='yellow', line_width=4, legend_label='y1')
multi_var_plot.line(x_multi, y2_multi, color='blue', line_width=4, legend_label='y2')
multi_var_plot.line(x_multi, y3_multi, color='red', line_width=4, legend_label='y3')
multi_var_plot.add_tools(HoverTool())

output_file('/multiline_chart.html')

show(multi_var_plot)


## Creating Bar Charts

In [7]:
# create random data

x_ax = ['cat1', 'cat2', 'cat3', 'cat4', 'cat5']
y_ax = np.random.rand(5) * 10

# sort data

sorted_cat = sorted(x_ax, key=lambda x: y_ax[x_ax.index(x)], reverse=True)

# Create instance of the bar chart

bar_chart = figure(x_range=sorted_cat, title='Bar Chart', x_axis_label='X', y_axis_label='Y', height=300)
# use vbar for vertical bars and hbar for horizontal bars
bar_chart.vbar(x_ax, top=y_ax, color='blue', width=0.4)
bar_chart.y_range.start = 0
bar_chart.add_tools(HoverTool())
output_file('/bar_chart.html')
show(bar_chart)
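
The sorting step in the cell above looks up each category's position with `list.index`, which rescans the list for every item and relies on the category names being unique. As a sketch of one alternative (not the notebook's own approach), the categories can be paired with their values first and the pairs sorted by value. The fixed `values` list here is hypothetical and stands in for the random data.

```python
categories = ['cat1', 'cat2', 'cat3', 'cat4', 'cat5']
values = [3.2, 7.5, 1.1, 9.8, 4.4]  # hypothetical values replacing np.random.rand

# Pair each category with its value, then sort the pairs by value, largest first
pairs = sorted(zip(categories, values), key=lambda pair: pair[1], reverse=True)

# Keep only the category names, now in the order the bars should appear
sorted_categories = [name for name, _ in pairs]
print(sorted_categories)  # ['cat4', 'cat2', 'cat5', 'cat1', 'cat3']
```

The resulting list can be passed to `figure(x_range=...)` in the same way as `sorted_cat`.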

### Stacked Bar Chart

In [8]:
# Stacked Bar Chart
df_stacked = pd.DataFrame({'y': [1, 2, 3, 4, 5],
                           'x1': [1,2, 4, 3, 4],
                           'x2' : [1, 4, 2, 2, 3]})

cds_stacked = ColumnDataSource(df_stacked)

stacked_bar_chart = figure(width=600, height=300, title='Stacked Bar Chart')
stacked_bar_chart.hbar_stack(['x1', 'x2'],
                             y='y',
                             height=0.8,
                             color=('green', 'lightgreen'),
                             source=cds_stacked
                             )
stacked_bar_chart.add_tools(HoverTool())
output_file('/stacked_bar_chart.html')
show(stacked_bar_chart)

## Bar Charts Using Grouped-by Data

In [9]:
from bokeh.core.properties import value
from bokeh.transform import dodge

# create some random data

categories = ['category1', 'category2', 'category3']

df_grouped = pd.DataFrame({'categories' : categories,
                           '2018' : [2, 1, 4],
                           '2019' : [5, 3, 3],
                           '2020' : [3, 2, 4]})

# create instance of a figure
bar_grouped = figure(x_range=categories, y_range=(0, 10), height=350)

# define position of bars on chart
dodge1 = dodge('categories', -0.25, range=bar_grouped.x_range)
dodge2 = dodge('categories', 0.0, range=bar_grouped.x_range)
dodge3 = dodge('categories', 0.25, range=bar_grouped.x_range)

bar_grouped.vbar(x=dodge1, top='2018', width=0.2, source=df_grouped, color='blue', legend_label='2018')
bar_grouped.vbar(x=dodge2, top='2019', width=0.2, source=df_grouped, color='green', legend_label='2019')
bar_grouped.vbar(x=dodge3, top='2020', width=0.2, source=df_grouped, color='red', legend_label='2020')

# configure legend

bar_grouped.legend.location = 'top_left'
bar_grouped.legend.orientation = 'horizontal'


bar_grouped.add_tools(HoverTool())
output_file('/grouped_bar_chart.html')
show(bar_grouped)


In [10]:
df_grouped.head()

Unnamed: 0,categories,2018,2019,2020
0,category1,2,5,3
1,category2,1,3,2
2,category3,4,3,4


In [11]:
print(bar_chart.x_range)

FactorRange(id='1554', ...)


## Stacked Area Chart


In [12]:
# create dummy data for the chart

df_area_stacked = pd.DataFrame({'x' : [1, 2, 3, 4, 5],
                                'y1' : [1, 3, 1, 4, 5],
                                'y2' : [1, 2, 3, 4, 2]})

stacked_area_chart = figure(width=600, height=300)

stacked_area_chart.varea_stack(['y1', 'y2'],
                               x = 'x',
                               color = ('coral', 'cadetblue'),
                               source = df_area_stacked)

output_file('/stacked_area_bar_chart.html')
show(stacked_area_chart)

In [13]:
# Scatter Plots
# Load the car dataset from vega
df_cars = vds.cars()
df_cars.head()

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA


In [14]:
# Plot the car's weight against its MPG (miles per gallon)

x_weight = df_cars.Weight_in_lbs
y_mpg = df_cars.Miles_per_Gallon

# create an instance of the plot

scatter_plot = figure(title='Weight and MPG Comparison', height=400, width=600,
                      x_axis_label='Weight in pounds', y_axis_label='MPG')
scatter_plot.circle(x_weight, y_mpg, size=10, line_color='gray', fill_color='blue', fill_alpha=0.5)
scatter_plot.add_tools(HoverTool())

output_file('/scatter_plot_1.html')
show(scatter_plot)


In [15]:
# Scatter plot for different categories

# load iris df from vega

df_iris = vds.iris()
df_iris.head()

Unnamed: 0,sepalLength,sepalWidth,petalLength,petalWidth,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [16]:
from bokeh.transform import factor_cmap, factor_mark

# define the categories and the marker used for each one
species = ['setosa', 'versicolor', 'virginica']
markers = ['hex', 'cross', 'triangle']

sub_scatter = figure(width=600,
                     height=400,
                     title='Iris Scatter Plot',
                     x_axis_label='Petal Length',
                     y_axis_label='Petal Width')

# legend_group builds one legend entry per distinct value in the 'species' column
sub_scatter.scatter(x='petalLength',
                    y='petalWidth',
                    source=df_iris,
                    legend_group='species',
                    fill_alpha=0.5,
                    size=15,
                    color=factor_cmap(field_name='species', palette='Dark2_3', factors=species),
                    marker=factor_mark('species', markers, species))

sub_scatter.legend.location="top_left"
output_file('/sub_scatter.html')
show(sub_scatter)

In [17]:
df_cars.head()

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA


## Creating Subplots

To create subplots, import `gridplot` from `bokeh.layouts` and pass it the list of figures to arrange.

In [18]:
from bokeh.layouts import gridplot

# load data from the cars dataset

subplot_x1 = df_cars['Acceleration']; subplot_y1 = df_cars['Miles_per_Gallon']
subplot_x2 = df_cars['Cylinders']; subplot_y2 = df_cars['Miles_per_Gallon']
subplot_x3 = df_cars['Horsepower']; subplot_y3 = df_cars['Miles_per_Gallon']
subplot_x4 = df_cars['Weight_in_lbs']; subplot_y4 = df_cars['Miles_per_Gallon']




# create an instance for every subplot
subplot1 = figure(title='Acceleration & MPG Comparison', width=300, height=300, x_axis_label='Acceleration', y_axis_label='MPG')
subplot2 = figure(title='Cylinders & MPG Comparison', width=300, height=300, x_axis_label='Cylinders', y_axis_label='MPG')
subplot3 = figure(title='Horsepower & MPG Comparison', width=300, height=300, x_axis_label='Horsepower', y_axis_label='MPG')
subplot4 = figure(title='Weight/lbs & MPG Comparison', width=300, height=300, x_axis_label='Weight / lbs', y_axis_label='MPG')

# create the plots
subplot1.circle(subplot_x1, subplot_y1, color='red')
subplot2.circle(subplot_x2, subplot_y2, color='blue')
subplot3.circle(subplot_x3, subplot_y3, color='green')
subplot4.circle(subplot_x4, subplot_y4, color='gray')

# create the grid
grid = gridplot([subplot1, subplot2, subplot3, subplot4], ncols=2)

output_file('/subplots_grid.html')
# show grid
show(grid)



## Linked Plots

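In Bokeh, we can link multiple plots so that panning or zooming one of them updates the others. Plots are linked by sharing range objects: passing `x_range=linked_plot1.x_range` and `y_range=linked_plot1.y_range` to another figure ties its axes to those of the first plot. We reuse the grid plot above and link the first three plots to show this concept.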

In [19]:
from bokeh.layouts import gridplot

# load data from the cars dataset

linked_plotx1 = df_cars['Acceleration']; linked_ploty1 = df_cars['Miles_per_Gallon']
linked_plotx2 = df_cars['Cylinders']; linked_ploty2 = df_cars['Miles_per_Gallon']
linked_plotx3 = df_cars['Horsepower']; linked_ploty3 = df_cars['Miles_per_Gallon']
linked_plotx4 = df_cars['Weight_in_lbs']; linked_ploty4 = df_cars['Miles_per_Gallon']


# create an instance for every subplot
linked_plot1 = figure(title='Acceleration & MPG Comparison', width=300, height=300,
                      x_axis_label='Acceleration', y_axis_label='MPG')
linked_plot2 = figure(title='Cylinders & MPG Comparison', width=300, height=300,
                      x_axis_label='Cylinders', y_axis_label='MPG', x_range=linked_plot1.x_range, y_range=linked_plot1.y_range)
linked_plot3 = figure(title='Horsepower & MPG Comparison', width=300, height=300,
                      x_axis_label='Horsepower', y_axis_label='MPG', x_range=linked_plot1.x_range, y_range=linked_plot1.y_range)
linked_plot4 = figure(title='Weight/lbs & MPG Comparison', width=300, height=300,
                      x_axis_label='Weight / lbs', y_axis_label='MPG')

# create the plots
linked_plot1.circle(linked_plotx1, linked_ploty1, color='red')
linked_plot2.circle(linked_plotx2, linked_ploty2, color='blue')
linked_plot3.circle(linked_plotx3, linked_ploty3, color='green')
linked_plot4.circle(linked_plotx4, linked_ploty4, color='gray')

# create the grid
linked_grid_plot = gridplot([linked_plot1, linked_plot2, linked_plot3, linked_plot4], ncols=2)

output_file('/linked_grid_plot.html')
# show grid
show(linked_grid_plot)


## Adding Lasso and Box Select to Linked Plots

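We can explore our data further by adding the Lasso Select and Box Select tools to our linked plots. Because both plots draw from the same `ColumnDataSource`, points selected in one plot are highlighted in the other. For this example, we work with a weather dataset that contains temperature readings as well as the weather conditions.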

In [20]:
# load Seattle weather data from vega

df_weather = vds.seattle_weather()
df_weather.head()

Unnamed: 0,date,precipitation,temp_max,temp_min,wind,weather
0,2012-01-01,0.0,12.8,5.0,4.7,drizzle
1,2012-01-02,10.9,10.6,2.8,4.5,rain
2,2012-01-03,0.8,11.7,7.2,2.3,rain
3,2012-01-04,20.3,12.2,5.6,4.7,rain
4,2012-01-05,1.3,8.9,2.8,6.1,rain


In [25]:
from bokeh.transform import factor_cmap, factor_mark

TOOLS = 'box_select, lasso_select, reset, wheel_zoom, pan' 

weather_conditions = ['drizzle', 'rain', 'sun', 'snow', 'fog']
weather_markers = ['hex', 'cross', 'triangle', 'square', 'circle_x']

# create a ColumnDataSource

cds_weather = ColumnDataSource(df_weather)

# create 1st scatter plot
weather_plot = figure(width=900, height=400, x_axis_type='datetime',
                      y_axis_label='Temperature', tools=TOOLS)
weather_plot.circle('date', 'temp_max', size=10, fill_alpha=0.2, source=cds_weather)

# create 2nd scatter plot, with temperature colored and marked by weather condition
weather_detail = figure(width=900, height=400, x_axis_type='datetime',
                      y_axis_label='Temperature', tools=TOOLS)
weather_detail.scatter('date', 'temp_max', size=10, fill_alpha=0.2, source=cds_weather,
                     color=factor_cmap(field_name='weather', palette='Dark2_5',
                     factors=weather_conditions),
                     marker=factor_mark('weather', weather_markers, weather_conditions),
                     legend_group='weather')


weather_plot.add_tools(HoverTool(tooltips=[('date', '@date{%F}'), ('temp', '@temp_max')],
                      formatters={'@date': 'datetime'}))
weather_detail.add_tools(HoverTool(tooltips=[('date', '@date{%F}'), ('condition', '@weather')],
                                   formatters={'@date': 'datetime'}))

# configure legend

weather_detail.legend.location='top_left'
weather_detail.legend.orientation='horizontal'


# create grid
weather_grid = gridplot([[weather_plot], [weather_detail]])
# set the output file before calling show() so the grid is saved to it
output_file('/linked_lasso_box_grid_plot.html')
show(weather_grid)
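
The linked selection in the cell above comes from both glyphs reading the same `ColumnDataSource`. The sketch below illustrates this on a small made-up dataset; the names `shared`, `left`, and `right` are hypothetical, and setting `selected.indices` by hand only stands in for a selection made with the box or lasso tool. It assumes a Bokeh version in which `figure` accepts `width` and `height`.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One source shared by both plots (toy values)
shared = ColumnDataSource({'x': [1, 2, 3, 4],
                           'y1': [4, 3, 2, 1],
                           'y2': [1, 4, 2, 3]})
tools = 'box_select,lasso_select,reset'

left = figure(width=300, height=300, tools=tools)
right = figure(width=300, height=300, tools=tools)
r_left = left.scatter('x', 'y1', size=10, source=shared)
r_right = right.scatter('x', 'y2', size=10, source=shared)

# Stand-in for a box or lasso selection: the selection lives on the source,
# so every glyph that uses the source sees the same selected rows.
shared.selected.indices = [0, 2]
print(r_left.data_source is r_right.data_source)
print(r_right.data_source.selected.indices)

layout = gridplot([[left, right]])
```

If each plot were given its own source instead, a selection in one plot would leave the other unchanged.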




## Creating a Color Bar Graph

Color bars can quickly help us understand our data, as the color changes with the magnitude of the values.
We can reuse the cars dataset to illustrate this concept by plotting MPG against horsepower and coloring each point by the car's weight.


In [26]:
df_cars.head()

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA


In [35]:
from bokeh.models import LinearColorMapper, ColorBar
from bokeh.transform import transform

# map values to a color palette
color_mapper = LinearColorMapper(palette="Cividis256", low=df_cars.Weight_in_lbs.min(), 
                                 high=df_cars.Weight_in_lbs.max())

# create instance of the plot

color_bar_plot = figure(width=800, height=600,
                        x_axis_label="Horsepower", 
                        y_axis_label="MPG")

color_bar_plot.circle(x="Horsepower",
                      y="Miles_per_Gallon",
                      source=df_cars,
                      color=transform('Weight_in_lbs', color_mapper),
                      size=10,
                      alpha=0.5)

# add HoverTool
color_bar_plot.add_tools(HoverTool(tooltips=[('MPG', '@Miles_per_Gallon'), ('Horsepower', '@Horsepower')]))


# display the colors in the bar based on the color mapper
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=10, 
                     location=(0, 0), title="Weight")
color_bar_plot.add_layout(color_bar, 'right')

# set the output file before calling show() so the plot is saved to it
output_file('/color_bar_plot.html')
show(color_bar_plot)
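
To build intuition for what a linear color mapper does, here is a simplified pure-Python sketch of the idea. It is an approximation for illustration only, not Bokeh's actual implementation (the real mapping happens in the browser). The function `linear_color_index`, the four-color palette, and the weight range are all hypothetical.

```python
def linear_color_index(value, low, high, n_colors):
    """Return the palette index for `value` under a simplified linear mapping."""
    if value <= low:
        return 0
    if value >= high:
        return n_colors - 1
    # Position of the value within [low, high], as a fraction between 0 and 1
    fraction = (value - low) / (high - low)
    # Split the range into n_colors equal-width bins and pick the matching one
    return min(int(fraction * n_colors), n_colors - 1)


palette = ['#00204c', '#575c6d', '#a69c75', '#ffe945']  # hypothetical 4-color palette
low, high = 1600, 5200  # hypothetical weight range in lbs

for weight in (1600, 2500, 3400, 5200):
    print(weight, palette[linear_color_index(weight, low, high, len(palette))])
```

With a 256-color palette such as `Cividis256`, the bins are much narrower, so the colors appear to change continuously with weight.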