# Working with Bokeh
Bokeh is a Python library for creating powerful and interactive visualizations for modern web browsers. It helps you build beautiful graphics, ranging from simple plots to complex dashboards with streaming datasets. With Bokeh, you can create JavaScript-powered visualizations without writing any JavaScript yourself.

In [1]:
import pandas as pd
import numpy as np

from bokeh.io import output_file
from bokeh.layouts import layout
from bokeh.plotting import figure, show

from bokeh.models import ColumnDataSource
from bokeh.models import Div, RangeSlider, Spinner

In [2]:
#!pip install jupyter_bokeh

In [3]:
# Import libraries and enable notebook output
from bokeh.io import output_notebook # enables plot interface in Jupyter notebook
output_notebook()

## Create a Simple Line Graph from yearly sales data

In [4]:
# Create a Single line Graph with year and sales data

# prepare data
sales = [29909, 31956, 34527, 37520, 38945, 42904]
years = [2016, 2017, 2018, 2019, 2020, 2021]

# create a new plot with a title and axis labels
p = figure(title="Simple line example", x_axis_label="Year", y_axis_label="Sales")

# add a line renderer with legend and line thickness
p.line(years, sales, legend_label="Sales Trend", line_width=3)

# display the plot
show(p)

In [5]:
# prepare multi-city sales data - represented by a multi-line graph

sales_city1 = [29909, 31956, 34527, 37520, 38945, 42904]
sales_city2 = [23112, 24324, 25646, 25879, 26342, 26903]
sales_city3 = [32110, 35319, 37459, 38784, 38765, 37632]

years = [2016, 2017, 2018, 2019, 2020, 2021]

# create a new plot with a title and axis labels
p = figure(title="Multiple line example", x_axis_label='Year', y_axis_label='Sales')

# add multiple renderers
p.line(years, sales_city1, legend_label="City 1", line_color="blue", line_width=2)
p.line(years, sales_city2, legend_label="City 2", line_color="red", line_width=2)
p.line(years, sales_city3, legend_label="City 3", line_color="green", line_width=2)

show(p)

### Working with Bar, Circle and Line Graphs together

In [6]:
# prepare multi-city sales data - represented by three different types

sales_city1 = [29909, 31956, 34527, 37520, 38945, 42904]
sales_city2 = [23112, 24324, 25646, 25879, 26342, 26903]
sales_city3 = [32110, 35319, 37459, 38784, 38765, 37632]

years = [2016, 2017, 2018, 2019, 2020, 2021]

# create a new plot with a title and axis labels
p = figure(title="Multiple glyphs example", x_axis_label="Year", y_axis_label="Sales")

# add multiple renderers
p.line(years, sales_city1, legend_label="City 1", line_color="blue", line_width=2)
p.vbar(x=years, top=sales_city2, legend_label="City 2", width=0.5, bottom=0, color="red")
p.circle(years, sales_city3, legend_label="City 3", line_color="yellow", size=10)

# show the results
show(p)

### Customising your Plots
We will see how to customise various aspects of our graphs.

In [7]:
from bokeh.io import curdoc

# Set the theme; built-in options are caliber, dark_minimal, light_minimal, night_sky, and contrast
curdoc().theme = "caliber"

In [8]:
p = figure(
    title="Plot responsive sizing example",
    sizing_mode="stretch_width",
    plot_height=400,
    plot_width=800,
    x_axis_label="Years",
    y_axis_label="Sales",
)

# add circle renderer
circle = p.circle(years, sales_city1, fill_color="red", size=10)

# change some things about the x-axis
p.xaxis.axis_label = "Years"
p.xaxis.axis_line_width = 1
p.xaxis.axis_line_color = "red"

# change some things about the y-axis
p.yaxis.axis_label = "Sales"
p.yaxis.major_label_text_color = "orange"
p.yaxis.major_label_orientation = "vertical"

# show the results
show(p)

## Vectorising Glyph Properties
We will use vectors of data to set properties of a plot's glyphs, such as the radius of each circle.
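
One common use of vectorised properties is a bubble chart, where a data column drives the marker size. The sketch below is a minimal, plain-Python illustration of one way to derive such a vector: it scales values so that marker area, rather than radius, tracks the data. The helper name `bubble_sizes` and the `max_size` default are assumptions made for this illustration and are not part of Bokeh.

```python
import math

def bubble_sizes(populations, max_size=40.0):
    """Map populations to marker sizes so that marker AREA tracks population.

    Hypothetical helper: the largest population gets max_size, and every other
    value is scaled by the square root of its share of the largest value.
    """
    largest = max(populations)
    return [max_size * math.sqrt(p / largest) for p in populations]

# 2007 populations of Afghanistan, Albania and Algeria from the GapMinder data
pops = [31889923.0, 3600523.0, 33333216.0]
sizes = bubble_sizes(pops)
```

A list like `sizes` could then be passed as the `size` argument of a circle renderer, where the values are interpreted in screen units rather than data units.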

In [9]:
# Use the GapMinder Dataset - publicly available
# 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/gapminderData.csv'

gmdata = pd.read_csv('gapminderData.csv')
gmdata

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
0,Afghanistan,1952,8425333.0,Asia,28.801,779.445314
1,Afghanistan,1957,9240934.0,Asia,30.332,820.853030
2,Afghanistan,1962,10267083.0,Asia,31.997,853.100710
3,Afghanistan,1967,11537966.0,Asia,34.020,836.197138
4,Afghanistan,1972,13079460.0,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418.0,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340.0,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948.0,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563.0,Africa,39.989,672.038623


In [10]:
# Filter all records from 2007 [Each country has one record]

latestrecs = gmdata.loc[gmdata['year']==2007]
latestrecs

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
11,Afghanistan,2007,31889923.0,Asia,43.828,974.580338
23,Albania,2007,3600523.0,Europe,76.423,5937.029526
35,Algeria,2007,33333216.0,Africa,72.301,6223.367465
47,Angola,2007,12420476.0,Africa,42.731,4797.231267
59,Argentina,2007,40301927.0,Americas,75.320,12779.379640
...,...,...,...,...,...,...
1655,Vietnam,2007,85262356.0,Asia,74.249,2441.576404
1667,West Bank and Gaza,2007,4018332.0,Asia,73.422,3025.349798
1679,"Yemen, Rep.",2007,22211743.0,Asia,62.698,2280.769906
1691,Zambia,2007,11746035.0,Africa,42.384,1271.211593


In [11]:
# Create a circle/bubble plot - plotting circles along the GDP and Life Expectancy values
# Bubble sizes representing population ratios

# create a new plot with a specific size
p = figure(
    title="Vectorized colors and radii example",
    plot_height=400,
    plot_width=800,
)

# add circle renderer
p.circle(
    latestrecs.gdpPercap,               # X-axis is GDP Per capita
    latestrecs.lifeExp,                 # Y-Axis is Life expectancy
    radius=latestrecs['pop']/500000,    # Determine Bubble size
    fill_alpha=0.8,
    line_color="red",
)

# show the results
show(p)

## Combining Multiple Plots using Layouts 

In [12]:
# In this example, we will combine three separate plots on the same canvas. 
# This is similar to having three separate subplots.  

# prepare some data
sales_city1 = [29909, 31956, 34527, 37520, 38945, 42904]
sales_city2 = [23112, 24324, 25646, 25879, 26342, 26903]
sales_city3 = [32110, 35319, 37459, 38784, 38765, 37632]

years = [2016, 2017, 2018, 2019, 2020, 2021]

# create three plots with one renderer each
s1 = figure(plot_width=250, plot_height=250, background_fill_color="#fafafa")
s1.circle(years, sales_city1, size=12, color="#53777a", alpha=0.8)   # Create a circle renderer using city 1 sales

s2 = figure(plot_width=250, plot_height=250, background_fill_color="#fafafa")
s2.triangle(years, sales_city2, size=12, color="#c02942", alpha=0.8)  # Create a Triangle renderer using city 2 sales

s3 = figure(plot_width=250, plot_height=250, background_fill_color="#fafafa")
s3.square(years, sales_city3, size=12, color="#d95b43", alpha=0.8)    # Create a Square renderer using city 3 sales

# Combining the plots in a Row Layout
from bokeh.layouts import row
show(row(s1, s2, s3))

In [13]:
# Combining the plots in a Column Layout

from bokeh.layouts import column
show(column(s1, s2, s3))

In [14]:
# Combining the plots in a Grid Layout

from bokeh.layouts import gridplot
grid = gridplot([[s1, None, s2], [None, s3, None]], plot_width=300, plot_height=300)

show(grid)

### Using ColumnDataSource object to import and filter data

In [15]:
# The ColumnDataSource is Bokeh's own internal data structure similar to pandas and numpy data structures.
# ColumnDataSource supports a number of functions and interoperability with numpy and pandas. 

from bokeh.models import ColumnDataSource

# create dict as basis for ColumnDataSource

data = {'sales_city3': [32110, 35319, 37459, 38784, 38765, 37632], 
        'years': [2016, 2017, 2018, 2019, 2020, 2021]}

# create ColumnDataSource based on dict
source = ColumnDataSource(data=data)

# create a plot and renderer with ColumnDataSource data
p = figure(plot_width=250, plot_height=250, background_fill_color="#fafafa")
p.circle(x='years', y='sales_city3', source=source)
show(p)
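
As a small, hedged sketch of working with the source directly, the following adds a derived column to a ColumnDataSource built from the same dictionary. It relies only on `ColumnDataSource.add` and the `column_names` property; the `change` column is introduced purely for illustration. Every column in a source is expected to have the same length, which is why the first entry is padded with 0.

```python
from bokeh.models import ColumnDataSource

data = {
    "sales_city3": [32110, 35319, 37459, 38784, 38765, 37632],
    "years": [2016, 2017, 2018, 2019, 2020, 2021],
}
source = ColumnDataSource(data=data)

# Year-over-year change; the first year has no predecessor, so it is set to 0
sales = list(source.data["sales_city3"])
change = [0] + [later - earlier for earlier, later in zip(sales, sales[1:])]

# Add the derived values as a new column (illustrative name "change")
source.add(change, name="change")
column_names = sorted(source.column_names)
```

A renderer could then refer to the new column by name, for example `y='change'`, in the same way the cell above refers to `'sales_city3'`.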

## We can also convert DataFrame or Series objects into ColumnDataSource
### Explore the use of Views and Filters on ColumnDataSource object

In [16]:
gmdata = pd.read_csv('gapminderData.csv')
latestrecs = gmdata[(gmdata['continent']=='Europe') & (gmdata['year']==2007)]
latestrecs

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
23,Albania,2007,3600523.0,Europe,76.423,5937.029526
83,Austria,2007,8199783.0,Europe,79.829,36126.4927
119,Belgium,2007,10392226.0,Europe,79.441,33692.60508
155,Bosnia and Herzegovina,2007,4552198.0,Europe,74.852,7446.298803
191,Bulgaria,2007,7322858.0,Europe,73.005,10680.79282
383,Croatia,2007,4493312.0,Europe,75.748,14619.22272
407,Czech Republic,2007,10228744.0,Europe,76.486,22833.30851
419,Denmark,2007,5468120.0,Europe,78.332,35278.41874
527,Finland,2007,5238460.0,Europe,79.313,33207.0844
539,France,2007,61083916.0,Europe,80.657,30470.0167


In [17]:
# Convert the latestrecs (pandas DataFrame) into a ColumnDataSource

gm_cds = ColumnDataSource(latestrecs)

### Working with IndexFilter

#### The IndexFilter is the simplest filter type. It has an indices property, which is a list of integers that are the indices of the data you want to include in your plot.
Bokeh uses a concept called “view” to select subsets of data. Views are represented by Bokeh’s CDSView class. When you use a view, you can use one or more filters to select specific data points without changing the underlying data. 
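
To make the idea of selecting rows without changing the underlying data concrete, here is a plain-Python analogy of an index-based selection. It is not Bokeh's implementation; the helper `select_rows` is hypothetical and the small column dictionary reuses a few values from the Europe 2007 records.

```python
# Plain-Python analogy of an index-based view (not Bokeh's implementation)
columns = {
    "country": ["Albania", "Austria", "Belgium", "Bulgaria"],
    "lifeExp": [76.423, 79.829, 79.441, 73.005],
}

def select_rows(columns, indices):
    """Return a new dict holding only the rows at the given positions."""
    return {name: [values[i] for i in indices] for name, values in columns.items()}

# Keep the rows at positions 0 and 2; the original columns stay untouched
subset = select_rows(columns, [0, 2])
```

Note that the indices are positions counted from 0 within the data source, not the index labels (23, 83, ...) that the original DataFrame carried.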

### In the following example, we plot a circle graph from a ColumnDataSource object and apply a row filter using the IndexFilter.
The filtered plot displays only the first 8 rows of the dataset.

In [18]:
# We apply a row filter (IndexFilter) and create a view (subset) from the original ColumnDataSource.
# Note that several filter types, such as IndexFilter, BooleanFilter and GroupFilter, are
# available for use with ColumnDataSource objects.

from bokeh.layouts import gridplot
from bokeh.models import CDSView, IndexFilter

# create a view using an IndexFilter with the ROW index positions [0-7]
view = CDSView(source=gm_cds, filters=[IndexFilter([0, 1, 2, 3, 4, 5, 6, 7])])

# setup tools
tools = ["box_select", "hover", "reset"]

# create a first plot with all data in the ColumnDataSource
p1 = figure(plot_height=300, plot_width=300, tools=tools)
p1.circle(x="gdpPercap", y="lifeExp", size=4, hover_color="blue", source=gm_cds)

# create a second plot with a subset of ColumnDataSource, based on view
p1_filtered = figure(plot_height=300, plot_width=300, tools=tools)
p1_filtered.circle(x="gdpPercap", y="lifeExp", size=4, hover_color="red", source=gm_cds, view=view)

# show both plots next to each other in a gridplot layout
show(gridplot([[p1, p1_filtered]]))

### Working with BooleanFilter
A BooleanFilter selects rows from a data source using a list of True or False values in its booleans property.
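
The list of booleans needs one entry per row of the data source. The following plain-Python sketch shows how such a list could be built when two conditions must hold at once; the GDP threshold of 30000 is an assumption chosen for illustration, and the values are the first five Europe 2007 records.

```python
life_exp = [76.423, 79.829, 79.441, 74.852, 73.005]
gdp_per_cap = [5937.029526, 36126.4927, 33692.60508, 7446.298803, 10680.79282]

# One boolean per row: True keeps the row, False hides it
booleans = [
    (life > 75) and (gdp > 30000)
    for life, gdp in zip(life_exp, gdp_per_cap)
]

# Positions of the rows that would remain visible
kept_positions = [i for i, keep in enumerate(booleans) if keep]
```

Because the comparisons run on plain Python floats, each entry is a built-in `bool`, which is the kind of value the `booleans` property described above holds.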

### We want to display only the points where Life Expectancy > 75

In [19]:
# We now apply a Boolean filter (BooleanFilter) and create a view (subset) from the original ColumnDataSource.

from bokeh.models import BooleanFilter

# create a view using a BooleanFilter for all values of Life Expectancy > 75
booleans = [True if y_val > 75 else False for y_val in gm_cds.data['lifeExp']]
view = CDSView(source=gm_cds, filters=[BooleanFilter(booleans)])

# setup tools
tools = ["box_select", "hover", "reset"]

# create a first plot with all data in the ColumnDataSource
p1 = figure(plot_height=300, plot_width=300, tools=tools)
p1.circle(x="gdpPercap", y="lifeExp", size=4, hover_color="blue", source=gm_cds)

# create a second plot with a subset of ColumnDataSource, based on view
p1_filtered = figure(plot_height=300, plot_width=300, tools=tools)
p1_filtered.circle(x="gdpPercap", y="lifeExp", size=4, hover_color="blue", source=gm_cds, view=view)

# show both plots next to each other in a gridplot layout
show(gridplot([[p1, p1_filtered]]))

### Working with GroupFilter
The GroupFilter is a filter for categorical data. With this filter, you can select rows from a dataset that are members of a specific category.
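
Before choosing a category to filter on, it can help to see which categories exist and how many rows each holds. The sketch below is a plain-Python analogy of category-based selection, not Bokeh's implementation; the helper `members_of` is hypothetical and the rows are the first five 2007 records.

```python
from collections import Counter

continents = ["Asia", "Europe", "Africa", "Africa", "Americas"]
countries = ["Afghanistan", "Albania", "Algeria", "Angola", "Argentina"]

# How many rows fall into each category
rows_per_group = Counter(continents)

def members_of(group, labels, values):
    """Return the values whose category label equals the requested group."""
    return [value for label, value in zip(labels, values) if label == group]

african_countries = members_of("Africa", continents, countries)
```

A category that does not occur in the column simply selects no rows, so checking `rows_per_group` first is a quick way to catch a misspelt group name.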

In [20]:
gmdata

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
0,Afghanistan,1952,8425333.0,Asia,28.801,779.445314
1,Afghanistan,1957,9240934.0,Asia,30.332,820.853030
2,Afghanistan,1962,10267083.0,Asia,31.997,853.100710
3,Afghanistan,1967,11537966.0,Asia,34.020,836.197138
4,Afghanistan,1972,13079460.0,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418.0,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340.0,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948.0,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563.0,Africa,39.989,672.038623


In [21]:
gmdata_2007 = gmdata[(gmdata['year']==2007)]
gm_cds1 = ColumnDataSource(gmdata_2007)
gmdata_2007

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
11,Afghanistan,2007,31889923.0,Asia,43.828,974.580338
23,Albania,2007,3600523.0,Europe,76.423,5937.029526
35,Algeria,2007,33333216.0,Africa,72.301,6223.367465
47,Angola,2007,12420476.0,Africa,42.731,4797.231267
59,Argentina,2007,40301927.0,Americas,75.320,12779.379640
...,...,...,...,...,...,...
1655,Vietnam,2007,85262356.0,Asia,74.249,2441.576404
1667,West Bank and Gaza,2007,4018332.0,Asia,73.422,3025.349798
1679,"Yemen, Rep.",2007,22211743.0,Asia,62.698,2280.769906
1691,Zambia,2007,11746035.0,Africa,42.384,1271.211593


### We want to display the GDP-Life Expectancy relationship in all countries in the Americas

In [22]:
# We now apply a group filter (GroupFilter) and create a view (subset) from the original ColumnDataSource.

from bokeh.models import GroupFilter

# create a view using a GroupFilter for all values of continent = Americas

view1 = CDSView(source=gm_cds1, filters=[GroupFilter(column_name='continent', group='Americas')])

# setup tools
tools = ["box_select", "hover", "reset"]

# create a first plot with all data in the ColumnDataSource
p1 = figure(plot_height=300, plot_width=300, tools=tools)
p1.circle(x="gdpPercap", y="lifeExp", size=4, hover_color="blue", source=gm_cds1)

# create a second plot with a subset of ColumnDataSource, based on view
p1_filtered = figure(plot_height=300, plot_width=300, tools=tools)
p1_filtered.circle(x="gdpPercap", y="lifeExp", size=4, hover_color="blue", source=gm_cds1, view=view1)

# show both plots next to each other in a gridplot layout
show(gridplot([[p1, p1_filtered]]))

## Handling Categorical data

In [23]:
# Plot a bar graph of city-wise sales numbers

from bokeh.palettes import Spectral6

city = ['Chicago', 'Houston', 'Columbus', 'Seattle', 'Austin', 'Boston']
sales = [32110, 35319, 37459, 38784, 38765, 37632]

source = ColumnDataSource(data=dict(city=city, sales=sales, color=Spectral6))

p = figure(x_range=city, y_range=(0, 50000), plot_height=250, title="City-Sales",
           toolbar_location=None, tools="")

p.vbar(x='city', top='sales', width=0.6, color='color', legend_field="city", source=source)

p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_left"

show(p)

## Working with Widgets

### In the following example, we will see how to create a circle-plot and control the X-Axis scale and circle sizes through Spinner and Range Slider Widgets

In [24]:
# Use the GapMinder Dataset - publicly available
# 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/gapminderData.csv'

gmdata = pd.read_csv('gapminderData.csv')
gmdata

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
0,Afghanistan,1952,8425333.0,Asia,28.801,779.445314
1,Afghanistan,1957,9240934.0,Asia,30.332,820.853030
2,Afghanistan,1962,10267083.0,Asia,31.997,853.100710
3,Afghanistan,1967,11537966.0,Asia,34.020,836.197138
4,Afghanistan,1972,13079460.0,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418.0,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340.0,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948.0,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563.0,Africa,39.989,672.038623


In [25]:
# Filter all records from 2007 [Each country has one record]

latestrecs = gmdata.loc[gmdata['year']==2007]
latestrecs

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
11,Afghanistan,2007,31889923.0,Asia,43.828,974.580338
23,Albania,2007,3600523.0,Europe,76.423,5937.029526
35,Algeria,2007,33333216.0,Africa,72.301,6223.367465
47,Angola,2007,12420476.0,Africa,42.731,4797.231267
59,Argentina,2007,40301927.0,Americas,75.320,12779.379640
...,...,...,...,...,...,...
1655,Vietnam,2007,85262356.0,Asia,74.249,2441.576404
1667,West Bank and Gaza,2007,4018332.0,Asia,73.422,3025.349798
1679,"Yemen, Rep.",2007,22211743.0,Asia,62.698,2280.769906
1691,Zambia,2007,11746035.0,Africa,42.384,1271.211593


In [26]:
# We will plot the Life Expectancy and GDP Per Capita Data

x = latestrecs['lifeExp']
y = latestrecs['gdpPercap']

output_file("gapminder-with-widgets.html")

p = figure(x_range=(1,len(x)), plot_width=500, plot_height=250)
points = p.circle(x=x, y=y, size=7, fill_color="#21a7df")

In [27]:
len(x)

142

In [28]:
# Create a 'Div' element for the caption

div = Div(
    text="""
        <p>Select the circle's size using this control element:</p>
        """,
    width=200,
    height=30,)

# Create a Spinner Object to control the circle size

spinner = Spinner(
    title="Circle size",  # a string to display above the widget
    low=0,  # the lowest possible number to pick
    high=60,  # the highest possible number to pick
    step=5,  # the increments by which the number can be adjusted
    value=points.glyph.size,  # the initial value to display in the widget
    width=200,)  #  the width of the widget in pixels
    
spinner.js_link("value", points.glyph, "size")

# Create a RangeSlider object to control the x-axis range

range_slider = RangeSlider(
    title="Adjust x-axis range", # a title to display above the slider
    start=0,  # set the minimum value for the slider
    end=180,  # set the maximum value for the slider
    step=1,  # increments for the slider
    value=(p.x_range.start, p.x_range.end),  # initial values for slider
    )
    
range_slider.js_link("value", p.x_range, "start", attr_selector=0)
range_slider.js_link("value", p.x_range, "end", attr_selector=1)

In [29]:
from bokeh.layouts import layout

# use a distinct name so the imported layout() function is not overwritten
widget_layout = layout([
    [div, spinner],
    [range_slider],
    [p],])

show(widget_layout)

## Working with Google Maps
To plot glyphs over a Google Map, use the gmap() function. For gmap() to work, you must pass it a Google API key and configure the Google Map underlay with GMapOptions. The Google API key will be stored in the Bokeh Document JSON.
In the following example, however, we do not pass an API key; the map still renders, albeit with a watermark.
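
When a real key is used, one option is to keep it out of the notebook source by reading it from an environment variable. This is a minimal sketch under that assumption: the variable name `GOOGLE_API_KEY` and the helper `load_gmap_api_key` are hypothetical, and the empty-string fallback mirrors the keyless call used in this notebook.

```python
import os

def load_gmap_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key stored in the given environment variable.

    Falls back to an empty string when the variable is not set, which
    corresponds to calling gmap() without a key.
    """
    return os.environ.get(env_var, "")

api_key = load_gmap_api_key()
```

The value of `api_key` could then be passed as the first argument to gmap() in place of the empty string. Keep in mind that the key still ends up in the Bokeh Document JSON, so this only avoids hard-coding it in the notebook itself.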

### In the following example, we will look into how we can bring up the Google map and plot specific locations using latitude and longitude values

In [30]:
# Use the Google Maps API through the gmap() function.
# Plot 4 specific locations in London using latitude and longitude values.
# The map shows a watermark because no API key is supplied.
# For proper output, supply a Google API key as the first parameter of gmap().

from bokeh.models import  GMapOptions
from bokeh.plotting import gmap

output_file("london.html")

map_options = GMapOptions(lat=51.5074, lng=-0.1278, map_type="roadmap", zoom=9)

# For GMaps to function, Google requires you obtain and enable an API key:
#
#     https://developers.google.com/maps/documentation/javascript/get-api-key
#
# Replace the value below with your personal API key:

p = gmap("", map_options, title="London")

source = ColumnDataSource(
    data=dict(lat=[51.697301,  51.688377,  51.293971, 51.334749],
              lon=[-0.444855, -0.098305, -0.297686, -0.154442])
)

p.circle(x="lon", y="lat", size=15, fill_color="red", fill_alpha=0.6, source=source)

show(p)

# To Do's in Future Version
## Annotation
## Bokeh Server
## WebGL
## Interactive Plots in Detail
## Exporting Plots
## AjaxDataSource