## Interactive Graphs using Bokeh

In [4]:
from bokeh.plotting import figure, show, output_notebook, output_file
from bokeh.models import ColumnDataSource, HoverTool, Panel
from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application
from bokeh.models.widgets import Slider, Tabs
from bokeh.layouts import row, WidgetBox
from scipy.stats import gaussian_kde
import numpy as np
import pandas as pd
import yfinance as yf

In this Example Class, we demonstrate how to add interactivity to graphical output. We use the Bokeh module because, like Plotly, it can generate interactive data visualizations without requiring any knowledge of JavaScript. Bokeh can also handle a variety of data types and large amounts of data. Finally, web applications can be developed easily, free of charge, using the Bokeh server.

To install this module from Anaconda, we enter

conda install -c anaconda bokeh

To illustrate the basic usage of Bokeh, we adapt some simple examples from its official documentation. The first example is a simple scatter plot of toy data.

In [5]:
output_notebook()

# Create a blank figure with labels
p = figure(width = 600, height = 600, 
           title = 'Example Glyphs',
           x_axis_label = 'X', y_axis_label = 'Y')

# Example data
squares_x = [1, 3, 4, 5, 8]
squares_y = [8, 7, 3, 1, 10]
circles_x = [9, 12, 4, 3, 15]
circles_y = [8, 4, 11, 6, 10]

# Add squares glyph
p.square(squares_x, squares_y, size = 12, color = 'navy', alpha = 0.6)
# Add circle glyph
p.circle(circles_x, circles_y, size = 12, color = 'red')

# Show the plot (rendered inline because output_notebook() was called above)
show(p)

The second example is a line plot of several mathematical functions. Note that the vertical axis uses a logarithmic (base 10) scale, set by y_axis_type = "log". The tools shown in the toolbar are also customized through the tools argument.

In [6]:
# prepare some data
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y0 = [i**2 for i in x]
y1 = [10**i for i in x]
y2 = [10**(i**2) for i in x]

# uncomment the next line to save the plot to a static HTML file instead
# output_file("log_lines.html")

# create a new plot
p = figure(
   tools = "pan,box_zoom,reset,save",
   y_axis_type = "log", y_range = [0.001, 10**11], title = "log axis example",
   x_axis_label = 'sections', y_axis_label='particles'
)

# add some renderers
p.line(x, x, legend_label="y=x")
p.circle(x, x, legend_label="y=x", fill_color="white", size=8)
p.line(x, y0, legend_label="y=x^2", line_width=3)
p.line(x, y1, legend_label="y=10^x", line_color="red")
p.circle(x, y1, legend_label="y=10^x", fill_color="red", line_color="red", size=6)
p.line(x, y2, legend_label="y=10^x^2", line_color="orange", line_dash="4 4")

# show the results
show(p)

### Construct time series plot of stock price data (AAPL)

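Here, the daily stock price of AAPL is downloaded from Yahoo Finance with the yfinance module. To make the time series plot more informative, a 30-day moving average is also computed with numpy's convolve() function. Convolving the prices with a window of 30 equal weights that sum to one averages each run of 30 consecutive trading days. With mode 'same', the result has the same length as the data and is centered on each day. Near both ends the window runs past the data and the missing values are treated as zeros, so the first and last win_size points are dropped from the plot.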
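The centering and the edge effect of mode 'same' can be checked on a small hypothetical series. This sketch uses a 3-point window. An odd window can be centered exactly, whereas an even window such as 30 cannot be perfectly centered:

```python
import numpy as np

# Hypothetical toy series (a straight line), not the AAPL prices
values = np.arange(1.0, 11.0)          # 1.0, 2.0, ..., 10.0
win_size = 3
window = np.ones(win_size) / win_size  # equal weights summing to 1

centered = np.convolve(values, window, 'same')

# Interior points: averaging a straight line over a centered window
# gives back the line itself
print(centered[1:-1])

# Last point: the window hangs past the end and the missing value counts
# as 0, so (9 + 10 + 0) / 3 drags the average down
print(centered[-1])
```

The AAPL cell applies the same idea with win_size = 30, which is why it drops win_size points at each end. pandas offers a comparable centered average, stock['Adj Close'].rolling(30, center=True).mean(), which returns NaN at the ends rather than zero-padded values.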

In [7]:
# prepare some data
start = '2016-01-01'
end = '2019-09-18'
stockcode = 'AAPL'
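# auto_adjust=False keeps an 'Adj Close' column in recent yfinance versions;
# squeeze() turns a one-column (single-ticker) frame into a Series
stock = yf.download(stockcode, start, end, auto_adjust=False)
plot_val = np.array(stock['Adj Close'].squeeze())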
plot_date = np.array(stock.index, dtype=np.datetime64)

win_size = 30
window = np.ones(win_size)/float(win_size)
plot_avg = np.convolve(plot_val, window, 'same')

# create a new plot with a a datetime axis type
p = figure(width=800, height=600, 
           x_axis_type="datetime",
           toolbar_location=None)
# add renderers
p.circle(plot_date[win_size:-win_size], plot_val[win_size:-win_size], size=4, color='darkgrey', alpha=0.2, legend_label='Adj Close')
p.line(plot_date[win_size:-win_size], plot_avg[win_size:-win_size], color='navy', legend_label=str(win_size)+'-Day Average')

# customize the plot by setting attributes
p.title.text = stockcode + ' ' + str(win_size) + '-Day Moving Average'
p.legend.location = "top_left"
p.grid.grid_line_alpha = 0
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Price'
p.ygrid.band_fill_color = "olive"
p.ygrid.band_fill_alpha = 0.1

# show the results
show(p)

### Construct a histogram plot for a stock (IBM)

In Bokeh, a histogram is drawn from bins computed beforehand. In the following example, numpy's histogram() function computes the variables close_hist and edges. The variable close_hist stores the frequency count of each bin. The variable edges stores the bin boundaries, so it has one more element than close_hist. Then the quad() method draws one rectangle per bin, taking left and right from consecutive edges and top from the count.
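A small hypothetical example shows how close_hist, edges and the bin width relate. Here the number of bins is derived from bin_size, so each bin is at most bin_size price units wide. That reading of bin_size is an assumption:

```python
import numpy as np

# Hypothetical toy prices, not the IBM data
prices = np.array([101.0, 103.5, 104.2, 108.9, 110.0, 112.7, 118.3, 119.9])

bin_size = 5  # target bin width in price units
range_start, range_end = prices.min(), prices.max()
n_bins = int(np.ceil((range_end - range_start) / bin_size))  # 18.9 / 5 -> 4 bins

close_hist, edges = np.histogram(prices, bins=n_bins,
                                 range=(range_start, range_end))

# quad() needs one (left, right, top) triple per bin;
# note that the last bin also includes its right edge
for left, right, top in zip(edges[:-1], edges[1:], close_hist):
    print(f"{left:.3f} to {right:.3f}: {top}")
```

Because the bins split the data range evenly, their actual width (4.725 here) is at most bin_size rather than exactly bin_size.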

In [8]:
# prepare some data
start = '2010-01-01'
end = '2019-09-20'
stockcode = 'IBM'
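stock = yf.download(stockcode, start, end, auto_adjust=False)
bin_size = 5
plot_val = stock['Adj Close'].squeeze()
range_start = min(plot_val)
range_end = max(plot_val)

# choose the number of bins so that each bin is at most bin_size wide
close_hist, edges = np.histogram(plot_val, 
                                 bins = int(np.ceil((range_end - range_start) / bin_size)), 
                                 range = [range_start, range_end])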
plot_dat = pd.DataFrame({'price': close_hist, 
                         'left': edges[:-1], 
                         'right': edges[1:]})

# create a new plot
p = figure(width = 600, height = 600, 
           background_fill_color = "grey",
           background_fill_alpha = 0.3,
           title = 'Histogram of Stock Price',
           x_axis_label = "Stock Price",
           y_axis_label = "Frequency Count")

# add renderer
p.quad(bottom=0, top=plot_dat['price'], 
       left=plot_dat['left'], right=plot_dat['right'], alpha=0.8, 
       fill_color="darkgreen", line_color='black')

# show the results
show(p)

### Add interactivity to the histogram

To add interactivity to the histogram, the data stored in the pandas DataFrame is first converted to a ColumnDataSource. Its data attribute is a dictionary that maps column names to columns of values. Specifically,

source = ColumnDataSource(data=plot_dat)

When a ColumnDataSource is passed to quad() through its source argument, the columns are referred to by name, as strings (e.g. top='prop').

When the mouse pointer moves over the histogram, we may want to show more information about each bin, e.g. the percentage of observations falling within the bin or the bin's price level. The HoverTool() model constructs these tooltips. They are specified as a list of (label, field) pairs, where @name refers to a column of the ColumnDataSource. In the following example, f_price, the average of the left and right boundaries of a bin, is added to ColumnDataSource.data. A format specification can also be given in braces after each field. Finally, the HoverTool object must be attached to the figure with the add_tools() method.
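The steps above can be sketched in isolation with hypothetical data. This version computes the f_price midpoints in the DataFrame before building the source. The result is the same column that the notebook adds to source.data afterwards:

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool

# Hypothetical prices standing in for stock['Adj Close']
rng = np.random.default_rng(1)
prices = rng.normal(loc=140, scale=15, size=1000)

counts, edges = np.histogram(prices, bins=20)
plot_dat = pd.DataFrame({'prop': counts / counts.sum(),
                         'left': edges[:-1],
                         'right': edges[1:]})
# Bin midpoints computed in the DataFrame, so they travel with the source
plot_dat['f_price'] = (plot_dat['left'] + plot_dat['right']) / 2

# Each DataFrame column becomes a key of source.data
# (Bokeh also adds the DataFrame index as an 'index' column)
source = ColumnDataSource(data=plot_dat)

# '@name' reads that column for the hovered bin; the braces hold a numeral
# format: {0.00} gives two decimals, {0.00%} multiplies by 100 and adds a % sign
hover = HoverTool(tooltips=[('Price', '@f_price{0.00}'),
                            ('Percentage', '@prop{0.00%}')])
```

A figure would then call p.add_tools(hover), as in the cell that follows.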

In [9]:
# prepare some data
start = '2010-01-01'
end = '2019-09-20'
stockcode = 'IBM'
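stock = yf.download(stockcode, start, end, auto_adjust=False)
bin_size = 5
plot_val = stock['Adj Close'].squeeze()
range_start = min(plot_val)
range_end = max(plot_val)

# choose the number of bins so that each bin is at most bin_size wide
close_hist, edges = np.histogram(plot_val, 
                                 bins = int(np.ceil((range_end - range_start) / bin_size)), 
                                 range = [range_start, range_end])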
plot_dat = pd.DataFrame({'prop': close_hist/np.sum(close_hist), 
                         'left': edges[:-1], 
                         'right': edges[1:]})

# Convert the DataFrame to ColumnDataSource whose data attribute is a dict
source = ColumnDataSource(data=plot_dat)

# create a new plot
p = figure(width = 600, height = 600, 
           background_fill_color = "grey",
           background_fill_alpha = 0.3,
           title = 'Histogram of Stock Price',
           x_axis_label = "Stock Price",
           y_axis_label = "Relative Frequency")

# add renderer
p.quad(source=source, bottom=0, top='prop', 
       left='left', right='right', alpha=0.8,
       fill_color="darkgreen", line_color='black')

# add hover tool
source.data['f_price'] = [(left+right)/2 for left, right in zip(plot_dat['left'],plot_dat['right'])]
# {0.00} shows two decimal places; {0.00%} multiplies by 100 and appends %
hover = HoverTool(tooltips = [('Price', '@f_price{0.00}'),
                             ('Percentage', '@prop{0.00%}')])

p.add_tools(hover)
# show the results
show(p)

### Construct An Interactive Density Plot

To construct a density plot in Bokeh, the kernel density estimator gaussian_kde is imported from scipy.stats. The following illustration uses a Gaussian kernel with bw_method=1. A scalar bw_method is a bandwidth factor that scipy multiplies by the sample standard deviation, so here the kernel's standard deviation equals the standard deviation of the prices. The x-values form an evenly spaced grid built with numpy, and the y-values are the estimated density at those points. A ColumnDataSource holds the curve so that the plot can later be made interactive, and the multi_line() method draws the estimated density.
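The meaning of a scalar bw_method can be checked directly. This sketch uses hypothetical, normally distributed prices. It shows that with bw_method=1 the kernel's standard deviation equals the sample standard deviation, and that the estimated density integrates to one:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical prices standing in for stock['Adj Close']
rng = np.random.default_rng(42)
data = rng.normal(loc=150, scale=10, size=1000)

kde = gaussian_kde(data, bw_method=1)   # scalar => bandwidth factor

# scipy scales the sample covariance by factor**2 to get the kernel covariance
kernel_sd = np.sqrt(kde.covariance[0, 0])
sample_sd = np.std(data, ddof=1)

x = np.linspace(data.min(), data.max(), 200)  # evaluation grid, as in the notebook
y = kde.pdf(x)

total_mass = kde.integrate_box_1d(-np.inf, np.inf)
print(kde.factor, kernel_sd, sample_sd, total_mass)
```

Smaller factors follow the data more closely and larger ones smooth it more. The application in the next section lets the user vary this factor with a slider.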

In [10]:
# prepare some data
xs = []
ys = []
start = '2010-01-01'
end = '2019-09-20'
stockcode = 'IBM'
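stock = yf.download(stockcode, start, end, auto_adjust=False)
bandwidth = 1  # scalar bw_method: a factor applied to the sample standard deviation
plot_val = stock['Adj Close'].squeeze()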
range_start = min(plot_val)
range_end = max(plot_val)

# compute the kde
kde = gaussian_kde(plot_val, bw_method=bandwidth)
# Evenly space x values
x = np.linspace(range_start, range_end, 200)
# Evaluate pdf at every value of x
y = kde.pdf(x)
# Append the values to plot
xs.append(list(x))
ys.append(list(y))      

# Construct ColumnDataSource from kde
kernel_source = ColumnDataSource(data={'x': xs, 'y': ys})
    
p = figure(width = 600, height = 600, 
           background_fill_color = "grey",
           background_fill_alpha = 0.3,
           title = 'Density Plot of Stock Price',
           x_axis_label = 'Stock Price', y_axis_label = 'Density')

# add renderer
p.multi_line(source = kernel_source, xs='x', ys='y',
             color = 'navy', line_width = 2)

# show the results
show(p)

### Generate a simple application of interactive kernel density plot with user-defined bandwidth

The task here is more involved. The first part of the Python code is almost the same as before. In the second part, however, we define new functions to build a simple Bokeh application. In particular, a main function modify_doc(doc) handles all the operations needed when the density plot is generated and updated. Within modify_doc(), make_dataset() constructs the data set for plotting, make_plot() builds the figure, and update() carries out the update. Bokeh itself requires only two things: modify_doc() must accept a document and add its content to it, and update() must take the (attr, old, new) arguments expected by an on_change() callback. Splitting the work into these three helper functions is a convenient convention rather than a requirement.

We want the user to change the bandwidth and see its effect on the density plot, so a Slider() widget lets the user choose the bandwidth. The call bandwidth_select.on_change('value', update) registers update() to run whenever the slider value changes, and the plot is updated accordingly.
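The slider callback can be tried outside a server by calling it by hand. The sketch below is a condensed, hypothetical version of the notebook's make_dataset() and update() pair using synthetic data. It replaces src.data in one assignment instead of calling src.data.update(); both approaches change the source's data:

```python
import numpy as np
from scipy.stats import gaussian_kde
from bokeh.models import ColumnDataSource, Slider

# Hypothetical stand-in for the IBM prices: reproducible synthetic data
rng = np.random.default_rng(0)
plot_val = rng.normal(loc=150, scale=10, size=500)
x_grid = np.linspace(plot_val.min(), plot_val.max(), 200)

def make_dataset(bandwidth):
    """KDE curve in the list-of-lists layout that multi_line() expects."""
    y = gaussian_kde(plot_val, bw_method=bandwidth).pdf(x_grid)
    return {'x': [list(x_grid)], 'y': [list(y)]}

bandwidth_select = Slider(start=0.01, end=2, step=0.01, value=0.2, title='Bandwidth')
src = ColumnDataSource(data=make_dataset(bandwidth_select.value))

def update(attr, old, new):
    # on_change callbacks receive the property name plus its old and new values
    src.data = make_dataset(new)

bandwidth_select.on_change('value', update)

# Calling the callback by hand mimics what happens when the slider moves
peak_before = max(src.data['y'][0])
update('value', 0.2, 1.0)
peak_after = max(src.data['y'][0])
print(peak_before, peak_after)
```

A larger bandwidth factor spreads the density out, so the peak of the curve drops after the update.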

Then, the objects src and p are constructed with the slider's initial bandwidth value, which is 0.2 in this example (the default of 5 in make_dataset() is overridden). The object controls, created by WidgetBox(), collects the controls shown on the webpage, and row() places the controls and the plot side by side. Afterwards, the layout is wrapped in a tab using Panel() and Tabs(). Finally, the Tabs object is added to the document passed to modify_doc() with doc.add_root().

Finally, the application is created by wrapping modify_doc() in FunctionHandler() and passing the handler to Application(). show(app) then runs the application in the notebook.
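Before calling show(app), it can help to check that the handler builds a document without errors. Application.create_document() runs the handlers on a fresh Document, much as the server does when a new session starts. A minimal sketch with a hypothetical one-slider app:

```python
from bokeh.application import Application
from bokeh.application.handlers import FunctionHandler
from bokeh.layouts import column
from bokeh.models import Slider
from bokeh.plotting import figure

def modify_doc(doc):
    # Hypothetical minimal app: one slider above one small plot
    slider = Slider(start=0.01, end=2, step=0.01, value=0.2, title='Bandwidth')
    p = figure(width=400, height=300, title='Placeholder plot')
    p.line([0, 1, 2], [0, 1, 4])
    doc.add_root(column(slider, p))

handler = FunctionHandler(modify_doc)
app = Application(handler)

# Build one document outside any server to confirm the handler runs cleanly
doc = app.create_document()
print(len(doc.roots))
```

Outside a notebook, the same app object could be passed to bokeh.server.server.Server, e.g. Server({'/': app}), instead of being shown with show(app).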

In [11]:
# prepare some data
start = '2010-01-01'
end = '2019-09-20'
stockcode = 'IBM'
stock = yf.download(stockcode, start, end, auto_adjust=False)
plot_val = stock['Adj Close'].squeeze()
range_start = min(plot_val)
range_end = max(plot_val)

def modify_doc(doc):

    def make_dataset(bandwidth = 5):
        xs = []; ys = []
        # compute the kde
        kde = gaussian_kde(plot_val, bw_method=bandwidth)
        # Evenly space x values
        x = np.linspace(range_start, range_end, 200)
        # Evaluate pdf at every value of x
        y = kde.pdf(x)
        # Append the values to plot
        xs.append(list(x))
        ys.append(list(y)) 

        return ColumnDataSource(data={'x': xs, 'y': ys})
    
    def make_plot(src):
        # Blank plot with correct labels
        p = figure(width = 600, height = 600, 
                   background_fill_color = "grey",
                   background_fill_alpha = 0.3,
                   title = 'Density Plot of Stock Price',
                   x_axis_label = 'Stock Price', y_axis_label = 'Density')

        # Construct the line plot for density
        p.multi_line(source = src, xs='x', ys='y',
                     color = 'navy', line_width = 2)

        return p
    
    def update(attr, old, new):
        new_src = make_dataset(bandwidth = bandwidth_select.value)
        src.data.update(new_src.data)

    # define a slider to choose bandwidth
    bandwidth_select = Slider(start = 0.01, end = 2,
                              step = 0.01, value = 0.2,
                              title = 'Bandwidth')
    bandwidth_select.on_change('value', update)
    
    src = make_dataset(bandwidth = bandwidth_select.value)
    
    p = make_plot(src)
    
    # Put controls in a single element
    controls = WidgetBox(bandwidth_select)
    
    # Create a row layout
    layout = row(controls, p)
    
    # Make a tab with the layout 
    tab = Panel(child=layout, title = 'Stock Price Density')
    tabs = Tabs(tabs=[tab])
    
    doc.add_root(tabs)
    
# Set up an application
handler = FunctionHandler(modify_doc)
app = Application(handler)

In [12]:
show(app)