# Histograms with Bokeh

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 10)

from bokeh.io import show, output_notebook
from bokeh.plotting import figure
output_notebook()

## Automobile Dataset

We will use the [Automobile Data Set](https://archive.ics.uci.edu/ml/datasets/automobile) from the [UCI Machine Learning Repository](https://archive-beta.ics.uci.edu/). It includes categorical and continuous variables.

Defining the headers

In [2]:
headers = ["symboling", "normalized_losses", "make", "fuel_type", "aspiration", "num_doors", 
            "body_style", "drive_wheels", "engine_location", "wheel_base", "length", "width", "height", 
            "curb_weight", "engine_type", "num_cylinders", "engine_size", "fuel_system", "bore", "stroke", 
            "compression_ratio", "horsepower", "peak_rpm", "city_mpg", "highway_mpg", "price"]
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
                  header=None, names=headers, na_values="?" )
df.head()

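| | symboling | normalized_losses | make | fuel_type | aspiration | ... | horsepower | peak_rpm | city_mpg | highway_mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | NaN | alfa-romero | gas | std | ... | 111.0 | 5000.0 | 21 | 27 | 13495.0 |
| 1 | 3 | NaN | alfa-romero | gas | std | ... | 111.0 | 5000.0 | 21 | 27 | 16500.0 |
| 2 | 1 | NaN | alfa-romero | gas | std | ... | 154.0 | 5000.0 | 19 | 26 | 16500.0 |
| 3 | 2 | 164.0 | audi | gas | std | ... | 102.0 | 5500.0 | 24 | 30 | 13950.0 |
| 4 | 2 | 164.0 | audi | gas | std | ... | 115.0 | 5500.0 | 18 | 22 | 17450.0 |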


## Histograms

One of the most common graphs for displaying frequency distributions is a histogram. Bokeh does not have a built-in histogram glyph, but we can make our own using the quad glyph, which allows us to specify each bar's bottom, top, left, and right edges.

Let's work with three variables: `length`, `width`, and `height`.

In [3]:
arr, edges = np.histogram(df.length, bins = 10, 
                        range = [df.length.min(), df.length.max()])

In [4]:
print('Freq: ',len(arr))
print('Edges:',len(edges))

Freq:  10
Edges: 11
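The two lengths differ because `n` bins are delimited by `n + 1` edges. As a minimal sketch with made-up values (not taken from the automobile data), slicing the edges array gives the left and right side of every bin, which is the form the quad glyph expects:

```python
import numpy as np

# Made-up sample values, for illustration only
values = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0])

# Four equal-width bins between 1 and 5 -> five edges
counts, edges = np.histogram(values, bins=4, range=(1.0, 5.0))

left = edges[:-1]   # left side of each bin
right = edges[1:]   # right side of each bin

for lo, hi, n in zip(left, right, counts):
    print(f"{lo:.1f} to {hi:.1f}: {n} value(s)")
```

Every bin includes its left edge and excludes its right edge, except the last bin, which includes both.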


Create the blank plot

In [5]:
# Create the blank plot
h = figure(height = 400, width = 600, 
        title = 'Histogram of Car length')

Adding a quad glyph and showing the plot

In [6]:
h.quad(bottom=0, top=arr, left=edges[:-1], right=edges[1:], 
       fill_color='salmon', line_color='white')
show(h)       


Let's create a function `create_hist`

In [7]:
def create_hist(variabl, bins=10):
    '''
    Plot a histogram with Bokeh.
    variabl: numeric values to plot (for example, a DataFrame column)
    bins: number of equal-width bins
    '''
    # Bin counts and bin edges over the observed range of the data
    arr, edges = np.histogram(variabl, bins=bins,
                              range=[np.min(variabl), np.max(variabl)])
    h = figure(height=400, width=600, title='Histogram')
    # One quad per bin: the edges give left/right, the counts give the top
    h.quad(bottom=0, top=arr, left=edges[:-1], right=edges[1:],
           fill_color='salmon', line_color='white')
    show(h)
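Because the file was read with `na_values="?"`, some columns may contain `NaN`. One possible guard, sketched here under that assumption, is to drop missing values before binning. The helper name `histogram_without_nan` and the sample values are hypothetical and not part of the original notebook.

```python
import numpy as np

def histogram_without_nan(values, bins=10):
    """Return (counts, edges) after discarding NaN entries (hypothetical helper)."""
    clean = np.asarray(values, dtype=float)
    clean = clean[~np.isnan(clean)]          # keep only the observed values
    return np.histogram(clean, bins=bins, range=(clean.min(), clean.max()))

# Made-up values with one missing entry
counts, edges = histogram_without_nan([10.0, 12.0, np.nan, 15.0, 20.0], bins=2)
print(counts, edges)
```

The returned arrays can be passed to `quad` in the same way as inside `create_hist`.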

### Car length

In [8]:
create_hist(df.length,10)

In [9]:
create_hist(df.length,5)

### Car width

In [10]:
create_hist(df.width,10)

In [11]:
create_hist(df.width,5)

### Car height

In [12]:
create_hist(df.height,10)

In [13]:
create_hist(df.height,5)

## Overlapping histograms

The `create_hist()` function we created draws a single variable on its own figure, so we cannot use it to plot overlapping histograms.

Let's create three histograms for car length, width, and height.

In [14]:
bins = 10

In [15]:
arr_l, edges_l = np.histogram(df.length, bins = bins, 
                    range = [df.length.min(), df.length.max()])

In [16]:
arr_w, edges_w = np.histogram(df.width, bins = bins, 
                    range = [df.width.min(), df.width.max()])

In [17]:
arr_h, edges_h = np.histogram(df.height, bins = bins, 
                    range = [df.height.min(), df.height.max()])

In [18]:
# Create the blank plot
overlapp = figure(height = 400, width = 600, 
            toolbar_location='above', title = 'Histograms')

In [19]:
overlapp.quad(bottom=0, top=arr_l, left=edges_l[:-1], right=edges_l[1:], legend_label="length",
        fill_color='indianred', fill_alpha=0.8, line_color='salmon',line_alpha=0.8);

In [20]:
overlapp.quad(bottom=0, top=arr_w, left=edges_w[:-1], right=edges_w[1:], legend_label="width",
        fill_color='darkorange', fill_alpha=0.8, line_color='darkorange',line_alpha=0.8);

In [21]:
overlapp.quad(bottom=0, top=arr_h, left=edges_h[:-1], right=edges_h[1:], legend_label="height",
        fill_color='dodgerblue', fill_alpha=0.8, line_color='lightblue', line_alpha=0.8);

In [22]:
show(overlapp)
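Each `np.histogram` call above derives its edges from its own column, so the three histograms use different bin widths. If comparable bins are preferred, one alternative (not the approach taken in this notebook) is to compute a single set of edges over all values and reuse it. The sketch below uses made-up arrays in place of `df.length`, `df.width`, and `df.height`; the names `columns` and `shared_edges` are illustrative.

```python
import numpy as np

# Made-up stand-ins for the three DataFrame columns
columns = {
    "length": np.array([150.0, 170.0, 175.0, 190.0]),
    "width": np.array([60.0, 64.0, 66.0, 70.0]),
    "height": np.array([48.0, 52.0, 54.0, 58.0]),
}

# One set of 10 equal-width bins spanning every value
all_values = np.concatenate(list(columns.values()))
shared_edges = np.histogram_bin_edges(all_values, bins=10)

# Count each variable against the same edges
counts = {name: np.histogram(vals, bins=shared_edges)[0]
          for name, vals in columns.items()}

for name, c in counts.items():
    print(name, c)
```

With shared edges, every `quad` call can use the same `left` and `right` arrays, and bar heights are directly comparable across variables.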

### Adding lines

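Let's mark each variable's mean and connect the markers with a line. Each point sits at the variable's mean (x-value) and at the height of its tallest bar (y-value).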

Computing the means (x-values)

In [23]:
means = df[['height', 'width', 'length']].mean()
means

height     53.724878
width      65.907805
length    174.049268
dtype: float64

Computing the maximum bin counts (y-values)

In [24]:
top = [arr_h.max(), arr_w.max(), arr_l.max()]
top

[41, 44, 54]

In [25]:
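overlapp.line(x=means, y=top, color="grey", line_width=2)
# scatter() draws circle markers by default; recent Bokeh releases
# deprecate circle() when it is called with a size argument
overlapp.scatter(x=means, y=top, fill_color="grey", size=10)
show(overlapp)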

### Adding annotations

Annotations are visual features that we add to our graph to make it easier to read. 

In [26]:
from bokeh.models import BoxAnnotation

In [27]:
low_box = BoxAnnotation(top=20,             fill_alpha=0.2, fill_color="powderblue")
mid_box = BoxAnnotation(bottom=20, top=40,  fill_alpha=0.2, fill_color="beige")
high_box = BoxAnnotation(bottom=40,         fill_alpha=0.2, fill_color="powderblue")

In [28]:
# Adding the BoxAnnotation objects to our figure
overlapp.add_layout(low_box)
overlapp.add_layout(mid_box)
overlapp.add_layout(high_box)
show(overlapp)

## Pie Charts

A pie chart is another way to show numerical proportions. In Bokeh, you can use the `wedge()` method for displaying a pie chart.

In [29]:
df.body_style.value_counts()

sedan          96
hatchback      70
wagon          25
hardtop         8
convertible     6
Name: body_style, dtype: int64

In [30]:
bs = pd.DataFrame()
bs['body_style']  = df.body_style.value_counts().index
bs['number_cars'] = df.body_style.value_counts().values
bs

| | body_style | number_cars |
|---|---|---|
| 0 | sedan | 96 |
| 1 | hatchback | 70 |
| 2 | wagon | 25 |
| 3 | hardtop | 8 |
| 4 | convertible | 6 |


You need to compute the angle of each slice.

In [31]:
from math import pi

In [32]:
bs['angles'] = bs.number_cars / bs.number_cars.sum() * 2*pi
bs

| | body_style | number_cars | angles |
|---|---|---|---|
| 0 | sedan | 96 | 2.94237 |
| 1 | hatchback | 70 | 2.145478 |
| 2 | wagon | 25 | 0.766242 |
| 3 | hardtop | 8 | 0.245197 |
| 4 | convertible | 6 | 0.183898 |
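Each angle is the category's share of the total multiplied by 2π radians, so the angles of all slices should add up to a full circle. The check below reuses the counts shown above and only needs the standard library; the variable names are illustrative.

```python
from math import pi, isclose, degrees

# Body-style counts as reported by value_counts() above
counts = {"sedan": 96, "hatchback": 70, "wagon": 25, "hardtop": 8, "convertible": 6}
total = sum(counts.values())

# angle = share of the total * 2*pi (radians)
angles = {style: n / total * 2 * pi for style, n in counts.items()}

for style, angle in angles.items():
    print(f"{style}: {angle:.4f} rad = {degrees(angle):.1f} deg")

# The slices together should cover the whole circle
print(isclose(sum(angles.values()), 2 * pi))
```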


Next, we assign a color to each slice.

In [33]:
from bokeh.palettes import Bokeh

In [34]:
bs['colors'] = Bokeh[bs.shape[0]]
bs

| | body_style | number_cars | angles | colors |
|---|---|---|---|---|
| 0 | sedan | 96 | 2.94237 | #EC1557 |
| 1 | hatchback | 70 | 2.145478 | #F05223 |
| 2 | wagon | 25 | 0.766242 | #F6A91B |
| 3 | hardtop | 8 | 0.245197 | #A5CD39 |
| 4 | convertible | 6 | 0.183898 | #20B254 |


Defining the `ColumnDataSource`

In [35]:
from bokeh.models import ColumnDataSource
from bokeh.transform import cumsum

In [36]:
source_pie = ColumnDataSource(data=bs)

In [37]:
p = figure(width=600, height=600, x_range=source_pie.data['body_style'])
p.axis.visible = False
p.grid.grid_line_color = None

In [38]:
p.wedge(x=bs.shape[0]/2,
        y=0, radius=2,
        start_angle=cumsum('angles', include_zero=True),
        end_angle=cumsum('angles'),
        line_color="white",
        fill_color='colors',
        legend_field='body_style',
        source=source_pie)
show(p)        
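The `cumsum` transform tells Bokeh to use running totals of the `angles` column, so each wedge starts where the previous one ends. As a rough illustration of the values involved (computed here with NumPy, not by Bokeh itself), using the angles from the table above:

```python
import numpy as np

# Angles (radians) copied from the bs table above
angles = np.array([2.942370, 2.145478, 0.766242, 0.245197, 0.183898])

# Running total: where each wedge ends, comparable to cumsum('angles')
end_angle = np.cumsum(angles)

# Same running total shifted to start at zero: where each wedge begins,
# comparable to cumsum('angles', include_zero=True)
start_angle = end_angle - angles

for start, end in zip(start_angle, end_angle):
    print(f"{start:.4f} -> {end:.4f}")
```

The last end angle is approximately 2π, which confirms that the wedges close the circle.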

Modifying the legend location and orientation

In [39]:
# Modifying the legend location and orientation
p.legend.orientation = "horizontal"
p.legend.location = "top"
show(p)

## References

- Hussain S, Dahan N.A, Ba-Alwi F.M, Ribata N. Educational Data Mining and Analysis of Students’ Academic Performance Using WEKA. Indonesian Journal of Electrical Engineering and Computer Science. 2018; Vol. 9, No. 2. February. pp. 447-459
- https://docs.bokeh.org/en/latest/