In this notebook we will learn about the different types of plots that can be made using bokeh. We will go over a subset of the plot types and provide links to tutorials and documentation for the others. It is entirely possible to generate variations of existing plot types, and since bokeh is under active development, new plot types will continue to be added. Hence we do not expect to be exhaustive in the types of plots we cover, merely instructive.

The types of plots that we are going to go over are - 


1) Bar graph - vertical and horizontal <br>
2) Adding legends and axis range <br>
3) Histograms <br>
4) Patches and varea <br>
5) Other plots



Let us first start with the bar plot

##  1) Bar plot - vertical and horizontal 

#### vertical bar

The bar plot is one of the most commonly used plot types for representing categorical data. To illustrate it, let us use the iris dataset. The problem we want to solve is the following: given a dataset, how can we visualize how many examples of each label it contains? The iris dataset, for example, has three labels:

```python
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

```
The dataset itself contains four measurements for each of 150 samples of iris plants (50 per species). Based on the values of these four measurements, each sample/plant in the collection is assigned a label. To show how many of each type of iris plant we have, we will use a bar plot.

The iris dataset is easily available to Python users through the sklearn package. Simply run the following import statements and commands:


```python
from sklearn.datasets import load_iris
data = load_iris()
label_names = data.target_names 
label_values = data.target
```
The best way to move forward is to create a data frame in pandas. Do not worry about the details of how it is done; the exercise below walks through it.

**Exercise:**
Generate a data frame from the iris target data and use 'value_counts' to count the number of entries for each label.

**Solution:**
Run - 

```python 
import pandas as pd

iris_target_df = pd.DataFrame(data=data.target, columns=["class_value"])
count_list = iris_target_df["class_value"].value_counts().tolist()
count_list 
```

Your output should look like this:

```python 
[50, 50, 50]
```
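One caveat is worth a small sketch: 'value_counts' orders its result by frequency, not by label value, so the list only lines up with 'data.target_names' here because all three counts are equal. Assuming integer class labels starting at 0, as in the iris target array, 'np.bincount' returns the counts in label order instead. The 'labels' and 'names' arrays below are made-up stand-ins, not the iris data.

```python
import numpy as np

# Hypothetical stand-ins for data.target and data.target_names
labels = np.array([0, 2, 2, 1, 2, 0])
names = np.array(["setosa", "versicolor", "virginica"])

# bincount returns one count per integer label, in label order (0, 1, 2, ...)
counts = np.bincount(labels, minlength=len(names)).tolist()

# Pair each name with its count so the bar heights match the x-axis labels
counts_by_name = dict(zip(names.tolist(), counts))
print(counts_by_name)
```

For the real iris data both approaches give the same list, since every class has the same number of samples.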
So there are 50 entries for each type of plant. Let us visualize this with a vertical bar plot in bokeh:

```python 
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
output_notebook()
p = figure( x_range= data.target_names, plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")

p.vbar(x=data.target_names, top=count_list, bottom=0,   width=0.3)
show(p)

```

This should generate a plot that looks like - 

![iris_vbar](images/iris_bar_plot_vbar.jpg)


Let us go over this code carefully. Everything up to the third line, 'output_notebook()', imports packages and sets up notebook output, similar to what we did earlier. We then generate the figure object 'p'. The only differences from the last notebook are:

1) We chose to keep only two tools, hover and pan, and dropped the others <br>
2) There is a new option, 'toolbar_location', which decides where the toolbar is placed <br>
3) The argument 'x_range' sets the names of the ticks on the x axis; since we have 3 ticks, we have 3 names <br>




The main difference between a scatter plot, a line plot and a bar plot comes in the next line, where we add a glyph of type 'vbar', which stands for vertical bar. Here you need to specify the x-axis label for each category with 'x' and the count for each category with 'top=count_list'. The argument 'bottom' sets the starting value of the bars; currently it is set to 0. Suppose we wanted each bar to run from 10 to 50 units: we can do that by specifying 'bottom' as 10 and 'top' as 50. If we want different values for each bar, we need to pass an array of values. Remember that 'count_list' here is [50, 50, 50]. We can also specify the width of each bar with 'width'.
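To make the roles of 'top', 'bottom' and 'width' concrete, here is a small sketch of the rectangle each bar occupies. It is plain Python rather than bokeh code; the helper 'bar_corners' and the numeric bar centers are assumptions made for illustration.

```python
def bar_corners(center, top, bottom=0.0, width=0.3):
    """Return (left, right, bottom, top) of a vertical bar centered at `center`."""
    half = width / 2
    return (center - half, center + half, bottom, top)

# Three bars at hypothetical x positions 1, 2, 3 with per-bar tops and a shared bottom
centers = [1, 2, 3]
tops = [50, 20, 10]
corners = [bar_corners(c, t, bottom=0, width=0.5) for c, t in zip(centers, tops)]
for corner in corners:
    print(corner)
```

Raising 'bottom' moves the lower edge of every bar up without changing where the bar ends, which is why 'bottom=10' with 'top=50' gives a bar spanning 10 to 50.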
 

***Exercise:***
Plot the bar graph above with the same categorical labels but with counts of [50,20,10] for the categories. Also change the bar width to 0.9, alpha to 0.7, and bar color to red.

***Solution:***
Your code should read:

```python
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
output_notebook()
p = figure( x_range= data.target_names, plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")

p.vbar(x=data.target_names, top=[50,20,10],  width=0.9, alpha=0.7, color="red")
show(p)
```
The result of this code is going to look like- 

![iris_2](images/iris_2.jpg)


#### horizontal bar 
Suppose you want to plot similar data as a **horizontal bar** chart rather than a vertical one. You would then do:

```python
p = figure( y_range= data.target_names, plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")
p.hbar(y=data.target_names, height =[0.5,0.5,0.5], right=[50,30,20], left=0 )
show(p)

```
The plot would look like:

![hiris](images/hiris.jpg)

There are a few differences compared to the code for the vertical bar plot. First, rather than 'x_range' we have 'y_range', because in a horizontal bar plot the categories are placed on the y-axis. The rest of the 'figure' definition does not change. The method to add a horizontal bar is 'hbar'. Similar to 'vbar', it takes an axis argument, here 'y', which should contain the names of the categories we want to plot. The argument 'height' controls the thickness of the horizontal bars and plays the same role as 'width' does for vertical bars. I have passed it as a list to show that you can control the height of each bar individually. Similarly, for vertical bars you can write 'width = [1, 0.2, 0.6]' to plot bars of different thicknesses. As with the vertical bar, we can specify the extent of the bar, in this case with 'left' and 'right'. So, if we wanted a horizontal bar from 10 to 30 we would write 'right=30' and 'left=10'.
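The correspondence between the two bar methods can be summarised in a small sketch. The dictionary and the helper 'to_hbar_kwargs' are hypothetical conveniences, not part of bokeh; they only rename the arguments discussed above.

```python
# Argument names for a vertical bar and their horizontal counterparts
VBAR_TO_HBAR = {"x": "y", "width": "height", "top": "right", "bottom": "left"}

def to_hbar_kwargs(vbar_kwargs):
    """Rename vbar-style keyword arguments to their hbar equivalents."""
    return {VBAR_TO_HBAR.get(key, key): value for key, value in vbar_kwargs.items()}

vbar_kwargs = {"x": ["setosa", "versicolor", "virginica"],
               "top": [50, 30, 20], "bottom": 0, "width": 0.5, "color": "red"}
hbar_kwargs = to_hbar_kwargs(vbar_kwargs)
print(hbar_kwargs)
```

The result could then be passed as 'p.hbar(**hbar_kwargs)' on a figure created with 'y_range' instead of 'x_range'.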


Specifying the bars this way gives a great degree of control in bokeh. Furthermore, we are carrying over many of the figure properties that we learned in the first bokeh notebook.

Next up are legends and axis ranges.

## 2) Adding legends to plots

Regardless of the type of chart, we want to be able to add a legend to show which fields of data we are looking at. We held off doing this earlier since it is easier to illustrate with bar charts. Suppose you have a vertical bar chart of the iris dataset. From the example above you will have noticed that all three categories have the same color. So how can we represent them differently? We can do the following:

```python
p = figure( x_range= data.target_names, plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")

p.vbar(x=data.target_names, top=count_list, bottom=0,   width=0.3, color=["red", "green", "blue"])
show(p)
```
You will see a plot with three bars colored red, green and blue. Now we want to add a legend so that we can easily see which category each color belongs to. Of course you can read it off the x-axis, but with a legend you won't need the x-axis labels. You would add the legend like this:

```python 

p = figure( x_range= data.target_names, plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")
color=["red", "green", "blue"]
for name, count, clr  in zip(list(data.target_names), count_list, color) :
    p.vbar(x=[name]  , top=count, bottom=0,   width=0.3,color=clr, legend=name )

p.legend.orientation="vertical"
show(p)
```

You will notice that in order to have the legend we had to plot each bar individually, hence the for loop. The 'zip' function combines the three lists into tuples, and we loop over one tuple at a time. Finally, we set the orientation of the legend through the 'p.legend' property.
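If the loop is unfamiliar, the following plain-Python sketch shows what 'zip' hands to each iteration. The lists are written out by hand here rather than taken from the iris data.

```python
names = ["setosa", "versicolor", "virginica"]
counts = [50, 50, 50]
colors = ["red", "green", "blue"]

# zip pairs up the i-th element of each list into one tuple
triples = list(zip(names, counts, colors))
for name, count, clr in triples:
    print(name, count, clr)
```

Each tuple carries everything one bar needs: its category, its height and its color.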

#### Axis ranges 
When you run the code above, you will notice that the legend overlaps with the bars. In this situation we need to make room for the legend, which we can do as follows:

```python 
p = figure( x_range= data.target_names, y_range=[0,70], plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")
color=["red", "green", "blue"]
for name, count, clr  in zip(list(data.target_names), count_list, color) :
    p.vbar(x=[name]  , top=count, bottom=0,   width=0.3,color=clr, legend=name )
    
p.legend.orientation="horizontal"
show(p)
```
You should see:

![iris_legend](images/iris_legend.jpg)

You will find that the upper limit of the y axis has increased from 50 to 70. This is done using the 'y_range' argument of 'figure'. You can set the extent of the x and y axes this way. In fact, we have already set 'x_range' to the three category names from our dataset.

Now suppose you want to change the numerical range of the x axis. Similar to what we did above, you would change the 'x_range' argument.
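If you would rather not pick the upper limit by eye, it can be derived from the data. The sketch below is one possible approach; the helper 'padded_range' and the 40% headroom are arbitrary assumptions, chosen so that a maximum of 50 gives roughly the 70 used above.

```python
def padded_range(values, headroom=0.4):
    """Return [0, upper] where upper leaves `headroom` (a fraction) above the largest value."""
    upper = max(values) * (1 + headroom)
    return [0, upper]

count_list = [50, 50, 50]
y_range = padded_range(count_list)
print(y_range)
```

The result can be passed as 'y_range=y_range' for a vertical bar chart, or as 'x_range' for a horizontal one.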

**Exercise**: Plot the horizontal bar chart with a legend for the iris dataset. Adjust the range of the x axis to lie between 0 and 100.

**Solution**: 
  Here is what the code should read- 
  
```python 
p = figure( y_range= data.target_names, x_range=[0,100], plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")
color=["red", "green", "blue"]
for name, count, clr  in zip(list(data.target_names), count_list, color) :
    p.hbar(y=[name]  ,height=0.5,  right=count, left=0,color=clr, legend=name )

p.legend.orientation="vertical"
show(p)
```
#### show/hide parts of plot
There is a neat little thing you can do with the legend. Add the line- 

```python
p.legend.click_policy = "hide"
```
before 'show(p)' and then click on any of the legend entries. You will see that the bar for that legend entry disappears. This is useful when you have lots of data and want to compare only certain curves or bars.

Suppose you do not want a curve or bar to disappear completely, and just want to gray it out. Then you can do the following:

```python 
p = figure( y_range= data.target_names, x_range=[0,100], plot_height=250, title="Iris plant type counts",
           toolbar_location="right", tools="hover, pan")
color=["red", "green", "blue"]
for name, count, clr  in zip(list(data.target_names), count_list, color) :
    p.hbar(y=[name], height=0.5,  right=count, left=0,color=clr, legend=name, muted_color=clr, muted_alpha= 0.3 )

p.legend.orientation = "vertical"
p.legend.click_policy = "mute"
show(p)
``` 
Here you will see that two properties have been added to 'p.hbar': 'muted_color' and 'muted_alpha'. They control the color and the translucency of the glyph once it has been muted. In this case the bar becomes translucent. Here is an example.

![legend_translucent](images/translucent_legend.jpg)

In the next section we have histograms. Histograms are some of the most important plot types we will learn. They are frequently used in data science and other fields and come in handy when we are trying to identify the type of distribution we are working with.


## 3) Histogram 

Histograms are essentially bar plots in which each bar counts the number of entries in a certain range of values. For this we will take the most common example of a distribution, the normal distribution. We can generate values from a normal distribution and bin them using the following code:

```python
import numpy as np 
mu, sigma = 0, 0.5
normal = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(normal, density=False, bins=50)
```

Run this in the code cell below. 


In [1]:
import numpy as np 
## run the numpy code from above


You will have two numpy arrays: 'edges' holds the boundaries of the bins and 'hist' holds the number of entries within each bin. For example, the first value of 'hist' is the number of entries in the array 'normal' that fall between 'edges[0]' and 'edges[1]'.
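A tiny hand-checkable case may help before working with the random data. The 'values' array below is made up for illustration; it also shows that every bin except the last excludes its right edge.

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 4])
hist, edges = np.histogram(values, bins=2, range=(0, 4))

# There is always one more edge than there are bins
print(edges.tolist(), hist.tolist())

# Every bin is half-open [left, right) except the last, which also includes its right edge
first_bin = np.count_nonzero((values >= edges[0]) & (values < edges[1]))
last_bin = np.count_nonzero((values >= edges[1]) & (values <= edges[2]))
print(first_bin, last_bin)
```

The value 2 sits exactly on the shared edge and is counted in the second bin, not the first.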

**Exercise**: Show that the above statement is true using 'np.where'. 


In [2]:
# solution

import numpy as np 
mu, sigma = 0, 0.5
normal = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(normal, density=False, bins=50)

# every bin except the last is half-open: [left edge, right edge)
np.where(np.logical_and(normal>=edges[0],normal<edges[1]))[0].size, hist[0]


(2, 2)

With this information we can now plot the histogram in bokeh. Run the following code in the cell below:

```python 
p= figure(y_range=[0,100], tools=["hover", "pan","reset","box_zoom","wheel_zoom"])
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:])
show(p)
```
Notice that in plotting a histogram you have to specify the top, bottom, left and right of the bars. The top and bottom are straightforward: the bottom is 0 and the top is the count for the given bin. The left and right are the tricky ones. The left entries are the left edges of the bars and the right entries are the right edges. These we get from the histogram function. Note that we do not specify a bar width here: the edges define the range of values covered by each bin, and together those ranges must contain all the values in the array 'normal'. What we are doing is taking a continuous variable and representing it as a categorical variable, where each category is a range of values. We have to be careful not to omit any values, which is why the bar width is derived from the edges rather than set by hand.
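The left and right arrays described above can be sliced from 'edges' as in the following sketch. It uses numpy only; the seed is an arbitrary choice made so the sketch is repeatable, and 'widths' is computed purely to show that the bar width follows from the edges.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen arbitrarily for the sketch
sample = rng.normal(0, 0.5, 1000)
hist, edges = np.histogram(sample, bins=50)

left, right = edges[:-1], edges[1:]     # one (left, right) pair per bin
widths = right - left                   # implied bar widths, all equal for evenly spaced bins

# Each bin starts where the previous one ends, so the bars tile the data range
gapless = bool(np.allclose(left[1:], right[:-1]))
print(len(left), len(right), len(hist), gapless)
```

Because the bins tile the range from the smallest to the largest value, every entry of the sample lands in exactly one bar.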



In [3]:
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
output_notebook()

# run the code above here 
p= figure(y_range=[0,100], tools=["hover", "pan","reset","box_zoom","wheel_zoom"])
#
#
#


Running the code above will yield something like this- 
![hist](images/hist_example.jpg)

Note: This plot will look different for you since we have not fixed the seed of the random number generator. This means that each time you run the notebook you will generate a different set of random numbers, and hence the plot will also change. So do not be surprised if your plot looks somewhat different. The important thing to keep in mind is that the plot should be roughly symmetric about 0, since the mean of the distribution is 0, and the standard deviation should be roughly 0.5.
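If you do want the same plot on every run, you can fix the seed. The sketch below assumes NumPy 1.17 or newer for 'np.random.default_rng'; the seed value is arbitrary. With the legacy interface used elsewhere in this notebook, calling 'np.random.seed' with a fixed integer before 'np.random.normal' serves the same purpose.

```python
import numpy as np

seed = 42  # any fixed integer works; 42 is an arbitrary choice
first = np.random.default_rng(seed).normal(0, 0.5, 1000)
second = np.random.default_rng(seed).normal(0, 0.5, 1000)

# Same seed, same numbers, hence the same histogram on every run
print(np.array_equal(first, second))

# The sample mean and standard deviation should land near 0 and 0.5
print(round(float(first.mean()), 2), round(float(first.std()), 2))
```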



If you have done the plotly notebook, you will have seen the LED dataset, for which we plotted a histogram. We are going to do the same here. You can view the data frame for the LED dataset by running the cell below:


In [4]:
import pandas as pd

led_data = pd.read_csv("data/LED_bulb.csv")

led_data.head()


Unnamed: 0,Tester_ID,lumen_intensity,color_temperature,energy_consumption,component_temperature,time_to_failure
0,1,1627,4952,16.43,21.1,13.77
1,2,1744,4952,19.38,21.38,13.31
2,3,1763,4956,21.5,21.15,13.52
3,4,1765,4969,20.76,21.94,13.71
4,5,1784,4953,20.99,21.03,13.35


**Exercise**: Plot energy consumption as a histogram in bokeh. Hint: You are going to have to use the numpy function histogram first. 


In [5]:
# solution 
hist, bins= np.histogram(led_data["energy_consumption"], bins=25, density=False)
p= figure(y_range=[0,100], tools=["hover", "pan","reset","box_zoom","wheel_zoom"])
# use the bin edges returned for this dataset, not 'edges' from the normal example
p.quad(top=hist, bottom=0, left=bins[:-1], right=bins[1:])
show(p)



We have changed the number of bins to 25 to make it look somewhat similar to the plotly histogram. You will see that there are similarities between both plots.

The next topic that we will tackle is patches.

## 4) Patches 

Patches are useful especially when you are trying to highlight parts of a plot. For example, if you have a straight line and want to highlight the region under it, you can do so using the following code:

```python 
m = 0.5
y_o = 4
x = np.linspace(0,100,1000)
y_value = (m*x)+y_o

#patch coordinates 
shade_y = y_value.copy()
shade_y[0] = 0
shade_y[-1]= 0

p= figure( tools=["hover", "pan","reset","box_zoom","wheel_zoom"])
p.line(x, y_value, color="red",line_width=6)
p.patch(x,shade_y, color="green", alpha=0.3)
show(p)
```

Run this code in the cell below to see the result.


In [6]:
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
output_notebook()

# copy code for generating patches for area under the curve 



The first few lines of the code generate the straight line. We get the x coordinates of the line by running linspace to generate 1000 points in the range 0 to 100. Then, using a slope of 0.5 and an intercept of 4, we get the y values.

Next we need to plot the patch. The patch is displayed using the method 'p.patch()', where we have to specify the patch coordinates. Specifying the patch coordinates can be a bit tricky. In our case the first point of the patch must be (0,0), hence we have copied 'y_value' into a new variable 'shade_y' and set its first y value to 0. Each subsequent coordinate is an x value and its corresponding y value. To make sure that the patch comes back down to the axis at the end of the line segment, we set the last value of 'shade_y' to 0 as well.

What we are essentially doing is specifying the coordinates of the top boundary of the patch; bokeh closes the polygon by joining the last point back to the first, which forms the bottom edge.
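A variant worth sketching is to add two extra vertices on the axis instead of overwriting the first and last y values, so that the line's own endpoints stay part of the patch. The names 'patch_x' and 'patch_y' are introduced here for illustration, and the shoelace formula is used only as a sanity check that the polygon covers the area under the line.

```python
import numpy as np

m, y_o = 0.5, 4
x = np.linspace(0, 100, 1000)
y_value = m * x + y_o

# Add one vertex on the axis below each end of the line
patch_x = np.concatenate(([x[0]], x, [x[-1]]))
patch_y = np.concatenate(([0.0], y_value, [0.0]))

# Shoelace formula: area of the closed polygon through (patch_x, patch_y)
area = 0.5 * abs(np.dot(patch_x, np.roll(patch_y, -1)) - np.dot(patch_y, np.roll(patch_x, -1)))
print(round(float(area), 2))
```

The area should agree with the trapezoid under the line, (4 + 54) / 2 * 100 = 2900. These arrays could be passed as 'p.patch(patch_x, patch_y, ...)' in place of 'x' and 'shade_y'.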

An alternative way of doing this is to use the 'varea()' method, which is only available from bokeh version 1.2.0 onwards. Copy the code below to plot a line with the area under it shaded:

```python 
m = 0.5
y_o = 4
x = np.linspace(0,100,1000)
y_value = (m*x)+y_o
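y_zeros = np.zeros_like(x)  # lower boundary of the shaded area (y = 0)


p= figure( tools=["hover", "pan","reset","box_zoom","wheel_zoom"])
p.line(x, y_value, color="red",line_width=6)
# varea fills the region between y1 and y2 at each x
p.varea(x=x, y1=y_zeros, y2=y_value, color="green", alpha=0.3)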
show(p)

```


In [7]:
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
output_notebook()

# copy code for seeing how output for p.varea looks 


##  5) Other plots

Apart from the charts shown here, there are many other types of charts bokeh can plot. Another commonly used one is the heatmap. You can find the code to plot a heatmap here:

https://docs.bokeh.org/en/latest/docs/gallery/unemployment.html

However, fair warning: heatmaps are a bit painful in bokeh. matplotlib, seaborn and plotly have much better means of plotting heatmaps, so you should look into those if you are looking for a quick way to plot one.

The same can be said for box plots. The code for bokeh box plots can be found here:

https://docs.bokeh.org/en/latest/docs/gallery/boxplot.html

You will find that bokeh allows for greater customization in box plots and heatmaps, but at the cost of too much complexity for simple plots.
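Part of that extra complexity is that the linked heatmap example draws one rectangle per cell, so a wide table has to be reshaped into one row per cell first. A minimal pure-Python sketch of that reshaping follows; the 1948 values mirror the first rows of the unemployment table in the rough notes, while the 1949 values are made up.

```python
# Hypothetical wide table: one entry per year, one value per month
wide = {
    "1948": {"Jan": 4.0, "Feb": 4.7},
    "1949": {"Jan": 5.0, "Feb": 5.8},  # made-up numbers for illustration
}

# Long form: one record per cell, the shape a rect-based heatmap consumes
long_rows = [
    {"Year": year, "Month": month, "rate": rate}
    for year, months in wide.items()
    for month, rate in months.items()
]
for row in long_rows:
    print(row)
```

With pandas, the same reshaping is what 'data.stack()' followed by 'reset_index()' does in the gallery code.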



With this we close out the second notebook on bokeh. There are other types of plots that we have not covered here (for example pie charts, donut charts and heatmaps), but they are easy to learn once you know these basic plots and understand how data input works in bokeh. Try to repeat all the examples from the plotly notebooks using bokeh to better learn the differences between the libraries. Bokeh does not support 3D plots, hence you will not be seeing any in the next notebook. What we will get into, however, is linked plots and widgets. So let's go there next!



## Rough Notes 

In [8]:
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
output_notebook()


In [9]:
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
label_names = data.target_names 
label_values = data.target

In [10]:
iris_target_df = pd.DataFrame(data=data.target, columns=["class_value"])
count_list = iris_target_df["class_value"].value_counts().tolist()
count_list

[50, 50, 50]

In [11]:
import numpy as np 
mu, sigma = 0, 0.5
normal = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(normal, density=False, bins=50)


In [12]:
p= figure(y_range=[0,100], tools=["hover", "pan","reset","box_zoom","wheel_zoom"])
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:])
show(p)

In [13]:
m = 0.5
y_o = 4
x = np.linspace(0,100,1000)
y_value = (m*x)+y_o

#patch coordinates 
shade_y = y_value.copy()
shade_y[0] = 0
shade_y[-1]= 0

p= figure( tools=["hover", "pan","reset","box_zoom","wheel_zoom"])
p.line(x, y_value, color="red",line_width=6)
p.patch(x,shade_y, color="green", alpha=0.3)
show(p)


In [14]:
import pandas as pd
from bokeh.models import LinearColorMapper,PrintfTickFormatter,ColorBar, BasicTicker

gapminder_data = pd.read_csv("data/gapminder.csv")

gapminder_data["year"] = gapminder_data["year"].astype(str)
gapminder_data["continent"]= gapminder_data["continent"].astype(str)

gapminder_data = gapminder_data[["year","continent","lifeExp"]]
gapminder_data = gapminder_data.sort_values(by="year")
gapminder_data = gapminder_data.reset_index()
unique_years = gapminder_data["year"].value_counts().index.tolist()
unique_contient = gapminder_data["continent"].value_counts().index.tolist()

# get pivot table 

gapminder_data = gapminder_data.drop(columns="index")
gapminder_table = pd.pivot_table(gapminder_data, values="lifeExp", index=["continent"], columns="year",aggfunc=np.size)

# >>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
# ...                     columns=['C'], aggfunc=np.sum, fill_value=0

unique_max_value = gapminder_table.max().unique()[0]
unique_min_value = gapminder_table.min().unique()[0]


In [20]:
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]
mapper = LinearColorMapper(palette=colors, low=gapminder_data['lifeExp'].min(), high=gapminder_data['lifeExp'].max())


heatmap = figure(title="Categorical Heatmap", tools="hover", toolbar_location=None,
           x_range=unique_years, y_range=unique_contient)
heatmap.rect(x="year",y="continent", source=gapminder_data, width=1, height=1,
        fill_color={'field': "lifeExp", 'transform': mapper}, line_color=None)

color_bar = ColorBar(color_mapper=mapper, major_label_text_font_size="10pt",
                     ticker=BasicTicker(),
                     label_standoff=12, border_line_color=None, location=(0, 0))
heatmap.add_layout(color_bar, 'right')

show(heatmap)

In [16]:
from math import pi
import pandas as pd

from bokeh.io import show
from bokeh.models import LinearColorMapper, BasicTicker, PrintfTickFormatter, ColorBar
from bokeh.plotting import figure
from bokeh.sampledata.unemployment1948 import data

data['Year'] = data['Year'].astype(str)
data = data.set_index('Year')
data.drop('Annual', axis=1, inplace=True)
data.columns.name = 'Month'

years = list(data.index)
months = list(data.columns)

# reshape to 1D array or rates with a month and year for each row.
df = pd.DataFrame(data.stack(), columns=['rate']).reset_index()

# this is the colormap from the original NYTimes plot
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]
mapper = LinearColorMapper(palette=colors, low=df.rate.min(), high=df.rate.max())

TOOLS = "hover,save,pan,box_zoom,reset,wheel_zoom"

p = figure(title="US Unemployment ({0} - {1})".format(years[0], years[-1]),
           x_range=years, y_range=list(reversed(months)),
           x_axis_location="above", plot_width=900, plot_height=400,
           tools=TOOLS, toolbar_location='below',
           tooltips=[('date', '@Month @Year'), ('rate', '@rate%')])

p.grid.grid_line_color = None
p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "5pt"
p.axis.major_label_standoff = 0
p.xaxis.major_label_orientation = pi / 3

p.rect(x="Year", y="Month", width=1, height=1,
       source=df,
       fill_color={'field': 'rate', 'transform': mapper},
       line_color=None)

color_bar = ColorBar(color_mapper=mapper, major_label_text_font_size="5pt",
                     ticker=BasicTicker(desired_num_ticks=len(colors)),
                     formatter=PrintfTickFormatter(format="%d%%"),
                     label_standoff=6, border_line_color=None, location=(0, 0))
p.add_layout(color_bar, 'right')

show(p)      # show the plot

In [17]:
df

Unnamed: 0,Year,Month,rate
0,1948,Jan,4.0
1,1948,Feb,4.7
2,1948,Mar,4.5
3,1948,Apr,4.0
4,1948,May,3.4
5,1948,Jun,3.9
6,1948,Jul,3.9
7,1948,Aug,3.6
8,1948,Sep,3.4
9,1948,Oct,2.9


In [18]:
gapminder_data.lifeExp.max()

82.603