# Interactive Plotting with Bokeh
* Author: Johannes Maucher
* Last update: 03.05.2018

In [1]:
import pandas as pd
#import bokeh
from IPython.display import display
from bokeh.plotting import figure 
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource,HoverTool

In [2]:
germanFood=pd.read_csv("germanProductSubset.csv",index_col=0)

In [3]:
display(germanFood.head(20))
print(germanFood.shape)

| Unnamed: 0 | brands | product_name | main_category | energy_100g | fat_100g | proteins_100g | salt_100g | carbohydrates_100g | sugars_100g | sodium_100g | saturated-fat_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Candy Crush | Sour Fruit Gummies | en:sugary-snacks | 1360.0 | 0.0 | 5.0 | 0.127 | 75.0 | 57.5 | 0.05 | 0.0 |
| 1 | Candy Crush | Jelly Fish | en:sugary-snacks | 586.0 | 0.0 | 0.0 | 0.0762 | 34.0 | 24.0 | 0.03 | 0.0 |
| 2 | Del Monte | Ananas Scheiben gezuckert | en:plant-based-foods-and-beverages | 322.0 | 0.0 | 0.0 | 0.02286 | 19.66 | 17.95 | 0.009 | 0.0 |
| 4 | The Invisible Chef | Pancake & Waffle Mix Lemon Blueberry | en:pancake-mix | 852.0 | 1.0 | 6.0 | 1.33 | 42.0 | 8.0 | 0.523622 | 0.0 |
| 8 | The Invisible Chef | Pancake & Waffle Mix Buttermilk | en:pancake-mix | 962.0 | 0.0 | 7.0 | 1.57 | 49.0 | 13.0 | 0.61811 | 0.0 |
| 11 | vitasia | Thai Noodles sweet & sour sauce | en:plant-based-foods-and-beverages | 617.0 | 2.6 | 3.9 | 0.75 | 26.3 | 7.0 | 0.295276 | 0.6 |
| 12 | Sun Snacks (Aldi Süd) | Salzstangen | en:salty-snacks | 1583.0 | 4.2 | 11.0 | 6.7 | 71.0 | 2.3 | 2.637795 | 1.6 |
| 14 | Bergader | Edelpilz | en:dairies | 1376.0 | 28.0 | 19.5 | 3.5 | 0.5 | 0.3 | 1.377953 | 19.3 |
| 15 | Ikea | Rye crispbread | en:plant-based-foods-and-beverages | 1463.0 | 2.6 | 9.0 | 1.2 | 64.0 | 1.3 | 0.472441 | 0.6 |
| 16 | Freihofer Gourmet,Aldi Einkauf GmbH & Co. oHG | Edelbitterschokolade Sao Thomé 75% Kakao | en:sugary-snacks | 2314.0 | 42.5 | 9.0 | 0.0254 | 27.6 | 21.9 | 0.01 | 26.6 |


(466, 11)


In [4]:
output_notebook()

## Simple 2D-Plot with interactive tools

In [6]:
source = ColumnDataSource(germanFood)
options = dict(plot_width=800, plot_height=500,
               tools="pan,wheel_zoom,box_zoom,box_select,lasso_select,reset")

p = figure(title="carbohydrates vs. sugar", x_axis_label="carbohydrates/100g", y_axis_label="sugars/100g", **options)
p.circle("carbohydrates_100g","sugars_100g", color="blue", size=8, alpha=0.4, source=source)
show(p)

**Conclusions from the plot above:**
* sugar is a form of carbohydrate, but not the only one

In [7]:
p = figure(title="Proteins vs. Carbohydrates", x_axis_label="carbohydrates/100g", y_axis_label="proteins/100g", **options)
p.circle("carbohydrates_100g","proteins_100g", color="blue", size=8, alpha=0.4, source=source)
show(p)

**Improvements:**
* For a given point, we want to 
    * determine the coordinates more accurately
    * know the name of the corresponding product

In [8]:
hover = HoverTool(tooltips = [("Productname","@product_name"),("Brand","@brands")], mode="mouse") #other modes: vline, hline
options = dict(plot_width=800, plot_height=500,
               tools=[hover,"crosshair,pan,wheel_zoom,box_zoom,box_select,lasso_select,reset"])

p = figure(title="Proteins vs. Carbohydrates", x_axis_label="carbohydrates/100g", y_axis_label="proteins/100g", **options)
p.circle("carbohydrates_100g","proteins_100g", color="blue", size=8, alpha=0.4, source=source)
show(p)
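
The hover tool above shows the product name and the brand, but not the coordinates mentioned in the list of improvements. A minimal sketch of how the tooltips could be extended is given below. It assumes the same column names as in the dataframe; the tooltip labels and the `{0.0}` number format are illustrative choices, not part of the original notebook. `@column` refers to a column of the data source, while `$x` and `$y` refer to the cursor position in data coordinates.

```python
from bokeh.models import HoverTool

# Tooltips for product name, brand, the exact data values and the cursor position.
hover = HoverTool(
    tooltips=[
        ("Productname", "@product_name"),
        ("Brand", "@brands"),
        # values of the hovered point, formatted with one decimal place
        ("carbohydrates/100g", "@carbohydrates_100g{0.0}"),
        ("proteins/100g", "@proteins_100g{0.0}"),
        # position of the mouse cursor in data space
        ("cursor (x, y)", "($x, $y)"),
    ],
    mode="mouse",
)
```

This `hover` object can be passed in the `tools` list of `figure()` in the same way as in the cell above.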

## Encode further information in marker size and color
As demonstrated in the example above, each point in the 2-dimensional plot encodes at least two feature values: its x-coordinate is the value of the first feature and its y-coordinate is the value of the second feature. Moreover, each point can be annotated with textual information. In the example above, the `HoverTool` displays two further feature values for each point: `product_name` and `brands`. Further information (feature values) can be visualized by the size and color of the individual points. This is demonstrated below, where 
* the marker-size shall encode the value of the feature `energy_100g`
* the marker-color shall encode the value of the feature `fat_100g`

### Use marker-size to encode further information
In order to encode the `energy_100g`-values by the marker size, 

1. we scale the `energy_100g`-values into a range that is suitable for marker sizes. This is done by first dividing all values by the maximum value. After this division all values are $\leq 1$. Then all scaled values are multiplied by an integer, here $E=30$, which is the maximum marker size.
2. the scaled values are assigned to a new column of the dataframe, here `normedEnergy`.
3. the name of the new column, `normedEnergy`, is passed to the `size`-argument of the `circle()`-method.


In [9]:
import numpy as np
energy=germanFood["energy_100g"].values.astype(int)
maxEnergy=np.max(energy)
normedEnergy=energy/maxEnergy
E=30
germanFood["normedEnergy"]=E*normedEnergy
source = ColumnDataSource(germanFood)
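
As a quick check of the scaling in the cell above, the mapping $s_i = E \cdot e_i / \max_j e_j$ can be tried on a handful of values. The sketch below is illustrative only: the helper name `scale_to_marker_size` is hypothetical, the four energy values are taken from the table preview near the top of the notebook, and floats are used instead of the integer cast in the cell above.

```python
import numpy as np

def scale_to_marker_size(values, max_size=30):
    """Scale non-negative values linearly so that the largest one maps to max_size."""
    values = np.asarray(values, dtype=float)
    return max_size * values / values.max()

# energy_100g of four products from the table preview
energy = np.array([322.0, 586.0, 1360.0, 2314.0])
sizes = scale_to_marker_size(energy)

# the product with the highest energy gets the maximum marker size
print(np.round(sizes, 1))
```

Note that `size` is measured in screen units across the marker, so the marker area grows roughly with the square of the encoded value.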

In [10]:
hover = HoverTool(tooltips = [("Productname","@product_name"),("Brand","@brands")], mode="mouse") #other modes: vline, hline
options = dict(plot_width=800, plot_height=500,
               tools=[hover,"crosshair,pan,wheel_zoom,box_zoom,box_select,lasso_select,reset"])

p = figure(title="Proteins vs. Carbohydrates", x_axis_label="carbohydrates/100g", y_axis_label="proteins/100g", **options)
p.circle("carbohydrates_100g","proteins_100g", size="normedEnergy", color="blue", alpha=0.4, source=source)
show(p)

### Use marker-color to encode further information
In order to map feature values to marker-colors, *Bokeh* provides
* different types of [Color Mappers](https://bokeh.pydata.org/en/latest/docs/reference/models/mappers.html#). As the name suggests, a color mapper maps a value of a categorical or a numeric variable to a color of a specified color palette.
* a bunch of [Color Palettes](https://bokeh.pydata.org/en/latest/docs/reference/palettes.html).

The code cells below use the `LinearColorMapper` together with the `viridis` color palette. The color mapper requires the palette and the minimum and maximum of the variable's value range. The palette function requires the number of distinct colors (20 in the example below). In the `circle()`-method, a dictionary must be assigned to the `color`-argument,

```color={'field': 'fat_100g', 'transform': mapper},```

which defines 
* the dataframe-column for which values shall be mapped to color
* the color-mapper object to be used.


In order to understand which values are mapped to which colors a corresponding [ColorBar](https://bokeh.pydata.org/en/latest/docs/reference/models/annotations.html) can be added to the layout, as shown below:

In [11]:
#import modules for color - mapping:
from bokeh.palettes import magma,viridis
from bokeh.models.mappers import LinearColorMapper
from bokeh.models import ColumnDataSource,HoverTool, ColorBar

In [12]:
#create Color-Mapper object with specified color palette
mapper=LinearColorMapper(palette=viridis(20),low=germanFood["fat_100g"].min(),high=germanFood["fat_100g"].max())
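
To get a feeling for what the mapper created above does with a single value, the linear mapping can be imitated in plain Python. The sketch below is a simplification under stated assumptions, since the actual mapping is carried out by Bokeh in the browser. The function name `linear_color_index`, the five-color stand-in palette and the value range from 0 to 50 are hypothetical and chosen only for illustration. The range between `low` and `high` is divided into as many equal-width bins as there are colors, and values outside the range are clamped to the first or last color.

```python
def linear_color_index(value, low, high, n_colors):
    """Index of the equal-width bin between low and high that contains value."""
    if value <= low:
        return 0                      # clamp to the first color
    if value >= high:
        return n_colors - 1           # clamp to the last color
    return int((value - low) / (high - low) * n_colors)

# illustrative five-color stand-in for a palette such as viridis(20)
palette = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]
low, high = 0.0, 50.0                 # assumed range of fat_100g

for fat in (0.0, 12.0, 28.0, 50.0):
    print(fat, palette[linear_color_index(fat, low, high, len(palette))])
```

With 20 colors instead of 5 the bins become narrower, so neighbouring fat values are more often drawn in different colors.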

In [13]:
#create ColorBar object
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))

In [14]:
hover = HoverTool(tooltips = [("Productname","@product_name"),("Brand","@brands")], mode="mouse") #other modes: vline, hline
options = dict(plot_width=800, plot_height=500,
               tools=[hover,"crosshair,pan,wheel_zoom,box_zoom,box_select,lasso_select,reset"])

p = figure(title="Proteins vs. Carbohydrates", x_axis_label="carbohydrates/100g", y_axis_label="proteins/100g", **options)
p.circle("carbohydrates_100g","proteins_100g", size="normedEnergy", color={'field': 'fat_100g', 'transform': mapper}, alpha=0.8, source=source)
p.add_layout(color_bar, 'left')
show(p)

## Linking of Plots
Sometimes one dataset is visualized in multiple plots, each providing a different perspective on, or different features of, the dataset. In this case it can be advantageous to link the related plots. With linking, data points that are selected in one plot are highlighted in all linked plots.

Multiple linked plots are usually arranged with Bokeh's `gridplot()` function, and all linked plots use the same *data source* of type `ColumnDataSource`: 

In [15]:
from bokeh.layouts import gridplot
source = ColumnDataSource(germanFood)

options = dict(plot_width=500, plot_height=500,
               tools="pan,wheel_zoom,box_zoom,box_select,lasso_select,reset")

p1 = figure(title="energy vs fat", **options)
p1.circle("fat_100g","energy_100g", color="blue", size=6, source=source)

p2 = figure(title="energy vs protein",y_range=p1.y_range, **options)
p2.diamond("proteins_100g","energy_100g", color="green", size=6, source=source)
#p3.circle("proteins_100g","energy_100g", size="sugars_100g", fill_color="fat_100g", source=source)

p3 = figure(title="energy vs sugar",y_range=p1.y_range, **options)
p3.square("sugars_100g","energy_100g", color="red", size=6, source=source)

p4 = figure(title="energy vs carbohydrates",y_range=p1.y_range, **options)
p4.triangle("carbohydrates_100g","energy_100g",color="orange", size=6, source=source)

p = gridplot([[ p1, p2],[ p3, p4]], toolbar_location="right")

show(p)
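
Two different mechanisms are at work in the grid above: the shared `ColumnDataSource` links the selections, and passing `y_range=p1.y_range` links panning and zooming along the y-axis. The following reduced sketch makes both visible with two plots only. The three data rows and the short column names `fat`, `protein` and `energy` are made up for illustration, and `scatter()` is used as a generic marker method instead of the marker-specific methods in the cell above.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# one data source shared by both plots -> linked selection
source = ColumnDataSource(data=dict(
    fat=[0.0, 2.6, 28.0],
    protein=[5.0, 3.9, 19.5],
    energy=[1360.0, 617.0, 1376.0],
))
tools = "pan,wheel_zoom,box_select,reset"

left = figure(title="energy vs fat", tools=tools)
r_left = left.scatter("fat", "energy", source=source)

# reuse the range object of the first plot -> linked panning and zooming in y
right = figure(title="energy vs protein", y_range=left.y_range, tools=tools)
r_right = right.scatter("protein", "energy", source=source)

grid = gridplot([[left, right]])
```

The x-ranges are deliberately not shared here, because the two plots show different features on the x-axis.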

In the plots above, different marker symbols have been applied. There are many more markers and other chart types that can be used to draw data. See for example [Bokeh, Plotting with basic glyphs](http://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html#userguide-plotting) or [bokeh.plotting reference](https://bokeh.pydata.org/en/latest/docs/reference/plotting.html). Some of the most frequently applied chart types are demonstrated in the following section.

## Other plot types
### Bar Plots
Bar plots are frequently applied, e.g. for the visualisation of value frequencies (histograms). For example, in the food dataset used in this notebook, one may be interested in the frequency of each category (column `main_category` in the dataframe). For this we first apply the *Pandas*-method `groupby()` to group all products by their `main_category` and count the products in each group. Then, for all categories with more than one product, the counts are plotted as bars: 

In [16]:
categoryCount=germanFood.groupby(by="main_category")["brands"].count()

In [17]:
cats=categoryCount[categoryCount>1]
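
The counting and filtering in the two cells above can be checked on a small toy dataframe. The sketch below uses made-up rows and is only meant to illustrate one detail: `count()` counts the non-missing entries of the selected column, so a product without a `brands` entry is not counted, whereas `value_counts()` on `main_category` counts every row.

```python
import pandas as pd

# toy data: the third product has no brand
toy = pd.DataFrame({
    "brands": ["A", "B", None, "D", "E"],
    "main_category": ["en:dairies", "en:sugary-snacks", "en:dairies",
                      "en:salty-snacks", "en:sugary-snacks"],
})

# same pattern as above: non-missing brands per category
by_count = toy.groupby(by="main_category")["brands"].count()

# alternative: number of rows per category, independent of missing brands
by_rows = toy["main_category"].value_counts()

# keep only categories with more than one product
frequent = by_count[by_count > 1]
print(frequent.to_dict())
```

If products without a brand should be included in the frequencies, the row-based count is the safer choice.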

In [18]:
px = figure(title="Categories with more than one product",x_range=cats.index.tolist(),plot_width=700)
px.vbar(cats.index.tolist(),bottom=0,top=cats.values.tolist(),color="blue",width=0.8,alpha=0.6)
px.xaxis.major_label_orientation = "vertical"
show(px)