# Plotting in Bokeh

Over the last few weeks, we've spent a lot of time analyzing our text data and meticulously pulling out important information. Spreadsheets and notebook tables are useful, but our data is often best showcased with visualizations. The Bokeh library for Python lets us build interactive, browser-based plots of our data with only a few lines of code.

# Bokeh vs. Gephi


As it turns out, not every data set can be visualized to its fullest potential as a network! When a data set lacks the classic network-style connections (nodes linked by edges), it is better to show the points by other means, such as line, scatter, or bar charts, and Bokeh lets us do exactly that. Feel free to discuss with Adam if you are unsure which program is best suited for your data set!

```
!pip install bokeh
```



```python
import numpy as np
import pandas as pd
from datascience import *
from IPython.display import display, HTML
import codecs

from bokeh.io import output_notebook
from bokeh.plotting import figure, show, output_file
from bokeh.palettes import brewer
# The old bokeh.charts module (Scatter, Bar) has been removed from Bokeh
# and can no longer be imported.

# Render plots inline in the notebook instead of opening a new browser tab
output_notebook()
```

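Every plot in this notebook follows the same three-step Bokeh pattern: create a `figure`, add glyphs to it with methods such as `line` or `scatter`, then display it with `show`. A minimal sketch of that pattern, using made-up numbers and a hypothetical helper name `make_line_plot`:

```python
from bokeh.plotting import figure, show


def make_line_plot(xs, ys):
    """Build (but do not display) a simple line-and-marker plot.

    make_line_plot is a hypothetical helper used only for illustration.
    """
    # The figure is the canvas: it owns the axes, grid, toolbar and glyphs
    p = figure(title="Minimal Bokeh example", x_axis_label="x", y_axis_label="y")
    # Each glyph method adds one renderer to the figure
    p.line(xs, ys, line_width=2, legend_label="y = x squared")
    p.scatter(xs, ys, size=8)
    return p


p = make_line_plot([1, 2, 3, 4], [1, 4, 9, 16])
# In a notebook, run output_notebook() once, then display the figure with:
# show(p)
```

The cells below use exactly this pattern, just with more styling (axis labels, grid transparency, legend placement) set on the figure before `show` is called.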
### Loading CSVs

![](google_sheets_coffee.png)

```python
# Read the CSV that contains Clayton's data, then keep rows 6 through 110
# (range(6, 111) stops before 111); these hold the yearly records starting in 1910
coffee = Table.read_table('CTCPcc.csv')
coffee = coffee.take(range(6, 111))
coffee
```

| Coffee, tea, and cocoa: Per capita availability | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Filename: CTCSP | Unnamed: 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1910 | 92.407 | | | 9.2 | 7.7 | 9.2 | 7.7 | 1.0 | 1.2 | 0.9 |
| 1911 | 93.863 | | | 8.4 | 7.0 | 8.4 | 7.0 | 1.1 | 1.4 | 1.0 |
| 1912 | 95.335 | | | 10.8 | 9.1 | 10.8 | 9.1 | 1.0 | 1.6 | 1.2 |
| 1913 | 97.225 | | | 9.0 | 7.5 | 9.0 | 7.5 | 0.9 | 1.6 | 1.2 |
| 1914 | 99.111 | | | 9.2 | 7.7 | 9.2 | 7.7 | 0.9 | 1.7 | 1.2 |
| 1915 | 100.546 | | | 10.6 | 8.9 | 10.6 | 8.9 | 1.0 | 1.9 | 1.4 |
| 1916 | 101.961 | | | 11.5 | 9.6 | 11.5 | 9.6 | 1.0 | 2.3 | 1.7 |
| 1917 | 103.414 | | | 12.1 | 10.2 | 12.1 | 10.2 | 1.2 | 3.6 | 2.7 |
| 1918 | 104.55 | | | 10.1 | 8.4 | 10.1 | 8.4 | 1.2 | 3.3 | 2.4 |
| 1919 | 105.063 | | | 11.8 | 9.9 | 11.8 | 9.9 | 0.6 | 3.5 | 2.6 |


```python
# Use the Table's relabeled method to replace the placeholder column names with descriptive ones
original_names = coffee.labels
new_names = ['Year', 'Population (millions)',
             'Instant (GBE)', 'Instant (Retail)',
             'Regular (GBE)', 'Regular (Retail)',
             'Total (GBE)', 'Total (Retail)',
             'Tea (DLE)', 'Cocoa (BE)', 'Cocoa (CLE)']
coffee = coffee.relabeled(original_names, new_names)

# Convert to a pandas DataFrame for the plotting cells that follow
c2 = coffee.to_df()
coffee
```

| Year | Population (millions) | Instant (GBE) | Instant (Retail) | Regular (GBE) | Regular (Retail) | Total (GBE) | Total (Retail) | Tea (DLE) | Cocoa (BE) | Cocoa (CLE) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1910 | 92.407 | | | 9.2 | 7.7 | 9.2 | 7.7 | 1.0 | 1.2 | 0.9 |
| 1911 | 93.863 | | | 8.4 | 7.0 | 8.4 | 7.0 | 1.1 | 1.4 | 1.0 |
| 1912 | 95.335 | | | 10.8 | 9.1 | 10.8 | 9.1 | 1.0 | 1.6 | 1.2 |
| 1913 | 97.225 | | | 9.0 | 7.5 | 9.0 | 7.5 | 0.9 | 1.6 | 1.2 |
| 1914 | 99.111 | | | 9.2 | 7.7 | 9.2 | 7.7 | 0.9 | 1.7 | 1.2 |
| 1915 | 100.546 | | | 10.6 | 8.9 | 10.6 | 8.9 | 1.0 | 1.9 | 1.4 |
| 1916 | 101.961 | | | 11.5 | 9.6 | 11.5 | 9.6 | 1.0 | 2.3 | 1.7 |
| 1917 | 103.414 | | | 12.1 | 10.2 | 12.1 | 10.2 | 1.2 | 3.6 | 2.7 |
| 1918 | 104.55 | | | 10.1 | 8.4 | 10.1 | 8.4 | 1.2 | 3.3 | 2.4 |
| 1919 | 105.063 | | | 11.8 | 9.9 | 11.8 | 9.9 | 0.6 | 3.5 | 2.6 |

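The later cells pass columns through `pd.to_numeric` before doing arithmetic, which suggests the table's values arrive as text (the CSV mixes a title row and notes in with the numbers). A minimal pandas sketch of converting every column at once, assuming blank cells arrive as empty strings or the string `'nan'`; the values are copied from the 1910 and 1911 rows above:

```python
import pandas as pd

# Two rows copied from the relabeled table, stored as text.
# Assumption: missing values show up as "" or "nan" strings.
raw = pd.DataFrame({
    "Year": ["1910", "1911"],
    "Population (millions)": ["92.407", "93.863"],
    "Instant (GBE)": ["", "nan"],
    "Regular (GBE)": ["9.2", "8.4"],
})

# errors="coerce" turns anything that is not a number into NaN instead of raising
numeric = raw.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
```

`DataFrame.apply` forwards the `errors` keyword to `pd.to_numeric`, so the same one-liner could be applied to `c2` instead of converting columns one cell at a time.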

### Time Series

In this section, we plot a line graph of population over time. Documentation on specific Bokeh functions is available in the Bokeh user guide: https://docs.bokeh.org/en/latest/docs/user_guide.html

Read through the code and see which variables you can change to modify the input data and labels.

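```python
# Convert a column of year strings such as '1910' into NumPy datetimes so Bokeh
# can draw a datetime x-axis (assumes the years are stored as text, as read from the CSV)
def datetime(x):
    return np.array(x, dtype=np.datetime64)

a = figure(x_axis_type="datetime", title="Population vs Year")
a.grid.grid_line_alpha = 0.3
a.xaxis.axis_label = 'Year'
a.yaxis.axis_label = 'Population (millions)'

# legend_label replaces the deprecated legend= keyword
a.line(datetime(c2['Year']), pd.to_numeric(c2['Population (millions)']),
       color='#A6CEE3', legend_label='Population')
a.legend.location = "top_left"

show(a)
```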

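The `datetime` helper above works because NumPy can parse year strings such as `'1910'` directly into `datetime64` values with year resolution. Integer years are not parsed as calendar years, because NumPy treats an integer as a count of units since 1970. A slightly more defensive variant, here named `years_to_datetime` (a hypothetical helper, not part of NumPy or Bokeh), accepts either strings or integers:

```python
import numpy as np


def years_to_datetime(x):
    """Convert years given as ints (1910) or digit strings ("1910") to datetime64[Y].

    Hypothetical helper: casting through str makes NumPy parse each value as a
    calendar year instead of treating an integer as "years since 1970".
    """
    return np.asarray(x).astype(int).astype(str).astype("datetime64[Y]")


from_strings = years_to_datetime(["1910", "1911", "1912"])
from_ints = years_to_datetime([1910, 1911, 1912])
```

In the notebook it could be used wherever `datetime(c2['Year'])` appears, e.g. `years_to_datetime(c2['Year'])`.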
### Time Series of Population Growth Rates

```python
# Build a table of the year-over-year change in population.
# pct_change(1) returns fractional changes (0.0158 means 1.58%), not percentages.
c3 = c2.copy()
c3 = c3[['Year', 'Population (millions)']]
c3 = c3.set_index('Year')
c3['Population (millions)'] = pd.to_numeric(c3['Population (millions)'])
c3['Population (millions)'] = c3['Population (millions)'].pct_change(1)
c3
```


| Year | Population (millions) |
|---|---|
| 1910 | |
| 1911 | 0.015756 |
| 1912 | 0.015682 |
| 1913 | 0.019825 |
| 1914 | 0.019398 |
| 1915 | 0.014479 |
| 1916 | 0.014073 |
| 1917 | 0.014251 |
| 1918 | 0.010985 |
| 1919 | 0.004907 |

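`pct_change(1)` divides each value by the one before it and subtracts 1, so the 1911 entry is (93.863 - 92.407) / 92.407, or about 0.015756. A small pandas check using the first five population values from the table, including the conversion to actual percentages:

```python
import pandas as pd

# Population (millions), 1910-1914, copied from the table above
population = pd.Series([92.407, 93.863, 95.335, 97.225, 99.111],
                       index=[1910, 1911, 1912, 1913, 1914])

growth = population.pct_change(1)               # fractional change; 1910 has no previous year, so NaN
manual = population / population.shift(1) - 1  # the same formula written out by hand
growth_percent = growth * 100                   # multiply by 100 for percentages (1911 is about 1.58)

print(growth_percent.round(2))
```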

```python
# Plot the year-over-year growth rate
pop_growth_plot = figure(x_axis_type="datetime", title="Population Growth Rate vs Year")
pop_growth_plot.grid.grid_line_alpha = 0.3
pop_growth_plot.xaxis.axis_label = 'Year'
pop_growth_plot.yaxis.axis_label = 'Population growth (fractional change from previous year)'

pop_growth_plot.line(datetime(c2['Year']), c3['Population (millions)'],
                     color='#A6CEE3', legend_label='Population')
pop_growth_plot.legend.location = "top_left"

show(pop_growth_plot)
```

### Time Series of Multiple 5-Year Growth Rates

For this visualization, instead of a single line, we plot several lines and compare them. We build the table with nearly the same steps as before, but `pct_change(5)` compares each year with the value five years earlier, so the first five rows have no growth rate. This time, we plot all of the growth rates together so we can compare them visually.

```python
# Calculate 5-year growth rates: each row is compared with the row five years earlier
c4 = c2.copy()
c4 = c4[['Population (millions)', 'Instant (GBE)', 'Regular (GBE)', 'Total (GBE)', 'Cocoa (CLE)']]
# errors='coerce' turns blank or 'nan' entries (such as the early years with no
# instant coffee data) into NaN instead of raising an error
c4 = c4.apply(pd.to_numeric, errors='coerce').pct_change(5)
c4
```

| | Population (millions) | Instant (GBE) | Regular (GBE) | Total (GBE) | Cocoa (CLE) |
|---|---|---|---|---|---|
| 0 | | | | | |
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | 0.088078 | | 0.152174 | 0.152174 | 0.555556 |
| 6 | 0.086275 | | 0.369048 | 0.369048 | 0.700000 |
| 7 | 0.084743 | | 0.120370 | 0.120370 | 1.250000 |
| 8 | 0.075341 | | 0.122222 | 0.122222 | 1.000000 |
| 9 | 0.060054 | | 0.282609 | 0.282609 | 1.166667 |

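`pct_change(5)` applies the same formula five rows apart, so row 5 (1915) compares 100.546 with 92.407 from 1910, giving about 0.088078. The formula is undefined when the starting value is zero or missing, which matters for columns like Instant (GBE) that are empty in the early years. A short sketch of how pandas handles each case (the 0.3 value is made up for illustration):

```python
import numpy as np
import pandas as pd

# Population (millions), 1910-1915, copied from the tables above
population = pd.Series([92.407, 93.863, 95.335, 97.225, 99.111, 100.546])
five_year = population.pct_change(5)  # row 5 is 100.546 / 92.407 - 1

# Illustrative series: no data for six years, then a made-up value of 0.3
as_zero = pd.Series([0.0] * 6 + [0.3])    # missing years filled with 0
as_nan = pd.Series([np.nan] * 6 + [0.3])  # missing years left as NaN

zero_rates = as_zero.pct_change(5)  # 0/0 gives NaN, 0.3/0 gives inf
nan_rates = as_nan.pct_change(5)    # dividing by NaN stays NaN
```

A NaN leaves a gap in a Bokeh line, whereas an `inf` growth rate has no meaningful position on the y-axis, so leaving missing years as NaN is usually the safer choice before computing growth rates.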

```python
# Plot the 5-year growth rates from the table above on one graph
pop_growth_plot = figure(x_axis_type="datetime", title="Growth Rates over Time")
pop_growth_plot.grid.grid_line_alpha = 0.3
pop_growth_plot.xaxis.axis_label = 'Year'
pop_growth_plot.yaxis.axis_label = 'Percent Change (in decimal)'

pop_growth_plot.line(datetime(c2['Year']), c4['Population (millions)'], color='#A6CEE3', legend_label='Population')
pop_growth_plot.line(datetime(c2['Year']), c4['Instant (GBE)'], color='#B2DF8A', legend_label='Instant')
pop_growth_plot.line(datetime(c2['Year']), c4['Regular (GBE)'], color='#33A02C', legend_label='Regular')
pop_growth_plot.line(datetime(c2['Year']), c4['Total (GBE)'], color='#FB9A99', legend_label='Total Coffee')
pop_growth_plot.legend.location = "top_left"

show(pop_growth_plot)
```

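Rather than writing one `line` call per column, the lines can be drawn in a loop with colors taken from one of Bokeh's built-in palettes. A sketch under two assumptions: the helper name `plot_growth_rates` is hypothetical, and the `Category10` palette stands in for the hand-picked hex colors above. The sample values are copied from rows 5 to 7 of the 5-year growth table:

```python
import numpy as np
import pandas as pd
from bokeh.palettes import Category10
from bokeh.plotting import figure


def plot_growth_rates(years, rates, columns, title="Growth Rates over Time"):
    """Draw one line per column of `rates` against `years` (hypothetical helper)."""
    p = figure(x_axis_type="datetime", title=title)
    p.grid.grid_line_alpha = 0.3
    p.xaxis.axis_label = "Year"
    p.yaxis.axis_label = "Percent Change (in decimal)"
    # Category10[10] is a list of ten colors; zip stops at whichever list is shorter
    for color, column in zip(Category10[10], columns):
        p.line(years, rates[column], color=color, legend_label=column)
    p.legend.location = "top_left"
    return p


# Rows 5-7 (1915-1917) of the 5-year growth table
years = np.array(["1915", "1916", "1917"], dtype="datetime64[Y]")
rates = pd.DataFrame({"Population (millions)": [0.088078, 0.086275, 0.084743],
                      "Total (GBE)": [0.152174, 0.369048, 0.120370]})
p = plot_growth_rates(years, rates, ["Population (millions)", "Total (GBE)"])
```

In the notebook this could be called as `show(plot_growth_rates(datetime(c2['Year']), c4, ['Population (millions)', 'Instant (GBE)', 'Regular (GBE)', 'Total (GBE)']))`; adding a column to the list adds a line without any other changes.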
# Other Types of Visualizations

### Scatter 

```python
# Scatter plot of two 5-year growth-rate columns from c4: Cocoa (CLE) against Population.
# bokeh.charts.Scatter no longer exists, so the points are drawn on a plain figure.
# Rows where either value is NaN (the first five years) are not drawn.
p = figure(title="Scatter - Cocoa (CLE) vs Population (millions)",
           x_axis_label="Population (5-year growth rate)",
           y_axis_label="Cocoa (CLE) (5-year growth rate)")
p.scatter(c4['Population (millions)'], c4['Cocoa (CLE)'], size=8, legend_label="Cocoa (CLE)")
p.legend.location = "top_left"
show(p)
```

Scatter plots in Bokeh offer many more features and options to play around with, such as marker shapes, sizes, colors, and hover tooltips; the scatter plot section of the Bokeh user guide at https://docs.bokeh.org/en/latest/docs/user_guide.html covers them.

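One such feature is hover tooltips, which show the underlying values when the mouse is over a point. A sketch, assuming the data sits in a pandas DataFrame with numeric columns; the helper name `scatter_with_hover` is hypothetical. Putting the data in a `ColumnDataSource` lets the tooltip look up any column for the hovered point, and the `@{...}` braces allow column names that contain spaces:

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure


def scatter_with_hover(df, x_col, y_col, label_col):
    """Scatter plot of df[y_col] against df[x_col] with hover tooltips (hypothetical helper)."""
    source = ColumnDataSource(df)
    p = figure(title=f"{y_col} vs {x_col}", x_axis_label=x_col, y_axis_label=y_col)
    # Strings passed as x and y are treated as column names in `source`
    p.scatter(x=x_col, y=y_col, source=source, size=8)
    # "@{name}" looks up the column called name for the point under the cursor
    tooltips = [(col, "@{" + col + "}") for col in (label_col, x_col, y_col)]
    p.add_tools(HoverTool(tooltips=tooltips))
    return p


# Values copied from the 1910-1912 rows of the relabeled table
sample = pd.DataFrame({"Year": [1910, 1911, 1912],
                       "Population (millions)": [92.407, 93.863, 95.335],
                       "Cocoa (CLE)": [0.9, 1.0, 1.2]})
p = scatter_with_hover(sample, "Population (millions)", "Cocoa (CLE)", "Year")
```

With the notebook's data, something like `show(scatter_with_hover(c2.apply(pd.to_numeric, errors='coerce'), 'Population (millions)', 'Cocoa (CLE)', 'Year'))` should work.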
### Bar

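```python
# Bar chart of cocoa availability (Cocoa (CLE)), one bar per year.
# bokeh.charts.Bar no longer exists in Bokeh, so the bars are drawn with figure.vbar.
c5 = c2.copy()
c5 = c5[['Year', 'Cocoa (CLE)']]
c5 = c5.apply(pd.to_numeric)
p = figure(title="Bar - Cocoa", x_axis_label="Year", y_axis_label="Cocoa (CLE)")
p.vbar(x=c5['Year'], top=c5['Cocoa (CLE)'], width=0.8)
show(p)
```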

Bar charts in Bokeh offer just as many features and options to play around with; see the bar chart section of the Bokeh user guide at https://docs.bokeh.org/en/latest/docs/user_guide.html.

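Bars do not have to sit on a numeric x-axis. Bokeh also supports categorical axes, where each bar is placed at a text label; averaging the data by decade, for example, gives labels such as `'1910s'`. A sketch with a hypothetical helper `decade_bar_chart`, assuming the year and value columns can be converted with `pd.to_numeric`. The 1918 and 1919 values come from the table above; the 1920s values are made up:

```python
import pandas as pd
from bokeh.plotting import figure


def decade_bar_chart(df, year_col, value_col):
    """Bar chart of the mean of value_col per decade (hypothetical helper)."""
    data = df[[year_col, value_col]].apply(pd.to_numeric, errors="coerce").dropna()
    # 1913 // 10 * 10 == 1910, so every year maps to the start of its decade
    decades = (data[year_col] // 10 * 10).astype(int)
    means = data.groupby(decades)[value_col].mean()
    labels = [f"{decade}s" for decade in means.index]
    # Passing a list of strings as x_range creates a categorical (factor) axis
    p = figure(x_range=labels, title=f"Average {value_col} by decade",
               y_axis_label=value_col)
    p.vbar(x=labels, top=means.values, width=0.8)
    return p


example = pd.DataFrame({"Year": ["1918", "1919", "1920", "1921"],
                        "Cocoa (CLE)": ["2.4", "2.6", "3.0", "3.4"]})
p = decade_bar_chart(example, "Year", "Cocoa (CLE)")
```

With the notebook's data this would be `show(decade_bar_chart(c2, 'Year', 'Cocoa (CLE)'))`, giving roughly ten bars instead of one per year.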
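Current versions of Bokeh have no dedicated one-line functions for box plots and histograms, but both can be built from the same basic glyphs used above: a histogram, for example, is a row of `quad` rectangles whose heights come from NumPy's `np.histogram`.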

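A minimal histogram sketch along those lines: `np.histogram` counts how many values fall into each bin, and `figure.quad` draws one rectangle per bin from its left edge to its right edge. The helper name `histogram_plot` is hypothetical; the sample values are the Tea (DLE) column for 1910 to 1919 from the relabeled table:

```python
import numpy as np
from bokeh.plotting import figure


def histogram_plot(values, bins=10, title="Histogram"):
    """Histogram of `values` drawn with quad glyphs (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Drop NaN: np.histogram cannot auto-detect a bin range if NaN is present
    values = values[~np.isnan(values)]
    counts, edges = np.histogram(values, bins=bins)
    p = figure(title=title, y_axis_label="Count")
    # One rectangle per bin: edges has one more entry than counts
    p.quad(top=counts, bottom=0, left=edges[:-1], right=edges[1:],
           fill_alpha=0.7, line_color="white")
    return p


# Tea (DLE) values for 1910-1919, copied from the relabeled table
tea = [1.0, 1.1, 1.0, 0.9, 0.9, 1.0, 1.0, 1.2, 1.2, 0.6]
p = histogram_plot(tea, bins=4, title="Tea (DLE), 1910-1919")
```

For the full column, something like `show(histogram_plot(pd.to_numeric(c2['Tea (DLE)'], errors='coerce')))` should work.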
# Final Thoughts

We have now looked at several ways to visualize your data, both in Gephi and in Bokeh. Try these visualizations on your own data sets, play around with the parameters, and experiment until you find one that really makes your data pop! Bokeh has many, many more possibilities; browse the gallery at https://docs.bokeh.org/en/latest/docs/gallery.html to see what is possible. We can help you implement anything shown in the gallery. If none of these visualizations works for you, talk with us and we will try to help you find something that fits!