# DVN blogpost 1 - Visualisation experiments with Bokeh

For my first tool experiment, I have decided to explore a Python tool called Bokeh. The developers describe it as:
> ***"Bokeh is a Python interactive visualization library that targets modern web browsers for presentation."***

For more information about Bokeh, visit http://bokeh.pydata.org/ or the user guide at http://bokeh.pydata.org/en/latest/docs/user_guide.html#userguide

## Quickstart exercises

What follows is me working through the quickstart examples.

In [5]:
from bokeh.plotting import figure, output_notebook, show

# some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# output inline in the notebook
output_notebook()

# create a new plot with a title and axis labels
p = figure(width=800, height=300, title="simple line example", x_axis_label="x", y_axis_label="y")

# add a line renderer with legend and line thickness
p.line(x, y, legend="Temp.", line_width=2)

# show results
show(p)

The basic steps to creating plots with the bokeh.plotting interface are:

 - Prepare some data (in this case plain python lists).
 - Tell Bokeh where to generate output (in this case using output_notebook(), which renders the plot inline in the notebook; output_file() would write a standalone HTML file instead).
 - Call figure() to create a plot with some overall options like title, tools and axes labels.
 - Add renderers (in this case, Figure.line) for our data, with visual customizations like colors, legends and widths to the plot.
 - Ask Bokeh to show() or save() the results.
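
The steps above can be sketched end to end. This variant writes a standalone HTML file with output_file() and save() instead of rendering inline; the file name `lines.html` and the temporary directory are only illustrative choices, and I have left out the legend and sizing options to keep it minimal.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# 1. prepare some data (plain Python lists)
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# 2. tell Bokeh where to write the output (illustrative temporary location)
out_path = os.path.join(tempfile.mkdtemp(), "lines.html")
output_file(out_path)

# 3. create a figure with a title and axis labels
p = figure(title="simple line example", x_axis_label="x", y_axis_label="y")

# 4. add a line renderer with a custom line thickness
p.line(x, y, line_width=2)

# 5. write the HTML file to disk without opening a browser
save(p)
```

Swapping save(p) for show(p) would open the result in a browser tab instead.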

The ***bokeh.plotting*** interface is also quite handy if we need to customize the output a bit more by adding more data series, glyphs, logarithmic axes, and so on. It’s also easy to combine multiple glyphs on one plot, as shown below:

In [10]:
from bokeh.plotting import figure, output_notebook, show

# prepare some data
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y0 = [i**2 for i in x]
y1 = [10**i for i in x]
y2 = [10**(i**2) for i in x]

# output to notebook
output_notebook()

# create a new plot
p = figure(
    width=800,
    height=400,
    tools="pan,box_zoom,reset,save",
    y_axis_type="log", 
    y_range=[0.001, 10**11], 
    title="log axis example",
    x_axis_label='sections', 
    y_axis_label='particles'
)

# add some renderers
p.line(x, x, legend="y=x")
p.circle(x, x, legend="y=x", fill_color="white", size=8)
p.line(x, y0, legend="y=x^2", line_width=3)
p.line(x, y1, legend="y=10^x", line_color="red")
p.circle(x, y1, legend="y=10^x", fill_color="red", line_color="red", size=6)
p.line(x, y2, legend="y=10^x^2", line_color="orange", line_dash="4 4")

# show the results
show(p)

### Linked Panning & brushing example

Linking together various aspects of different plots can be a useful technique for data visualization. In Bokeh, such linkages are typically accomplished by sharing some plot component between plots. Below is an example that demonstrates linked panning (where changing the range of one plot causes others to update) by sharing range objects between the plots. Some other things to look out for in this example:

 - calling figure() multiple times to create multiple plots
 - using gridplot() to arrange several plots in an array
 - drawing new glyph types with the glyph methods Figure.triangle and Figure.square
 - hiding the toolbar by setting toolbar_location to None
 - setting convenience arguments color (sets both line_color and fill_color) and alpha (sets both line_alpha and fill_alpha)

In [11]:
import numpy as np

from bokeh.layouts import gridplot
from bokeh.plotting import figure, output_notebook, show

# prepare some data
N = 100
x = np.linspace(0, 4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)
y2 = np.sin(x) + np.cos(x)

# output to notebook
output_notebook()

# create a new plot
s1 = figure(width=250, height=250, title=None)
s1.circle(x, y0, size=10, color="navy", alpha=0.5)

# NEW: create a new plot and share both ranges
s2 = figure(width=250, height=250, x_range=s1.x_range, y_range=s1.y_range, title=None)
s2.triangle(x, y1, size=10, color="firebrick", alpha=0.5)

# NEW: create a new plot and share only one range
s3 = figure(width=250, height=250, x_range=s1.x_range, title=None)
s3.square(x, y2, size=10, color="olive", alpha=0.5)

# NEW: put the subplots in a gridplot
p = gridplot([[s1, s2, s3]], toolbar_location=None)

# show the results
show(p)
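
The example above only covers linked panning. Linked brushing, as far as I understand it, works in a similar spirit: instead of sharing ranges, the plots share one data source, so a selection made in one plot is highlighted in the other. Here is a minimal sketch under that assumption; the variable names are mine, and I use the generic scatter() glyph method with selection tools.

```python
import numpy as np

from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# same kind of data as in the linked panning example
x = np.linspace(0, 4 * np.pi, 100)
source = ColumnDataSource(data=dict(x=x, y0=np.sin(x), y1=np.cos(x)))

# selection tools are what make brushing possible
TOOLS = "box_select,lasso_select,reset"

# both plots read their columns from the SAME data source
left = figure(tools=TOOLS, title=None)
r_left = left.scatter("x", "y0", source=source, size=10, color="navy", alpha=0.5)

right = figure(tools=TOOLS, title=None)
r_right = right.scatter("x", "y1", source=source, size=10, color="firebrick", alpha=0.5)

# arrange the two plots side by side
grid = gridplot([[left, right]])
```

Calling show(grid) after output_notebook() should display the pair; selecting points on one side should then highlight the matching points on the other.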

## Trying some plotting on my own data 

For my first trick I am using an FBI crime dataset. You can find it here: 
https://ucr.fbi.gov/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/1tabledatadecoverviewpdf/table_1_crime_in_the_united_states_by_volume_and_rate_per_100000_inhabitants_1994-2013.xls

I should mention I was lazy, so I cleaned it up in Excel (I know, I know...) for convenience and exported it as a CSV.

In [12]:
import numpy as np
import pandas as pd
from bokeh.plotting import figure, output_notebook, show

### Prepare the data...

In [13]:
crimeData = pd.read_csv("datasets/table_1_crime_in_the_united_states_by_volume_and_rate_per_100000_inhabitants_1994-2013.csv")

In [14]:
crimeData.head()

Unnamed: 0,Year,Population,ViolentCrimes,VC_rate,Murders,murder_rate,Rapes,Rape_rate,Robbery,Robbery_rate,Assaults,Assaults_rate,PropertyCrimes,PC_rate,Burglaries,Burglary_rate,theft,theft_rate,car_thefts,car_theft_rate
0,1994,260327021,1857670,713.6,23326,9.0,102216,39.3,618949,237.8,1113179,427.6,12131873,4660.2,2712774,1042.1,7879812,3026.9,1539287,591.3
1,1995,262803276,1798792,684.5,21606,8.2,97470,37.1,580509,220.9,1099207,418.3,12063935,4590.5,2593784,987.0,7997710,3043.2,1472441,560.3
2,1996,265228572,1688540,636.6,19645,7.4,96252,36.3,535594,201.9,1037049,391.0,11805323,4451.0,2506400,945.0,7904685,2980.3,1394238,525.7
3,1997,267783607,1636096,611.0,18208,6.8,96153,35.9,498534,186.2,1023201,382.1,11558475,4316.3,2460526,918.8,7743760,2891.8,1354189,505.7
4,1998,270248003,1533887,567.6,16974,6.3,93144,34.5,447186,165.5,976583,361.4,10951827,4052.5,2332735,863.2,7376311,2729.5,1242781,459.9


In [25]:
CRates = crimeData.drop(crimeData.columns[[1,2,4,6,8,10,12,14,16,18]], axis=1)

In [26]:
CRates.head()

Unnamed: 0,Year,VC_rate,murder_rate,Rape_rate,Robbery_rate,Assaults_rate,PC_rate,Burglary_rate,theft_rate,car_theft_rate
0,1994,713.6,9.0,39.3,237.8,427.6,4660.2,1042.1,3026.9,591.3
1,1995,684.5,8.2,37.1,220.9,418.3,4590.5,987.0,3043.2,560.3
2,1996,636.6,7.4,36.3,201.9,391.0,4451.0,945.0,2980.3,525.7
3,1997,611.0,6.8,35.9,186.2,382.1,4316.3,918.8,2891.8,505.7
4,1998,567.6,6.3,34.5,165.5,361.4,4052.5,863.2,2729.5,459.9
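
Dropping columns by position works, but it breaks quietly if the column order ever changes. A possible alternative, sketched here on two rows and a handful of columns copied from the output above, is to select by name. It leans on the fact that every rate column in this dataset ends in `_rate`.

```python
import pandas as pd

# two rows and a few columns copied from the crimeData.head() output above
crimeData = pd.DataFrame({
    "Year": [1994, 1995],
    "Population": [260327021, 262803276],
    "ViolentCrimes": [1857670, 1798792],
    "VC_rate": [713.6, 684.5],
    "Murders": [23326, 21606],
    "murder_rate": [9.0, 8.2],
})

# keep Year plus every column whose name ends in "_rate"
rate_cols = ["Year"] + [c for c in crimeData.columns if c.endswith("_rate")]
CRates = crimeData[rate_cols]

print(CRates.columns.tolist())  # ['Year', 'VC_rate', 'murder_rate']
```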


### Visualise the data

In [17]:
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import HoverTool

Set up the data:

In [18]:
x = abs(CRates['Year'])

Add some interactive tools:

In [181]:
hover = HoverTool(
        tooltips=[
            ("Year", "@x"),
            ("Rate", "@y")
            ]
    )

Configure plot:

In [133]:
CrimeP = figure(plot_width=600, 
                  plot_height=400,
                  tools=[hover],
                  y_range=[0.001, 1000], 
                  title="FBI crime rates",
                  x_axis_label='year', 
                  y_axis_label='rate')

Add renderers:

In [236]:
# Violent Crime
CrimeP.line(x, CRates['VC_rate'], legend="Violent Crime", line_width=2)
CrimeP.circle(x, CRates['VC_rate'], legend="Violent Crime", fill_color="white", size=8)

# Murder 
CrimeP.line(x, CRates['murder_rate'], legend="Murder", line_width=3)
CrimeP.circle(x, CRates['murder_rate'], legend="Murder", fill_color="white", line_color="green", size=6)

# Rape
CrimeP.line(x, CRates['Rape_rate'], legend="Rape", line_color="red", line_width=3)
CrimeP.circle(x, CRates['Rape_rate'], legend="Rape", fill_color="red", line_color="red", size=6)

# Robbery
CrimeP.line(x, CRates['Robbery_rate'], legend="Robbery", line_color="orange", line_dash="4 4")

# Assault
CrimeP.line(x, CRates['Assaults_rate'], legend="Assaults", line_color="purple", line_width=3)
CrimeP.circle(x, CRates['Assaults_rate'], legend="Assaults", fill_color="purple", line_color="purple", size=6)

#CrimeP.line(x, CRates['PC_rate'], legend="Property Crime", line_color="blue", line_width=3)
#CrimeP.circle(x, CRates['PC_rate'], legend="Property Crime", fill_color="red", line_color="red", size=6)

#CrimeP.line(x, CRates['Burglary_rate'], legend="Burglary", line_color="red", line_width=3)
#CrimeP.circle(x, CRates['Burglary_rate'], legend="Burglary", fill_color="red", line_color="red", size=6)

#CrimeP.line(x, CRates['theft_rate'], legend="Theft", line_color="red", line_width=3)
#CrimeP.circle(x, CRates['theft_rate'], legend="Theft", fill_color="red", line_color="red", size=6)

#CrimeP.line(x, CRates['car_theft_rate'], legend="Car Theft", line_color="red", line_width=3)
#CrimeP.circle(x, CRates['car_theft_rate'], legend="Car Theft", fill_color="red", line_color="red", size=6)


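I was running into errors that seemed to relate to a display limitation: the line plot appeared to handle only 6 lines at a time, which is a bit annoying, but whatever... I'm sure it's just me being unfamiliar with how the library works. (For what it's worth, the property crime and theft rates in the rows shown earlier are well over 1000, so they would fall outside the y_range set on this plot anyway.)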

In [237]:
show(CrimeP)

### Multiline plot - more of the same

In [30]:
multiP = figure(plot_width=600,
                plot_height=400,
                x_axis_label='year',
                y_axis_label='rate')

In [31]:
multiP.line(CRates['Year'],
            CRates['VC_rate'],
            color='navy', 
            alpha=0.5 )
multiP.line(CRates['Year'],
            CRates['murder_rate'], 
            color='navy', 
            alpha=0.5 )
multiP.line(CRates['Year'],
            CRates['Rape_rate'], 
            color='navy', 
            alpha=0.5 )
multiP.line(CRates['Year'],
            CRates['Robbery_rate'], 
            color='navy', 
            alpha=0.5 )
multiP.line(CRates['Year'],
            CRates['Assaults_rate'], 
            color='navy', 
            alpha=0.5 )
multiP.line(CRates['Year'],
            CRates['PC_rate'], 
            color='navy', 
            alpha=0.5 )
#multiP.line(CRates['Year'],
#            CRates['Burglary_rate'], 
#            color='navy', 
#           alpha=0.5 )
#multiP.line(CRates['Year'],
#            CRates['theft_rate'], 
#            color='navy', 
#            alpha=0.5 )
#multiP.line(CRates['Year'],
#            CRates['car_theft_rate'], 
#            color='navy', 
#            alpha=0.5 )

In [32]:
show(multiP)
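
The repeated line() calls above could probably be collapsed into a loop over the columns. This is only a sketch: the three rows are copied from the CRates output earlier, `loopP` is a name I made up, the log axis is my own choice so the small and large rates fit on one plot, and `legend_label` is the keyword used by recent Bokeh releases (the cells above use the older `legend=`).

```python
import pandas as pd
from bokeh.palettes import Category10
from bokeh.plotting import figure

# first three rows of CRates, copied from the head() output earlier
CRates = pd.DataFrame({
    "Year": [1994, 1995, 1996],
    "VC_rate": [713.6, 684.5, 636.6],
    "murder_rate": [9.0, 8.2, 7.4],
    "Rape_rate": [39.3, 37.1, 36.3],
    "Robbery_rate": [237.8, 220.9, 201.9],
    "Assaults_rate": [427.6, 418.3, 391.0],
    "PC_rate": [4660.2, 4590.5, 4451.0],
    "Burglary_rate": [1042.1, 987.0, 945.0],
    "theft_rate": [3026.9, 3043.2, 2980.3],
    "car_theft_rate": [591.3, 560.3, 525.7],
})

# every column except Year becomes one line
rate_cols = [c for c in CRates.columns if c != "Year"]

# a log axis keeps rates from ~7 to ~4700 readable on one plot
loopP = figure(y_axis_type="log", x_axis_label="year", y_axis_label="rate")

# one line per column, each with its own colour from a 10-colour palette
renderers = []
for col, colour in zip(rate_cols, Category10[10]):
    renderers.append(
        loopP.line(CRates["Year"], CRates[col],
                   line_color=colour, line_width=2, legend_label=col)
    )
```

All nine rate columns end up on the same figure here, which suggests the limit I hit earlier had another cause.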

## Horizon Chart 

In [136]:
from bokeh.charts import Horizon, output_notebook, show
output_notebook()

Import the data:

In [145]:
crimeData = pd.read_csv("datasets/table_1_crime_in_the_united_states_by_volume_and_rate_per_100000_inhabitants_1994-2013.csv", parse_dates=['Year'])

In [146]:
CRates = crimeData.drop(crimeData.columns[[1,2,4,6,8,10,12,14,16,18]], axis=1)

In [147]:
CRates.head()

Unnamed: 0,Year,VC_rate,murder_rate,Rape_rate,Robbery_rate,Assaults_rate,PC_rate,Burglary_rate,theft_rate,car_theft_rate
0,1994-01-01,713.6,9.0,39.3,237.8,427.6,4660.2,1042.1,3026.9,591.3
1,1995-01-01,684.5,8.2,37.1,220.9,418.3,4590.5,987.0,3043.2,560.3
2,1996-01-01,636.6,7.4,36.3,201.9,391.0,4451.0,945.0,2980.3,525.7
3,1997-01-01,611.0,6.8,35.9,186.2,382.1,4316.3,918.8,2891.8,505.7
4,1998-01-01,567.6,6.3,34.5,165.5,361.4,4052.5,863.2,2729.5,459.9
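
The `parse_dates=['Year']` argument is what turns the plain years into the 1994-01-01 style dates shown above. If that inference ever misbehaves, an explicit conversion is one possible fallback. A small sketch, using an inline two-row CSV (made up from the values above) in place of the real file:

```python
import io

import pandas as pd

# a tiny stand-in for the real CSV file
csv_text = "Year,murder_rate\n1994,9.0\n1995,8.2\n"
df = pd.read_csv(io.StringIO(csv_text))

# convert the integer years to datetimes with an explicit format
df["Year"] = pd.to_datetime(df["Year"].astype(str), format="%Y")
```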


In [148]:
CRatesSliced = CRates.drop(CRates.columns[[1,6,7,8,9]], axis=1)

In [149]:
CRatesSliced.head()

Unnamed: 0,Year,murder_rate,Rape_rate,Robbery_rate,Assaults_rate
0,1994-01-01,9.0,39.3,237.8,427.6
1,1995-01-01,8.2,37.1,220.9,418.3
2,1996-01-01,7.4,36.3,201.9,391.0
3,1997-01-01,6.8,35.9,186.2,382.1
4,1998-01-01,6.3,34.5,165.5,361.4


Build chart:

In [162]:
data = dict([
    ('Year', CRatesSliced['Year']),
    ('Murder', CRatesSliced['murder_rate']),
    ('Rape', CRatesSliced['Rape_rate']),
    ('Robbery', CRatesSliced['Robbery_rate']),
    ('Assault', CRatesSliced['Assaults_rate'])]
)

hp = Horizon(data, x='Year', 
             plot_width=800, 
             plot_height=400,
             title="horizon plot", color='red')

In [163]:
show(hp)

## Area Chart 

In [164]:
from bokeh.charts import Area, show, output_notebook
output_notebook()

In [178]:
data = dict([
    ('Murder', CRatesSliced['murder_rate']),
    ('Robbery', CRatesSliced['Robbery_rate']),
    ('Assault', CRatesSliced['Assaults_rate']),
    ('Rape', CRatesSliced['Rape_rate']),]
)

To create an area chart, it seems you have to build a dictionary from a list of (label, column) tuples, with each tuple holding a different column from the DataFrame. This took me a while to realise... **newbie beware!!**

In [179]:
Ap = Area(data, 
          title="Area Chart",
          legend="top_right",
          xlabel='Year',
          ylabel='Rate',
          plot_width=800, 
          plot_height=400,)

In [180]:
show(Ap)
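
The bokeh.charts module is no longer shipped with recent Bokeh releases, so here is a rough equivalent of the area chart using bokeh.plotting. It is a sketch rather than a drop-in replacement: varea_stack() stacks the series on top of each other, which may not match how Area drew them, and the colours and the name `areaP` are my own picks. The three rows are copied from the CRatesSliced output above, with plain integer years for simplicity.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# first three rows of CRatesSliced, with plain integer years
CRatesSliced = pd.DataFrame({
    "Year": [1994, 1995, 1996],
    "murder_rate": [9.0, 8.2, 7.4],
    "Rape_rate": [39.3, 37.1, 36.3],
    "Robbery_rate": [237.8, 220.9, 201.9],
    "Assaults_rate": [427.6, 418.3, 391.0],
})
source = ColumnDataSource(CRatesSliced)

# columns to stack, bottom to top, and one colour per column
stackers = ["murder_rate", "Robbery_rate", "Assaults_rate", "Rape_rate"]
colours = ["navy", "orange", "purple", "firebrick"]

areaP = figure(title="Area Chart", x_axis_label="Year", y_axis_label="Rate")

# one stacked area per column, all read from the same data source
areas = areaP.varea_stack(stackers, x="Year", color=colours, source=source)
```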

## Scatter plot

In [47]:
from bokeh.charts import Scatter, output_notebook, show

In [43]:
p = Scatter(CRatesSliced, x='Year', y='murder_rate', title="Crime over Time",
            xlabel="Year", ylabel="Crime Rate")

In [46]:
show(p)

## Scatter plot with the abalone dataset

If you would like to play along, you can find the dataset here: https://archive.ics.uci.edu/ml/datasets/Abalone        

In [1]:
import pandas as pd
from bokeh.charts import Scatter, output_notebook, show
from bokeh.models import HoverTool, WheelZoomTool, BoxZoomTool
output_notebook()

In [2]:
Abelone = pd.read_csv("datasets/abalone.csv",
                     names=["Sex", "Length", "Diam", "Height", "Whole", "Shucked", "Viscera", "Shell", "Rings"])

In [3]:
Abelone.head()

Unnamed: 0,Sex,Length,Diam,Height,Whole,Shucked,Viscera,Shell,Rings
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7


In [4]:
hover = HoverTool(
        tooltips=[
            ("Sex", "@Sex"),
            ("Length", "$x"),
            ("Shell", "$y"),]
)

In [5]:
Abe = Scatter(Abelone, 
              x='Length', 
              y='Shell', 
              color='Sex',
              marker='Sex',
              title='Abalone Dataset Color and Marker by Sex', 
              legend=True, 
              tools=[hover])

Add extra interactive tools:

In [6]:
Abe.add_tools(BoxZoomTool())
Abe.add_tools(WheelZoomTool())

Now let's display the chart:

In [8]:
show(Abe)
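
For comparison, here is a sketch of a similar scatter built with bokeh.plotting instead of bokeh.charts. In the tooltips, `@Sex` looks up a column in the data source, whereas `$x` and `$y` report the cursor position, so this version uses `@Length` and `@Shell` to show the actual values of the hovered point. The colours, the name `abeP`, and the `legend_field` keyword (from recent Bokeh releases) are my assumptions; the five rows come from the Abelone.head() output above.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# the five rows shown by Abelone.head() above (only the columns we plot)
Abelone = pd.DataFrame({
    "Sex": ["M", "M", "F", "M", "I"],
    "Length": [0.455, 0.35, 0.53, 0.44, 0.33],
    "Shell": [0.15, 0.07, 0.21, 0.155, 0.055],
})
source = ColumnDataSource(Abelone)

# "@name" pulls the value from the column called name
hover = HoverTool(tooltips=[
    ("Sex", "@Sex"),
    ("Length", "@Length"),
    ("Shell", "@Shell"),
])

abeP = figure(tools="box_zoom,wheel_zoom,reset",
              title="Abalone: Shell vs Length by Sex",
              x_axis_label="Length", y_axis_label="Shell")
abeP.add_tools(hover)

# map each Sex category to its own colour
sex_colour = factor_cmap("Sex", palette=["navy", "firebrick", "olive"],
                         factors=["M", "F", "I"])

points = abeP.scatter("Length", "Shell", source=source, size=8,
                      color=sex_colour, legend_field="Sex")
```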