<table style="float:left; border:none">
   <tr style="border:none">
       <td style="border:none">
           <a href="http://bokeh.pydata.org/">     
           <img 
               src="assets/images/bokeh-transparent.png" 
               style="width:50px"
           >
           </a>    
       </td>
       <td style="border:none">
           <h1>Bokeh Tutorial</h1>
       </td>
   </tr>
</table>

<div style="float:right;"><h2>01. High Level Charts</h2></div>

This section covers the `bokeh.charts` interface, which is a high-level API that is especially useful for exploratory data analysis (for instance, in a Jupyter notebook). It provides functions for quickly producing many standard chart types, often with a single line of code. We will look at the following types in this notebook:

* [Scatter Plot](#Scatter-Plot)
* [Bar Chart](#Bar-Chart)
* [Histogram](#Histogram)
* [Box Plot](#Box-Plot)

In [1]:
from bokeh.io import output_notebook, show
output_notebook()

In [3]:
from IPython.display import display
import pandas as pd
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 10)

# Scatter Plot

A high-level scatter plot is provided by [`bokeh.charts.Scatter`]().

For this section we will use the "iris" data set. First, let's import it and take a look at a few rows:

In [6]:
from bokeh.sampledata.iris import flowers
display(flowers)
display(flowers.dtypes)
display(flowers.species.nunique())

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica


sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object

3

In [7]:
from bokeh.charts import Scatter

A basic scatter chart takes the data (in this case a pandas DataFrame) as the first argument, and specifies the `x` and `y` coordinates for the scatter as the names of columns in the data.

In [8]:
p = Scatter(flowers, x='petal_length', y='petal_width')
show(p)

By passing a column name for the `color` parameter, you can make `Scatter` automatically color the markers according to the groups in that column. Let's also add a legend by specifying its location as the value of the `legend` parameter (in this case `"top_left"`).

In [22]:
p = Scatter(flowers, x='petal_length', y='petal_width', color='species', legend='top_left')
show(p)
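To make the idea of "coloring by group" concrete, here is a minimal sketch in plain Python of how distinct values in a column could be mapped to colors. It is only an illustration and not how `bokeh.charts` is implemented; the palette, the sample labels, and the helper name `colors_by_group` are all assumptions made up for this example.

```python
# Illustrative sketch only; this is not bokeh's implementation.
# The palette and the sample labels below are made up for the example.
palette = ["red", "green", "blue"]
species_column = ["setosa", "virginica", "setosa", "versicolor", "virginica"]


def colors_by_group(labels, palette):
    """Give every distinct label its own color, in order of first appearance."""
    assigned = {}
    for label in labels:
        if label not in assigned:
            # Cycle through the palette if there are more groups than colors.
            assigned[label] = palette[len(assigned) % len(palette)]
    # One color per row, plus the label-to-color mapping a legend would need.
    return [assigned[label] for label in labels], assigned


marker_colors, color_map = colors_by_group(species_column, palette)
print(color_map)
print(marker_colors)
```

The same mapping is what a legend needs: one entry per distinct group, paired with the color assigned to it.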

By passing a column name for the `marker` parameter, you can make `Scatter` automatically vary the marker shapes according to the groups in that column. Let's try that as an exercise.

In [6]:
# EXERCISE: vary the marker shape by passing a column name as the `marker` keyword argument


# Bar Chart

A high-level bar chart is provided by [`bokeh.charts.Bar`]().

For this section, we will use the "autompg" data set. Let's import it and take a quick look:

In [17]:
from bokeh.sampledata.autompg import autompg
autompg.head()
autompg.describe(include='all')

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name
count,392.000000,392.000000,392.00000,392.000000,392.000000,392.000000,392.000000,392.000000,392
unique,,,,,,,,,301
top,,,,,,,,,ford pinto
freq,,,,,,,,,5
mean,23.445918,5.471939,194.41199,104.469388,2977.584184,15.541327,75.979592,1.576531,
...,...,...,...,...,...,...,...,...,...
min,9.000000,3.000000,68.00000,46.000000,1613.000000,8.000000,70.000000,1.000000,
25%,17.000000,4.000000,105.00000,75.000000,2225.250000,13.775000,73.000000,1.000000,
50%,22.750000,4.000000,151.00000,93.500000,2803.500000,15.500000,76.000000,1.000000,
75%,29.000000,8.000000,275.75000,126.000000,3614.750000,17.025000,79.000000,2.000000,


In [18]:
from bokeh.charts import Bar

A basic bar chart takes the data (again a DataFrame) as the first argument, along with the following parameters:

* `label` - the column to group by; its values label the x-axis
* `values` - the column whose values are aggregated within each group to give the bar heights
* `agg` - the name of the aggregation to perform over the values (e.g., `"mean"`, `"max"`, etc.)

A simple example that also specifies some other properties such as `title` and `legend` is shown below:

In [20]:
p = Bar(autompg, label='cyl', values='mpg', agg='max', color='cyl',
        title="Max MPG by CYL", legend=None, tools='crosshair')
show(p)
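The roles of `label`, `values`, and `agg` can be sketched in plain Python as a group-then-reduce step. This is only a rough illustration of the idea, not the library's code: the rows are hypothetical values (not taken from `autompg`), and the `AGGREGATIONS` table and `aggregate` helper are names invented for this example.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (cyl, mpg) rows, made up for illustration; not taken from autompg.
rows = [(4, 30.0), (4, 26.0), (6, 20.0), (8, 14.0), (8, 16.0), (6, 22.0)]

# The keys mirror the `agg` strings used in the text; the mapping is an assumption.
AGGREGATIONS = {"mean": mean, "median": median, "max": max, "min": min, "sum": sum}


def aggregate(rows, agg):
    """Group values by label, then reduce each group to a single bar height."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    reduce_fn = AGGREGATIONS[agg]
    return {label: reduce_fn(values) for label, values in sorted(groups.items())}


bar_heights = aggregate(rows, "max")
print(bar_heights)
```

Each key of the result corresponds to one bar on the x-axis, and each value to that bar's height.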

By passing another column name as the `group` parameter, the aggregations can be further subdivided by the groups in that column, and the bars grouped visually. The example below demonstrates this, as well as adding a legend by specifying its location:

In [21]:
p = Bar(autompg, label='yr', values='mpg', agg='median', group='origin', 
        title="Median MPG by YR, grouped by ORIGIN", legend='top_left', tools='crosshair')
show(p)

Similarly, bars for subgroups can be stacked visually, by providing a column name for the `stack` parameter. Let's try that as an exercise.

In [23]:
# EXERCISE: change the chart above to stack the bars with title "Median MPG by YR, stacked by ORIGIN"
p = Bar(autompg, label='yr', values='mpg', agg='median', stack='origin', 
        title="Median MPG by YR, stacked by ORIGIN", legend='top_left', tools='crosshair')
show(p)

# Histogram

A high-level histogram is provided by [`bokeh.charts.Histogram`]().

For this section, we will construct our own synthetic data set that has values generated from two different probability distributions. 

In [25]:
import pandas as pd
import numpy as np

# build some distributions
mu, sigma = 0, 0.5
normal = pd.DataFrame({'value': np.random.normal(mu, sigma, 1000), 'type': 'normal'})
lognormal = pd.DataFrame({'value': np.random.lognormal(mu, sigma, 1000), 'type': 'lognormal'})

In [28]:
normal.head()
normal.dtypes
normal.shape

(1000, 2)

In [29]:
# create a pandas data frame
df = pd.concat([normal, lognormal])
df[995:1005]

Unnamed: 0,type,value
995,normal,0.421019
996,normal,0.536102
997,normal,0.73327
998,normal,-0.351141
999,normal,0.91887
0,lognormal,0.745025
1,lognormal,1.542816
2,lognormal,0.990685
3,lognormal,0.422799
4,lognormal,2.426287


In [30]:
from bokeh.charts import Histogram

A basic histogram takes the data as the first parameter, and a column name as the `values` parameter. Optionally, you can also specify the number of bins to use by giving a value for the `bins` parameter. The example below shows the distribution of ***all*** the values (both the "normal" and "lognormal" values). 

In [31]:
hist = Histogram(df, values='value', bins=30)
show(hist)
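As a rough illustration of what the `bins` parameter controls, the sketch below counts values in equal-width bins using plain Python. It assumes equal-width bins spanning the minimum to the maximum of the data, and that the values are not all identical; whether `Histogram` uses exactly this rule is an assumption. The helper name `bin_counts` and the sample values are made up for the example.

```python
def bin_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max of the data.

    Assumes the values are not all identical (otherwise the bin width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value goes into the last bin rather than a bin of its own.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Made-up sample values for illustration.
sample = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
counts, edges = bin_counts(sample, bins=4)
print(counts)
print(edges)
```

More bins give narrower intervals and a more detailed, but noisier, picture of the distribution.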

It's also possible to generate multiple histograms at once by grouping the data. The column to group by is specified by the `color` parameter (and the histogram for each group is colored differently automatically). Let's try that as an exercise.

In [36]:
# EXERCISE: generate histograms for each "type" of distribution, and add a legend to the top left.
hist = Histogram(df, values='value', label='type', bins=30, color='type', legend="top_left")
show(hist)

# Box Plot

A high-level box plot is provided by [`bokeh.charts.BoxPlot`]().

For this section we will use the "iris" data set again. 

In [37]:
from bokeh.charts import BoxPlot

A basic box plot takes the data as the first argument, as well as column names for:

* `label` - the column to group by; its values label the x-axis
* `values` - the column whose values are summarized for each group

A simple example that also specifies some other properties, such as `title` and the axis labels, is shown below:

In [38]:
p = BoxPlot(flowers, label='species', values='petal_width', tools='crosshair', color='#aa4444',
            xlabel='', ylabel='petal width, cm', title='Distributions of petal widths')
show(p)
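For readers unfamiliar with what a box plot summarizes, here is a small sketch of the per-group statistics using only the standard library. It follows the common Tukey convention (box at the quartiles, whiskers at the most extreme points within 1.5 times the interquartile range); whether `BoxPlot` uses exactly these rules is an assumption. The helper name `box_stats` and the sample values are invented for this example.

```python
from statistics import quantiles


def box_stats(values):
    """Summarize one group the way a box plot conventionally draws it."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values that are still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up values for illustration; 30 is far enough away to count as an outlier.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In a chart grouped by `label`, a summary like this would be computed once per group.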

Instead of using a single color, the boxes and whiskers can be colored according to the groups in one of the columns. This is done by passing a column name as the `color` parameter. Let's try that as an exercise.

In [40]:
# EXERCISE: color the boxes by "species" and add a legend to the top left
p = BoxPlot(flowers, label='species', values='petal_width', color='species', tools='crosshair',
            xlabel='', ylabel='petal width, cm', legend='top_left', title='Distributions of petal widths')
show(p)

---

# Further reading



http://nbviewer.jupyter.org/github/bokeh/bokeh/tree/0.11.1/examples/charts/file/

http://nbviewer.jupyter.org/github/bokeh/bokeh/tree/0.11.1/examples/howto/charts/

http://nbviewer.jupyter.org/github/bokeh/bokeh-demos/blob/master/presentations/2016-03-pydata-strata/notebooks/Charts.ipynb

http://nbviewer.jupyter.org/github/bokeh/bokeh-demos/blob/master/presentations/2016-03-pydata-strata/notebooks/Charts%20Demo.ipynb