# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
* [Bokeh Charts](#Bokeh-Charts)
	* [Scatter](#Scatter)
	* [BoxPlot](#BoxPlot)
	* [Bar](#Bar)
	* [Histogram](#Histogram)
	* [Time Series](#Time-Series)


# Learning Objectives:

After completion of this module, learners should be able to:

* generate statistical plots using the high-level Charts interface
* explain the mapping between Pandas DataFrames and Chart options
* plot TimeSeries data

In [None]:
from bokeh.io import output_notebook, show
output_notebook()

# Bokeh Charts

Charts provide a high-level interface to basic statistical plotting. Charts can simplify the generation of figures and provide more powerful data manipulation automatically through the use of Pandas DataFrames.

There are several Charts available in Bokeh version 0.11. Here, we'll cover:
* `Scatter`
* `BoxPlot`
* `Bar`
* `Histogram`
* `TimeSeries`

See the [reference documentation](http://bokeh.pydata.org/en/latest/docs/reference/charts.html) for the complete list of Charts available in version 0.11.

## Scatter

In this example we are going to import flower morphology data and search for correlations. The `flowers` data set is a Pandas DataFrame. To plot as many dimensions as possible we are going to let Bokeh assign the color of the glyph based on the species.

In [None]:
from data.bokeh.iris import flowers
flowers.head()

In [None]:
from bokeh.charts import Scatter

plot = Scatter(flowers, x='petal_length', y='petal_width',
               color='species',
               legend='top_left', title='Flower Morphology')
show(plot)
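To make the idea of coloring by a column concrete, here is a minimal sketch of what such a mapping amounts to: each distinct category value is paired with one palette entry. This is not Bokeh's implementation. The rows, the `palette` list, and the `color_for` dictionary are hypothetical names and made-up values used only for illustration.

```python
# Illustrative only: a hand-rolled version of "color by category".
# The rows and the palette are made-up stand-ins, not the real iris data.
rows = [
    {"petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
    {"petal_length": 4.7, "petal_width": 1.4, "species": "versicolor"},
    {"petal_length": 6.0, "petal_width": 2.5, "species": "virginica"},
    {"petal_length": 1.3, "petal_width": 0.2, "species": "setosa"},
]
palette = ["red", "green", "blue"]

# Assign one palette entry to each distinct category, in order of first appearance.
color_for = {}
for row in rows:
    species = row["species"]
    if species not in color_for:
        color_for[species] = palette[len(color_for) % len(palette)]

# One color per row, in the same order as the data.
colors = [color_for[row["species"]] for row in rows]
print(color_for)
print(colors)
```

Passing `color='species'` to a chart saves you from writing this bookkeeping yourself.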

The glyph in a Scatter chart can be changed with the `marker` keyword. The available markers are defined in the `bokeh.models.markers` module.

In [None]:
plot = Scatter(flowers, x='petal_length', y='petal_width', 
               color='species', marker='triangle',
               legend='top_left', title='Flower Morphology')
show(plot)

Because everything in Bokeh is an object, you can change a chart's style by first selecting the objects you want with the special `select` member function, passing it the class of the objects to find. `select` returns a list.

In this example the Scatter chart method did the work of adding three separate Triangle glyphs to the low-level `Plot` object.

* WARNING: In this example there is no easy way to distinguish between the three triangle objects. If more control is required it is best to use the Plotting or Model interfaces.

In [None]:
from bokeh.models.markers import Triangle
m_select = plot.select(Triangle)
m_select

In [None]:
# Restyle only the first of the three Triangle glyphs
t1 = m_select[0]
t1.size = 30
t1.line_color = 'black'
show(plot)

## BoxPlot

Box plots represent a statistical summary of the data with five values: minimum, first quartile, median, third quartile, and maximum. The red dots are the outliers. The `BoxPlot` chart does the work of determining the quartiles, median, and outliers.

The Bokeh gallery has a nice example using the [autompg dataset](http://bokeh.pydata.org/en/latest/docs/gallery/boxplot_chart.html).

In [None]:
from data.bokeh.iris import flowers

from bokeh.charts import BoxPlot
p = BoxPlot(
    flowers, label='species', values='petal_width',
    xlabel='', ylabel='petal width, cm', title='Distribution of petal widths',
    color='aqua',
)
show(p)
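The summary statistics behind a box plot can be sketched by hand with the standard library. The sketch below assumes the common convention of flagging points more than 1.5 times the interquartile range away from the box as outliers. The exact rule `BoxPlot` applies is not stated here, so treat the fences as an assumption. The sample values are made up.

```python
import statistics

# Made-up sample values; 9.0 is deliberately far from the rest.
values = [0.2, 0.3, 0.4, 0.4, 0.5, 0.6, 0.7, 0.8, 9.0]

# Quartiles via the "inclusive" method (one of several common conventions).
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]
inliers = [v for v in values if lower_fence <= v <= upper_fence]

# Whiskers reach the most extreme values that are not outliers.
whisker_low, whisker_high = min(inliers), max(inliers)
print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```

Note that once outliers are drawn separately, the whiskers stop at the most extreme remaining values rather than at the overall minimum and maximum.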

## Bar

By using a Bar chart we can aggregate data across columns in a DataFrame. We will use the Miles-per-Gallon data set, which has data from several models of cars made from 1970 to 1982.

`origin` refers to the region where the model was manufactured.

* `1` is US
* `2` is Europe
* `3` is Asia

In the call to `Bar` below, `agg` is the aggregation algorithm. The possible algorithms are:

* `sum`  (default)
* `mean`
* `count`
* `nunique`
* `median`
* `min`
* `max`
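Each of these names corresponds to an aggregation that pandas can also apply to grouped data. As a rough sketch of the numbers a grouped Bar chart displays, the snippet below aggregates a small made-up DataFrame directly with `groupby`. The `cars` frame and its values are hypothetical, and the snippet does not claim to reproduce how the chart computes its bars internally.

```python
import pandas as pd

# Made-up rows that mimic the shape of the yr, origin, and mpg columns.
cars = pd.DataFrame({
    "yr":     [70, 70, 70, 71, 71, 71],
    "origin": ["US", "US", "Europe", "US", "Asia", "Asia"],
    "mpg":    [15.0, 17.0, 26.0, 19.0, 31.0, 35.0],
})

# One aggregated value per (yr, origin) pair, i.e. one bar per pair.
median_mpg = cars.groupby(["yr", "origin"])["mpg"].agg("median")
print(median_mpg)

# Swapping the aggregation only changes the string passed to agg().
counts = cars.groupby(["yr", "origin"])["mpg"].agg("count")
print(counts)
```

Computing the table this way is a handy cross-check when a bar height looks surprising.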

In [None]:
from data.bokeh.autompg import data as autompg
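# Replace the numeric origin codes with readable region names.
# replace() leaves unmatched values alone, so the cell can be re-run safely.
origin_names = {1: 'US', 2: 'Europe', 3: 'Asia'}
autompg['origin'] = autompg['origin'].replace(origin_names)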
autompg.head()

In [None]:
from bokeh.charts import Bar
p = Bar(
    autompg, label='yr', values='mpg', agg='median', 
    group='origin', # Use the Pandas groupby method
    title="Median MPG by YR, grouped by ORIGIN", legend='top_left'
)
show(p)

The same data can be a little easier to read as a stacked Bar chart, produced by passing the `stack` keyword instead of `group`.

In [None]:
from bokeh.charts import Bar
p = Bar(
    autompg, label='yr', values='mpg', agg='mean', 
    stack='origin', # Use the stack feature
    title="Mean MPG by YR, stacked by ORIGIN", legend='top_left'
)
show(p)

## Histogram

In Histograms the `color` keyword provides group-by functionality for color. Data will be binned separately for each unique entry in the chosen `color` column.

Nice example on the [autompg dataset](http://bokeh.pydata.org/en/latest/docs/gallery/histograms_chart.html)

In [None]:
import pandas as pd
import numpy as np

# build some distributions
mu, sigma = 0, 0.5
normal = pd.DataFrame({'value': np.random.normal(mu, sigma, 1000), 'type': 'normal'})
lognormal = pd.DataFrame({'value': np.random.lognormal(mu, sigma, 1000), 'type': 'lognormal'})

# create a pandas data frame
df = pd.concat([normal, lognormal])
df[995:1005]

In [None]:
from bokeh.charts import Histogram
hist = Histogram(df, values='value', color='type', bins=50, legend=True)
show(hist)
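Binning each group separately can be sketched with `numpy.histogram` applied per group. This is an illustration of the idea, not the chart's own code. The `frame` DataFrame is a small, seeded stand-in for the two-distribution data, and whether the chart shares bin edges across groups is not something this sketch assumes or shows.

```python
import numpy as np
import pandas as pd

# Small reproducible stand-in for a frame with a value column and a type column.
rng = np.random.RandomState(0)
frame = pd.DataFrame({
    "value": np.concatenate([rng.normal(0, 0.5, 200),
                             rng.lognormal(0, 0.5, 200)]),
    "type": ["normal"] * 200 + ["lognormal"] * 200,
})

# Bin each group on its own: one (counts, edges) pair per unique 'type'.
binned = {}
for name, group in frame.groupby("type"):
    counts, edges = np.histogram(group["value"], bins=10)
    binned[name] = (counts, edges)

# Every group keeps all of its rows; only the bin edges differ.
for name, (counts, edges) in binned.items():
    print(name, counts.sum(), len(edges))
```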

In [None]:
from data.bokeh.autompg import data as autompg

plot = Histogram(autompg, values='hp', color='cyl',
                 title="HP Distribution (color grouped by CYL)",
                 legend='top_right')

show(plot)

## Time Series

Time series plots require that a DataFrame be provided with at least one column of dtype `datetime64`.
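If the dates in your own data arrive as strings, they need converting first. The sketch below shows one way to check and convert a column with `pandas.to_datetime`. The `prices` frame and its values are made up for illustration.

```python
import pandas as pd

# Made-up prices; the dates start out as plain strings, not datetime64.
prices = pd.DataFrame({
    "Date": ["2000-03-01", "2000-03-02", "2000-03-03"],
    "AAPL": [130.31, 122.00, 128.00],
})
print(pd.api.types.is_datetime64_any_dtype(prices["Date"]))

# Parse the strings into real timestamps so the column becomes datetime64.
prices["Date"] = pd.to_datetime(prices["Date"])
print(pd.api.types.is_datetime64_any_dtype(prices["Date"]))
print(prices["Date"].dtype)
```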

The cells below build a `TimeSeries` chart from Apple, Microsoft, and IBM stock data, using the `Date` column for the x-axis.

In [None]:
import pandas as pd
from bokeh.charts import TimeSeries
from data.bokeh.stocks import aapl, msft, ibm

# Combine the adjusted closing prices into one DataFrame,
# taking the Date column from the AAPL data.
stocks = pd.DataFrame({'AAPL': aapl['Adj Close'],
                       'MSFT': msft['Adj Close'],
                       'IBM': ibm['Adj Close'],
                       'Date': aapl['Date']})

stocks.head()

In [None]:
plot = TimeSeries(stocks,
    x='Date', y=['AAPL','IBM','MSFT'],
    legend=True,
    title='Stocks', ylabel='Close Price')

show(plot)