<img src='../../img/anaconda-logo.png' align='left' style="padding:10px">
<br>
*Copyright Continuum 2012-2016 All Rights Reserved.*

# Bokeh Charts Exercises

## Table of Contents
* [Bokeh Charts Exercises](#Bokeh-Charts-Exercises)
	* [Set-Up](#Set-Up)
* [Box Plots Exercises](#Box-Plots-Exercises)
* [Bar Chart Exercises](#Bar-Chart-Exercises)
	* [Melting DataFrames](#Melting-DataFrames)
	* [Plotting the Data](#Plotting-the-Data)
	* [Concatenation](#Concatenation)
	* [Additional Plot](#Additional-Plot)


## Set-Up

```python
from bokeh.io import output_notebook, show
output_notebook()
```

# Box Plots Exercises

The file `data/Top5000Population.csv` lists the populations of the 5000 most populous cities in the US. Load it into a DataFrame, then create a new DataFrame containing only the cities in Maine (ME), South Dakota (SD), and Rhode Island (RI).

```python
import pandas as pd
cities = pd.read_csv('data/Top5000Population.csv', thousands=',', encoding='iso-8859-1')
```

```python
cities.head()
```

| | city | state | population |
|---|---|---|---|
| 0 | New York | NY | 8363710 |
| 1 | Los Angeles | CA | 3833995 |
| 2 | Chicago | IL | 2853114 |
| 3 | Houston | TX | 2242193 |
| 4 | Phoenix | AZ | 1567924 |


```python
# Boolean mask: True for rows whose state is SD, ME or RI
states = cities['state'].isin(['SD', 'ME', 'RI'])
subset_cities = cities.loc[states]
subset_cities.head(7)
```

| | city | state | population |
|---|---|---|---|
| 135 | Providence | RI | 171557 |
| 149 | Sioux Falls | SD | 154997 |
| 339 | Warwick | RI | 84483 |
| 372 | Cranston | RI | 79980 |
| 434 | Pawtucket | RI | 71765 |
| 488 | Rapid City | SD | 65491 |
| 514 | Portland | ME | 62561 |

Create a box plot of the `population` data for these three states.
* Each state has at least one city in this file, and most have many.
* Each box represents the distribution of population sizes across the cities in that state.

```python
from bokeh.charts import BoxPlot
plot = BoxPlot(subset_cities, 'state', 'population', color='state')
show(plot)
```
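A box plot conventionally draws each group's quartiles: the box spans the 25th to 75th percentiles and the middle line marks the median. As a rough sketch of the numbers behind each box, the snippet below computes those percentiles per state with pandas, using only the seven rows displayed earlier (so the values will differ from the full plot). It assumes pandas' default linear interpolation; Bokeh may compute its quartiles slightly differently.

```python
import pandas as pd

# Only the seven rows displayed above (a subset of subset_cities),
# re-typed here so the snippet runs on its own.
sample = pd.DataFrame({
    'city': ['Providence', 'Sioux Falls', 'Warwick', 'Cranston',
             'Pawtucket', 'Rapid City', 'Portland'],
    'state': ['RI', 'SD', 'RI', 'RI', 'RI', 'SD', 'ME'],
    'population': [171557, 154997, 84483, 79980, 71765, 65491, 62561],
})

# Per-state quartiles of city population (pandas' linear interpolation):
# the 0.25 and 0.75 columns are the box edges, 0.5 is the median line.
summary = (sample.groupby('state')['population']
                 .quantile([0.25, 0.5, 0.75])
                 .unstack())
print(summary)
```

Maine has a single city in this sample, so all three of its quartiles collapse to the same value and its box degenerates to a line.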

# Bar Chart Exercises

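Use the Olympic medals data set. The cell below loads the 2014 medals data from Bokeh's sample data (`bokeh.sampledata.olympics2014`) rather than from the file `data/medals_messy.csv`.

Plot the total number of medals won by each country that won at least one medal.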

**Hint**: `medals` is a *wide* DataFrame. Bokeh charts prefer *long* DataFrames.
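To see the difference between the two shapes, here is a minimal sketch on a hypothetical two-country DataFrame (the country names and counts are made up). The wide form has one column per medal type; the long form produced by `pd.melt` has one row per country and medal type.

```python
import pandas as pd

# Hypothetical counts in the same *wide* layout as `medals`:
# one column per medal type.
wide = pd.DataFrame({
    'name': ['Country A', 'Country B'],
    'bronze': [1, 0],
    'silver': [2, 0],
    'gold': [0, 3],
})

# *Long* form: one row per (country, medal type) pair,
# with the medal type as a value rather than a column name.
long = pd.melt(wide, id_vars='name',
               value_vars=['bronze', 'silver', 'gold'],
               var_name='medal', value_name='count')
print(long)
```

Two countries times three medal types gives six rows, with the medal type now stored as a value that a chart can group or stack on.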

```python
import pandas as pd
from bokeh.charts.utils import df_from_json
from bokeh.sampledata.olympics2014 import data
medals = df_from_json(data)
```

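```python
medals.head()
```

| | abbr | bronze | gold | silver | total | name |
|---|---|---|---|---|---|---|
| 0 | ALB | 0 | 0 | 0 | 0 | Albania |
| 1 | AND | 0 | 0 | 0 | 0 | Andorra |
| 2 | ARG | 0 | 0 | 0 | 0 | Argentina |
| 3 | ARM | 0 | 0 | 0 | 0 | Armenia |
| 4 | AUS | 1 | 0 | 2 | 3 | Australia |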


## Melting DataFrames

Use pandas `melt` to collapse the three columns `gold`, `silver`, and `bronze` into two columns: `medal`, which holds the string `gold`, `silver`, or `bronze`, and `count`, which holds the number of medals of that type.

```python
df = pd.melt(medals,
             id_vars='name',
             value_vars=['bronze', 'silver', 'gold'],
             var_name='medal', value_name='count')
# Only the last expression in a cell is displayed, so show the top rows directly
df.sort_values('count', ascending=False).head(10)
```

| | name | medal | count |
|---|---|---|---|
| 81 | United States | bronze | 10 |
| 54 | Netherlands | bronze | 8 |
| 152 | Russian Fed. | silver | 8 |
| 100 | Canada | silver | 8 |
| 201 | Germany | gold | 8 |
| 230 | Norway | gold | 8 |
| 56 | Norway | bronze | 7 |
| 65 | Russian Fed. | bronze | 7 |
| 141 | Netherlands | silver | 6 |
| 239 | Russian Fed. | gold | 6 |
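Melting loses no information: summing a country's long-form rows recovers its medal total, and `pivot` turns the long form back into the wide one. The sketch below checks both on hypothetical long-form rows shaped like the melted `df` (the names and counts are made up).

```python
import pandas as pd

# Hypothetical long-form rows with the same columns as the melted `df`.
long = pd.DataFrame({
    'name': ['Country A', 'Country B'] * 3,
    'medal': ['bronze', 'bronze', 'silver', 'silver', 'gold', 'gold'],
    'count': [1, 0, 2, 0, 0, 3],
})

# pivot reverses the melt: one row per country, one column per medal type.
wide = long.pivot(index='name', columns='medal', values='count')

# Summing each country's rows gives the same number as a wide `total` column.
totals = long.groupby('name')['count'].sum()
print(wide)
print(totals)
```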


## Plotting the Data

Create a bar chart using only data for which `count` is 1 or higher. Plot the total count of medals (vertical) versus the country names (horizontal). Stack the bars using the `medal` column.

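```python
from bokeh.charts import Bar
# Keep only rows with at least one medal, so no empty segments are drawn
plot = Bar(df[df['count'] > 0],
           label='name',     # one bar per country along the x-axis
           values='count',   # numeric column that sets the bar heights
           stack='medal',    # stack the segments by medal type
           legend=True)
show(plot)
```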
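The height of each stacked bar is the sum of that country's segments. Filtering on `count > 0` does not change any height; it only drops empty segments and countries with no medals at all. A minimal sketch on hypothetical data (made-up names and counts):

```python
import pandas as pd

# Hypothetical long-form data; Country C won nothing.
long = pd.DataFrame({
    'name': ['Country A', 'Country B', 'Country C'] * 3,
    'medal': ['bronze'] * 3 + ['silver'] * 3 + ['gold'] * 3,
    'count': [1, 0, 0, 2, 0, 0, 0, 3, 0],
})

# Same filter as the Bar call above.
nonzero = long[long['count'] > 0]

# Total height of each country's stacked bar.
heights = nonzero.groupby('name')['count'].sum()
print(heights)
```

Country C disappears entirely because every one of its rows has a zero count.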

## Concatenation

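An alternative is to build the long-form DataFrame by hand: for each medal type, take the `name` column and that medal's column for every country with at least one medal (`total > 0`), record the medal type in a new `medal` column, rename the medal column to `count`, and concatenate the three pieces. Create a new DataFrame with exactly three rows per medal-winning country, one per medal type, where `count` holds the number of medals of that type: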

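```python
dfs = []
for i in ['bronze', 'silver', 'gold']:
    # Countries with at least one medal; keep this medal's column and the name
    tmp = medals.loc[medals['total'] > 0, [i, 'name']].copy()
    tmp['medal'] = i                        # which medal type this piece holds
    tmp = tmp.rename(columns={i: 'count'})  # same column name in every piece
    dfs.append(tmp)
long_df = pd.concat(dfs, ignore_index=True)  # fresh 0..n-1 index, no repeated labels
long_df.head(15)
```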
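The concatenation loop gives the same rows as filtering the countries first and then melting. The sketch below checks this on a hypothetical `medals`-like DataFrame (made-up names and counts); once columns and row order are aligned, the two results match.

```python
import pandas as pd

# Hypothetical wide data in the same layout as `medals`.
medals = pd.DataFrame({
    'name': ['Country A', 'Country B', 'Country C'],
    'bronze': [1, 0, 0],
    'silver': [2, 0, 0],
    'gold': [0, 3, 0],
    'total': [3, 3, 0],
})

# Approach 1: the concatenation loop from the cell above.
dfs = []
for m in ['bronze', 'silver', 'gold']:
    tmp = medals.loc[medals['total'] > 0, [m, 'name']].copy()
    tmp['medal'] = m
    dfs.append(tmp.rename(columns={m: 'count'}))
via_concat = pd.concat(dfs, ignore_index=True)

# Approach 2: keep medal-winning countries, then melt.
via_melt = pd.melt(medals[medals['total'] > 0], id_vars='name',
                   value_vars=['bronze', 'silver', 'gold'],
                   var_name='medal', value_name='count')

# Align column order and row order before comparing.
cols, key = ['name', 'medal', 'count'], ['name', 'medal']
a = via_concat[cols].sort_values(key).reset_index(drop=True)
b = via_melt[cols].sort_values(key).reset_index(drop=True)
same = a.values.tolist() == b.values.tolist()
print(same)
```

The melt version is shorter; the loop makes each reshaping step explicit.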

## Additional Plot

Use `long_df` from the previous step to create another stacked bar chart, again stacking on the `medal` column.

```python
from bokeh.charts import Bar
plot = Bar(long_df, label='name', values='count', stack='medal', legend=True)
show(plot)
```

See the [Bokeh Gallery](http://bokeh.pydata.org/en/0.11.1/docs/gallery/stacked_bar_chart.html) for another solution.
