# GapMinder project 
https://demo.bokehplots.com/apps/gapminder
https://www.youtube.com/watch?v=9FlUFLmaWvY

### Import the data into a Pandas dataframe

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('./data/gapminder.csv',thousands=',',index_col='Year')

In [3]:
data.head()

Year,Country,life,population,income,region
1800,Afghanistan,28.211,3280000.0,603.0,South Asia
1801,Afghanistan,28.200753,,603.0,South Asia
1802,Afghanistan,28.190507,,603.0,South Asia
1803,Afghanistan,28.18026,,603.0,South Asia
1804,Afghanistan,28.170013,,603.0,South Asia


### Add Bokeh objects to our environment

In [4]:
# Load Bokeh JavaScript to our current notebook
from bokeh.io import output_notebook
output_notebook()

In [5]:
#Import the function to show our bokeh objects
from bokeh.io import show

In [6]:
#Import the object used to construct visualizations
from bokeh.plotting import figure

### Construct our first figure...
#### First, wrangle the data for easy plotting
We are going to subset our data to get just the populations for 2010. Pandas `loc[2010]` grabs the rows where the index (`Year`) equals 2010, and from those rows we take just the column labelled `population`. The `head()` at the end displays the first 5 records.

In [7]:
#See how we grab just the 2010 population data
data.loc[2010]['population'].head()

Year
2010    27962207.0
2010     2901883.0
2010    36036159.0
2010       84419.0
2010    21219954.0
Name: population, dtype: float64
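
To make the row-then-column selection concrete, here is a minimal sketch on a tiny stand-in table. The frame `toy` and its values are hypothetical, made up for illustration rather than taken from the gapminder file. It also shows that `loc` can take the row label and the column label in one step, which should give the same result as the chained form used above.

```python
import pandas as pd

# Hypothetical miniature of the gapminder table: two countries, two years
toy = pd.DataFrame(
    {
        'Country': ['A', 'A', 'B', 'B'],
        'population': [100.0, 120.0, 2000.0, 2500.0],
    },
    index=pd.Index([1980, 2010, 1980, 2010], name='Year'),
)

# Chained form used in this notebook: filter rows, then pick a column
chained = toy.loc[2010]['population']

# Single-step form: row label and column label together
single = toy.loc[2010, 'population']

# Both forms return the same Series of 2010 populations
print(chained.equals(single))
```

Because several rows share the index value 2010, both forms return a Series with one entry per country rather than a single number.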

Now we construct a Bokeh figure from the data. Here we'll plot life expectancy (the `life` column) against income. We'll extract the data we want (as Pandas series) using the method above, again using just 2010 data.

In [8]:
#Create the figure object, calling it 'p'
p = figure()

#Plot life expectancy (y) against  income (x), showing data as circles
p.circle(x=data.loc[2010]['income'],y=data.loc[2010]['life'])

#Show the plot
show(p)

The plot is a bit tall. We can make it shorter by adding style settings to the figure...

In [9]:
#Recreate the figure, this time only 200 pixels tall
p = figure(height=200)
#Plot the same 2010 data as circles
p.circle(x=data.loc[2010]['income'],y=data.loc[2010]['life'])
show(p)

### RECAP so far...
* The **data** used to drive our plot are Pandas series, but Bokeh can use nearly any Python collection object: lists, tuples, NumPy arrays, etc. 
* The figure has other style attributes you can apply; see the Keyword Args section at https://bokeh.pydata.org/en/latest/docs/reference/plotting.html
* You can use other **markers** instead of circles; try `cross` or `circle_x`, or connect the points with `line`. (See top of page linked above.) Each glyph type has its own set of properties and settings. For example, see those for the circle marker: https://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh.plotting.figure.Figure.circle
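
As a quick illustration of the first and third recap points, the sketch below feeds plain Python lists to two different glyph methods on one figure. The values in `xs` and `ys` are made up for illustration and are not taken from the gapminder data.

```python
from bokeh.plotting import figure

# Made-up illustrative values (not from the gapminder data)
xs = [1, 2, 3, 4]
ys = [10, 20, 15, 30]

p = figure(height=200)

# Connect the points with a line...
p.line(x=xs, y=ys)

# ...and mark each point with a cross
p.cross(x=xs, y=ys, size=10)
```

Calling `show(p)` afterwards displays the result in the notebook, as in the cells above.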

► For more exercises on plotting, see the `02-plotting` notebook in the tutorial folder. 

#### More detail...
Below we apply some style settings to the figure (`height`, `x_axis_type`, and `title`) and to the 2010 life vs. income glyphs (`size`, `color`, and `alpha`). We also add a second set of data, for 1980, to the same plot...

In [11]:
#We can draw multiple sets of glyphs on the same figure
p = figure(height=200,
           x_axis_type='log',
           title='1980 vs 2010')
p.circle(x=data.loc[2010]['income'],
         y=data.loc[2010]['life'],
         size=5,color='blue',
         alpha=0.25)
p.cross(x=data.loc[1980]['income'],
        y=data.loc[1980]['life'],
        size=10,color='red',
        alpha=0.25)
show(p)

#### Using a dictionary of plot settings
The code below produces the same plot as the one above, but instead of passing the figure settings as individual keyword arguments, we pass a dictionary of values and unpack it with `**`. This is useful because we can reuse the dictionary in other plots...

In [12]:
PLOT_OPS = {'height':200,
            'x_axis_type':'log',
            'title':'1980 vs 2010'
           }

p = figure(**PLOT_OPS)
p.circle(x=data.loc[2010]['income'],y=data.loc[2010]['life'],size=5,color='blue',alpha=0.25)
p.cross(x=data.loc[1980]['income'],y=data.loc[1980]['life'],size=10,color='red', alpha=0.25)
show(p)
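
One way to reuse the shared dictionary while changing a single setting is to merge it with an override at call time. This is a minimal sketch; `OVERRIDES` and its title text are hypothetical names chosen for illustration, not part of the original notebook.

```python
from bokeh.plotting import figure

PLOT_OPS = {'height': 200,
            'x_axis_type': 'log',
            'title': '1980 vs 2010'}

# Hypothetical override: keep the shared settings, change only the title
OVERRIDES = {'title': '2010 only'}

# Later keys win in the merge, and PLOT_OPS itself is left unchanged
p = figure(**{**PLOT_OPS, **OVERRIDES})
```

Merging into a new dictionary, rather than editing `PLOT_OPS` in place, keeps the shared settings safe for any other plot that relies on them.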