<a href="https://colab.research.google.com/github/unburied/DS-Unit-1-Sprint-2-Data-Wrangling-and-Storytelling/blob/master/LS_DS_224_Sequence_your_narrative.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

_Lambda School Data Science_

# Sequence your narrative

Today we will create a sequence of visualizations inspired by [Hans Rosling's 200 Countries, 200 Years, 4 Minutes](https://www.youtube.com/watch?v=jbkSRLYSojo).

Using this [data from Gapminder](https://github.com/open-numbers/ddf--gapminder--systema_globalis/):
- [Income Per Person (GDP Per Capita, Inflation Adjusted) by Geo & Time](https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--datapoints--income_per_person_gdppercapita_ppp_inflation_adjusted--by--geo--time.csv)
- [Life Expectancy (in Years) by Geo & Time](https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--datapoints--life_expectancy_years--by--geo--time.csv)
- [Population Totals, by Geo & Time](https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--datapoints--population_total--by--geo--time.csv)
- [Entities](https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--entities--geo--country.csv)
- [Concepts](https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--concepts.csv)

Objectives
- sequence multiple visualizations
- combine qualitative anecdotes with quantitative aggregates

Links
- [Hans Rosling’s TED talks](https://www.ted.com/speakers/hans_rosling)
- [Spiralling global temperatures from 1850-2016](https://twitter.com/ed_hawkins/status/729753441459945474)
- "[The Pudding](https://pudding.cool/) explains ideas debated in culture with visual essays."
- [A Data Point Walks Into a Bar](https://lisacharlotterost.github.io/2016/12/27/datapoint-in-bar/): a thoughtful blog post about emotion and empathy in data storytelling

## Make a plan

#### How to present the data?

Variables --> Visual Encodings
- Income --> x
- Lifespan --> y
- Region --> color
- Population --> size
- Year --> animation frame (alternative: small multiple)
- Country --> annotation

Qualitative --> Verbal
- Editorial / contextual explanation --> audio narration (alternative: text)


#### How to structure the data?

| Year | Country | Region   | Income | Lifespan | Population |
|------|---------|----------|--------|----------|------------|
| 1818 | USA     | Americas | ###    | ##       | #          |
| 1918 | USA     | Americas | ####   | ###      | ##         |
| 2018 | USA     | Americas | #####  | ###      | ###        |
| 1818 | China   | Asia     | #      | #        | #          |
| 1918 | China   | Asia     | ##     | ##       | ###        |
| 2018 | China   | Asia     | ###    | ###      | #####      |
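
A minimal sketch of this layout in pandas, assuming placeholder numbers (each value is just the count of `#` characters in the table above, not real data) and lower-case column names chosen for illustration. Each row is one country in one year, so `(year, country)` should identify a row uniquely:

```python
import pandas as pd

# Placeholder values only: they stand in for the '#' cells in the table above.
rows = [
    (1818, 'USA',   'Americas', 3, 2, 1),
    (1918, 'USA',   'Americas', 4, 3, 2),
    (2018, 'USA',   'Americas', 5, 3, 3),
    (1818, 'China', 'Asia',     1, 1, 1),
    (1918, 'China', 'Asia',     2, 2, 3),
    (2018, 'China', 'Asia',     3, 3, 5),
]
columns = ['year', 'country', 'region', 'income', 'lifespan', 'population']
plan = pd.DataFrame(rows, columns=columns)

# One row per (year, country): there should be no duplicated keys.
has_duplicate_keys = plan.duplicated(subset=['year', 'country']).any()

# A single animation frame (or one small multiple) is just one year's rows.
frame_2018 = plan[plan['year'] == 2018]
```

With the data in this shape, every visual encoding in the plan maps to exactly one column.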


## Upgrade Seaborn

Make sure you have at least version 0.9.0.

In Colab, go to **Restart runtime** after you run the `pip` command.

In [0]:
!pip install --upgrade seaborn

In [0]:
import seaborn as sns
sns.__version__
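
Comparing version strings directly can mislead, since `'0.11.2' < '0.9.0'` when compared as plain strings. A minimal sketch of a numeric check, assuming simple dotted version strings; `version_tuple` and `meets_minimum` are hypothetical helper names, not part of seaborn:

```python
import re

def version_tuple(version):
    """Turn a version string such as '0.11.2' into a tuple of ints (0, 11, 2)."""
    parts = []
    for piece in version.split('.')[:3]:
        # Keep only the leading digits, so a piece like '0rc1' becomes 0.
        match = re.match(r'\d+', piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def meets_minimum(installed, required='0.9.0'):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)
```

In the notebook this could be called as `meets_minimum(sns.__version__)`.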

## More imports

In [0]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Load & look at data

In [0]:
income = pd.read_csv('https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--datapoints--income_per_person_gdppercapita_ppp_inflation_adjusted--by--geo--time.csv')

In [0]:
lifespan = pd.read_csv('https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--datapoints--life_expectancy_years--by--geo--time.csv')

In [0]:
population = pd.read_csv('https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--datapoints--population_total--by--geo--time.csv')

In [0]:
entities = pd.read_csv('https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--entities--geo--country.csv')

In [0]:
concepts = pd.read_csv('https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis/master/ddf--concepts.csv')

In [0]:
income.shape, lifespan.shape, population.shape, entities.shape, concepts.shape

In [0]:
income.head()

In [0]:
lifespan.head()

In [0]:
population.head()

In [0]:
pd.options.display.max_columns = 500
entities.head()

In [0]:
concepts.head()

## Merge data

Reference: [Pandas Cheat Sheet](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf)

In [0]:
df = income.merge(population.merge(lifespan))
df.head()
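
When `merge` is called without `on`, pandas joins on every column name the two frames share and keeps only the keys present in both (an inner join). For the Gapminder files those shared columns are `geo` and `time`. A small sketch with made-up toy frames (all values are invented) shows the implicit call next to an explicit equivalent; the optional `validate` argument raises an error if a key is duplicated:

```python
import pandas as pd

# Toy stand-ins for the real files; the numbers are invented.
toy_income = pd.DataFrame({
    'geo': ['usa', 'usa', 'chn'],
    'time': [1800, 1801, 1800],
    'income': [1, 2, 3],
})
toy_lifespan = pd.DataFrame({
    'geo': ['usa', 'chn', 'chn'],
    'time': [1800, 1800, 1801],
    'lifespan': [39.0, 32.0, 32.1],
})

# Implicit: joins on the shared columns 'geo' and 'time'.
implicit = toy_income.merge(toy_lifespan)

# Explicit: same result, but the keys and join type are spelled out.
explicit = toy_income.merge(
    toy_lifespan, on=['geo', 'time'], how='inner', validate='one_to_one'
)
```

Only the `(geo, time)` pairs found in both frames survive, so rows can be dropped silently; comparing `len()` before and after the merge is a cheap sanity check.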

In [0]:
df = pd.merge(df, entities[['country','name', 'world_6region']], 
              left_on = 'geo', right_on ='country')

In [0]:
df.sample()

In [0]:
df.drop(columns = ['country'], inplace = True)
df.head()

In [0]:
# Map the raw Gapminder column names to shorter, readable ones.
headers = {
    'geo': 'Alpha-3 code',
    'time': 'year',
    'income_per_person_gdppercapita_ppp_inflation_adjusted': 'income',
    'life_expectancy_years': 'lifespan',
    'population_total': 'population',
    'name': 'country',
    'world_6region': 'region'
}
df = df.rename(columns=headers)

In [0]:
df.head()

## Explore data

In [0]:
df.describe()

In [0]:
# Unique country names, ordered by how many rows each country has.
countries = df['country'].value_counts().index.tolist()
countries


In [0]:
# Takes a feature name ('lifespan', 'population', or 'income') and returns a
# list of (country, max - min) tuples for that feature, largest spread first.
def growth(feature):
  if feature not in ('lifespan', 'population', 'income'):
    raise ValueError(f'unknown feature: {feature!r}')

  spreads = []
  for country in countries:
    subset = df[df['country'] == country]
    # Difference between the highest and lowest value seen for this country.
    spreads.append((country, subset[feature].max() - subset[feature].min()))

  return sorted(spreads, key=lambda v: v[1], reverse=True)
  

In [0]:
growth('income')[:10]

In [0]:
growth('population')[:10]

In [0]:
growth('lifespan')[:10]
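
The per-country loop can also be written as a single `groupby`. Note that max minus min measures the spread of a series, which is not always the change from the first year to the last: a temporary dip makes the spread larger than the net change. A sketch on invented toy data, with hypothetical helper names `value_range` and `first_to_last`:

```python
import pandas as pd

# Invented toy data: country A dips before rising, country B rises steadily.
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'year': [1800, 1900, 2000, 1800, 1900, 2000],
    'lifespan': [30.0, 25.0, 70.0, 35.0, 50.0, 78.0],
})

def value_range(frame, feature):
    """Max minus min per country, largest first (what the loop above computes)."""
    grouped = frame.groupby('country')[feature]
    return (grouped.max() - grouped.min()).sort_values(ascending=False)

def first_to_last(frame, feature):
    """Value in the latest year minus value in the earliest year, per country."""
    ordered = frame.sort_values('year').groupby('country')[feature]
    return (ordered.last() - ordered.first()).sort_values(ascending=False)

spread = value_range(toy, 'lifespan')
net_change = first_to_last(toy, 'lifespan')
```

On this toy data the two rankings disagree: A has the larger spread, but B has the larger net change.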

## Plot visualization

In [0]:
lifespan = pd.DataFrame(growth('lifespan'), columns = ['countries', 'lifespan'])
lifespan.head()

In [0]:
plt.figure(figsize = (20,5))
sns.barplot(x='countries', y='lifespan', data = lifespan[:20]);
plt.title('Lifespan Growth by Country over 200 Years');

In [0]:
income = pd.DataFrame(growth('income'), columns = ['countries', 'income'])
income.head()

In [0]:
plt.figure(figsize = (20,5))
sns.barplot(x='countries', y='income', data = income[:20]);
plt.title('Income Growth by Country over 200 Years');

In [0]:
population = pd.DataFrame(growth('population'), columns = ['countries', 'population'])
population.head()


In [0]:
plt.figure(figsize = (20,5))
sns.barplot(x='countries', y='population', data = population[:20]);
plt.title('Population Growth by Country over 200 Years');

## Analyze outliers

In [97]:
df[df['country'] == 'India'].min(), df[df['country'] == 'India'].max() 


(Alpha-3 code            IND
 year                   1800
 income                  904
 population        168574895
 lifespan               8.12
 country               India
 region           south_asia
 income_growth        -10598
 pop_growth        -98220085
 life_growth          -46.59
 dtype: object, Alpha-3 code            IND
 year                   2018
 income                 6890
 population       1354051854
 lifespan               69.1
 country               India
 region           south_asia
 income_growth           414
 pop_growth         18568724
 life_growth            16.5
 dtype: object)

In [96]:
df[df['country'] == 'China'].min(), df[df['country'] == 'China'].max() 

(Alpha-3 code                   CHN
 year                          1800
 income                         530
 population               321675013
 lifespan                     22.13
 country                      China
 region           east_asia_pacific
 income_growth               -22382
 pop_growth                -2563754
 life_growth                 -48.66
 dtype: object, Alpha-3 code                   CHN
 year                          2018
 income                       16018
 population              1415045928
 lifespan                     76.92
 country                      China
 region           east_asia_pacific
 income_growth                  874
 pop_growth               303477804
 life_growth                  10.15
 dtype: object)

In [0]:
df[df['country'] == 'United Arab Emirates'].min(), df[df['country'] == 'United Arab Emirates'].max() 

In [0]:
df[df['country'] == 'Qatar'].min(), df[df['country'] == 'Qatar'].max() 

In [0]:
df[df['country'] == 'Brunei'].min(), df[df['country'] == 'Brunei'].max() 
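
Calling `.min()` or `.max()` on a whole DataFrame works column by column, so the values in one result do not have to come from the same row (or the same year). To see the full row where a minimum occurs, `idxmin` can be combined with `loc`. A sketch on invented toy numbers:

```python
import pandas as pd

# Invented values for a single hypothetical country.
toy = pd.DataFrame({
    'year': [1800, 1900, 1950, 2000],
    'lifespan': [25.0, 24.0, 8.0, 69.0],
    'population': [100, 150, 140, 900],
})

# Column-wise minima: each value is the smallest in its own column.
column_minima = toy.min()

# The complete row in which lifespan is lowest.
lowest_lifespan_row = toy.loc[toy['lifespan'].idxmin()]
```

Here `column_minima` pairs year 1800 with lifespan 8.0, although that lifespan belongs to the 1950 row, which `lowest_lifespan_row` returns.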

## Plot multiple years

In [0]:
!pip install plotly_express

In [0]:
import plotly_express as px


In [0]:

df["Alpha-3 code"] = df['Alpha-3 code'].str.upper()

In [95]:
df.head()

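|   | Alpha-3 code | year | income | population | lifespan | country | region | income_growth | pop_growth | life_growth |
|---|--------------|------|--------|------------|----------|---------|--------|---------------|------------|-------------|
| 0 | AFG | 1800 | 603 | 3280000 | 28.21 | Afghanistan | south_asia | 0 | 0 | 0.0 |
| 1 | AFG | 1801 | 603 | 3280000 | 28.2 | Afghanistan | south_asia | 0 | 0 | -0.01 |
| 2 | AFG | 1802 | 603 | 3280000 | 28.19 | Afghanistan | south_asia | 0 | 0 | -0.01 |
| 3 | AFG | 1803 | 603 | 3280000 | 28.18 | Afghanistan | south_asia | 0 | 0 | -0.01 |
| 4 | AFG | 1804 | 603 | 3280000 | 28.17 | Afghanistan | south_asia | 0 | 0 | -0.01 |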


In [0]:
df.describe()

In [0]:
def configure_plotly_browser_state():
  # Load require.js and point it at the Plotly CDN so that Plotly figures
  # can render inside Colab output cells.
  from IPython.display import HTML, display
  display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-latest.min.js?noext',
            },
          });
        </script>
        '''))

In [92]:
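# Change the feature here: 'population', 'income', or 'lifespan'.
# (The sample output shown below was produced with feature = 'lifespan'.)
feature = 'population'

tidy = df[['year','country','Alpha-3 code', feature]]
# Each (year, country, code) key appears once, so sum() leaves values unchanged.
tidy = tidy.groupby(['year','country', 'Alpha-3 code']).sum().reset_index()
tidy.head()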

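|   | year | country | Alpha-3 code | lifespan |
|---|------|---------|--------------|----------|
| 0 | 1800 | Afghanistan | AFG | 28.21 |
| 1 | 1800 | Albania | ALB | 35.4 |
| 2 | 1800 | Algeria | DZA | 28.82 |
| 3 | 1800 | Angola | AGO | 26.98 |
| 4 | 1800 | Antigua and Barbuda | ATG | 33.54 |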


In [93]:
configure_plotly_browser_state()

px.choropleth(tidy, locations = 'Alpha-3 code', color = feature,
             hover_name = 'country', animation_frame = 'year',
             color_continuous_scale = px.colors.sequential.Reds)

                     


## Point out a story

I discovered that although some countries were simply richer overall, others grew at a much faster rate than the richest countries. China and India also had HUGE growth in terms of population. My guess is that this is due to the large populations they already started with, but seeing by how much was amazing. India's population grew roughly eightfold, from about 169 million to about 1.35 billion, in just over 200 years!
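
The population figures can be checked with a few lines of arithmetic. This sketch uses the minimum and maximum population values printed in the India and China outputs earlier, and assumes they approximate the starting and ending populations; `percent_growth` is a hypothetical helper name:

```python
def percent_growth(start, end):
    """Percentage increase from start to end."""
    return (end - start) / start * 100

# Minimum and maximum population values from the outputs above.
india_min, india_max = 168_574_895, 1_354_051_854
china_min, china_max = 321_675_013, 1_415_045_928

india = percent_growth(india_min, india_max)
china = percent_growth(china_min, china_max)

print(f"India: {india:.0f}% growth, {india_max / india_min:.1f}x")  # India: 703% growth, 8.0x
print(f"China: {china:.0f}% growth, {china_max / china_min:.1f}x")  # China: 340% growth, 4.4x
```

A value that ends at eight times its starting size has grown by about 700%, since the starting amount itself is not counted as growth.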

# ASSIGNMENT
Replicate the lesson code

# STRETCH OPTIONS

## 1. Animate!
- [Making animations work in Google Colaboratory](https://medium.com/lambda-school-machine-learning/making-animations-work-in-google-colaboratory-new-home-for-ml-prototyping-c6147186ae75)
- [How to Create Animated Graphs in Python](https://towardsdatascience.com/how-to-create-animated-graphs-in-python-bb619cc2dec1)
- [The Ultimate Day of Chicago Bikeshare](https://chrisluedtke.github.io/divvy-data.html) (Lambda School Data Science student)

## 2. Work on anything related to your portfolio site / project