# Python 101
## Part X.
---

## More pandas are coming!

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
BASE_URI = "./data/"

---
## Act I: Read demographic data

- Read data from a csv file

In [None]:
df = pd.read_csv(BASE_URI + 'population.csv')

- Select the columns we need

In [None]:
columns = ['Data Source'] + [str(year) for year in range(1990, 2011)]
pop = df[columns].dropna()
pop.head()

- Rename `'Data Source'` to `'country'` and set it as index.

In [None]:
pop = pop.rename(columns={'Data Source': 'country'}).set_index('country')
pop.head()

- Select a few countries

In [None]:
countries = ['United Kingdom', 'Hungary', 'France', 'Germany']
subpop = pop[pop.index.isin(countries)]
subpop

- Transpose the dataframe

In [None]:
subpop = subpop.transpose()
subpop.head()

- Plot!

In [None]:
subpop.plot();

---
## Act II: Read data about alcohol

- Read the data

In [None]:
df = pd.read_csv(BASE_URI + 'alcohol.csv')
df.head()

- Select the columns, and rename them

In [None]:
columns = {
    'Country': 'country',
    'Year': 'year',
    'Beverage Types': 'type',
    'Display Value': 'alcohol'
}
# list(columns) gives the original column names and works on Python 2 and 3
alc = df[list(columns)].rename(columns=columns).dropna()
alc.head()

- Select the same country subset

In [None]:
subalc = alc[alc['country'].isin(countries)]
subalc.head()

We only care about the combined consumption:
- filter the dataframe, keeping the rows where `type` is `'All'`
- remove the `type` column, which is now redundant

In [None]:
subalc = subalc[subalc['type'] == 'All']
del subalc['type']
subalc.head()
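
The filter-then-drop step above can be tried on a toy dataframe first. This is only a sketch with made-up numbers (not the WHO values), and it uses `drop(columns=...)` instead of `del`, which returns a new dataframe rather than changing the existing one:

```python
import pandas as pd

# Toy data with made-up values, mimicking the renamed alcohol columns
toy = pd.DataFrame({
    'country': ['France', 'France', 'Hungary'],
    'type': ['All', 'Beer', 'All'],
    'alcohol': [1.0, 2.0, 3.0],
})

# Keep only the rows with type 'All', then drop the column we no longer need
only_all = toy[toy['type'] == 'All'].drop(columns='type')
print(only_all)
```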

- Pivot the dataframe

In [None]:
subalc = subalc.pivot(index='year', columns='country', values='alcohol')
subalc.head()
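
To see what `pivot` does, here is a minimal sketch on a toy long-format dataframe (the values are made up for illustration). Each distinct `year` becomes a row, each distinct `country` becomes a column, and the cells are filled from `alcohol`:

```python
import pandas as pd

# Toy long-format data: one row per (country, year) pair, made-up values
long_df = pd.DataFrame({
    'country': ['France', 'France', 'Hungary', 'Hungary'],
    'year': [1990, 1991, 1990, 1991],
    'alcohol': [1.0, 2.0, 3.0, 4.0],
})

# Wide format: one row per year, one column per country
wide_df = long_df.pivot(index='year', columns='country', values='alcohol')
print(wide_df)
```

Note that `pivot` raises an error if an (index, column) pair occurs more than once, which is one reason to filter on `type` before pivoting.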

- Plot!

In [None]:
subalc.plot();

---
## Act III: Merge data

- Check index types

In [None]:
print('subpop index type:', subpop.index.dtype)
print('subalc index type:', subalc.index.dtype)

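The index of `subpop` holds the years as strings, because they were column labels before the transpose. Convert it to integers so that it matches the index of `subalc`: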

In [None]:
# np.int was removed from recent NumPy releases; the builtin int does the same job
subpop.index = subpop.index.astype(int)
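
Why the conversion matters can be shown on a toy dataframe (made-up values): with a string index, the integer label `1990` is not found, and after `astype(int)` it is.

```python
import pandas as pd

# Toy dataframe whose index holds years as strings, as after a transpose
toy = pd.DataFrame({'Hungary': [10.4, 10.3]}, index=['1990', '1991'])
found_before = 1990 in toy.index   # integer label vs. string index

# Convert the index labels to integers
toy.index = toy.index.astype(int)
found_after = 1990 in toy.index

print(found_before, found_after)
```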

- Check which years appear in only one of the two indexes

In [None]:
set(subpop.index.values).symmetric_difference(set(subalc.index.values))
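
The symmetric difference contains the values present in exactly one of the two sets. A small sketch with toy year indexes follows; pandas also offers `Index.symmetric_difference`, which gives the same result without building sets:

```python
import pandas as pd

# Two toy indexes that overlap only partially
years_a = pd.Index([1990, 1991, 1992])
years_b = pd.Index([1991, 1992, 1993])

# Values that occur in exactly one of the two indexes
only_in_one = set(years_a).symmetric_difference(set(years_b))

# The same check using the pandas Index method
only_in_one_idx = years_a.symmetric_difference(years_b)

print(sorted(only_in_one))
print(only_in_one_idx.tolist())
```

An empty result means the two indexes hold the same values.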

Drop the years after 2010, where the selected population data ends

In [None]:
subpop = subpop.loc[subpop.index < 2011]
subalc = subalc.loc[subalc.index < 2011]

- Join the two dataframes!

In [None]:
merged = subpop.join(subalc, rsuffix='_alc')
merged.head()
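
Both dataframes use country names as column labels, so `join` needs a suffix to tell them apart. A minimal sketch on toy data (made-up values) shows the effect of `rsuffix` and of the default left join:

```python
import pandas as pd

# Toy "population" and "alcohol" frames sharing the column name 'Hungary'
left = pd.DataFrame({'Hungary': [10.4, 10.3]}, index=[1990, 1991])
right = pd.DataFrame({'Hungary': [11.0, 12.0, 13.0]}, index=[1990, 1991, 1992])

# join() is a left join on the index by default;
# rsuffix is appended to overlapping column names from the right frame
joined = left.join(right, rsuffix='_alc')
print(joined)
```

The year 1992 exists only in `right`, so it does not appear in the result.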

- Plot data into separate coordinate systems

In [None]:
merged[['Hungary', 'Hungary_alc']].plot(subplots=True);

- Compute the total actual alcohol consumption

In [None]:
for country in countries:
    merged[country+'_consumption'] = merged[country] * merged[country+'_alc']

- and plot it!

In [None]:
merged[[c + '_consumption' for c in countries]].plot();

---

## Let's do some...

<img align="left" width=150 src="pics/magic.gif">
<br style="clear:left;"/>

### Act IV: Cool library of the week: <a href="https://github.com/JosPolfliet/pandas-profiling">pandas-profiling</a>
#### Generate detailed reports from pandas dataframes
- import it
- generate report

In [None]:
import pandas_profiling
pandas_profiling.ProfileReport(alc)

---
## Final Act: your play time!
## It's your turn - write the missing code snippets!

#### 1. Plot the top 5 alcohol consuming countries in 1990 and their consumption

#### 2. Compare the average alcohol consumption in France, Germany, UK and Hungary by plotting

#### 3.a Update the data by downloading the latest entries from the WHO site:
- <a href="http://apps.who.int/gho/data/view.main.POP2040ALL">population</a>
- <a href="http://apps.who.int/gho/data/view.main.52145">alcohol part I.</a>
- <a href="http://apps.who.int/gho/data/view.main.52160">alcohol part II.</a>

#### 3.b Generate the same plots!