<div class="alert alert-block alert-warning">
<b>Exercise 0:</b> Create a new folder and notebook called <b>Intro-Data-Analysis-Pandas-Exercises</b> where you will perform all the exercises below. Make sure to copy any code you need from this notebook to that one. You will use that folder to create a new GitHub repo with the code, html, and slides as usual.
</div>

<div class="alert alert-block alert-warning">
<b>Exercise 1:</b> Create a new dataframe <code>pop</code> with population data downloaded from <a href="https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)">Wikipedia</a>. Make sure to clean the data so it can be used further.
</div>

In [6]:
# Let's import pandas and some other basic packages we will use 
from __future__ import division
%matplotlib inline
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import IFrame

## Now, let's import the table of countries' Population from [Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations))

In [7]:
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)'
IFrame(url, width=800, height=400)

In [9]:
pop = pd.read_html(url, encoding='utf-8')[0]
pop

Unnamed: 0,Country / Area,UN continentalregion[4],UN statisticalsubregion[4],Population(1 July 2018),Population(1 July 2019),Change
0,China[a],Asia,Eastern Asia,1427647786,1433783686,+0.43%
1,India,Asia,Southern Asia,1352642280,1366417754,+1.02%
2,United States,Americas,Northern America,327096265,329064917,+0.60%
3,Indonesia,Asia,South-eastern Asia,267670543,270625568,+1.10%
4,Pakistan,Asia,Southern Asia,212228286,216565318,+2.04%
...,...,...,...,...,...,...
229,Falkland Islands (United Kingdom),Americas,South America,3234,3377,+4.42%
230,Niue,Oceania,Polynesia,1620,1615,−0.31%
231,Tokelau (New Zealand),Oceania,Polynesia,1319,1340,+1.59%
232,Vatican City[z],Europe,Southern Europe,801,799,−0.25%


## Again we need to clean the data a little bit

In [11]:
pop.columns = ['Country/Territory', 'UN Region', 'UN Statistical subregion', 'population 2018',
                      'population 2019','Change']
pop.head()

Unnamed: 0,Country/Territory,UN Region,UN Statistical subregion,population 2018,population 2019,Change
0,China[a],Asia,Eastern Asia,1427647786,1433783686,+0.43%
1,India,Asia,Southern Asia,1352642280,1366417754,+1.02%
2,United States,Americas,Northern America,327096265,329064917,+0.60%
3,Indonesia,Asia,South-eastern Asia,267670543,270625568,+1.10%
4,Pakistan,Asia,Southern Asia,212228286,216565318,+2.04%


## Let's eliminate the "[*]" in the country names

In [22]:
pop['country_name'] = pop['Country/Territory']
pop.loc[pop['Country/Territory'].str.find('[')!=-1, 'country_name'] = pop.loc[pop['Country/Territory'].str.find('[')!=-1, 'Country/Territory'].apply(lambda x: x[:x.find('[')])
pop.loc[pop['Country/Territory'].str.find('(')!=-1, 'country_name'] = pop.loc[pop['Country/Territory'].str.find('(')!=-1, 'Country/Territory'].apply(lambda x: x[:x.find('(')])
pop.head()

Unnamed: 0,Country/Territory,UN Region,UN Statistical subregion,population 2018,population 2019,Change,country_name
0,China[a],Asia,Eastern Asia,1427647786,1433783686,+0.43%,China
1,India,Asia,Southern Asia,1352642280,1366417754,+1.02%,India
2,United States,Americas,Northern America,327096265,329064917,+0.60%,United States
3,Indonesia,Asia,South-eastern Asia,267670543,270625568,+1.10%,Indonesia
4,Pakistan,Asia,Southern Asia,212228286,216565318,+2.04%,Pakistan


# Let's make sure years and Population columns are treated as numbers

In [23]:
pop.dtypes 

Country/Territory           object
UN Region                   object
UN Statistical subregion    object
population 2018              int64
population 2019              int64
Change                      object
country_name                object
dtype: object

<div class="alert alert-block alert-warning">
<b>Exercise 2:</b> Merge the <code>isocodes</code> and <code>pop</code> dataframes.
</div>

In [12]:
url = 'https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes'
isocodes = pd.read_html(url, encoding='utf-8')[0]
isocodes = isocodes.droplevel(0, axis=1)
mycols = isocodes.columns
mycols = [c[:c.find('[')] for c in mycols]
isocodes.columns = mycols
isocodes['Alpha-2 code original'] = isocodes['Alpha-2 code']
isocodes['Alpha-2 code'] = isocodes['Subdivision code links'].apply(lambda x: x[x.find(':')+1:])
isocodes.head()

Unnamed: 0,Country name,Official state name,Sovereignty,Alpha-2 code,Alpha-3 code,Numeric code,Subdivision code links,Internet ccTLD,Alpha-2 code original
0,Afghanistan,The Islamic Republic of Afghanistan,UN member state,AF,AFG,4,ISO 3166-2:AF,.af,.mw-parser-output .monospaced{font-family:mono...
1,Åland Islands,Åland,Finland,AX,ALA,248,ISO 3166-2:AX,.ax,AX
2,Albania,The Republic of Albania,UN member state,AL,ALB,8,ISO 3166-2:AL,.al,AL
3,Algeria,The People's Democratic Republic of Algeria,UN member state,DZ,DZA,12,ISO 3166-2:DZ,.dz,DZ
4,American Samoa,The Territory of American Samoa,United States,AS,ASM,16,ISO 3166-2:AS,.as,AS


In [25]:
pop_merged = isocodes.merge(pop, left_on='Country name', right_on='country_name')
pop_merged

Unnamed: 0,Country name,Official state name,Sovereignty,Alpha-2 code,Alpha-3 code,Numeric code,Subdivision code links,Internet ccTLD,Alpha-2 code original,Country/Territory,UN Region,UN Statistical subregion,population 2018,population 2019,Change,country_name
0,Afghanistan,The Islamic Republic of Afghanistan,UN member state,AF,AFG,004,ISO 3166-2:AF,.af,.mw-parser-output .monospaced{font-family:mono...,Afghanistan,Asia,Southern Asia,37171921,38041754,+2.34%,Afghanistan
1,Albania,The Republic of Albania,UN member state,AL,ALB,008,ISO 3166-2:AL,.al,AL,Albania,Europe,Southern Europe,2882740,2880917,−0.06%,Albania
2,Algeria,The People's Democratic Republic of Algeria,UN member state,DZ,DZA,012,ISO 3166-2:DZ,.dz,DZ,Algeria,Africa,Northern Africa,42228408,43053054,+1.95%,Algeria
3,Andorra,The Principality of Andorra,UN member state,AD,AND,020,ISO 3166-2:AD,.ad,AD,Andorra,Europe,Southern Europe,77006,77142,+0.18%,Andorra
4,Angola,The Republic of Angola,UN member state,AO,AGO,024,ISO 3166-2:AO,.ao,AO,Angola,Africa,Middle Africa,30809787,31825295,+3.30%,Angola
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150,Uzbekistan,The Republic of Uzbekistan,UN member state,UZ,UZB,860,ISO 3166-2:UZ,.uz,UZ,Uzbekistan,Asia,Central Asia,32476244,32981716,+1.56%,Uzbekistan
151,Vanuatu,The Republic of Vanuatu,UN member state,VU,VUT,548,ISO 3166-2:VU,.vu,VU,Vanuatu,Oceania,Melanesia,292680,299882,+2.46%,Vanuatu
152,Yemen,The Republic of Yemen,UN member state,YE,YEM,887,ISO 3166-2:YE,.ye,YE,Yemen,Asia,Western Asia,28498683,29161922,+2.33%,Yemen
153,Zambia,The Republic of Zambia,UN member state,ZM,ZMB,894,ISO 3166-2:ZM,.zm,ZM,Zambia,Africa,Eastern Africa,17351708,17861030,+2.94%,Zambia


<div class="alert alert-block alert-warning">
<b>Exercise 3:</b> Merge the dataframes we have created so far to have a unique dataframe that has ISO codes, GDP per capita, and population data.
</div>

<div class="alert alert-block alert-warning">
<b>Exercise 4:</b> Use the <code>os</code> package to create folders to export data and figures. 
Since you will be using the names of these folders a lot, save their names in variables called <code>path</code>, <code>pathout</code>, and <code>pathgraphs</code>, where <code>path = './data/'</code>, <code>pathout = './data/'</code>, and <code>pathgraphs = './graphs/'</code>
</div>

<div class="alert alert-block alert-warning">
    <b>Exercise 5:</b> Save the dataframe created in Exercise 3 as a <b>CSV, XLSX, and Stata</b> file into the <code>pathout</code> folder. Use a variable called <code>filename = 'Wiki_Data'</code> so you can use similar code to save all file types. Notice only the filetype will change.
</div>

<div class="alert alert-block alert-warning">
<b>Exercise 6:</b> Create plots showing the relation between GDP per capita and Population. Create all 4 types of possible regression plots and save them as <b> PNG, PDF, and JPG</b> files. Make sure to save them in the folder you created for <b>graphs</b>
</div>

<div class="alert alert-block alert-warning">
<b>Exercise 7:</b> Create plots showing the relation between GDP per capita and Population Growth. Create all 4 types of possible regression plots and save them as <b> PNG, PDF, and JPG</b> files. Make sure to save them in the folder you created for <b>graphs</b>
</div>

<div class="alert alert-block alert-warning">
<b>Exercise 8:</b> Using the notebook create slides for presenting your work and results. Once you have your slides, create a new public repo, publish it, and make sure to create a READ.ME file that show links to the notebook, html, and slides. Also, create the gh-pages branch to have a working slides webpage.
</div>