# Pandas Dataframes

© Explore Data Science Academy

## Instructions to Students
- **Do not add or remove cells in this notebook. Do not edit or remove the `### START FUNCTION` or `### END FUNCTION` comments. Do not add any code outside of the functions you are required to edit. Doing any of this will lead to a mark of 0%!**
- Answer the questions according to the specifications provided.
- Use the given cell in each question to see if your function matches the expected outputs.
- Do not hard-code answers to the questions.
- The use of Stack Overflow, Google, and other online tools is permitted. However, copying a fellow student's code is not permissible and is considered a breach of the Honour Code below. Doing this will result in a mark of 0%.
- Good luck, and may the force be with you!

## Honour Code

I **Thabisile, Obi**, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the <a href="https://drive.google.com/open?id=1FXCIf425JLRx3JQi-ltSWppj8BCF3Np1" target="_blank">EDSA Student Manifesto</a>.

Non-compliance with the honour code constitutes a material breach of contract.

### Import the required libraries

In [1]:
import numpy as np
import pandas as pd

### Data

You will need these dataframes in order to answer the following questions.

In [2]:
country_map_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/country_code_map.csv', index_col='Country Code')
population_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/world_population.csv', index_col='Country Code')
meta_df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/AnalyseProject/metadata.csv', index_col='Country Code')

_**Dataframe specifications:**_

The dataframes provide information about the population of the world for various years. Some things to note:
* All dataframes have a `Country Code` as an index, which is a three-letter code referring to a country.
* The `country_map_df` data maps the `Country Code` to a `Country Name`.
* The `population_df` data contains information on the population for a given country between the years of 1960 and 2017.
* The `meta_df` data contains meta information about each country, including its geographical region, its income group, and a comment on the country as a whole.
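Because all three dataframes share the same `Country Code` index, rows can be aligned by label without any extra key column. The sketch below illustrates this with two small stand-in frames (`toy_map` and `toy_meta` are hypothetical names, holding only the two countries shown); it is not the real data, just an illustration of index-aligned joining and lookup.

```python
import pandas as pd

# Small stand-ins for country_map_df and meta_df (hypothetical names),
# indexed by the same three-letter country code.
codes = pd.Index(["ABW", "AFG"], name="Country Code")
toy_map = pd.DataFrame({"Country Name": ["Aruba", "Afghanistan"]}, index=codes)
toy_meta = pd.DataFrame(
    {"Region": ["Latin America & Caribbean", "South Asia"]}, index=codes
)

# join aligns the rows on the shared index, so no key column is needed.
toy_joined = toy_map.join(toy_meta)

# Label-based lookup of a single value by country code and column name.
aruba_region = toy_joined.loc["ABW", "Region"]
print(aruba_region)
```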

In [3]:
country_map_df.head()

Country Code,Country Name
ABW,Aruba
AFG,Afghanistan
AGO,Angola
ALB,Albania
AND,Andorra


In [4]:
population_df.head()

Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017
ABW,54211.0,55438.0,56225.0,56695.0,57032.0,57360.0,57715.0,58055.0,58386.0,58726.0,...,101353.0,101453.0,101669.0,102053.0,102577.0,103187.0,103795.0,104341.0,104822.0,105264.0
AFG,8996351.0,9166764.0,9345868.0,9533954.0,9731361.0,9938414.0,10152331.0,10372630.0,10604346.0,10854428.0,...,27294031.0,28004331.0,28803167.0,29708599.0,30696958.0,31731688.0,32758020.0,33736494.0,34656032.0,35530081.0
AGO,5643182.0,5753024.0,5866061.0,5980417.0,6093321.0,6203299.0,6309770.0,6414995.0,6523791.0,6642632.0,...,21759420.0,22549547.0,23369131.0,24218565.0,25096150.0,25998340.0,26920466.0,27859305.0,28813463.0,29784193.0
ALB,1608800.0,1659800.0,1711319.0,1762621.0,1814135.0,1864791.0,1914573.0,1965598.0,2022272.0,2081695.0,...,2947314.0,2927519.0,2913021.0,2905195.0,2900401.0,2895092.0,2889104.0,2880703.0,2876101.0,2873457.0
AND,13411.0,14375.0,15370.0,16412.0,17469.0,18549.0,19647.0,20758.0,21890.0,23058.0,...,83861.0,84462.0,84449.0,83751.0,82431.0,80788.0,79223.0,78014.0,77281.0,76965.0


In [5]:
meta_df.head()

Country Code,Region,Income Group,Special Notes
ABW,Latin America & Caribbean,High income,Mining is included in agriculture\r\r\r\nElect...
AFG,South Asia,Low income,Fiscal year end: March 20; reporting period fo...
AGO,Sub-Saharan Africa,Lower middle income,
ALB,Europe & Central Asia,Upper middle income,
AND,Europe & Central Asia,High income,WB-3 code changed from ADO to AND to align wit...


Using this information, answer the questions below:

### Question 1

Write a function that returns the summed population total in a given geographic region for a given year.

_**Function Specifications:**_
* Should take as input a year as an `int` and region as a `str`.
* Should return an `int` corresponding to the population.

In [6]:
### START FUNCTION
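def total_pop_in_region(year, region):
    # Combine the metadata (which holds the region) with the population counts.
    combo = pd.merge(meta_df, population_df, on=['Country Code'])
    # Keep only the countries in the requested region.
    df = combo[combo['Region'] == region]
    # Year columns are labelled with strings; cast the total to int as specified.
    return int(df[str(year)].sum())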

### END FUNCTION

In [None]:
total_pop_in_region(1960,'East Asia & Pacific')

_**Expected Outputs:**_
```python
total_pop_in_region(1960,'East Asia & Pacific') == 1029332591.0
total_pop_in_region(1970,'South Asia') == 712740919.0
```
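The filter-then-sum idea can be checked by hand on a small subset. The sketch below uses the five countries and 1960 values shown in the `head()` outputs above; `toy_meta`, `toy_pop`, and `toy_total_pop_in_region` are hypothetical names used only for illustration, and the result covers just these five rows, not the full dataset.

```python
import pandas as pd

codes = pd.Index(["ABW", "AFG", "AGO", "ALB", "AND"], name="Country Code")
toy_meta = pd.DataFrame(
    {
        "Region": [
            "Latin America & Caribbean",
            "South Asia",
            "Sub-Saharan Africa",
            "Europe & Central Asia",
            "Europe & Central Asia",
        ]
    },
    index=codes,
)
# Year columns are strings, as they are when read from the CSV header.
toy_pop = pd.DataFrame(
    {"1960": [54211.0, 8996351.0, 5643182.0, 1608800.0, 13411.0]}, index=codes
)


def toy_total_pop_in_region(year, region):
    # Select the country codes that belong to the region ...
    region_codes = toy_meta.index[toy_meta["Region"] == region]
    # ... then sum that year's column over just those rows.
    return int(toy_pop.loc[region_codes, str(year)].sum())


europe_1960 = toy_total_pop_in_region(1960, "Europe & Central Asia")
print(europe_1960)  # 1608800 + 13411 = 1622211
```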

### Question 2

Write a function that returns the global yearly population `Growth`, grouped by the `Income Group` and `Year`.

_**Function Specifications:**_
* Should not take any inputs.
* The years are currently presented as the heading of each column in the population table. The table will have to be melted to produce the appropriate format. You can use `df.melt` to do this, where the variable name should be `Year` and the value name should be `Growth`.
* Should group by the `Year` and `Income Group`.
* Should only have one column named `Growth`.
* The `Income Group` and the `Year` should be indices.
* The `Growth` is the year-on-year change in total population, divided by the total population for that year and multiplied by 100.
* Should return a `DataFrame`.
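The melt, group, and growth steps listed above can be traced on a tiny table. This is a minimal sketch with made-up numbers (the frame `toy` and its values are purely illustrative), assuming the difference is taken within each income group so that a group's first year has no growth value.

```python
import pandas as pd

# Illustrative wide table: one row per country, one column per year.
toy = pd.DataFrame(
    {
        "Income Group": ["High income", "High income", "Low income"],
        "1960": [100.0, 300.0, 50.0],
        "1961": [110.0, 330.0, 60.0],
        "1962": [120.0, 360.0, 75.0],
    }
)

# Melt: each year column becomes a row, giving one row per (country, year).
long_df = toy.melt(id_vars=["Income Group"], var_name="Year", value_name="Growth")

# Total population per income group and year; both become index levels.
totals = long_df.groupby(["Income Group", "Year"]).sum()

# Year-on-year difference within each group, as a percentage of that year's total.
toy_growth = totals.groupby(level="Income Group").diff() / totals * 100
print(toy_growth)
```

For example, the `Low income` total goes from 60 to 75 between 1961 and 1962, so the 1962 growth is 15 / 75 × 100 = 20.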

In [8]:
### START FUNCTION
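def population_difference_by_income():
    # Attach each country's income group to its yearly population counts.
    combo = pd.merge(meta_df, population_df, on=['Country Code'])

    # Drop the metadata columns that are not needed for the grouping.
    combo = combo.drop(columns=['Region', 'Special Notes'])

    # Reshape from one column per year to one row per (country, year).
    df = pd.melt(combo, id_vars=['Income Group'], var_name='Year', value_name='Growth')
    totals = df.groupby(['Income Group', 'Year']).sum()
    # Difference within each income group, so a group's first year is NaN
    # instead of being compared with the previous group's last year.
    growth = totals.groupby(level='Income Group').diff() / totals * 100
    return growth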

### END FUNCTION

In [None]:
population_difference_by_income().head()

_**Expected Output:**_
```python
population_difference_by_income().head()
```
> <table class="dataframe" border="1">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>Growth</th>
    </tr>
    <tr>
      <th>Income Group</th>
      <th>Year</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="5" valign="top">High income</th>
      <th>1960</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1961</th>
      <td>1.450978</td>
    </tr>
    <tr>
      <th>1962</th>
      <td>1.261630</td>
    </tr>
    <tr>
      <th>1963</th>
      <td>1.235893</td>
    </tr>
    <tr>
      <th>1964</th>
      <td>1.207633</td>
    </tr>
  </tbody>
</table>

### Question 3

Using the function you just created, write a function that returns the average population _growth_ over all years for a given income group. 

_**Function Specifications:**_
* Should take as input a `str` as the income group.
* Should raise a `ValueError` if the input is not a valid income group.
* Should return a `float`, rounded to 2 decimal places.

In [0]:
### START FUNCTION
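def ave_growth_by_income(income_group):
    df = population_difference_by_income()

    # Reject anything that is not one of the income groups in the index.
    if income_group not in df.index.get_level_values('Income Group'):
        raise ValueError(f'Invalid income group: {income_group!r}')

    # Treat negative growth values as missing so they are excluded from the mean.
    df[df < 0] = np.nan
    # Average the single Growth column for this group and round to 2 decimals.
    return float(np.round(df.loc[income_group]['Growth'].mean(), 2))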

### END FUNCTION

In [None]:
ave_growth_by_income('Low income')

_**Expected Outputs:**_
```python
ave_growth_by_income('High income') == 0.81
ave_growth_by_income('Low income') == 2.55
```
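The validation and rounding behaviour in the specification can be exercised on a small hand-built table. The following sketch assumes a growth table with the same two-level index as in Question 2; `toy_growth` and `toy_ave_growth` are hypothetical names and the values are made up for illustration.

```python
import pandas as pd

# Illustrative growth table indexed by (Income Group, Year).
toy_growth = pd.DataFrame(
    {"Growth": [float("nan"), 2.0, 3.0, float("nan"), 1.0, 1.5]},
    index=pd.MultiIndex.from_product(
        [["High income", "Low income"], ["1960", "1961", "1962"]],
        names=["Income Group", "Year"],
    ),
)


def toy_ave_growth(income_group):
    valid_groups = toy_growth.index.get_level_values("Income Group").unique()
    if income_group not in valid_groups:
        raise ValueError(f"Invalid income group: {income_group!r}")
    # mean() skips NaN by default, so each group's first year is ignored.
    return round(float(toy_growth.loc[income_group]["Growth"].mean()), 2)


print(toy_ave_growth("High income"))  # mean of 2.0 and 3.0 is 2.5

try:
    toy_ave_growth("Medium income")
except ValueError as err:
    print(err)
```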