# [A] Deliverables

## Population Data Frame

To begin analyzing the population in the Syrian Arab Republic, let's first write a function that creates a population data frame indexed by country and year, with columns giving counts of people in different age-sex groups.

We need to install the necessary packages and pull data from the World Bank's *Population Estimates and Projections* source.

In [15]:
# Install the required packages if necessary (comment these lines out once installed):
!pip install wbdata
!pip install cufflinks
!pip install iso3166

import iso3166 #iso3166.countries.get('country details')
import wbdata
import cufflinks as cf
import pandas as pd
import numpy as np
import plotly
import matplotlib.pyplot as plt
import seaborn as sns
cf.go_offline()



In [16]:
SOURCE = 40  # "Population estimates and projections"

indicators = wbdata.get_indicator(source=SOURCE)
indicators

id                 name
-----------------  -------------------------------------------------------------------
SH.DTH.0509        Number of deaths ages 5-9 years
SH.DTH.1014        Number of deaths ages 10-14 years
SH.DTH.1519        Number of deaths ages 15-19 years
SH.DTH.2024        Number of deaths ages 20-24 years
SH.DTH.IMRT        Number of infant deaths
SH.DTH.IMRT.FE     Number of infant deaths, female
SH.DTH.IMRT.MA     Number of infant deaths, male
SH.DTH.MORT        Number of under-five deaths
SH.DTH.MORT.FE     Number of under-five deaths, female
SH.DTH.MORT.MA     Number of under-five deaths, male
SH.DTH.NMRT        Number of neonatal deaths
SH.DYN.0509        Probability of dying among children ages 5-9 years (per 1,000)
SH.DYN.1014        Probability of dying among adolescents ages 10-14 years (per 1,000)
SH.DYN.1519        Probability of dying among adolescents ages 15-19 years (per 1,000)
SH.DYN.2024        Probability of dying among youth ages 20-24 years (per 1,000)

Now that we have access to the data source, we can begin writing our function. The function **pop_dataframe** takes the inputs *year*, *group*, *age_lower*, *age_upper*, and *location*. One limitation of our function is that *age_lower* and *age_upper* must be multiples of 5, because the age-bin codes are built in five-year steps.

In [17]:
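year = 2020  # 1960 to 2020
group = 'all'  # 'males', 'females', or 'all'
age_lower = 0  # must be a multiple of 5
age_upper = 80  # must be a multiple of 5; ages 80 and above form a single bin
location = 'Syrian Arab Republic'

def pop_dataframe(year=2018, group='all', age_lower=0, age_upper=100, location='world'):
    # NOTE: `group` is accepted but not yet used; Female, Male, and People are always returned.
    # Look up the alpha-3 country code; "WLD" is the code for the world aggregate.
    country_code = "WLD"
    if location != 'world':
        country_code = iso3166.countries.get(location).alpha3

    # Build five-year age-bin codes such as "0004", "0509", ..., "7579".
    age_ranges = []
    for i in range(age_lower, min(age_upper, 80), 5):
        age_ranges.append(f"{i:02d}{i+4:02d}")
    # Ages 80 and above are reported as one open-ended bin, so add it only when requested.
    if age_upper >= 80:
        age_ranges.append("80UP")

    # Fetch the female counts for every age bin and keep the requested year.
    female_variables = {"SP.POP." + age_range + ".FE": "Females " + age_range for age_range in age_ranges}
    female_df = wbdata.get_dataframe(female_variables, country=country_code)
    female_data = female_df.query("date=='{}'".format(year)).sum(axis=0).tolist()

    # Fetch the male counts for every age bin and keep the requested year.
    male_variables = {"SP.POP." + age_range + ".MA": "Males " + age_range for age_range in age_ranges}
    male_df = wbdata.get_dataframe(male_variables, country=country_code)
    male_data = male_df.query("date=='{}'".format(year)).sum(axis=0).tolist()

    # Turn codes like "0004" into readable labels like "00-04".
    age_input = [i[:2] + '-' + i[2:] for i in age_ranges]

    df = pd.DataFrame({
        'Country': location,
        'Year': year,
        'Age': age_input,
        'Female': female_data,
        'Male': male_data,
    })

    df['People'] = df['Female'] + df['Male']

    return df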
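
The multiple-of-5 restriction comes from how the age-bin codes are assembled. The sketch below isolates that step so it can be checked without a network call. The helper name `age_bin_codes` is hypothetical (it is not part of wbdata or of the function above). It assumes that the open-ended `80UP` bin should be included only when *age_upper* is at least 80, and that invalid bounds should raise an error rather than be silently accepted.

```python
def age_bin_codes(age_lower=0, age_upper=80):
    """Return age-bin code fragments such as '0004' for ages 0-4.

    Hypothetical helper for illustration only.
    """
    if age_lower % 5 or age_upper % 5:
        raise ValueError("age_lower and age_upper must be multiples of 5")
    # Five-year bins are used below age 80.
    codes = [f"{lo:02d}{lo + 4:02d}" for lo in range(age_lower, min(age_upper, 80), 5)]
    # Assumption: include the open-ended 80+ bin only when the range reaches 80.
    if age_upper >= 80:
        codes.append("80UP")
    return codes


codes = age_bin_codes(0, 15)
print(codes)
# Indicator IDs for females are built the same way as in pop_dataframe.
print(["SP.POP." + code + ".FE" for code in codes])
```

Validating the bounds up front gives a clear error message instead of a failed lookup for an indicator code that does not exist.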

Great! Now that we have our function, we can test it out with different inputs. Since we are analyzing the Syrian Arab Republic and the effects of the Syrian war, we can look at the population in Syria in 2011, when the war began.

In [18]:
pop_dataframe(year = 2011, group = 'all', age_lower = 0, age_upper = 80, location = 'Syrian Arab Republic')

    Country               Year  Age    Female     Male       People
--  --------------------  ----  -----  ---------  ---------  ---------
0   Syrian Arab Republic  2011  00-04  1409663.0  1468425.0  2878088.0
1   Syrian Arab Republic  2011  05-09  1229224.0  1281990.0  2511214.0
2   Syrian Arab Republic  2011  10-14  1155916.0  1217287.0  2373203.0
3   Syrian Arab Republic  2011  15-19  1050888.0  1108494.0  2159382.0
4   Syrian Arab Republic  2011  20-24  1037743.0  1077484.0  2115227.0
5   Syrian Arab Republic  2011  25-29  989813.0   998113.0   1987926.0
6   Syrian Arab Republic  2011  30-34  822739.0   810866.0   1633605.0
7   Syrian Arab Republic  2011  35-39  632588.0   617023.0   1249611.0
8   Syrian Arab Republic  2011  40-44  509993.0   497910.0   1007903.0
9   Syrian Arab Republic  2011  45-49  433185.0   424893.0   858078.0


As the war escalated, many Syrians sought asylum in neighboring countries. Let's explore the population data in Syria five years into the war by using our *pop_dataframe* function again.

In [19]:
pop_dataframe(year = 2016, group = 'all', age_lower = 0, age_upper = 80, location = 'Syrian Arab Republic')

    Country               Year  Age    Female     Male       People
--  --------------------  ----  -----  ---------  ---------  ---------
0   Syrian Arab Republic  2016  00-04  935963.0   978003.0   1913966.0
1   Syrian Arab Republic  2016  05-09  968641.0   1006416.0  1975057.0
2   Syrian Arab Republic  2016  10-14  861840.0   901748.0   1763588.0
3   Syrian Arab Republic  2016  15-19  848246.0   894250.0   1742496.0
4   Syrian Arab Republic  2016  20-24  792248.0   840146.0   1632394.0
5   Syrian Arab Republic  2016  25-29  800179.0   843093.0   1643272.0
6   Syrian Arab Republic  2016  30-34  763503.0   769201.0   1532704.0
7   Syrian Arab Republic  2016  35-39  627092.0   606879.0   1233971.0
8   Syrian Arab Republic  2016  40-44  479610.0   459305.0   938915.0
9   Syrian Arab Republic  2016  45-49  389610.0   374465.0   764075.0


Comparing the two tables, we can see that the population in every age bin shown was lower in 2016 than in 2011, when the war began. This could be due to the large number of Syrians who fled the country, casualties of the war, or other reasons. Our project will further explore the potential determinants of population changes in Syria.
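
To put a number on the decline, here is a minimal sketch that computes the percent change per age bin. It uses only the *People* values printed in the 2011 and 2016 outputs above, copied in by hand so that the example runs without querying the World Bank. The variable names are illustrative.

```python
ages = ["00-04", "05-09", "10-14", "15-19", "20-24",
        "25-29", "30-34", "35-39", "40-44", "45-49"]

# 'People' column copied from the 2011 and 2016 outputs above.
people_2011 = [2878088, 2511214, 2373203, 2159382, 2115227,
               1987926, 1633605, 1249611, 1007903, 858078]
people_2016 = [1913966, 1975057, 1763588, 1742496, 1632394,
               1643272, 1532704, 1233971, 938915, 764075]

# Percent change from 2011 to 2016 for each age bin.
pct_change = {
    age: (new - old) / old * 100
    for age, old, new in zip(ages, people_2011, people_2016)
}

for age, pct in pct_change.items():
    print(f"{age}: {pct:.1f}%")
```

In these figures the drop is far from uniform: the 00-04 bin shrinks by roughly a third, while the 35-39 bin changes by only about one percent. Note that a bin in 2016 holds a different cohort than the same bin in 2011, so these are changes in age-group size, not in the fate of a fixed group of people.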

## Population Statistics

Rather than outputting a table every time we want to retrieve population data from a certain year, group, or age range, we should also create a function, *population*, that directly outputs the data we are interested in. 