### Background

#### Urban migration and city development are accelerating around the world. However, while governments invest heavily in programs that promise high returns on investment, slum settlements, the dimension of urbanization most in need of support, are seldom considered. 

#### As populations increasingly flock to cities, inhabitants of rural regions feel pressure to follow the same trajectory in search of economic opportunity. However, because of rapid population booms, the demand for housing often exceeds what cities can supply. The market is competitive, and housing costs are often unaffordable for lower-income migrants. This has given rise to large migrant settlements on the outskirts of many major cities in developing countries. Migrants often settle on undesirable, publicly owned land and build shelters on an ad hoc basis with materials considered unsuitable for housing. National and local governments have generally provided little to no support for these populations.


In [47]:
import csv
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [48]:
# Reading World Bank Data: Global Slum Population Percentages

slum_df = pd.read_csv('API_EN.POP.SLUM.UR.ZS_DS2_en_csv_v2_10034237.csv')
slum_percentage_dict = slum_df.to_dict()

slum_percentage_dict.keys()

dict_keys(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', 'Unnamed: 62'])

In [49]:
slum_percentage_dict['Country Name']

{0: 'Aruba',
 1: 'Afghanistan',
 2: 'Angola',
 3: 'Albania',
 4: 'Andorra',
 5: 'Arab World',
 6: 'United Arab Emirates',
 7: 'Argentina',
 8: 'Armenia',
 9: 'American Samoa',
 10: 'Antigua and Barbuda',
 11: 'Australia',
 12: 'Austria',
 13: 'Azerbaijan',
 14: 'Burundi',
 15: 'Belgium',
 16: 'Benin',
 17: 'Burkina Faso',
 18: 'Bangladesh',
 19: 'Bulgaria',
 20: 'Bahrain',
 21: 'Bahamas, The',
 22: 'Bosnia and Herzegovina',
 23: 'Belarus',
 24: 'Belize',
 25: 'Bermuda',
 26: 'Bolivia',
 27: 'Brazil',
 28: 'Barbados',
 29: 'Brunei Darussalam',
 30: 'Bhutan',
 31: 'Botswana',
 32: 'Central African Republic',
 33: 'Canada',
 34: 'Central Europe and the Baltics',
 35: 'Switzerland',
 36: 'Channel Islands',
 37: 'Chile',
 38: 'China',
 39: "Cote d'Ivoire",
 40: 'Cameroon',
 41: 'Congo, Dem. Rep.',
 42: 'Congo, Rep.',
 43: 'Colombia',
 44: 'Comoros',
 45: 'Cabo Verde',
 46: 'Costa Rica',
 47: 'Caribbean small states',
 48: 'Cuba',
 49: 'Curacao',
 50: 'Cayman Islands',
 51: 'Cyprus',
 52: 'C

In [50]:
# Rows needed: 
# 107: 'India',
# 119: 'Kenya',
# 152: 'Mexico',
# 182: 'Pakistan',
# 261: 'South Africa',

INDIA = 107
KENYA = 119
MEXICO = 152
PAKISTAN = 182
SOUTH_AFRICA = 261

# Columns needed:
# Values for 'country name' and years '1990', '1995', '2000', '2005', '2007', '2009', '2014'.
# Assign values for each column to a variable
_1990 = slum_percentage_dict['1990']
_1995 = slum_percentage_dict['1995']
_2000 = slum_percentage_dict['2000']
_2005 = slum_percentage_dict['2005']
_2007 = slum_percentage_dict['2007']
_2009 = slum_percentage_dict['2009']
_2014 = slum_percentage_dict['2014']

# Testing to check value for India in 1990 (column '1990', row 107) 
print(_1990[INDIA]) 

type(_2014)

54.9


dict
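
The row constants above are hard-coded positions, so they would silently point at the wrong country if the CSV were re-downloaded with a different row order. A minimal sketch of one alternative, assuming the `'Country Name'` column keeps the `{row: name}` shape shown in the earlier output: invert it into a name-to-row lookup. The helper name `build_row_lookup` and the small sample dictionary are illustrative and not part of the original notebook.

```python
# Sample of the 'Country Name' column in the {row: name} shape printed above.
# The real dictionary has one entry per row of the CSV.
country_names = {0: 'Aruba', 1: 'Afghanistan', 27: 'Brazil', 38: 'China'}

def build_row_lookup(names_by_row):
    """Invert a {row: name} mapping into a {name: row} mapping."""
    return {name: row for row, name in names_by_row.items()}

row_of = build_row_lookup(country_names)
print(row_of['Brazil'])  # 27
```

With the full column, `build_row_lookup(slum_percentage_dict['Country Name'])['India']` would take the place of the literal `107`.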

In [51]:
# Combine the per-year dictionaries into one list
# (each element maps row index -> value for that year)
slum_yearly_values = [_1990, _1995, _2000, _2005, _2007, _2009, _2014]

# Create a function to extract values pertaining to each country from each year
def output_country_slumpercents(country):
    country_values_list = []
    
    for year in slum_yearly_values:
        country_values_list.append(year[country])

    return country_values_list

# Run each country's row index through the function 
# Assign the list of output values to new variables to plot on the y-axis 
india_slumpercents = output_country_slumpercents(INDIA)
kenya_slumpercents = output_country_slumpercents(KENYA)
mexico_slumpercents = output_country_slumpercents(MEXICO)
pakistan_slumpercents = output_country_slumpercents(PAKISTAN)
southafrica_slumpercents = output_country_slumpercents(SOUTH_AFRICA)

# Create list of years to plot on the x-axis
years = [1990, 1995, 2000, 2005, 2007, 2009, 2014]

# Test print
print(mexico_slumpercents) 

[23.1, 21.5, 19.9, 14.4, 14.4, 14.4, 11.1]


In [52]:
import plotly
import plotly.graph_objs as go
plotly.offline.init_notebook_mode(connected=True)

# Trace structure = {'type': 'scatter', 'x': years, 'y': country_values, 'name': name}
# Create a for loop to iterate through country tuples (name, country_values) to create a list of traces for every country
slumpercent_traces = []
for country in [("Kenya", kenya_slumpercents), ("Pakistan", pakistan_slumpercents), ("India", india_slumpercents), ("South Africa", southafrica_slumpercents), ("Mexico", mexico_slumpercents)]:
    slumpercent_traces.append({'type': 'scatter', 'x': years, 'y': country[1], 'name': country[0]})

# slumpercent_traces = [india_trace, kenya_trace, mexico_trace, pakistan_trace, southafrica_trace]
layout = go.Layout(
    title="Percentage of Urban Population Living in Slums (1990-2014)",
    barmode='stack',
    xaxis=dict(
        title='Years',
        titlefont=dict(
            family='Helvetica, monospace',
            size=12,
        )
    ),
    yaxis=dict(
        title='Urban Population (%)',
        titlefont=dict(
            family='Helvetica, monospace',
            size=12,
        )
    )
)

fig = go.Figure(data=slumpercent_traces, layout=layout)
plotly.offline.iplot(fig)


In [53]:
# Reading World Bank Data: Global Total Populations

totalpop_df = pd.read_csv('API_SP.POP.TOTL_DS2_en_csv_v2_10079323.csv')
country_population_dict = totalpop_df.to_dict()

country_population_dict.keys()

dict_keys(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', 'Unnamed: 62'])

In [54]:
# Rows needed:
# INDIA = 107
# KENYA = 119
# MEXICO = 152
# PAKISTAN = 182
# SOUTH_AFRICA = 261

# Columns needed:
country_populations_1990 = country_population_dict['1990']
country_populations_1995 = country_population_dict['1995']
country_populations_2000 = country_population_dict['2000']
country_populations_2005 = country_population_dict['2005']
country_populations_2007 = country_population_dict['2007']
country_populations_2009 = country_population_dict['2009']
country_populations_2014 = country_population_dict['2014']

country_populations_yearly_values = [country_populations_1990, country_populations_1995, country_populations_2000, country_populations_2005, country_populations_2007, country_populations_2009, country_populations_2014]


In [55]:
def output_country_totalpops(country):
    country_values_list = []
    
    for year in country_populations_yearly_values:
        country_values_list.append(year[country])
    
    return country_values_list

india_totalpops = output_country_totalpops(INDIA)
kenya_totalpops = output_country_totalpops(KENYA)
mexico_totalpops = output_country_totalpops(MEXICO)
pakistan_totalpops = output_country_totalpops(PAKISTAN)
southafrica_totalpops = output_country_totalpops(SOUTH_AFRICA)

print(india_totalpops)


[870133480.0, 960482795.0, 1053050912.0, 1144118674.0, 1179681239.0, 1214270132.0, 1293859294.0]
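
`output_country_totalpops` repeats the logic of `output_country_slumpercents`, differing only in which list of year dictionaries it reads. A minimal sketch of one shared helper that takes the table as an argument is given here; the name `extract_series` and the tiny stand-in table are hypothetical, and the Mexico values are the ones printed earlier in the notebook.

```python
def extract_series(table, row, year_columns):
    """Return the values stored for one row across the given year columns."""
    return [table[year][row] for year in year_columns]

# Small stand-in for slum_percentage_dict, using Mexico's values printed earlier
MEXICO = 152
sample_table = {
    '1990': {MEXICO: 23.1},
    '1995': {MEXICO: 21.5},
    '2000': {MEXICO: 19.9},
}

print(extract_series(sample_table, MEXICO, ['1990', '1995', '2000']))
# [23.1, 21.5, 19.9]
```

The same call would work with `country_population_dict` in place of `sample_table`, so one helper could serve each dataset.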


In [56]:
# years = [1990, 1995, 2000, 2005, 2007, 2009, 2014]

totalpop_traces = []
for country in [("India", india_totalpops), ("Pakistan", pakistan_totalpops), ("Mexico", mexico_totalpops), ("South Africa", southafrica_totalpops), ("Kenya", kenya_totalpops)]:
    totalpop_traces.append({'type': 'scatter', 'x': years, 'y': country[1], 'name': country[0]})

layout = go.Layout(
    title="Total Population Growth (1990-2014)",
    barmode='stack',
    xaxis=dict(
        title='Years',
        titlefont=dict(
            family='Helvetica, monospace',
            size=12,
        )
    ),
    yaxis=dict(
        title='Population',
        titlefont=dict(
            family='Helvetica, monospace',
            size=12,
        )
    )
)

fig = go.Figure(data=totalpop_traces, layout=layout)
plotly.offline.iplot(fig)


In [57]:
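# Create a function that takes in 'slumpercents' and 'totalpops' 
# to calculate slum population values (percentages*total populations/100).
# Caveat: EN.POP.SLUM.UR.ZS is a share of the urban population, so multiplying by
# total population overstates the slum population; treat these values as an upper bound.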
def output_country_slumpops(list_1, list_2):
    country_slumpops = []
    for i in range(0,len(list_1)):
        country_slumpops.append(int((list_1[i]*list_2[i])/100))
    return country_slumpops
    
india_slumpops = output_country_slumpops(india_slumpercents, india_totalpops)
kenya_slumpops = output_country_slumpops(kenya_slumpercents, kenya_totalpops)
mexico_slumpops = output_country_slumpops(mexico_slumpercents, mexico_totalpops)
pakistan_slumpops = output_country_slumpops(pakistan_slumpercents, pakistan_totalpops)
southafrica_slumpops = output_country_slumpops(southafrica_slumpercents, southafrica_totalpops)    

# Test
print(india_slumpops)


[477703280, 462952707, 437016128, 398153298, 378677677, 356995418, 310526230]
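
The indicator code in the file name, `EN.POP.SLUM.UR.ZS`, denotes the slum population as a percentage of the urban population, so a closer estimate would multiply by urban rather than total population. The sketch below shows that arithmetic under the assumption that an urban population series (for example the World Bank's `SP.URB.TOTL`, which this notebook does not load) is available. The function name and the round numbers are illustrative, not World Bank data.

```python
def slum_population(slum_pct_of_urban, urban_population):
    """Slum residents = (percent of urban population in slums) * urban population / 100."""
    return [
        int(pct * pop / 100)
        for pct, pop in zip(slum_pct_of_urban, urban_population)
    ]

# Round illustrative inputs, not real data
example = slum_population([50.0, 40.0], [1_000_000, 2_000_000])
print(example)  # [500000, 800000]
```

Because the urban population is never larger than the total population, estimates computed this way would come out at or below the figures printed above.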


In [58]:
slumpop_traces = []
for country in [("India", india_slumpops), ("Pakistan", pakistan_slumpops), ("Kenya", kenya_slumpops), ("South Africa", southafrica_slumpops), ("Mexico", mexico_slumpops)]:
    slumpop_traces.append({'type': 'scatter', 'x': years, 'y': country[1], 'name': country[0]})

layout = go.Layout(
    title="Slum Population Growth (1990-2014)",
    barmode='stack',
    xaxis=dict(
        title='Years',
        titlefont=dict(
            family='Helvetica, monospace',
            size=14,
        )
    ),
    yaxis=dict(
        title='Slum Population',
        titlefont=dict(
            family='Helvetica, monospace',
            size=14,
        )
    )
)

fig = go.Figure(data=slumpop_traces, layout=layout)
plotly.offline.iplot(fig)



In [59]:
gdp_df = pd.read_csv('API_NY.GDP.MKTP.CD_DS2_en_csv_v2_10080925.csv')
gdp_dict = gdp_df.to_dict()

gdp_dict.keys()

dict_keys(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', 'Unnamed: 62'])

In [60]:
gdp_1990 = gdp_dict['1990']
gdp_1995 = gdp_dict['1995']
gdp_2000 = gdp_dict['2000']
gdp_2005 = gdp_dict['2005']
gdp_2007 = gdp_dict['2007']
gdp_2009 = gdp_dict['2009']
gdp_2014 = gdp_dict['2014']

gdp_yearly_values = [gdp_1990, gdp_1995, gdp_2000, gdp_2005, gdp_2007, gdp_2009, gdp_2014]


def output_country_gdp(country):
    country_values_list = []
    
    for year in gdp_yearly_values:
        country_values_list.append(int(year[country]))

    return country_values_list

india_gdp = output_country_gdp(INDIA)
kenya_gdp = output_country_gdp(KENYA)
mexico_gdp = output_country_gdp(MEXICO)
pakistan_gdp = output_country_gdp(PAKISTAN)
southafrica_gdp = output_country_gdp(SOUTH_AFRICA)

gdp_traces = []
for country in [("India", india_gdp), ("Mexico", mexico_gdp), ("South Africa", southafrica_gdp), ("Pakistan", pakistan_gdp), ("Kenya", kenya_gdp)]:
    gdp_traces.append({'type': 'scatter', 'x': years, 'y': country[1], 'name': country[0]})

layout = go.Layout(
    title='Economic Growth (1990-2014)',
    xaxis=dict(
        title='Years',
        titlefont=dict(
            family='Helvetica, monospace',
            size=14,
            color='#7f7f7f'
        )
    ),
    yaxis=dict(
        title='GDP ($)',
        titlefont=dict(
            family='Helvetica, monospace',
            size=14,
            color='#7f7f7f'
        )
    )
)
fig = go.Figure(data=gdp_traces, layout=layout)
plotly.offline.iplot(fig, filename='styling-names')

#plotly.offline.iplot(gdp_traces)

In [61]:
# Visualization of GDP, Population, and Slum Population Growth Rates in each country 

# Values to visualize 

def output_growthrates(values):
    # Percentage change between consecutive observations (the periods span 2 or 5 years)
    growthrates = []
    for i in range(0,len(values)-1):
        growthrates.append((values[i+1]-values[i])/values[i]*100)
    return growthrates

def output_3_growthrates(country_gdp, country_totalpop, country_slumpop):
    country_gdp_growthrate = output_growthrates(country_gdp)
    country_population_growthrate = output_growthrates(country_totalpop)
    country_slum_growthrate = output_growthrates(country_slumpop)
    return country_gdp_growthrate, country_population_growthrate, country_slum_growthrate

india = 'India'
kenya = 'Kenya'
mexico = 'Mexico'
pakistan = 'Pakistan'
southafrica = 'South Africa'

india_growthrates = output_3_growthrates(india_gdp,india_totalpops,india_slumpops)
kenya_growthrates = output_3_growthrates(kenya_gdp,kenya_totalpops,kenya_slumpops)
mexico_growthrates = output_3_growthrates(mexico_gdp,mexico_totalpops,mexico_slumpops)
pakistan_growthrates = output_3_growthrates(pakistan_gdp,pakistan_totalpops,pakistan_slumpops)
southafrica_growthrates = output_3_growthrates(southafrica_gdp,southafrica_totalpops,southafrica_slumpops)


def chart_3_growthrates(country, country_growthrates):
    # Each growth rate covers the period ending in a given year, so there is one
    # fewer rate than there are years: plot against years[1:]
    country_gdp_growthrate = go.Scatter(
        x = years[1:],
        y = country_growthrates[0],
        mode = 'lines+markers',
        name = 'GDP'
    )
    country_population_growthrate = go.Scatter(
        x = years[1:],
        y = country_growthrates[1],
        mode = 'lines+markers',
        name = 'Total Population'
    )
    country_slum_growthrate = go.Scatter(
        x = years[1:],
        y = country_growthrates[2],
        mode = 'lines+markers',
        name = 'Slum Population'
    )
    
    data = [country_gdp_growthrate, country_population_growthrate, country_slum_growthrate]

    layout = dict(
        title = country + ' : GDP to Population Analysis',
        xaxis=dict(
            zeroline = False,
            title='Years',
            titlefont=dict(
                family='Helvetica, monospace',
                size=14,
            )
        ),
        yaxis=dict(
            zeroline = False,
            title='Growth Rate (%)',
            titlefont=dict(
                family='Helvetica, monospace',
                size=14,
            )
        )
    )
    fig = dict(data=data, layout=layout)

    return plotly.offline.iplot(fig)

chart_3_growthrates(india, india_growthrates)
chart_3_growthrates(kenya, kenya_growthrates)
chart_3_growthrates(mexico, mexico_growthrates)
chart_3_growthrates(pakistan, pakistan_growthrates)
chart_3_growthrates(southafrica, southafrica_growthrates)
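
The observation years are unevenly spaced (gaps of 5, 5, 5, 2, 2 and 5 years), so period-over-period growth rates are not directly comparable with one another. One common way to put them on the same footing is to annualize each rate as a compound annual growth rate. The sketch below does this with India's total population values printed earlier; the function name `annualized_growth_rates` is an assumption and not part of the original notebook.

```python
def annualized_growth_rates(values, years):
    """Compound annual growth rate (%) between consecutive observations."""
    rates = []
    for i in range(len(values) - 1):
        span = years[i + 1] - years[i]       # number of years in the period
        ratio = values[i + 1] / values[i]    # total growth factor over the period
        rates.append((ratio ** (1 / span) - 1) * 100)
    return rates

years = [1990, 1995, 2000, 2005, 2007, 2009, 2014]
india_totalpops = [870133480.0, 960482795.0, 1053050912.0, 1144118674.0,
                   1179681239.0, 1214270132.0, 1293859294.0]

rates = annualized_growth_rates(india_totalpops, years)
print([round(r, 2) for r in rates])
```

The result has one entry per period, so it would be plotted against `years[1:]`.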


# FINAL PROJECT 

In [62]:
# Population growth and GDP growth have been recognized to have a positive correlation.
# I hypothesize that slum population growth and GDP growth will have a negative correlation on the assumption that
# the wealthier a nation state becomes the more resources and economic opportunities there will be to lift people out
# of poor living conditions and into the middle class.

# 1.  slum population % —> urban population —> GDP growth
#     1. visualize: 
#         1. slum population growth (numbers) against GDP growth regression analysis
#         2. ... or slum percentage growth rate against GDP growth rate???


india_regression = (india , india_gdp, india_slumpops)
mexico_regression = (mexico ,mexico_gdp, mexico_slumpops)
southafrica_regression = (southafrica ,southafrica_gdp, southafrica_slumpops)
pakistan_regression = (pakistan,pakistan_gdp, pakistan_slumpops)
kenya_regression = (kenya ,kenya_gdp, kenya_slumpops)

def chart_gdp_population(country, country_regression):
    # country_regression is a (name, gdp_values, slum_population_values) tuple
    trace = go.Scatter(
        x = country_regression[1],
        y = country_regression[2],
        mode = 'lines+markers',
        name = 'Slum Population vs GDP'
    )
    
    data = [trace]

    layout = dict(
        title = country,
        xaxis=dict(
            zeroline = False,
            title='GDP ($)',
            titlefont=dict(
                family='Helvetica, monospace',
                size=14,
            )
        ),
        yaxis=dict(
            zeroline = False,
            title='Slum Population',
            titlefont=dict(
                family='Helvetica, monospace',
                size=14,
            )
        )
    )
    fig = dict(data=data, layout=layout)

    return plotly.offline.iplot(fig)

chart_gdp_population(india, india_regression)
chart_gdp_population(kenya, kenya_regression)
chart_gdp_population(mexico, mexico_regression)
chart_gdp_population(pakistan, pakistan_regression)
chart_gdp_population(southafrica, southafrica_regression)



# def chart_population_and_gdp_traces(country, country_gdp, country_slumpops:
#     for 

# population_and_gdp_traces = []
# for country in [("India", india_gdp, india_slumpops), ("Mexico", mexico_gdp, mexico_slumpops), ("South Africa", southafrica_gdp, southafrica_slumpops), ("Pakistan", pakistan_gdp, pakistan_slumpops), ("Kenya", kenya_gdp, kenya_slumpops)]:
#     population_and_gdp_traces.append({'type': 'scatter', 'x': country[1], 'y': country[2], 'name': country[0]})

# layout = go.Layout(
#     title='Population Growth x Economic Growth (1990-2014)',
#     xaxis=dict(
#         title='GDP ($)',
#         titlefont=dict(
#             family='Helvetica, monospace',
#             size=14,
#             color='#7f7f7f'
#         )
#     ),
#     yaxis=dict(
#         title='Slum Population',
#         titlefont=dict(
#             family='Helvetica, monospace',
#             size=14,
#             color='#7f7f7f'
#         )
#     )
# )
# fig = go.Figure(data=population_and_gdp_traces, layout=layout)
# plotly.offline.iplot(fig, filename='styling-names')

In [None]:
from scipy import stats

# Use NumPy arrays so that slope*x below is computed element-wise
x = np.array(india_gdp)
y = np.array(india_slumpops)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

print("r-squared:", r_value**2)

plt.plot(x, y, 'o', label='original data')
plt.plot(x, intercept + slope*x, 'r', label='fitted line')
plt.legend()
plt.show()

In [42]:
#     2. anomalies due to: 
#         1. slum population is increasing because total population is increasing? 
#         2. even though GDP growth is increasing, it is lower than the world median?
#         3. (decrease in rural economy, increase in rate of industrialization (or service work?) causing faster urban migration)
#         4. (relative increase/decrease of investment in sustainable urban development)

In [None]:
# 2. project how many years it will take in order to decrease slum population (India) 
#     1. forecast using regression for India
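
As a starting point for the forecast described above, here is a minimal sketch that fits a straight-line trend to India's slum population over time and extrapolates it. It assumes a linear trend is adequate for seven observations, which may not hold. The data are the `india_slumpops` values printed earlier (computed against total population, so likely overstated), while `fit_line` and `target_year` are hypothetical names chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [1990, 1995, 2000, 2005, 2007, 2009, 2014]
india_slumpops = [477703280, 462952707, 437016128, 398153298,
                  378677677, 356995418, 310526230]

slope, intercept = fit_line(years, india_slumpops)

target_year = 2020  # hypothetical forecast year
projected = intercept + slope * target_year
print(f"trend: {slope:,.0f} people per year")
print(f"projected slum population in {target_year}: {projected:,.0f}")
```

`scipy.stats.linregress`, used in an earlier cell, would return the same slope and intercept along with the r-value and p-value; the hand-written fit only keeps the sketch free of dependencies.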


In [None]:
# 3. OR assuming that slum population growth + GDP growth have a correlation, what is the GDP growth that will be needed to reverse the trend for countries with increasing slum population (rest of the countries)?? (maybe not)
#     1. research how to calculate trend reversal***


In [None]:
# 4. (next analyze—> how much money is each country investing in sustainable urban development (World Bank))