# World Development Indicators (1960 - 2014): Analysis

## Introduction

### Objective:

* Uncover global economic trends

* Find and explain discrepancies between the economic development of different nations.
    * Compare the economies of the three most advanced NICs (Brazil, China, India) with the rest of the highest-GDP countries and the world
    * Assess the relationship of GDP growth to foreign investment and population demographics

### Methodology

* Data extraction from SQLite database using SQLite3 connection and storage within pandas DataFrames

* Data visualization using plotly (treemaps, bar charts, choropleth maps, line charts etc.)

* Correlation measurement using the Pearson (linear) and Spearman (rank-based, referred to as non-linear throughout) coefficients provided by scipy.
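
To illustrate why both coefficients are reported, the following pure-Python sketch computes each on a monotonic but curved relationship. It is not scipy's implementation: the helper names `pearson`, `average_ranks` and `spearman`, and the cubic sample data, are assumptions made for illustration. Spearman is Pearson applied to ranks, so it reaches 1 for any strictly increasing relationship, while Pearson stays below 1 when the relationship is not a straight line.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# illustrative data: y increases with x, but along a cubic curve
x = list(range(1, 11))
y = [v ** 3 for v in x]

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

A large gap between the two coefficients is therefore a hint that a relationship is monotonic but not linear.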

### Key Findings

* In the 2014 data (the latest year included), there is, on average, a negative correlation between a country's income (per capita) and its rate of GDP growth.

* Higher income countries were more adversely affected (i.e. larger decrease in GDP) during the 2008 global financial crisis.

* Three of the Top 10 Highest-GDP countries (2014) are not classed as high income countries: Brazil, China, and India. These three are the only newly industrialized countries (NICs) in the top 10, and the differences in economic strength and development, between each other and the rest of the world, were examined:
    * Brazil's GDP growth was greater than that of all other groups during the 1960s and 70s; from the 1980s onwards, China, followed by India, had much higher GDP growth than the others

    * Evidence of a positive linear correlation between foreign investment and GDP growth in China, the Rest of the Top 10, and the Rest of the World, but not in Brazil or India. The correlation is strongest for China, where a spike in foreign investment (% of GDP) in 1992-94 coincided with the onset of exponential GDP growth.

    * The relatively low significance of the services sector to the economies of China and India (compared with the Rest of the Top 10) is likely a contributing factor to the much smaller impact the 2008 financial crisis had upon their economies; in fact, both countries experienced GDP growth.
    
    * Clear evidence of a linear correlation between the working age population and GDP growth in China, Brazil, and the Rest of the Top 10; however, the linear correlation coefficient for the Rest of the Top 10 is considerably greater than that of China or Brazil, suggesting greater economic value per worker in the Rest of the Top 10.

## Initial Set-up

### Import relevant packages and connect to database

In [1]:
import pandas as pd
import sqlite3
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Connect to the database or create one if it doesn't exist
connection = sqlite3.connect('indicators.sqlite')

# Create a cursor object to interact with the database
cursor = connection.cursor()


In [2]:
'''<--- RUN CELL BEFORE CONTINUING --->'''


# NIC country codes
NICs = "'BRA', 'CHN', 'IND'"

# query returns non-NIC top 10 GDP country codes
other_top_10 = f"""
SELECT c.CountryCode FROM Indicators i
JOIN Country c ON i.CountryCode = c.CountryCode
WHERE i.IndicatorCode = 'NY.GDP.MKTP.CD' AND Year = 2014 AND c.Region != 'UNKNOWN' AND c.CountryCode NOT IN ({NICs})
ORDER BY i.Value DESC
LIMIT 7
"""

# query returns country codes outside top 10 GDP countries
# (read from Country directly; joining Indicators would return one row per indicator record)
rest_of_world = f"""
SELECT c.CountryCode FROM Country c
WHERE c.CountryCode NOT IN ({NICs}) AND c.CountryCode NOT IN ({other_top_10}) AND c.Region != 'UNKNOWN'
"""

def NIC_comp(IndCode: str, IndName: str):
    
    """Return the given indicator by year for each NIC, together with the yearly
    averages for the rest of the top 10 and the rest of the world, plus a Decade column.
    """

    # obtain the indicator values for the NICs
    query = f"""
    SELECT c.TableName as Country, i.Year as Year, i.Value as {IndName} FROM Indicators i
    JOIN Country c ON i.CountryCode = c.CountryCode
    WHERE i.IndicatorCode = '{IndCode}' AND c.CountryCode IN ({NICs})
    GROUP BY Country, Year
    ORDER BY Year ASC
    """

    NIC_var = pd.read_sql_query(query, connection)

    # extract the yearly average of the indicator for the rest of the current top 10
    query = f"""
    SELECT i.Year as Year, AVG(i.Value) as {IndName} FROM Indicators i
    JOIN Country c ON i.CountryCode = c.CountryCode
    WHERE i.IndicatorCode = '{IndCode}' AND c.CountryCode IN ({other_top_10})
    GROUP BY Year
    ORDER BY Year ASC
    """

    oth_top_10_var = pd.read_sql_query(query, connection)
    oth_top_10_var['Country'] = 'Rest of Top 10'
    oth_top_10_var = oth_top_10_var[['Country', 'Year', IndName]]


    # obtain the yearly average of the indicator for the rest of the world
    query = f"""
    SELECT i.Year as Year, AVG(i.Value) as {IndName} FROM Indicators i
    JOIN Country c ON i.CountryCode = c.CountryCode
    WHERE i.IndicatorCode = '{IndCode}' AND c.CountryCode IN ({rest_of_world})
    GROUP BY Year
    ORDER BY Year ASC
    """

    rest_of_world_var = pd.read_sql_query(query, connection)
    rest_of_world_var['Country'] = 'Rest of World'
    rest_of_world_var = rest_of_world_var[['Country', 'Year', IndName]]

    # Combine dataframes and pivot so that rows are years and columns are countries/groups
    comb_dataframe = pd.concat([NIC_var, oth_top_10_var, rest_of_world_var], axis=0)
    comb_dataframe = pd.pivot(comb_dataframe, index='Year', columns='Country', values=IndName).reset_index()

    # Add decade column to dataframe
    bins = list(np.arange(1959,2029,10))
    labels = ["1960s", "1970s", "1980s", "1990s", "2000s", "2010s"]
    comb_dataframe['Decade'] = pd.cut(comb_dataframe['Year'], bins=bins, labels=labels)

    return comb_dataframe
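
The set-up cell above builds its queries by interpolating one SELECT into the `IN (...)` clause of another. As a minimal sketch of that pattern, the following runs against a small in-memory SQLite database. The table name `gdp`, its columns, the variable names and all figures are invented for illustration and are not the notebook's schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gdp (code TEXT, value REAL)")
conn.executemany(
    "INSERT INTO gdp VALUES (?, ?)",
    [("AAA", 90.0), ("BBB", 70.0), ("CCC", 50.0), ("DDD", 30.0), ("EEE", 10.0)],
)

# codes singled out for individual treatment (plays the role of NICs)
singled_out = "'BBB'"

# subquery: the two largest remaining economies (plays the role of other_top_10)
other_top = f"""
SELECT code FROM gdp
WHERE code NOT IN ({singled_out})
ORDER BY value DESC
LIMIT 2
"""

# outer query: every code in neither group (plays the role of rest_of_world)
rest = f"""
SELECT code FROM gdp
WHERE code NOT IN ({singled_out}) AND code NOT IN ({other_top})
ORDER BY code
"""

rest_codes = [row[0] for row in conn.execute(rest)]
print(rest_codes)
```

String interpolation is acceptable here only because every fragment is a constant defined in the notebook; values coming from outside would be better bound as query parameters.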

## Exploratory Data Analysis

### Indicator Assessment

Determine how many unique indicators are present

In [13]:
query = """
SELECT COUNT(DISTINCT IndicatorName) FROM Indicators
"""

indicators_count = cursor.execute(query).fetchone()[0]
print(f"Number of Indicators: {indicators_count}")

Number of Indicators: 1344


From a visual survey of the IndicatorName data in the Indicators table, the most relevant indicators can be categorized into the following groups: Social, Economic, Environment, Infrastructure.

Next step: determine how many indicators of each category are present

In [14]:
ind_categories = ['Social', 'Economic', 'Environment', 'Infrastructure']
ind_counts = {}

# count the distinct indicators whose topic starts with each category name
for cat in ind_categories:
    query = f"SELECT COUNT(DISTINCT IndicatorName) FROM Series WHERE Topic LIKE '{cat}%'"
    ind_counts[cat] = cursor.execute(query).fetchone()[0]
    print(f"No. {cat} Indicators: {ind_counts[cat]}")


No. Social Indicators: 148
No. Economic Indicators: 506
No. Environment Indicators: 122
No. Infrastructure Indicators: 36


#### Comments

* Significant number of each type of indicator, particularly high number of economic indicators; manual assessment of the most useful indicators is required.
* The primary focus of the investigation is economic, so the economic indicators take priority; the other types are used alongside them to discern causes/effects of economic development.
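
The per-category counts above interpolate each category name into the SQL text. A possible alternative is to bind the pattern as a parameter using sqlite3's `?` placeholder. The sketch below uses an in-memory table whose rows are made up for illustration; only the `Series`, `Topic` and `IndicatorName` names are taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Series (Topic TEXT, IndicatorName TEXT)")
conn.executemany(
    "INSERT INTO Series VALUES (?, ?)",
    [
        ("Economic Policy & Debt: hypothetical subtopic", "Indicator A"),
        ("Economic Policy & Debt: hypothetical subtopic", "Indicator B"),
        ("Environment: hypothetical subtopic", "Indicator C"),
        ("Social Protection & Labor: hypothetical subtopic", "Indicator D"),
    ],
)

query = "SELECT COUNT(DISTINCT IndicatorName) FROM Series WHERE Topic LIKE ?"

counts = {}
for cat in ["Social", "Economic", "Environment", "Infrastructure"]:
    # the LIKE pattern is passed separately from the SQL text
    counts[cat] = conn.execute(query, (f"{cat}%",)).fetchone()[0]

print(counts)
```

Binding keeps the SQL text constant across categories, which avoids quoting problems if a category name ever contains an apostrophe.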

### Global Economic Analysis

Gross Domestic Product (GDP) is one of the most useful metrics of national economic strength and will therefore be the primary focus of the economic indicator analysis.

To start with, the overall GDP and GDP growth rate by country/region should be evaluated to provide insight into current and future economic trends.

In [19]:
# obtain GDP for each year by country

query = """
SELECT c.TableName as Country, i.Year as Year, i.Value as GDP FROM Indicators i
JOIN Country c ON i.CountryCode = c.CountryCode
WHERE i.IndicatorCode = 'NY.GDP.MKTP.CD' AND c.Region != 'UNKNOWN'
ORDER BY Year, Country
"""

global_gdp = pd.read_sql_query(query, connection)

query = """
SELECT c.TableName as Country, i.Year as Year, i.Value as GDP_Growth FROM Indicators i
JOIN Country c ON i.CountryCode = c.CountryCode
WHERE i.IndicatorCode = 'NY.GDP.MKTP.KD.ZG' AND c.Region != 'UNKNOWN'
ORDER BY Year, Country
"""

global_growth = pd.read_sql_query(query, connection)

In [20]:
# Create the choropleth map
fig1 = px.choropleth(
    global_gdp,
    locations='Country',
    locationmode='country names',
    color='GDP',
    animation_frame='Year',
    color_continuous_scale='Portland',
    projection='natural earth',
    
)

# Set the titles
fig1.update_layout(
    title_text='GDP by Country (1960 - 2014)',
    coloraxis_colorbar_title='GDP (US$)',
    template='plotly_dark'
    )

fig2 = px.choropleth(
    global_growth,
    locations='Country',
    locationmode='country names',
    color='GDP_Growth',
    animation_frame='Year',
    color_continuous_scale='Portland',
    projection='natural earth'
)

# Set the titles
fig2.update_layout(
    title_text='GDP Growth by Country (1961 - 2014)',
    coloraxis_colorbar_title='GDP Growth (%)',
    template='plotly_dark'
    )

# Show the plot
fig1.show()
fig2.show()

In [35]:
# extract GDP and GDP growth by global region

query = """
SELECT c.Region as Region, i.IndicatorCode as Code, i.Year as Year, AVG(i.Value) as Avg_GDP_Growth FROM Indicators i
JOIN Country c ON i.CountryCode = c.CountryCode
WHERE Region != 'UNKNOWN' AND Code IN ('NY.GDP.MKTP.KD.ZG', 'NY.GDP.MKTP.CD')
GROUP BY Code, Region, Year
ORDER BY Year
"""

region_data = pd.read_sql_query(query, connection)

# split the combined result by indicator; copy/rename return new dataframes rather than views of region_data
region_growth = region_data[region_data['Code'] == 'NY.GDP.MKTP.KD.ZG'].copy()
region_gdp = region_data[region_data['Code'] == 'NY.GDP.MKTP.CD'].rename(columns={'Avg_GDP_Growth': 'Avg_GDP'})


In [36]:
# pivot table so rows are by year and columns by region
region_gdp = pd.pivot(region_gdp, index='Year', columns='Region', values='Avg_GDP').reset_index()
region_gdp.head()

| | Year | East Asia & Pacific | Europe & Central Asia | Latin America & Caribbean | Middle East & North Africa | North America | South Asia | Sub-Saharan Africa |
|---|---|---|---|---|---|---|---|---|
| 0 | 1960 | 11215660000.0 | 15749640000.0 | 2281674000.0 | 2398129000.0 | 194826000000.0 | 8021161000.0 | 834864300.0 |
| 1 | 1961 | 11248060000.0 | 16765200000.0 | 2422961000.0 | 2525226000.0 | 201385700000.0 | 8554637000.0 | 866590000.0 |
| 2 | 1962 | 11493860000.0 | 18329010000.0 | 3622591000.0 | 2456026000.0 | 215724300000.0 | 9132457000.0 | 910917500.0 |
| 3 | 1963 | 12811340000.0 | 20211630000.0 | 3633303000.0 | 2743311000.0 | 227784500000.0 | 10270090000.0 | 1045830000.0 |
| 4 | 1964 | 14725320000.0 | 22382830000.0 | 4043312000.0 | 3028766000.0 | 244930200000.0 | 11767530000.0 | 1018065000.0 |


In [37]:
region_growth = pd.pivot(region_growth, index='Year', columns='Region', values='Avg_GDP_Growth').reset_index()
region_growth.head()

| | Year | East Asia & Pacific | Europe & Central Asia | Latin America & Caribbean | Middle East & North Africa | North America | South Asia | Sub-Saharan Africa |
|---|---|---|---|---|---|---|---|---|
| 0 | 1961 | 2.928777 | 5.40387 | 5.237807 | 2.92281 | 3.381097 | 4.419129 | 2.516917 |
| 1 | 1962 | 4.078484 | 4.87342 | 5.834027 | 9.267897 | 5.895593 | 3.719071 | 5.117477 |
| 2 | 1963 | 7.360783 | 5.528608 | 3.889551 | 8.798937 | 3.550592 | 3.72783 | 3.407647 |
| 3 | 1964 | 5.622965 | 6.573064 | 6.678619 | 5.480712 | 7.869819 | 7.482887 | 4.62662 |
| 4 | 1965 | 6.827285 | 4.878703 | 5.05987 | 44.893175 | 5.932182 | 2.144726 | 4.831957 |


In [38]:
# plot line chart displaying time evolution of GDP growth by region

plots = [(region_gdp, 'Average GDP', '(US$)'), (region_growth, 'Average GDP Growth', '(%)')]

for plot in plots:

    fig = go.Figure()

    for col in plot[0].columns[1:]:
        fig.add_trace(go.Scatter(x=plot[0]['Year'], y=plot[0][col], mode='lines', name=col))

    fig.update_layout(
        title=f"{plot[1]} by Region",
        yaxis_title=f'{plot[1]} {plot[2]}',
        xaxis_title='Year',
        template='plotly_dark'
    )

    # Show the plot
    fig.show()

#### Observations

* North America's average GDP has risen far above that of every other region, primarily due to the relatively massive economic growth of the USA post-WW2. Notably, the small number of countries in North America means that the regional average is very heavily influenced by the USA, which is the highest-GDP country by a considerable margin.

* However, North America was the most negatively impacted region during the 2008 financial crisis, experiencing the biggest % drop in annual GDP.

* From 1960 to 1980, the Middle East & North Africa frequently experienced the highest annual GDP % growth, likely because of increasing demand for (and autonomy over) oil and the resulting industrialization within the region.
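
The sensitivity of a regional average to one very large member can be shown with a toy calculation. The GDP figures below are invented for illustration and are not taken from the database; the point is only that the mean of a small group tracks its largest member, whereas the median does not.

```python
from statistics import mean, median

# hypothetical GDPs (US$ billions) for a three-country region with one dominant economy
region = {"Large": 17000.0, "Medium": 1800.0, "Small": 6.0}
values = list(region.values())

avg = mean(values)
mid = median(values)
share_of_largest = max(values) / sum(values)

print(f"mean: {avg:.1f}, median: {mid:.1f}, largest member's share: {share_of_largest:.1%}")
```

Reporting a median or a total alongside the regional mean would be one way to make such comparisons less dependent on a single country.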

#### GDP Growth by Income Group

To determine whether the wealth disparity between high income (per capita) countries and low income countries is increasing, the current GDP growth of each income group is compared.

In [30]:
# find average GDP growth % by income group

query = """
SELECT c.IncomeGroup as Income_Group, AVG(i.Value) as Average_GDP_Growth
FROM Indicators i JOIN Country c ON i.CountryCode = c.CountryCode
WHERE i.IndicatorCode = 'NY.GDP.MKTP.KD.ZG' AND Year = 2014 AND c.Region != 'UNKNOWN'
GROUP BY Income_Group
ORDER By Average_GDP_Growth DESC
LIMIT 10
"""

inc_group_gdp = pd.read_sql_query(query, connection)
inc_group_gdp

| | Income_Group | Average_GDP_Growth |
|---|---|---|
| 0 | Low income | 4.86914 |
| 1 | Lower middle income | 4.307527 |
| 2 | Upper middle income | 2.803407 |
| 3 | High income: OECD | 1.953577 |
| 4 | High income: nonOECD | 1.624537 |


In [31]:
# Plot bar chart displaying gdp growth by income group
fig = px.bar(inc_group_gdp, x='Income_Group', y='Average_GDP_Growth', color='Average_GDP_Growth')
fig.update_layout(
    template="plotly_dark",
    xaxis_title="Income Group",
    yaxis_title="Annual GDP Growth (%)",
    title="Average Annual GDP Growth by Income Group",
    coloraxis_colorbar_title='Average Growth (%)'
)
fig.show()

##### Comment

* On average, annual GDP % growth decreases as income (per capita) increases, suggesting that lower income economies are growing (relatively) faster than higher income economies
    * Whether this can be sustained over a long period of time is unclear, although historical GDP growth trends may provide insight
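
One way to put the growth gap in perspective is the doubling time implied by a constant annual growth rate g, which is ln 2 / ln(1 + g). The sketch below applies it to the 2014 group averages from the table above. Holding a single year's rate constant is a simplifying assumption, not a forecast, and `doubling_time` is an illustrative helper name.

```python
import math

# average 2014 GDP growth (%) by income group, copied from the table above
growth_2014 = {
    "Low income": 4.86914,
    "Lower middle income": 4.307527,
    "Upper middle income": 2.803407,
    "High income: OECD": 1.953577,
    "High income: nonOECD": 1.624537,
}


def doubling_time(rate_pct: float) -> float:
    """Years needed for GDP to double at a constant annual growth rate (in %)."""
    return math.log(2) / math.log(1 + rate_pct / 100)


doubling = {group: doubling_time(rate) for group, rate in growth_2014.items()}

for group, years in doubling.items():
    print(f"{group}: {years:.1f} years")
```

Under this assumption, the low income average corresponds to a doubling time of under 15 years, against more than 35 years for the high income OECD average.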

In [32]:
# time evolution of average GDP growth by income group

query = """
SELECT c.IncomeGroup as Income_Group, i.Year as Year, AVG(i.Value) as Avg_GDP_Growth
FROM Indicators i JOIN Country c ON i.CountryCode = c.CountryCode
WHERE i.IndicatorCode = 'NY.GDP.MKTP.KD.ZG' AND c.Region != 'UNKNOWN'
GROUP BY Income_Group, Year
ORDER BY Year ASC
"""

avg_growth_time = pd.read_sql_query(query, connection)

# pivot dataframe such that each income group is represented as feature
avg_growth_time = pd.pivot(avg_growth_time, index='Year', columns='Income_Group', values='Avg_GDP_Growth').reset_index()
avg_growth_time.head()

| | Year | High income: OECD | High income: nonOECD | Low income | Lower middle income | Upper middle income |
|---|---|---|---|---|---|---|
| 0 | 1961 | 5.70358 | 5.193432 | 1.266049 | 3.998395 | 3.110058 |
| 1 | 1962 | 5.166553 | 6.781221 | 4.952834 | 5.616442 | 4.750656 |
| 2 | 1963 | 5.90924 | 4.926467 | 1.732134 | 3.818408 | 6.871431 |
| 3 | 1964 | 6.774027 | 6.231664 | 2.066522 | 6.744265 | 6.62853 |
| 4 | 1965 | 5.216967 | 23.722879 | 4.241729 | 5.354967 | 5.741169 |


In [33]:
# plot line chart displaying time evolution of GDP growth by group

fig = go.Figure()

for col in avg_growth_time.columns[1:]:
    fig.add_trace(go.Scatter(x=avg_growth_time['Year'], y=avg_growth_time[col], mode='lines', name=col))

fig.update_layout(
    title="Average Annual GDP Growth by Income Group (1961 - 2014)",
    yaxis_title='Average Annual GDP Growth (%)',
    xaxis_title='Year',
    template='plotly_dark'
)

fig.show()

##### Observations

* Until the 1980s, low income countries almost always had lower annual GDP % growth compared to high income countries

* Since the 2008 financial crisis, low and lower middle income countries have had higher annual GDP % growth; interestingly, these groups were the only ones that, on average, had positive GDP growth in 2008.

* During the 1960s and early 70s, the discrepancy between low income countries and high income non-OECD countries was particularly high, indicative of larger disparities in effective economic growth at that time between MEDCs and LEDCs.

#### Highest Countries by GDP (current US$) and annual GDP growth

In [9]:
# extract the top 10 countries by GDP (USD) in latest year (2014)
query = """
SELECT c.ShortName as Country, c.IncomeGroup as Income_Group, i.Value as GDP_US$ FROM Indicators i
JOIN Country c ON i.CountryCode = c.CountryCode
WHERE i.IndicatorCode = 'NY.GDP.MKTP.CD' AND Year = 2014 AND c.Region != 'UNKNOWN'
GROUP BY Country
ORDER BY GDP_US$ DESC
LIMIT 10
"""

gdp_top_10 = pd.read_sql_query(query, connection)
gdp_top_10

| | Country | Income_Group | GDP_US$ |
|---|---|---|---|
| 0 | United States | High income: OECD | 17419000000000.0 |
| 1 | China | Upper middle income | 10354830000000.0 |
| 2 | Japan | High income: OECD | 4601461000000.0 |
| 3 | Germany | High income: OECD | 3868291000000.0 |
| 4 | United Kingdom | High income: OECD | 2988893000000.0 |
| 5 | France | High income: OECD | 2829192000000.0 |
| 6 | Brazil | Upper middle income | 2346076000000.0 |
| 7 | Italy | High income: OECD | 2141161000000.0 |
| 8 | India | Lower middle income | 2048517000000.0 |
| 9 | Russia | High income: nonOECD | 1860598000000.0 |


##### Comments

* Three of the top 10 highest GDP countries are not defined as high income countries: Brazil, China, and India. These countries are Newly Industrialized Countries (NICs) whose rapid industrialization during the latter half of the twentieth century has resulted in economic prowess comparable to the other high-GDP countries.

    * A prudent exploration point is to examine the causes and effects of the rapid industrialization of these NICs to determine the benefits and risks to less-developed countries following a similar path currently.

In [57]:
# Create the treemap chart

fig = px.treemap(gdp_top_10, path=['Country'], values='GDP_US$', color='GDP_US$')
fig.update_layout(
    title="Top 10 Countries by 2014 GDP (US$)",
    coloraxis_colorbar_title="GDP (US$)",
    template='plotly_dark'
)

# Show the plot
fig.show()

In [29]:
# extract top 10 countries by latest GDP growth

query = """
SELECT c.ShortName as Country, c.IncomeGroup as Income_Group, i.Year as Year, i.Value as GDP_Growth FROM Indicators i
JOIN Country c ON i.CountryCode = c.CountryCode
WHERE i.IndicatorCode = 'NY.GDP.MKTP.KD.ZG' AND Year = 2014 AND c.Region != 'UNKNOWN'
GROUP BY Country
ORDER BY GDP_Growth DESC
LIMIT 10
"""

growth_top_10 = pd.read_sql_query(query, connection)
growth_top_10

| | Country | Income_Group | Year | GDP_Growth |
|---|---|---|---|---|
| 0 | Turkmenistan | Upper middle income | 2014 | 10.299983 |
| 1 | Ethiopia | Low income | 2014 | 10.279187 |
| 2 | Dem. Rep. Congo | Low income | 2014 | 9.046596 |
| 3 | Côte d'Ivoire | Lower middle income | 2014 | 8.546468 |
| 4 | Papua New Guinea | Lower middle income | 2014 | 8.533902 |
| 5 | Myanmar | Lower middle income | 2014 | 8.499664 |
| 6 | Uzbekistan | Lower middle income | 2014 | 8.1 |
| 7 | Palau | Upper middle income | 2014 | 7.951975 |
| 8 | Mongolia | Upper middle income | 2014 | 7.823899 |
| 9 | Lao PDR | Lower middle income | 2014 | 7.51527 |


In [49]:
# Plot horizontal bar chart displaying most recent fastest growing GDP countries
fig = px.bar(growth_top_10, x='GDP_Growth', y='Country', orientation='h', color='GDP_Growth')
fig.update_layout(
    xaxis_title='Annual GDP Growth in 2014 (%)',
    title="Top 10 Countries by Annual GDP Growth in 2014",
    coloraxis_colorbar_title='Growth (%)',
    template='plotly_dark'
    
)
fig.show()

##### Observations:

* Large disparity between the highest-GDP country (USA) and the others in the top 10

* 7 of the top 10 highest GDP growth countries are of low/lower middle income; in fact, 50% of the top 10 are lower middle income countries, suggesting their GDP growth could be the largest on average - this assertion requires further investigation
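
The income group tally quoted above can be reproduced directly from the growth table. This short check uses the ten rows as printed; the variable names are illustrative.

```python
from collections import Counter

# income group of each top-10 growth country in 2014, in table order
income_groups = [
    "Upper middle income",  # Turkmenistan
    "Low income",           # Ethiopia
    "Low income",           # Dem. Rep. Congo
    "Lower middle income",  # Côte d'Ivoire
    "Lower middle income",  # Papua New Guinea
    "Lower middle income",  # Myanmar
    "Lower middle income",  # Uzbekistan
    "Upper middle income",  # Palau
    "Upper middle income",  # Mongolia
    "Lower middle income",  # Lao PDR
]

tally = Counter(income_groups)
low_or_lower_middle = tally["Low income"] + tally["Lower middle income"]

print(tally)
print(f"Low or lower middle income: {low_or_lower_middle} of {len(income_groups)}")
```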

### Development of large-GDP NICs

As mentioned previously, Brazil, China, and India are NICs that have become three of the highest GDP countries in the world through recent expeditious industrialization. The focus of the rest of the investigation will be probing the potential causes and effects of the rapid economic growth of these countries.

#### GDP Growth Rate

Firstly, the annual GDP growth rate should be examined to determine the timeframes within which the NICs developed most rapidly.

In [3]:
# obtain NIC (and other groups) GDP and GDP growth data
NIC_gdp_growths = NIC_comp('NY.GDP.MKTP.KD.ZG', 'GDP_Growth')
NIC_gdp = NIC_comp('NY.GDP.MKTP.CD', 'GDP')

In [12]:
plots = [(NIC_gdp, 'Average GDP', '(US$)'), (NIC_gdp_growths, 'Average Annual GDP Growth', '(%)')]

# output line chart for each data set (GDP and GDP growth)
for plot in plots:

    fig = go.Figure()

    for col in plot[0].columns[1:6]:
        fig.add_trace(go.Scatter(x=plot[0]['Year'], y=plot[0][col], mode='lines', name=col))

    fig.update_layout(
        title=f"{plot[1]}",
        yaxis_title=f'{plot[1]} {plot[2]}',
        xaxis_title='Year',
        template='plotly_dark'
    )

    # Show the plot
    fig.show()

In [13]:
# output bar chart for average annual GDP growth by decade
NIC_growth_dec = NIC_gdp_growths.groupby('Decade').mean().drop('Year', axis=1).reset_index()

fig = px.bar(NIC_growth_dec, x='Decade', y=['Brazil', 'China', 'India', 'Rest of Top 10', 'Rest of World'], barmode='group')
fig.update_layout(
    title='Average Annual GDP (%) Growth by Decade',
    yaxis_title='Annual GDP Growth (%)',
    template='plotly_dark',
    legend_title_text='Country'
)
fig.show()

#### Observations

* From 1960 to 1990, the rest of the top 10 had far greater absolute GDP growth than the NICs; however, from 1990 to 2014, the GDPs of Brazil and India have increased by amounts comparable to the rest of the top 10 and greatly beyond the rest of the world. On the other hand, China's GDP has increased exponentially since 1990 and surpassed the average GDP of the rest of the top 10 during the 2008 financial crisis.

* In terms of annual average GDP % growth, all the NICs have experienced highly fluctuating annual growth. For example, China experienced high fluctuations ranging from -27% to +20% during the 1960s and early 70s. Interestingly, China's annual GDP % growth has been consistently positive and less fluctuating following the death of long-term dictator Mao Zedong in 1976, potentially indicating more successful development from the new regime(s).

* China and India had lower average annual GDP % growth during the 1960s than the other groups, but since the 1980s both have had far higher percentage growth, particularly China, whose % growth has consistently been ~ 3 to 4 times that of the rest of the top 10 and the rest of the world.

* During the 2008 crisis, Brazil and the rest of the top 10 experienced recession; however, India, China, and the rest of the world (on average) experienced GDP growth, indicating that their economies were more robust to the challenges of the crisis and/or that the growth from ongoing industrialization in China and India outweighed any effects of the global crisis.
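
The decade averages discussed above depend on how `NIC_comp` assigns years to decades with `pd.cut`. As a small check of that binning, the sketch below reuses the same bin edges and labels and compares the result with plain integer arithmetic (the comparison is an added illustration, not part of the original analysis). Each year from 1960 to 2014 should land in the decade given by its first three digits.

```python
import numpy as np
import pandas as pd

years = pd.Series(range(1960, 2015), name="Year")

# same bin edges and labels as in NIC_comp; pd.cut uses right-closed
# intervals by default, so (1959, 1969] covers 1960 to 1969
bins = list(np.arange(1959, 2029, 10))
labels = ["1960s", "1970s", "1980s", "1990s", "2000s", "2010s"]
decade = pd.cut(years, bins=bins, labels=labels)

# independent derivation of the decade label from the year itself
by_arithmetic = (years // 10 * 10).astype(str) + "s"

print(decade.value_counts(sort=False))
```

Starting the edges at 1959 rather than 1960 is what keeps 1960 inside the first bin, since the left edge of each interval is excluded.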

#### Next Steps

Having established the massive GDP growth, both in current US$ and annual percentage growth, of the NICs relative to the rest of the top 10 and the world, the factors affecting economic growth should be explored, such as:

* Foreign Investment
* Varying importance of different economic sectors
* Changes to Working Age Population

#### Foreign Investment

Foreign investment into goods and services provided by the NICs is likely an important factor contributing to sustained economic growth; analysis of net foreign investment into the NICs could therefore test this assumption.

In [4]:
for_inv = 'BX.KLT.DINV.WD.GD.ZS'

NIC_for_inv = NIC_comp(for_inv, 'Foreign_Investment')

# remove absent records (none for China until 1982)
NIC_for_inv = NIC_for_inv[NIC_for_inv['Year'] >= 1982].reset_index(drop=True)

NIC_for_inv.head()

| | Year | Brazil | China | India | Rest of Top 10 | Rest of World | Decade |
|---|---|---|---|---|---|---|---|
| 0 | 1982 | 1.033079 | 0.211251 | 0.035293 | 0.329445 | 1.324115 | 1980s |
| 1 | 1983 | 0.791424 | 0.27779 | 0.00254 | 0.363358 | 1.132707 | 1980s |
| 2 | 1984 | 0.762592 | 0.487442 | 0.008912 | 0.227129 | 0.955586 | 1980s |
| 3 | 1985 | 0.646354 | 0.539548 | 0.044841 | 0.407238 | 1.091881 | 1980s |
| 4 | 1986 | 0.128665 | 0.627498 | 0.046469 | 0.469328 | 0.85787 | 1980s |


In [8]:
# output line chart displaying time series data for Foreign Investment

fig = go.Figure()

for col in NIC_for_inv.columns[1:6]:
    fig.add_trace(go.Scatter(x=NIC_for_inv['Year'], y=NIC_for_inv[col], mode='lines', name=col))

fig.update_layout(
    title="Foreign Investment (% of GDP)",
    yaxis_title='Foreign Investment (% of GDP)',
    xaxis_title='Year',
    template='plotly_dark'
)

# Show the plot
fig.show()

#### Observations

* Overall, average foreign investment has increased globally, consistent with increasing globalization.

* The 1992-94 surge in foreign investment into China is likely associated with large-scale outsourcing of global manufacturing to China; referring to the GDP time series line graphs, this corresponds with the onset of the exponential growth of China's GDP and so suggests that foreign investment was strongly linked to China's rapid economic development.

* For the majority of the 1982-2014 period, the average foreign investment (% of GDP) of the rest of the world was greater than that of the other groups; however, the previous GDP graphs demonstrate that the average GDP growth across this period was considerably lower for the rest of the world group.

* Relatively large increase in foreign investment (> 5%) in the Rest of World countries since 2005.

Calculating correlation coefficients will provide further insight into the link between foreign investment and GDP growth.
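
Because the indicator is expressed as a share of GDP, a change in the share mixes changes in investment with changes in GDP itself. A minimal sketch of the conversion to US$ and of the year-on-year percentage change follows, using invented figures (not database values) and plain Python:

```python
# hypothetical GDP (US$) and foreign investment (% of GDP) for three consecutive years
gdp = [200e9, 220e9, 250e9]
fdi_share_pct = [0.5, 0.5, 1.0]

# investment in US$ = GDP x share / 100
fdi_usd = [g * s / 100 for g, s in zip(gdp, fdi_share_pct)]

# year-on-year percentage change; the first year has no predecessor
fdi_change_pct = [
    (curr - prev) / prev * 100 for prev, curr in zip(fdi_usd, fdi_usd[1:])
]

print(fdi_usd)
print(fdi_change_pct)
```

In this toy series the share is unchanged between the first two years, yet investment in US$ still rises because GDP grew, which is why the correlation is computed on the US$ series rather than on the share.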

In [21]:
# extract post-1982 GDP data (as Foreign Investment data in China begins from 1982)
NIC_growth_1982 = NIC_gdp_growths[NIC_gdp_growths['Year'] >= 1982].reset_index(drop=True)
NIC_gdp_1982 = NIC_gdp[NIC_gdp['Year'] >= 1982].reset_index(drop=True)


# convert Foreign Investment from % of GDP to US$ (GDP x share / 100), then calculate
# its annual percentage change for each country/group (pct_change returns a fraction)
for_inv_dol = NIC_gdp_1982.drop(columns=['Year', 'Decade']) * NIC_for_inv.drop(columns=['Year', 'Decade']) / 100
for_inv_pct = for_inv_dol.pct_change()[1:] * 100

In [22]:
# Create subplots with 2 rows and 3 columns
fig = make_subplots(rows=2, cols=3, vertical_spacing=0.3, horizontal_spacing=0.1)
axis_text_size = dict(size=10)

# Add traces to the subplots

for i, column in enumerate(for_inv_pct.columns):

    # fill the 2 x 3 grid row by row (plotly subplot rows/cols are 1-indexed)
    row, col = divmod(i, 3)

    fig.add_trace(go.Scatter(x=for_inv_pct[column], y=NIC_growth_1982[column][1:],
                             mode='markers', name=column), row=row + 1, col=col + 1)
    
fig.update_layout(
    title='Annual GDP % Change as a function of Foreign Investment % Change',
    template='plotly_dark'
)

fig.update_yaxes(
    title_text='Annual % Change in GDP',
    title_font=axis_text_size
)

fig.update_xaxes(
    title_text='Annual % Change in Foreign Investment',
    title_font=axis_text_size
)

fig.show()


In [23]:
# list to store the correlation data for each country/group
corr_rows = []

# iterate over each country/group
for col in for_inv_pct.columns:
    
    # measure linear (pearson) and rank-based (spearman) correlation scores
    linear = pearsonr(NIC_growth_1982[col][1:], for_inv_pct[col])
    non_linear = spearmanr(NIC_growth_1982[col][1:], for_inv_pct[col])

    # record the correlation metrics for this country/group
    corr_rows.append({'Country': col, 'Correlation (Linear)': linear[0], 'P-Value (Linear)': linear[1],
                      'Correlation (Non-Linear)': non_linear[0], 'P-Value (Non-Linear)': non_linear[1]})

# build the overall correlation dataframe from the collected rows
for_inv_corr = pd.DataFrame(corr_rows)

# output correlation measures
for_inv_corr

| | Country | Correlation (Linear) | P-Value (Linear) | Correlation (Non-Linear) | P-Value (Non-Linear) |
|---|---|---|---|---|---|
| 0 | Brazil | 0.115572 | 0.528772 | 0.162757 | 0.373458 |
| 1 | China | 0.638697 | 8.4e-05 | 0.744868 | 1e-06 |
| 2 | India | -0.190296 | 0.296848 | -0.258065 | 0.153857 |
| 3 | Rest of Top 10 | 0.447157 | 0.010291 | 0.497434 | 0.003771 |
| 4 | Rest of World | 0.420097 | 0.016674 | 0.409091 | 0.020079 |


In [15]:
# plot bar charts displaying correlation coefficients and p-values
for chart in ['Linear', 'Non-Linear']:

    fig = px.bar(for_inv_corr, y='Country', x=f'Correlation ({chart})',
                barmode='group', color=f'P-Value ({chart})', orientation='h')
    fig.update_layout(
        title=f'{chart} Correlation between annual % changes in Foreign Investment (US$) and GDP',
        xaxis_title='Correlation Coefficient',
        template='plotly_dark',
    )
    fig.show()

##### Comments

* In Brazil, there is no significant correlation between foreign investment and GDP growth, which may suggest that much of Brazil's growth was stimulated by internal investment.

* In China, there is convincing evidence of correlation between foreign investment and GDP growth, with moderately high correlation coefficients (both linear and non-linear) and p-values << 0.05, so the null hypothesis of no correlation is rejected at the 5% significance level. These findings support the inference that Chinese economic growth was stimulated by foreign investment starting from the early 1990s.

* For India, both correlation measures suggest a negative correlation between foreign investment and GDP growth; however, both p-values are > 0.05, so the null hypothesis cannot be rejected (i.e. the findings are not conclusive).

* The Rest of the Top 10 and the Rest of the World both exhibit moderate positive correlations between foreign investment and GDP growth, with p-values < 0.05 (significant at the 5% level). Notably, the coefficients are smaller than China's, indicating that, of all the countries/groups analysed, China's GDP growth is the most closely associated with foreign investment.
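
The linear p-values above can be cross-checked by hand. For a Pearson coefficient r computed from n pairs, the statistic t = r * sqrt((n - 2) / (1 - r^2)) follows a t-distribution with n - 2 degrees of freedom under the null hypothesis of no correlation. The sketch below assumes n = 32 annual observations (1983 to 2014, inferred from the filtering above rather than stated in the notebook) and compares |t| with 2.042, the standard two-sided 5% critical value for 30 degrees of freedom. The names `N_OBS`, `T_CRITICAL` and `t_statistic` are illustrative.

```python
import math

N_OBS = 32          # assumed number of annual observations (1983 to 2014)
T_CRITICAL = 2.042  # two-sided 5% critical value of Student's t, 30 degrees of freedom

# linear correlation coefficients copied from the table above
pearson_r = {
    "Brazil": 0.115572,
    "China": 0.638697,
    "India": -0.190296,
    "Rest of Top 10": 0.447157,
    "Rest of World": 0.420097,
}


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson coefficient against zero correlation."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


significant = {}
for name, r in pearson_r.items():
    t = t_statistic(r, N_OBS)
    significant[name] = abs(t) > T_CRITICAL
    print(f"{name}: t = {t:.2f}, significant at 5%: {significant[name]}")
```

Under that assumption the split agrees with the tabulated p-values: China, the Rest of the Top 10 and the Rest of the World clear the threshold, while Brazil and India do not.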

#### Evolving Composition of the Economy

Development of the agricultural and industrial sectors is a viable option for governments seeking to bolster economic output. Thus, changes in the output of these sectors should be investigated to evaluate any links to the GDP growth of the NICs.

In [180]:
# indicators to be searched for
codes = {'Agriculture': 'NV.AGR.TOTL.ZS', 'Industry': 'NV.IND.TOTL.ZS',
        'Services': 'NV.SRV.TETC.ZS'}

# dictionary to store corresponding dataframe (for each indicator)
gdp_outputs = {name: NIC_comp(code, f"{name}_Output") for name, code in codes.items()}

# separate gdp data by country/group
regions = ['Brazil', 'China', 'India', 'Rest of Top 10', 'Rest of World']
country_outputs = {}

for nation in regions:
    
    # copy so that adding columns does not modify a view of the Agriculture dataframe
    nation_data = gdp_outputs['Agriculture'][['Year', 'Decade']].copy()
    
    for name in codes.keys():
        nation_data[name] = gdp_outputs[name][nation]
    
    # average each sector's share of GDP by decade
    country_outputs[nation] = nation_data.groupby('Decade').mean().reset_index().drop(columns='Year')

In [187]:
# create stacked bar chart displaying GDP composition by sector for each country
for country, output in country_outputs.items():

    # Create the stacked bar chart using Plotly Express
    fig = px.bar(output, x='Decade', y=list(output.columns[1:]), 
                labels={'value': 'Average Percentage of GDP', 'variable': 'Sector'},
                barmode='stack'
                )
    
    fig.update_layout(
        template='plotly_dark',
        title=f'{country} Average GDP Composition by Decade'
    )

    # Show the plot
    fig.show()

##### Observations

* On average, the agricultural sector has become a smaller component, and the services sector a larger component, of GDP across the globe since the 1960s.

* India has experienced the smallest % decrease in the significance of agricultural output to overall GDP, pointing to the relatively high importance of the agricultural sector despite industrialization. Nevertheless, India's Industry sector has generally contributed an increasing proportion of GDP, rising from ~ 20% in the 1960s to ~ 30% in the 2010s.

* The contribution of industry to the GDP of China and the Rest of the World has remained approximately constant (~ 45% and ~ 28% respectively) since the 1970s. The services sector in China has steadily approached the significance of industry and, in the 2010s, contributes approximately the same to GDP as industry.

* The GDPs of the Rest of the Top 10 are generally dominated by the Services sector, with a steadily decreasing contribution from Industry. Since general Industry (e.g. manufacturing, energy etc.) demands have increased, a plausible explanation is the outsourcing of Industry to other areas such as China, India, and the Rest of the World, where Industry has either remained prevalent or grown.

* The smaller significance of the Services sector to the economies of China and India, compared to the other groups, is a likely reason why their GDPs rose during the 2008 crisis: their economies are more varied and therefore less dependent on the Services sector, which was damaged extensively during the crisis.

#### Effect of Working Population on Economic Growth

Social factors may also be significant to economic development; one such factor is the proportion of each country's population that is of working age. Whilst labor laws regarding working ages vary from nation to nation, the broad definition applied by the World Bank is 15 to 64 years; this definition is applied throughout the analysis.

In [11]:
# obtain working population proportion data from database
working_population = NIC_comp('SP.POP.1564.TO.ZS', 'Working_Pop_Pct')

# display line chart displaying time series data for working population proportion
fig = go.Figure()

for col in working_population.columns[1:6]:
    fig.add_trace(go.Scatter(x=working_population['Year'], y=working_population[col], mode='lines', name=col))

fig.update_layout(
    title="Working Age (15 - 64 years) Fraction of Total Population",
    yaxis_title='% of Population',
    xaxis_title='Year',
    template='plotly_dark'
)

# Show the plot
fig.show()

##### Observations

* Since 1960, there has been a general increase in working age population proportion across all groups other than the Rest of the Top 10.

* The increase in the other groups is likely attributable to growing populations, improvements in healthcare, and a shift towards less physical (and less dangerous) occupations.

* For the Rest of the Top 10, the working age population proportion has decreased since 1988, owing to a combination of declining birth rates and superior healthcare (and hence a growing elderly population).

The next step is to determine whether there is a correlation between the annual percentage changes in working population and GDP, such that the economic significance of each worker can be ascertained.
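
As a minimal sketch of the quantities involved, using made-up figures rather than values from the database: the working-age headcount is the total population multiplied by the working-age share, and the annual change is taken between consecutive years. The function names and numbers below are illustrative assumptions, not part of the notebook.

```python
# Hypothetical illustrative values, NOT taken from the World Bank data
total_population = [1000.0, 1020.0, 1050.0]   # people, one entry per year
working_age_share = [60.0, 61.0, 62.0]        # % of total population aged 15-64


def working_age_count(totals, shares_pct):
    """Headcount of 15-64 year olds: total population x share / 100."""
    return [t * s / 100 for t, s in zip(totals, shares_pct)]


def pct_change(values):
    """Fractional change between consecutive entries (the first year has none)."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


counts = working_age_count(total_population, working_age_share)
changes = pct_change(counts)

print(counts)
print([round(c, 4) for c in changes])
```

Because the change is a ratio of consecutive values, any constant scale factor (such as the division by 100) cancels out; only the year-to-year movement matters for the correlation.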

In [12]:
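# calculate annual percentage change in working population numbers
total_pop = NIC_comp('SP.POP.TOTL', 'Working_Pop').drop(columns=['Year', 'Decade'])
# the working age share is a percentage, so divide by 100 to obtain a headcount
pop_work = total_pop * working_population.drop(columns=['Year', 'Decade']) / 100
# the first year has no previous year to compare against, so its row is dropped
pop_change = pop_work.pct_change().iloc[1:]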

# list to collect one row of correlation data per country/group
corr_rows = []

# iterate over each country/group
for col in pop_change.columns:

    # Pearson measures linear association; Spearman measures monotonic (rank) association
    linear = pearsonr(NIC_gdp_growths[col], pop_change[col])
    non_linear = spearmanr(NIC_gdp_growths[col], pop_change[col])

    # store country/group correlation scores
    corr_rows.append({'Country': col, 'Correlation (Linear)': linear[0], 'P-Value (Linear)': linear[1],
                      'Correlation (Non-Linear)': non_linear[0], 'P-Value (Non-Linear)': non_linear[1]})

# build the dataframe once, rather than concatenating onto an empty dataframe
pop_corr = pd.DataFrame(corr_rows)

# output dataframe
pop_corr

| | Country | Correlation (Linear) | P-Value (Linear) | Correlation (Non-Linear) | P-Value (Non-Linear) |
|---|---|---|---|---|---|
| 0 | Brazil | 0.38861 | 0.003685 | 0.444254 | 0.000765 |
| 1 | China | 0.344193 | 0.010819 | 0.044406 | 0.749848 |
| 2 | India | -0.104876 | 0.450413 | -0.142138 | 0.305233 |
| 3 | Rest of Top 10 | 0.488964 | 0.000176 | 0.519649 | 5.7e-05 |
| 4 | Rest of World | -0.018199 | 0.896079 | -0.030608 | 0.826097 |
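
One way to read the output above is to compare each p-value with a significance threshold. The sketch below assumes the conventional 5% level (the notebook does not state one) and simply re-enters the reported values by hand; `ALPHA`, `results`, and `significant` are illustrative names.

```python
ALPHA = 0.05  # assumed significance level; not specified in the notebook

# (Pearson r, Pearson p, Spearman rho, Spearman p), copied from the output above
results = {
    "Brazil": (0.38861, 0.003685, 0.444254, 0.000765),
    "China": (0.344193, 0.010819, 0.044406, 0.749848),
    "India": (-0.104876, 0.450413, -0.142138, 0.305233),
    "Rest of Top 10": (0.488964, 0.000176, 0.519649, 5.7e-05),
    "Rest of World": (-0.018199, 0.896079, -0.030608, 0.826097),
}


def significant(p_value, alpha=ALPHA):
    """True when the null hypothesis of no correlation is rejected at level alpha."""
    return p_value < alpha


# groups whose linear (Pearson) and rank (Spearman) correlations pass the threshold
linear_hits = [group for group, v in results.items() if significant(v[1])]
rank_hits = [group for group, v in results.items() if significant(v[3])]

print("Significant linear correlation:", linear_hits)
print("Significant rank correlation:  ", rank_hits)
```

Under this assumed threshold, Brazil, China, and the Rest of the Top 10 pass the linear test, while only Brazil and the Rest of the Top 10 pass the rank test.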


In [17]:
# Create subplots with 2 rows and 3 columns
fig = make_subplots(rows=2, cols=3, vertical_spacing=0.3, horizontal_spacing=0.1)
axis_text_size = dict(size=10)

# Add scatter plot traces to the subplots
for i, column in enumerate(pop_change.columns):

    # three plots per row: map the trace index to a 1-based (row, col) grid position
    row, col = divmod(i, 3)
    row, col = row + 1, col + 1

    fig.add_trace(go.Scatter(x=pop_change[column], y=NIC_gdp_growths[column],
                             mode='markers', name=column), row=row, col=col)
    
fig.update_layout(
    title='Annual GDP % Change as a function of Working Population % change',
    template='plotly_dark'
)

# set y-axis title parameters
fig.update_yaxes(
    title_text='Annual % Change in GDP',
    title_font=axis_text_size
)

# set x-axis title parameters
fig.update_xaxes(
    title_text='Annual % Change in Working Population',
    title_font=axis_text_size
)

fig.show()


In [24]:
# plot bar chart displaying correlation metrics
for chart in ['Linear', 'Non-Linear']:

    fig = px.bar(pop_corr, y='Country', x=f'Correlation ({chart})',
                barmode='group', color=f'P-Value ({chart})', orientation='h')
    fig.update_layout(
        title=f'{chart} Correlation between annual % changes in Working Age Population and GDP',
        xaxis_title='Correlation Coefficient',
        template='plotly_dark',
    )
    fig.show()

##### Comments

* India and the Rest of World show weak negative correlations between the annual % change in working population and the annual % change in GDP; however, their p-values (both linear and non-linear) are well above 0.05 (ranging from 0.31 to 0.90), so the null hypothesis of no correlation cannot be rejected and the results are inconclusive.

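* In China, Brazil, and the Rest of the Top 10, the positive linear correlations between changes in working age population and GDP are statistically significant (p < 0.05), although only moderate in strength (r ≈ 0.34 to 0.49). For China, the non-linear (Spearman) coefficient is close to zero and not significant (p ≈ 0.75), so the evidence there is weaker. Interestingly, the correlation coefficient is higher for the Rest of the Top 10 than for China or Brazil, indicating a stronger link between working population and GDP growth (and perhaps greater economic value added per worker).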
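
The linear and rank coefficients can disagree, as they do for China in the table above. The sketch below shows one generic way this can happen, using made-up data in which a single extreme point dominates the Pearson coefficient; it is not a claim about what drives the China result. The helper functions are hand-written stand-ins for `pearsonr` and `spearmanr` and assume there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank each value from 1 (smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman coefficient: the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: six points trending downwards plus one extreme point
x = [1, 2, 3, 4, 5, 6, 30]
y = [6, 5, 4, 3, 2, 1, 40]

print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Here the Pearson coefficient is strongly positive (roughly 0.96) because of the single extreme point, whereas the Spearman coefficient is -0.25 because ranking removes that point's leverage.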