INTRODUCTION:

Are there any indicators that could help us predict the economic status of a country? My goal is to analyze the past 15 to 25 years of data on GDP growth, unemployment rate, and inflation rate for different areas of the world, and to see whether any underlying indicators preceded major recessions and could help predict future recessions using machine learning. I would also like to pick pairs of statistics that are not intuitively related and find out whether a correlation exists between them. Furthermore, with more reports that education levels around the world have been declining since COVID, I want to study whether this has a potential effect on the world economy.

I plan to use the World Bank API to gather information on the economic status of different countries, which will help me understand the trends in modern history. Below I demonstrate reading data from the World Bank API, which is not very difficult to use.

In [60]:
import requests
import pandas as pd

# merged_df = pd.DataFrame()

# Define indicators
gdp_indicator = "NY.GDP.MKTP.KD.ZG"  # GDP Growth (%)
unemployment_indicator = "SL.UEM.TOTL.ZS"  # Unemployment Rate (%)
inflation_indicator = "FP.CPI.TOTL.ZG"  # Inflation Rate (%)
fdi_indicator = "BX.KLT.DINV.WD.GD.ZS"  # Foreign Direct Investment, net inflows (% of GDP)

years = list(range(2000, 2023))

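# Function that gets all data for one indicator from the World Bank API.
# The API is paginated, so keep requesting pages until the last one is reached.
def fetch_data(indicator):
    url = f"https://api.worldbank.org/v2/country/all/indicator/{indicator}"
    params = {"date": f"{years[0]}:{years[-1]}", "format": "json", "per_page": 1000, "page": 1}
    rows = []
    while True:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        # A valid response is [metadata, records]; anything else is an error payload
        if not isinstance(data, list) or len(data) < 2 or not data[1]:
            break
        rows.extend(
            {"Country": entry["country"]["value"],
             "Code": entry["country"]["id"],
             "Year": entry["date"],
             "Value": entry["value"]}
            for entry in data[1]
        )
        if params["page"] >= data[0].get("pages", 1):
            break
        params["page"] += 1
    return pd.DataFrame(rows)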

# Get data for each indicator
gdp_df = fetch_data(gdp_indicator).rename(columns={"Value": "GDP Growth (%)"})
unemployment_df = fetch_data(unemployment_indicator).rename(columns={"Value": "Unemployment Rate (%)"})
inflation_df = fetch_data(inflation_indicator).rename(columns={"Value": "Inflation Rate (%)"})
fdi_df = fetch_data(fdi_indicator).rename(columns={"Value": "Foreign Direct Investment"})


# Merge all the columns
merged_df = gdp_df.merge(unemployment_df, on=["Country","Code", "Year"], how="outer")
merged_df = merged_df.merge(inflation_df, on=["Country", "Code", "Year"], how="outer")
merged_df = merged_df.merge(fdi_df, on=["Country", "Code","Year"], how="outer")


# Save to CSV
merged_df.to_csv("foundationsOfDataScienceProject.csv", index=False)

merged_df[: 69]

Unnamed: 0,Country,Code,Year,GDP Growth (%),Unemployment Rate (%),Inflation Rate (%),Foreign Direct Investment
0,Africa Eastern and Southern,ZH,2022,3.553878,7.985202,10.773751,1.695914
1,Africa Eastern and Southern,ZH,2021,4.576393,8.577385,7.240978,5.012059
2,Africa Eastern and Southern,ZH,2020,-2.864293,8.191395,5.405162,1.361762
3,Africa Eastern and Southern,ZH,2019,2.194319,7.584419,4.653665,1.424519
4,Africa Eastern and Southern,ZH,2018,2.666632,7.360513,4.720805,1.272290
...,...,...,...,...,...,...,...
64,Arab World,1A,2004,9.076593,11.433002,3.632280,1.199791
65,Arab World,1A,2003,4.281254,12.456351,2.712592,1.144935
66,Arab World,1A,2002,1.229342,12.512530,1.832994,0.666657
67,Arab World,1A,2001,2.103103,12.505882,1.772204,0.693799

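As a first check on whether these statistics move together, the pairwise correlations can be computed directly from a table like the one above. The sketch below is self-contained: it rebuilds the five "Africa Eastern and Southern" rows shown above by hand (the name `sample` is only illustrative), so its numbers describe five data points and should not be read as a real finding.

```python
import pandas as pd

# The five "Africa Eastern and Southern" rows from the output above, typed in by hand
sample = pd.DataFrame({
    "Year": [2022, 2021, 2020, 2019, 2018],
    "GDP Growth (%)": [3.553878, 4.576393, -2.864293, 2.194319, 2.666632],
    "Unemployment Rate (%)": [7.985202, 8.577385, 8.191395, 7.584419, 7.360513],
    "Inflation Rate (%)": [10.773751, 7.240978, 5.405162, 4.653665, 4.720805],
    "Foreign Direct Investment": [1.695914, 5.012059, 1.361762, 1.424519, 1.272290],
})

# Pearson correlation between every pair of indicators (Year is left out)
corr = sample.drop(columns="Year").corr()
print(corr.round(2))
```

On the full table, the same `.corr()` call could presumably be applied to the four indicator columns of `merged_df`, ideally one country or region at a time so that different economies are not mixed together.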

In [61]:
# Creating side-by-side box plots of GDP growth for 8 World Bank regional aggregates.
# Codes included: 'ZH', 'ZI', '1A', 'EU', 'B8', 'V2', 'Z4', '4E'

import plotly.express as px
import matplotlib.pyplot as plt

codes = ['ZH', 'ZI', '1A', 'EU', 'B8', 'V2', 'Z4', '4E']
filtered_data = merged_df[merged_df['Code'].isin(codes)]

# filtered_data.head(69)

fig = px.box(filtered_data, 
             x = 'Country',
             y = 'GDP Growth (%)',
             title = 'Distribution of GDP Growth (%) by Country',
             # axis labels default to the column names, so no labels mapping is needed
             color = 'Country',
             )
fig.update_layout(
    plot_bgcolor = 'white')

fig.show()

The data set above is mostly clean, and I hope to keep adding columns with more statistics. I hope to use machine learning to predict each of these statistics 1, 5, and 10 years into the future. I have plenty of numerical features, from GDP growth to the unemployment rate to the inflation rate, and for categorical data I have the country as well as the year. I believe I can predict future values with regression: linear regression for the continuous statistics themselves, and logistic regression for a yes/no outcome such as whether a recession occurs. Although I am not yet familiar with them, I would also like to explore using neural networks to categorize the countries and group them based on their economic statistics.
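
To make the forecasting idea concrete, here is a minimal sketch of about the simplest baseline I could start from: a least-squares trend line fitted with `numpy.polyfit` and extended 1, 5, and 10 years past the last observation. It is only an illustration under assumptions: the five unemployment values are the "Africa Eastern and Southern" rows printed earlier, the helper name `forecast_linear` is hypothetical, and a straight line through five points is not expected to be an accurate economic forecast.

```python
import numpy as np

def forecast_linear(years, values, horizons=(1, 5, 10)):
    """Fit a straight-line trend and extrapolate it past the last observed year."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    start = years.min()
    # Fit on years measured from the first year to keep the numbers well scaled
    slope, intercept = np.polyfit(years - start, values, deg=1)
    last_year = int(years.max())
    return {last_year + h: slope * (last_year + h - start) + intercept
            for h in horizons}

# Unemployment Rate (%) for "Africa Eastern and Southern", copied from the table above
years = [2018, 2019, 2020, 2021, 2022]
unemployment = [7.360513, 7.584419, 8.191395, 8.577385, 7.985202]

for year, value in forecast_linear(years, unemployment).items():
    print(year, round(value, 2))
```

A baseline like this mainly gives later models something to beat; any model I end up using would need to be checked against years held out from training.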