# A study on global suicide rates and the socio-economic and political factors that affect them

Authors:
   - Betül Yurtman
   - Sagar Kumar

Abstract: *"Executive Summary. For the final version: Write an executive summary here. Write it when you are mostly finished with the main report. It should summarize the question, with what data and how you answer it, and what the result is."*

## Introduction

Suicide is a complex global problem that affects numerous individuals, families and communities. It is among the leading causes of preventable death worldwide and is usually influenced by a mix of personal, socio-economic, and governmental factors. To design efficient and effective public health policies, it is important to understand the factors that drive high and low suicide rates.

This report examines global suicide rates and how they differ across countries, years and genders. We explore the relationship between suicide rates and socio-economic factors such as GDP per capita, literacy rate, unemployment rate, political stability, freedom of the press and more. The report is guided by three leading questions:

1. How do suicide rates differ across countries and over time, and what factors contribute to gender disparities?
2. What roles do literacy rates and unemployment play in influencing suicide rates globally?
3. How do socio-economic and governance factors correlate with suicide rates?

The raw data was collected from the World Bank DataBank. The datasets used were Gender Statistics and Environment, Social and Governance (ESG). The Gender Statistics dataset includes variables such as population, GDP per capita, the Human Capital Index (HCI), suicide rates, literacy rates and more. The ESG dataset adds further variables such as the Economic and Social Rights Performance Score, unemployment rate, political stability, freedom of the press and more. We merged the two datasets on country and year and applied some pre-processing to make the data easier to handle and analyze.

----A line or two here about predictions---

In [18]:
# Necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_rows', 1500)

## Data

In [19]:
# Import Data
# gs = Gender Statistics
data_gs_ = pd.read_excel("P_Data_Extract_From_Gender_Statistics.xlsx")
# esg = Environment Social and Governance
data_esg_ = pd.read_excel("P_Data_Extract_From_Environment_Social_and_Governance_(ESG)_Data.xlsx")

In [20]:
# We work on copies of the datasets so that the Excel files only have to be read once.
# To go back to the original version of a dataset, just re-run this cell.
data_gs = data_gs_.copy()
data_esg = data_esg_.copy()

In [21]:
data_gs

Unnamed: 0,Time,Time Code,Country Name,Country Code,"Population, female [SP.POP.TOTL.FE.IN]","Population, male [SP.POP.TOTL.MA.IN]","Population, total [SP.POP.TOTL]",GDP (current US$) [NY.GDP.MKTP.CD],GDP per capita (constant 2010 US$) [NY.GDP.PCAP.KD],"Inflation, consumer prices (annual %) [FP.CPI.TOTL.ZG]",Human Capital Index (HCI) (scale 0-1) [HD.HCI.OVRL],"Human Capital Index (HCI), Male (scale 0-1) [HD.HCI.OVRL.MA]","Human Capital Index (HCI), Female (scale 0-1) [HD.HCI.OVRL.FE]","Suicide mortality rate (per 100,000 population) [SH.STA.SUIC.P5]","Suicide mortality rate, female (per 100,000 female population) [SH.STA.SUIC.FE.P5]","Suicide mortality rate, male (per 100,000 male population) [SH.STA.SUIC.MA.P5]","Literacy rate, adult female (% of females ages 15 and above) [SE.ADT.LITR.FE.ZS]","Literacy rate, adult male (% of males ages 15 and above) [SE.ADT.LITR.MA.ZS]","Literacy rate, adult total (% of people ages 15 and above) [SE.ADT.LITR.ZS]"
0,2014,YR2014,Afghanistan,AFG,16172321.0,16543889.0,32716210.0,20497128555.697231,576.487817,4.673996,..,..,..,3.9,3.6,4.2,..,..,..
1,2014,YR2014,Albania,ALB,1439863.0,1449241.0,2889104.0,13228147516.116798,3855.760744,1.625865,..,..,..,5,3.2,6.6,..,..,..
2,2014,YR2014,Algeria,DZA,19004433.0,19755734.0,38760168.0,238942664192.589996,4687.288575,2.916927,..,..,..,2.8,2,3.5,..,..,..
3,2014,YR2014,American Samoa,ASM,26017.0,26199.0,52217.0,643000000,12494.980211,..,..,..,..,..,..,..,..,..,..
4,2014,YR2014,Andorra,AND,35473.0,36148.0,71621.0,3271685596.663211,38402.649261,..,..,..,..,..,..,..,..,..,..
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2170,,,,,,,,,,,,,,,,,,,
2171,,,,,,,,,,,,,,,,,,,
2172,,,,,,,,,,,,,,,,,,,
2173,Data from database: Gender Statistics,,,,,,,,,,,,,,,,,,


In [22]:
data_esg

Unnamed: 0,Time,Time Code,Country Name,Country Code,Economic and Social Rights Performance Score [SD.ESR.PERF.XQ],Voice and Accountability: Estimate [VA.EST],"Unemployment, total (% of total labor force) (modeled ILO estimate) [SL.UEM.TOTL.ZS]","School enrollment, primary (% gross) [SE.PRM.ENRR]",Access to clean fuels and technologies for cooking (% of population) [EG.CFT.ACCS.ZS],Government Effectiveness: Estimate [GE.EST],Political Stability and Absence of Violence/Terrorism: Estimate [PV.EST],Strength of legal rights index (0=weak to 12=strong) [IC.LGL.CRED.XQ]
0,2010,YR2010,Afghanistan,AFG,1.76468,-1.404467,7.921,102.903442,19.7,-1.478316,-2.579152,..
1,2010,YR2010,Albania,ALB,2.273091,0.123838,14.09,108.494461,65.8,-0.279453,-0.191483,..
2,2010,YR2010,Algeria,DZA,2.284476,-1.022331,9.96,113.377953,99.2,-0.395902,-1.259368,..
3,2010,YR2010,Andorra,AND,..,1.324308,..,90.115562,100,1.512716,1.278272,..
4,2010,YR2010,Angola,AGO,1.428207,-1.120525,16.551,111.872879,44.7,-1.138268,-0.226182,..
...,...,...,...,...,...,...,...,...,...,...,...,...
1930,,,,,,,,,,,,
1931,,,,,,,,,,,,
1932,,,,,,,,,,,,
1933,Data from database: Environment Social and Gov...,,,,,,,,,,,


In [23]:
# Remove the last 5 rows from both datasets: they contain only empty rows and notes about the data source.
data_gs = data_gs.iloc[:-5, :]  # Exclude the last 5 rows
data_esg = data_esg.iloc[:-5, :]  # Exclude the last 5 rows
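
Hard-coding the number of footer rows works for this extract, but it could silently cut real data if a future download ends differently. A minimal sketch of a more defensive alternative, assuming that footer rows can be recognized by a missing `Country Name`; the toy frame and the helper name `drop_footer_rows` are hypothetical and not part of the analysis:

```python
import pandas as pd


def drop_footer_rows(frame, key_column="Country Name"):
    """Keep only rows that have a value in key_column (hypothetical helper)."""
    return frame[frame[key_column].notna()].copy()


# Toy frame imitating the tail of an extract: two data rows, two blank rows,
# and a source note that ends up in the "Time" column.
toy = pd.DataFrame({
    "Time": [2014, 2014, None, None, "Data from database: Gender Statistics"],
    "Country Name": ["Afghanistan", "Albania", None, None, None],
})

cleaned = drop_footer_rows(toy)
print(cleaned)
```

With this approach the number of removed rows does not have to be known in advance.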

In [24]:
# Merging the two datasets.
data = pd.merge(
    data_gs,
    data_esg,
    on=["Country Name", "Time"],
    how="inner"
)

# Drop duplicate columns
columns_to_drop = ["Time Code_x", "Time Code_y", "Country Code_x", "Country Code_y"]
data.drop(columns=columns_to_drop, inplace=True)
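
An inner merge drops every country-year pair that appears in only one of the two datasets, without any message. The sketch below shows one way this could be checked, using the `indicator` and `validate` parameters of `pd.merge` on two small toy frames (the frames `left` and `right` are illustrative stand-ins, not the real extracts):

```python
import pandas as pd

left = pd.DataFrame({
    "Country Name": ["Albania", "Algeria", "Angola"],
    "Time": [2014, 2014, 2014],
    "Suicide Rate (Total)": [5.0, 2.8, 6.4],
})
right = pd.DataFrame({
    "Country Name": ["Albania", "Algeria", "Andorra"],
    "Time": [2014, 2014, 2014],
    "Unemployment Rate": [18.05, 10.21, None],
})

# An outer merge with indicator=True labels each row as "both",
# "left_only" or "right_only"; validate="one_to_one" raises a MergeError
# if a (country, year) key is duplicated on either side.
check = pd.merge(
    left,
    right,
    on=["Country Name", "Time"],
    how="outer",
    indicator=True,
    validate="one_to_one",
)

# Rows that an inner merge would have dropped.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Country Name", "Time", "_merge"]])
```

If `unmatched` is empty for the real data, the inner merge loses nothing.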

In [25]:
data

Unnamed: 0,Time,Country Name,"Population, female [SP.POP.TOTL.FE.IN]","Population, male [SP.POP.TOTL.MA.IN]","Population, total [SP.POP.TOTL]",GDP (current US$) [NY.GDP.MKTP.CD],GDP per capita (constant 2010 US$) [NY.GDP.PCAP.KD],"Inflation, consumer prices (annual %) [FP.CPI.TOTL.ZG]",Human Capital Index (HCI) (scale 0-1) [HD.HCI.OVRL],"Human Capital Index (HCI), Male (scale 0-1) [HD.HCI.OVRL.MA]","Human Capital Index (HCI), Female (scale 0-1) [HD.HCI.OVRL.FE]","Suicide mortality rate (per 100,000 population) [SH.STA.SUIC.P5]","Suicide mortality rate, female (per 100,000 female population) [SH.STA.SUIC.FE.P5]","Suicide mortality rate, male (per 100,000 male population) [SH.STA.SUIC.MA.P5]","Literacy rate, adult female (% of females ages 15 and above) [SE.ADT.LITR.FE.ZS]","Literacy rate, adult male (% of males ages 15 and above) [SE.ADT.LITR.MA.ZS]","Literacy rate, adult total (% of people ages 15 and above) [SE.ADT.LITR.ZS]",Economic and Social Rights Performance Score [SD.ESR.PERF.XQ],Voice and Accountability: Estimate [VA.EST],"Unemployment, total (% of total labor force) (modeled ILO estimate) [SL.UEM.TOTL.ZS]","School enrollment, primary (% gross) [SE.PRM.ENRR]",Access to clean fuels and technologies for cooking (% of population) [EG.CFT.ACCS.ZS],Government Effectiveness: Estimate [GE.EST],Political Stability and Absence of Violence/Terrorism: Estimate [PV.EST],Strength of legal rights index (0=weak to 12=strong) [IC.LGL.CRED.XQ]
0,2014,Afghanistan,16172321.0,16543889.0,32716210.0,20497128555.697231,576.487817,4.673996,..,..,..,3.9,3.6,4.2,..,..,..,1.943449,-1.13544,7.91,109.115517,25.7,-1.359305,-2.411068,9
1,2014,Albania,1439863.0,1449241.0,2889104.0,13228147516.116798,3855.760744,1.625865,..,..,..,5,3.2,6.6,..,..,..,2.340072,0.143777,18.05,114.04332,74.4,-0.048918,0.485986,6
2,2014,Algeria,19004433.0,19755734.0,38760168.0,238942664192.589996,4687.288575,2.916927,..,..,..,2.8,2,3.5,..,..,..,2.312768,-0.813358,10.21,111.686699,99.5,-0.339202,-1.190535,2
3,2014,Andorra,35473.0,36148.0,71621.0,3271685596.663211,38402.649261,..,..,..,..,..,..,..,..,..,..,..,1.165965,..,88.235291,100,1.712283,1.286593,..
4,2014,Angola,13746371.0,13381965.0,27128337.0,135966802586.713196,3304.681148,7.280387,..,..,..,6.4,2.4,10.4,53.407211,79.974152,66.030113,1.565973,-1.14523,16.317,..,46.7,-1.055904,-0.333232,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1925,2010,"Venezuela, RB",14393135.0,14321888.0,28715022.0,393192354510.653076,..,28.187465,..,..,..,3.4,1,5.8,..,..,..,..,-0.870534,7.11,101.070229,96.8,-1.102203,-1.256237,..
1926,2010,Viet Nam,44362702.0,43048310.0,87411012.0,147201173196.97876,2028.605713,9.207466,0.656642,0.634342,0.680296,7,4.2,9.8,..,..,..,2.336811,-1.496968,1.11,99.457329,51.2,-0.242955,0.148408,..
1927,2010,"Yemen, Rep.",12224951.0,12518994.0,24743946.0,30906749533.221001,2547.640383,11.174834,..,..,..,5.7,4.6,6.8,..,..,..,..,-1.331231,12.793,84.01722,59.7,-1.031832,-2.423716,..
1928,2010,Zambia,7026189.0,6765898.0,13792086.0,20265559483.854828,1198.304817,8.501761,..,..,..,10.1,5.4,15,77.746643,88.684029,83.007668,1.619451,-0.243015,13.19,110.654739,16,-0.86207,0.515351,..


In [26]:
# Column names and new names
column_mapping = {
    "Population, female [SP.POP.TOTL.FE.IN]": "Female Population",
    "Population, male [SP.POP.TOTL.MA.IN]": "Male Population",
    "Population, total [SP.POP.TOTL]": "Total Population",
    "GDP (current US$) [NY.GDP.MKTP.CD]": "GDP (Current US$)",
    "GDP per capita (constant 2010 US$) [NY.GDP.PCAP.KD]": "GDP per Capita (2010 US$)",
    "Inflation, consumer prices (annual %) [FP.CPI.TOTL.ZG]": "Inflation (Annual %)",
    "Human Capital Index (HCI) (scale 0-1) [HD.HCI.OVRL]": "Human Capital Index",
    "Human Capital Index (HCI), Male (scale 0-1) [HD.HCI.OVRL.MA]": "Human Capital Index (Male)",
    "Human Capital Index (HCI), Female (scale 0-1) [HD.HCI.OVRL.FE]": "Human Capital Index (Female)",
    "Suicide mortality rate (per 100,000 population) [SH.STA.SUIC.P5]": "Suicide Rate (Total)",
    "Suicide mortality rate, female (per 100,000 female population) [SH.STA.SUIC.FE.P5]": "Suicide Rate (Female)",
    "Suicide mortality rate, male (per 100,000 male population) [SH.STA.SUIC.MA.P5]": "Suicide Rate (Male)",
    "Literacy rate, adult female (% of females ages 15 and above) [SE.ADT.LITR.FE.ZS]": "Literacy Rate (Female)",
    "Literacy rate, adult male (% of males ages 15 and above) [SE.ADT.LITR.MA.ZS]": "Literacy Rate (Male)",
    "Literacy rate, adult total (% of people ages 15 and above) [SE.ADT.LITR.ZS]": "Literacy Rate (Total)",
    "Economic and Social Rights Performance Score [SD.ESR.PERF.XQ]": "Economic Rights Score",
    "Voice and Accountability: Estimate [VA.EST]": "Voice and Accountability",
    "Unemployment, total (% of total labor force) (modeled ILO estimate) [SL.UEM.TOTL.ZS]": "Unemployment Rate",
    "School enrollment, primary (% gross) [SE.PRM.ENRR]": "Primary School Enrollment",
    "Access to clean fuels and technologies for cooking (% of population) [EG.CFT.ACCS.ZS]": "Clean Fuel Access",
    "Government Effectiveness: Estimate [GE.EST]": "Government Effectiveness",
    "Political Stability and Absence of Violence/Terrorism: Estimate [PV.EST]": "Political Stability",
    "Strength of legal rights index (0=weak to 12=strong) [IC.LGL.CRED.XQ]": "Legal Rights Index"
}

data.rename(columns=column_mapping, inplace=True)

data.head()

Unnamed: 0,Time,Country Name,Female Population,Male Population,Total Population,GDP (Current US$),GDP per Capita (2010 US$),Inflation (Annual %),Human Capital Index,Human Capital Index (Male),Human Capital Index (Female),Suicide Rate (Total),Suicide Rate (Female),Suicide Rate (Male),Literacy Rate (Female),Literacy Rate (Male),Literacy Rate (Total),Economic Rights Score,Voice and Accountability,Unemployment Rate,Primary School Enrollment,Clean Fuel Access,Government Effectiveness,Political Stability,Legal Rights Index
0,2014,Afghanistan,16172321.0,16543889.0,32716210.0,20497128555.69723,576.487817,4.673996,..,..,..,3.9,3.6,4.2,..,..,..,1.943449,-1.13544,7.91,109.115517,25.7,-1.359305,-2.411068,9
1,2014,Albania,1439863.0,1449241.0,2889104.0,13228147516.116798,3855.760744,1.625865,..,..,..,5,3.2,6.6,..,..,..,2.340072,0.143777,18.05,114.04332,74.4,-0.048918,0.485986,6
2,2014,Algeria,19004433.0,19755734.0,38760168.0,238942664192.59,4687.288575,2.916927,..,..,..,2.8,2,3.5,..,..,..,2.312768,-0.813358,10.21,111.686699,99.5,-0.339202,-1.190535,2
3,2014,Andorra,35473.0,36148.0,71621.0,3271685596.663211,38402.649261,..,..,..,..,..,..,..,..,..,..,..,1.165965,..,88.235291,100.0,1.712283,1.286593,..
4,2014,Angola,13746371.0,13381965.0,27128337.0,135966802586.7132,3304.681148,7.280387,..,..,..,6.4,2.4,10.4,53.407211,79.974152,66.030113,1.565973,-1.14523,16.317,..,46.7,-1.055904,-0.333232,1


In [27]:
# This cell is for reproducibility: if someone downloads the data with a different column order, they still get the same column order as in this report.
# The same list can also be used to drop columns by leaving them out (Country Code and Time Code were already dropped after the merge).
fixed_order = [
    "Time",
    "Country Name",
    "Female Population",
    "Male Population",
    "Total Population",
    "GDP (Current US$)",
    "GDP per Capita (2010 US$)",
    "Inflation (Annual %)",
    "Human Capital Index",
    "Human Capital Index (Male)",
    "Human Capital Index (Female)",
    "Suicide Rate (Total)",
    "Suicide Rate (Female)",
    "Suicide Rate (Male)",
    "Literacy Rate (Female)",
    "Literacy Rate (Male)",
    "Literacy Rate (Total)",
    "Primary School Enrollment",
    "Unemployment Rate",
    "Economic Rights Score",
    "Legal Rights Index",
    "Voice and Accountability",
    "Clean Fuel Access",
    "Government Effectiveness",
    "Political Stability",
]

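# .copy() gives an independent frame, so the column assignments in later cells should not trigger pandas' SettingWithCopyWarning.
data = data[fixed_order].copy()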

data.head()

Unnamed: 0,Time,Country Name,Female Population,Male Population,Total Population,GDP (Current US$),GDP per Capita (2010 US$),Inflation (Annual %),Human Capital Index,Human Capital Index (Male),Human Capital Index (Female),Suicide Rate (Total),Suicide Rate (Female),Suicide Rate (Male),Literacy Rate (Female),Literacy Rate (Male),Literacy Rate (Total),Primary School Enrollment,Unemployment Rate,Economic Rights Score,Legal Rights Index,Voice and Accountability,Clean Fuel Access,Government Effectiveness,Political Stability
0,2014,Afghanistan,16172321.0,16543889.0,32716210.0,20497128555.69723,576.487817,4.673996,..,..,..,3.9,3.6,4.2,..,..,..,109.115517,7.91,1.943449,9,-1.13544,25.7,-1.359305,-2.411068
1,2014,Albania,1439863.0,1449241.0,2889104.0,13228147516.116798,3855.760744,1.625865,..,..,..,5,3.2,6.6,..,..,..,114.04332,18.05,2.340072,6,0.143777,74.4,-0.048918,0.485986
2,2014,Algeria,19004433.0,19755734.0,38760168.0,238942664192.59,4687.288575,2.916927,..,..,..,2.8,2,3.5,..,..,..,111.686699,10.21,2.312768,2,-0.813358,99.5,-0.339202,-1.190535
3,2014,Andorra,35473.0,36148.0,71621.0,3271685596.663211,38402.649261,..,..,..,..,..,..,..,..,..,..,88.235291,..,..,..,1.165965,100.0,1.712283,1.286593
4,2014,Angola,13746371.0,13381965.0,27128337.0,135966802586.7132,3304.681148,7.280387,..,..,..,6.4,2.4,10.4,53.407211,79.974152,66.030113,..,16.317,1.565973,1,-1.14523,46.7,-1.055904,-0.333232


In [28]:
# Dataset Overview to analyse the data types, shape of the dataset and unique value count for each variable. 
def dataset_overview(dataframe):
    """
    Generates a summary of the dataset, including:
    - Total number of rows and columns / Shape of the dataset
    - Unique values per column
    - Data types for each column
    """
    # Shape of the dataset
    rows_columns = dataframe.shape
    # Unique value count for each variable
    unique_counts = dataframe.nunique()
    # Datatypes
    data_types = dataframe.dtypes
    
    # DataFrame for the results
    overview = pd.DataFrame({
        "Unique Values": unique_counts,
        "Data Type": data_types
    })
    
    print(f"Shape: {rows_columns}")
    return overview

overview = dataset_overview(data)

overview

Shape: (1930, 25)


Unnamed: 0,Unique Values,Data Type
Time,10,object
Country Name,193,object
Female Population,1927,float64
Male Population,1929,float64
Total Population,1930,float64
GDP (Current US$),1903,object
GDP per Capita (2010 US$),1887,object
Inflation (Annual %),1774,object
Human Capital Index,398,object
Human Capital Index (Male),347,object


In [29]:
columns_to_convert = [
    "GDP (Current US$)", "GDP per Capita (2010 US$)", "Inflation (Annual %)",
    "Human Capital Index", "Human Capital Index (Male)", "Human Capital Index (Female)",
    "Suicide Rate (Total)", "Suicide Rate (Female)", "Suicide Rate (Male)",
    "Literacy Rate (Female)", "Literacy Rate (Male)", "Literacy Rate (Total)",
    "Economic Rights Score", "Unemployment Rate", "Primary School Enrollment",
    "Clean Fuel Access", "Government Effectiveness", "Political Stability", "Legal Rights Index"
]

# Converting the numerical columns
for col in columns_to_convert:
    data[col] = pd.to_numeric(data[col], errors='coerce')
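
The World Bank extract marks missing values with `..`, which is why these columns are read as `object`. A small sketch of what `errors='coerce'` does with such a column, and of how one might check which raw strings were turned into `NaN` (the series `raw` is toy data):

```python
import pandas as pd

raw = pd.Series(["3.9", "5", "..", "2.8"], name="Suicide Rate (Total)")

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
converted = pd.to_numeric(raw, errors="coerce")
print(converted)

# Values that were present in the raw column but became NaN after conversion.
# Ideally this contains only the ".." placeholder.
coerced = raw[converted.isna() & raw.notna()]
print(coerced.value_counts())
```

Running the same check on the real columns would show whether anything other than `..` was lost in the conversion.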

In [30]:
# Missing value analysis
def missing_value_summary(dataframe):
    """
    Generates a summary of missing values for the DataFrame.
    Returns a DataFrame showing the total missing values and percentage of missing values for each column.
    """
    # Total missing value count
    missing_counts = dataframe.isna().sum()
    # Percentage of missing values relative to the total number of rows
    missing_percent = (missing_counts / len(dataframe)) * 100
    # DataFrame for the results
    missing_summary = pd.DataFrame({
        "Missing Values": missing_counts,
        "Percentage (%)": missing_percent
    })
    # Exclude columns with no missing values and sort by missing values
    missing_summary = missing_summary[missing_summary["Missing Values"] > 0].sort_values(by="Missing Values", ascending=False)

    return missing_summary

missing_summary = missing_value_summary(data)

# Table for missing values
missing_summary


Unnamed: 0,Missing Values,Percentage (%)
Human Capital Index (Male),1568,81.243523
Human Capital Index (Female),1567,81.19171
Literacy Rate (Female),1556,80.621762
Literacy Rate (Male),1556,80.621762
Literacy Rate (Total),1556,80.621762
Human Capital Index,1513,78.393782
Legal Rights Index,635,32.901554
Primary School Enrollment,324,16.787565
Economic Rights Score,295,15.284974
Inflation (Annual %),157,8.134715


In [31]:
# Filling missing values with the average value for each country, if they have any non-missing value for the variable.
def fill_missing_values_by_country_mean(dataframe, column, country_column, year_column):
    """
    Fill missing values for a given column using the country-specific mean of the available data.
    If no data is available for a country, the function skips filling.
    """
    for country in dataframe[country_column].unique():
        # Filter data for the specific country
        country_data = dataframe[dataframe[country_column] == country]
        # Check if the column has at least one non-missing value for the country
        if not country_data[column].isna().all():
            mean_value = country_data[column].mean()
            # Fill missing values for that country with the mean value
            dataframe.loc[
                (dataframe[country_column] == country) & (dataframe[column].isna()),
                column
            ] = mean_value
        # else:
            # This branch could be used to fill the missing values for countries that have no non-missing value for the variable.

# Define which columns to fill
columns_to_fill = [
    "GDP (Current US$)", "GDP per Capita (2010 US$)", "Inflation (Annual %)",
    "Human Capital Index", "Human Capital Index (Male)", "Human Capital Index (Female)",
    "Suicide Rate (Total)", "Suicide Rate (Female)", "Suicide Rate (Male)",
    "Literacy Rate (Female)", "Literacy Rate (Male)", "Literacy Rate (Total)",
    "Economic Rights Score", "Unemployment Rate", "Primary School Enrollment",
    "Clean Fuel Access", "Government Effectiveness", "Political Stability", "Legal Rights Index"
]

# Fill missing values
for col in columns_to_fill:
    fill_missing_values_by_country_mean(data, col, country_column="Country Name", year_column="Time")

data.head()

Unnamed: 0,Time,Country Name,Female Population,Male Population,Total Population,GDP (Current US$),GDP per Capita (2010 US$),Inflation (Annual %),Human Capital Index,Human Capital Index (Male),Human Capital Index (Female),Suicide Rate (Total),Suicide Rate (Female),Suicide Rate (Male),Literacy Rate (Female),Literacy Rate (Male),Literacy Rate (Total),Primary School Enrollment,Unemployment Rate,Economic Rights Score,Legal Rights Index,Voice and Accountability,Clean Fuel Access,Government Effectiveness,Political Stability
0,2014,Afghanistan,16172321.0,16543889.0,32716210.0,20497130000.0,576.487817,4.673996,0.391245,0.400011,0.363431,3.9,3.6,4.2,17.017839,45.417099,31.448851,109.115517,7.91,1.943449,9.0,-1.13544,25.7,-1.359305,-2.411068
1,2014,Albania,1439863.0,1449241.0,2889104.0,13228150000.0,3855.760744,1.625865,0.597756,0.576212,0.621578,5.0,3.2,6.6,95.913776,98.180386,97.046135,114.04332,18.05,2.340072,6.0,0.143777,74.4,-0.048918,0.485986
2,2014,Algeria,19004433.0,19755734.0,38760168.0,238942700000.0,4687.288575,2.916927,0.528759,0.512211,0.546662,2.8,2.0,3.5,75.322968,87.422958,81.407837,111.686699,10.21,2.312768,2.0,-0.813358,99.5,-0.339202,-1.190535
3,2014,Andorra,35473.0,36148.0,71621.0,3271686000.0,38402.649261,,,,,,,,,,,88.235291,,,,1.165965,100.0,1.712283,1.286593
4,2014,Angola,13746371.0,13381965.0,27128337.0,135966800000.0,3304.681148,7.280387,0.360595,0.365902,0.355888,6.4,2.4,10.4,53.407211,79.974152,66.030113,112.516214,16.317,1.565973,1.0,-1.14523,46.7,-1.055904,-0.333232
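
The country-mean fill above loops over every country for every column. As a sketch under the assumption that the same result is wanted, the idea can also be written with `groupby(...).transform("mean")`, shown here on toy data (countries `A` and `B` and their values are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B"],
    "Time": [2010, 2011, 2012, 2010, 2011],
    "Literacy Rate (Total)": [80.0, None, 90.0, None, None],
})

# Per-country mean, broadcast back to every row of that country.
# mean() skips NaN by default; a country with no observations gets NaN,
# so its rows stay unfilled, as in the loop version.
country_mean = toy.groupby("Country Name")["Literacy Rate (Total)"].transform("mean")
toy["Literacy Rate (Total)"] = toy["Literacy Rate (Total)"].fillna(country_mean)
print(toy)
```

Note that either version replaces a missing year with the country's average over all available years, so any trend over time within a country is flattened for the filled values.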


In [32]:
# Let's see the variables that still have missing data.
missing_summary = missing_value_summary(data)

# Table for missing values
missing_summary

Unnamed: 0,Missing Values,Percentage (%)
Literacy Rate (Female),550,28.497409
Literacy Rate (Male),550,28.497409
Literacy Rate (Total),550,28.497409
Human Capital Index (Male),400,20.725389
Human Capital Index (Female),400,20.725389
Human Capital Index,300,15.544041
Unemployment Rate,150,7.772021
Economic Rights Score,110,5.699482
Suicide Rate (Male),100,5.181347
Inflation (Annual %),100,5.181347


**Adding continents based on the "Country Name" column.**

In [33]:
# Continent mapping
continent_mapping = {
    "Africa": [
        "Algeria", "Angola", "Benin", "Botswana", "Burkina Faso", "Burundi", "Cabo Verde",
        "Cameroon", "Central African Republic", "Chad", "Comoros", "Congo, Dem. Rep.",
        "Congo, Rep.", "Djibouti", "Egypt, Arab Rep.", "Equatorial Guinea", "Eritrea",
        "Eswatini", "Ethiopia", "Gabon", "Gambia, The", "Ghana", "Guinea", "Guinea-Bissau",
        "Kenya", "Lesotho", "Liberia", "Libya", "Madagascar", "Malawi", "Mali",
        "Mauritania", "Mauritius", "Morocco", "Mozambique", "Namibia", "Niger",
        "Nigeria", "Rwanda", "Senegal", "Seychelles", "Sierra Leone", "Somalia", "South Africa",
        "South Sudan", "Sudan", "Tanzania", "Togo", "Tunisia", "Uganda", "Zambia", "Zimbabwe",
        "Cote d'Ivoire", "Sao Tome and Principe"
    ],
    "Asia": [
        "Afghanistan", "Armenia", "Azerbaijan", "Bahrain", "Bangladesh", "Bhutan", "Brunei Darussalam",
        "Cambodia", "China", "Cyprus", "Georgia", "India", "Indonesia", "Iran, Islamic Rep.",
        "Iraq", "Israel", "Japan", "Jordan", "Kazakhstan", "Kuwait", "Kyrgyz Republic", "Lao PDR",
        "Lebanon", "Malaysia", "Maldives", "Mongolia", "Myanmar", "Nepal", "Oman", "Pakistan",
        "Philippines", "Qatar", "Saudi Arabia", "Singapore", "Sri Lanka", "Syrian Arab Republic",
        "Tajikistan", "Thailand", "Turkmenistan", "United Arab Emirates", "Uzbekistan", "Viet Nam",
        "West Bank and Gaza", "Yemen, Rep.", "Korea, Rep.", "Hong Kong SAR, China", "Macao SAR, China",
        "Timor-Leste"
    ],
    "Europe": [
        "Albania", "Andorra", "Austria", "Belarus", "Belgium", "Bosnia and Herzegovina", "Bulgaria",
        "Croatia", "Czechia", "Denmark", "Estonia", "Faroe Islands", "Finland", "France", "Germany",
        "Gibraltar", "Greece", "Hungary", "Iceland", "Ireland", "Italy", "Kosovo", "Latvia",
        "Liechtenstein", "Lithuania", "Luxembourg", "Malta", "Moldova", "Monaco", "Montenegro",
        "Netherlands", "North Macedonia", "Norway", "Poland", "Portugal", "Romania", "Russian Federation",
        "San Marino", "Serbia", "Slovak Republic", "Slovenia", "Spain", "Sweden", "Switzerland",
        "Turkiye", "Ukraine", "United Kingdom", "Isle of Man"
    ],
    "North America": [
        "Antigua and Barbuda", "Bahamas, The", "Barbados", "Belize", "Bermuda", "Canada", "Costa Rica",
        "Cuba", "Dominica", "Dominican Republic", "El Salvador", "Grenada", "Guatemala", "Haiti",
        "Honduras", "Jamaica", "Mexico", "Nicaragua", "Panama", "Puerto Rico", "St. Kitts and Nevis",
        "St. Lucia", "St. Vincent and the Grenadines", "Trinidad and Tobago", "United States",
        "Virgin Islands (U.S.)", "Cayman Islands", "British Virgin Islands", "St. Martin (French part)",
        "Greenland", "Sint Maarten (Dutch part)", "Turks and Caicos Islands"
    ],
    "South America": [
        "Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Guyana", "Paraguay",
        "Peru", "Suriname", "Uruguay", "Venezuela, RB", "Curacao", "Aruba"
    ],
    "Oceania": [
        "Australia", "Fiji", "Kiribati", "Marshall Islands", "Micronesia, Fed. Sts.", "Nauru",
        "New Zealand", "Palau", "Papua New Guinea", "Samoa", "Solomon Islands", "Tonga", "Tuvalu",
        "Vanuatu", "American Samoa", "New Caledonia", "French Polynesia", "Guam", "Northern Mariana Islands"
    ]
}

# Flatten the continent mapping for easier access
country_to_continent = {
    country: continent
    for continent, countries in continent_mapping.items()
    for country in countries
}

# Map countries to continents
data['Continent'] = data['Country Name'].map(country_to_continent)
data[['Country Name', 'Continent']].head()

Unnamed: 0,Country Name,Continent
0,Afghanistan,Asia
1,Albania,Europe
2,Algeria,Africa
3,Andorra,Europe
4,Angola,Africa
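
`Series.map` returns `NaN` for any country name that is not a key in the dictionary, so a misspelled or missing name would go unnoticed in the first five rows. A minimal sketch of a check for this, using a small stand-in mapping and a deliberately unknown name (`Atlantis` is hypothetical):

```python
import pandas as pd

# Small stand-in for the mapping and data above (hypothetical values).
country_to_continent = {"Albania": "Europe", "Algeria": "Africa"}
toy = pd.DataFrame({"Country Name": ["Albania", "Algeria", "Atlantis"]})

toy["Continent"] = toy["Country Name"].map(country_to_continent)

# Names missing from the mapping come back as NaN from Series.map.
unmapped = sorted(toy.loc[toy["Continent"].isna(), "Country Name"].unique())
print(unmapped)
```

Applied to the real `data`, an empty list would confirm that every country received a continent.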


## Analysis and Predictions



## Sagar's Code

## Betül's Code

## Conclusion


In this section: 
Derive answers to the question from your analysis

Identify limitations of your analysis

How reliable are your answers? 
