
# Project: Investigate a Dataset -  Suicide Data Analysis 
***Suicide Rates per Country from 1990 to 2016 and GDP, gini-index and cell phone use.*** 

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

Suicide is a global phenomenon and occurs throughout the lifespan. Close to 800,000 people die by suicide every year, which is one person every 40 seconds (source: WHO). For this project, I analyze global suicide data from 1990 to 2016. In particular, I am interested in finding trends among the countries with the most and the fewest suicides per 100,000 people, and in how those countries differ in GDP, Gini index (inequality index) and cell phone use.

### Data used

For my analysis I chose datasets from Gapminder World (https://www.gapminder.org/data/).

- Population Dataset (pop_df) -> <i>number of people</i>
- Internet Users Dataset (inet_df) -> <i>number of internet users</i>
- Cell Phone Usage Dataset (cell_df) -> <i>number of subscriptions</i>
- Suicide Dataset (suic_df) -> <i>suicide total deaths</i>
- GDP Dataset (gdp_df) -> <i>GDP per capita</i>
- Gini Index (gini_df) -> <i>(inequality index)</i>


### Questions for Analysis
<b>1. SUICIDE TREND AND GDP: Is GDP associated with suicide rate per country?</b><br />
        - what is the overall suicide trend globally?<br />
        - what is the overall suicide trend for top 10 countries\*?<br/>
        - what is the overall suicide trend for bottom 10 countries\*?<br/>
        - what is GDP (gross domestic product) in top or bottom 10 countries?<br/>
        - which countries are in top 10?<br />
        - which countries are in bottom 10?<br />

<b>2. SUICIDE TREND AND GINI INDEX: Is Gini Index associated with suicide rate per country?</b><br />
        - is gini index associated with suicide rate?
        
\*Top and bottom 10 countries are defined by the highest/lowest number of suicides per capita (per 100,000 people).
        
*Gross domestic product is a monetary measure of the market value of all the final goods and services produced in a specific time period. Source: wikipedia*

*In economics, the Gini coefficient, sometimes called the Gini index or Gini ratio, is a measure of statistical dispersion intended to represent the income inequality or wealth inequality within a nation or any other group of people. A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of one (or 100%) expresses maximal inequality among values (e.g., for a large number of people where only one person has all the income or consumption and all others have none, the Gini coefficient will be nearly one) Source: wikipedia*

### Description for investigation
To investigate these questions, I did the following:
- Download the data from Gapminder World.
- Examine dataset and handle missing values.
- Merge and reshape the data.
- Calculate suicide per 100,000 people and cell phone use per 100,000 people.
- Check the trends and correlations between variables.
- Group countries in two categories for comparison - top 10 countries with most suicide per capita and bottom 10 countries with least suicide per capita.
- Check summary statistics.
- Plot histogram for GDP in those two groups.
- Explore suicide trends in those two groups.
- Group Gini index into 3 categories in order to see which group has more suicide per capita.


##### Import dependencies

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
# Apply the default seaborn theme and suppress warnings
sns.set_theme()
import warnings
warnings.filterwarnings('ignore')

<a id='wrangling'></a>
## Data Wrangling

### General Properties

##### Import datasets

In [None]:
pop_df = pd.read_csv("Data/population_total.csv")
inet_df = pd.read_csv("Data/net_users_num.csv")
cell_df = pd.read_csv("Data/cell_phones_total.csv")
suic_df = pd.read_csv("Data/suicide_total_deaths.csv")
gini_df = pd.read_csv("Data/inequality_index_gini.csv")
gdp_df = pd.read_csv("Data/gdppercapita.csv")

##### Check imported datasets

In [None]:
# pop_df.head(3)
# gdp_df.head(3)
# inet_df.head(3)
# cell_df.head(3)
# suic_df.head(3)
# gini_df.head(3)

##### Filtering dataset using iloc and numpy

In [None]:
pop_df = pop_df.iloc[:, np.r_[:1, 191:218]]
inet_df = inet_df.iloc[:, :28]
cell_df = cell_df.iloc[:, np.r_[:1, 31:58]]
gini_df = gini_df.iloc[:, np.r_[:1, 24:51]]
gdp_df = gdp_df.iloc[:, np.r_[:1, 191:218]]

*The suicide dataset contains data from 1990 to 2016; therefore I restricted the analysis to the years 1990 to 2016 (inclusive). I kept these columns using the `iloc` indexer together with `numpy.r_`.*
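
The positional slices above depend on the exact column order of each CSV file. As a hedged alternative, the same year range could be selected by column label. This is a minimal sketch on a made-up table (`wide_df` and its values are hypothetical); it assumes the year columns are stored as strings, as the `'1990'` lookup later in this notebook suggests.

```python
import pandas as pd

# Toy wide-format table; the values are made up for illustration only.
wide_df = pd.DataFrame({
    "country": ["A", "B"],
    "1989": [1.0, 2.0],
    "1990": [1.1, 2.1],
    "1991": [1.2, 2.2],
    "2016": [1.9, 2.9],
    "2017": [2.0, 3.0],
})

# Year labels for 1990 to 2016 inclusive, as strings.
years = [str(y) for y in range(1990, 2017)]

# Keep 'country' plus whichever of the wanted years are present in the table.
keep = ["country"] + [c for c in wide_df.columns if c in years]
subset = wide_df[keep]
print(subset.columns.tolist())
```

Selecting by label keeps working if a file gains or loses columns, which positional indices do not guarantee.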

##### Handling null values

In [None]:
# Check null values in each dataset
#suic_df.isnull().sum()
#pop_df.isnull().sum()
#gdp_df.isnull().sum()
#cell_df.isnull().sum()
#inet_df.isnull().sum()

*After examining the cell phone users dataset I noticed a positive trend for all countries; therefore I decided to use the `ffill` method (forward fill) instead of filling with the mean or 0. A better way to fill these null values might be to interpolate between the known values on either side of a gap.
I filled null values in the Gini index dataset with each country's mean value across all years.*
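
The idea of filling a gap with values between its neighbours corresponds to linear interpolation. The sketch below is not what this notebook does (it uses `ffill`); it only illustrates the alternative on a made-up table (`toy` is hypothetical) using pandas' `DataFrame.interpolate` along rows.

```python
import numpy as np
import pandas as pd

# Toy table: one row per country, one column per year (made-up numbers).
toy = pd.DataFrame(
    {
        "1990": [10.0, 5.0],
        "1991": [np.nan, np.nan],
        "1992": [30.0, np.nan],
        "1993": [40.0, 8.0],
    },
    index=["A", "B"],
)

# Linear interpolation along each row (axis=1) fills gaps between known values,
# treating the year columns as equally spaced.
filled = toy.interpolate(method="linear", axis=1)
print(filled)
```

For a steadily growing series such as cell phone subscriptions, this avoids the flat steps that forward filling produces.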

In [None]:
# Fill the first year column [1990] with 0 if empty.
# (Otherwise the row-wise ffill below would copy the country name into an empty 1990 cell.)
inet_df['1990'] = inet_df['1990'].fillna(0)
cell_df['1990'] = cell_df['1990'].fillna(0)

In [None]:
# Forward-fill null values along each row for internet and cell phone data
inet_df = inet_df.ffill(axis=1)
cell_df = cell_df.ffill(axis=1)

In [None]:
# In order to fill null values across columns (mean for the rows) I took the following steps: 
# 1. set new index to country column:
gini_df.set_index(['country'], inplace = True)

In [None]:
# 2. using lambda function to fill null values across rows
gini_df = gini_df.apply(lambda row: row.fillna(row.mean()), axis=1)

In [None]:
# 3. reset index (to be consistent with other dataframes)
gini_df = gini_df.reset_index()

In [None]:
# 4. Check the dataframe
gini_df.head(3)

##### Unpivot dataframes

In [None]:
pop_df = pop_df.melt(id_vars = 'country', var_name = 'year', value_name = 'population_total', ignore_index=True)
inet_df = inet_df.melt(id_vars = 'country', var_name = 'year', value_name = 'internet_use', ignore_index=True)
cell_df = cell_df.melt(id_vars = 'country', var_name = 'year', value_name = 'cell_use', ignore_index=True)
suic_df = suic_df.melt(id_vars = 'country', var_name = 'year', value_name = 'suicide_total', ignore_index=True)
gini_df = gini_df.melt(id_vars = 'country', var_name = 'year', value_name = 'gini_index', ignore_index=True)
gdp_df = gdp_df.melt(id_vars = 'country', var_name = 'year', value_name = 'GDP', ignore_index=True)

*In order to compare different indicators I reshaped data with `melt` method.*
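
To make the reshaping concrete, here is a small sketch of what `melt` does to a wide table. The table and its values are made up (`wide` and `long_df` are hypothetical names).

```python
import pandas as pd

# Wide format: one row per country, one column per year.
wide = pd.DataFrame({"country": ["A", "B"], "1990": [1, 3], "1991": [2, 4]})

# Long format: one row per (country, year) pair.
long_df = wide.melt(id_vars="country", var_name="year", value_name="value")
print(long_df)
```

Each year column becomes a set of rows, so tables for different indicators share the same `country` and `year` keys and can be merged on them.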

##### Merge datasets

In [None]:
# Merge datasets two by two - please see comments below
merged01 = pd.merge(pop_df, suic_df, how="inner", on=["country", "year"])
merged02 = pd.merge(cell_df, inet_df, how="inner", on=["country", "year"])
merged03 = pd.merge(gdp_df, gini_df, how="left", on=["country", "year"])
merged04 = pd.merge(merged01, merged02, how="inner", on=["country", "year"])

# Final merge
merged_df = pd.merge(merged03, merged04, how="inner", on=["country", "year"])

In [None]:
# Check merged dataset
merged_df.head(3)

*Some countries don't have data for the gini index, but I would still like to include them in my analysis. 
I will keep those null values and later create new dataframe for gini index analysis.*
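
A left merge keeps every row of the left table even when the right table has no match. The sketch below uses made-up tables (`gdp`, `gini` and their values are hypothetical) and the `indicator=True` option of `pd.merge` to show which rows came through without Gini data.

```python
import pandas as pd

gdp = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [1990, 1991, 1990],
    "GDP": [100, 110, 200],
})
gini = pd.DataFrame({
    "country": ["A", "A"],
    "year": [1990, 1991],
    "gini_index": [30.0, 31.0],
})

# how="left" keeps country B although it has no Gini data;
# indicator=True adds a "_merge" column describing where each row came from.
merged = pd.merge(gdp, gini, how="left", on=["country", "year"], indicator=True)

# Countries that have no matching Gini rows.
missing = merged.loc[merged["_merge"] == "left_only", "country"].unique().tolist()
print(missing)
```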

##### Checking dataset (shape, null values, datatypes and duplicates) after merging

In [None]:
# Shape
merged_df.shape

In [None]:
# Duplicate rows
merged_df.duplicated().sum()

In [None]:
# Missing values - decide what to do with null values
merged_df.isnull().sum()

In [None]:
# Check individual datatypes - convert year to int
merged_df.dtypes

### Data Cleaning

#### Changing datatypes

In [None]:
# Change data types: internet use to float, year and cell use to integer
merged_df['internet_use'] = merged_df['internet_use'].astype(float)
merged_df['year'] = merged_df['year'].astype(int)
merged_df['cell_use'] = merged_df['cell_use'].astype(int)

In [None]:
# Round suicide & change to integer (suicide was in float)
merged_df['suicide_total'] = merged_df.suicide_total.round()
merged_df['suicide_total'] = merged_df['suicide_total'].astype(int)

In [None]:
# Check datatypes
merged_df.dtypes

#### Feature engineering

In [None]:
# Calculate proportions of internet users, cell users, and the number of suicides % per country and year.
merged_df['internet_use_%'] = merged_df.internet_use/merged_df.population_total*100
merged_df['cell_use_%'] = merged_df.cell_use/merged_df.population_total*100
merged_df['suicide_total_%'] = merged_df.suicide_total/merged_df.population_total*100

In [None]:
# Calculate proportions per capita (per 100,000 people) for internet users, cell users, and the number of suicides.
merged_df['internet_use_pc'] = merged_df.internet_use/merged_df.population_total*100000
merged_df['cell_use_pc'] = merged_df.cell_use/merged_df.population_total*100000
merged_df['suicide_total_pc'] = merged_df.suicide_total/merged_df.population_total*100000

In [None]:
# Check newly created columns in whole dataframe.
merged_df.head(5)

In [None]:
# Display columns as a list for faster reordering (copy-paste)
merged_df.columns

In [None]:
# Reposition the columns for easier slicing
merged_df = merged_df[['country', 'year', 'GDP','gini_index', 'population_total', 'cell_use',
       'internet_use', 'suicide_total',  'internet_use_%', 'cell_use_%',
       'suicide_total_%', 'internet_use_pc', 'cell_use_pc',
       'suicide_total_pc']]

In [None]:
# Check which rows have null values for gini index:
gini_null = merged_df[merged_df.gini_index.isnull()]
gini_null.head(3)

In [None]:
# List the countries without data for gini index
gini_null.country.value_counts()

<a id='eda'></a>
## Exploratory Data Analysis

### Research Question 1  - Is GDP associated with suicide rate per country?

*In this analysis I use the columns expressed per 100,000 people. I used the `iloc` indexer to keep only the per-capita columns created during feature engineering, together with country, year and GDP.*
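
As a worked example of the per-100,000 rate, the sketch below computes it for two made-up rows and then keeps the wanted columns by name rather than by position. All numbers and the name `toy` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "B"],
    "year": [2000, 2000],
    "population_total": [2_000_000, 500_000],
    "suicide_total": [300, 10],
})

# Rate per 100,000 people = count / population * 100,000
toy["suicide_total_pc"] = toy["suicide_total"] / toy["population_total"] * 100_000

# Selecting by column name avoids relying on column positions.
pc = toy[["country", "year", "suicide_total_pc"]]
print(pc)
```

Country A has 300 deaths in 2 million people, which is about 15 per 100,000. Country B has 10 deaths in 500,000 people, about 2 per 100,000.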

In [None]:
# Create df with columns per capita only
pc_df = merged_df.iloc[:, np.r_[:3, 11:14]]
pc_df.head(3)

### For all countries

#### A FEW INVESTIGATIVE QUESTIONS

In [None]:
# Country and year with most suicides per capita - used idxmax to find the index label of that row:
most_suicides_pc = pc_df.suicide_total_pc.idxmax()
most_suicides_pc

In [None]:
# Used loc to display the row at the index label returned by idxmax (instead of a hard-coded position).
most_suicides_pc = pc_df.loc[pc_df.suicide_total_pc.idxmax()]
most_suicides_pc

In [None]:
# Find a country and year with most suicides per capita - this is another way to get the same answer as cells above.
most_suicides_pc1 = pc_df.groupby(['country','year'])['suicide_total_pc'].mean().idxmax()
most_suicides_pc1

In [None]:
# Year with most suicides per capita (highest mean rate across countries).
worst_year = pc_df.groupby('year')['suicide_total_pc'].mean().idxmax()
worst_year

In [None]:
# Country and year with least suicides per capita.
least_country_year = pc_df.groupby(['country','year'])['suicide_total_pc'].mean().idxmin()
least_country_year

In [None]:
# Year with least suicides per capita (lowest mean rate across countries).
least_year = pc_df.groupby('year')['suicide_total_pc'].mean().idxmin()
least_year

#### HISTOGRAM

In [None]:
# Plotting histograms 
pc_df.hist(figsize=(10,8), color='#1f77b4');

*Plotting histograms for the entire dataset provides a lot of insight. GDP is skewed to the right, and the majority of countries have a GDP below 20,000. The suicide rate is also skewed to the right, with the majority of values below 15 per 100,000 people.*

#### CORRELATION

In [None]:
# Plotting correlation coefficient heat map (numeric columns only).
pearsoncorr = pc_df.corr(method='pearson', numeric_only=True)
sns.heatmap(pearsoncorr, 
            xticklabels=pearsoncorr.columns,
            yticklabels=pearsoncorr.columns,
            cmap='RdBu_r',
            annot=True,
            linewidth=0.5);

*Correlation matrix is a quick way to explore correlations between variables. From the matrix, we can see that GDP and suicide have a strong negative correlation.* 
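
For reference, a single Pearson coefficient can also be read off directly for one pair of columns. The sketch below uses made-up, perfectly linear data, so it says nothing about the real dataset; `toy` is a hypothetical name, and the `numeric_only` argument assumes a reasonably recent pandas version.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": list("ABCD"),
    "GDP": [1000, 2000, 3000, 4000],
    "suicide_total_pc": [20.0, 15.0, 10.0, 5.0],
})

# numeric_only=True skips the text column 'country' when building the matrix.
corr = toy.corr(method="pearson", numeric_only=True)

# Pearson r for one pair of columns; values near -1 or +1 indicate a strong
# linear relationship, values near 0 a weak one.
r = toy["GDP"].corr(toy["suicide_total_pc"])
print(corr)
print(r)
```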

In [None]:
# Scatter plot between GDP and suicide
pc_df.plot(x='GDP', y='suicide_total_pc', kind='scatter', c='#1f77b4');
plt.title('Correlation between suicide and GDP');

*Since my analysis focuses on the association between suicide and GDP, I plotted a scatter plot of GDP against the suicide rate. The scatter plot doesn't clearly show the correlation between those two variables. These two variables are explored further in the analysis below.*

#### SUICIDE TREND

In [None]:
# What is the overall trend in suicide globally (total count per year)
suicide_trend_all = merged_df.groupby('year')['suicide_total'].sum()
suicide_trend_all.plot(alpha=.4, color='blue', linewidth=2.5);
plt.ylabel('Total number of suicides')
plt.xlabel('Year')
plt.title('Global Suicide Trend', fontsize=14);

*From the line chart, we can see the overall trend for suicide globally. The line represents all suicides (the count), grouped by year.*

#### CELL PHONE USE

In [None]:
# What is the overall trend in cell phone use (total subscriptions per year)?
cell_trend_all = merged_df.groupby('year')['cell_use'].sum()
cell_trend_all.plot(alpha=.4, color='orange', linewidth=2.5);
plt.ylabel('Total cell phone subscriptions')
plt.xlabel('Year')
plt.title('Global Cell phone Use', fontsize=14);

In this analysis, I wanted to see whether there is any trend or association between cell phone use and suicide. For example, from 1990 onward cell phones became more and more popular, which might have had an impact on the suicide rate. Smartphones then became popular around 2010, and I wanted to see whether that shows an association as well. Unfortunately, this dataset is not suitable for this kind of research and more detailed data is needed, for example suicide data by age group, smartphone sales, etc.

### Top 10 and bottom 10 countries with most/least suicides per capita

In [None]:
# Top 10 countries from 1990 - 2016, get top 10 countries and save in a list.
suicide_top10 = pc_df.groupby('country')['suicide_total_pc'].mean().nlargest(10).index.tolist()

In [None]:
# Bottom 11 countries from 1990 - 2016 (one extra, because Kuwait is removed below), saved in a list.
suicide_bottom10 = pc_df.groupby('country')['suicide_total_pc'].mean().nsmallest(11).index.tolist()

In [None]:
# Remove Kuwait (recognized as outlier = GDP > 100,000)
suicide_bottom10.remove("Kuwait")

*Note: I queried the bottom 11 because during my analysis I found an outlier, Kuwait, whose GDP was far above the average (over 100,000). Therefore, I excluded this country from the analysis and replaced it with the next country in the bottom ranking.*
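
The outlier could also be excluded by rule rather than by name. This is a hedged sketch on made-up data: it filters on a GDP cutoff before ranking. The table `toy`, the constant `GDP_CUTOFF` and all values are hypothetical, and the 100,000 threshold simply mirrors the figure mentioned in the note above.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "GDP": [5000, 6000, 120000, 130000, 8000, 9000, 20000, 21000],
    "suicide_total_pc": [3.0, 3.2, 1.0, 1.2, 2.0, 2.2, 9.0, 9.5],
})

# Mean GDP and mean suicide rate per country over all years.
per_country = toy.groupby("country")[["GDP", "suicide_total_pc"]].mean()

# Hypothetical cutoff: drop countries whose mean GDP exceeds it.
GDP_CUTOFF = 100_000
eligible = per_country[per_country["GDP"] <= GDP_CUTOFF]

# The two countries with the lowest mean rate among the remaining ones.
bottom2 = eligible["suicide_total_pc"].nsmallest(2).index.tolist()
print(bottom2)
```

Filtering first means the ranking always returns the requested number of countries, without querying one extra and removing it afterwards.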

In [None]:
# get the data frame for top 10
top_10_countries_suicide = pc_df[pc_df['country'].isin(suicide_top10)]
top_10_countries_suicide.head(2)

In [None]:
# get the data frame for bottom 10
bottom_10_countries_suicide = pc_df[pc_df['country'].isin(suicide_bottom10)]
bottom_10_countries_suicide.head(2)

### Compare top and bottom 10 countries

#### Summary statistics & GDP comparison

In [None]:
# Top 10
top_10_countries_suicide.describe()

In [None]:
# Bottom 10
bottom_10_countries_suicide.describe()

*From the summary statistics, I can see that GDP values in the bottom 10 countries are much more spread out than in the top 10 countries. The mean for the top 10 countries is 33.93 suicides per 100,000 people, while the mean for the bottom 10 countries is 2.37 per 100,000 people.*

In [None]:
#Plot the Box & Whiskers
x_labels = ["Top 10 Countries", "Bottom 10 Countries"]
gdp_groups = [top_10_countries_suicide.GDP, bottom_10_countries_suicide.GDP]
fig, ax = plt.subplots(figsize=(8, 5))
ax.boxplot(gdp_groups, labels=x_labels)

#Ticks & Labels
plt.xticks(fontsize=12)
plt.yticks(fontsize=10)

ax.set_yticks(np.arange(0, 70000, step=10000))
ax.set_title('GDP comparison between the top and bottom 10 countries in suicides per 100,000 people',fontsize=14)
ax.set_ylabel('GDP',fontsize=12);
#ax.set_xlabel("name",fontsize=14);

*Box plots are another great way to graphically explore 5-number summary statistics. From the chart I can see that GDP values for bottom 10 countries are much more spread out than in top 10 countries.*

In [None]:
# histogram - GDP
bottom_10_countries_suicide.GDP.hist(alpha=.4, color='blue', label='bottom_10')
top_10_countries_suicide.GDP.hist(alpha=.4, color='red', label='top_10')
# legend
plt.legend()
# label and axis
plt.ylabel('GDP frequency')
plt.xlabel('GDP')
plt.title('GDP comparison between the top and bottom 10 countries in suicides', fontsize=14);

*From this histogram I can see that the top 10 countries have GDP between 5,000 and 30,000, whereas the bottom 10 countries have GDP values that are more spread out, ranging from 10,000 to 55,000.*

#### Suicide Trend

In [None]:
# Top 10 - suicide trend
trend_top10 = top_10_countries_suicide.groupby('year')['suicide_total_pc'].mean()
trend_top10.plot(alpha=.4, color='red', label='Top 10', linewidth=2.5);
# legend
plt.legend()
# label and axis
plt.ylabel('Suicide per 100,000 people')
plt.xlabel('Year')
plt.title('Suicide trend in top 10 countries', fontsize=14);

*The line chart of suicides per 100,000 people reveals an interesting pattern: there is a steep decline in suicides after the year 2000.*

In [None]:
# Bottom 10 - suicide trend
trend_bottom10 = bottom_10_countries_suicide.groupby('year')['suicide_total_pc'].mean()
trend_bottom10.plot(alpha=.4, color='blue', label='bottom_10', linewidth=2.5);
# legend
plt.legend()
# label and axis
plt.ylabel('Suicide per 100,000 people')
plt.xlabel('Year')
plt.title('Suicide trend in bottom 10 countries', fontsize=14);

*The line chart for the bottom 10 countries in suicides per 100,000 people shows a different trend. There is an increasing trend in suicides from the year 1996.*

In [None]:
# Countries with the most suicide per capita.
x_axis = top_10_countries_suicide.country
y_axis = top_10_countries_suicide.suicide_total_pc
plt.xticks(rotation = 45)
plt.bar(x_axis, y_axis, label='suicide_total_pc', alpha=.2, color='red')
# legend
#plt.legend()
# label and axis
plt.ylabel('Suicide per 100,000 people')
plt.xlabel('Country')
plt.title('Countries with the most suicides per 100,000 people', fontsize=14);

*The bar chart, grouped by country, displays the top 10 countries in suicides. From the summary statistics above, the mean number of suicides per 100,000 people for these countries is 33.93 (compared with 2.37 for the bottom 10 countries).*

In [None]:
# Check the trend for Slovenia, Russia and Hungary.
slovenia = top_10_countries_suicide.query('country == "Slovenia"')
russia = top_10_countries_suicide.query('country == "Russia"')
hungary = top_10_countries_suicide.query('country == "Hungary"')
slovenia.groupby('year')['suicide_total_pc'].mean().plot(alpha=.7, color='green', linewidth=2.5, label='Slovenia')
russia.groupby('year')['suicide_total_pc'].mean().plot(alpha=.7, color='orange', linewidth=2.5, label='Russia');
hungary.groupby('year')['suicide_total_pc'].mean().plot(alpha=.7, color='blue', linewidth=2.5, label='Hungary');
# legend
plt.legend()
# label and axis
plt.ylabel('Suicide per 100,000 people')
plt.xlabel('Year')
plt.title('Suicide trend in Slovenia, Russia and Hungary', fontsize=14);

*Unfortunately, I found that my home country, Slovenia, is one of the top 10 countries in suicides. I wanted to take a closer look at this specific country using the `query` method. From the line chart, I can observe a declining trend from 1998 until 2014, when there is a slight increase. Additionally, I plotted two more countries from this group for comparison.*

In [None]:
# Countries with the least suicide per capita.
x_axis = bottom_10_countries_suicide.country
y_axis = bottom_10_countries_suicide.suicide_total_pc
plt.xticks(rotation = 90)
plt.bar(x_axis, y_axis, label='suicide_total_pc', alpha=.2, color='blue')
# legend
#plt.legend()
# label and axis
plt.ylabel('Suicide per 100,000 people')
plt.xlabel('Country')
plt.title('Countries with the least suicides per 100,000 people', fontsize=14);

*The bar chart, grouped by country, displays the bottom 10 countries in suicides. From the summary statistics above, the mean number of suicides per 100,000 people for these countries is 2.37 (compared with 33.93 for the top 10 countries).*

### Suicide trend line chart from 2010 - 2016

In [None]:
# Create df for only 2010 - 2016
main_df_reduced = merged_df.loc[merged_df.year > 2009,:]

In [None]:
# Suicide trend (here I used the actual number of suicides, not per capita)
# What is the overall trend in suicide globally
suicide_trend_all_reduced = main_df_reduced.groupby('year')['suicide_total'].sum()
suicide_trend_all_reduced.plot(alpha=.4, color='purple', label='Global', linewidth=2.5);
# label and axis
plt.ylabel('Total number of suicides')
plt.xlabel('Year')
plt.title('Global suicide trend from 2010 - 2016', fontsize=14);

*This is a line chart of total suicides from 2010 to 2016. It is a small section of the full chart above (suicide trend from 1990 to 2016). Although the overall suicide trend seen in the chart above is declining, this slice can tell a different story if we look only at the years 2010 to 2016.*

### Research Question 2  - Is gini index associated with suicide rate per country?

In [None]:
# Check the dataset
merged_df.head()

In [None]:
# Save df into new variable and drop null values (copy so new columns can be added safely)
merged_df_gini = merged_df.dropna().copy()

In [None]:
# Summary statistics for gini-index
merged_df_gini.gini_index.describe()

In [None]:
# Create bins for gini index & labels
bin_edges = [20.7, 35, 44.6, 65.8] 
bin_names = ['lower inequality', 'medium inequality', 'higher inequality'] 
# Create new column (include_lowest=True keeps rows equal to the lowest bin edge)
merged_df_gini['gini_index_cat'] = pd.cut(merged_df_gini['gini_index'], bin_edges, labels=bin_names, include_lowest=True)
# Check df
merged_df_gini.head()

In [None]:
# Plot created gini index bins and suicide rates
gini_bins = merged_df_gini.groupby('gini_index_cat')['suicide_total_pc'].mean()
gini_bins.plot(kind='bar', alpha=.7, color='orange', linewidth=2.5);
plt.ylabel('Average suicide per 100,000 people')
plt.xlabel('Gini index categories')
plt.title('Gini index (inequality index) and suicide per capita', fontsize=14);

*For this bar chart I grouped the Gini index into 3 categories (lower inequality, medium inequality and higher inequality). From those bins, we can see that there might be a correlation between the Gini index and suicide: the average suicide rate is higher in countries with a lower inequality index (Gini index) than in countries with a higher inequality index.*
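
The binning step can be illustrated on a handful of made-up Gini values. The sketch reuses the bin edges and labels from this notebook; the series `gini` and its values are hypothetical. It also shows that `pd.cut` leaves a value equal to the lowest edge unlabelled unless `include_lowest=True` is passed.

```python
import pandas as pd

gini = pd.Series([20.7, 30.0, 35.0, 40.0, 50.0, 65.8])
bin_edges = [20.7, 35, 44.6, 65.8]
bin_names = ["lower inequality", "medium inequality", "higher inequality"]

# Default: intervals are open on the left, so 20.7 itself falls outside the first bin.
default_cats = pd.cut(gini, bin_edges, labels=bin_names)

# include_lowest=True makes the first interval include its left edge.
cats = pd.cut(gini, bin_edges, labels=bin_names, include_lowest=True)
print(pd.DataFrame({"gini": gini, "default": default_cats, "include_lowest": cats}))
```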



<a id='conclusions'></a>
## Conclusions

In this analysis, I focused on the global suicide trend from 1990 to 2016 and explored the potential association with GDP, Gini index and cell phone use.<br/>
Regarding cell phone use, I cannot draw any meaningful conclusions since the idea is beyond this dataset. I wanted to see whether there is any trend or association between cell phone use and suicide. For example, from 1990 onward cell phones became more and more popular, which might have had an impact on the suicide rate. Smartphones then became popular around 2010, and I wanted to see whether that shows an association as well. Unfortunately, this dataset is not suitable for this kind of research and more detailed data is needed, for example suicide data by age group, smartphone sales, etc.<br/>

From the line charts, I learned about the suicide trends globally, for the top 10 countries with the most suicides per capita and for the bottom 10 countries with the fewest suicides per capita. Globally, I can see an overall decreasing trend with a slight increase in 2014. This trend is similar for the top 10 countries. The bottom 10 countries' suicide rate has a positive trend, which is the opposite of the top 10 countries. The mean for the top 10 countries is 33.93 suicides per 100,000 people, while the mean for the bottom 10 countries is 2.37 per 100,000 people.

From the summary statistics, we can see that GDP values in the bottom 10 countries are much more spread out than in the top 10 countries. The histogram showed that the top 10 countries have GDP between 5,000 and 30,000, whereas the bottom 10 countries have GDP values that are more spread out, ranging from 10,000 to 55,000.

Grouping the Gini index into three categories showed that countries with a lower inequality rate have more suicide than countries with a higher inequality rate.

***Limitations***
This dataset has a few limitations. Gini index null values were filled with each country's mean Gini index across the years. The original dataset had a lot of missing values, and filling null values this way could make the results too general. A further improvement would be to add more variables to the dataset, for example suicide by age group and gender, and smartphone sales in the 2000s.

