# E.D.A.: CORRELATION BETWEEN UNEMPLOYMENT & SUICIDE RATES IN 43 COUNTRIES FROM 2000 TO 2016  
By Daniel Del Valle González  2020-2021

## STEP 1 - DATA CLEANING & NORMALISING
## STEP 2 - DATA ANALYSIS & TENDENCIES. TOP & BOTTOM COUNTRIES IN SUICIDE & UNEMPLOYMENT RATES
## STEP 3 - GRAPHIC VISUALISATION OF EACH DATAFRAME TENDENCIES

## STEP 1 - DATA CLEANING & NORMALISING

### We got 2 CSV files from:   
1 - https://www.kaggle.com/szamil/who-suicide-statistics CSV for suicides from 2000.  
2 - https://stats.oecd.org/Index.aspx?QueryId=64198# CSV for unemployment rates. I selected and added some countries that were not included by default.  

We create 2 DataFrames from them:

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio
import psutil
from utils.folders_tb import *
from utils.mining_data_tb import *
#from utils.visualization_tb import *

In [2]:
from utils.Unemployment import *
from utils.Suicide import *

## Both had to be cleaned and "synchronised" in many ways:

In [3]:
suicide = pd.read_csv("C:\\DATA_SCIENCE\\PROYECTO\\documentation\\who_suicide_statistics.csv")
unemployment = pd.read_csv("C:\\DATA_SCIENCE\\PROYECTO\\documentation\\unemployment_all_ratio.csv")

In [4]:
suic, unemp = intersector(df1=suicide, df2=unemployment, col1="country", col2="Country")      #creating dataframes with data about only their shared countries
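
The `intersector` helper lives in the project's `utils` package and is not shown here. As a rough sketch of the idea only, keeping the rows whose country appears in both tables could look like this in plain pandas. The function name `shared_countries` and the toy rows are hypothetical, not taken from the datasets.

```python
import pandas as pd

def shared_countries(df1, df2, col1, col2):
    """Hypothetical stand-in for intersector: keep rows whose country is in both frames."""
    common = set(df1[col1]) & set(df2[col2])
    return df1[df1[col1].isin(common)].copy(), df2[df2[col2].isin(common)].copy()

# Toy frames with made-up values, only to show the behaviour
suicide_demo = pd.DataFrame({"country": ["Spain", "Chile", "Albania"], "suicides_no": [10, 5, 1]})
unemployment_demo = pd.DataFrame({"Country": ["Spain", "Chile", "OECD countries"], "Value": [20.1, 8.3, 7.0]})

suic_demo, unemp_demo = shared_countries(suicide_demo, unemployment_demo, "country", "Country")
print(suic_demo)
print(unemp_demo)
```

This match is exact and case-sensitive, so any country spelled differently in the two CSVs would be dropped from both.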

### The Unemployment CSV included "employment rate" and other series. We only select the rows regarding "unemployment rate"

In [5]:
unemp = only_desired(df=unemp, col1="Series", desired="Unemployment rate")  #selecting only the "Unemployment rate" statistic values

### The Unemployment DataFrame had some groups of countries that did not appear in the Suicide Rates DataFrame, so they are discarded.

In [6]:
unemp = str_discarder(unemp, "Country", "OECD")                              #discarding groups of countries(no info in the other dataframe)
unemp = str_discarder(unemp, "Country", "Euro")
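
The two filters above (keep one series, drop country groups) are small wrappers from `utils`. A minimal sketch of how they could be written in pandas follows. It assumes that `only_desired` does an exact match and `str_discarder` a substring match. The names `keep_value` and `drop_containing` and the toy labels are hypothetical.

```python
import pandas as pd

def keep_value(df, col, desired):
    """Keep only the rows where df[col] equals the desired value."""
    return df[df[col] == desired].copy()

def drop_containing(df, col, text):
    """Drop the rows where df[col] contains the given substring."""
    return df[~df[col].str.contains(text, regex=False)].copy()

# Toy labels, made up for illustration
demo = pd.DataFrame({
    "Country": ["Spain", "OECD countries", "Euro area (19 countries)", "Chile"],
    "Series": ["Unemployment rate", "Unemployment rate", "Unemployment rate", "Employment rate"],
})

demo = keep_value(demo, "Series", "Unemployment rate")
demo = drop_containing(demo, "Country", "OECD")
demo = drop_containing(demo, "Country", "Euro")
print(demo)
```

A substring filter such as `"Euro"` would also remove any label that merely contains those letters, so the remaining countries are worth checking after this step.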

### Columns and values in both dataframes are renamed to matching names to allow easy comparison

In [7]:
unemp = column_renamer(unemp, ['SEX', 'Value', 'Time'], ['Gender', 'Unemploy_Rate', 'Year'])
suic = column_renamer(suic, ['sex'], ['gender'])

In [8]:
unemp = unemp[['Country', 'Gender', 'Age', 'Year', 'Unemploy_Rate']]      #selecting only the useful columns in the unemployment dataframe

In [9]:
column_lower(suic)                                                        #normalising columns names
column_lower(unemp)                                                                    
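
For reference, the renaming and lower-casing steps can be expressed with pandas built-ins. This is only a sketch of the same idea, not the code inside `column_renamer` or `column_lower`, and the single toy row is made up.

```python
import pandas as pd

# Toy row with the original OECD-style column names
demo = pd.DataFrame({"Country": ["Spain"], "SEX": ["MEN"], "Value": [20.1], "Time": [2012]})

# Rename selected columns, then lower-case every column name
demo = demo.rename(columns={"SEX": "Gender", "Value": "Unemploy_Rate", "Time": "Year"})
demo.columns = demo.columns.str.lower()

print(list(demo.columns))
```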

In [10]:
value_renamer(unemp, "gender", "MEN", "male")                             #normalising "gender" column formats   
value_renamer(unemp, "gender", "WOMEN", "female")

In [11]:
value_discarder(suic, "age", "14")
value_discarder(suic, "age", "75+")
value_discarder(unemp, "gender", "MW") #discarding MW values, as they are ambiguous, inaccurate aggregates

In [12]:
str_replacer(unemp, "age", " to ", "-")

In [13]:
str_cleaner(suic, "age", "years ")
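
The value clean-up in cells 10 to 13 can be sketched with pandas string methods. The age labels below (`"15 to 24"`, `"15-24 years"`) are assumptions about how the raw values look, and the code only illustrates the idea behind `value_renamer`, `value_discarder`, `str_replacer` and `str_cleaner`.

```python
import pandas as pd

# Unemployment side: recode gender, drop the combined "MW" rows, unify the age labels
unemp_demo = pd.DataFrame({
    "gender": ["MEN", "WOMEN", "MW"],
    "age": ["15 to 24", "25 to 34", "15 to 24"],
})
unemp_demo["gender"] = unemp_demo["gender"].replace({"MEN": "male", "WOMEN": "female"})
unemp_demo = unemp_demo[unemp_demo["gender"] != "MW"].copy()
unemp_demo["age"] = unemp_demo["age"].str.replace(" to ", "-", regex=False)

# Suicide side: strip the trailing " years" so both age columns share one format
suic_demo = pd.DataFrame({"age": ["15-24 years", "35-54 years"]})
suic_demo["age"] = suic_demo["age"].str.replace(" years", "", regex=False)

print(unemp_demo)
print(suic_demo)
```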

### The dataframes covered different year ranges, so we had to trim them to match:

In [14]:
unemp = unemp[unemp['year'] <= 2016]
suic = suic[suic['year'] >= 2000]
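
The limits 2000 and 2016 are hard-coded above. As an alternative sketch, the overlapping range can be derived from the data itself, which is useful if either CSV is updated later. The toy years below are made up.

```python
import pandas as pd

suic_demo = pd.DataFrame({"year": [1998, 2000, 2010, 2016]})
unemp_demo = pd.DataFrame({"year": [2000, 2010, 2016, 2019]})

# The shared range starts at the later first year and ends at the earlier last year
first_year = max(suic_demo["year"].min(), unemp_demo["year"].min())
last_year = min(suic_demo["year"].max(), unemp_demo["year"].max())

suic_demo = suic_demo[suic_demo["year"].between(first_year, last_year)]
unemp_demo = unemp_demo[unemp_demo["year"].between(first_year, last_year)]

print(first_year, last_year)
```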

### In the "age" columns the ranges were different (fewer and broader in the Suicides DataFrame, narrower in the Unemployment DataFrame). Synchronising them was not an easy task.

In [15]:
unemp = unemp[unemp['age'].isin(['15-24', '25-34', '35-44', '45-54', '55-64', '65-69', '70-74'])]


In [16]:
unemp = unemp[unemp['age'].isin(['15-24', '25-34', '35-44', '45-54', '55-64', '65-69', '70-74'])]   #discarding overlapping age ranges (some are included inside others)

unemp.loc[unemp["age"].isin(['35-44', '45-54']), 'age'] = '35-54'             #merging smaller ranges into a bigger one (shared with the suicides csv)
unemp.loc[unemp["age"].isin(['55-64', '65-69', '70-74']), 'age'] = "55-74"
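
The same filtering and merging can also be driven by a single lookup table, which keeps the kept ranges and their targets in one place. This is a sketch with toy rows. The dictionary name `age_map` is hypothetical and the `"15-64"` row stands in for any overlapping range that should be dropped.

```python
import pandas as pd

# Narrow unemployment range -> broad range shared with the suicide data
age_map = {
    "15-24": "15-24", "25-34": "25-34",
    "35-44": "35-54", "45-54": "35-54",
    "55-64": "55-74", "65-69": "55-74", "70-74": "55-74",
}

demo = pd.DataFrame({
    "age": ["15-24", "15-64", "35-44", "45-54", "70-74"],
    "unemploy_rate": [30.0, 12.0, 10.0, 9.0, 2.0],  # made-up values
})

demo = demo[demo["age"].isin(list(age_map))].copy()  # drop ranges not in the table
demo["age"] = demo["age"].map(age_map)               # relabel to the broad range

print(demo)
```

Relabelling only changes the label. When the rows are averaged later, each original narrow range counts equally in the mean of its broad range, without any population weighting.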

## STEP 2 - DATA ANALYSIS & TENDENCIES. TOP & BOTTOM COUNTRIES IN SUICIDE & UNEMPLOYMENT RATES

### A common measure of suicide in statistics is "suicides per 100k people", so we add a column containing that value for each slice of the population, given that we have the necessary info.

In [17]:
add_ratio(suic, "suic_100k", "suicides_no", "population", 100000, 2)   #creating a column with the desired value; 2 decimals are enough
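
The rate is simply `suicides_no / population * 100000`. A minimal sketch of what a helper like `add_ratio` might do is shown below. The function name `with_ratio` and the toy numbers are hypothetical.

```python
import pandas as pd

def with_ratio(df, new_col, numerator, denominator, scale, decimals):
    """Return a copy of df with new_col = numerator / denominator * scale, rounded."""
    out = df.copy()
    out[new_col] = (out[numerator] / out[denominator] * scale).round(decimals)
    return out

# Toy values: 25 cases in 1,000,000 people and 3 cases in 40,000 people
demo = pd.DataFrame({"suicides_no": [25, 3], "population": [1_000_000, 40_000]})
demo = with_ratio(demo, "suic_100k", "suicides_no", "population", 100000, 2)

print(demo)
```

If either input is missing for a row, pandas leaves the rate as `NaN` for that row.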

### We create some sub-dataframes with info centered on the distribution per country:

In [18]:
suic_countries_mean = gr_meaner(suic, "country", "suic_100k")
unemp_countries_mean = gr_meaner(unemp, "country", "unemploy_rate")

### Then we create some sub-dataframes with info centered on age: 

In [19]:
suic_ages_mean = gr_meaner(suic, "age", "suic_100k")
unemp_ages_mean = gr_meaner(unemp, "age", "unemploy_rate")

### We also create sub-dataframes for distribution in Gender: 

In [20]:
suic_genders_mean = gr_meaner(suic, "gender", "suicides_no")
unemp_genders_mean = gr_meaner(unemp, "gender", "unemploy_rate")
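
`gr_meaner` is defined in `utils`. Judging by its name and usage, it presumably groups by one column and averages another. A sketch of that behaviour with pandas `groupby` follows. The function name `group_mean`, the rounding and the toy values are assumptions.

```python
import pandas as pd

def group_mean(df, by, value):
    """Mean of `value` for each distinct entry of `by`, as a flat dataframe."""
    return df.groupby(by)[value].mean().round(2).reset_index()

# Toy values, not taken from the datasets
demo = pd.DataFrame({
    "country": ["Spain", "Spain", "Chile"],
    "suic_100k": [6.0, 8.0, 10.0],
})

by_country = group_mean(demo, "country", "suic_100k")
print(by_country)
```

The same call works for `"age"` or `"gender"` as the grouping column.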

### Suicide countries of Interest:

In [None]:
most_per_100k , least_per_100k = most_least(suic, 'country', 'suic_100k', 5)

### Unemployment countries of Interest:

In [None]:
most_unemp, least_unemp = most_least(unemp, "country", "unemploy_rate", 5)
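
As an illustration of the top and bottom selection, assuming `most_least` ranks countries by their mean value, the same result can be obtained with `nlargest` and `nsmallest`. The function name `top_and_bottom` and the placeholder countries are hypothetical.

```python
import pandas as pd

def top_and_bottom(df, by, value, n):
    """Return the n groups with the highest and the n with the lowest mean value."""
    means = df.groupby(by)[value].mean()
    return means.nlargest(n), means.nsmallest(n)

# Placeholder countries with made-up rates
demo = pd.DataFrame({
    "country": ["A", "A", "B", "C", "D"],
    "unemploy_rate": [10.0, 20.0, 5.0, 8.0, 30.0],
})

highest, lowest = top_and_bottom(demo, "country", "unemploy_rate", 2)
print(highest)
print(lowest)
```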

In [21]:
unemp_pivot_mean_gndr = pd.pivot_table(unemp, index = ['country', 'year', 'gender'], values = ['unemploy_rate']).round(2)

In [22]:
sp_un = unemp_pivot_mean_gndr.loc["Spain"]                              #unemployment evolution in Spain per gender: the rates tend to converge

In [23]:
unemp_pivot_mean = pd.pivot_table(unemp, index = ['country', 'year'], values = ['unemploy_rate']).round(2)
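
To make the shape of these pivot tables concrete, here is a small sketch on made-up rows. `pd.pivot_table` averages by default, and selecting one country with `.loc` leaves the remaining index levels (`year`, `gender`) in place.

```python
import pandas as pd

# Toy rows with made-up rates
demo = pd.DataFrame({
    "country": ["Spain"] * 4 + ["Chile"] * 2,
    "year": [2000, 2000, 2001, 2001, 2000, 2000],
    "gender": ["male", "female", "male", "female", "male", "female"],
    "unemploy_rate": [10.0, 20.0, 9.0, 17.0, 8.0, 10.0],
})

# Mean rate per country, year and gender (the default aggfunc is the mean)
by_gender = pd.pivot_table(demo, index=["country", "year", "gender"], values=["unemploy_rate"]).round(2)
spain = by_gender.loc["Spain"]  # index is now (year, gender)

# Mean rate per country and year, averaging over genders
by_year = pd.pivot_table(demo, index=["country", "year"], values=["unemploy_rate"]).round(2)

print(spain)
print(by_year)
```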

## STEP 3 - GRAPHIC VISUALISATION OF EACH DATAFRAME TENDENCIES

In [24]:
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import psutil
import plotly.io as pio
from utils.Unemployment import *
from utils.Suicide import *