<h1><center>World Death and Suicide Rates</center></h1>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set_theme(palette='tab10')
from wordcloud import WordCloud
import geopandas


In [None]:
!pip install pycountry-convert
import pycountry_convert as pc  # the pip package is "pycountry-convert", but the module name uses an underscore
from IPython.display import clear_output
clear_output()

In [None]:
# visualization

def visualize_word_counts(counts, show=True):
    wc = WordCloud(max_font_size=130, min_font_size=25, colormap='tab20', background_color='black', 
                   prefer_horizontal=.95, width=2100, height=700, random_state=0)
    cloud = wc.generate_from_frequencies(counts)
    plt.figure(figsize=(15,12))
    plt.imshow(cloud, interpolation='bilinear')
    plt.axis('off')
    if show:
        plt.show()

# Data cleaning

We first read the CSV file and look at a few rows.

In [None]:
data = pd.read_csv('../input/suicide-rates-worldwide-20002019/data.csv')
data.head()

The actual data starts at row 1 (not row 0), because row 0 holds a second header row.

We therefore create new column names from the original column names above and the labels in row 0.
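The cell below hard-codes the new names. If you would rather derive them from the file itself, a generic sketch could join each original column name with its label from row 0. It assumes row 0 holds sub-labels (such as the sex) and that pandas renamed duplicated headers with a `.1`, `.2` suffix; the helper `combine_header_rows` and the toy column names are hypothetical, not taken from the dataset.

```python
import re

import pandas as pd


def combine_header_rows(df):
    """Join each header name with the label found in row 0, then drop row 0."""
    names = []
    for col, sub in zip(df.columns, df.iloc[0]):
        base = re.sub(r"\.\d+$", "", str(col))  # undo pandas' ".1", ".2" duplicate suffixes
        label = "" if pd.isna(sub) else str(sub).strip()
        names.append(f"{base} ({label})" if label else base)
    out = df.iloc[1:].reset_index(drop=True)
    out.columns = names
    return out


# Hypothetical toy frame with a two-row header (row 0 holds the sub-labels)
raw = pd.DataFrame(
    [[None, None, "Both sexes", "Male"],
     ["A", "2019", "1.5 [1.0-2.0]", "2.5 [2.0-3.0]"]],
    columns=["Location", "Period", "Rate", "Rate.1"],
)
clean = combine_header_rows(raw)
print(list(clean.columns))  # ['Location', 'Period', 'Rate (Both sexes)', 'Rate (Male)']
```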

In [None]:
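# New column names: country, year, then (both sexes, male, female) for each measure
columns = ['Country', 'Year', 
           'ProbDyingBoth','ProbDyingMale','ProbDyingFemale',
           'SuicideBoth','SuicideMale','SuicideFemale']

# Skip row 0 (the second header row) and rebuild the frame with the new names
values = data.iloc[1:,:].values
data = pd.DataFrame(values, columns=columns)

# Drop everything from '[' onwards (the bracketed range) and keep the leading number as a float
for col in columns[2:]:
    data[col] = data[col].map(lambda x: x.split('[')[0].strip()).astype('float')

# Store the year as an integer so it is treated as numeric
data['Year'] = data['Year'].astype(int)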
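The `split('[')` step above keeps only the number in front of the bracket. Assuming every cell has the shape `estimate [lower-upper]` (which the split implies but the notebook does not state), a vectorized alternative with `Series.str.extract` can keep the bounds as well. `parse_estimates` is a hypothetical helper; cells without a bracket simply get NaN bounds.

```python
import pandas as pd

# Assumed cell shape: "estimate [lower-upper]", with the bracket optional
PATTERN = (r"^\s*(?P<estimate>[\d.]+)\s*"
           r"(?:\[\s*(?P<lower>[\d.]+)\s*-\s*(?P<upper>[\d.]+)\s*\])?\s*$")


def parse_estimates(series):
    """Split 'x [lo-hi]' strings into float columns: estimate, lower, upper."""
    return series.astype(str).str.extract(PATTERN).astype(float)


s = pd.Series(["22.4 [18.9-26.2]", "5.0 [4.1-6.3]", "7.5"])
parsed = parse_estimates(s)
print(parsed)
```

Applied to one of the measure columns, e.g. `parse_estimates(data['SuicideBoth'])` before the conversion to float, this would give three numeric columns instead of one.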

In [None]:
data['ProbDying_male_minus_female'] = data['ProbDyingMale'] - data['ProbDyingFemale']
data['Suicide_male_minus_female'] = data['SuicideMale'] - data['SuicideFemale']

In [None]:
data.sample(5)

Now it looks much better! But there is still some work to do. Some country names carry extra information, as in `Iran (Islamic Republic of)`, so we shorten them to their common names.

We also want to find the continent of each country. We'll use the `pycountry_convert` library for that purpose.
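The name correction relies on a small pandas idiom: `map` with a dictionary turns every name that is not a key into NaN, and `fillna` with the original column puts those names back. A toy check, using two of the corrections from the notebook, that this gives the same result as `Series.replace` with the same dictionary:

```python
import pandas as pd

corrections = {"Iran (Islamic Republic of)": "Iran", "Republic of Korea": "South Korea"}
names = pd.Series(["Iran (Islamic Republic of)", "Germany", "Republic of Korea"])

# map() gives NaN for names missing from the dict; fillna() restores the originals
fixed = names.map(corrections).fillna(names)

# Series.replace with a dict does the same for exact-match keys
assert fixed.equals(names.replace(corrections))
print(fixed.tolist())  # ['Iran', 'Germany', 'South Korea']
```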

In [None]:
country_names_correction = {'Bolivia (Plurinational State of)': 'Bolivia', 
                            'Iran (Islamic Republic of)': 'Iran', 
                            'Micronesia (Federated States of)': 'Micronesia', 
                            'Republic of Korea': 'South Korea', 
                            'Venezuela (Bolivarian Republic of)': 'Venezuela'}

continent_code_to_name = {'AS':'Asia', 'SA':'South America', 'NA':'North America', 
                          'EU':'Europe', 'OC':'Oceania', 'AF':'Africa'}

In [None]:
def country_to_continent(country_name):
    if country_name == 'Timor-Leste':
        return 'Asia'
    country_code = pc.country_name_to_country_alpha2(country_name)
    continent_code = pc.country_alpha2_to_continent_code(country_code)
    return continent_code_to_name[continent_code]
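`country_to_continent` special-cases `Timor-Leste` with an `if`. A sketch of a more general variant keeps such exceptions in a dictionary and falls back to a default instead of crashing on unknown names. The helper `continent_with_overrides` is hypothetical; the lookup is passed in as a function so it can wrap the two `pycountry_convert` calls used above, and it assumes the lookup raises `KeyError` for names it does not recognize (adjust the caught exception if your version behaves differently).

```python
def continent_with_overrides(country_name, lookup, overrides, default=None):
    """Return overrides[name] if present, else lookup(name), else default."""
    if country_name in overrides:
        return overrides[country_name]
    try:
        return lookup(country_name)
    except KeyError:  # assumption: unknown names raise KeyError
        return default


# Possible wiring with the notebook's objects (not run here):
# lookup = lambda n: continent_code_to_name[
#     pc.country_alpha2_to_continent_code(pc.country_name_to_country_alpha2(n))]
# data['Continent'] = data['Country'].map(
#     lambda n: continent_with_overrides(n, lookup, {'Timor-Leste': 'Asia'}))

# Toy lookup standing in for pycountry_convert
toy_lookup = lambda n: {"Germany": "Europe"}[n]
print(continent_with_overrides("Timor-Leste", toy_lookup, {"Timor-Leste": "Asia"}))  # Asia
print(continent_with_overrides("Germany", toy_lookup, {}))                          # Europe
print(continent_with_overrides("Atlantis", toy_lookup, {}))                         # None
```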

In [None]:
data['Country'] = data['Country'].map(country_names_correction).fillna(data['Country'])
data['Continent'] = data['Country'].map(country_to_continent)

In [None]:
countries = {val:df for val, df in data.groupby('Country')} # a dictionary of country names to the data
country_avg = data.groupby('Country').mean(numeric_only=True).reset_index() # a dataframe of averages by country (over the years); non-numeric columns such as Continent are skipped

In [None]:
countries['Germany'].head()

# Death and suicide rates by gender

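In the scatter plots, each point is a country, with the male rate on the x-axis and the female rate on the y-axis; points below the black line y = x are countries where the male rate is higher. It appears that in almost every country (with a few exceptions), men die more often than women. Furthermore, in every country, men commit suicide more often than women.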

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,6))

ax[0].plot([0,60],[0,60], linewidth=3, color='black')
sns.scatterplot(data=country_avg, x='ProbDyingMale', y='ProbDyingFemale', s=100, ax=ax[0])
ax[0].set_title('Dying rates by gender')

ax[1].plot([0,100],[0,100], linewidth=3, color='black')
sns.scatterplot(data=country_avg, x='SuicideMale', y='SuicideFemale', s=100, ax=ax[1])
ax[1].set_title('Suicide rates by gender')

plt.show()
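Since a country lies below the black y = x line exactly when its male rate exceeds its female rate, the claims above can also be checked numerically instead of by eye. A small sketch; the helper names are hypothetical and the toy numbers are made up for illustration.

```python
import pandas as pd


def share_male_above_female(df, male_col, female_col):
    """Fraction of countries whose male value exceeds the female value
    (the points lying below the y = x line in the scatter plot)."""
    return float((df[male_col] > df[female_col]).mean())


def female_at_least_male(df, male_col, female_col, name_col="Country"):
    """Countries on or above the y = x line (female value >= male value)."""
    return df.loc[df[female_col] >= df[male_col], name_col].tolist()


# Hypothetical toy data, not values from the dataset
toy = pd.DataFrame({"Country": ["A", "B", "C", "D"],
                    "SuicideMale": [10.0, 20.0, 5.0, 8.0],
                    "SuicideFemale": [3.0, 4.0, 6.0, 2.0]})
print(share_male_above_female(toy, "SuicideMale", "SuicideFemale"))  # 0.75
print(female_at_least_male(toy, "SuicideMale", "SuicideFemale"))     # ['C']
```

On the real data, `share_male_above_female(country_avg, 'SuicideMale', 'SuicideFemale')` should return `1.0` if men commit suicide more often in every country, and `female_at_least_male(country_avg, 'ProbDyingMale', 'ProbDyingFemale')` lists the exceptions for the dying rates.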

In [None]:
country_avg.sort_values(by='ProbDying_male_minus_female')

Sorting by the male-minus-female difference shows that

- In **Mali** (top row), women die more often than men.
- In **Belarus** (bottom row), men die far more often than women.
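The top and bottom rows of the sorted table are simply the countries with the smallest and largest male-minus-female gap, so they can also be picked directly with `idxmin` and `idxmax`. A negative gap means the female rate is higher. `gap_extremes` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def gap_extremes(df, gap_col, name_col="Country"):
    """Return (name with the smallest gap, name with the largest gap)."""
    return (df.loc[df[gap_col].idxmin(), name_col],
            df.loc[df[gap_col].idxmax(), name_col])


toy = pd.DataFrame({"Country": ["A", "B", "C"],
                    "ProbDying_male_minus_female": [-1.5, 4.0, 12.0]})
print(gap_extremes(toy, "ProbDying_male_minus_female"))  # ('A', 'C')
```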

# Plotting with geopandas

Geopandas sometimes uses different country names, so we need to correct these as well. For example, the country `Bosnia and Herzegovina` appears as `Bosnia and Herz.` in geopandas.

In [None]:
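# Note: geopandas.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this line needs geopandas < 1.0 (otherwise load a Natural Earth file directly)
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))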

In [None]:
geopandas_countryname_correction = {'Bosnia and Herz.': 'Bosnia and Herzegovina', 
                                    'Brunei': 'Brunei Darussalam', 
                                    'Central African Rep.': 'Central African Republic', 
                                    'Dem. Rep. Congo': 'Democratic Republic of the Congo', 
                                    'Dominican Rep.': 'Dominican Republic', 
                                    'Eq. Guinea': 'Equatorial Guinea', 
                                    'eSwatini': 'Eswatini', 
                                    'Laos': "Lao People's Democratic Republic", 
                                    'Macedonia': 'North Macedonia', 
                                    'Moldova': 'Republic of Moldova', 
                                    'Russia': 'Russian Federation', 
                                    'United Kingdom': 'United Kingdom of Great Britain and Northern Ireland', 
                                    'Tanzania': 'United Republic of Tanzania', 
                                    'Vietnam': 'Viet Nam', 
                                    'Syria': 'Syrian Arab Republic', 
                                    'S. Sudan': 'South Sudan', 
                                    'Solomon Is.': 'Solomon Islands', 
                                    'North Korea': "Democratic People's Republic of Korea"}

In [None]:
world['Country'] = world['name'].map(geopandas_countryname_correction).fillna(world['name'])
world_merged = world.merge(country_avg, on='Country', how='left')
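Any geopandas name that is missing from the correction dictionary silently gets NaN values in the left merge and is drawn as "Missing values". A sketch that lists the names on each side that fail to match, using `DataFrame.merge` with `indicator=True`; `unmatched_names` is a hypothetical helper and the toy frames are made up.

```python
import pandas as pd


def unmatched_names(left, right, key="Country"):
    """Names present only in `left` and only in `right`, after an outer merge."""
    merged = left[[key]].drop_duplicates().merge(
        right[[key]].drop_duplicates(), on=key, how="outer", indicator=True)
    only_left = sorted(merged.loc[merged["_merge"] == "left_only", key])
    only_right = sorted(merged.loc[merged["_merge"] == "right_only", key])
    return only_left, only_right


map_names = pd.DataFrame({"Country": ["Germany", "Bosnia and Herz.", "Antarctica"]})
data_names = pd.DataFrame({"Country": ["Germany", "Bosnia and Herzegovina"]})
only_map, only_data = unmatched_names(map_names, data_names)
print(only_map)   # ['Antarctica', 'Bosnia and Herz.']
print(only_data)  # ['Bosnia and Herzegovina']
```

Running `unmatched_names(world, country_avg)` after the correction would show which map shapes still have no data and which dataset countries never reach the map.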

In [None]:
world_merged.plot(column='ProbDyingBoth', legend=True, figsize=(15, 8),
                  missing_kwds={"color": "lightgrey", "label": "Missing values"})
plt.title('Overall dying rates for both genders')
plt.show()

In [None]:
world_merged.plot(column='SuicideBoth', legend=True, figsize=(15, 8),
                  missing_kwds={"color": "lightgrey", "label": "Missing values"})
plt.title('Overall suicide rates for both genders')
plt.show()

The two maps above show that
- In **Africa** and **South America**, people die more often, but suicide rates are comparatively low.
- In **Canada**, **Australia** and the **Nordic countries**, people die less often, but suicide rates are somewhat higher.
- Both dying and suicide rates are high in **Russia**.
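The observations above contrast countries that are high on one measure and low on the other. One rough way to make that explicit is to split each measure at its median and label every country with a quadrant; the median cut is an arbitrary choice for illustration, `median_quadrants` is a hypothetical helper, and the toy numbers are made up.

```python
import numpy as np
import pandas as pd


def median_quadrants(df, x_col, y_col):
    """Label each row 'x_level/y_level', where a level is 'high' if above the column median."""
    x_level = np.where(df[x_col] > df[x_col].median(), "high", "low")
    y_level = np.where(df[y_col] > df[y_col].median(), "high", "low")
    return pd.Series([f"{x}/{y}" for x, y in zip(x_level, y_level)], index=df.index)


toy = pd.DataFrame({"Country": ["A", "B", "C", "D"],
                    "ProbDyingBoth": [30.0, 28.0, 12.0, 10.0],
                    "SuicideBoth": [5.0, 6.0, 20.0, 18.0]})
toy["Quadrant"] = median_quadrants(toy, "ProbDyingBoth", "SuicideBoth")
print(toy[["Country", "Quadrant"]])
```

With `country_avg`, countries labelled `high/low` would be those with above-median dying rates but below-median suicide rates, and `low/high` the reverse.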

In [None]:
visualize_word_counts(country_avg.set_index('Country')['ProbDyingBoth'].to_dict(), show=False)
plt.title('Dying rates for both genders')
plt.show()

In [None]:
visualize_word_counts(country_avg.set_index('Country')['SuicideBoth'].to_dict(), show=False)
plt.title('Suicide rates for both genders')
plt.show()

# Yearly changes

Let's look at the top 10 countries with the highest overall death and suicide rates. Are the rates increasing, decreasing or constant?

In [None]:
top10_death = country_avg.sort_values('ProbDyingBoth', ascending=False).head(10)
top10_death

In [None]:
top10_suicide = country_avg.sort_values('SuicideBoth', ascending=False).head(10)
top10_suicide

In [None]:
data_top10_death = data[data.Country.isin(top10_death['Country'])].iloc[::-1,:] # the iloc part reverses the dataframe, so 2000 comes before 2019

plt.figure(figsize=(15,6))
sns.lineplot(data=data_top10_death, x='Year', y='ProbDyingBoth', hue='Country')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
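The `.iloc[::-1,:]` trick above relies on the rows arriving in reverse chronological order. A sketch that instead picks the top countries with `DataFrame.nlargest` and sorts each country's rows by year explicitly, so the plotted lines do not depend on the file's row order. `top_n_chronological` is a hypothetical helper; `nlargest(n, col)` selects the same rows as `sort_values(col, ascending=False).head(n)`, up to how ties are broken.

```python
import pandas as pd


def top_n_chronological(data, avg, value_col, n=10, key="Country", time_col="Year"):
    """Rows of `data` for the n countries with the largest average value, sorted by year."""
    top = avg.nlargest(n, value_col)[key]
    subset = data[data[key].isin(top)].copy()
    subset[time_col] = pd.to_numeric(subset[time_col])  # works whether years are str or int
    return subset.sort_values([key, time_col]).reset_index(drop=True)


# Toy data stored newest-first, like the original file
toy_data = pd.DataFrame({"Country": ["A", "A", "B", "B", "C", "C"],
                         "Year": ["2001", "2000", "2001", "2000", "2001", "2000"],
                         "ProbDyingBoth": [1, 2, 3, 4, 5, 6]})
toy_avg = pd.DataFrame({"Country": ["A", "B", "C"], "ProbDyingBoth": [1.5, 3.5, 5.5]})
result = top_n_chronological(toy_data, toy_avg, "ProbDyingBoth", n=2)
print(result)
```

The result can be passed to `sns.lineplot(data=..., x='Year', y='ProbDyingBoth', hue='Country')` in place of `data_top10_death`.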

There is no common trend across the top 10 countries. 

However, when we look at the countries one by one, some clear patterns emerge. 

For example, in Kiribati and Afghanistan, the death rates are decreasing. In Eswatini, the death rate peaked around 2007 and then fell rapidly.
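Statements such as "the death rates are decreasing" can be backed by a number: the slope of a least-squares straight line fitted to each country's yearly values, here with `np.polyfit`. A straight line is a simplification and will blur a peak-then-decline shape like Eswatini's. `yearly_slopes` is a hypothetical helper and the toy values are made up.

```python
import numpy as np
import pandas as pd


def yearly_slopes(data, value_col, key="Country", time_col="Year"):
    """Least-squares slope of value_col against year for each country (units per year)."""
    def slope(group):
        years = pd.to_numeric(group[time_col]).to_numpy(dtype=float)
        values = group[value_col].to_numpy(dtype=float)
        return np.polyfit(years, values, 1)[0]  # coefficient of the linear term
    return data.groupby(key)[[time_col, value_col]].apply(slope).sort_values()


toy = pd.DataFrame({"Country": ["A"] * 4 + ["B"] * 4,
                    "Year": [2000, 2001, 2002, 2003] * 2,
                    "ProbDyingBoth": [10.0, 8.0, 6.0, 4.0, 1.0, 2.0, 3.0, 4.0]})
slopes = yearly_slopes(toy, "ProbDyingBoth")
print(slopes)  # A is about -2 per year, B about +1 per year
```

Applied to `data_top10_death`, negative slopes would mark the countries whose death rates went down over 2000 to 2019.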

In [None]:
data_top10_suicide = data[data.Country.isin(top10_suicide['Country'])].iloc[::-1,:] # the iloc part reverses the dataframe, so 2000 comes before 2019

plt.figure(figsize=(15,6))
sns.lineplot(data=data_top10_suicide, x='Year', y='SuicideBoth', hue='Country')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()

Many of the countries with the highest suicide rates show a roughly constant or slightly decreasing trend over the years. 

An exception is Lesotho, where the suicide rate peaked in 2014 and has been decreasing since.
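Peak years like Lesotho's can be read off programmatically with `groupby(...).idxmax()`, which returns the row label of each country's maximum value. `peak_year` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def peak_year(data, value_col, key="Country", time_col="Year"):
    """For each country, the year (and value) at which value_col is largest."""
    idx = data.groupby(key)[value_col].idxmax()
    return data.loc[idx, [key, time_col, value_col]].set_index(key)


toy = pd.DataFrame({"Country": ["A", "A", "A", "B", "B", "B"],
                    "Year": [2013, 2014, 2015, 2013, 2014, 2015],
                    "SuicideBoth": [5.0, 9.0, 7.0, 3.0, 2.0, 1.0]})
peaks = peak_year(toy, "SuicideBoth")
print(peaks)
```

For example, `peak_year(data_top10_suicide, 'SuicideBoth')` would list the peak year of each of the top 10 countries.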