# Human Development, Gender Equality, and Suicide

Student names: Mitch Boontjes, Lloyd de Rouw, Julian

Team number: J4

In [1]:
# Load image from link
url = 'https://i0.wp.com/epthinktank.eu/wp-content/uploads/2022/06/AdobeStock_456540956.jpeg?fit=4865%2C3000&ssl=1'

# Display image from URL with smaller size and subtitle
from IPython.display import Image, display

# Set the desired image width and height
width = 600
height = 300

# Set the subtitle text
subtitle = "© European Parliamentary Research Service"

# Create an Image instance with the URL
image = Image(url=url, width=width, height=height)

# Display the image and subtitle
display(image)
print(subtitle)

© European Parliamentary Research Service


## Introduction

Democracy, access to (mental) health care, and access to education are living conditions that are commonly regarded as indicators of mental health. Many would expect improvements in these factors to lead to better mental health, yet a closer analysis may lead to surprising conclusions. This data story revolves around the correlations between human development and suicide. Human development is a broad term, so it is analysed here through three common indices and split into three perspectives, each supported by two or three visualizations. The red thread throughout the data story is the Human Development Index (HDI), an index from 0 to 1 (low to high) based on education, life expectancy, and GDP per capita (income). Perspective 1 (Awareness Perspective) examines how awareness (measured through psychiatrists per 100k inhabitants) and the HDI in general relate to national suicide rates. Perspective 2 (Gender Perspective) delves deeper into the differences between the sexes (male/female) and analyses how an increase in the HDI might affect each. In addition, the Gender Development Index (GDI) is introduced and analysed in the same way. Perspective 3 (Political Perspective) focuses on how political factors (government type, democracy, political violence) relate to national suicide rates, including the correlation between political indices and the HDI.

## Dataset and Preprocessing

### Dataset 1
Database 1 contains data on 185 countries, centred on each country's Human Development Index (HDI) as measured in 2015. The HDI is a value between 0 and 1 (low to high) that indicates human development by looking at health, education, and standard of living. The database contains every variable that feeds into the HDI, but we are only interested in the HDI values themselves. For preprocessing, therefore, all columns except the country and its HDI value are removed. https://www.kaggle.com/datasets/undp/human-development?select=human_development.csv

Database 2 has the same structure and context as the first one, but it revolves around the Gender Development Index (GDI) instead of the HDI. The GDI indicates equality in human development between men and women specifically. Unlike the HDI, it is not capped at 1: it is the ratio of the female HDI to the male HDI, so a value of 1 means parity, values below 1 indicate a higher HDI for men, and values above 1 a higher HDI for women. For countries with a GDI below 1, a higher GDI therefore means greater gender equality. As with database 1, all columns except the country and its GDI value are removed. https://www.kaggle.com/datasets/undp/human-development?select=human_development.csv

Database 3 shows the suicide rate per 100k inhabitants for 182 countries from 2000 up to 2019. This dataset is reduced to the year 2015 only, as databases 1 and 2 were measured in that year. Databases 1, 2, and 3 are then merged into Dataset 1, and the columns are renamed to more practical names. Rows that contain one or more empty values are removed. Dataset 1 is used for the Awareness Perspective. https://www.kaggle.com/datasets/sandragracenelson/suicide-rate-of-countries-per-every-year
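The preprocessing steps for Dataset 1 can be sketched in pandas as below. This is a minimal illustration, not the notebook that produced `IV DATASET 1.csv`: the three input frames use made-up values and countries "A" to "C", and the raw column names (`Human Development Index (HDI)`, `Gender Development Index (GDI)`, `2015`) are assumptions. Only the output names `HDI 2015`, `GDI 2015`, and `Average suicide 2015` are taken from the code later in this notebook.

```python
import pandas as pd

# Hypothetical stand-ins for databases 1-3 (made-up values, assumed column names).
hdi_raw = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Human Development Index (HDI)": [0.90, 0.55, 0.70],
    "Life Expectancy at Birth": [80.0, 60.0, 72.0],  # not needed, dropped below
})
gdi_raw = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Gender Development Index (GDI)": [0.99, 0.80, None],
})
suicide_raw = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2014": [12.0, 7.0, 5.0],
    "2015": [11.0, 8.0, 6.0],
})

# Keep only the country and the index value from databases 1 and 2.
hdi = hdi_raw[["Country", "Human Development Index (HDI)"]]
gdi = gdi_raw[["Country", "Gender Development Index (GDI)"]]

# Reduce database 3 to 2015, the year databases 1 and 2 were measured in.
suicide = suicide_raw[["Country", "2015"]]

# Merge on the country name (inner join), shorten the column names,
# and drop every row that still contains an empty value.
dataset_1 = (
    hdi.merge(gdi, on="Country")
       .merge(suicide, on="Country")
       .rename(columns={
           "Human Development Index (HDI)": "HDI 2015",
           "Gender Development Index (GDI)": "GDI 2015",
           "2015": "Average suicide 2015",
       })
       .dropna()
       .reset_index(drop=True)
)
print(dataset_1)
```

Country "C" disappears here because its GDI is missing, which mirrors the "remove rows with empty values" step described above.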

### Datasets 2 & 3
Database 4 contains suicide rates for 183 countries for four years (2000, 2010, 2015, 2016). Each country accounts for three rows of data: one for male suicides, one for female suicides, and one for both sexes. This database is first merged with the HDI and GDI statistics from Dataset 1. It is then separated into a female suicide dataset (Dataset 2) and a male suicide dataset (Dataset 3). This required changing the tags for 'female' and 'male' to 'FeM' and 'Male' (using str.contains() and str.replace()), because database 4 appears to include extra whitespace around the tags, which made it impossible to separate the rows by simply filtering on 'female' and 'male'. Datasets 2 and 3 are used exclusively for the Gender Perspective. https://www.kaggle.com/datasets/twinkle0705/mental-health-and-suicide-rates?select=Age-standardized+suicide+rates.csv
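A simpler way to make the split robust against stray whitespace is to normalise the tags once and compare them exactly. The sketch below assumes the whitespace is only leading or trailing (if it sits inside the word, this would not work), and the `Sex` column name and values are illustrative stand-ins rather than the real contents of database 4.

```python
import pandas as pd

# Hypothetical excerpt shaped like database 4: three rows per country, with
# whitespace padding around the 'Sex' tags (values are made up).
db4 = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Sex": [" Both sexes", " Male", " Female", "Both sexes ", "Male ", "Female "],
    "2015": [8.0, 12.0, 4.0, 6.0, 9.0, 3.0],
})

# Strip surrounding whitespace and lower-case the tags before comparing.
sex = db4["Sex"].str.strip().str.lower()

# Exact comparison avoids the 'male'-inside-'female' substring problem.
dataset_2 = db4[sex == "female"].reset_index(drop=True)  # female suicides
dataset_3 = db4[sex == "male"].reset_index(drop=True)    # male suicides
```

Because the comparison is exact, the "Both sexes" rows fall out automatically and no intermediate tags such as 'FeM' are needed.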

### Dataset 4
Database 5 is a large dataset that tracks 200 countries and information on their rulers and governance (41 columns) for each available year up to 2021. As we are interested in statistics related to human development, we select the variables government type, political violence, and the democracy indicator (1.0 for democracy, 0.0 for non-democracy). Seven countries from database 5 are renamed so that they can be merged correctly with Dataset 1. For each country, the most recent datapoint (by year) is selected, after which the year column is dropped. Finally, some columns are renamed for clarity and so that the data can be merged with Dataset 1 into the Main Dataset. https://www.kaggle.com/datasets/janzasadny/rulers-elections-and-irregular-governance
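Selecting the most recent datapoint per country can be done by sorting on the year and keeping the last row for each country. This is a sketch with made-up rows; the raw column names `country` and `year` are assumptions, while `government`, `political_violence`, and `democracy` match the column names used later with the Main Dataset.

```python
import pandas as pd

# Hypothetical excerpt of database 5 (made-up values, assumed raw column names).
db5 = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [2019, 2021, 2020, 2018, 2021],
    "government": ["Military", "Parliamentary Democracy", "Military",
                   "Monarchy", "Monarchy"],
    "political_violence": [1.2, -0.4, 0.9, 0.1, 0.3],
    "democracy": [0.0, 1.0, 0.0, 0.0, 0.0],
})

dataset_4 = (
    db5.sort_values("year", kind="mergesort")         # oldest first (stable sort)
       .drop_duplicates(subset="country", keep="last")  # most recent row per country
       .drop(columns="year")                            # year is no longer needed
       .rename(columns={"country": "Country"})          # match Dataset 1 for merging
       .sort_values("Country")
       .reset_index(drop=True)
)
print(dataset_4)
```

An alternative is `db5.loc[db5.groupby("country")["year"].idxmax()]`; the sort-and-deduplicate form is shown because it reads step by step like the description above.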

### Main Dataset
The Main Dataset merges Datasets 1 and 4, leaving a total of 140 countries with their HDI, GDI, average suicide rate in 2015, democracy status, government type, and political violence score. This dataset is used for the Political Perspective.


* Note that we refer to the unedited data sources as 'Databases 1-5', and to the edited datasets as 'Datasets 1-4' and the 'Main Dataset'.

# Imports & Installs

In [None]:
!pip install -U numpy plotly pandas matplotlib seaborn geopandas ipywidgets



## Awareness Perspective 

This perspective focuses on the relationship between living conditions, as measured by the HDI, and suicide. Our hypothesis is the following: 'Improvements in healthy living conditions and mental health awareness will lead to a decrease in suicide.'
Living conditions are measured for each country separately through its HDI, and mental health awareness is measured through the number of psychiatrists per 100k inhabitants.

### Arguments for the Awareness Perspective

All factors of the Human Development Index (education, life expectancy, GDP per capita) should contribute to fewer reasons for suicide. Access to education can help people set goals and find purpose in life, which are commonly associated with a stronger 'will to live'. Furthermore, higher life expectancies indicate healthier lifestyles, less dangerous surroundings, and better healthcare, which should result in more comfortable lives. Lastly, a higher GDP per capita indicates wealthier lives, which often come with better access to healthcare and more comfort in general.

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

df = pd.read_csv('databases/IV DATASET 1.csv')
hdi = df['HDI 2015']
average_suicide = df['Average suicide 2015']

# Convert to NumPy arrays
hdi = np.array(hdi)
average_suicide = np.array(average_suicide)

# Exclude NaN and inf values
valid_indices = np.isfinite(hdi) & np.isfinite(average_suicide)
hdi = hdi[valid_indices]
average_suicide = average_suicide[valid_indices]

# Calculate mean values
mean_hdi = np.mean(hdi)
mean_average_suicide = np.mean(average_suicide)

# Calculate the least squares regression line
A = np.vstack([hdi, np.ones(len(hdi))]).T
m, c = np.linalg.lstsq(A, average_suicide, rcond=None)[0]

# Create scatter plot
fig = go.Figure(data=go.Scatter(x=hdi, y=average_suicide, mode='markers', name='Countries'))

# Add the least squares trendline
fig.add_trace(go.Scatter(x=hdi, y=m * hdi + c, mode='lines', name='Least Squares Trendline'))

# Set labels and title
fig.update_layout(
    xaxis_title='HDI',
    yaxis_title='Average Suicide',
    title='HDI vs. Average Suicide'
)

# Show the plot
fig.show()

> *Figure 1: Human Development Index (ca. 2015) on the X-axis; average suicides per 100k inhabitants (measured in 2015) on the Y-axis. Blue dots represent the countries and their corresponding data. The least-squares trendline shows a small negative correlation between the HDI and average suicides.*

The trendline in the scatterplot above is consistent with our hypothesis that an increase in HDI goes together with a decrease in average suicides. The trendline may not look steep at first sight, but when zooming in on the lower half of the graph, the correlation becomes more evident. Starting at around 15 suicides/100k at an HDI of 0.35, the trendline ends at 8.2 suicides/100k at an HDI of 0.94, which is nearly half the starting value.
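The "nearly half" reading comes from evaluating the fitted line at both ends of the HDI range and comparing the two values. The sketch below does this on synthetic data only: the points are generated around a line chosen to pass near the endpoints quoted above, so they are far tighter than the real scatter. Pearson's r is added as one possible way to put a number on how "small" a correlation is; it is not part of the original analysis.

```python
import numpy as np

# Synthetic data only: 50 points scattered around a falling line chosen to
# pass near the endpoints quoted in the text (about 15 and 8.2).
rng = np.random.default_rng(seed=0)
hdi = np.linspace(0.35, 0.94, 50)
suicide = 19.0 - 11.5 * hdi + rng.normal(0.0, 0.5, size=hdi.size)

# First-degree least-squares fit; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(hdi, suicide, 1)

# Read the fitted rate off the line at both ends of the HDI range.
start = slope * 0.35 + intercept
end = slope * 0.94 + intercept
ratio = end / start  # a value near 0.5 means the fitted rate roughly halves

# Pearson's r measures how tightly the points follow the line (-1 to 1).
r = np.corrcoef(hdi, suicide)[0, 1]
print(f"start={start:.1f}, end={end:.1f}, ratio={ratio:.2f}, r={r:.2f}")
```

On the real data the slope and the ratio can be read off the same way, while r would be much closer to zero than in this synthetic example.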

In [None]:
import geopandas as gpd
import pandas as pd
import plotly.express as px
from ipywidgets import interact

# Step 3: Load the shapefile or GeoJSON file
shapefile_path = 'countries_map/countries.shp'
shapefile_data = gpd.read_file(shapefile_path)

# Step 4: Load the CSV data
csv_file_path = 'databases/IV DATASET 1.csv'
csv_data = pd.read_csv(csv_file_path)

# Rename countries in the shapefile so that they match the names in the CSV file
name_fixes = {
    'United States of America': 'United States',
    'Russia': 'Russian Federation',
    'Dem. Rep. Congo': 'Congo (Democratic Republic of the)',
    'Iran': 'Iran (Islamic Republic of)',
    'Tanzania': 'Tanzania (United Republic of)',
    'South Korea': 'Korea (Republic of)',
    'Venezuela': 'Venezuela (Bolivarian Republic of)',
    'Bolivia': 'Bolivia (Plurinational State of)',
    'Laos': "Lao People's Democratic Republic",
}
shapefile_data['NAME'] = shapefile_data['NAME'].replace(name_fixes)

# Step 5: Merge shapefile data with CSV data using country names
merged_data = shapefile_data.merge(csv_data, left_on='NAME', right_on='Country', how='left')

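# Step 6: Remove outliers with an IQR-style rule. Note that Q3 is taken at the
# 90th percentile rather than the usual 75th, which widens the accepted range.
# Countries without suicide data (NaN) are dropped as well, since NaN comparisons are False.
Q1 = merged_data['Average suicide 2015'].quantile(0.25)
Q3 = merged_data['Average suicide 2015'].quantile(0.90)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
filtered_data = merged_data[
    (merged_data['Average suicide 2015'] >= lower_bound) &
    (merged_data['Average suicide 2015'] <= upper_bound)
]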

# Define the function to update the map based on the selected index
def update_map(index):
    if index == 0:
        column_name = 'HDI 2015'
        title = 'HDI 2015 Worldwide'
    else:
        column_name = 'Average suicide 2015'
        title = 'Average Suicides/100k Worldwide, 2015'

    # The HDI lies between 0 and 1, but suicide rates do not:
    # a fixed (0, 1) colour range would saturate the suicide map.
    if column_name == 'HDI 2015':
        color_range = (0, 1)
    else:
        color_range = (filtered_data[column_name].min(), filtered_data[column_name].max())

    fig = px.choropleth(
        filtered_data,
        geojson=filtered_data.geometry,
        locations=filtered_data.index,
        color=column_name,
        color_continuous_scale='YlOrRd',
        range_color=color_range,
        projection="natural earth"
    )

    fig.update_layout(
        title=title,
        coloraxis_colorbar=dict(
            title=column_name,
            len=0.8,
            thickness=20,
            ypad=0,
            yanchor="top",
            ticks="outside"  # let Plotly choose tick values that fit each range
        ),
        geo=dict(
            showframe=False,
            showcoastlines=False,
            projection_type="natural earth"
        )
    )

    fig.show()

# Use the interact function to create the interactive widget
interact(update_map, index=[('HDI', 0), ('Average Suicides/100k', 1)])

> *Figure 2: World map of the HDI and the average suicide rate for countries with available data. A selector switches between the HDI map and the suicide rate map. The darker the shade of red, the higher the country's HDI or suicide rate.*

The world map suggests that countries with a higher Human Development Index (HDI) tend to have lower suicide rates. One possible explanation is the availability of education and economic opportunities within a country. Regions such as Western Europe, North America, and Oceania, which appear darker on the HDI map, are widely recognized for their advanced educational systems and strong economic opportunities. A higher HDI often indicates access to quality education and favorable economic conditions, which enable individuals to set meaningful goals in their lives. Consequently, people in these regions may be less likely to experience suicidal thoughts.

Conversely, countries that appear lighter on the HDI map, predominantly in Africa, the Middle East, and Latin America, often offer fewer economic opportunities and less access to education. A lack of economic prospects and limited access to quality education can hinder individuals in setting goals and finding purpose in their lives. This, in turn, may make it harder to find meaning and fulfillment, potentially increasing the likelihood of suicidal thoughts.

Therefore, the observed relationship between HDI and suicide rates suggests that the availability of education and economic opportunities may play an important role in individuals' well-being and mental health. Strengthening educational systems and promoting economic growth in regions with lower HDI values could potentially help reduce suicidal thoughts and improve overall quality of life.

**Overall, the relationship between HDI and suicide rates is complex. There are many other factors that contribute to this relationship, such as cultural or social aspects.**


# GENDER PERSPECTIVE

This part of the data story looks at the relationship between the Gender Development Index (GDI), the HDI, and suicide rates. Equality between male and female living conditions is generally considered a sign of a country's development and therefore falls within the scope of our analysis. The GDI captures the difference in HDI between women and men: it is the ratio of the female HDI to the male HDI. Unlike the HDI, the GDI can rise above 1.0, which indicates a higher HDI for women, whereas a score below 1.0 indicates a higher HDI for men. Although a higher GDI is generally seen as a sign of development, we expect it to affect male and female suicides very differently.
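Because the GDI is a ratio of two HDI values, its behaviour around 1.0 follows directly from the arithmetic. The helper below is a hypothetical illustration of that definition, evaluated on made-up HDI values; in this data story the GDI is taken ready-made from database 2 rather than computed.

```python
def gender_development_index(hdi_female: float, hdi_male: float) -> float:
    """Return the GDI as the ratio of female to male HDI."""
    if hdi_male <= 0:
        raise ValueError("male HDI must be positive")
    return hdi_female / hdi_male


# Made-up example values, not taken from the datasets.
men_ahead = gender_development_index(0.72, 0.80)    # below 1.0: higher HDI for men
women_ahead = gender_development_index(0.82, 0.80)  # above 1.0: higher HDI for women
print(round(men_ahead, 3), round(women_ahead, 3))
```

A country where men's HDI is 10% higher therefore lands at a GDI of about 0.9, regardless of whether its overall HDI is high or low, which is why the GDI and the HDI can move independently.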

## Arguments for the Gender Perspective

Firstly, we expect female suicides to drop slightly as the GDI increases. The reasoning is that, given a global history of better living conditions for men, a higher GDI should reflect an improvement in women's quality of life, which would in turn reduce suicides. For men, however, we expect suicides to increase: besides reflecting better living conditions for women, a higher GDI might also mean that living conditions for men are decreasing. In addition, as equality rises, men's traditional roles in society are challenged, which could lead to confusion about their purpose in life and thereby to more depressive and suicidal thoughts.

Secondly, we expect an increase in HDI to have little effect on the difference between male and female suicides, as it improves living conditions for both sexes.

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

df = pd.read_csv('databases/DATABASE 4.csv')
df1 = pd.read_csv('databases/IV DATASET 1.csv')
df1 = df1[['Country', 'GDI 2015', 'HDI 2015']]
df = pd.merge(df, df1, on='Country')

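# The 'Sex' tags contain extra whitespace, so match on the word itself.
# 'male' is also a substring of 'female', hence the explicit exclusion;
# rows for 'Both sexes' match neither mask.
is_female = df['Sex'].str.contains('female', case=False)
is_male = df['Sex'].str.contains('male', case=False) & ~is_female

female_suicides = df[is_female]
male_suicides = df[is_male]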

female_suicides_2015 = female_suicides[['Country', '2015', 'GDI 2015', 'HDI 2015']]
male_suicides_2015 = male_suicides[['Country', '2015', 'GDI 2015', 'HDI 2015']]

# Merge female and male datasets on Country
merged_data = pd.merge(female_suicides_2015, male_suicides_2015, on='Country', suffixes=('_Female', '_Male'))

# Select the relevant columns for HDI, GDI, and suicide rates
data = merged_data[['Country', 'HDI 2015_Female', 'HDI 2015_Male', 'GDI 2015_Female', 'GDI 2015_Male', '2015_Female', '2015_Male']]

# Calculate trendlines for HDI (female and male)
z_hdi_female = np.polyfit(data['HDI 2015_Female'], data['2015_Female'], 1)
p_hdi_female = np.poly1d(z_hdi_female)

z_hdi_male = np.polyfit(data['HDI 2015_Male'], data['2015_Male'], 1)
p_hdi_male = np.poly1d(z_hdi_male)

# Calculate trendlines for GDI (female and male)
z_gdi_female = np.polyfit(data['GDI 2015_Female'], data['2015_Female'], 1)
p_gdi_female = np.poly1d(z_gdi_female)

z_gdi_male = np.polyfit(data['GDI 2015_Male'], data['2015_Male'], 1)
p_gdi_male = np.poly1d(z_gdi_male)

# Create plotly scatter plots for GDI and HDI
fig = make_subplots(rows=1, cols=2, subplot_titles=['Relationship between GDI and Suicide Rate', 'Relationship between HDI and Suicide Rate'])

# Scatter plot for GDI
fig.add_trace(go.Scatter(
    x=data['GDI 2015_Female'],
    y=data['2015_Female'],
    mode='markers',
    marker=dict(color='red'),
    name='Female'
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=data['GDI 2015_Male'],
    y=data['2015_Male'],
    mode='markers',
    marker=dict(color='blue'),
    name='Male'
), row=1, col=1)

# Trendline for GDI
gdi_range = np.linspace(data['GDI 2015_Female'].min(), data['GDI 2015_Female'].max(), 100)
fig.add_trace(go.Scatter(
    x=gdi_range,
    y=p_gdi_female(gdi_range),
    mode='lines',
    line=dict(color='red', dash='dash'),
    showlegend=False
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=gdi_range,
    y=p_gdi_male(gdi_range),
    mode='lines',
    line=dict(color='blue', dash='dash'),
    showlegend=False
), row=1, col=1)

fig.update_xaxes(title_text='GDI 2015', row=1, col=1)
fig.update_yaxes(title_text='Average Suicide Rate (2015)', row=1, col=1)

# Scatter plot for HDI
fig.add_trace(go.Scatter(
    x=data['HDI 2015_Female'],
    y=data['2015_Female'],
    mode='markers',
    marker=dict(color='red'),
    name='Female'
), row=1, col=2)

fig.add_trace(go.Scatter(
    x=data['HDI 2015_Male'],
    y=data['2015_Male'],
    mode='markers',
    marker=dict(color='blue'),
    name='Male'
), row=1, col=2)

# Trendline for HDI
hdi_range = np.linspace(data['HDI 2015_Female'].min(), data['HDI 2015_Female'].max(), 100)
fig.add_trace(go.Scatter(
    x=hdi_range,
    y=p_hdi_female(hdi_range),
    mode='lines',
    line=dict(color='red', dash='dash'),
    showlegend=False
), row=1, col=2)

fig.add_trace(go.Scatter(
    x=hdi_range,
    y=p_hdi_male(hdi_range),
    mode='lines',
    line=dict(color='blue', dash='dash'),
    showlegend=False
), row=1, col=2)

fig.update_xaxes(title_text='HDI 2015', row=1, col=2)
fig.update_yaxes(title_text='Average Suicide Rate (2015)', row=1, col=2)

fig.update_layout(showlegend=True)

# Display the plot
fig.show()


> *Figure 3a: Gender Development Index (ca. 2015) on the X-axis; average suicide rate per 100k inhabitants (2015) on the Y-axis. Blue dots represent male suicides and red dots female suicides. The blue least-squares trendline shows a strong positive correlation between GDI and male suicide; the red least-squares trendline shows a small negative correlation between GDI and female suicide.*

> *Figure 3b: Human Development Index (ca. 2015) on the X-axis; average suicide rate per 100k inhabitants (2015) on the Y-axis. Blue dots represent male suicides and red dots female suicides. The blue least-squares trendline shows a very small positive correlation between HDI and male suicide; the red least-squares trendline shows a small negative correlation between HDI and female suicide.*

Figure 3a shows the expected decrease in female suicide in the red trendline (from 8.5 suicides/100k at a GDI of 0.6 to 5.3 suicides/100k at a GDI of 1.03). An even stronger correlation appears between GDI and male suicides: the blue trendline starts at only 4.8 suicides/100k at a GDI of 0.6 and rises to 18.1 suicides/100k at a GDI of 1.03. This is more than three times as high, and therefore supports the idea that a higher GDI is associated with higher male suicide rates.

The female trendline in Figure 3b deviates slightly from our hypothesis: an increase in HDI corresponds with a decrease in female suicide, from 8.37 suicides/100k (HDI 0.35) to 3.2 suicides/100k (HDI 0.94). For men, however, the trendline stays relatively flat and thereby indicates no clear correlation. It therefore seems that a higher HDI mostly goes together with better outcomes for women, while male suicide rates remain largely unchanged.

In [None]:
import pandas as pd
import plotly.graph_objects as go

# Read the dataset
males = pd.read_csv('databases/IV DATASET 3.csv')
females = pd.read_csv('databases/IV DATASET 2.csv')

males['HDI Classification'] = males['HDI 2015'].apply(lambda x: 'HD' if x >= 0.85 else 'LD')
females['HDI Classification'] = females['HDI 2015'].apply(lambda x: 'HD' if x >= 0.85 else 'LD')
males['GDI Classification'] = males['GDI 2015'].apply(lambda x: 'HD' if x >= 0.97 else 'LD')
females['GDI Classification'] = females['GDI 2015'].apply(lambda x: 'HD' if x >= 0.97 else 'LD')

# Group the data by HDI and GDI classifications for males
hh_df_m = males[(males['HDI Classification'] == 'HD') & (males['GDI Classification'] == 'HD')]
hl_df_m = males[(males['HDI Classification'] == 'HD') & (males['GDI Classification'] == 'LD')]
lh_df_m = males[(males['HDI Classification'] == 'LD') & (males['GDI Classification'] == 'HD')]
ll_df_m = males[(males['HDI Classification'] == 'LD') & (males['GDI Classification'] == 'LD')]

# Group the data by HDI and GDI classifications for females
hh_df_f = females[(females['HDI Classification'] == 'HD') & (females['GDI Classification'] == 'HD')]
hl_df_f = females[(females['HDI Classification'] == 'HD') & (females['GDI Classification'] == 'LD')]
lh_df_f = females[(females['HDI Classification'] == 'LD') & (females['GDI Classification'] == 'HD')]
ll_df_f = females[(females['HDI Classification'] == 'LD') & (females['GDI Classification'] == 'LD')]

# Extract the necessary columns for plotting
years = ['2000', '2010', '2015']
male_hh = hh_df_m[years].mean()
male_hl = hl_df_m[years].mean()
male_lh = lh_df_m[years].mean()
male_ll = ll_df_m[years].mean()
female_hh = hh_df_f[years].mean()
female_hl = hl_df_f[years].mean()
female_lh = lh_df_f[years].mean()
female_ll = ll_df_f[years].mean()

# Define color and marker styles
colors = ['dodgerblue', 'salmon', 'limegreen', 'purple', 'blue', 'red', 'green', 'orange']
markers = ['circle', 'square', 'diamond', 'triangle-up', 'circle', 'square', 'diamond', 'triangle-up']
labels = ['Male HD-HD', 'Male HD-LD', 'Male LD-HD', 'Male LD-LD', 'Female HD-HD', 'Female HD-LD', 'Female LD-HD', 'Female LD-LD']

# Create the line plot
fig = go.Figure()

# Add traces for each category
for i, data in enumerate([male_hh, male_hl, male_lh, male_ll, female_hh, female_hl, female_lh, female_ll]):
    fig.add_trace(go.Scatter(
        x=years,
        y=data,
        mode='lines+markers',
        name=labels[i],
        marker=dict(color=colors[i], symbol=markers[i], size=8),
        line=dict(width=2)
    ))

# Set plot title and labels
fig.update_layout(
    title='Suicide Development for Male and Female for HD and LD countries',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Suicide Rates'),
    legend=dict(font=dict(size=7), orientation='h', yanchor='top', xanchor='right', x=1, y=1),
    showlegend=True,
    template='plotly_white'
)

# Display the plot
fig.show()



> *Figure 4: Years 2000, 2010, and 2015 on the X-axis; average suicide rate per 100k inhabitants on the Y-axis. The figure shows the development of male and female suicides over the years. Each line is labelled as Sex - HDI development status - GDI development status. HD stands for highly developed (HDI ≥ 0.85, GDI ≥ 0.97) and LD for less developed (HDI < 0.85, GDI < 0.97).*

In this figure, countries are split into four categories based on their development in HDI and GDI (see caption). The figure largely supports our previous remark that countries with a high GDI tend to have higher male suicide rates. Countries labelled LD-HD (low development in HDI, high development in GDI) have by far the highest male suicide rates, followed by countries labelled HD-HD (again high development in GDI). For women, the categories mostly show no substantial difference in suicides, although low development in GDI does go together with slightly lower female suicide rates on average (more evident when zoomed in). Most importantly, the figure shows that this pattern is not a one-year fluke: it was the same in earlier years (2000 and 2010). Ultimately, this figure again suggests that a higher GDI is associated with an increase in male suicide and a slight decrease in female suicide.
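The four HD/LD categories can also be built with a single `groupby` instead of four hand-written filters per sex. This sketch uses made-up rows shaped like Dataset 3; the thresholds 0.85 and 0.97 and the column names come from the notebook code above, and `classify` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like Dataset 3 (male suicides); values are made up.
males = pd.DataFrame({
    "HDI 2015": [0.90, 0.92, 0.70, 0.60],
    "GDI 2015": [0.99, 0.95, 0.98, 0.90],
    "2000": [14.0, 12.0, 20.0, 9.0],
    "2010": [13.0, 11.0, 19.0, 9.5],
    "2015": [12.5, 10.5, 18.0, 10.0],
})


def classify(values: pd.Series, threshold: float) -> np.ndarray:
    """Label values at or above the threshold 'HD', all others 'LD'."""
    return np.where(values >= threshold, "HD", "LD")


males["HDI Classification"] = classify(males["HDI 2015"], 0.85)
males["GDI Classification"] = classify(males["GDI 2015"], 0.97)

# One row per (HDI, GDI) category with the mean suicide rate for each year.
years = ["2000", "2010", "2015"]
male_means = males.groupby(["HDI Classification", "GDI Classification"])[years].mean()
print(male_means)
```

Each row of `male_means` corresponds to one line in Figure 4, so the same frame can be looped over to add the traces.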

# POLITICAL PERSPECTIVE

This final part of the data story analyses the relationship between indicators of political development (democracy and political violence) and suicide. Democracy is often thought of as a sign of a country's development, leaving behind older forms of tyranny and other unbalanced power relations. However, many countries are governed under non-democratic systems, which lead to significantly different living conditions that influence mental health. The Political Violence Score (negative for peaceful, positive for violent) is another relevant indicator of living conditions, as more political violence is related to more war and oppression (e.g. enslavement, denial of human rights), which in turn have evident effects on mental health. This perspective is therefore a relevant addition to the data story. Despite the common belief that democracy results from cultural and political improvement, this perspective challenges that belief by showing that democratic countries tend to have more suicides on average, and even a lower average HDI.

## ARGUMENTS FOR THE POLITICAL PERSPECTIVE

Although the relationship between happiness and civilian political involvement (in terms of democracy) is largely a philosophical question, the relationship between political violence and happiness is much more evident (Skywood Recovery, 2018). As our data will show, perhaps surprisingly to some, democracies tend to have a higher political violence score, which plausibly lowers average living conditions. One possible explanation is that democracies, as politically complex entities, involve intense political competition, which can lead to violent conflict when rival groups resort to coercion or violence to achieve their goals.
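Comparing democracies with non-democracies comes down to averaging per group. The sketch below uses made-up rows shaped like the Main Dataset and groups on the `democracy` flag (1.0 or 0.0). Note that the notebook's own comparison further down instead checks whether the government name contains 'Democracy'; treating the two as equivalent is an assumption.

```python
import pandas as pd

# Hypothetical rows shaped like the Main Dataset (made-up values).
main = pd.DataFrame({
    "government": ["Parliamentary Democracy", "Presidential Democracy",
                   "Monarchy", "Military"],
    "democracy": [1.0, 1.0, 0.0, 0.0],
    "political_violence": [0.4, 0.8, -0.5, 1.5],
    "Average suicide 2015": [12.0, 9.0, 6.0, 7.0],
})

# Mean political violence and suicide rate per democracy flag,
# with the 0.0/1.0 labels replaced by readable names.
summary = (
    main.groupby("democracy")[["political_violence", "Average suicide 2015"]]
        .mean()
        .rename(index={1.0: "Democracy", 0.0: "Non-Democracy"})
)
print(summary)
```

Group means hide the spread within each group; with only two democratic government types in the data, a few countries can move these averages considerably.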

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Read the data into a DataFrame
df = pd.read_csv('databases/MAIN DATASET.csv')
df = df.drop('Unnamed: 0', axis=1)

# Extract relevant columns
government_types = df['government']
hdi_values = df['HDI 2015']
democracy_status = df['democracy']
political_violence = df['political_violence']

# Grouping average HDI and political violence values by government type and democracy status
government_avg_hdi = {}
government_avg_political_violence = {}
government_democracy = {}
for i in range(len(government_types)):
    government_type = government_types[i]
    hdi_value = hdi_values[i]
    is_democracy = democracy_status[i]
    violence_value = political_violence[i]
    
    if government_type not in government_avg_hdi:
        government_avg_hdi[government_type] = []
    government_avg_hdi[government_type].append(hdi_value)
    
    if government_type not in government_avg_political_violence:
        government_avg_political_violence[government_type] = []
    government_avg_political_violence[government_type].append(violence_value)

    government_democracy[government_type] = is_democracy

# Calculate average HDI and political violence values for each government type
avg_hdi_values = [np.mean(government_avg_hdi[gt]) for gt in government_avg_hdi]
avg_political_violence_values = [np.mean(government_avg_political_violence[gt]) for gt in government_avg_political_violence]

# Sort the bars based on the average political violence values in ascending order
sorted_indices = np.argsort(avg_political_violence_values)
sorted_avg_hdi_values = [avg_hdi_values[i] for i in sorted_indices]
sorted_avg_political_violence_values = [avg_political_violence_values[i] for i in sorted_indices]
sorted_government_types = [list(government_avg_hdi.keys())[i] for i in sorted_indices]

# Prepare data for plotting
x = np.arange(len(sorted_avg_hdi_values))

# Create the bar trace
bar_trace = go.Bar(
    x=x,
    y=sorted_avg_hdi_values,
    name='Average HDI',
    marker=dict(color='blue')
)

# Create the text annotations for democracy status and political violence
text_annotations = []
for i, gov_type in enumerate(sorted_government_types):
    is_democracy = government_democracy[gov_type]
    if is_democracy == 1.0:
        text_annotations.append(dict(
            x=i,
            y=-0.1,
            text='D',
            showarrow=False,
            font=dict(color='green')
        ))
    else:
        text_annotations.append(dict(
            x=i,
            y=-0.1,
            text='ND',
            showarrow=False,
            font=dict(color='red')
        ))
    text_annotations.append(dict(
        x=i,
        y=-0.15,
        text=f'{round(sorted_avg_political_violence_values[i], 1)}',
        showarrow=False,
        font=dict(color='black')
    ))

# Create the trendline trace
trendline_trace = go.Scatter(
    x=x,
    y=np.poly1d(np.polyfit(x, sorted_avg_hdi_values, 1))(x),
    mode='lines',
    name='Trendline',
    line=dict(color='red', dash='dash')
)

# Set layout
layout = go.Layout(
    title='Average HDI by Government / Political Violence',
    xaxis=dict(
        tickvals=x,
        ticktext=sorted_government_types,
        tickangle=-45,
        tickfont=dict(size=10),
        showticklabels=True
    ),
    yaxis=dict(title='Average HDI'),
    annotations=text_annotations,
    showlegend=True,
    legend=dict(font=dict(size=10), x=1, y=1, bgcolor='rgba(0, 0, 0, 0)'),
    template='plotly_white'
)

# Create the figure
fig = go.Figure(data=[bar_trace, trendline_trace], layout=layout)

# Display the plot
fig.show()


> *Figure 4: Bar chart for 14 different government types. Government types on X-axis, together with average political violence score (the higher the score, the more politically violent the government type is), Democracy indice (D for Democracy, ND for Non-Democracy). Average HDI on Y-axis. Government types are sorted left to right based on political violence score (low to high). The trendline proves a negative correlation between political violence and HDI (lower political violence - higher HDI). 

Figure 4 shows how democratic and non-democratic government types perform on HDI and political violence. Although the two democratic types reach 'decent' average HDI scores of 0.8 (Parliamentary Democracy) and 0.6 (Presidential Democracy), many non-democratic government types also manage to score high on HDI and low on political violence. Moreover, both democratic types have positive political violence scores, meaning that they are relatively politically violent. This challenges the idea that democracy is the only form of government capable of providing good living conditions. Figure 4 also suggests that a high HDI tends to go together with a low political violence score.
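
The trendline in the Figure 4 code is fit with `np.polyfit` against `x`. Assuming `x` holds the bar positions 0 to n-1 (as the `tickvals=x` and `x=i` annotation placements suggest), the line relates HDI to the *rank* of political violence rather than to the raw scores. A minimal, standard-library-only sketch to put a number on both views follows: Pearson correlation on the raw group averages and Spearman rank correlation (Pearson on the ranks). The helper names `pearson`, `ranks` and `spearman` are our own and do not appear elsewhere in the notebook.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

In the notebook, one could call `pearson(sorted_avg_political_violence_values, sorted_avg_hdi_values)` and `spearman(...)` with the same arguments; negative values for both would support the reading of Figure 4. Keep in mind that these are correlations between 14 group averages, so they ignore how much countries vary within each government type.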

In [None]:
import pandas as pd
import plotly.graph_objects as go

# Read the dataset and drop the leftover index column
df = pd.read_csv('databases/MAIN DATASET.csv')
df = df.drop('Unnamed: 0', axis=1)

# Split the rows into democracies and non-democracies by government type
# (rows without a government type are dropped so str.contains never sees NaN)
df = df.dropna(subset=['government'])
is_democracy = df['government'].str.contains('Democracy')
df_democracy = df[is_democracy]
df_non_democracy = df[~is_democracy]

# Mean suicide rate and mean political violence for each group
categories = ['Average Suicide Rate', 'Political Violence']
democracy_data = [df_democracy['Average suicide 2015'].mean(), df_democracy['political_violence'].mean()]
non_democracy_data = [df_non_democracy['Average suicide 2015'].mean(), df_non_democracy['political_violence'].mean()]

# Set up colors and bar width
bar_colors = ['#008FD5', '#FF2700']
bar_width = 0.35

# Create one bar trace per group
democracy_trace = go.Bar(
    x=categories,
    y=democracy_data,
    name='Democracy',
    marker=dict(color=bar_colors[0]),
    width=bar_width,
    opacity=0.8,
    showlegend=True
)

non_democracy_trace = go.Bar(
    x=categories,
    y=non_democracy_data,
    name='Non-Democracy',
    marker=dict(color=bar_colors[1]),
    width=bar_width,
    opacity=0.8,
    showlegend=True
)

# Create layout with the two groups side by side per category
layout = go.Layout(
    title='Comparison of Democracy and Non-Democracy',
    xaxis=dict(title='Categories'),
    yaxis=dict(title='Mean Value'),
    barmode='group',
    showlegend=True,
    legend=dict(x=1, y=1, bgcolor='rgba(0, 0, 0, 0)'),
    template='plotly_white'
)

# Create the figure
fig = go.Figure(data=[democracy_trace, non_democracy_trace], layout=layout)

# Display the plot
fig.show()


> *Figure 5: Bar chart comparing democratic and non-democratic countries. Two categories on the X-axis: average suicides per 100k civilians and average political violence. The Y-axis shows the mean value for each category. On average, democratic countries have more suicides and more political violence. Based on 96 democratic and 44 non-democratic instances.*

Figure 5 suggests that democracies tend to have more suicides than non-democracies (11.2 versus 10.3 per 100k civilians). The difference is small, but it shows that democratic policies do not automatically go hand in hand with better living conditions, at least as measured through suicide rates. Figure 5 also shows that democracies are indeed more politically violent than non-democracies.
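
Whether a gap as small as 11.2 versus 10.3 could plausibly arise by chance can be checked with a permutation test, which the notebook itself does not perform. A minimal, standard-library-only sketch, assuming the per-country suicide rates of each group are available as plain lists: it repeatedly shuffles the group labels and counts how often the shuffled difference in means is at least as large as the observed one. The function name `permutation_test` and its defaults are our own. Because countries are not independent random samples, the resulting p-value should be read as indicative only.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference mean(group_a) - mean(group_b) and the
    estimated p-value: the share of random relabelings whose absolute
    difference is at least as large as the observed one.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign countries to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point noise in the sums
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

With the data frames from the Figure 5 code, a call could look like `permutation_test(df_democracy['Average suicide 2015'].dropna().tolist(), df_non_democracy['Average suicide 2015'].dropna().tolist())`. A permutation test was chosen here because it makes no normality assumption about the suicide rates.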

## Reflection

We received the following feedback on the draft version of the data story.
- At first, the code did not run because the required libraries were not imported. We quickly solved this by placing all imports and installs at the top of our notebook.
- The first plot was not meaningful; it did not convey its message at a glance. We removed the plot entirely and replaced it with a more meaningful graph.
- The code inputs were not hidden. We solved this issue with help from the Teaching Assistants during in-person classes.
- Our points of view (PoVs) were unclear: the plots did not match a specific PoV, and the data story read as loosely arranged pieces of information rather than a smooth story. To resolve this, we re-examined our perspectives and the graphs belonging to each of them, and then tried to connect all loose pieces of information into one overarching data story.

During peer feedback, we received tips from the other two groups.
- The graphs were hard to read, and some did not make much sense. The groups gave examples of clearer, easier-to-understand graphs. We applied this to our graphs by adding colours and better axis names, which, together with the newly added captions, should make the visualizations clearer and more visually appealing.
- The combination of the Gender Development Index (GDI) and the Gender Inequality Index (GII) made it unclear whether rising values in the graphs were a good or a bad thing. We therefore removed the GII entirely and only discuss the GDI in the Sex Perspective (Perspective 2). This should make our data story less confusing as a whole, and more 'simple yet meaningful'.
- The data story did not feel like a story, but rather like separate parts put together.
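
For reference, the direction problem stems from how the UNDP defines the two indices (these definitions are not reproduced elsewhere in this notebook): the GDI is the ratio of the female HDI to the male HDI, so a value of 1 means parity, whereas the GII ranges from 0 (full equality) to 1 (maximum inequality), so a rising GII is a bad sign.

$$
\mathrm{GDI} = \frac{\mathrm{HDI}_{\text{female}}}{\mathrm{HDI}_{\text{male}}}, \qquad 0 \le \mathrm{GII} \le 1 \quad (\mathrm{GII} = 0 \text{ means full equality})
$$

Strictly speaking, a GDI above 1 means the female HDI exceeds the male HDI, so for the GDI the distance from 1, rather than the raw value, indicates how unequal development between the sexes is.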

We examined the feedback and addressed each point. We removed some graphs and adjusted others for clarity. We also worked on merging the perspectives into one coherent story instead of presenting them as separate parts.

## Work Distribution

For our project, we divided the workload evenly among the three of us, since our group does not have a fourth member. Each team member contributed to different sections of the project as follows:

- Introduction: Developed by Julian
   - Julian took the responsibility of crafting a comprehensive and engaging introduction section for the project. 
   
- Preprocessing: Managed by Lloyd
   - Lloyd played a key role in preprocessing the project data, cleaning and transforming it to make it suitable for analysis. He implemented data cleaning techniques, handled missing values, and ensured data integrity.

- Perspectives: Divided between Julian and Lloyd
   - Julian worked on the first perspective, while Lloyd worked on the other two. Mitch helped with some of the visualizations, such as the choropleth map.

- Reflection, Work Distribution & Appendix: Handled by Mitch 
   - Mitch took the lead in crafting the reflection section, providing thoughtful insights and analyzing the outcomes and lessons learned from the project. He also took charge of the work distribution section and the appendix.

By splitting the workload in this manner, we aimed to leverage each team member's strengths and ensure a balanced contribution from everyone involved. This approach allowed us to efficiently complete the project while maintaining consistency and quality across all sections.

## References

### Databases

- Global Suicide Data.
https://www.kaggle.com/datasets/twinkle0705/mental-health-and-suicide-rates?select=Age-standardized+suicide+rates.csv

- Global Suicide Data 2000-2019.
https://www.kaggle.com/datasets/sandragracenelson/suicide-rate-of-countries-per-every-year

- Human Development Report 2015.
https://www.kaggle.com/datasets/undp/human-development?select=human_development.csv

- Government types of the world.
https://www.kaggle.com/datasets/janzasadny/rulers-elections-and-irregular-governance

### Citations

- Skywood Recovery. (2018, November 5). How Oppression Contributes to Depression and Substance Use. https://skywoodrecovery.com/resources/how-oppression-contributes-to-depression-and-substance-use/

- Wise, M., & Sainsbury, P. (2007). Democracy: The forgotten determinant of mental health. *Health Promotion Journal of Australia, 18*(3), 177-183.


## Appendix

Generative AI (ChatGPT, based on GPT-3.5) was used to facilitate the creation of this document, as shown in Table 1.

| Reason for use | Parts of the project | Prompts used |
| ------------------------ | --------------------------------- | -------------------------------------------- |
| Brainstorming multiple perspectives | The entire project framing | "Give examples of perspectives about HDI, GDI and suicide rates per country" |
| Graph ideas | Visualizations | "What kind of graphs are useful for the following keywords."|
| Graph generation | Visualizations | "How can i use this graph to give a clear understanding about the following subject." |
| Improving code | Visualizations and Preprocessing | "Make this code more efficient without losing important information." |

> *Table 1: Usage of generative AI to facilitate the creation of this document.*