# The relationship between the Human Development Index and the prevalence of mental health disorders

## Research question

The aim of this analysis is to investigate the relationship between country-level socioeconomic status and the prevalence of mental health disorders. Studies have suggested that mental health is worse in relatively highly developed countries [(Barbalat & Franck, 2020)](https://bmjopen.bmj.com/content/bmjopen/10/4/e035055.full.pdf). A country's level of human development can be measured using the Human Development Index (HDI).

The HDI is a metric designed to quantify a country's socioeconomic status using key dimensions of human development. The index is calculated from four metrics: (1) life expectancy at birth, to assess a long and healthy life, (2) expected years of schooling, to assess access to knowledge for the younger generation, (3) average years of schooling, to assess access to knowledge for the older generation, and (4) gross national income (GNI) per capita, to assess the standard of living. Each of these metrics is normalized to an index value. The two schooling indices are combined into a single education index, and the three resulting dimension indices are aggregated into the HDI using the following formula [(2020 HDR Technical Notes)](https://hdr.undp.org/system/files/documents/technical-notes-calculating-human-development-indices.pdf):

$HDI = (I_{Health} \cdot I_{Education} \cdot I_{Income})^{\frac{1}{3}}$
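
As a worked illustration of the formula, the sketch below computes the HDI from the four raw indicators. The goalposts (minimum and maximum values), the log transform of income and the averaging of the two schooling indices are taken from the 2020 technical notes as understood here and should be verified against that document. The lower cap at 0, the function names `dimension_index` and `compute_hdi`, and the example inputs are assumptions made for illustration.

```python
import math

# Goalposts (min, max) per indicator, as read from the 2020 HDR technical notes
# (assumption: verify against the linked document before relying on them)
GOALPOSTS = {
    "life_expectancy": (20.0, 85.0),
    "expected_schooling": (0.0, 18.0),
    "mean_schooling": (0.0, 15.0),
    "gni_per_capita": (100.0, 75000.0),
}


def dimension_index(value, low, high):
    """Scale a value to the 0-1 range, capping it at the goalposts."""
    index = (value - low) / (high - low)
    return min(max(index, 0.0), 1.0)


def compute_hdi(life_expectancy, expected_schooling, mean_schooling, gni_per_capita):
    """Aggregate the three dimension indices into the HDI (geometric mean)."""
    i_health = dimension_index(life_expectancy, *GOALPOSTS["life_expectancy"])

    # The education index is the arithmetic mean of the two schooling indices
    i_expected = dimension_index(expected_schooling, *GOALPOSTS["expected_schooling"])
    i_mean = dimension_index(mean_schooling, *GOALPOSTS["mean_schooling"])
    i_education = (i_expected + i_mean) / 2

    # Income is log-transformed before scaling
    low, high = GOALPOSTS["gni_per_capita"]
    i_income = dimension_index(math.log(gni_per_capita), math.log(low), math.log(high))

    return (i_health * i_education * i_income) ** (1 / 3)


# Made-up example values, not taken from the dataset
print(round(compute_hdi(70.0, 12.0, 8.0, 10000.0), 3))
```

Because the HDI is a geometric mean, a low value on one dimension cannot be fully offset by a high value on another.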

The HDI dataset contains HDI values from 1990 to 2021. The 'Global Trends in Mental Health Disorder' file contains the prevalence, expressed as a percentage, of schizophrenia, bipolar disorder, eating disorders, anxiety disorders, drug use disorders, depression and alcohol use disorders from 1990 to 2017.

### Data sources

The [Human Development Index](https://ourworldindata.org/human-development-index) was retrieved from Our World in Data (Roser, 2014) and originally published by the United Nations Development Programme. 

The [Mental Health dataset](https://www.kaggle.com/datasets/thedevastator/uncover-global-trends-in-mental-health-disorder?resource=download) was retrieved from Kaggle, where it was published by the author Amit.



### Loading libraries

In [26]:
# Loading needed libraries
import numpy as np
import pandas as pd
import yaml
import matplotlib.pyplot as plt
import seaborn as sns

### Defining functions

In [5]:
def data_inspection(df):
    """Print the first rows, shape, datatypes and missing-value counts of a dataframe."""
    print(df.head(), '\n')
    print(f'Dataset contains {df.shape[0]} rows and {df.shape[1]} columns. \n')
    print(f'Datatypes: \n {df.dtypes}, \n')
    print(f'Missing data per column: \n{df.isnull().sum()}')

### Data loading

In [2]:
# Getting data location from configuration file
def get_config():
    with open("config.yaml", 'r') as stream:
        config = yaml.safe_load(stream)
    return config

config = get_config()

# Loading datasets as pd dataframes
# Human development index data
d1 = config['dataset1']
hdi = pd.read_csv(d1)
# Mental health disorder data
d2 = config['dataset2']
mental_disorders = pd.read_csv(d2)



### Data inspection 

##### Mental health dataset

In [8]:
# Inspecting dataframe, datatypes, column and row counts and missing values
data_inspection(mental_disorders)

   index       Entity Code  Year Schizophrenia (%) Bipolar disorder (%)  \
0      0  Afghanistan  AFG  1990           0.16056             0.697779   
1      1  Afghanistan  AFG  1991          0.160312             0.697961   
2      2  Afghanistan  AFG  1992          0.160135             0.698107   
3      3  Afghanistan  AFG  1993          0.160037             0.698257   
4      4  Afghanistan  AFG  1994          0.160022             0.698469   

  Eating disorders (%)  Anxiety disorders (%)  Drug use disorders (%)  \
0             0.101855               4.828830                1.677082   
1             0.099313               4.829740                1.684746   
2             0.096692               4.831108                1.694334   
3             0.094336               4.830864                1.705320   
4             0.092439               4.829423                1.716069   

   Depression (%)  Alcohol use disorders (%)  
0        4.071831                   0.672404  
1        4.07953

This dataset contains a lot of missing data. Rows with missing values need to be filtered out, after which it should be checked whether the remaining data are still sufficient and whether the remaining countries correspond to those in the HDI dataset. Some of the columns need to be converted from object to float, and the column names can be shortened for convenience.
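
One way to check whether the countries that remain after filtering also occur in the HDI dataset is to compare the sets of country codes. The sketch below uses two small made-up dataframes in place of the real ones. Only the column name `Code` is taken from the datasets; the names `mh_toy` and `hdi_toy` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the two datasets (made-up rows, not real data)
mh_toy = pd.DataFrame({
    "Code": ["AFG", "ZWE", None, "NLD"],
    "Depression": [4.1, 3.1, 2.0, None],
})
hdi_toy = pd.DataFrame({"Code": ["AFG", "NLD", "USA"]})

# Drop rows with any missing value, then collect the remaining country codes
mh_complete = mh_toy.dropna()
mh_codes = set(mh_complete["Code"])
hdi_codes = set(hdi_toy["Code"])

# Codes present in both datasets, and codes that would be lost in a merge
shared = sorted(mh_codes & hdi_codes)
only_mh = sorted(mh_codes - hdi_codes)

print(f"Countries in both datasets: {shared}")
print(f"Countries only in the mental health data: {only_mh}")
```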

In [21]:
# Checking number of countries
countries = len(mental_disorders['Entity'].unique())
print(f'Number of countries: {countries}')

# Checking time span 
years = len(mental_disorders['Year'].unique())
print(f'Number of years: {years} \n')
print(mental_disorders['Year'].unique())

Number of countries: 276
Number of years: 259 

['1990' '1991' '1992' '1993' '1994' '1995' '1996' '1997' '1998' '1999'
 '2000' '2001' '2002' '2003' '2004' '2005' '2006' '2007' '2008' '2009'
 '2010' '2011' '2012' '2013' '2014' '2015' '2016' '2017' 'Year' '1800'
 '1801' '1802' '1803' '1804' '1805' '1806' '1807' '1808' '1809' '1810'
 '1811' '1812' '1813' '1814' '1815' '1816' '1817' '1818' '1819' '1820'
 '1821' '1822' '1823' '1824' '1825' '1826' '1827' '1828' '1829' '1830'
 '1831' '1832' '1833' '1834' '1835' '1836' '1837' '1838' '1839' '1840'
 '1841' '1842' '1843' '1844' '1845' '1846' '1847' '1848' '1849' '1850'
 '1851' '1852' '1853' '1854' '1855' '1856' '1857' '1858' '1859' '1860'
 '1861' '1862' '1863' '1864' '1865' '1866' '1867' '1868' '1869' '1870'
 '1871' '1872' '1873' '1874' '1875' '1876' '1877' '1878' '1879' '1880'
 '1881' '1882' '1883' '1884' '1885' '1886' '1887' '1888' '1889' '1890'
 '1891' '1892' '1893' '1894' '1895' '1896' '1897' '1898' '1899' '1900'
 '1901' '1902' '1903' '1904' 

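The dataset needs to be filtered to contain only data from the years 1990 to 2017. It currently also contains years that lie far outside this range (including prehistoric years), as well as the literal string 'Year', which suggests that a header row is repeated within the data.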
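
The filtering step could look roughly like the sketch below. It converts `Year` to a number, which turns non-numeric entries such as the stray `'Year'` string into missing values, and then keeps only the years 1990 to 2017. The toy dataframe and the function name `filter_years` are made up for illustration and are not part of the analysis itself.

```python
import pandas as pd


def filter_years(df, first=1990, last=2017):
    """Keep only rows whose Year is a number between first and last (inclusive)."""
    out = df.copy()
    # Non-numeric entries (e.g. repeated header rows) become NaN
    out["Year"] = pd.to_numeric(out["Year"], errors="coerce")
    # NaN never falls between the bounds, so those rows are dropped as well
    out = out[out["Year"].between(first, last)]
    # astype returns a new dataframe with Year stored as an integer
    return out.astype({"Year": int})


# Made-up rows that mimic the problems seen in the Year column
toy = pd.DataFrame({
    "Entity": ["A", "A", "Entity", "A"],
    "Year": ["1990", "2017", "Year", "1800"],
})
filtered = filter_years(toy)
print(filtered)
```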

##### HDI dataset

In [7]:
data_inspection(hdi)

        Entity Code  Year  Human Development Index
0  Afghanistan  AFG  1990                    0.273
1  Afghanistan  AFG  1991                    0.279
2  Afghanistan  AFG  1992                    0.287
3  Afghanistan  AFG  1993                    0.297
4  Afghanistan  AFG  1994                    0.292 

Dataset contains 5923 rows and 4 columns. 

Datatypes: 
 Entity                      object
Code                        object
Year                         int64
Human Development Index    float64
dtype: object, 

Missing data per column: 
Entity                       0
Code                       320
Year                         0
Human Development Index      0
dtype: int64


### Data exploration

In [71]:
# Data distribution

### Data cleaning

In [63]:
def reformat_mh():
    """Return a cleaned copy of the mental health dataframe."""
    # Drop the redundant index column
    df = mental_disorders.drop('index', axis='columns')
    # Shorten the column names (BPD stands for bipolar disorder here)
    df = df.rename(columns = {'Schizophrenia (%)':'Schizophrenia',
    'Bipolar disorder (%)':'BPD', 
    'Eating disorders (%)':'ED',
    'Anxiety disorders (%)':'Anxiety',
    'Drug use disorders (%)':'Drugs',
    'Depression (%)':'Depression',
    'Alcohol use disorders (%)':'Alcohol'
    })
    # Drop rows with missing values
    df = df.dropna()
    # Convert datatypes: Year to int, all prevalence columns to float
    prevalence_cols = ['Schizophrenia', 'BPD', 'ED', 'Anxiety',
                       'Drugs', 'Depression', 'Alcohol']
    dtypes = {'Year': int, **{col: float for col in prevalence_cols}}
    df = df.astype(dtypes)

    return df
    
    

In [67]:
mh_clean = reformat_mh()
mh_clean

| | Entity | Code | Year | Schizophrenia | BPD | ED | Anxiety | Drugs | Depression | Alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990 | 0.160560 | 0.697779 | 0.101855 | 4.828830 | 1.677082 | 4.071831 | 0.672404 |
| 1 | Afghanistan | AFG | 1991 | 0.160312 | 0.697961 | 0.099313 | 4.829740 | 1.684746 | 4.079531 | 0.671768 |
| 2 | Afghanistan | AFG | 1992 | 0.160135 | 0.698107 | 0.096692 | 4.831108 | 1.694334 | 4.088358 | 0.670644 |
| 3 | Afghanistan | AFG | 1993 | 0.160037 | 0.698257 | 0.094336 | 4.830864 | 1.705320 | 4.096190 | 0.669738 |
| 4 | Afghanistan | AFG | 1994 | 0.160022 | 0.698469 | 0.092439 | 4.829423 | 1.716069 | 4.099582 | 0.669260 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6463 | Zimbabwe | ZWE | 2013 | 0.155670 | 0.607993 | 0.117248 | 3.090168 | 0.766280 | 3.128192 | 1.515641 |
| 6464 | Zimbabwe | ZWE | 2014 | 0.155993 | 0.608610 | 0.118073 | 3.093964 | 0.768914 | 3.140290 | 1.515470 |
| 6465 | Zimbabwe | ZWE | 2015 | 0.156465 | 0.609363 | 0.119470 | 3.098687 | 0.771802 | 3.155710 | 1.514751 |
| 6466 | Zimbabwe | ZWE | 2016 | 0.157111 | 0.610234 | 0.121456 | 3.104294 | 0.772275 | 3.174134 | 1.513269 |


### Data wrangling

### Normalization

Both datasets are normalized.
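
The normalization method is not specified here. As one possible approach, the sketch below applies min-max scaling, which rescales a column to the 0-1 range in the same way the HDI dimension indices are scaled. The choice of min-max scaling, the function name `min_max_normalize` and the toy values are assumptions for illustration only.

```python
import pandas as pd


def min_max_normalize(series):
    """Rescale a numeric series so its minimum maps to 0 and its maximum to 1."""
    return (series - series.min()) / (series.max() - series.min())


# Made-up prevalence values, not taken from the dataset
toy = pd.DataFrame({"Depression": [2.0, 3.0, 4.0]})
toy["Depression_norm"] = min_max_normalize(toy["Depression"])
print(toy)
```

A z-score (subtracting the mean and dividing by the standard deviation) would be an alternative when the goal is to compare variables with different spreads.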

### Statistical analysis

### Data visualisation

#### References
Barbalat, G., & Franck, N. (2020). Ecological study of the association between mental illness with human development, income inequalities and unemployment across OECD countries. BMJ Open, 10(4), e035055.

Roser, M. (2014). Human Development Index (HDI). Our World in Data. Retrieved from https://ourworldindata.org/human-development-index