## The Effects of Population and Environmental Metrics on Worldwide Mental Health 

### Team Members
Humaira Nasir (humairan)

Kate Wasmer (kwasmer)

### Overview 

Worldwide anthropological data is integral to understanding the societal differences that shape a country's quality of life. Demographic analyses of this kind influence public policy and suggest how to initiate reforms that benefit the general populace. That said, basic data points about a population tell only a small part of the story. By adding variables from different data sets, we can examine how these factors relate to the emotional well-being of a country's population.

Mental health is a contentious topic that was rarely discussed openly until the past few decades. Despite its stigma, it is a vital aspect of global health with multiple implications for the inner workings of a society. To understand the big picture surrounding global mental health, it is useful to analyze population data over time. This allows us to consider which world events could have had an impact on the human psyche. We can also see how the socioeconomic and environmental factors of a given period relate to a country's overall mental health. Research in this field is crucial for making data-driven decisions that address the needs of the people.

### Motivation 

We wanted to investigate a topic that combined aspects of our research interests. Humaira comes from a cognitive science background and is drawn to data sets that revolve around decision-making in some capacity. Meanwhile, Kate is interested in anthropology and computational population genetics. This incentivized the two of us to investigate worldwide data sets that included one or more of these components.

Question 1:
    
What is the overall relationship between happiness and the prevalence of certain mental disorders in a given population? Furthermore, how does this relationship evolve over time, and what factors could contribute to these changes? Answering this question also lets us see whether certain mental disorders trend with the happiness score more than others.
    
Question 2:
    
Which environmental factors correlate with certain mental health disorders? In particular, we would like to look at the factors that impact quality of life (e.g., personal freedom, corruption, and access to food). Environment and genetics both play a large part in the development of many mental health disorders. However, the relationship between environment, genetics, and a mental health disorder is not the same for every type of disorder. By exploring this question, we may be able to see which disorders are more strongly correlated with environmental factors.

Question 3:
    
How do factors related to community mediate the percentage of a population diagnosed with a specific psychiatric disorder? Support systems are crucial in mental health treatment. We hope to learn more about the relationship between nationwide social supports and the prevalence of mental health issues found in the population.

### Data Sources 

[World Happiness Report](https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021?select=world-happiness-report.csv)

The World Happiness Report is based on a yearly global poll in which participants from multiple countries describe and quantify their lives using measures related to happiness.

[Global Trends in Mental Health Disorders](https://www.kaggle.com/datasets/thedevastator/uncover-global-trends-in-mental-health-disorder)

The Global Trends in Mental Health Disorder dataset records, by year, the percentage of a country's population diagnosed with a given mental health disorder (including depression, anxiety, and eating disorders).

[Global Hunger Index](https://www.kaggle.com/code/sasakitetsuya/global-hunger-index-analysis/input?select=global-hunger-index.csv)

The Global Hunger Index shows where a country falls when looking at malnourishment at a local and national level. 

Overall, the objective of the project is to compare quality-of-life measures with mental health. These data sources provide insight into the environmental factors that influence mental health diagnoses on a global scale. The World Happiness Report and Global Hunger Index data sets give us a broad understanding of the general culture and circumstances of a country. By doing a deeper analysis, we can better understand how citizens perceive their environment while also obtaining more objective metrics. By comparing these data points to the percentage of mental health disorders, we can see how society as a whole can influence an individual's health.

### Data Description

The three datasets in our analysis were downloaded from Kaggle in CSV format. The Global Hunger Index dataset has 471 rows and 5 columns, the mental health dataset has 108,553 rows and 11 columns, and the World Happiness Report has 1,949 rows and 11 columns. The variables of most interest vary by dataset.

In the Global Hunger Index, the columns of most interest are "Entity", the country the data was collected from; "Year", the year for which the hunger index was calculated; and "Global Hunger Index (2021)", a score corresponding to undernourishment, where a higher value indicates more severe hunger.

In the mental health dataset, the variables of interest are "Entity", "Year", and each column that gives the percentage of the population diagnosed with a type of mental health disorder.

In the World Happiness Report, the variables of interest are "Country name"; "year", the year of the report; "Life Ladder", a score from 0 to 10 reflecting how happy individuals are with their lives on average; "Social support", which measures the average amount of support an individual has from those around them; "Generosity", which reflects how often individuals make charitable contributions on average; and "Freedom to make life choices".

As shown in the code below, the Global Hunger Index has no null values outside of the annotations column, which holds no data relevant to this analysis. However, both the World Happiness Report and the mental health dataset have columns with missing values. In the World Happiness Report, the column with the most missing values is "Perceptions of corruption", with 110. In the mental health dataset, the columns for anxiety, drug use disorders, depression, and alcohol use disorders have the most missing values, at 102,085 each.


In [33]:
import numpy as np 
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

In [34]:
global_hunger = pd.read_csv("global-hunger-index.csv")
global_hunger.head()

Unnamed: 0,Entity,Code,Year,Global Hunger Index (2021),411773-annotations
0,Afghanistan,AFG,2000,50.9,
1,Afghanistan,AFG,2006,42.7,
2,Afghanistan,AFG,2012,34.3,
3,Afghanistan,AFG,2021,28.3,
4,Albania,ALB,2000,20.7,


In [35]:
mental_health = pd.read_csv("worldwide_mh_data.csv", low_memory=False)
mental_health.head()

Unnamed: 0,index,Entity,Code,Year,Schizophrenia (%),Bipolar disorder (%),Eating disorders (%),Anxiety disorders (%),Drug use disorders (%),Depression (%),Alcohol use disorders (%)
0,0,Afghanistan,AFG,1990,0.16056,0.697779,0.101855,4.82883,1.677082,4.071831,0.672404
1,1,Afghanistan,AFG,1991,0.160312,0.697961,0.099313,4.82974,1.684746,4.079531,0.671768
2,2,Afghanistan,AFG,1992,0.160135,0.698107,0.096692,4.831108,1.694334,4.088358,0.670644
3,3,Afghanistan,AFG,1993,0.160037,0.698257,0.094336,4.830864,1.70532,4.09619,0.669738
4,4,Afghanistan,AFG,1994,0.160022,0.698469,0.092439,4.829423,1.716069,4.099582,0.66926


In [36]:
world_happiness = pd.read_csv("world-happiness-report.csv")
world_happiness.head()

Unnamed: 0,Country name,year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect
0,Afghanistan,2008,3.724,7.37,0.451,50.8,0.718,0.168,0.882,0.518,0.258
1,Afghanistan,2009,4.402,7.54,0.552,51.2,0.679,0.19,0.85,0.584,0.237
2,Afghanistan,2010,4.758,7.647,0.539,51.6,0.6,0.121,0.707,0.618,0.275
3,Afghanistan,2011,3.832,7.62,0.521,51.92,0.496,0.162,0.731,0.611,0.267
4,Afghanistan,2012,3.783,7.705,0.521,52.24,0.531,0.236,0.776,0.71,0.268


In [37]:
print(world_happiness.columns)
print(mental_health.columns)
print(global_hunger.columns)

Index(['Country name', 'year', 'Life Ladder', 'Log GDP per capita',
       'Social support', 'Healthy life expectancy at birth',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption', 'Positive affect', 'Negative affect'],
      dtype='object')
Index(['index', 'Entity', 'Code', 'Year', 'Schizophrenia (%)',
       'Bipolar disorder (%)', 'Eating disorders (%)', 'Anxiety disorders (%)',
       'Drug use disorders (%)', 'Depression (%)',
       'Alcohol use disorders (%)'],
      dtype='object')
Index(['Entity', 'Code', 'Year', 'Global Hunger Index (2021)',
       '411773-annotations'],
      dtype='object')


In [38]:
print("Shape of Data Sources")
print("Global hunger dataset dimensions: " + str(global_hunger.shape))
print("Mental health dataset dimensions: " + str(mental_health.shape))
print("World happiness dataset dimensions: " + str(world_happiness.shape))

Shape of Data Sources
Global hunger dataset dimensions: (471, 5)
Mental health dataset dimensions: (108553, 11)
World happiness dataset dimensions: (1949, 11)


In [39]:
print(global_hunger.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 471 entries, 0 to 470
Data columns (total 5 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Entity                      471 non-null    object 
 1   Code                        471 non-null    object 
 2   Year                        471 non-null    int64  
 3   Global Hunger Index (2021)  471 non-null    float64
 4   411773-annotations          12 non-null     object 
dtypes: float64(1), int64(1), object(3)
memory usage: 18.5+ KB
None


In [40]:
print(mental_health.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 108553 entries, 0 to 108552
Data columns (total 11 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   index                      108553 non-null  int64  
 1   Entity                     108553 non-null  object 
 2   Code                       103141 non-null  object 
 3   Year                       108553 non-null  object 
 4   Schizophrenia (%)          25875 non-null   object 
 5   Bipolar disorder (%)       19406 non-null   object 
 6   Eating disorders (%)       100236 non-null  object 
 7   Anxiety disorders (%)      6468 non-null    float64
 8   Drug use disorders (%)     6468 non-null    float64
 9   Depression (%)             6468 non-null    float64
 10  Alcohol use disorders (%)  6468 non-null    float64
dtypes: float64(4), int64(1), object(6)
memory usage: 9.1+ MB
None
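The `object` dtype reported for `Year` and several prevalence columns suggests that some cells contain text rather than numbers. The following is a minimal sketch, run on a small made-up frame rather than the real file, of how `pd.to_numeric(..., errors="coerce")` could be used to surface such entries before casting. The helper `non_numeric_rows` and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the raw file; the values are illustrative, not taken from the dataset.
raw = pd.DataFrame({
    "Entity": ["Afghanistan", "Afghanistan", "Entity"],
    "Year": ["1990", "1991", "Year"],
    "Schizophrenia (%)": ["0.16056", "0.160312", "Prevalence"],
})


def non_numeric_rows(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return rows whose value in `column` is present but cannot be parsed as a number."""
    parsed = pd.to_numeric(frame[column], errors="coerce")
    return frame[parsed.isna() & frame[column].notna()]


bad_rows = non_numeric_rows(raw, "Year")
print(bad_rows)
```

If rows like these turn up in the real data, they can be inspected before deciding whether dropping them is appropriate.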


In [41]:
print(world_happiness.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1949 entries, 0 to 1948
Data columns (total 11 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Country name                      1949 non-null   object 
 1   year                              1949 non-null   int64  
 2   Life Ladder                       1949 non-null   float64
 3   Log GDP per capita                1913 non-null   float64
 4   Social support                    1936 non-null   float64
 5   Healthy life expectancy at birth  1894 non-null   float64
 6   Freedom to make life choices      1917 non-null   float64
 7   Generosity                        1860 non-null   float64
 8   Perceptions of corruption         1839 non-null   float64
 9   Positive affect                   1927 non-null   float64
 10  Negative affect                   1933 non-null   float64
dtypes: float64(9), int64(1), object(1)
memory usage: 167.6+ KB
None


In [42]:
global_hunger.isnull().sum()

Entity                          0
Code                            0
Year                            0
Global Hunger Index (2021)      0
411773-annotations            459
dtype: int64

In [43]:
world_happiness.isnull().sum()

Country name                          0
year                                  0
Life Ladder                           0
Log GDP per capita                   36
Social support                       13
Healthy life expectancy at birth     55
Freedom to make life choices         32
Generosity                           89
Perceptions of corruption           110
Positive affect                      22
Negative affect                      16
dtype: int64

In [44]:
mental_health.isnull().sum()

index                             0
Entity                            0
Code                           5412
Year                              0
Schizophrenia (%)             82678
Bipolar disorder (%)          89147
Eating disorders (%)           8317
Anxiety disorders (%)        102085
Drug use disorders (%)       102085
Depression (%)               102085
Alcohol use disorders (%)    102085
dtype: int64
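Raw counts are hard to compare across datasets of very different sizes (102,085 missing values out of 108,553 rows is roughly 94%). A small sketch of expressing missingness as a percentage of rows is shown here on a toy frame; the column names `a`, `b`, and `c` are placeholders, not columns from the datasets.

```python
import pandas as pd

# Toy frame with a known share of missing values per column.
toy = pd.DataFrame({
    "a": [1.0, None, 3.0, 4.0],
    "b": [None, None, None, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# isna() yields booleans; their column-wise mean is the fraction of missing rows.
pct_missing = toy.isna().mean().mul(100).sort_values(ascending=False)
print(pct_missing)
```

The same expression could be applied to `mental_health`, `world_happiness`, or `global_hunger` in place of `toy`.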

### Data Manipulation

In [45]:
# Drop identifier columns that are not needed, then drop rows with any missing value.
mental_health.drop(columns=["Code", "index"], inplace=True)
mental_health.dropna(inplace=True)
mental_health = mental_health.rename(columns={"Entity": "country"})
# Year and several prevalence columns were read as strings; cast them to numeric types.
mental_health["Year"] = mental_health["Year"].astype(int)
pct_cols = ["Schizophrenia (%)", "Bipolar disorder (%)", "Eating disorders (%)", "Anxiety disorders (%)"]
mental_health[pct_cols] = mental_health[pct_cols].astype(float)
# Normalize column names to snake_case; regex=False so " (%)" is matched literally.
mental_health.columns = mental_health.columns.str.replace(" (%)", "", regex=False).str.lower().str.replace(" ", "_")
mental_health.head()

Unnamed: 0,country,year,schizophrenia,bipolar_disorder,eating_disorders,anxiety_disorders,drug_use_disorders,depression,alcohol_use_disorders
0,Afghanistan,1990,0.16056,0.697779,0.101855,4.82883,1.677082,4.071831,0.672404
1,Afghanistan,1991,0.160312,0.697961,0.099313,4.82974,1.684746,4.079531,0.671768
2,Afghanistan,1992,0.160135,0.698107,0.096692,4.831108,1.694334,4.088358,0.670644
3,Afghanistan,1993,0.160037,0.698257,0.094336,4.830864,1.70532,4.09619,0.669738
4,Afghanistan,1994,0.160022,0.698469,0.092439,4.829423,1.716069,4.099582,0.66926


In [46]:
mental_health.to_csv("mental-health-modified.csv")

In [47]:
# Drop the affect columns, rename the remaining columns for readability, and drop rows with missing values.
world_happiness.drop(columns= ["Positive affect", "Negative affect"], inplace=True)
world_happiness = world_happiness.rename(columns={"Country name":"country", "Life Ladder":"happiness"})
world_happiness = world_happiness.rename(columns={"Log GDP per capita":"gdp_per_capita", 
                                                  "Social support":"social_support", 
                                                  "Healthy life expectancy at birth":"life_expectancy",
                                                  "Freedom to make life choices":"freedom", 
                                                  "Generosity":"generosity", 
                                                  "Perceptions of corruption":"corruption"})
world_happiness.columns= world_happiness.columns.str.lower().str.replace(" ", "_")
world_happiness.dropna(inplace=True)
world_happiness.head()

Unnamed: 0,country,year,happiness,gdp_per_capita,social_support,life_expectancy,freedom,generosity,corruption
0,Afghanistan,2008,3.724,7.37,0.451,50.8,0.718,0.168,0.882
1,Afghanistan,2009,4.402,7.54,0.552,51.2,0.679,0.19,0.85
2,Afghanistan,2010,4.758,7.647,0.539,51.6,0.6,0.121,0.707
3,Afghanistan,2011,3.832,7.62,0.521,51.92,0.496,0.162,0.731
4,Afghanistan,2012,3.783,7.705,0.521,52.24,0.531,0.236,0.776


In [48]:
# Drop the annotation and code columns in the global hunger data set and rename columns for readability.
global_hunger.drop(columns=['411773-annotations', 'Code'], inplace=True)
global_hunger = global_hunger.rename(columns={"Entity": "country",
                                              "Global Hunger Index (2021)": "global_hunger_index",
                                              "Year": "year"})

In [49]:
# Outer merges keep every country-year row, leaving NaN where a data set (such as global hunger) has no entry.
data = world_happiness.merge(global_hunger, on=["country", "year"], how="outer")
data = data.merge(mental_health, on=["country","year"], how="outer")
data.head()

Unnamed: 0,country,year,happiness,gdp_per_capita,social_support,life_expectancy,freedom,generosity,corruption,global_hunger_index,schizophrenia,bipolar_disorder,eating_disorders,anxiety_disorders,drug_use_disorders,depression,alcohol_use_disorders
0,Afghanistan,2008,3.724,7.37,0.451,50.8,0.718,0.168,0.882,,0.164639,0.70448,0.093589,4.860437,2.483862,4.129656,0.659501
1,Afghanistan,2009,4.402,7.54,0.552,51.2,0.679,0.19,0.85,,0.164932,0.704925,0.095166,4.861533,2.543884,4.129972,0.661185
2,Afghanistan,2010,4.758,7.647,0.539,51.6,0.6,0.121,0.707,,0.16513,0.705313,0.097327,4.862777,2.571349,4.130874,0.662062
3,Afghanistan,2011,3.832,7.62,0.521,51.92,0.496,0.162,0.731,,0.165272,0.705688,0.098638,4.864773,2.57317,4.130862,0.662254
4,Afghanistan,2012,3.783,7.705,0.521,52.24,0.531,0.236,0.776,34.3,0.165424,0.706086,0.099891,4.867283,2.576189,4.132485,0.662372
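Because an outer merge silently keeps unmatched rows, it can be useful to check how many country-year pairs actually line up across sources, for example when the same country is spelled differently in two files. The sketch below uses pandas' `indicator=True` option on two toy frames; the frames and the placeholder names "Examplia" and "Republic of Examplia" are invented for illustration and do not come from the datasets.

```python
import pandas as pd

# Toy frames sharing the notebook's key columns; non-Afghanistan values are made up.
happiness = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Examplia"],
    "year": [2012, 2012, 2012],
    "happiness": [3.783, 5.0, 6.0],
})
hunger = pd.DataFrame({
    "country": ["Afghanistan", "Republic of Examplia"],
    "year": [2012, 2012],
    "global_hunger_index": [34.3, 10.0],
})

# indicator=True adds a "_merge" column: "both", "left_only", or "right_only".
merged = happiness.merge(hunger, on=["country", "year"], how="outer", indicator=True)
counts = merged["_merge"].value_counts()
unmatched = merged.loc[merged["_merge"] != "both", ["country", "year", "_merge"]]

print(counts)
print(unmatched)
```

Rows flagged `left_only` or `right_only` are candidates for a name-harmonization step before merging.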


In [50]:
data.rename(columns = {'eating_disorders':'ed_nos', 
                             'anxiety_disorders':'anxiety',
                             'drug_use_disorders':'addiction',
                             'alcohol_use_disorders':'alcoholism'}, inplace = True)

In [51]:
data.to_csv("merged_data_p1.csv", sep = ",", index=False, encoding="utf-8")

### Data Visualization

In [52]:
# Create a visualization of the relationship between global hunger and eating disorders.
# The eating disorders column was renamed to "ed_nos" in the merged data.
# Source: Copilot. 
sns.scatterplot(data=data, x="global_hunger_index", y="ed_nos", 
                hue="year", palette="viridis") \
                .set_title("The Relationship Between Global Hunger and Eating Disorders")

In [None]:
# Create a heatmap of the correlation between mental health disorders,
# using the renamed columns in the merged data.
# Source: Copilot.
mental_health_corr = data[['schizophrenia', 'bipolar_disorder', 'anxiety',
                           'ed_nos', 'addiction', 'depression']].corr()
sns.heatmap(mental_health_corr, annot=True, cmap="viridis") \
    .set_title("Correlation of Prevalent Mental Health Disorders")
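The heatmap above compares disorders with one another. For Question 2, a cross-correlation table between environmental factors and disorders may be a more direct view. The following is a sketch on synthetic data, assuming the column names used in the merged frame; the numbers are randomly generated and say nothing about the real relationship.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the merged data; all values are randomly generated.
rng = np.random.default_rng(0)
n = 50
toy = pd.DataFrame({
    "social_support": rng.uniform(0.3, 1.0, n),
    "freedom": rng.uniform(0.3, 1.0, n),
    "corruption": rng.uniform(0.0, 1.0, n),
})
# Build one disorder column that depends on social support and one that does not.
toy["depression"] = 5.0 - 2.0 * toy["social_support"] + rng.normal(0, 0.1, n)
toy["anxiety"] = rng.uniform(2.0, 6.0, n)

factors = ["social_support", "freedom", "corruption"]
disorders = ["depression", "anxiety"]

# Rank-based (Spearman) correlation, keeping only the factor-by-disorder block.
cross_corr = toy[factors + disorders].corr(method="spearman").loc[factors, disorders]
print(cross_corr.round(2))
```

Spearman correlation is used here as one reasonable choice because it does not assume a linear relationship; Pearson (the `corr()` default) would work the same way.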


In [None]:
# Create a visualization of the relationship between social support and depression for 6 countries:
# Sweden, Jordan, Madagascar, Bolivia, Australia, and the United States.
# Source: Copilot.
countries = ["Sweden", "Jordan", "Madagascar", "Bolivia", "Australia", "United States"]
data_countries = data[data['country'].isin(countries)]
sns.scatterplot(data=data_countries, x="social_support", y="depression", \
                hue="country", palette="viridis") \
                .set_title("Scatterplot of Social Support vs. Depression for 6 Countries")