Mental Health Predictor

Background

Defining Mental illness
- Disturbance of thought, experience, and emotion that causes functional impairment in people.
- Interpersonal difficulty, limited ability to work, and self-destructive behavior.
In the US, nearly 1 in 5 adults aged 18 years or older (18.5%) have experienced mental illness (US Burden of Disease Collaborators, 2013).
Data science can help us better understand mental health problems and implement treatments for them more effectively.
Factors associated with mental illness include anxiety and depression, viewed through biological, psychological, and sociological (environmental) approaches.

Our analysis project focuses on identifying factors associated with the prevalence of poor mental health in the US.
- Population density, income, water and land features
We built a machine learning model that can predict mental health risk for an individual based on designated factors.
What are the most and least significant factors (features) in predicting prevalence of poor mental health in the US?

500 Cities: Local Data for Better Health, 2019. 500 Cities: Mental health not good for >=14 days among adults aged >=18 years ---Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch
- Behavioral Risk Factor Surveillance System (BRFSS) data (2017, 2016)
  - Mental Health Severity: Respondents aged ≥18 years who report 14 or more days during the past 30 days during which their mental health was not good.
US Household Income Statistics---Golden Oak Research Group LLC, “U.S. Income Database Kaggle”. Publication: 5, August 2017
United States Cities Database---SimpleMaps.com, Pareto Software LLC, compiled data from U.S. Geological Survey and U.S. Census Bureau

We used a binary outcome based on the % prevalence of poor mental health in each city.
The binary outcome was calculated by median split:
- The median % poor mental health across the 500 cities was 13.89%, so:
  - If a city < 13.89% poor mental health → “Good Mental Health”
  - If a city >= 13.89% poor mental health → “Bad Mental Health”
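The median split described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the city names and prevalence values are made up for the example and are not taken from the 500 Cities data, and `median_split` is a hypothetical helper, not a function from the project notebooks.

```python
from statistics import median

# Hypothetical prevalence values (% of adults reporting poor mental health).
# Illustrative numbers only, not values from the 500 Cities data.
prevalence = {
    "City A": 10.2,
    "City B": 13.1,
    "City C": 14.6,
    "City D": 17.9,
}

def median_split(values):
    """Label each city relative to the median prevalence."""
    cutoff = median(values.values())
    labels = {
        # Cities at or above the median fall in the "Bad" class,
        # matching the >= rule used in this project.
        city: "Bad Mental Health" if pct >= cutoff else "Good Mental Health"
        for city, pct in values.items()
    }
    return cutoff, labels

cutoff, labels = median_split(prevalence)
for city, label in labels.items():
    print(f"{city}: {prevalence[city]}% -> {label}")
```

A median split yields two classes of roughly equal size, which makes plain accuracy an easier metric to interpret than it would be with unbalanced classes.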
Features were log-transformed to reduce skew and then scaled, bringing them closer to a normal distribution on a common scale.
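As a rough sketch of this preprocessing step, the snippet below log-transforms a skewed feature and then standardizes it. It assumes `log(1 + x)` as the transform and zero-mean, unit-variance scaling; the exact transform and scaler used in the notebooks are not stated here, and the density values are invented for illustration.

```python
import math
from statistics import mean, pstdev

def log_standardize(values):
    """Apply log(1 + x), then rescale to zero mean and unit variance.

    Assumption: log1p and z-score scaling stand in for whatever
    transform and scaler the project actually used.
    """
    logged = [math.log1p(v) for v in values]
    mu = mean(logged)
    sigma = pstdev(logged)
    return [(v - mu) / sigma for v in logged]

# Hypothetical, right-skewed population densities.
density = [120.0, 450.0, 900.0, 3200.0, 11000.0]
scaled = log_standardize(density)
print([round(v, 2) for v in scaled])
```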
We tried logistic regression, support vector machines (1-3 kernels), a decision tree, gradient tree boosting (learning rates 0.05 - 1), a random forest, and deep learning networks with 1-2 layers.
We used 10-fold cross-validation: the data were randomly split 10 times, each time with 90% allocated to training and 10% to testing, and we averaged the performance across the 10 runs.
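The fold bookkeeping behind this procedure can be sketched as follows. The sketch assumes standard k-fold splitting, where every row is held out exactly once; `k_fold_indices`, `cross_validate`, and `held_out_share` are hypothetical names, and the scorer is a placeholder rather than one of the project's models.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle row indices once and deal them into k disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, score_fold, k=10, seed=0):
    """Average score_fold(train_idx, test_idx) over the k folds."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(score_fold(train_idx, test_idx))
    return sum(scores) / k

# Placeholder scorer: reports the share of rows held out for testing.
# A real scorer would fit a model on train_idx and return accuracy on test_idx.
def held_out_share(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))

avg = cross_validate(500, held_out_share)
print(f"average held-out share: {avg:.2f}")
```

With 500 cities and 10 folds, each run trains on 450 rows and tests on 50, which corresponds to the 90%/10% allocation described above.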

Most of the machine learning models we tried reached about 80% accuracy in predicting mental health risk.
With the Random Forest model, the strongest feature in predicting poor mental health was the standard deviation of income.
This suggests that income inequality within a city was the strongest predictor of the prevalence of poor mental health.
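To illustrate why the standard deviation of income can be read as a rough proxy for inequality, the sketch below compares two made-up cities with the same mean income but very different spreads. The income figures are hypothetical and do not come from the household income dataset.

```python
from statistics import mean, pstdev

# Hypothetical household incomes (USD) for two made-up cities.
even_city = [48_000, 50_000, 52_000, 50_000]
unequal_city = [15_000, 20_000, 25_000, 140_000]

for name, incomes in [("even", even_city), ("unequal", unequal_city)]:
    # Both cities share the same mean; only the spread differs.
    print(f"{name}: mean={mean(incomes):,.0f}  sd={pstdev(incomes):,.0f}")
```

Mean income alone cannot tell these two cities apart, whereas the standard deviation does, which is one reason a model might rank it above average income.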

Mental health data
- Small sample
- Limited availability
- Subjective self-rating
Differences in time frame of the datasets

The project presentation is available as a Google Slides presentation, here.

Tableau Dashboard
- Data Exploration Visuals and Machine Learning Summary
  - https://nhafer88.github.io/Mental_Health_Predictor/
