## Machine Learning for Diagnosing and Predicting Mental Health in the Tech Community

This project aims to predict mental health outcomes using machine learning.

Mental illness is one of the largest burdens of disease in the UK. Severe mental disorders affect employees' work performance, students' academic performance and overall quality of life. Mental illnesses are more common, long-lasting and impactful than other health conditions, and they affect not only individuals but also society. Mental ill health is responsible for 72 million working days lost and costs £34.9 billion each year (Mental Health First Aid England). Patients may avoid the consequences of mental illness if they receive proper treatment after being diagnosed.

There has been growing interest in applying digital healthcare technologies, e.g. Artificial Intelligence (AI) and Machine Learning (ML), to diagnostic and prediction problems in mental healthcare. Can we actually diagnose mental conditions properly with the aid of AI and ML? After the diagnosis, can AI and ML systems tell how severe the condition is, and what the consequences might be? We might also consider treatment recommendations made by AI and ML systems.

The aim of the project is to use AI and ML to support the diagnosis, prediction and treatment of mental health disorders, particularly for people in the tech community. The specific objectives of the work can be summarized as:

• To develop AI and ML systems to diagnose mental health disorders

• To investigate AI and ML systems to predict consequences of a mental health disorder



-----------------------------------------------------------
# Content of Dataset


| # | Column | Notes |
|---|--------|-------|
| 0 | ResponseID | Drop |
| 1 | Are you selfemployed | |
| 2 | How many employees does your company or organization have | |
| 3 | Is your employer primarily a tech companyorganization | |
| 4 | Is your primary role within your company related to techIT | |
| 5 | Do you have previous employers | |
| 6 | Do you have a family history of mental illness | |
| 7 | Have you had a mental health disorder in the past | |
| 8 | Do you currently have a mental health disorder | |
| 9 | If yes, what conditions have you been diagnosed with | Target value 2; this should not be part of the input features |
| 10 | If maybe, what conditions do you believe you have | |
| 11 | Have you been diagnosed with a mental health condition by a medical professional | Target value; rename and change to numerical value |
| 12 | If so, what conditions were you diagnosed with | |
| 13 | Have you ever sought treatment for a mental health issue from a mental health professional | |
| 14 | What is your age | Rename |
| 15 | What is your gender | Rename |
| 16 | Age Group | Rename |
| 17 | What country do you live in | Compare countries |
| 18 | What US state or territory do you live in | |
| 19 | What country do you work in | Compare these using a heat map |
| 20 | What US state or territory do you work in | |
| 21 | Which of the following best describes your work position | |
| 22 | Do you work remotely | |
| 23 | Question Group | Drop |
| 24 | Question about speaking openly about mental health vs physical health | |
| 25 | Question | Drop |
| 26 | Response | Drop |


### Two things: check the main target first, then, if the answer is yes, which condition they have been diagnosed with.
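The two-stage idea above could be sketched as follows. This is only an illustration on a toy frame: the short column names `Diagnosed` and `Conditions` are hypothetical stand-ins for the survey questions, and it assumes the answers are the strings "Yes"/"No" and that multiple conditions are separated by `|`, as the sample rows suggest.

```python
import pandas as pd

# Hypothetical toy data; "Diagnosed" and "Conditions" are stand-in column names.
toy = pd.DataFrame({
    "Diagnosed": ["Yes", "No", "Yes"],
    "Conditions": [
        "Obsessive-Compulsive Disorder|Eating Disorder",
        None,
        "Eating Disorder",
    ],
})

# Stage 1: binary main target (assumes the answers are exactly "Yes" or "No").
toy["target"] = toy["Diagnosed"].map({"Yes": 1, "No": 0})

# Stage 2: only for diagnosed respondents, one 0/1 indicator column per condition.
diagnosed = toy[toy["target"] == 1]
condition_targets = diagnosed["Conditions"].str.get_dummies(sep="|")

print(toy["target"].tolist())
print(condition_targets)
```

Keeping the condition columns out of the stage 1 inputs matches the note in the column overview that the diagnosed conditions should not be part of the input features.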


In [80]:
#import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


from scipy import stats
from scipy.stats import randint

#PreProcessing 
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.preprocessing import binarize, LabelEncoder, MinMaxScaler


# models
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Validation libraries
from sklearn import metrics
from sklearn.metrics import accuracy_score, mean_squared_error, precision_recall_curve
from sklearn.model_selection import cross_val_score



In [81]:
#loading dataset
filename = 'data/Survey_Data.csv'
df = pd.read_csv(filename)

In [82]:
# display the first 5 rows in the dataset
df.head(5)

Unnamed: 0,ResponseID,Are you selfemployed,How many employees does your company or organization have,Is your employer primarily a tech companyorganization,Is your primary role within your company related to techIT,Do you have previous employers,Do you have a family history of mental illness,Have you had a mental health disorder in the past,Do you currently have a mental health disorder,"If yes, what conditions have you been diagnosed with",...,What country do you live in,What US state or territory do you live in,What country do you work in,What US state or territory do you work in,Which of the following best describes your work position,Do you work remotely,Question Group,Question about speaking openly about mental health vs physical health,Question,Response
0,r00000,False,26-100,True,,True,No,Yes,No,,...,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,Resources for employees with mental health dis...,No,Does your employer provide mental health benef...,Not eligible for coverage / N/A
1,r00000,False,26-100,True,,True,No,Yes,No,,...,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,Resources for employees with mental health dis...,No,Do you know the options for mental health care...,
2,r00000,False,26-100,True,,True,No,Yes,No,,...,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,Safe and supportive workplce for those with me...,No,Has your employer ever formally discussed ment...,No
3,r00000,False,26-100,True,,True,No,Yes,No,,...,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,Resources for employees with mental health dis...,No,Does your employer offer resources to learn mo...,No
4,r00000,False,26-100,True,,True,No,Yes,No,,...,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,Resources for employees with mental health dis...,No,Is your anonymity protected if you choose to t...,I don't know


In [83]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60186 entries, 0 to 60185
Data columns (total 27 columns):
 #   Column                                                                                      Non-Null Count  Dtype  
---  ------                                                                                      --------------  -----  
 0   ResponseID                                                                                  60186 non-null  object 
 1   Are you selfemployed                                                                        60186 non-null  bool   
 2   How many employees does your company or organization have                                   48132 non-null  object 
 3   Is your employer primarily a tech companyorganization                                       48132 non-null  object 
 4   Is your primary role within your company related to techIT                                  11046 non-null  object 
 5   Do you have previous employers         

In [84]:
#types of data in the Dataframe 
df.dtypes

ResponseID                                                                                     object
Are you selfemployed                                                                             bool
How many employees does your company or organization have                                      object
Is your employer primarily a tech companyorganization                                          object
Is your primary role within your company related to techIT                                     object
Do you have previous employers                                                                   bool
Do you have a family history of mental illness                                                 object
Have you had a mental health disorder in the past                                              object
Do you currently have a mental health disorder                                                 object
If yes, what conditions have you been diagnosed with                              

In [85]:
print(" VALUES IN DATASET ")
#count the values in the data for each column
df.count()

 VALUES IN DATASET 


ResponseID                                                                                    60186
Are you selfemployed                                                                          60186
How many employees does your company or organization have                                     48132
Is your employer primarily a tech companyorganization                                         48132
Is your primary role within your company related to techIT                                    11046
Do you have previous employers                                                                60186
Do you have a family history of mental illness                                                60186
Have you had a mental health disorder in the past                                             60186
Do you currently have a mental health disorder                                                60186
If yes, what conditions have you been diagnosed with                                          23856



As seen in the data, there are 22 object types, 4 boolean values and just one float.
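A tally like this does not have to be counted by hand. The sketch below shows one way to get it on a small toy frame (the values are made up for illustration; the real notebook would apply the same calls to `df`).

```python
import pandas as pd

# Toy frame with one text, one boolean and one float column.
toy = pd.DataFrame({
    "ResponseID": ["r00000", "r00001"],
    "Are you selfemployed": [False, True],
    "What is your age": [39.0, 25.0],
})

# Number of columns per dtype.
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)

# Column names grouped by kind, useful for later cleaning steps.
bool_columns = toy.select_dtypes(include="bool").columns.tolist()
numeric_columns = toy.select_dtypes(include="number").columns.tolist()
```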



# Data Cleaning



In [86]:
# work on a copy so that cleaning steps do not modify the original DataFrame
df_clean = df.copy()

In [87]:
#check null values
df_clean.isnull()

Unnamed: 0,ResponseID,Are you selfemployed,How many employees does your company or organization have,Is your employer primarily a tech companyorganization,Is your primary role within your company related to techIT,Do you have previous employers,Do you have a family history of mental illness,Have you had a mental health disorder in the past,Do you currently have a mental health disorder,"If yes, what conditions have you been diagnosed with",...,What country do you live in,What US state or territory do you live in,What country do you work in,What US state or territory do you work in,Which of the following best describes your work position,Do you work remotely,Question Group,Question about speaking openly about mental health vs physical health,Question,Response
0,False,False,False,False,True,False,False,False,False,True,...,False,True,False,True,False,False,False,False,False,False
1,False,False,False,False,True,False,False,False,False,True,...,False,True,False,True,False,False,False,False,False,True
2,False,False,False,False,True,False,False,False,False,True,...,False,True,False,True,False,False,False,False,False,False
3,False,False,False,False,True,False,False,False,False,True,...,False,True,False,True,False,False,False,False,False,False
4,False,False,False,False,True,False,False,False,False,True,...,False,True,False,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60181,False,False,False,False,True,False,False,False,False,False,...,False,True,False,True,False,False,False,False,False,False
60182,False,False,False,False,True,False,False,False,False,False,...,False,True,False,True,False,False,False,False,False,False
60183,False,False,False,False,True,False,False,False,False,False,...,False,True,False,True,False,False,False,False,False,False
60184,False,False,False,False,True,False,False,False,False,False,...,False,True,False,True,False,False,False,False,False,False


In [88]:
if df_clean.isnull().sum().sum() == 0:
    print('You have no missing data in the dataset.')
else:
    print('There are {} missing data in your dataset.'.format(df_clean.isnull().sum().sum()))


There are 260936 missing data in your dataset.


In [89]:
#check the missing data in the columns and their unique values
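The cell above is left empty. A minimal sketch of what such a check could look like is given here on a toy frame; the shortened column names and values are hypothetical, and in the notebook the same calls would run on `df_clean`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few gaps.
toy = pd.DataFrame({
    "Number of employees": ["26-100", None, "100-500", "26-100"],
    "Age": [39.0, 39.0, np.nan, 25.0],
    "Gender": ["Male", "Male", "Female", "Trans/Other"],
})

# Missing count, missing percentage and number of distinct values per column.
summary = pd.DataFrame({
    "missing": toy.isnull().sum(),
    "missing_pct": (toy.isnull().mean() * 100).round(1),
    "n_unique": toy.nunique(),  # NaN is not counted as a value
})

# Keep only the columns that actually have gaps, worst first.
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)

# Unique values of a single column, including NaN if present.
print(toy["Number of employees"].unique())
```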

In [90]:
#giving values for the different data types
String = 'Nan'
Float = 0.0
# creating lists of features according to data type
IntFeatures = ['What is your age ']
stringFeatures = ['']
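The defaults defined above are not yet applied anywhere in this part of the notebook. One way they might be used is sketched below, assuming the intent is to fill text columns with the string `'Nan'` and numeric columns with `0.0`; the constant names and the toy data are illustrative only.

```python
import numpy as np
import pandas as pd

DEFAULT_STRING = "Nan"  # mirrors `String` in the cell above
DEFAULT_FLOAT = 0.0     # mirrors `Float` in the cell above

# Hypothetical toy data with one missing value per column.
toy = pd.DataFrame({
    "What is your age": [39.0, np.nan],
    "Do you work remotely": ["Sometimes", None],
})

filled = toy.copy()
for column in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[column]):
        # numeric columns get the float default
        filled[column] = filled[column].fillna(DEFAULT_FLOAT)
    else:
        # everything else is treated as text
        filled[column] = filled[column].fillna(DEFAULT_STRING)

print(filled)
```

Filling an age with `0.0` creates an implausible value, so a median or a separate "missing" indicator may be preferable for numeric features; that choice is left open here.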

In [91]:
#drop columns 
to_drop = ['ResponseID','Question Group','Question','Response']
df_clean.drop(to_drop,inplace=True, axis=1)

The following columns were dropped:

* ResponseID - This column holds unique identifiers that do not contribute to the modelling or analysis behind the aims of the project
* Question Group - There is no need to keep this in the data, as it has no relevance for the prediction
* Response - This was dropped to reduce the 

In [92]:
#Display 
df_clean.head(5)

Unnamed: 0,Are you selfemployed,How many employees does your company or organization have,Is your employer primarily a tech companyorganization,Is your primary role within your company related to techIT,Do you have previous employers,Do you have a family history of mental illness,Have you had a mental health disorder in the past,Do you currently have a mental health disorder,"If yes, what conditions have you been diagnosed with","If maybe, what conditions do you believe you have",...,What is your age,What is your gender,Age Group,What country do you live in,What US state or territory do you live in,What country do you work in,What US state or territory do you work in,Which of the following best describes your work position,Do you work remotely,Question about speaking openly about mental health vs physical health
0,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
1,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
2,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
3,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
4,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
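The five rows displayed above are identical once the per-question columns are gone, which suggests the survey is stored in long format with one row per respondent and question. If one row per respondent is wanted for modelling (an assumption; the notebook does not state this), a sketch along the following lines could collapse the repeats. It deduplicates on `ResponseID`, so it would have to run before that column is dropped, and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: respondent r00000 answered two questions.
toy = pd.DataFrame({
    "ResponseID": ["r00000", "r00000", "r00001"],
    "Do you work remotely": ["Sometimes", "Sometimes", "Always"],
    "Question": ["Q1", "Q2", "Q1"],
    "Response": ["No", "Yes", "No"],
})

per_respondent = (
    toy.drop(columns=["Question", "Response"])   # remove per-question columns
       .drop_duplicates(subset="ResponseID")     # keep one row per respondent
       .drop(columns="ResponseID")               # identifier no longer needed
       .reset_index(drop=True)
)
print(per_respondent)
```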


In [93]:
#Finding out the gender types included in the forms 
#rename column 

gender = df_clean['What is your gender '].unique()
print(gender)


['Male' 'Female' 'Trans/Other']


## Renaming Columns

In [94]:
#Rename columns 
df_clean.rename(
    columns={
        "What is your age": "Age",
        "What is your gender": "Gender",
        "Age Group": "Age group",
        "Is your employer primarily a tech companyorganization": "Tech Company",
        "How many employees does your company or organization have": "Number of employees",
    },
    inplace=True,
)

df_clean

# TODO: look at the data and change the following columns:
#   Age
#   Do you have previous employers
#   Do you currently have a mental health disorder
#   What is your age
#   Do you have a family history of mental illness
#   If yes, what conditions have you been diagnosed with
#   What country do you live in
#   What US state or territory do you live in
#   Which of the following best describes your work position

Unnamed: 0,Are you selfemployed,Number of employees,Tech Company,Is your primary role within your company related to techIT,Do you have previous employers,Do you have a family history of mental illness,Have you had a mental health disorder in the past,Do you currently have a mental health disorder,"If yes, what conditions have you been diagnosed with","If maybe, what conditions do you believe you have",...,Age,What is your gender,Age group,What country do you live in,What US state or territory do you live in,What country do you work in,What US state or territory do you work in,Which of the following best describes your work position,Do you work remotely,Question about speaking openly about mental health vs physical health
0,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
1,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
2,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
3,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
4,False,26-100,True,,True,No,Yes,No,,,...,39.0,Male,36-40,United Kingdom,,United Kingdom,,Back-end Developer,Sometimes,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60181,False,100-500,True,,False,I don't know,Yes,Yes,Obsessive-Compulsive Disorder|Eating Disorder ...,,...,25.0,Trans/Other,20-25,Canada,,Canada,,Other,Sometimes,No
60182,False,100-500,True,,False,I don't know,Yes,Yes,Obsessive-Compulsive Disorder|Eating Disorder ...,,...,25.0,Trans/Other,20-25,Canada,,Canada,,Other,Sometimes,No
60183,False,100-500,True,,False,I don't know,Yes,Yes,Obsessive-Compulsive Disorder|Eating Disorder ...,,...,25.0,Trans/Other,20-25,Canada,,Canada,,Other,Sometimes,No
60184,False,100-500,True,,False,I don't know,Yes,Yes,Obsessive-Compulsive Disorder|Eating Disorder ...,,...,25.0,Trans/Other,20-25,Canada,,Canada,,Other,Sometimes,No
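In the output above the gender column still carries its original name, while the other renamed columns changed. One possible cause, assuming the column name really has a trailing space as in the earlier `df_clean['What is your gender ']` lookup, is that the rename key does not match exactly; `rename` silently ignores keys that are not found. A minimal sketch of normalising the names first, on a toy frame:

```python
import pandas as pd

# Toy frame; the trailing space in the gender column name is the assumption here.
toy = pd.DataFrame({
    "What is your age": [39.0],
    "What is your gender ": ["Male"],
})

# Strip leading/trailing whitespace from every column name before renaming.
toy.columns = toy.columns.str.strip()

renamed = toy.rename(
    columns={"What is your age": "Age", "What is your gender": "Gender"}
)
print(renamed.columns.tolist())
```

Passing `errors="raise"` to `rename` is another option, since it makes a non-matching key fail loudly instead of being skipped.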


In [95]:
df_clean['Question about speaking openly about mental health vs physical health'].unique()

array(['No', 'Yes'], dtype=object)

