Mental Health

This notebook explores responses to a survey on mental health in the workplace.


ABOUT THE DATA
- Age
- Gender
- Country
- state: If you live in the United States, which state or territory do you live in?
- self_employed: Are you self-employed?
- family_history: Do you have a family history of mental illness?
- treatment: Have you sought treatment for a mental health condition?
- work_interfere: If you have a mental health condition, do you feel that it interferes with your work?
- no_employees: How many employees does your company or organization have?
- remote_work: Do you work remotely (outside of an office) at least 50% of the time?
- tech_company: Is your employer primarily a tech company/organization?
- benefits: Does your employer provide mental health benefits?
- care_options: Do you know the options for mental health care your employer provides?
- wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?
- seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?
- anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?
- leave: How easy is it for you to take medical leave for a mental health condition?
- mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?
- phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?
- coworkers: Would you be willing to discuss a mental health issue with your coworkers?
- supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?
- mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?
- phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?
- mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?
- obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Reading CSV file
file = pd.read_csv("survey.csv")

In [4]:
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 1259)
MH_df = file
MH_df.head()
# TODO: drop state; reduce Timestamp to the month only (remove the day and time)

Unnamed: 0,Timestamp,Age,Gender,Country,state,self_employed,family_history,treatment,work_interfere,no_employees,remote_work,tech_company,benefits,care_options,wellness_program,seek_help,anonymity,leave,mental_health_consequence,phys_health_consequence,coworkers,supervisor,mental_health_interview,phys_health_interview,mental_vs_physical,obs_consequence,comments
0,2014-08-27 11:29:31,37,Female,United States,IL,,No,Yes,Often,6-25,No,Yes,Yes,Not sure,No,Yes,Yes,Somewhat easy,No,No,Some of them,Yes,No,Maybe,Yes,No,
1,2014-08-27 11:29:37,44,M,United States,IN,,No,No,Rarely,More than 1000,No,No,Don't know,No,Don't know,Don't know,Don't know,Don't know,Maybe,No,No,No,No,No,Don't know,No,
2,2014-08-27 11:29:44,32,Male,Canada,,,No,No,Rarely,6-25,No,Yes,No,No,No,No,Don't know,Somewhat difficult,No,No,Yes,Yes,Yes,Yes,No,No,
3,2014-08-27 11:29:46,31,Male,United Kingdom,,,Yes,Yes,Often,26-100,No,Yes,No,Yes,No,No,No,Somewhat difficult,Yes,Yes,Some of them,No,Maybe,Maybe,No,Yes,
4,2014-08-27 11:30:22,31,Male,United States,TX,,No,No,Never,100-500,Yes,Yes,Yes,No,Don't know,Don't know,Don't know,Don't know,No,No,Some of them,Yes,Yes,Yes,Don't know,No,


In [5]:
MH_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1259 entries, 0 to 1258
Data columns (total 27 columns):
Timestamp                    1259 non-null object
Age                          1259 non-null int64
Gender                       1259 non-null object
Country                      1259 non-null object
state                        744 non-null object
self_employed                1241 non-null object
family_history               1259 non-null object
treatment                    1259 non-null object
work_interfere               995 non-null object
no_employees                 1259 non-null object
remote_work                  1259 non-null object
tech_company                 1259 non-null object
benefits                     1259 non-null object
care_options                 1259 non-null object
wellness_program             1259 non-null object
seek_help                    1259 non-null object
anonymity                    1259 non-null object
leave                        1259 non-null obj

In [6]:
# Drop the columns with too many NaNs (comments, work_interfere), plus Timestamp
MH1 = MH_df.drop(columns = ["comments", "work_interfere", "Timestamp"])

In [7]:
MH1.head()

Unnamed: 0,Age,Gender,Country,state,self_employed,family_history,treatment,no_employees,remote_work,tech_company,benefits,care_options,wellness_program,seek_help,anonymity,leave,mental_health_consequence,phys_health_consequence,coworkers,supervisor,mental_health_interview,phys_health_interview,mental_vs_physical,obs_consequence
0,37,Female,United States,IL,,No,Yes,6-25,No,Yes,Yes,Not sure,No,Yes,Yes,Somewhat easy,No,No,Some of them,Yes,No,Maybe,Yes,No
1,44,M,United States,IN,,No,No,More than 1000,No,No,Don't know,No,Don't know,Don't know,Don't know,Don't know,Maybe,No,No,No,No,No,Don't know,No
2,32,Male,Canada,,,No,No,6-25,No,Yes,No,No,No,No,Don't know,Somewhat difficult,No,No,Yes,Yes,Yes,Yes,No,No
3,31,Male,United Kingdom,,,Yes,Yes,26-100,No,Yes,No,Yes,No,No,No,Somewhat difficult,Yes,Yes,Some of them,No,Maybe,Maybe,No,Yes
4,31,Male,United States,TX,,No,No,100-500,Yes,Yes,Yes,No,Don't know,Don't know,Don't know,Don't know,No,No,Some of them,Yes,Yes,Yes,Don't know,No


In [10]:
# Count the missing values (NaN) in each column
MH1.isna().sum()

Age                            0
Gender                         0
Country                        0
state                        515
self_employed                 18
family_history                 0
treatment                      0
no_employees                   0
remote_work                    0
tech_company                   0
benefits                       0
care_options                   0
wellness_program               0
seek_help                      0
anonymity                      0
leave                          0
mental_health_consequence      0
phys_health_consequence        0
coworkers                      0
supervisor                     0
mental_health_interview        0
phys_health_interview          0
mental_vs_physical             0
obs_consequence                0
dtype: int64
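Expressing the counts above as a share of all rows makes a keep/drop threshold explicit. A minimal sketch on a small hypothetical frame; the `toy` data and the 50% threshold are illustrative assumptions, not values used in this notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the survey data
toy = pd.DataFrame({
    "state": ["IL", np.nan, np.nan, "TX", np.nan],
    "self_employed": [np.nan, "No", "Yes", "No", "No"],
    "treatment": ["Yes", "No", "No", "Yes", "No"],
})

# Share of missing values per column, in percent
missing_pct = toy.isna().mean() * 100

# Columns above an assumed 50% threshold are candidates for dropping
threshold = 50
to_drop = missing_pct[missing_pct > threshold].index.tolist()

print(missing_pct.round(1))
print(to_drop)
```

The threshold is a judgement call; a column that is missing for a structural reason (state is only asked of US respondents) may still be worth keeping.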

In [11]:
# Checking for unique values in gender column
MH1.Gender.unique()

array(['Female', 'M', 'Male', 'male', 'female', 'm', 'Male-ish', 'maile',
       'Trans-female', 'Cis Female', 'F', 'something kinda male?',
       'Cis Male', 'Woman', 'f', 'Mal', 'Male (CIS)', 'queer/she/they',
       'non-binary', 'Femake', 'woman', 'Make', 'Nah', 'All', 'Enby',
       'fluid', 'Genderqueer', 'Female ', 'Androgyne', 'Agender',
       'cis-female/femme', 'Guy (-ish) ^_^', 'male leaning androgynous',
       'Male ', 'Man', 'Trans woman', 'msle', 'Neuter', 'Female (trans)',
       'queer', 'Female (cis)', 'Mail', 'cis male', 'A little about you',
       'Malr', 'p', 'femail', 'Cis Man',
       'ostensibly male, unsure what that really means'], dtype=object)

In [8]:
# Cleaning the Gender column: map free-text answers to Male, Female or Trans
# Lower-case and strip surrounding spaces once, then match against the lists
gender = MH1["Gender"].str.lower().str.strip()

males = ["male", "m", "male-ish", "maile", "mal", "male (cis)", "make", "man", "msle", "mail", "malr", "cis man", "cis male"]
trans = ["trans-female", "something kinda male?", "queer/she/they", "non-binary", "nah", "all", "enby", "fluid", "genderqueer", "androgyne", "agender", "male leaning androgynous", "guy (-ish) ^_^", "trans woman", "neuter", "female (trans)", "queer", "ostensibly male, unsure what that really means"]
females = ["cis female", "f", "female", "woman", "femake", "cis-female/femme", "female (cis)", "femail"]

# Vectorised assignment instead of replace(inplace=True) on a column,
# which is chained assignment and may not update MH1 in newer pandas
MH1.loc[gender.isin(males), "Gender"] = "Male"
MH1.loc[gender.isin(females), "Gender"] = "Female"
MH1.loc[gender.isin(trans), "Gender"] = "Trans"
MH1["Gender"].unique()

array(['Female', 'Male', 'Trans', 'A little about you', 'p'], dtype=object)
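Another option is a single lookup dictionary applied with `Series.map`. This is a sketch with shortened category lists for illustration, and `normalise_gender` is a hypothetical helper name. Unlike the cell above, unmatched answers such as `'p'` become NaN instead of staying as they are, so they could be removed with `dropna`:

```python
import pandas as pd

# Shortened subsets of the lists above (illustrative only)
males = ["male", "m", "man", "cis male", "malr"]
females = ["female", "f", "woman", "cis female", "femake"]
trans = ["trans woman", "non-binary", "genderqueer", "agender"]

# One lookup table: lower-cased answer -> cleaned label
lookup = {**{g: "Male" for g in males},
          **{g: "Female" for g in females},
          **{g: "Trans" for g in trans}}


def normalise_gender(s: pd.Series) -> pd.Series:
    """Map free-text gender answers to Male/Female/Trans.

    Answers missing from the lookup become NaN.
    """
    return s.str.strip().str.lower().map(lookup)


raw = pd.Series(["Female ", "M", "Cis Male", "Trans woman", "p"])
result = normalise_gender(raw)
print(result.tolist())
```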

In [9]:
# Getting rid of "A little about you" and "p"
str_list = ['A little about you', 'p']
MH1 = MH1[~MH1['Gender'].isin(str_list)]

MH1["Gender"].unique()

array(['Female', 'Male', 'Trans'], dtype=object)

In [10]:
MH1["Gender"].head()

0    Female
1      Male
2      Male
3      Male
4      Male
Name: Gender, dtype: object

In [11]:
# Checking Age column
MH1.Age.unique()

array([         37,          44,          32,          31,          33,
                35,          39,          42,          23,          29,
                36,          27,          46,          41,          34,
                30,          40,          38,          50,          24,
                18,          28,          26,          22,          19,
                25,          45,          21,         -29,          43,
                56,          60,          54,         329,          55,
       99999999999,          48,          20,          57,          58,
                47,          62,          51,          65,          49,
             -1726,           5,          53,          61,          11,
                72])

In [12]:
# Age has no missing values, but fill any NaN with the median to be safe
median_age = MH1['Age'].median()
MH1['Age'] = MH1['Age'].fillna(median_age)

# Replace implausible ages (< 18 or > 120) with the median
MH1.loc[(MH1['Age'] < 18) | (MH1['Age'] > 120), 'Age'] = median_age

# Ranges of Age
MH1['age_range'] = pd.cut(MH1['Age'], [0, 20, 30, 65, 100], labels=["0-20", "21-30", "31-65", "66-100"], include_lowest=True)
MH1["Age"].unique()

array([37, 44, 32, 31, 33, 35, 39, 42, 23, 29, 36, 27, 46, 41, 34, 30, 40,
       38, 50, 24, 18, 28, 26, 22, 19, 25, 45, 21, 43, 56, 60, 54, 55, 48,
       20, 57, 58, 47, 62, 51, 65, 49, 53, 61, 72])
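The age cleaning can also be written as one reusable function. A sketch, assuming the same 18 to 120 plausibility window; `clean_age` is a hypothetical helper, and it differs from the cell above in one deliberate way: the median is taken over the plausible ages only, so values such as 99999999999 take no part in it:

```python
import pandas as pd


def clean_age(age: pd.Series, low: int = 18, high: int = 120) -> pd.Series:
    """Replace missing or out-of-range ages with the median of the in-range ones."""
    plausible = age.between(low, high)  # inclusive on both ends; NaN -> False
    median_age = age[plausible].median()
    # where() keeps values where the condition is True and uses median_age elsewhere
    return age.where(plausible, median_age)


ages = pd.Series([37, -29, 329, 5, 25, 19, 44, 99999999999])
cleaned = clean_age(ages)

# Same bins and labels as the notebook
age_range = pd.cut(cleaned, [0, 20, 30, 65, 100],
                   labels=["0-20", "21-30", "31-65", "66-100"],
                   include_lowest=True)
print(cleaned.tolist())
print(age_range.tolist())
```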

In [13]:
# Checking Self employed 
MH1["self_employed"].unique()

array([nan, 'Yes', 'No'], dtype=object)

In [14]:
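# Note: .count() counts every row of the boolean mask, not the NaNs;
# .isna().sum() gives the number of missing values
MH1["self_employed"].isna().count()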

1257
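The difference between `count()` and `sum()` on a boolean mask, shown on a small hypothetical series:

```python
import numpy as np
import pandas as pd

self_employed = pd.Series([np.nan, "Yes", "No", np.nan, "No"])

# count() counts non-null entries of the mask, i.e. every row
print(self_employed.isna().count())  # 5
# sum() adds up the True values, i.e. the missing entries
print(self_employed.isna().sum())    # 2
```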

In [15]:
MH1.head()

Unnamed: 0,Age,Gender,Country,state,self_employed,family_history,treatment,no_employees,remote_work,tech_company,benefits,care_options,wellness_program,seek_help,anonymity,leave,mental_health_consequence,phys_health_consequence,coworkers,supervisor,mental_health_interview,phys_health_interview,mental_vs_physical,obs_consequence,age_range
0,37,Female,United States,IL,,No,Yes,6-25,No,Yes,Yes,Not sure,No,Yes,Yes,Somewhat easy,No,No,Some of them,Yes,No,Maybe,Yes,No,31-65
1,44,Male,United States,IN,,No,No,More than 1000,No,No,Don't know,No,Don't know,Don't know,Don't know,Don't know,Maybe,No,No,No,No,No,Don't know,No,31-65
2,32,Male,Canada,,,No,No,6-25,No,Yes,No,No,No,No,Don't know,Somewhat difficult,No,No,Yes,Yes,Yes,Yes,No,No,31-65
3,31,Male,United Kingdom,,,Yes,Yes,26-100,No,Yes,No,Yes,No,No,No,Somewhat difficult,Yes,Yes,Some of them,No,Maybe,Maybe,No,Yes,31-65
4,31,Male,United States,TX,,No,No,100-500,Yes,Yes,Yes,No,Don't know,Don't know,Don't know,Don't know,No,No,Some of them,Yes,Yes,Yes,Don't know,No,31-65


In [38]:
MH1["Age"].describe()

count    1257.000000
mean       32.071599
std         7.271222
min        18.000000
25%        27.000000
50%        31.000000
75%        36.000000
max        72.000000
Name: Age, dtype: float64

In [16]:
# Checking Country and State NaN's
MH1.loc[:, 'Country':'state'].head(30)

Unnamed: 0,Country,state
0,United States,IL
1,United States,IN
2,Canada,
3,United Kingdom,
4,United States,TX
5,United States,TN
6,United States,MI
7,Canada,
8,United States,IL
9,Canada,


In [9]:
# Replace NaN in state with "NS" (No State)
MH1['state'] = MH1['state'].fillna("NS")

In [17]:
MH1.head(10)

Unnamed: 0,Age,Gender,Country,state,self_employed,family_history,treatment,no_employees,remote_work,tech_company,benefits,care_options,wellness_program,seek_help,anonymity,leave,mental_health_consequence,phys_health_consequence,coworkers,supervisor,mental_health_interview,phys_health_interview,mental_vs_physical,obs_consequence,age_range
0,37,Female,United States,IL,,No,Yes,6-25,No,Yes,Yes,Not sure,No,Yes,Yes,Somewhat easy,No,No,Some of them,Yes,No,Maybe,Yes,No,31-65
1,44,Male,United States,IN,,No,No,More than 1000,No,No,Don't know,No,Don't know,Don't know,Don't know,Don't know,Maybe,No,No,No,No,No,Don't know,No,31-65
2,32,Male,Canada,,,No,No,6-25,No,Yes,No,No,No,No,Don't know,Somewhat difficult,No,No,Yes,Yes,Yes,Yes,No,No,31-65
3,31,Male,United Kingdom,,,Yes,Yes,26-100,No,Yes,No,Yes,No,No,No,Somewhat difficult,Yes,Yes,Some of them,No,Maybe,Maybe,No,Yes,31-65
4,31,Male,United States,TX,,No,No,100-500,Yes,Yes,Yes,No,Don't know,Don't know,Don't know,Don't know,No,No,Some of them,Yes,Yes,Yes,Don't know,No,31-65
5,33,Male,United States,TN,,Yes,No,6-25,No,Yes,Yes,Not sure,No,Don't know,Don't know,Don't know,No,No,Yes,Yes,No,Maybe,Don't know,No,31-65
6,35,Female,United States,MI,,Yes,Yes,1-5,Yes,Yes,No,No,No,No,No,Somewhat difficult,Maybe,Maybe,Some of them,No,No,No,Don't know,No,31-65
7,39,Male,Canada,,,No,No,1-5,Yes,Yes,No,Yes,No,No,Yes,Don't know,No,No,No,No,No,No,No,No,31-65
8,42,Female,United States,IL,,Yes,Yes,100-500,No,Yes,Yes,Yes,No,No,No,Very difficult,Maybe,No,Yes,Yes,No,Maybe,No,No,31-65
9,23,Male,Canada,,,No,No,26-100,No,Yes,Don't know,No,Don't know,Don't know,Don't know,Don't know,No,No,Yes,Yes,Maybe,Maybe,Yes,No,21-30


In [118]:
# Checking the data type of each column
MH1.dtypes

Age                             int64
Gender                         object
Country                        object
state                          object
self_employed                  object
family_history                 object
treatment                      object
no_employees                   object
remote_work                    object
tech_company                   object
benefits                       object
care_options                   object
wellness_program               object
seek_help                      object
anonymity                      object
leave                          object
mental_health_consequence      object
phys_health_consequence        object
coworkers                      object
supervisor                     object
mental_health_interview        object
phys_health_interview          object
mental_vs_physical             object
obs_consequence                object
age_range                    category
dtype: object

In [18]:
MH1.head(5)

Unnamed: 0,Age,Gender,Country,state,self_employed,family_history,treatment,no_employees,remote_work,tech_company,benefits,care_options,wellness_program,seek_help,anonymity,leave,mental_health_consequence,phys_health_consequence,coworkers,supervisor,mental_health_interview,phys_health_interview,mental_vs_physical,obs_consequence,age_range
0,37,Female,United States,IL,,No,Yes,6-25,No,Yes,Yes,Not sure,No,Yes,Yes,Somewhat easy,No,No,Some of them,Yes,No,Maybe,Yes,No,31-65
1,44,Male,United States,IN,,No,No,More than 1000,No,No,Don't know,No,Don't know,Don't know,Don't know,Don't know,Maybe,No,No,No,No,No,Don't know,No,31-65
2,32,Male,Canada,,,No,No,6-25,No,Yes,No,No,No,No,Don't know,Somewhat difficult,No,No,Yes,Yes,Yes,Yes,No,No,31-65
3,31,Male,United Kingdom,,,Yes,Yes,26-100,No,Yes,No,Yes,No,No,No,Somewhat difficult,Yes,Yes,Some of them,No,Maybe,Maybe,No,Yes,31-65
4,31,Male,United States,TX,,No,No,100-500,Yes,Yes,Yes,No,Don't know,Don't know,Don't know,Don't know,No,No,Some of them,Yes,Yes,Yes,Don't know,No,31-65


In [116]:
# Checking the unique values of age_range
MH1.age_range.unique()

[31-65, 21-30, 0-20, 66-100]
Categories (4, object): [0-20 < 21-30 < 31-65 < 66-100]

Note on Columns / Change info
- KEEP: Gender: Change to Integer F = 0, M = 1, T = 3
- DONE: Country: Object
- DONE: State: Obj, replace NaN with a letter code (NS = No State)
- TODO: Self Employed: [Obj, nan, Yes, No] Change to Boolean; how to handle NaN?*
- TODO: Family History: [Obj Yes No] --> Change to Boolean Yes = 1, No = 0
- TODO: Treatment: [Obj, Yes No] --> Change to Boolean Yes = 1, No = 0
- KEEP: No_Employees: Obj
- TODO: Remote Work: [Obj, Yes No] --> Change to Boolean Yes = 1, No = 0
- TODO: Tech Company: [Obj Yes No] --> Change to Boolean Yes = 1, No = 0
- KEEP: Benefits: [Obj, Yes No Don't know] 
- KEEP: Care Option: [Obj, Yes No Not Sure] 
- KEEP: Wellness Program: [Obj, Yes No Don't know] 
- KEEP: Seek Help: [Obj, Yes No Don't know] 
- KEEP: Anonymity: [Obj, Yes No Don't know] 
- ?:    Leave: [Obj, Somewhat easy, Don't know, Somewhat difficult, Very difficult, Very easy] --> Keep as is, maybe drop it.****
- TODO: Mental Health: [Obj, 'No', 'Maybe', 'Yes'] Look Up. Change to Int?**
- TODO: Phys Health: [Obj, 'No', 'Yes', 'Maybe'] --> Look Up Change to Int?**
- TODO: Coworkers: [Obj, 'Some of them', 'No', 'Yes'] --> Look Up**
- TODO: Supervisor: [Obj, 'Yes', 'No', 'Some of them'] --> Look Up**
- TODO: MH Interview: [Obj, 'No', 'Yes', 'Maybe'] --> Look Up
- TODO: PH Interview: [Obj, 'Maybe', 'No', 'Yes'] --> Look Up
- TODO: M v. P: [Obj, 'Yes', "Don't know", 'No'] --> Look Up
- TODO: Obs: [Obj, 'No', 'Yes'] --> Think it means yes MI, change to Boolean Yes = 1, No = 0
- TODO: Age Range: Drop Column?

TODO: Look up ways to convert the values in a column and save the result as a new DataFrame
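For the multi-level answers such as `leave`, one option is an ordered categorical, whose integer codes give a ranking. A sketch, assuming the easy-to-difficult order below and treating "Don't know" as missing; both choices are assumptions, not decisions taken in this notebook:

```python
import pandas as pd

# Assumed ordering of the leave answers, from easiest to hardest.
# "Don't know" is deliberately left out, so it becomes missing.
leave_order = ["Very easy", "Somewhat easy", "Somewhat difficult", "Very difficult"]

leave = pd.Series(["Somewhat easy", "Don't know", "Very difficult", "Very easy"])
leave_cat = leave.astype(pd.CategoricalDtype(categories=leave_order, ordered=True))

# Integer codes follow the category order; missing values get -1
codes = leave_cat.cat.codes
print(codes.tolist())  # [1, -1, 3, 0]

# Ordered categoricals also support comparisons against a category
harder = leave_cat >= "Somewhat difficult"
print(harder.tolist())
```

The -1 code for missing answers would need handling (for example, converting it back to NaN) before the codes are used as numbers.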

In [12]:
# TODO: Self Employed: [obj, nan, yes, no] Change to Boolean change nan ?*
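# TODO: Family History: [Obj Yes No] --> Change to Boolean Yes = 1, No = 0
# Caution: astype(bool) maps every non-empty string, "No" included, to True,
# so it does not encode the answers (hence the all-True output below)
MH1["family_history"].astype(bool)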
# TODO: Treatment: [Obj, Yes No] --> Change to Boolean Yes = 1, No = 0
# TODO: Remote Work: [Obj, Yes No] --> Change to Boolean Yes = 1, No = 0
# TODO: Tech Company: [Obj Yes No] --> Change to Boolean Yes = 1, No = 0
# TODO: Mental Health: [Obj, 'No', 'Maybe', 'Yes'] Look Up. Change to Int?**
# TODO: Phys Health: [Obj, 'No', 'Yes', 'Maybe'] --> Look Up Change to Int?**
# TODO: Coworkers: [Obj, 'Some of them', 'No', 'Yes'] --> Look Up**
# TODO: Supervisor: [Obj, 'Yes', 'No', 'Some of them'] --> Look Up**
# TODO: MH Interview: [Obj, 'No', 'Yes', 'Maybe'] --> Look Up
# TODO: PH Interview: [Obj, 'Maybe', 'No', 'Yes'] --> Look Up
# TODO: M v. P: [Obj, 'Yes', "Don't know", 'No'] --> Look Up
# TODO: Obs: [Obj, 'No', 'Yes'] --> Think it means yes MI, change to Boolean Yes = 1, No = 0
# TODO: Age Range: Drop Column?

0       True
1       True
2       True
3       True
4       True
5       True
6       True
7       True
8       True
9       True
10      True
11      True
12      True
13      True
14      True
15      True
16      True
17      True
18      True
19      True
20      True
21      True
22      True
23      True
24      True
25      True
26      True
27      True
28      True
29      True
30      True
31      True
32      True
33      True
34      True
35      True
36      True
37      True
38      True
39      True
40      True
41      True
42      True
43      True
44      True
45      True
46      True
47      True
48      True
49      True
50      True
51      True
52      True
53      True
54      True
55      True
56      True
57      True
58      True
59      True
60      True
61      True
62      True
63      True
64      True
65      True
66      True
67      True
68      True
69      True
70      True
71      True
72      True
73      True
74      True
75      True
76      True
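An explicit mapping is one way to encode the Yes/No columns as 1/0. A sketch on a small hypothetical frame; the `binary_cols` list is an assumption and would be extended to `remote_work`, `tech_company` and `obs_consequence` following the notes above:

```python
import pandas as pd

# Hypothetical stand-in for two of the Yes/No survey columns
df = pd.DataFrame({
    "family_history": ["No", "No", "Yes", "Yes"],
    "treatment": ["Yes", "No", "Yes", "No"],
})

# astype(bool) is True for any non-empty string, so "No" becomes True as well
print(df["family_history"].astype(bool).tolist())  # [True, True, True, True]

# Map the answers explicitly; any value not in the mapping (e.g. NaN) stays missing
yes_no = {"Yes": 1, "No": 0}
binary_cols = ["family_history", "treatment"]  # assumed list; extend as needed
for col in binary_cols:
    df[col] = df[col].map(yes_no)

print(df)
```

Using `map` rather than `astype(bool)` also keeps NaN visible in a column like `self_employed`, which matches the open question about how to handle it.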

In [16]:
MH1.to_csv("mental_health.csv", index=False, header=True)

In [27]:
# Respondents who work remotely at least 50% of the time
MH2 = MH1.loc[MH1['remote_work'] == "Yes"]

In [32]:
MH2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 374 entries, 4 to 1257
Data columns (total 25 columns):
Age                          374 non-null int64
Gender                       374 non-null object
Country                      374 non-null object
state                        232 non-null object
self_employed                368 non-null object
family_history               374 non-null object
treatment                    374 non-null object
no_employees                 374 non-null object
remote_work                  374 non-null object
tech_company                 374 non-null object
benefits                     374 non-null object
care_options                 374 non-null object
wellness_program             374 non-null object
seek_help                    374 non-null object
anonymity                    374 non-null object
leave                        374 non-null object
mental_health_consequence    374 non-null object
phys_health_consequence      374 non-null object
coworkers    

In [36]:
MH2.to_csv("remote_work.csv", index=False, header=True)

In [33]:
MH3 = MH1.loc[MH1['remote_work'] == "No"]
MH3.head()

Unnamed: 0,Age,Gender,Country,state,self_employed,family_history,treatment,no_employees,remote_work,tech_company,benefits,care_options,wellness_program,seek_help,anonymity,leave,mental_health_consequence,phys_health_consequence,coworkers,supervisor,mental_health_interview,phys_health_interview,mental_vs_physical,obs_consequence,age_range
0,37,Female,United States,IL,,No,Yes,6-25,No,Yes,Yes,Not sure,No,Yes,Yes,Somewhat easy,No,No,Some of them,Yes,No,Maybe,Yes,No,31-65
1,44,Male,United States,IN,,No,No,More than 1000,No,No,Don't know,No,Don't know,Don't know,Don't know,Don't know,Maybe,No,No,No,No,No,Don't know,No,31-65
2,32,Male,Canada,,,No,No,6-25,No,Yes,No,No,No,No,Don't know,Somewhat difficult,No,No,Yes,Yes,Yes,Yes,No,No,31-65
3,31,Male,United Kingdom,,,Yes,Yes,26-100,No,Yes,No,Yes,No,No,No,Somewhat difficult,Yes,Yes,Some of them,No,Maybe,Maybe,No,Yes,31-65
5,33,Male,United States,TN,,Yes,No,6-25,No,Yes,Yes,Not sure,No,Don't know,Don't know,Don't know,No,No,Yes,Yes,No,Maybe,Don't know,No,31-65


In [34]:
MH3.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 883 entries, 0 to 1258
Data columns (total 25 columns):
Age                          883 non-null int64
Gender                       883 non-null object
Country                      883 non-null object
state                        510 non-null object
self_employed                871 non-null object
family_history               883 non-null object
treatment                    883 non-null object
no_employees                 883 non-null object
remote_work                  883 non-null object
tech_company                 883 non-null object
benefits                     883 non-null object
care_options                 883 non-null object
wellness_program             883 non-null object
seek_help                    883 non-null object
anonymity                    883 non-null object
leave                        883 non-null object
mental_health_consequence    883 non-null object
phys_health_consequence      883 non-null object
coworkers    

In [37]:
MH3.to_csv("non_remote_work.csv", index=False, header=True)
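Both subsets can also be written in one pass with `groupby`, so that every `remote_work` answer ends up in exactly one file. A sketch on a hypothetical frame that writes to a temporary directory; `file_names` is a hypothetical mapping that reuses the file names above:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for MH1
df = pd.DataFrame({
    "Age": [37, 31, 33, 35],
    "remote_work": ["No", "Yes", "No", "Yes"],
})

# Output file per remote_work answer
file_names = {"Yes": "remote_work.csv", "No": "non_remote_work.csv"}

out_dir = Path(tempfile.mkdtemp())
# Grouping by a single column name yields (answer, sub-DataFrame) pairs
for answer, group in df.groupby("remote_work"):
    group.to_csv(out_dir / file_names[answer], index=False, header=True)

print(sorted(p.name for p in out_dir.iterdir()))
```

An answer other than "Yes" or "No" would raise a `KeyError` here, which surfaces unexpected values instead of silently leaving those rows out of both files.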