# Mental Health in Tech Project

## Data Sets

[OSMI Survey on Mental Health in the Tech Workplace in 2014](https://www.kaggle.com/osmi/mental-health-in-tech-survey) 

["Ongoing" OSMI survey from 2016](https://data.world/kittybot/osmi-mental-health-tech-2016)


## Questions

What factors are most significant in influencing whether a person believes that disclosing a mental health issue would have negative consequences?

Can we predict, based on publicly available features of a person and company, whether that person is likely to believe that disclosing a mental health issue would harm their career?

## Exploring and Cleaning 2014 Data

See cleaning.ipynb

```python
import pandas as pd
```

```python
df14 = pd.read_csv("./datasets/2014/mental-health-in-tech-2014.csv")
print(df14.shape)
# df14.head(3)
```

```
(1259, 27)
```


```python
# standardize column names to lowercase
df14.rename(columns={'Age': 'age', 'Gender': 'gender', 'Country': 'country', 'Timestamp': 'timestamp'}, inplace=True)
# replace the ambiguous no_employees column name
df14.rename(columns={'no_employees': 'num_employees'}, inplace=True)
```
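The two `rename` calls list each capitalized column explicitly. A more general sketch lowercases every column name with `str.lower` and then renames `no_employees`. The toy frame is hypothetical; on the 2014 file this should give the same names only if `Timestamp`, `Age`, `Gender` and `Country` are its only capitalized headers, as the `col_question_map` keys suggest.

```python
import pandas as pd

# Hypothetical frame with a mix of capitalized and lowercase column names
toy = pd.DataFrame(columns=['Timestamp', 'Age', 'Gender', 'Country', 'no_employees'])

# Lowercase every column name, then replace the ambiguous no_employees name
toy.columns = toy.columns.str.lower()
toy = toy.rename(columns={'no_employees': 'num_employees'})

print(list(toy.columns))
```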


```python
# original export, whose column headers are the full survey questions
df_original = pd.read_csv("./datasets/2014/osmi-mental-health-in-tech-original.csv")
print(df_original.shape)
# print(df_original.columns)   # original questions/fields
```

```
(1259, 27)
```


<details><summary> Click to expand all **original questions/fields** </summary>
    
- Timestamp   
- Age  
- Gender   
- Country  
- If you live in the United States, which state or territory do you live in?  
- Are you self-employed?  
- Do you have a family history of mental illness?  
- Have you sought treatment for a mental health condition?  
- If you have a mental health condition, do you feel that it interferes with your work?  
- How many employees does your company or organization have?  
- Do you work remotely (outside of an office) at least 50% of the time?  
- Is your employer primarily a tech company/organization?  
- Does your employer provide mental health benefits?  
- Do you know the options for mental health care your employer provides?  
- Has your employer ever discussed mental health as part of an employee wellness program?  
- Does your employer provide resources to learn more about mental health issues and how to seek help?  
- Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?  
- How easy is it for you to take medical leave for a mental health condition?  
- Do you think that discussing a mental health issue with your employer would have negative consequences?  
- Do you think that discussing a physical health issue with your employer would have negative consequences?  
- Would you be willing to discuss a mental health issue with your coworkers?  
- Would you be willing to discuss a mental health issue with your direct supervisor(s)?  
- Would you bring up a mental health issue with a potential employer in an interview?  
- Would you bring up a physical health issue with a potential employer in an interview?  
- Do you feel that your employer takes mental health as seriously as physical health?  
- Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?  
- Any additional notes or comments
</details>

```python
# create a reference to look up each question by its column name;
# columns are paired by position, so both files must list the fields in the same order
# (the range stops one short, so the last column, comments, is not mapped)
column_names = df14.columns
questions = df_original.columns
col_question_map = {
    column_names[i]: questions[i] for i in range(df_original.shape[1]-1)
}

# for example:
col_question_map['mental_vs_physical']
```

```
'Do you feel that your employer takes mental health as seriously as physical health?'
```
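The comprehension pairs the two column lists by position. A sketch of the same idea with `zip`, guarded by a length check, using hypothetical three-column stand-ins for `df14.columns` and `df_original.columns`. Unlike the cell above, it also maps the final `comments` column.

```python
import pandas as pd

# Hypothetical stand-ins for df14.columns and df_original.columns
column_names = pd.Index(['age', 'mental_vs_physical', 'comments'])
questions = pd.Index([
    'Age',
    'Do you feel that your employer takes mental health as seriously as physical health?',
    'Any additional notes or comments',
])

# Positional pairing is only valid if both files list the same fields in the same order
assert len(column_names) == len(questions), "column counts differ"
col_question_map = dict(zip(column_names, questions))

print(col_question_map['mental_vs_physical'])
```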

```python
col_question_map
```

```
{'age': 'Age',
 'anonymity': 'Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?',
 'benefits': 'Does your employer provide mental health benefits?',
 'care_options': 'Do you know the options for mental health care your employer provides?',
 'country': 'Country',
 'coworkers': 'Would you be willing to discuss a mental health issue with your coworkers?',
 'family_history': 'Do you have a family history of mental illness?',
 'gender': 'Gender',
 'leave': 'How easy is it for you to take medical leave for a mental health condition?',
 'mental_health_consequence': 'Do you think that discussing a mental health issue with your employer would have negative consequences?',
 'mental_health_interview': 'Would you bring up a mental health issue with a potential employer in an interview?',
 'mental_vs_physical': 'Do you feel that your employer takes mental health as seriously as physical health?',
 'num_employees': 'How many employee
```

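```python
# share of each answer to `col` across all respondents (NaN answers stay in the denominator);
# uses the cleaned DataFrame `df` loaded under "Load Cleaned Data"
def stats(col):
    print(df[col].value_counts() / df.shape[0])
    print(col_question_map[col])

# same breakdown, restricted to respondents who have sought treatment (treatment == 1)
def tstats(col):
    treated = df[df['treatment'] == 1]
    print(treated[col].value_counts() / treated.shape[0])
    print(col_question_map[col])
```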
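`stats` and `tstats` print the share of each answer, overall and among respondents who sought treatment. A sketch that puts both treatment groups side by side with `pd.crosstab`, shown on a hypothetical toy frame. One difference to keep in mind: `crosstab` excludes missing answers before normalizing, whereas dividing by `df.shape[0]` keeps them in the denominator.

```python
import pandas as pd

# Hypothetical toy data: treatment and obs_consequence are 0/1, as in the cleaned file
toy = pd.DataFrame({
    'treatment':       [1, 1, 1, 1, 0, 0, 0, 0],
    'obs_consequence': [0, 0, 0, 1, 0, 1, 1, 0],
})

def shares_by_treatment(df, col):
    """Share of each answer to `col`, split by treatment status (each column sums to 1)."""
    return pd.crosstab(df[col], df['treatment'], normalize='columns')

table = shares_by_treatment(toy, 'obs_consequence')
print(table)
```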


```python
tstats('obs_consequence')
```

```
0    0.799058
1    0.200942
Name: obs_consequence, dtype: float64
Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?
```


```python
# `nstats` is never defined and this cell was never run; `stats` (defined above) is used instead
stats('mental_health_consequence')
```

```python
stats('mental_health_interview')
```

```
no       0.800635
maybe    0.164416
yes      0.034948
Name: mental_health_interview, dtype: float64
Would you bring up a mental health issue with a potential employer in an interview?
```


```python
stats('supervisor')
```

```
yes             0.409849
no              0.312153
some_of_them    0.277998
Name: supervisor, dtype: float64
Would you be willing to discuss a mental health issue with your direct supervisor(s)?
```


#### Load Cleaned Data


```python
df = pd.read_csv("./datasets/2014/clean-mental-health-in-tech-2014.csv", index_col=0)
print(df.shape)
```

```
(1259, 183)
```


```python
# quick NaN check: report every column that has missing values
counts = df.count()      # non-null values per column
numrows = df.shape[0]
for col in df.columns:
    if counts[col] != numrows:
        print("{0} has {1} NaNs".format(col, numrows - counts[col]))
```

```
age has 8 NaNs
state has 515 NaNs
self_employed has 18 NaNs
work_interfere has 264 NaNs
comments has 1095 NaNs
```
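The loop compares `df.count()` with the row count one column at a time. An equivalent vectorized sketch sums `isnull()` per column; the toy frame is hypothetical, and on `df` it should report the same five columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values
toy = pd.DataFrame({
    'age': [37.0, np.nan, 29.0],
    'state': ['IL', np.nan, np.nan],
    'treatment': [1, 0, 1],
})

# Count NaNs per column and keep only the columns that have any
nan_counts = toy.isnull().sum()
nan_counts = nan_counts[nan_counts > 0]
for col, n in nan_counts.items():
    print("{0} has {1} NaNs".format(col, n))
```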


```python
df.head(2)
```

| Unnamed: 0 | timestamp | age | gender | country | state | self_employed | family_history | treatment | work_interfere | num_employees | ... | phys_health_consequence_no | phys_health_consequence_yes | leave_dont_know | leave_somewhat_difficult | leave_somewhat_easy | leave_very_difficult | leave_very_easy | gender_category_female | gender_category_male | gender_category_other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-08-27 11:29:31 | 37.0 | Female | United States | IL | | 0 | 1 | often | 6-25 | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 2014-08-27 11:29:37 | 44.0 | M | United States | IN | | 0 | 0 | rarely | 1000+ | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
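The cleaned file keeps raw columns such as `gender` next to 0/1 indicator columns such as `gender_category_female` and `leave_dont_know`, which helps account for the jump from 27 to 183 columns. `cleaning.ipynb` is not reproduced here, so this is only a sketch, assuming the indicators follow the `<prefix>_<value>` naming of `pd.get_dummies`; the `leave` answers below are hypothetical.

```python
import pandas as pd

# Hypothetical answers to the 'leave' question, already cleaned to snake_case
toy = pd.DataFrame({'leave': ['somewhat_easy', 'dont_know', 'very_easy', 'dont_know']})

# One indicator column per answer, named leave_<answer>, cast to 0/1 integers
dummies = pd.get_dummies(toy['leave'], prefix='leave').astype(int)

# Keep the raw column alongside its indicators
encoded = pd.concat([toy, dummies], axis=1)

print(list(dummies.columns))
```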


## Charts


```python
import matplotlib.pyplot as plt
%matplotlib inline
```

```python
col_question_map['mental_health_consequence']
```

```
'Do you think that discussing a mental health issue with your employer would have negative consequences?'
```

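```python
# drop respondents missing self_employed or age, then count the answers to the target question
tdf = df.dropna(subset=['self_employed', 'age'])
print(tdf.shape)
y = tdf['mental_health_consequence']
y.value_counts()
```

```
(1233, 183)
```

```
no       476
maybe    471
yes      286
Name: mental_health_consequence, dtype: int64
```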
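For the prediction question, these counts set a floor: always predicting the most common answer (`no`) would be right for 476 of 1233 respondents, about 38.6%. A minimal sketch of that majority-class baseline, using only the counts above:

```python
# Answer counts for mental_health_consequence after dropping rows missing self_employed or age
counts = {'no': 476, 'maybe': 471, 'yes': 286}

total = sum(counts.values())                 # 1233 respondents
majority = max(counts, key=counts.get)       # most frequent answer
baseline = counts[majority] / total          # accuracy of always predicting it

print(total, majority, round(baseline, 3))
```

If the target were collapsed to yes versus not-yes (an assumption; the notebook does not do this), the same arithmetic gives a baseline of 947/1233, about 76.8%.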



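```python
# bar chart of the mental_health_consequence answers counted above (no / maybe / yes)
y.value_counts().plot.bar(title=col_question_map['mental_health_consequence'])
```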
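For the first question, a chart that compares answer shares across the levels of a candidate factor is more informative than overall counts. A sketch, assuming `benefits` as the factor; the toy rows and the `dont_know` level are hypothetical, and `answer_shares` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy rows; in the notebook this would be tdf
toy = pd.DataFrame({
    'benefits': ['yes', 'yes', 'no', 'no', 'no', 'dont_know'],
    'mental_health_consequence': ['no', 'maybe', 'yes', 'yes', 'no', 'maybe'],
})

def answer_shares(df, factor, target='mental_health_consequence'):
    """Rows: levels of `factor`; columns: share of each target answer (each row sums to 1)."""
    return pd.crosstab(df[factor], df[target], normalize='index')

table = answer_shares(toy, 'benefits')
print(table)
```

In the notebook, `answer_shares(tdf, 'benefits').plot.bar()` would draw these shares as grouped bars, one group per level of the factor.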