# Encoding Features

Since we need to feed this data to the model, we first need to encode the categorical features as numbers.

In [1]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import pie
import seaborn as sns
import plotly.express as px

In [2]:
df = pd.read_csv("../datasets/final_preprocessed.csv")
df.shape

(1246, 54)

## Analysing the features

In [3]:
df.head()

Unnamed: 0,Timestamp,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided health coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health disorders and options for seeking help?,Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?,...,Are you openly identified at work as a person with a mental health issue?,"If they knew you suffered from a mental health disorder, how do you think that your team members/co-workers would react?",Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have you observed or experienced a supportive or well handled response to a mental health issue in your current or previous workplace?,"Overall, how well do you think the tech industry supports employees with mental health issues?",What is your age?,What is your gender?,What country do you live in?,What is your race?,What country do you work in?
0,2017,0,100-500,1.0,1.0,No,Yes,No,I don't know,I don't know,...,0.0,10.0,"Yes, I experienced","Yes, I experienced",1.0,27.0,female,United Kingdom,Caucasian,United Kingdom
1,2017,0,100-500,1.0,1.0,Yes,Yes,No,No,I don't know,...,0.0,6.0,"Yes, I observed",Maybe/Not sure,2.0,31.0,male,United Kingdom,Caucasian,United Kingdom
2,2017,0,6-25,1.0,1.0,I don't know,No,I don't know,No,Yes,...,1.0,5.0,"Yes, I experienced","Yes, I experienced",1.0,36.0,male,United States of America,Caucasian,United States of America
3,2017,0,100-500,1.0,0.0,Yes,No,No,I don't know,Yes,...,0.0,4.0,"Yes, I observed","Yes, I observed",2.0,30.0,male,United States of America,Caucasian,United States of America
4,2017,0,6-25,1.0,1.0,Yes,Yes,No,No,Yes,...,1.0,5.0,No,"Yes, I observed",2.0,36.0,female,United States of America,Asian,United States of America


In [4]:
df.isnull().values.any()

False

In [5]:
# Print each column's position and dtype
for cnt, col in enumerate(df.columns):
    print(cnt, df[col].dtype)

0 int64
1 int64
2 object
3 object
4 object
5 object
6 object
7 object
8 object
9 object
10 object
11 object
12 object
13 object
14 object
15 object
16 object
17 float64
18 float64
19 int64
20 object
21 object
22 object
23 object
24 object
25 object
26 object
27 object
28 object
29 object
30 object
31 object
32 float64
33 float64
34 object
35 object
36 int64
37 object
38 object
39 object
40 object
41 int64
42 object
43 object
44 float64
45 float64
46 object
47 object
48 float64
49 float64
50 object
51 object
52 object
53 object


In [6]:
obj_df = df.select_dtypes(include=['object']).copy()
obj_df.head()

Unnamed: 0,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided health coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health disorders and options for seeking help?,Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?,"If a mental health issue prompted you to request a medical leave from work, how easy or difficult would it be to ask for that leave?",Would you feel more comfortable talking to your coworkers about your physical health or your mental health?,...,"If you have a mental health disorder, how often do you feel that it interferes with your work when NOT being treated effectively (i.e., when you are experiencing symptoms)?",Have your observations of how another individual who discussed a mental health issue made you less likely to reveal a mental health issue yourself in your current workplace?,Would you be willing to bring up a physical health issue with a potential employer in an interview?,Would you bring up your mental health with a potential employer in an interview?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have you observed or experienced a supportive or well handled response to a mental health issue in your current or previous workplace?,What is your gender?,What country do you live in?,What is your race?,What country do you work in?
0,100-500,1.0,1.0,No,Yes,No,I don't know,I don't know,I don't know,Same level of comfort for each,...,Sometimes,No,Yes,No,"Yes, I experienced","Yes, I experienced",female,United Kingdom,Caucasian,United Kingdom
1,100-500,1.0,1.0,Yes,Yes,No,No,I don't know,I don't know,Same level of comfort for each,...,Sometimes,No,Yes,No,"Yes, I observed",Maybe/Not sure,male,United Kingdom,Caucasian,United Kingdom
2,6-25,1.0,1.0,I don't know,No,I don't know,No,Yes,Difficult,Same level of comfort for each,...,Sometimes,Yes,Maybe,No,"Yes, I experienced","Yes, I experienced",male,United States of America,Caucasian,United States of America
3,100-500,1.0,0.0,Yes,No,No,I don't know,Yes,Somewhat easy,Physical health,...,Not applicable to me,Maybe,Maybe,No,"Yes, I observed","Yes, I observed",male,United States of America,Caucasian,United States of America
4,6-25,1.0,1.0,Yes,Yes,No,No,Yes,Very easy,Same level of comfort for each,...,Often,No,No,No,No,"Yes, I observed",female,United States of America,Asian,United States of America


In [7]:
obj_df.shape

(1246, 41)

In [8]:
# Rename each object column to its positional index in df (e.g. 2, 3, 4, ...)
obj_df.rename(columns={col: df.columns.get_loc(col) for col in obj_df.columns}, inplace=True)

## 1. Columns which have numeric values

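These columns already hold numeric values (e.g. `1.0`) but are stored with `object` dtype, so they can be converted directly. The strings `"True"` and `"False"` are first mapped to 1 and 0, then each column is parsed with `pd.to_numeric` and cast to `int`.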
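A stdlib-only sketch of these two steps on a single column, using hypothetical values (not taken from the dataset): string booleans are mapped to 1/0, then every value is parsed as a number and cast to an integer, which is the plain-Python counterpart of `pd.to_numeric(...).astype('int')` for inputs like these.

```python
# Hypothetical raw values from one numeric-looking object column
raw = ["1.0", "0.0", "True", "False", "1.0"]

# Step 1: map string booleans, mirroring the {"True": 1, "False": 0} dict
bool_map = {"True": 1, "False": 0}
mapped = [bool_map.get(v, v) for v in raw]

# Step 2: parse each value as a number, then cast it to int
as_int = [int(float(v)) for v in mapped]

print(as_int)  # [1, 0, 1, 0, 1]
```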

In [9]:
numer_cols=[3,4,13,15,16,20,28,30,31]

In [10]:
df.loc[df[df.columns[0]] == 2018].groupby([df.columns[4]]).size()

Is your primary role within your company related to tech/IT?
0.0     14
1.0    300
dtype: int64

In [11]:
cleanup_nums={df.columns[i]:{"True":1, "False":0} for i in numer_cols}

In [12]:
df = df.replace(cleanup_nums)

In [13]:
# numer_cols is defined in cell [9]
for i in numer_cols:
    col = df.columns[i]
    # Parse the values as numbers, then store them as integers
    df[col] = pd.to_numeric(df[col]).astype('int')

The columns that already held numeric values are now stored as integers.

## 2. Columns which have categorical values

Here, we replace the values of the Yes/No-style columns as follows:
- No (replaced with 0)
- Yes (replaced with 1)
- I don't know, Don't Know and Maybe (replaced with 2)
- Not eligible for coverage / NA, Possibly and No_Answer (replaced with 3)
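This mapping can be sketched in plain Python with the same dictionary as cell [17]. Values missing from the dictionary pass through unchanged, which matches how `DataFrame.replace` leaves unmatched values alone. The sample answers are illustrative only.

```python
# Same mapping as replace_dict in cell [17]
replace_dict = {
    "No": 0, "Yes": 1,
    "I don't know": 2, "Don't Know": 2, "Maybe": 2,
    "Not eligible for coverage / NA": 3, "Possibly": 3, "No_Answer": 3,
}

# Illustrative answers from one Yes/No-style column
answers = ["No", "Yes", "I don't know", "Maybe", "Possibly"]

# dict.get(v, v) leaves any value that is not in the mapping untouched
encoded = [replace_dict.get(v, v) for v in answers]
print(encoded)  # [0, 1, 2, 2, 3]
```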


In [14]:
obj_df = df.select_dtypes(include=['object']).copy()
obj_df.head()

Unnamed: 0,How many employees does your company or organization have?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided health coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health disorders and options for seeking help?,Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?,"If a mental health issue prompted you to request a medical leave from work, how easy or difficult would it be to ask for that leave?",Would you feel more comfortable talking to your coworkers about your physical health or your mental health?,Would you feel comfortable discussing a mental health issue with your direct supervisor(s)?,Would you feel comfortable discussing a mental health issue with your coworkers?,...,"If you have a mental health disorder, how often do you feel that it interferes with your work when NOT being treated effectively (i.e., when you are experiencing symptoms)?",Have your observations of how another individual who discussed a mental health issue made you less likely to reveal a mental health issue yourself in your current workplace?,Would you be willing to bring up a physical health issue with a potential employer in an interview?,Would you bring up your mental health with a potential employer in an interview?,Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have you observed or experienced a supportive or well handled response to a mental health issue in your current or previous workplace?,What is your gender?,What country do you live in?,What is your race?,What country do you work in?
0,100-500,No,Yes,No,I don't know,I don't know,I don't know,Same level of comfort for each,Yes,Yes,...,Sometimes,No,Yes,No,"Yes, I experienced","Yes, I experienced",female,United Kingdom,Caucasian,United Kingdom
1,100-500,Yes,Yes,No,No,I don't know,I don't know,Same level of comfort for each,Maybe,Yes,...,Sometimes,No,Yes,No,"Yes, I observed",Maybe/Not sure,male,United Kingdom,Caucasian,United Kingdom
2,6-25,I don't know,No,I don't know,No,Yes,Difficult,Same level of comfort for each,Yes,Maybe,...,Sometimes,Yes,Maybe,No,"Yes, I experienced","Yes, I experienced",male,United States of America,Caucasian,United States of America
3,100-500,Yes,No,No,I don't know,Yes,Somewhat easy,Physical health,Maybe,Maybe,...,Not applicable to me,Maybe,Maybe,No,"Yes, I observed","Yes, I observed",male,United States of America,Caucasian,United States of America
4,6-25,Yes,Yes,No,No,Yes,Very easy,Same level of comfort for each,Yes,No,...,Often,No,No,No,No,"Yes, I observed",female,United States of America,Asian,United States of America


In [15]:
obj_df.shape

(1246, 32)

In [16]:
yes_no_cols=[]
for i in obj_df.columns:
    if "Yes" in df[i].unique():
        print(df.columns.get_loc(i),df[i].unique())
        yes_no_cols.append(df.columns.get_loc(i))
print('\n',yes_no_cols)

5 ['No' 'Yes' "I don't know" 'Not eligible for coverage / NA']
6 ['Yes' 'No']
7 ['No' "I don't know" 'Yes']
8 ["I don't know" 'No' 'Yes']
9 ["I don't know" 'Yes' 'No']
12 ['Yes' 'Maybe' 'No']
14 ['Yes' 'Maybe' 'No']
34 ['Possibly' 'Yes' 'No' "Don't Know"]
35 ['Possibly' 'Yes' 'No' "Don't Know"]
37 ['No' 'Yes' "I don't know"]
40 ['No' 'Yes' 'Maybe' 'No_Answer']
42 ['Yes' 'Maybe' 'No']
43 ['No' 'Yes' 'Maybe']

 [5, 6, 7, 8, 9, 12, 14, 34, 35, 37, 40, 42, 43]
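The detected columns use several different vocabularies ("Possibly", "Don't Know", "No_Answer", ...), so it helps to check that the mapping in cell [17] covers every value that occurs. A minimal sketch, with the unique values copied from the output above; `unmapped_values` is a hypothetical helper, not part of the notebook.

```python
# Unique values per detected column, copied from the output of cell [16]
uniques = {
    5: ["No", "Yes", "I don't know", "Not eligible for coverage / NA"],
    6: ["Yes", "No"],
    7: ["No", "I don't know", "Yes"],
    8: ["I don't know", "No", "Yes"],
    9: ["I don't know", "Yes", "No"],
    12: ["Yes", "Maybe", "No"],
    14: ["Yes", "Maybe", "No"],
    34: ["Possibly", "Yes", "No", "Don't Know"],
    35: ["Possibly", "Yes", "No", "Don't Know"],
    37: ["No", "Yes", "I don't know"],
    40: ["No", "Yes", "Maybe", "No_Answer"],
    42: ["Yes", "Maybe", "No"],
    43: ["No", "Yes", "Maybe"],
}

# Same mapping as replace_dict in cell [17]
replace_dict = {
    "No": 0, "Yes": 1,
    "I don't know": 2, "Don't Know": 2, "Maybe": 2,
    "Not eligible for coverage / NA": 3, "Possibly": 3, "No_Answer": 3,
}


def unmapped_values(uniques, mapping):
    """Return the set of values that the mapping would leave unencoded."""
    return {v for values in uniques.values() for v in values if v not in mapping}


print(unmapped_values(uniques, replace_dict))  # set()
```

An empty set means every detected column ends up fully numeric after `df.replace`.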


In [17]:
replace_dict={"No": 0, 'Yes':1, "I don't know":2, "Don't Know":2,"Maybe":2, "Not eligible for coverage / NA":3,'Possibly':3,'No_Answer':3 }

In [18]:
cleanup_nums={df.columns[i]:replace_dict for i in yes_no_cols}

In [19]:
df = df.replace(cleanup_nums)

In [20]:
df.head()

Unnamed: 0,Timestamp,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided health coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health disorders and options for seeking help?,Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?,...,Are you openly identified at work as a person with a mental health issue?,"If they knew you suffered from a mental health disorder, how do you think that your team members/co-workers would react?",Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have you observed or experienced a supportive or well handled response to a mental health issue in your current or previous workplace?,"Overall, how well do you think the tech industry supports employees with mental health issues?",What is your age?,What is your gender?,What country do you live in?,What is your race?,What country do you work in?
0,2017,0,100-500,1,1,0,1,0,2,2,...,0.0,10.0,"Yes, I experienced","Yes, I experienced",1.0,27.0,female,United Kingdom,Caucasian,United Kingdom
1,2017,0,100-500,1,1,1,1,0,0,2,...,0.0,6.0,"Yes, I observed",Maybe/Not sure,2.0,31.0,male,United Kingdom,Caucasian,United Kingdom
2,2017,0,6-25,1,1,2,0,2,0,1,...,1.0,5.0,"Yes, I experienced","Yes, I experienced",1.0,36.0,male,United States of America,Caucasian,United States of America
3,2017,0,100-500,1,0,1,0,0,2,1,...,0.0,4.0,"Yes, I observed","Yes, I observed",2.0,30.0,male,United States of America,Caucasian,United States of America
4,2017,0,6-25,1,1,1,1,0,0,1,...,1.0,5.0,No,"Yes, I observed",2.0,36.0,female,United States of America,Asian,United States of America


## 3. Other Columns

In [21]:
obj_df = df.select_dtypes(include=['object']).copy()
obj_df.head()

Unnamed: 0,How many employees does your company or organization have?,"If a mental health issue prompted you to request a medical leave from work, how easy or difficult would it be to ask for that leave?",Would you feel more comfortable talking to your coworkers about your physical health or your mental health?,Have your previous employers provided mental health benefits?,Were you aware of the options for mental health care provided by your previous employers?,Did your previous employers ever formally discuss mental health (as part of a wellness campaign or other official communication)?,Did your previous employers provide resources to learn more about mental health disorders and how to seek help?,Was your anonymity protected if you chose to take advantage of mental health or substance abuse treatment resources with previous employers?,Would you have felt more comfortable talking to your previous employer about your physical health or your mental health?,Would you have been willing to discuss your mental health with your direct supervisor(s)?,Would you have been willing to discuss your mental health with your coworkers at previous employers?,"If you have a mental health disorder, how often do you feel that it interferes with your work when being treated effectively?","If you have a mental health disorder, how often do you feel that it interferes with your work when NOT being treated effectively (i.e., when you are experiencing symptoms)?",Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have you observed or experienced a supportive or well handled response to a mental health issue in your current or previous workplace?,What is your gender?,What country do you live in?,What is your race?,What country do you work in?
0,100-500,I don't know,Same level of comfort for each,I don't know,N/A (was not aware),Some did,Some did,"Yes, always",Physical health,"Yes, all of my previous supervisors","No, at none of my previous employers",Sometimes,Sometimes,"Yes, I experienced","Yes, I experienced",female,United Kingdom,Caucasian,United Kingdom
1,100-500,I don't know,Same level of comfort for each,Some did,I was aware of some,None did,None did,I don't know,Physical health,"No, none of my previous supervisors",At some of my previous employers,Not applicable to me,Sometimes,"Yes, I observed",Maybe/Not sure,male,United Kingdom,Caucasian,United Kingdom
2,6-25,Difficult,Same level of comfort for each,Some did,N/A (was not aware),None did,None did,I don't know,Physical health,"No, none of my previous supervisors",At some of my previous employers,Sometimes,Sometimes,"Yes, I experienced","Yes, I experienced",male,United States of America,Caucasian,United States of America
3,100-500,Somewhat easy,Physical health,"No, none did",I was aware of some,None did,Some did,"Yes, always",Physical health,Some of my previous supervisors,At some of my previous employers,Rarely,Not applicable to me,"Yes, I observed","Yes, I observed",male,United States of America,Caucasian,United States of America
4,6-25,Very easy,Same level of comfort for each,Some did,I was aware of some,None did,None did,"Yes, always",Same level of comfort for each,Some of my previous supervisors,At some of my previous employers,Rarely,Often,No,"Yes, I observed",female,United States of America,Asian,United States of America


In [22]:
obj_df.shape

(1246, 19)

In [23]:
for i in obj_df.columns:
    print(df.columns.get_loc(i),df[i].unique())

2 ['100-500' '6-25' '26-100' 'More than 1000' '500-1000' '1-5']
10 ["I don't know" 'Difficult' 'Somewhat easy' 'Very easy'
 'Neither easy nor difficult' 'Somewhat difficult']
11 ['Same level of comfort for each' 'Physical health' 'Mental health']
21 ["I don't know" 'Some did' 'No, none did' 'Yes, they all did']
22 ['N/A (was not aware)' 'I was aware of some' 'N/A (none offered)'
 'Yes, I was aware of all of them' 'No, I only became aware later']
23 ['Some did' 'None did' "I don't know" 'Yes, they all did']
24 ['Some did' 'None did' 'Yes, they all did']
25 ['Yes, always' "I don't know" 'Sometimes' 'No']
26 ['Physical health' 'Same level of comfort for each' 'Mental health']
27 ['Yes, all of my previous supervisors'
 'No, none of my previous supervisors' 'Some of my previous supervisors'
 "I don't know"]
29 ['No, at none of my previous employers' 'At some of my previous employers'
 'Yes, at all of my previous employers' 'Some of my previous employers']
38 ['Sometimes' 'Not applicable to 

### Dealing with `How many employees does your company or organization have?` attribute

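Here, each range is replaced by its approximate midpoint, which is representative of the range (`6-25` becomes 15, rounding down 15.5). The open-ended `More than 1000` category has no midpoint, so it is mapped to its lower bound, 1000.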
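The midpoints used in cell [25] can be checked with a small stdlib sketch. `midpoint` is a hypothetical helper, and the `'More than 1000'` label is deliberately left out because it has no upper bound.

```python
def midpoint(label):
    """Midpoint of a 'low-high' range label such as '6-25'."""
    low, high = (int(x) for x in label.split("-"))
    return (low + high) / 2


for label in ["1-5", "6-25", "26-100", "100-500", "500-1000"]:
    # prints: 1-5 3.0, 6-25 15.5, 26-100 63.0, 100-500 300.0, 500-1000 750.0
    print(label, midpoint(label))
```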

In [24]:
print(df[df.columns[2]].unique())

['100-500' '6-25' '26-100' 'More than 1000' '500-1000' '1-5']


In [25]:
replace_dict={'1-5':3, '6-25':15, '26-100':63, '100-500':300, '500-1000':750, 'More than 1000':1000 }

In [26]:
df = df.replace({df.columns[2]:replace_dict})

In [27]:
print(df[df.columns[2]].unique())

[ 300   15   63 1000  750    3]


### Other columns

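The remaining 18 object columns are encoded using **Label Encoding**: each column is converted to the pandas `category` dtype, and every value is replaced by the integer code of its category (`cat.codes`).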
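A plain-Python sketch of what `astype('category')` followed by `cat.codes` does for a string column: the distinct values are sorted to form the categories, and each value is replaced by the position of its category. `label_encode` and the sample values are illustrative, not from the notebook.

```python
def label_encode(values):
    """Map each value to the index of its category in the sorted distinct values."""
    categories = sorted(set(values))
    code_of = {cat: code for code, cat in enumerate(categories)}
    return [code_of[v] for v in values], categories


codes, categories = label_encode(["Rarely", "Often", "Sometimes", "Often", "Never"])
print(categories)  # ['Never', 'Often', 'Rarely', 'Sometimes']
print(codes)       # [2, 1, 3, 1, 0]
```

The codes follow alphabetical order rather than the semantic order of the answers (here Never < Rarely < Sometimes < Often), so for ordinal questions an explicit mapping, like the one used for company size, may preserve more information.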

In [28]:
obj_df = df.select_dtypes(include=['object']).copy()
obj_df.shape

(1246, 18)

In [29]:
for i in obj_df.columns:
    df[i] = df[i].astype('category')
    df[i] = df[i].cat.codes

In [30]:
df.head()

Unnamed: 0,Timestamp,Are you self-employed?,How many employees does your company or organization have?,Is your employer primarily a tech company/organization?,Is your primary role within your company related to tech/IT?,Does your employer provide mental health benefits as part of healthcare coverage?,Do you know the options for mental health care available under your employer-provided health coverage?,"Has your employer ever formally discussed mental health (for example, as part of a wellness campaign or other official communication)?",Does your employer offer resources to learn more about mental health disorders and options for seeking help?,Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources provided by your employer?,...,Are you openly identified at work as a person with a mental health issue?,"If they knew you suffered from a mental health disorder, how do you think that your team members/co-workers would react?",Have you observed or experienced an unsupportive or badly handled response to a mental health issue in your current or previous workplace?,Have you observed or experienced a supportive or well handled response to a mental health issue in your current or previous workplace?,"Overall, how well do you think the tech industry supports employees with mental health issues?",What is your age?,What is your gender?,What country do you live in?,What is your race?,What country do you work in?
0,2017,0,300,1,1,0,1,0,2,2,...,0.0,10.0,2,2,1.0,27.0,0,50,3,52
1,2017,0,300,1,1,1,1,0,0,2,...,0.0,6.0,3,0,2.0,31.0,3,50,3,52
2,2017,0,15,1,1,2,0,2,0,1,...,1.0,5.0,2,2,1.0,36.0,3,51,3,53
3,2017,0,300,1,0,1,0,0,2,1,...,0.0,4.0,3,3,2.0,30.0,3,51,3,53
4,2017,0,15,1,1,1,1,0,0,1,...,1.0,5.0,1,3,2.0,36.0,0,51,2,53


In [31]:
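# Sanity check: print any column that is still of object dtype (nothing should be printed)
for col in df.columns:
    if df[col].dtype == 'object':
        print("Not encoded:", col)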

#### We have successfully encoded the entire dataset

In [32]:
# index=False avoids writing the row index as an extra "Unnamed: 0" column
df.to_csv('../datasets/final.csv', index=False)

### Now this dataset will be used to create the model