#### Job Satisfaction

In this notebook, you will explore job satisfaction as reported in the survey results. Use the cells at the top of the notebook to explore the data as needed, then use your findings to answer the questions at the bottom of the notebook.

In [1]:
import pandas as pd
import numpy as np
import JobSatisfaction as t
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./survey_results_public.csv')
schema = pd.read_csv('./survey_results_schema.csv')
df.head()

Unnamed: 0,Respondent,Professional,ProgramHobby,Country,University,EmploymentStatus,FormalEducation,MajorUndergrad,HomeRemote,CompanySize,...,StackOverflowMakeMoney,Gender,HighestEducationParents,Race,SurveyLong,QuestionsInteresting,QuestionsConfusing,InterestedAnswers,Salary,ExpectedSalary
0,1,Student,"Yes, both",United States,No,"Not employed, and not looking for work",Secondary school,,,,...,Strongly disagree,Male,High school,White or of European descent,Strongly disagree,Strongly agree,Disagree,Strongly agree,,
1,2,Student,"Yes, both",United Kingdom,"Yes, full-time",Employed part-time,Some college/university study without earning ...,Computer science or software engineering,"More than half, but not all, the time",20 to 99 employees,...,Strongly disagree,Male,A master's degree,White or of European descent,Somewhat agree,Somewhat agree,Disagree,Strongly agree,,37500.0
2,3,Professional developer,"Yes, both",United Kingdom,No,Employed full-time,Bachelor's degree,Computer science or software engineering,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A professional degree,White or of European descent,Somewhat agree,Agree,Disagree,Agree,113750.0,
3,4,Professional non-developer who sometimes write...,"Yes, both",United States,No,Employed full-time,Doctoral degree,A non-computer-focused engineering discipline,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A doctoral degree,White or of European descent,Agree,Agree,Somewhat agree,Strongly agree,,
4,5,Professional developer,"Yes, I program as a hobby",Switzerland,No,Employed full-time,Master's degree,Computer science or software engineering,Never,10 to 19 employees,...,,,,,,,,,,


In [4]:
# Proportion of missing values in the JobSatisfaction column
df['JobSatisfaction'].isnull().sum()/df.shape[0]

0.20149722542142184
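
The same proportion can also be written as `isnull().mean()`, because the mean of a boolean Series is the fraction of `True` values. A minimal sketch on a small made-up frame (`toy` and its values are illustrative and not taken from the survey):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the survey column (values are made up).
toy = pd.DataFrame({'JobSatisfaction': [np.nan, 9.0, 3.0, np.nan, 8.0]})

# The mean of a boolean mask is the share of True values,
# so it matches isnull().sum() / shape[0].
missing_share = toy['JobSatisfaction'].isnull().mean()
by_hand = toy['JobSatisfaction'].isnull().sum() / toy.shape[0]
print(missing_share, by_hand)
```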

In [10]:
df['JobSatisfaction']

0         NaN
1         NaN
2         9.0
3         3.0
4         8.0
5         NaN
6         6.0
7         7.0
8         6.0
9         8.0
10        9.0
11        6.0
12        NaN
13        6.0
14        8.0
15        8.0
16        NaN
17        8.0
18        3.0
19        8.0
20        7.0
21        8.0
22        9.0
23        8.0
24        9.0
25        7.0
26        6.0
27        9.0
28        7.0
29        6.0
         ... 
19072     5.0
19073     6.0
19074     NaN
19075    10.0
19076     8.0
19077     9.0
19078     NaN
19079     8.0
19080     7.0
19081     8.0
19082     8.0
19083     7.0
19084     7.0
19085     3.0
19086     8.0
19087     8.0
19088     8.0
19089     6.0
19090     8.0
19091     9.0
19092     7.0
19093     9.0
19094     NaN
19095     8.0
19096     6.0
19097     8.0
19098     NaN
19099     5.0
19100     9.0
19101     8.0
Name: JobSatisfaction, Length: 19102, dtype: float64

In [17]:
# Mean job satisfaction and missing rate for each employment status
employmentStatus = sorted(set(df['EmploymentStatus']))
for status in employmentStatus:
    temp = df[df['EmploymentStatus']==status]
    # Share of respondents in this group who left JobSatisfaction blank
    missingRate = temp['JobSatisfaction'].isnull().sum()/temp.shape[0]
    # Average over the respondents who did answer
    temp = temp[temp['JobSatisfaction'].notnull()]
    jobSatisfaction = temp['JobSatisfaction'].sum()/temp.shape[0]
    print(status,':',jobSatisfaction, "         ","Missing Rate: ", missingRate)
# The means may be biased by the missing values: they only describe
# the respondents who answered the JobSatisfaction question.

Employed full-time : 6.980607926125433           Missing Rate:  0.048333943610399124
Employed part-time : 6.846153846153846           Missing Rate:  0.4212328767123288
I prefer not to say : nan           Missing Rate:  1.0
Independent contractor, freelancer, or self-employed : 7.231984829329962           Missing Rate:  0.15219721329046088
Not employed, and not looking for work : nan           Missing Rate:  1.0
Not employed, but looking for work : nan           Missing Rate:  1.0
Retired : nan           Missing Rate:  1.0
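
The loop above can also be expressed as a single grouped aggregation. The sketch below is one possible way to do it, using named aggregation on a made-up frame (`toy`, `summary`, and the values are illustrative assumptions, not survey data):

```python
import numpy as np
import pandas as pd

# Toy data: one group fully answered, one half missing, one all missing.
toy = pd.DataFrame({
    'EmploymentStatus': ['Employed full-time', 'Employed full-time',
                         'Employed part-time', 'Employed part-time',
                         'Retired'],
    'JobSatisfaction': [7.0, 9.0, 6.0, np.nan, np.nan],
})

# One row per status: the mean skips NaN, and the missing rate is the
# share of NaN values within the group.
summary = toy.groupby('EmploymentStatus')['JobSatisfaction'].agg(
    mean_satisfaction='mean',
    missing_rate=lambda s: s.isnull().mean(),
)
print(summary)
```

Groups with no answers at all keep a `NaN` mean and a missing rate of 1.0, which mirrors the rows printed by the loop.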


In [21]:
temp = df[df['JobSatisfaction'].notnull()]
temp.groupby('EmploymentStatus')[['JobSatisfaction']].mean()
# Mean job satisfaction per employment status, among respondents who answered

EmploymentStatus,JobSatisfaction
Employed full-time,6.980608
Employed part-time,6.846154
"Independent contractor, freelancer, or self-employed",7.231985


In [27]:
# Mean job satisfaction by company size, sorted from lowest to highest
temp = df[df['CompanySize'].notnull() & df['JobSatisfaction'].notnull()]
temp = temp.groupby('CompanySize')[['JobSatisfaction']].mean()
temp = temp.sort_values('JobSatisfaction')
temp

CompanySize,JobSatisfaction
"10,000 or more employees",6.793617
"5,000 to 9,999 employees",6.832155
"1,000 to 4,999 employees",6.908506
20 to 99 employees,6.997039
Fewer than 10 employees,7.025719
100 to 499 employees,7.029324
500 to 999 employees,7.029967
10 to 19 employees,7.035739
I don't know,7.054622
I prefer not to answer,7.284946
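
Sorting by mean makes it hard to see the trend across company sizes. One option, sketched below, is to reorder the same group means by size instead. The means are copied from the output above as displayed; `size_order` is a hand-written ordering, and the two non-size answers ("I don't know", "I prefer not to answer") are left out on purpose.

```python
import pandas as pd

# Size labels in ascending order (hand-written ordering).
size_order = [
    'Fewer than 10 employees',
    '10 to 19 employees',
    '20 to 99 employees',
    '100 to 499 employees',
    '500 to 999 employees',
    '1,000 to 4,999 employees',
    '5,000 to 9,999 employees',
    '10,000 or more employees',
]

# Group means as displayed in the output above.
means = pd.Series({
    '10,000 or more employees': 6.793617,
    '5,000 to 9,999 employees': 6.832155,
    '1,000 to 4,999 employees': 6.908506,
    '20 to 99 employees': 6.997039,
    'Fewer than 10 employees': 7.025719,
    '100 to 499 employees': 7.029324,
    '500 to 999 employees': 7.029967,
    '10 to 19 employees': 7.035739,
})

# Reindex so the rows run from the smallest to the largest company.
by_size = means.reindex(size_order)
print(by_size)
```

Read in this order, the groups up to 999 employees all sit between roughly 7.00 and 7.04, while the three largest size groups fall below 7.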


In [30]:
# List every column name in the survey data
for col in df.columns:
    print(col)

Respondent
Professional
ProgramHobby
Country
University
EmploymentStatus
FormalEducation
MajorUndergrad
HomeRemote
CompanySize
CompanyType
YearsProgram
YearsCodedJob
YearsCodedJobPast
DeveloperType
WebDeveloperType
MobileDeveloperType
NonDeveloperType
CareerSatisfaction
JobSatisfaction
ExCoderReturn
ExCoderNotForMe
ExCoderBalance
ExCoder10Years
ExCoderBelonged
ExCoderSkills
ExCoderWillNotCode
ExCoderActive
PronounceGIF
ProblemSolving
BuildingThings
LearningNewTech
BoringDetails
JobSecurity
DiversityImportant
AnnoyingUI
FriendsDevelopers
RightWrongWay
UnderstandComputers
SeriousWork
InvestTimeTools
WorkPayCare
KinshipDevelopers
ChallengeMyself
CompetePeers
ChangeWorld
JobSeekingStatus
HoursPerWeek
LastNewJob
AssessJobIndustry
AssessJobRole
AssessJobExp
AssessJobDept
AssessJobTech
AssessJobProjects
AssessJobCompensation
AssessJobOffice
AssessJobCommute
AssessJobRemote
AssessJobLeaders
AssessJobProfDevel
AssessJobDiversity
AssessJobProduct
AssessJobFinances
ImportantBenefits
ClickyKeys
Jo

In [37]:
set(df['ProgramHobby'])

{'No',
 'Yes, I contribute to open source projects',
 'Yes, I program as a hobby',
 'Yes, both'}
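
`set(...)` lists the distinct answers but not how often each occurs. As a hedged alternative, `value_counts(dropna=False)` reports the frequency of each answer and keeps missing values as their own row. The Series below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy answers, including one missing value (values are made up).
toy = pd.Series(['No', 'Yes, both', 'No', np.nan], name='ProgramHobby')

# dropna=False keeps NaN as a counted category instead of dropping it.
counts = toy.value_counts(dropna=False)
print(counts)
```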

In [38]:
set(df['EmploymentStatus'])

{'Employed full-time',
 'Employed part-time',
 'I prefer not to say',
 'Independent contractor, freelancer, or self-employed',
 'Not employed, and not looking for work',
 'Not employed, but looking for work',
 'Retired'}

In [39]:
set(df['FormalEducation'])

{"Bachelor's degree",
 'Doctoral degree',
 'I never completed any formal education',
 'I prefer not to answer',
 "Master's degree",
 'Primary/elementary school',
 'Professional degree',
 'Secondary school',
 "Some college/university study without earning a bachelor's degree"}

In [40]:
col = 'ProgramHobby'
temp = df[df[col].notnull() & df['JobSatisfaction'].notnull()]
temp = temp.groupby(col)[['JobSatisfaction']].mean()
temp = temp.sort_values('JobSatisfaction')
temp

ProgramHobby,JobSatisfaction
No,6.874806
"Yes, I program as a hobby",6.92715
"Yes, I contribute to open source projects",7.158649
"Yes, both",7.189316


In [41]:
col = 'EmploymentStatus'
temp = df[df[col].notnull() & df['JobSatisfaction'].notnull()]
temp = temp.groupby(col)[['JobSatisfaction']].mean()
temp = temp.sort_values('JobSatisfaction')
temp

EmploymentStatus,JobSatisfaction
Employed part-time,6.846154
Employed full-time,6.980608
"Independent contractor, freelancer, or self-employed",7.231985


In [42]:
col = 'FormalEducation'
temp = df[df[col].notnull() & df['JobSatisfaction'].notnull()]
temp = temp.groupby(col)[['JobSatisfaction']].mean()
temp = temp.sort_values('JobSatisfaction')
temp

FormalEducation,JobSatisfaction
Bachelor's degree,6.900293
Primary/elementary school,6.946237
Master's degree,6.977356
I never completed any formal education,7.0
Professional degree,7.075893
Some college/university study without earning a bachelor's degree,7.151268
I prefer not to answer,7.211679
Secondary school,7.219512
Doctoral degree,7.438492
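
The last three cells repeat the same filter, group, and sort steps with a different column each time. A small helper could capture that pattern. The sketch below is an illustration only: `mean_satisfaction_by` is a hypothetical name, the data is made up, and the extra `count` column is an addition, included because a mean based on few respondents is less reliable.

```python
import numpy as np
import pandas as pd

def mean_satisfaction_by(frame, col, target='JobSatisfaction'):
    """Mean and respondent count of `target` per level of `col`, sorted by mean."""
    # Keep only rows where both the grouping column and the target are present.
    kept = frame.dropna(subset=[col, target])
    out = kept.groupby(col)[target].agg(['mean', 'count'])
    return out.sort_values('mean')

# Toy data (made up): one row lacks the target, one lacks the group label.
toy = pd.DataFrame({
    'ProgramHobby': ['No', 'No', 'Yes, both', 'Yes, both', None],
    'JobSatisfaction': [6.0, 7.0, 8.0, np.nan, 9.0],
})

result = mean_satisfaction_by(toy, 'ProgramHobby')
print(result)
```

The same helper could be pointed at `HomeRemote`, the column relevant to the remote-work statement in Question 2, which none of the cells above tabulates.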


#### Question 1

**1.** Use the space above to help you match each statement (a key in the **job_sol_1** dictionary) with the variable (**a**, **b**, **c**, **d**, **e**, **f**, **g**, or **h**) whose value correctly answers it.

In [29]:
a = 0.734
b = 0.2014
c = 'full-time'
d = 'contractors'
e = 'retired'
f = 'yes'
g = 'no'
h = 'hard to tell'

job_sol_1 = {'The proportion of missing values in the Job Satisfaction column': b,
             'According to EmploymentStatus, which group has the highest average job satisfaction?': d,
             'In general, do smaller companies appear to have employees with higher job satisfaction?': f
            }
             
t.jobsat_check1(job_sol_1)

Nice job! That's what we found as well!


#### Question 2

**2.** Use the space above to help you match each statement (a key in the **job_sol_2** dictionary) with the variable (**a**, **b**, or **c**) whose value correctly answers it. Note that the same letter can appear more than once.

In [45]:
a = 'yes'
b = 'no'
c = 'hard to tell'

job_sol_2 = {'Do individuals who program outside of work appear to have higher JobSatisfaction?': a,
             'Does flexibility to work outside of the office appear to have an influence on JobSatisfaction?': a, 
             'A friend says a Doctoral degree increases the chance of having job you like, does this seem true?':a }
             
t.jobsat_check2(job_sol_2)

Nice job! That's what we found as well!
