#### Job Satisfaction

In this notebook, you will explore job satisfaction using the survey results. Use the cells at the top of the notebook to explore the data as needed, then use your findings to answer the questions at the bottom of the notebook.

In [2]:
import pandas as pd
import numpy as np
import JobSatisfaction as t
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./survey_results_public.csv')
schema = pd.read_csv('./survey_results_schema.csv')
df.head()

Unnamed: 0,Respondent,Professional,ProgramHobby,Country,University,EmploymentStatus,FormalEducation,MajorUndergrad,HomeRemote,CompanySize,...,StackOverflowMakeMoney,Gender,HighestEducationParents,Race,SurveyLong,QuestionsInteresting,QuestionsConfusing,InterestedAnswers,Salary,ExpectedSalary
0,1,Student,"Yes, both",United States,No,"Not employed, and not looking for work",Secondary school,,,,...,Strongly disagree,Male,High school,White or of European descent,Strongly disagree,Strongly agree,Disagree,Strongly agree,,
1,2,Student,"Yes, both",United Kingdom,"Yes, full-time",Employed part-time,Some college/university study without earning ...,Computer science or software engineering,"More than half, but not all, the time",20 to 99 employees,...,Strongly disagree,Male,A master's degree,White or of European descent,Somewhat agree,Somewhat agree,Disagree,Strongly agree,,37500.0
2,3,Professional developer,"Yes, both",United Kingdom,No,Employed full-time,Bachelor's degree,Computer science or software engineering,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A professional degree,White or of European descent,Somewhat agree,Agree,Disagree,Agree,113750.0,
3,4,Professional non-developer who sometimes write...,"Yes, both",United States,No,Employed full-time,Doctoral degree,A non-computer-focused engineering discipline,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A doctoral degree,White or of European descent,Agree,Agree,Somewhat agree,Strongly agree,,
4,5,Professional developer,"Yes, I program as a hobby",Switzerland,No,Employed full-time,Master's degree,Computer science or software engineering,Never,10 to 19 employees,...,,,,,,,,,,


In [21]:
js_missing_prop = df['JobSatisfaction'].isna().mean()
js_missing_prop

0.20149722542142184
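
The proportion above works because `isna()` returns booleans and the mean of a boolean column is the share of `True` values. A minimal sketch on made-up data (the `toy` frame and its values are assumptions for illustration, not survey results):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey column; the values are invented for illustration.
toy = pd.DataFrame({'JobSatisfaction': [7.0, np.nan, 9.0, np.nan, 5.0]})

# isna() marks missing entries as True; averaging the booleans gives the missing share.
missing_prop = toy['JobSatisfaction'].isna().mean()

# Equivalent long form: count of missing values divided by the number of rows.
missing_alt = toy['JobSatisfaction'].isna().sum() / len(toy)

print(missing_prop, missing_alt)
```

Both expressions give the same number; the `.mean()` form is simply shorter.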

In [4]:
df_new = df[~df['JobSatisfaction'].isna()]
df_new['EmploymentStatus'].value_counts()

Employed full-time                                      12995
Independent contractor, freelancer, or self-employed     1582
Employed part-time                                        676
Name: EmploymentStatus, dtype: int64

In [23]:
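# Select the column before aggregating so non-numeric columns are never averaged
# (recent pandas versions raise an error when asked to average text columns).
df_new.groupby('EmploymentStatus')['JobSatisfaction'].mean()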

EmploymentStatus
Employed full-time                                      6.980608
Employed part-time                                      6.846154
Independent contractor, freelancer, or self-employed    7.231985
Name: JobSatisfaction, dtype: float64
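
Group means are easier to trust when the group sizes are visible next to them, since the part-time and contractor groups are much smaller than the full-time group. One possible way to see both at once is `agg` with a list of statistics. This is a sketch on invented data; the `toy` frame and `summary` name are assumptions, not part of the original notebook:

```python
import pandas as pd

# Invented rows that mimic the two columns used above.
toy = pd.DataFrame({
    'EmploymentStatus': ['Employed full-time', 'Employed full-time',
                         'Employed part-time', 'Employed part-time',
                         'Employed full-time'],
    'JobSatisfaction': [7.0, 8.0, 6.0, float('nan'), 9.0],
})

# 'count' ignores missing values, so it reports how many ratings feed each mean.
summary = toy.groupby('EmploymentStatus')['JobSatisfaction'].agg(['mean', 'count'])
print(summary)
```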

In [6]:
df['CompanySize'].value_counts()

20 to 99 employees          3214
100 to 499 employees        2858
10,000 or more employees    1998
10 to 19 employees          1544
1,000 to 4,999 employees    1482
Fewer than 10 employees     1456
500 to 999 employees         946
5,000 to 9,999 employees     606
I don't know                 311
I prefer not to answer       238
Name: CompanySize, dtype: int64

In [24]:
# Select the column before aggregating so non-numeric columns are never averaged.
df.groupby('CompanySize')['JobSatisfaction'].mean().sort_values()

CompanySize
10,000 or more employees    6.793617
5,000 to 9,999 employees    6.832155
1,000 to 4,999 employees    6.908506
20 to 99 employees          6.997039
Fewer than 10 employees     7.025719
100 to 499 employees        7.029324
500 to 999 employees        7.029967
10 to 19 employees          7.035739
I don't know                7.054622
I prefer not to answer      7.284946
Name: JobSatisfaction, dtype: float64
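
Whether smaller companies report higher satisfaction is easier to judge when the brackets are listed by size rather than by mean. A minimal sketch, reusing the group means printed above (copied by hand rather than recomputed from the CSV); `size_order` and `by_size` are names introduced here for illustration:

```python
import pandas as pd

# Group means copied from the output above.
means = pd.Series({
    '10,000 or more employees': 6.793617,
    '5,000 to 9,999 employees': 6.832155,
    '1,000 to 4,999 employees': 6.908506,
    '20 to 99 employees': 6.997039,
    'Fewer than 10 employees': 7.025719,
    '100 to 499 employees': 7.029324,
    '500 to 999 employees': 7.029967,
    '10 to 19 employees': 7.035739,
    "I don't know": 7.054622,
    'I prefer not to answer': 7.284946,
})

# Size brackets from smallest to largest; the two non-size answers are left out.
size_order = [
    'Fewer than 10 employees',
    '10 to 19 employees',
    '20 to 99 employees',
    '100 to 499 employees',
    '500 to 999 employees',
    '1,000 to 4,999 employees',
    '5,000 to 9,999 employees',
    '10,000 or more employees',
]

# reindex keeps only the listed labels, in the listed order.
by_size = means.reindex(size_order)
print(by_size)
```

Read in this order, the three largest brackets all sit below every smaller bracket, while the differences among the smaller brackets are slight.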

#### Question 1

**1.** Use the space above to help match each variable (**a**, **b**, **c**, **d**, **e**, **f**, **g**, or **h**) to the statement it answers. In the **job_sol_1** dictionary, each statement is a key and the matching variable is its value.

In [8]:
a = 0.734
b = 0.2014
c = 'full-time'
d = 'contractors'
e = 'retired'
f = 'yes'
g = 'no'
h = 'hard to tell'

job_sol_1 = {'The proportion of missing values in the Job Satisfaction column': b,
             'According to EmploymentStatus, which group has the highest average job satisfaction?': d, 
             'In general, do smaller companies appear to have employees with higher job satisfaction?': f}
             
t.jobsat_check1(job_sol_1)

Nice job! That's what we found as well!


#### Question 2

**2.** Use the space above to help match each variable (**a**, **b**, or **c**) to the statement it answers. In the **job_sol_2** dictionary, each statement is a key and the matching variable is its value. Note that the same letter can appear more than once.

In [16]:
schema

Unnamed: 0,Column,Question
0,Respondent,Respondent ID number
1,Professional,Which of the following best describes you?
2,ProgramHobby,Do you program as a hobby or contribute to ope...
3,Country,In which country do you currently live?
4,University,"Are you currently enrolled in a formal, degree..."
5,EmploymentStatus,Which of the following best describes your cur...
6,FormalEducation,Which of the following best describes the high...
7,MajorUndergrad,Which of the following best describes your mai...
8,HomeRemote,How often do you work from home or remotely?
9,CompanySize,"In terms of the number of employees, how large..."


In [26]:
df['ProgramHobby'].value_counts()

Yes, I program as a hobby                    9260
Yes, both                                    5033
No                                           3661
Yes, I contribute to open source projects    1148
Name: ProgramHobby, dtype: int64

In [27]:
df_no_hobby = df[df['ProgramHobby'] == 'No']
df_hobby = df[df['ProgramHobby'] != 'No']
df_hobby['JobSatisfaction'].mean(), df_no_hobby['JobSatisfaction'].mean()

(7.0345085647763179, 6.8748063216609854)

In [17]:
df['HomeRemote'].value_counts()

A few days each month                                      5876
Never                                                      5288
All or almost all the time (I'm full-time remote)          1922
Less than half the time, but at least one day each week    1464
More than half, but not all, the time                       676
It's complicated                                            633
About half the time                                         612
Name: HomeRemote, dtype: int64

In [18]:
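# Rows with a missing HomeRemote value also satisfy != 'Never', so they land in df_home.
# mean() skips missing JobSatisfaction values, so no explicit NaN filter is needed here.
df_home = df[df['HomeRemote'] != 'Never']
df_no_home = df[df['HomeRemote'] == 'Never']
df_home['JobSatisfaction'].mean(), df_no_home['JobSatisfaction'].mean()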

(7.1403005072255716, 6.6971273938384677)

In [9]:
df_doc = df[df['FormalEducation'] == 'Doctoral degree']
df_no_doc = df[df['FormalEducation'] != 'Doctoral degree']
df_doc_new = df_doc[~df_doc['JobSatisfaction'].isna()]
df_no_doc_new = df_no_doc[~df_no_doc['JobSatisfaction'].isna()]
df_doc_new['JobSatisfaction'].mean(), df_no_doc_new['JobSatisfaction'].mean()

(7.4384920634920633, 6.9857617465590884)
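
The three comparisons above follow the same pattern: split the rows on one answer, then compare mean satisfaction across the two groups. That pattern could be wrapped in a small helper. The sketch below is a variation rather than the notebook's method: `compare_group_means` is a hypothetical name, the data is invented, and unlike the `!=` filters above it drops rows where the splitting column is missing.

```python
import numpy as np
import pandas as pd

def compare_group_means(frame, column, value, target='JobSatisfaction'):
    """Return (mean of target where column == value, mean of target elsewhere).

    Rows with a missing `column` are excluded from both groups (an assumption
    of this sketch; the != filters in the notebook keep them in the second group).
    """
    known = frame.dropna(subset=[column])
    is_match = known[column] == value
    return known.loc[is_match, target].mean(), known.loc[~is_match, target].mean()

# Invented rows for illustration only.
toy = pd.DataFrame({
    'HomeRemote': ['Never', 'Never', 'About half the time', "It's complicated", np.nan],
    'JobSatisfaction': [6.0, 7.0, 8.0, np.nan, 2.0],
})

never_mean, other_mean = compare_group_means(toy, 'HomeRemote', 'Never')
print(never_mean, other_mean)
```

A difference in means from a helper like this is descriptive only; it does not show that remote work, hobby programming, or a doctoral degree causes higher satisfaction.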

In [20]:
a = 'yes'
b = 'no'
c = 'hard to tell'

job_sol_2 = {'Do individuals who program outside of work appear to have higher JobSatisfaction?': a,
             'Does flexibility to work outside of the office appear to have an influence on JobSatisfaction?': a,
             'A friend says a Doctoral degree increases the chance of having job you like, does this seem true?': a}
             
t.jobsat_check2(job_sol_2)

Nice job! That's what we found as well!
