#### Job Satisfaction

In this notebook, you will explore job satisfaction as reported in the survey results. Use the cells at the top of the notebook to explore as needed, then use your findings to answer the questions at the bottom of the notebook.

In [1]:
import pandas as pd
import numpy as np
import JobSatisfaction as t
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('./survey_results_public.csv')
schema = pd.read_csv('./survey_results_schema.csv')
df.head()

Unnamed: 0,Respondent,Professional,ProgramHobby,Country,University,EmploymentStatus,FormalEducation,MajorUndergrad,HomeRemote,CompanySize,...,StackOverflowMakeMoney,Gender,HighestEducationParents,Race,SurveyLong,QuestionsInteresting,QuestionsConfusing,InterestedAnswers,Salary,ExpectedSalary
0,1,Student,"Yes, both",United States,No,"Not employed, and not looking for work",Secondary school,,,,...,Strongly disagree,Male,High school,White or of European descent,Strongly disagree,Strongly agree,Disagree,Strongly agree,,
1,2,Student,"Yes, both",United Kingdom,"Yes, full-time",Employed part-time,Some college/university study without earning ...,Computer science or software engineering,"More than half, but not all, the time",20 to 99 employees,...,Strongly disagree,Male,A master's degree,White or of European descent,Somewhat agree,Somewhat agree,Disagree,Strongly agree,,37500.0
2,3,Professional developer,"Yes, both",United Kingdom,No,Employed full-time,Bachelor's degree,Computer science or software engineering,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A professional degree,White or of European descent,Somewhat agree,Agree,Disagree,Agree,113750.0,
3,4,Professional non-developer who sometimes write...,"Yes, both",United States,No,Employed full-time,Doctoral degree,A non-computer-focused engineering discipline,"Less than half the time, but at least one day ...","10,000 or more employees",...,Disagree,Male,A doctoral degree,White or of European descent,Agree,Agree,Somewhat agree,Strongly agree,,
4,5,Professional developer,"Yes, I program as a hobby",Switzerland,No,Employed full-time,Master's degree,Computer science or software engineering,Never,10 to 19 employees,...,,,,,,,,,,


In [3]:
df.columns

Index(['Respondent', 'Professional', 'ProgramHobby', 'Country', 'University',
       'EmploymentStatus', 'FormalEducation', 'MajorUndergrad', 'HomeRemote',
       'CompanySize',
       ...
       'StackOverflowMakeMoney', 'Gender', 'HighestEducationParents', 'Race',
       'SurveyLong', 'QuestionsInteresting', 'QuestionsConfusing',
       'InterestedAnswers', 'Salary', 'ExpectedSalary'],
      dtype='object', length=154)

In [10]:
#Space for your code
df['JobSatisfaction'].isnull().mean()

0.20149722542142184

In [12]:
df['JobSatisfaction'].notnull().mean()

0.7985027745785782
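
The two checks above are complements: `isnull().mean()` averages a boolean mask, so it returns the share of missing entries, and `notnull().mean()` returns the rest. A minimal sketch on a made-up five-row column (the values are invented for illustration, not taken from the survey):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df['JobSatisfaction']; values are invented for illustration.
toy = pd.Series([7.0, np.nan, 9.0, 5.0, np.nan], name='JobSatisfaction')

missing_share = toy.isnull().mean()    # True counts as 1, False as 0
present_share = toy.notnull().mean()   # complement of missing_share

print(missing_share, present_share)

# Series.mean() skips NaN by default, so only the three answers are averaged.
print(toy.mean())
```

The same NaN-skipping behavior applies to the group means computed below, so each mean reflects only the respondents who answered the question.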

In [6]:
# Select the column before aggregating so only JobSatisfaction is averaged
df.groupby('EmploymentStatus')['JobSatisfaction'].mean()

EmploymentStatus
Employed full-time                                      6.980608
Employed part-time                                      6.846154
I prefer not to say                                          NaN
Independent contractor, freelancer, or self-employed    7.231985
Not employed, and not looking for work                       NaN
Not employed, but looking for work                           NaN
Retired                                                      NaN
Name: JobSatisfaction, dtype: float64

In [16]:
#Feel free to create new cells as you need them
df.groupby('CompanySize')['JobSatisfaction'].mean().sort_values()

CompanySize
10,000 or more employees    6.793617
5,000 to 9,999 employees    6.832155
1,000 to 4,999 employees    6.908506
20 to 99 employees          6.997039
Fewer than 10 employees     7.025719
100 to 499 employees        7.029324
500 to 999 employees        7.029967
10 to 19 employees          7.035739
I don't know                7.054622
I prefer not to answer      7.284946
Name: JobSatisfaction, dtype: float64
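
Sorting by the mean makes the size pattern harder to read, because the rows are no longer in size order. One way to check the "smaller companies" question is to reindex the result by company size. The sketch below rebuilds the Series from the numbers printed above rather than from the CSV; the `size_order` list is an assumption about how to rank the labels, and the "I don't know" and "I prefer not to answer" rows are left out because they carry no size information.

```python
import pandas as pd

# Means copied from the output above, so this runs without the survey file.
by_size = pd.Series({
    '10,000 or more employees': 6.793617,
    '5,000 to 9,999 employees': 6.832155,
    '1,000 to 4,999 employees': 6.908506,
    '20 to 99 employees': 6.997039,
    'Fewer than 10 employees': 7.025719,
    '100 to 499 employees': 7.029324,
    '500 to 999 employees': 7.029967,
    '10 to 19 employees': 7.035739,
}, name='JobSatisfaction')

# Assumed ranking of the labels from smallest to largest company.
size_order = [
    'Fewer than 10 employees',
    '10 to 19 employees',
    '20 to 99 employees',
    '100 to 499 employees',
    '500 to 999 employees',
    '1,000 to 4,999 employees',
    '5,000 to 9,999 employees',
    '10,000 or more employees',
]

ordered = by_size.reindex(size_order)
print(ordered)
```

Read in this order, every bracket under 1,000 employees sits near 7.0 and the three largest brackets sit lowest, although the trend is not strictly monotonic.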

#### Question 1

**1.** Use the space above to help you match each variable (a, b, c, d, e, f, g, or h) to the statement it answers. Each statement is a key in the **job_sol_1** dictionary, and the matching variable is its value.

In [17]:
a = 0.734
b = 0.2014
c = 'full-time'
d = 'contractors'
e = 'retired'
f = 'yes'
g = 'no'
h = 'hard to tell'

job_sol_1 = {'The proportion of missing values in the Job Satisfaction column': b,
             'According to EmploymentStatus, which group has the highest average job satisfaction?': d, 
             'In general, do smaller companies appear to have employees with higher job satisfaction?': f}
             
t.jobsat_check1(job_sol_1)

Nice job! That's what we found as well!


In [19]:
df['BoringDetails'].unique()

array(['Disagree', nan, 'Somewhat agree', 'Strongly disagree',
       'Strongly agree', 'Agree'], dtype=object)

In [23]:
df['ProgramHobby'].unique()

array(['Yes, both', 'Yes, I program as a hobby', 'No',
       'Yes, I contribute to open source projects'], dtype=object)

In [25]:
df.groupby('ProgramHobby')['JobSatisfaction'].mean()

ProgramHobby
No                                           6.874806
Yes, I contribute to open source projects    7.158649
Yes, I program as a hobby                    6.927150
Yes, both                                    7.189316
Name: JobSatisfaction, dtype: float64

In [21]:
def get_description(column_name, schema=schema):
    '''
    INPUT - schema - pandas dataframe with the schema of the developers survey
            column_name - string - the name of the column you would like to know about
    OUTPUT - 
            desc - string - the description of the column
    '''
    desc = list(schema[schema['Column'] == column_name]['Question'])[0]
    return desc

In [22]:
get_description('BoringDetails')

'I tend to get bored by implementation details'

In [26]:
get_description('AssessJobRemote')

"When you're assessing potential jobs to apply to, how important are each of the following to you? The opportunity to work from home/remotely"

In [28]:
get_description('HomeRemote')

'How often do you work from home or remotely?'
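
Note that `get_description` indexes the first match with `[0]`, so a misspelled column name raises an `IndexError`. A minimal sketch of a more forgiving variant follows; `get_description_safe` and `toy_schema` are hypothetical names introduced here, and the two schema rows are copied from the descriptions shown above.

```python
import pandas as pd

# Two rows copied from descriptions shown in this notebook;
# the real schema file has many more.
toy_schema = pd.DataFrame({
    'Column': ['BoringDetails', 'HomeRemote'],
    'Question': ['I tend to get bored by implementation details',
                 'How often do you work from home or remotely?'],
})

def get_description_safe(column_name, schema):
    '''Hypothetical variant: return the question text, or None if the column is unknown.'''
    matches = schema.loc[schema['Column'] == column_name, 'Question']
    return matches.iloc[0] if len(matches) else None

print(get_description_safe('HomeRemote', toy_schema))
print(get_description_safe('NotAColumn', toy_schema))
```

Returning `None` is one design choice among several; raising a `KeyError` with a clearer message would work equally well.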

In [27]:
df['HomeRemote'].unique()

array([nan, 'More than half, but not all, the time',
       'Less than half the time, but at least one day each week', 'Never',
       "All or almost all the time (I'm full-time remote)",
       "It's complicated", 'A few days each month', 'About half the time'],
      dtype=object)

In [30]:
df.groupby('HomeRemote')['JobSatisfaction'].mean().sort_values()

HomeRemote
Never                                                      6.697127
It's complicated                                           6.942053
More than half, but not all, the time                      6.973684
A few days each month                                      7.096694
About half the time                                        7.125737
Less than half the time, but at least one day each week    7.143786
All or almost all the time (I'm full-time remote)          7.405421
Name: JobSatisfaction, dtype: float64

In [31]:
get_description('CousinEducation')

"Let's pretend you have a distant cousin. They are 24 years old, have a college degree in a field not related to computer programming, and have been working a non-coding job for the last two years. They want your advice on how to switch to a career as a software developer. Which of the following options would you most strongly recommend to your cousin?\nLet's pretend you have a distant cousin named Robert. He is 24 years old, has a college degree in a field not related to computer programming, and has been working a non-coding job for the last two years. He wants your advice on how to switch to a career as a software developer. Which of the following options would you most strongly recommend to Robert?\nLet's pretend you have a distant cousin named Alice. She is 24 years old, has a college degree in a field not related to computer programming, and has been working a non-coding job for the last two years. She wants your advice on how to switch to a career as a software developer. Which 

In [38]:
get_description('FormalEducation')

"Which of the following best describes the highest level of formal education that you've completed?"

In [39]:
df['FormalEducation'].unique()

array(['Secondary school',
       "Some college/university study without earning a bachelor's degree",
       "Bachelor's degree", 'Doctoral degree', "Master's degree",
       'Professional degree', 'Primary/elementary school',
       'I prefer not to answer', 'I never completed any formal education'],
      dtype=object)

In [41]:
df.groupby('FormalEducation')['JobSatisfaction'].mean().sort_values()

FormalEducation
Bachelor's degree                                                    6.900293
Primary/elementary school                                            6.946237
Master's degree                                                      6.977356
I never completed any formal education                               7.000000
Professional degree                                                  7.075893
Some college/university study without earning a bachelor's degree    7.151268
I prefer not to answer                                               7.211679
Secondary school                                                     7.219512
Doctoral degree                                                      7.438492
Name: JobSatisfaction, dtype: float64
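
A group mean is easier to interpret next to the number of respondents behind it, and the output above does not show those counts. The sketch below uses invented rows (not survey data) to show how `agg` can report both at once.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration only; not taken from the survey.
toy = pd.DataFrame({
    'FormalEducation': ["Bachelor's degree"] * 4 + ['Doctoral degree'] * 2,
    'JobSatisfaction': [6.0, 7.0, 8.0, np.nan, 9.0, np.nan],
})

summary = (toy.groupby('FormalEducation')['JobSatisfaction']
              .agg(['mean', 'count'])   # count ignores NaN answers
              .sort_values('mean'))
print(summary)
```

Running the same call on `df` would show how many answers stand behind each mean, which is useful context when comparing small groups with large ones.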

#### Question 2

**2.** Use the space above to help you match each variable (a, b, c) to the statement it answers in the **job_sol_2** dictionary. Note that the same letter can appear more than once.

In [42]:
a = 'yes'
b = 'no'
c = 'hard to tell'

job_sol_2 = {'Do individuals who program outside of work appear to have higher JobSatisfaction?': a,
             'Does flexibility to work outside of the office appear to have an influence on JobSatisfaction?': a, 
             'A friend says a Doctoral degree increases the chance of having job you like, does this seem true?': a}
             
t.jobsat_check2(job_sol_2)

Nice job! That's what we found as well!
