# Reasons for Resignation by Age and Employment Period

This project uses employee exit surveys from the Department of Education, Training, and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia, to examine reasons for resignation by age and length of employment. Are younger employees, or employees who have worked at these institutions for shorter periods, more likely to resign because of job dissatisfaction?

The survey data is available at the following links:
* [TAFE Survey](https://data.gov.au/dataset/ds-qld-89970a3b-182b-41ea-aea2-6f9f17b5907e/details?q=exit%20survey)
* [DETE Survey](https://data.gov.au/dataset/ds-qld-fe96ff30-d157-4a81-851d-215f2a0fe26d/details?q=exit%20survey)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#imports pandas, numpy, matplotlib, and seaborn
tafe_survey = pd.read_csv('tafe_survey.csv')
dete_survey = pd.read_csv('dete_survey.csv', na_values='Not Stated')
#reads in the .csvs as dfs with 'Not Stated' entries in dete_survey being converted to nan
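
To illustrate what the `na_values` argument does, here is a minimal sketch on a hypothetical three-row stand-in for the DETE file (the rows are made up for illustration and are not taken from the survey). It reads the same text twice, once without and once with `na_values='Not Stated'`.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for dete_survey.csv (illustrative rows only)
csv_text = """ID,Cease Date,DETE Start Date
1,08/2012,1984
2,Not Stated,Not Stated
3,2013,2011
"""

# Without na_values, the placeholder 'Not Stated' is kept as ordinary text
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values, every 'Not Stated' entry is read as NaN
clean = pd.read_csv(io.StringIO(csv_text), na_values='Not Stated')

print(raw.isnull().sum())    # no missing values are detected
print(clean.isnull().sum())  # the placeholder now counts as missing
print(clean.dtypes)          # the start-date column can be numeric
```

Treating the placeholder as missing at read time means later calls such as `isnull()` and arithmetic on the year columns behave as expected.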

In [2]:
tafe_survey.info()
#returns info on tafe_survey df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 702 entries, 0 to 701
Data columns (total 72 columns):
Record ID                                                                                                                                                        702 non-null float64
Institute                                                                                                                                                        702 non-null object
WorkArea                                                                                                                                                         702 non-null object
CESSATION YEAR                                                                                                                                                   695 non-null float64
Reason for ceasing employment                                                                                                                                    701 non-

In [3]:
dete_survey.info()
#returns info on dete_survey df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 822 entries, 0 to 821
Data columns (total 56 columns):
ID                                     822 non-null int64
SeparationType                         822 non-null object
Cease Date                             788 non-null object
DETE Start Date                        749 non-null float64
Role Start Date                        724 non-null float64
Position                               817 non-null object
Classification                         455 non-null object
Region                                 717 non-null object
Business Unit                          126 non-null object
Employment Status                      817 non-null object
Career move to public sector           822 non-null bool
Career move to private sector          822 non-null bool
Interpersonal conflicts                822 non-null bool
Job dissatisfaction                    822 non-null bool
Dissatisfaction with the department    822 non-null bool
Physical work envir

In [4]:
tafe_survey.head()
#prints first 5 rows of tafe_survey

Unnamed: 0,Record ID,Institute,WorkArea,CESSATION YEAR,Reason for ceasing employment,Contributing Factors. Career Move - Public Sector,Contributing Factors. Career Move - Private Sector,Contributing Factors. Career Move - Self-employment,Contributing Factors. Ill Health,Contributing Factors. Maternity/Family,...,Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?,Workplace. Topic:Does your workplace promote and practice the principles of employment equity?,Workplace. Topic:Does your workplace value the diversity of its employees?,Workplace. Topic:Would you recommend the Institute as an employer to others?,Gender. What is your Gender?,CurrentAge. Current Age,Employment Type. Employment Type,Classification. Classification,LengthofServiceOverall. Overall Length of Service at Institute (in years),LengthofServiceCurrent. Length of Service at current workplace (in years)
0,6.34133e+17,Southern Queensland Institute of TAFE,Non-Delivery (corporate),2010.0,Contract Expired,,,,,,...,Yes,Yes,Yes,Yes,Female,26 30,Temporary Full-time,Administration (AO),1-2,1-2
1,6.341337e+17,Mount Isa Institute of TAFE,Non-Delivery (corporate),2010.0,Retirement,-,-,-,-,-,...,Yes,Yes,Yes,Yes,,,,,,
2,6.341388e+17,Mount Isa Institute of TAFE,Delivery (teaching),2010.0,Retirement,-,-,-,-,-,...,Yes,Yes,Yes,Yes,,,,,,
3,6.341399e+17,Mount Isa Institute of TAFE,Non-Delivery (corporate),2010.0,Resignation,-,-,-,-,-,...,Yes,Yes,Yes,Yes,,,,,,
4,6.341466e+17,Southern Queensland Institute of TAFE,Delivery (teaching),2010.0,Resignation,-,Career Move - Private Sector,-,-,-,...,Yes,Yes,Yes,Yes,Male,41 45,Permanent Full-time,Teacher (including LVT),3-4,3-4


In [5]:
dete_survey.head()
#prints first 5 rows of dete_survey

Unnamed: 0,ID,SeparationType,Cease Date,DETE Start Date,Role Start Date,Position,Classification,Region,Business Unit,Employment Status,...,Kept informed,Wellness programs,Health & Safety,Gender,Age,Aboriginal,Torres Strait,South Sea,Disability,NESB
0,1,Ill Health Retirement,08/2012,1984.0,2004.0,Public Servant,A01-A04,Central Office,Corporate Strategy and Peformance,Permanent Full-time,...,N,N,N,Male,56-60,,,,,Yes
1,2,Voluntary Early Retirement (VER),08/2012,,,Public Servant,AO5-AO7,Central Office,Corporate Strategy and Peformance,Permanent Full-time,...,N,N,N,Male,56-60,,,,,
2,3,Voluntary Early Retirement (VER),05/2012,2011.0,2011.0,Schools Officer,,Central Office,Education Queensland,Permanent Full-time,...,N,N,N,Male,61 or older,,,,,
3,4,Resignation-Other reasons,05/2012,2005.0,2006.0,Teacher,Primary,Central Queensland,,Permanent Full-time,...,A,N,A,Female,36-40,,,,,
4,5,Age Retirement,05/2012,1970.0,1989.0,Head of Curriculum/Head of Special Education,,South East,,Permanent Full-time,...,N,A,M,Female,61 or older,,,,,


In [6]:
tafe_survey.isnull().sum()
#returns the number of null values in each column of tafe_survey

Record ID                                                                      0
Institute                                                                      0
WorkArea                                                                       0
CESSATION YEAR                                                                 7
Reason for ceasing employment                                                  1
                                                                            ... 
CurrentAge. Current Age                                                      106
Employment Type. Employment Type                                             106
Classification. Classification                                               106
LengthofServiceOverall. Overall Length of Service at Institute (in years)    106
LengthofServiceCurrent. Length of Service at current workplace (in years)    106
Length: 72, dtype: int64

In [7]:
dete_survey.isnull().sum()
#returns the number of null values in each column of dete_survey

ID                                       0
SeparationType                           0
Cease Date                              34
DETE Start Date                         73
Role Start Date                         98
Position                                 5
Classification                         367
Region                                 105
Business Unit                          696
Employment Status                        5
Career move to public sector             0
Career move to private sector            0
Interpersonal conflicts                  0
Job dissatisfaction                      0
Dissatisfaction with the department      0
Physical work environment                0
Lack of recognition                      0
Lack of job security                     0
Work location                            0
Employment conditions                    0
Maternity/family                         0
Relocation                               0
Study/Travel                             0
Ill Health 

In [8]:
tafe_survey['Reason for ceasing employment'].value_counts()
#returns counts on the values for ['Reason for ceasing employment'] column

Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
Name: Reason for ceasing employment, dtype: int64

In [9]:
tafe_survey['LengthofServiceOverall. Overall Length of Service at Institute (in years)'].value_counts()
#returns counts on the values for ['LengthofServiceOverall. Overall Length of Service at Institute (in years)'] column

Less than 1 year      147
1-2                   102
3-4                    96
11-20                  89
More than 20 years     71
5-6                    48
7-10                   43
Name: LengthofServiceOverall. Overall Length of Service at Institute (in years), dtype: int64

In [10]:
dete_survey['SeparationType'].value_counts()
#returns counts on the values for ['SeparationType'] column

Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: SeparationType, dtype: int64

In [11]:
dete_survey['Cease Date'].value_counts()
#returns counts on the values for ['Cease Date'] column

2012       344
2013       200
01/2014     43
12/2013     40
09/2013     34
06/2013     27
07/2013     22
10/2013     20
11/2013     16
08/2013     12
05/2013      7
05/2012      6
04/2014      2
02/2014      2
07/2014      2
08/2012      2
04/2013      2
09/2010      1
2010         1
07/2006      1
11/2012      1
2014         1
09/2014      1
07/2012      1
Name: Cease Date, dtype: int64

In [12]:
dete_survey['DETE Start Date'].value_counts()
#returns counts on the values for ['DETE Start Date'] column

2011.0    40
2007.0    34
2008.0    31
2010.0    27
2012.0    27
2009.0    24
2006.0    23
2013.0    21
1970.0    21
1975.0    21
1990.0    20
2005.0    20
1999.0    19
1996.0    19
1992.0    18
2004.0    18
1991.0    18
2000.0    18
1989.0    17
1976.0    15
1988.0    15
2002.0    15
2003.0    15
1978.0    15
1995.0    14
1979.0    14
1974.0    14
1980.0    14
1998.0    14
1997.0    14
1993.0    13
1986.0    12
1972.0    12
1977.0    11
1971.0    10
1994.0    10
1969.0    10
2001.0    10
1984.0    10
1981.0     9
1983.0     9
1973.0     8
1985.0     8
1987.0     7
1982.0     4
1963.0     4
1968.0     3
1967.0     2
1965.0     1
1966.0     1
Name: DETE Start Date, dtype: int64

Many of the columns in these dataframes are not relevant to the questions stated above, so we'll drop them.

In [13]:
tafe_survey.drop(tafe_survey.columns[17:66], axis = 1, inplace=True)
#drops irrelevant columns from tafe_survey

In [14]:
dete_survey.drop(dete_survey.columns[28:49], axis = 1, inplace=True)
#drops irrelevant columns from dete_survey
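
Dropping by positional slice depends on the column order of the file. The sketch below shows the same pattern on a hypothetical five-column frame (the column names are illustrative, not the survey's), printing the selected labels before dropping so the slice can be checked.

```python
import pandas as pd

# Hypothetical miniature frame; column names are illustrative only
df = pd.DataFrame({
    'id': [1, 2],
    'separationtype': ['Resignation', 'Retirement'],
    'survey_q1': ['Agree', 'Neutral'],
    'survey_q2': ['Agree', 'Disagree'],
    'age': ['26 30', '56-60'],
})

# Positional slice: positions 2 and 3 (the end index is exclusive)
to_drop = df.columns[2:4]
print(list(to_drop))

# A non-inplace drop returns a new frame and leaves df unchanged
trimmed = df.drop(columns=to_drop)
print(list(trimmed.columns))
```

Inspecting `to_drop` first is a cheap guard against removing the wrong columns if the source file's layout ever changes.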

Column names should be consistent before combining datasets.

In [15]:
tafe_survey.rename({'Record ID': 'id','CESSATION YEAR': 'cease_date','Reason for ceasing employment': 'separationtype','Gender. What is your Gender?': 'gender','CurrentAge. Current Age': 'age','Employment Type. Employment Type': 'employment_status','Classification. Classification': 'position','LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service','LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'}, axis=1, inplace=True)
#renames many of the TAFE survey columns according to a dictionary

In [16]:
dete_survey.columns = dete_survey.columns.str.lower().str.strip().str.replace(' ','_')
#sets DETE survey columns to all lowercase, strips whitespace, and replaces spaces with underscores
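
As a quick check of what this chain of string methods produces, here is a small sketch on a handful of labels. Most come from the DETE output above; the padded `' DETE Start Date '` entry is a hypothetical example added to show the effect of `strip()`.

```python
import pandas as pd

# Sample labels; the padded entry is hypothetical, added to exercise strip()
raw_columns = pd.Index(['ID', 'SeparationType', 'Cease Date',
                        ' DETE Start Date ', 'Maternity/family'])

# Lowercase, trim surrounding whitespace, then turn inner spaces into underscores
cleaned = raw_columns.str.lower().str.strip().str.replace(' ', '_', regex=False)
print(list(cleaned))
```

The order matters: stripping before replacing keeps leading or trailing spaces from turning into stray underscores. Characters other than spaces, such as the slash in `maternity/family`, are left unchanged.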

In [17]:
tafe_survey.columns
#displays tafe_survey columns to confirm column name changes

Index(['id', 'Institute', 'WorkArea', 'cease_date', 'separationtype',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE', 'gender',
       'age', 'employment_status', 'position', 'institute_service',
       'role_service'],
      dtype='object')

In [18]:
dete_survey.columns
#displays dete_survey columns to confirm column name changes

Index(['id', 'separationtype', 'cease_date', 'dete_start_date',
       'role_start_date', 'position', 'classification', 'region',
       'business_unit', 'employment_status', 'career_move_to_public_sector',
       'career_move_to_private_sector', 'interpersonal_conflicts',
       'job_dissatisfaction', 'dissatisfaction_with_the_department',
       'physical_work_environment', 'lack_of_recognition',
       'lack_of_job_security', 'work_location', 'employment_conditions',
       'maternity/family', 'relocation', 'study/travel', 'ill_health',
       'traumatic_incident', 'work_life_balance', 'workload',
       'none_of_the_above', 'gender', 'age', 'aboriginal', 'torres_strait',
       'south_sea', 'disability', 'nesb'],
      dtype='object')

Now that our columns are labeled more consistently (more on this later), we'll clean up some of their contents. We saw earlier that the cease_date column in the DETE survey mixes entries in MM/YYYY format with entries that contain only a year. We'll make it consistent by keeping just the year.

In [19]:
dete_survey['cease_date'].value_counts()

2012       344
2013       200
01/2014     43
12/2013     40
09/2013     34
06/2013     27
07/2013     22
10/2013     20
11/2013     16
08/2013     12
05/2013      7
05/2012      6
04/2014      2
02/2014      2
07/2014      2
08/2012      2
04/2013      2
09/2010      1
2010         1
07/2006      1
11/2012      1
2014         1
09/2014      1
07/2012      1
Name: cease_date, dtype: int64

In [20]:
dete_survey['cease_date'] = dete_survey['cease_date'].str.extract(r'([1-2][0-9]{3})', expand=False).astype(float)
#uses regex to extract the four-digit year from the 'cease_date' column of dete_survey as a Series (expand=False), then converts it to float
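
The year extraction can be tried on a few sample values first. This is a minimal sketch on a hypothetical Series that mimics the mixed formats seen in the value counts; `expand=False` is used here so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the mixed 'YYYY' and 'MM/YYYY' formats
dates = pd.Series(['2012', '08/2012', '01/2014', np.nan])

# Capture the first four-digit run that starts with 1 or 2, then convert to float
years = dates.str.extract(r'([1-2][0-9]{3})', expand=False).astype(float)
print(years.tolist())
```

Missing entries stay missing after extraction, which is why the result is stored as float rather than int.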

In [21]:
dete_survey['cease_date'].value_counts(dropna=False)
#displays counts of values from cease_date column of dete_survey to confirm changes

2013.0    380
2012.0    354
2014.0     51
NaN        34
2010.0      2
2006.0      1
Name: cease_date, dtype: int64

In [22]:
dete_survey['dete_start_date'].value_counts(dropna=False)
#displays count of values from dete_start_date

NaN       73
2011.0    40
2007.0    34
2008.0    31
2010.0    27
2012.0    27
2009.0    24
2006.0    23
1970.0    21
1975.0    21
2013.0    21
2005.0    20
1990.0    20
1999.0    19
1996.0    19
1992.0    18
1991.0    18
2000.0    18
2004.0    18
1989.0    17
1978.0    15
2003.0    15
1988.0    15
1976.0    15
2002.0    15
1974.0    14
1997.0    14
1998.0    14
1979.0    14
1995.0    14
1980.0    14
1993.0    13
1972.0    12
1986.0    12
1977.0    11
1971.0    10
1984.0    10
1994.0    10
1969.0    10
2001.0    10
1983.0     9
1981.0     9
1973.0     8
1985.0     8
1987.0     7
1982.0     4
1963.0     4
1968.0     3
1967.0     2
1965.0     1
1966.0     1
Name: dete_start_date, dtype: int64

In [23]:
tafe_survey['cease_date'].value_counts(dropna=False)
#displays count of values from 'cease_date' column of tafe_survey

2011.0    268
2012.0    235
2010.0    103
2013.0     85
NaN         7
2009.0      4
Name: cease_date, dtype: int64

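Now that the date columns are consistently formatted as years, we can create an 'institute_service' column in the DETE survey by subtracting dete_start_date from cease_date. This corresponds to the 'institute_service' column in the TAFE survey, although the TAFE values are ranges such as '1-2' and '3-4' rather than a number of years.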

In [24]:
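dete_survey['institute_service'] = dete_survey['cease_date'] - dete_survey['dete_start_date']
#creates 'institute_service' in dete_survey as the number of years between 'dete_start_date' and 'cease_date'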
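
The subtraction yields NaN wherever either date is missing. The sketch below shows this on a small hypothetical frame: the first three rows mirror the first three rows of the DETE data, and the fourth is made up to show a missing cease date.

```python
import numpy as np
import pandas as pd

# First three rows mirror the DETE head() output; the fourth row is hypothetical
df = pd.DataFrame({
    'cease_date': [2012.0, 2012.0, 2012.0, np.nan],
    'dete_start_date': [1984.0, np.nan, 2011.0, 2005.0],
})

# Years of service; NaN in either operand propagates to the result
df['institute_service'] = df['cease_date'] - df['dete_start_date']
print(df)

# Number of rows where the service length could not be computed
print(df['institute_service'].isnull().sum())
```

Because only years are available, the result is a whole-year difference and cannot distinguish, say, 13 months of service from 23.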

In [25]:
dete_survey.head()

Unnamed: 0,id,separationtype,cease_date,dete_start_date,role_start_date,position,classification,region,business_unit,employment_status,...,workload,none_of_the_above,gender,age,aboriginal,torres_strait,south_sea,disability,nesb,institute_service
0,1,Ill Health Retirement,2012.0,1984.0,2004.0,Public Servant,A01-A04,Central Office,Corporate Strategy and Peformance,Permanent Full-time,...,False,True,Male,56-60,,,,,Yes,28.0
1,2,Voluntary Early Retirement (VER),2012.0,,,Public Servant,AO5-AO7,Central Office,Corporate Strategy and Peformance,Permanent Full-time,...,False,False,Male,56-60,,,,,,
2,3,Voluntary Early Retirement (VER),2012.0,2011.0,2011.0,Schools Officer,,Central Office,Education Queensland,Permanent Full-time,...,False,True,Male,61 or older,,,,,,1.0
3,4,Resignation-Other reasons,2012.0,2005.0,2006.0,Teacher,Primary,Central Queensland,,Permanent Full-time,...,False,False,Female,36-40,,,,,,7.0
4,5,Age Retirement,2012.0,1970.0,1989.0,Head of Curriculum/Head of Special Education,,South East,,Permanent Full-time,...,False,False,Female,61 or older,,,,,,42.0
