# Employee Exit Surveys


<b>GOAL</b><br>
We have employee exit survey data from two Australian organizations and we want to know:
- Are employees who worked for the institutes for only a short period resigning due to some kind of dissatisfaction? What about employees who have been there longer?

- Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

<b>DATA</b><br> 
Exit survey datasets from two Australian organizations:
- The Department of Education, Training and Employment <b>(DETE)</b> 
- The Technical and Further Education <b>(TAFE)</b> institute in Queensland, Australia.

The datasets and some of the columns used:

<b>dete_survey.csv:</b><br>
<i>ID</i>: An id used to identify the participant of the survey <br>
<i>DETE Start Date</i>: The year the person began employment with the DETE<br>
<i>Cease Date</i>: The year or month the person's employment ended<br>
<i>SeparationType</i>: The reason why the person's employment ended<br>

<b>tafe_survey.csv:</b><br>
<i>Record ID</i>: An id used to identify the participant of the survey<br>
<i>LengthofServiceOverall. Overall Length of Service at Institute (in years)</i>: The length of the person's employment (in years)<br>
<i>Reason for ceasing employment</i>: The reason why the person's employment ended<br>



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [44]:
dete_survey = pd.read_csv('dete_survey.csv')

tafe_survey = pd.read_csv('tafe_survey.csv')

### Investigating the data

In [3]:
dete_survey.head()

| | ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | | | | | Yes |
| 1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | | | | | |
| 2 | 3 | Voluntary Early Retirement (VER) | 05/2012 | 2011 | 2011 | Schools Officer | | Central Office | Education Queensland | Permanent Full-time | ... | N | N | N | Male | 61 or older | | | | | |
| 3 | 4 | Resignation-Other reasons | 05/2012 | 2005 | 2006 | Teacher | Primary | Central Queensland | | Permanent Full-time | ... | A | N | A | Female | 36-40 | | | | | |
| 4 | 5 | Age Retirement | 05/2012 | 1970 | 1989 | Head of Curriculum/Head of Special Education | | South East | | Permanent Full-time | ... | N | A | M | Female | 61 or older | | | | | |


In [4]:
dete_survey.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 822 entries, 0 to 821
Data columns (total 56 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   ID                                   822 non-null    int64 
 1   SeparationType                       822 non-null    object
 2   Cease Date                           822 non-null    object
 3   DETE Start Date                      822 non-null    object
 4   Role Start Date                      822 non-null    object
 5   Position                             817 non-null    object
 6   Classification                       455 non-null    object
 7   Region                               822 non-null    object
 8   Business Unit                        126 non-null    object
 9   Employment Status                    817 non-null    object
 10  Career move to public sector         822 non-null    bool  
 11  Career move to private sector        822 non-

In [5]:
dete_survey.loc[:,'Cease Date'].value_counts()

2012          344
2013          200
01/2014        43
12/2013        40
09/2013        34
Not Stated     34
06/2013        27
07/2013        22
10/2013        20
11/2013        16
08/2013        12
05/2013         7
05/2012         6
07/2014         2
02/2014         2
04/2014         2
08/2012         2
04/2013         2
11/2012         1
09/2014         1
2014            1
2010            1
09/2010         1
07/2006         1
07/2012         1
Name: Cease Date, dtype: int64

In [6]:
dete_survey.loc[:,'DETE Start Date'].value_counts()

Not Stated    73
2011          40
2007          34
2008          31
2010          27
2012          27
2009          24
2006          23
1970          21
2013          21
1975          21
2005          20
1990          20
1999          19
1996          19
1992          18
2004          18
2000          18
1991          18
1989          17
1988          15
1978          15
1976          15
2002          15
2003          15
1995          14
1979          14
1998          14
1974          14
1997          14
1980          14
1993          13
1986          12
1972          12
1977          11
1984          10
1971          10
1994          10
1969          10
2001          10
1983           9
1981           9
1985           8
1973           8
1987           7
1982           4
1963           4
1968           3
1967           2
1965           1
1966           1
Name: DETE Start Date, dtype: int64

In [7]:
dete_survey.SeparationType.value_counts()

Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: SeparationType, dtype: int64

<b>Clean up on DETE dataset</b>

- Reset index
- Date fields are of type object, and the format of the Cease Date field differs across entries: some are month/year, others are year only. The DETE Start Date field is consistently year only.<br>
    Also, 'Not Stated' is essentially null (NA) and needs to be handled in both columns.
- A few columns may need renaming so they are easier to use in code.
- Many columns contain null entries that need to be dealt with.
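
One way to handle the 'Not Stated' placeholder is to treat it as missing when the file is loaded. The sketch below is only an illustration on a small in-memory CSV (the rows are made up, not taken from `dete_survey.csv`); it assumes the `na_values` parameter of `pd.read_csv` is an acceptable alternative to replacing the values after loading.

```python
import io

import pandas as pd

# Toy CSV standing in for dete_survey.csv; the rows are illustrative only.
csv_text = """ID,Cease Date,DETE Start Date
1,08/2012,1984
2,Not Stated,Not Stated
3,2013,2011
"""

# na_values tells read_csv to parse 'Not Stated' as a missing value (NaN).
toy_dete = pd.read_csv(io.StringIO(csv_text), na_values="Not Stated")

# Count the missing values per column after loading.
null_counts = toy_dete.isnull().sum()
print(null_counts)
```

With the real file, the equivalent call would be `pd.read_csv('dete_survey.csv', na_values='Not Stated')`.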

In [8]:
tafe_survey.loc[:,['CESSATION YEAR','Reason for ceasing employment','LengthofServiceOverall. Overall Length of Service at Institute (in years)']].head()

| | CESSATION YEAR | Reason for ceasing employment | LengthofServiceOverall. Overall Length of Service at Institute (in years) |
|---|---|---|---|
| 0 | 2010.0 | Contract Expired | 1-2 |
| 1 | 2010.0 | Retirement | |
| 2 | 2010.0 | Retirement | |
| 3 | 2010.0 | Resignation | |
| 4 | 2010.0 | Resignation | 3-4 |


In [9]:
tafe_survey.loc[:,['CESSATION YEAR','Reason for ceasing employment','LengthofServiceOverall. Overall Length of Service at Institute (in years)']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 702 entries, 0 to 701
Data columns (total 3 columns):
 #   Column                                                                     Non-Null Count  Dtype  
---  ------                                                                     --------------  -----  
 0   CESSATION YEAR                                                             695 non-null    float64
 1   Reason for ceasing employment                                              701 non-null    object 
 2   LengthofServiceOverall. Overall Length of Service at Institute (in years)  596 non-null    object 
dtypes: float64(1), object(2)
memory usage: 16.6+ KB


In [10]:
tafe_survey.loc[:,'CESSATION YEAR'].value_counts()

2011.0    268
2012.0    235
2010.0    103
2013.0     85
2009.0      4
Name: CESSATION YEAR, dtype: int64

In [11]:
tafe_survey.loc[:,'LengthofServiceOverall. Overall Length of Service at Institute (in years)'].value_counts()

Less than 1 year      147
1-2                   102
3-4                    96
11-20                  89
More than 20 years     71
5-6                    48
7-10                   43
Name: LengthofServiceOverall. Overall Length of Service at Institute (in years), dtype: int64

In [12]:
tafe_survey.loc[:,'LengthofServiceOverall. Overall Length of Service at Institute (in years)'].isnull().sum()

106

In [13]:
tafe_survey.loc[:,'Reason for ceasing employment'].value_counts()

Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
Name: Reason for ceasing employment, dtype: int64

<b>Clean up on TAFE dataset</b>

- 'CESSATION YEAR' is stored as float64 (e.g. 2010.0), so its format may need to be fixed.
- A few columns may need renaming so they are easier to use in code.
- Null entries in 'CESSATION YEAR' and the length-of-service column need to be handled.
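
As a rough sketch of the first and last points, a float year column with gaps can be converted to pandas' nullable integer type, which keeps the missing entries instead of failing on them. The frame below is toy data, and `length_of_service` is a hypothetical short name used only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the TAFE columns of interest; values are illustrative.
toy_tafe = pd.DataFrame({
    "CESSATION YEAR": [2010.0, 2011.0, np.nan],
    "length_of_service": ["1-2", None, "3-4"],  # hypothetical column name
})

# 'Int64' (capital I) is the nullable integer dtype: 2010.0 becomes 2010
# and NaN becomes <NA> rather than raising an error.
toy_tafe["CESSATION YEAR"] = toy_tafe["CESSATION YEAR"].astype("Int64")

# Count the remaining nulls per column.
null_counts = toy_tafe.isnull().sum()
print(toy_tafe.dtypes)
print(null_counts)
```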

================================================================================================================

### Data Cleaning

In [17]:
# Drop unwanted columns (selected by position)


dete_survey.drop(columns=dete_survey.columns[28:49], inplace=True)

tafe_survey.drop(columns=tafe_survey.columns[17:66], inplace=True)

#### Renaming columns

We need standardized column names across the DETE and TAFE datasets to be able to merge them.


Rename the TAFE columns as follows:

    'Record ID': 'id'
    'CESSATION YEAR': 'cease_date'
    'Reason for ceasing employment': 'separationtype'
    'Gender. What is your Gender?': 'gender'
    'CurrentAge. Current Age': 'age'
    'Employment Type. Employment Type': 'employment_status'
    'Classification. Classification': 'position'
    'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service'
    'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'
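
To illustrate why the shared names matter, here is a minimal sketch on toy frames. It assumes the two datasets will eventually be stacked with `pd.concat`, which is one possible way to combine them and is not confirmed by the notebook so far. `pd.concat` aligns on column labels, so columns with different names end up side by side and padded with NaN.

```python
import pandas as pd

# Toy frames; the values are illustrative only.
dete_toy = pd.DataFrame({"id": [1], "cease_date": ["2012"]})
tafe_renamed = pd.DataFrame({"id": [2], "cease_date": ["2010"]})
tafe_raw = pd.DataFrame({"Record ID": [2], "CESSATION YEAR": ["2010"]})

# Matching names: rows stack under the same two columns.
aligned = pd.concat([dete_toy, tafe_renamed], ignore_index=True)

# Mismatched names: four columns, each half empty.
misaligned = pd.concat([dete_toy, tafe_raw], ignore_index=True)

print(aligned.shape)
print(misaligned.shape)
```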


In [20]:
# Fixing Columns on Tafe

tafe_survey.columns

Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR',
       'Reason for ceasing employment',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE',
       'Gender. What is your Gender?', 'CurrentAge. Current Age',
       'Employment Type. Employment Type', 'Classification. Classification',
       'LengthofServiceOverall. Overall Length of Service at Institute (in years)',
       'LengthofServiceCurrent. Length of Service at current workplace (in years)'],
      dtype='ob

In [29]:
# Strip leading and trailing spaces
# Replace the remaining spaces with underscores
# Strip leading/trailing periods and make everything lower case


tafe_survey.columns = tafe_survey.columns.str.strip().str.replace(' ','_').str.strip('.').str.lower()

In [30]:
tafe_survey.columns

Index(['record_id', 'institute', 'workarea', 'cessation_year',
       'reason_for_ceasing_employment',
       'contributing_factors._career_move_-_public_sector',
       'contributing_factors._career_move_-_private_sector',
       'contributing_factors._career_move_-_self-employment',
       'contributing_factors._ill_health',
       'contributing_factors._maternity/family',
       'contributing_factors._dissatisfaction',
       'contributing_factors._job_dissatisfaction',
       'contributing_factors._interpersonal_conflict',
       'contributing_factors._study', 'contributing_factors._travel',
       'contributing_factors._other', 'contributing_factors._none',
       'gender._what_is_your_gender?', 'currentage._current_age',
       'employment_type._employment_type', 'classification._classification',
       'lengthofserviceoverall._overall_length_of_service_at_institute_(in_years)',
       'lengthofservicecurrent._length_of_service_at_current_workplace_(in_years)'],
      dtype='obje

In [41]:
tafe_survey.rename(columns={'record_id':'id','reason_for_ceasing_employment':'separationtype','cessation_year':'cease_date',
                           'gender._what_is_your_gender?':'gender', 'employment_type._employment_type':'employment_status', 
                           'currentage._current_age':'age','classification._classification':'position', 
                            'lengthofserviceoverall._overall_length_of_service_at_institute_(in_years)':'institute_service',
                           'lengthofservicecurrent._length_of_service_at_current_workplace_(in_years)': 'role_service'}, inplace=True)

In [55]:
tafe_survey.rename(columns={'separationtype':'separation_type'},inplace=True)

In [59]:
tafe_survey.columns

Index(['id', 'institute', 'workarea', 'cease_date', 'separation_type',
       'contributing_factors._career_move_-_public_sector',
       'contributing_factors._career_move_-_private_sector',
       'contributing_factors._career_move_-_self-employment',
       'contributing_factors._ill_health',
       'contributing_factors._maternity/family',
       'contributing_factors._dissatisfaction',
       'contributing_factors._job_dissatisfaction',
       'contributing_factors._interpersonal_conflict',
       'contributing_factors._study', 'contributing_factors._travel',
       'contributing_factors._other', 'contributing_factors._none', 'gender',
       'age', 'employment_status', 'position', 'institute_service',
       'role_service'],
      dtype='object')

In [46]:
# Fixing Columns on Dete

dete_survey.columns

Index(['ID', 'SeparationType', 'Cease Date', 'DETE Start Date',
       'Role Start Date', 'Position', 'Classification', 'Region',
       'Business Unit', 'Employment Status', 'Career move to public sector',
       'Career move to private sector', 'Interpersonal conflicts',
       'Job dissatisfaction', 'Dissatisfaction with the department',
       'Physical work environment', 'Lack of recognition',
       'Lack of job security', 'Work location', 'Employment conditions',
       'Maternity/family', 'Relocation', 'Study/Travel', 'Ill Health',
       'Traumatic incident', 'Work life balance', 'Workload',
       'None of the above', 'Professional Development',
       'Opportunities for promotion', 'Staff morale', 'Workplace issue',
       'Physical environment', 'Worklife balance',
       'Stress and pressure support', 'Performance of supervisor',
       'Peer support', 'Initiative', 'Skills', 'Coach', 'Career Aspirations',
       'Feedback', 'Further PD', 'Communication', 'My say', 'Inform

In [47]:
dete_survey.columns= dete_survey.columns.str.strip().str.replace(' ','_').str.lower()

In [63]:
dete_survey.rename(columns={'separationtype':'separation_type'},inplace=True)

In [64]:
tafe_survey.loc[:,['id','cease_date','separation_type','gender','age','employment_status','position',
                   'institute_service','role_service']]

| | id | cease_date | separation_type | gender | age | employment_status | position | institute_service | role_service |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6.341330e+17 | 2010.0 | Contract Expired | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
| 1 | 6.341337e+17 | 2010.0 | Retirement | | | | | | |
| 2 | 6.341388e+17 | 2010.0 | Retirement | | | | | | |
| 3 | 6.341399e+17 | 2010.0 | Resignation | | | | | | |
| 4 | 6.341466e+17 | 2010.0 | Resignation | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 697 | 6.350668e+17 | 2013.0 | Resignation | Male | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
| 698 | 6.350677e+17 | 2013.0 | Resignation | | | | | | |
| 699 | 6.350704e+17 | 2013.0 | Resignation | Female | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
| 700 | 6.350712e+17 | 2013.0 | Contract Expired | Female | 41 45 | Temporary Full-time | Professional Officer (PO) | 1-2 | 1-2 |


In [65]:
dete_survey.loc[:,['id','cease_date','separation_type','gender','age','employment_status','position',
                   'dete_start_date']]

| | id | cease_date | separation_type | gender | age | employment_status | position | dete_start_date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 08/2012 | Ill Health Retirement | Male | 56-60 | Permanent Full-time | Public Servant | 1984 |
| 1 | 2 | 08/2012 | Voluntary Early Retirement (VER) | Male | 56-60 | Permanent Full-time | Public Servant | Not Stated |
| 2 | 3 | 05/2012 | Voluntary Early Retirement (VER) | Male | 61 or older | Permanent Full-time | Schools Officer | 2011 |
| 3 | 4 | 05/2012 | Resignation-Other reasons | Female | 36-40 | Permanent Full-time | Teacher | 2005 |
| 4 | 5 | 05/2012 | Age Retirement | Female | 61 or older | Permanent Full-time | Head of Curriculum/Head of Special Education | 1970 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 817 | 819 | 02/2014 | Age Retirement | Female | 56-60 | Permanent Part-time | Teacher | 1977 |
| 818 | 820 | 01/2014 | Age Retirement | Male | 51-55 | Permanent Full-time | Teacher | 1980 |
| 819 | 821 | 01/2014 | Resignation-Move overseas/interstate | Female | 31-35 | Permanent Full-time | Public Servant | 2009 |
| 820 | 822 | 12/2013 | Ill Health Retirement | Female | 41-45 | Permanent Full-time | Teacher | 2001 |


<b>Data cleaning work so far:</b>

 - Unwanted columns dropped
 - Columns renamed to match in both datasets
 - 'Not Stated' entries in the cease_date column replaced with NaN
 
  To do:
   - Fix the years in the date columns
   - Exclude any abnormal dates
   - Keep only the resignations, because our goal is to study people who resigned (we don't care about retirements etc.)
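
The last to-do item could be sketched as follows. This is a minimal example on toy data, assuming every resignation label begins with the word "Resignation", as the `value_counts()` outputs for both surveys suggest.

```python
import pandas as pd

# Toy frame; separation types are a subset of those seen in the surveys.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "separation_type": [
        "Age Retirement",
        "Resignation-Other reasons",
        "Resignation-Other employer",
        "Contract Expired",
        None,
    ],
})

# na=False makes missing separation types count as "not a resignation".
is_resignation = toy["separation_type"].str.startswith("Resignation", na=False)

# .copy() gives an independent frame, so later edits do not trigger
# SettingWithCopyWarning.
resignations = toy[is_resignation].copy()
print(resignations)
```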


In [70]:
# Replace the 'Not Stated' values in the DETE 'cease_date' column with NaN

dete_survey.loc[dete_survey.cease_date=='Not Stated','cease_date'] = np.nan

In [77]:
dete_survey.cease_date.value_counts()

2012       344
2013       200
01/2014     43
12/2013     40
09/2013     34
06/2013     27
07/2013     22
10/2013     20
11/2013     16
08/2013     12
05/2013      7
05/2012      6
02/2014      2
04/2014      2
07/2014      2
04/2013      2
08/2012      2
11/2012      1
2014         1
2010         1
09/2010      1
07/2006      1
07/2012      1
09/2014      1
Name: cease_date, dtype: int64
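
The remaining values mix `MM/YYYY` and `YYYY` formats. One possible way to reduce them to a plain year, shown here on a small toy series rather than the real column, is to extract the four-digit group with a regular expression and convert it to float so that NaN is preserved.

```python
import numpy as np
import pandas as pd

# Toy series with the same mix of formats as cease_date.
cease = pd.Series(["08/2012", "2013", np.nan, "01/2014"], name="cease_date")

# Extract the four-digit year wherever it appears in the string;
# expand=False returns a Series instead of a one-column DataFrame.
year_text = cease.str.extract(r"(\d{4})", expand=False)

# Convert to float; missing entries stay NaN.
cease_year = year_text.astype(float)
print(cease_year.value_counts().sort_index())
```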