# Employee Exit Surveys


<b>GOAL</b> - 
We have employee exit survey data from two Australian organizations, and we want to know:
- Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
    
- Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

<b>DATA</b><br>
Exit survey datasets from two Australian organizations:
- The Department of Education, Training and Employment <b>(DETE)</b>
- The Technical and Further Education <b>(TAFE)</b> institute in Queensland, Australia

The datasets and some of the columns used:

<b>dete_survey.csv:</b><br>
<i>ID</i>: An id used to identify the participant of the survey <br>
<i>DETE Start Date</i>: The year the person began employment with the DETE<br>
<i>Cease Date</i>: The year (or month and year) the person's employment ended<br>
<i>SeparationType</i>: The reason why the person's employment ended<br>

<b>tafe_survey.csv:</b><br>
<i>Record ID</i>: An id used to identify the participant of the survey<br>
<i>LengthofServiceOverall. Overall Length of Service at Institute (in years)</i>: The length of the person's employment (in years)<br>
<i>Reason for ceasing employment</i>: The reason why the person's employment ended<br>



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
dete_survey = pd.read_csv('dete_survey.csv')

tafe_survey = pd.read_csv('tafe_survey.csv')

### Investigating the data

In [3]:
dete_survey.head()

| | ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | | | | | Yes |
| 1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | | | | | |
| 2 | 3 | Voluntary Early Retirement (VER) | 05/2012 | 2011 | 2011 | Schools Officer | | Central Office | Education Queensland | Permanent Full-time | ... | N | N | N | Male | 61 or older | | | | | |
| 3 | 4 | Resignation-Other reasons | 05/2012 | 2005 | 2006 | Teacher | Primary | Central Queensland | | Permanent Full-time | ... | A | N | A | Female | 36-40 | | | | | |
| 4 | 5 | Age Retirement | 05/2012 | 1970 | 1989 | Head of Curriculum/Head of Special Education | | South East | | Permanent Full-time | ... | N | A | M | Female | 61 or older | | | | | |


In [4]:
dete_survey.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 822 entries, 0 to 821
Data columns (total 56 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   ID                                   822 non-null    int64 
 1   SeparationType                       822 non-null    object
 2   Cease Date                           822 non-null    object
 3   DETE Start Date                      822 non-null    object
 4   Role Start Date                      822 non-null    object
 5   Position                             817 non-null    object
 6   Classification                       455 non-null    object
 7   Region                               822 non-null    object
 8   Business Unit                        126 non-null    object
 9   Employment Status                    817 non-null    object
 10  Career move to public sector         822 non-null    bool  
 11  Career move to private sector        822 non-

In [13]:
dete_survey.loc[:,'Cease Date'].value_counts()

2012          344
2013          200
01/2014        43
12/2013        40
09/2013        34
Not Stated     34
06/2013        27
07/2013        22
10/2013        20
11/2013        16
08/2013        12
05/2013         7
05/2012         6
04/2014         2
04/2013         2
07/2014         2
08/2012         2
02/2014         2
2010            1
2014            1
09/2010         1
11/2012         1
07/2012         1
07/2006         1
09/2014         1
Name: Cease Date, dtype: int64

In [14]:
dete_survey.loc[:,'DETE Start Date'].value_counts()

Not Stated    73
2011          40
2007          34
2008          31
2010          27
2012          27
2009          24
2006          23
1975          21
2013          21
1970          21
1990          20
2005          20
1999          19
1996          19
2000          18
2004          18
1991          18
1992          18
1989          17
1976          15
1988          15
1978          15
2002          15
2003          15
1974          14
1979          14
1998          14
1995          14
1980          14
1997          14
1993          13
1986          12
1972          12
1977          11
1984          10
1994          10
1971          10
1969          10
2001          10
1983           9
1981           9
1985           8
1973           8
1987           7
1982           4
1963           4
1968           3
1967           2
1966           1
1965           1
Name: DETE Start Date, dtype: int64

In [15]:
dete_survey.SeparationType.value_counts()

Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: SeparationType, dtype: int64
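Since the goal is about employees who resigned, note that DETE splits resignations across three `Resignation-...` categories (150 + 91 + 70 = 311 rows in the counts above). A minimal sketch of selecting all three with a single prefix match, assuming a small hand-built stand-in for `dete_survey` (the sample rows and the `dete_resignations` name are illustrative, not from the original notebook):

```python
import pandas as pd

# Hypothetical stand-in for dete_survey, using SeparationType values
# that appear in the value counts above (the real frame has 822 rows).
dete = pd.DataFrame({
    "SeparationType": [
        "Age Retirement",
        "Resignation-Other reasons",
        "Resignation-Other employer",
        "Resignation-Move overseas/interstate",
        "Voluntary Early Retirement (VER)",
        "Termination",
    ]
})

# All three resignation categories share the "Resignation" prefix,
# so one boolean mask from str.startswith selects them together.
is_resignation = dete["SeparationType"].str.startswith("Resignation")

# .copy() gives an independent frame, so later cleaning steps
# do not trigger SettingWithCopyWarning.
dete_resignations = dete[is_resignation].copy()

print(dete_resignations["SeparationType"].tolist())
```

A prefix match keeps the filter short and still picks up the three labels without listing them one by one.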

Cleanup needed on the DETE dataset:

- Set the ID column as the index.
- The date fields are of type object, and the Cease Date format is inconsistent: some entries are month/year (e.g. 08/2012) while others are year only (e.g. 2012). The DETE Start Date field holds years only, so its format looks fine.<br>
    Also, 'Not Stated' is effectively a null value and needs to be handled in both columns.
- Rename a few columns so they are easier to reference in code.
- Handle the many null entries across various columns.
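The date cleanup described above can be sketched as follows. This assumes a small stand-in frame whose values are taken from the `Cease Date` and `DETE Start Date` counts shown earlier; on the real file, `pd.read_csv('dete_survey.csv', na_values='Not Stated')` would handle step 1 at load time instead.

```python
import numpy as np
import pandas as pd

# Stand-in frame; values copied from the value counts above.
dete = pd.DataFrame({
    "Cease Date": ["08/2012", "2012", "Not Stated", "01/2014", "2013"],
    "DETE Start Date": ["1984", "Not Stated", "2011", "2005", "1970"],
})

# 1. Treat the 'Not Stated' placeholder as a genuine missing value.
dete = dete.replace("Not Stated", np.nan)

# 2. Cease Date mixes 'MM/YYYY' and 'YYYY'; in both forms the year is the
#    last 4 characters. Missing values pass through .str as NaN.
dete["Cease Date"] = dete["Cease Date"].str[-4:].astype(float)

# 3. DETE Start Date already holds plain years; only the dtype changes.
#    float (not int) is used because int64 cannot hold NaN.
dete["DETE Start Date"] = dete["DETE Start Date"].astype(float)

print(dete)
```

Keeping only the year loses the month for some rows, but the year is the only part present in every non-missing entry, so it is the common denominator for comparing cease dates.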

In [17]:
tafe_survey.loc[:,['CESSATION YEAR','Reason for ceasing employment','LengthofServiceOverall. Overall Length of Service at Institute (in years)']].head()

| | CESSATION YEAR | Reason for ceasing employment | LengthofServiceOverall. Overall Length of Service at Institute (in years) |
|---|---|---|---|
| 0 | 2010.0 | Contract Expired | 1-2 |
| 1 | 2010.0 | Retirement | |
| 2 | 2010.0 | Retirement | |
| 3 | 2010.0 | Resignation | |
| 4 | 2010.0 | Resignation | 3-4 |


In [18]:
tafe_survey.loc[:,['CESSATION YEAR','Reason for ceasing employment','LengthofServiceOverall. Overall Length of Service at Institute (in years)']].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 702 entries, 0 to 701
Data columns (total 3 columns):
 #   Column                                                                     Non-Null Count  Dtype  
---  ------                                                                     --------------  -----  
 0   CESSATION YEAR                                                             695 non-null    float64
 1   Reason for ceasing employment                                              701 non-null    object 
 2   LengthofServiceOverall. Overall Length of Service at Institute (in years)  596 non-null    object 
dtypes: float64(1), object(2)
memory usage: 16.6+ KB


In [21]:
tafe_survey.loc[:,'CESSATION YEAR'].value_counts()

2011.0    268
2012.0    235
2010.0    103
2013.0     85
2009.0      4
Name: CESSATION YEAR, dtype: int64

In [20]:
tafe_survey.loc[:,'LengthofServiceOverall. Overall Length of Service at Institute (in years)'].value_counts()

Less than 1 year      147
1-2                   102
3-4                    96
11-20                  89
More than 20 years     71
5-6                    48
7-10                   43
Name: LengthofServiceOverall. Overall Length of Service at Institute (in years), dtype: int64
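These length-of-service labels are strings, so sorting them as-is gives alphabetical order ("1-2", "11-20", "3-4", ...) rather than tenure order, which matters for the short- vs long-tenure question. A minimal sketch of encoding them as an ordered categorical, assuming the seven labels above are the complete set (the sample values are hypothetical):

```python
import pandas as pd

# The seven labels from the value counts above, shortest tenure first.
service_order = [
    "Less than 1 year", "1-2", "3-4", "5-6", "7-10", "11-20", "More than 20 years",
]

# Hypothetical sample column; None stands in for a missing answer.
service = pd.Series(["3-4", "Less than 1 year", None, "More than 20 years", "1-2"])

# An ordered CategoricalDtype makes sorting and comparisons follow tenure.
service_cat = service.astype(pd.CategoricalDtype(categories=service_order, ordered=True))

# Missing values sort to the end by default.
print(service_cat.sort_values().tolist())

# Comparisons work against any label in the category list;
# missing entries compare as False.
short_tenure = service_cat < "5-6"
print(short_tenure.tolist())
```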

In [24]:
tafe_survey.loc[:,'LengthofServiceOverall. Overall Length of Service at Institute (in years)'].isnull().sum()

106

In [22]:
tafe_survey.loc[:,'Reason for ceasing employment'].value_counts()

Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
Name: Reason for ceasing employment, dtype: int64

Cleanup needed on the TAFE dataset:

- 'CESSATION YEAR' is stored as float64 (e.g. 2010.0), so it may need converting to an integer year.
- Rename a few columns so they are easier to reference in code.
- Handle the null entries in 'CESSATION YEAR' (7 missing) and in the length-of-service column (106 missing).
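A minimal sketch of these TAFE steps on a stand-in frame built from values in the outputs above. The short column names (`cease_date`, `separationtype`, `institute_service`) are hypothetical suggestions, not names from the original notebook.

```python
import numpy as np
import pandas as pd

# Stand-in for the three TAFE columns inspected above.
tafe = pd.DataFrame({
    "CESSATION YEAR": [2010.0, 2011.0, np.nan, 2013.0],
    "Reason for ceasing employment": ["Contract Expired", "Resignation", "Retirement", "Resignation"],
    "LengthofServiceOverall. Overall Length of Service at Institute (in years)": ["1-2", np.nan, None, "3-4"],
})

# Hypothetical short names that are easier to type in code.
tafe = tafe.rename(columns={
    "CESSATION YEAR": "cease_date",
    "Reason for ceasing employment": "separationtype",
    "LengthofServiceOverall. Overall Length of Service at Institute (in years)": "institute_service",
})

# 'Int64' (capital I) is pandas' nullable integer dtype: 2010.0 becomes 2010
# while missing years stay as <NA> instead of forcing the column to float.
tafe["cease_date"] = tafe["cease_date"].astype("Int64")

# Count what is still missing per column before deciding to drop or impute.
print(tafe.isna().sum())
```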

---

### Data Cleaning

In [27]:
### DATA CLEANING - DETE

dete_survey.head(2)

| | ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | | | | | Yes |
| 1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | | | | | |


In [30]:

dete_survey.set_index('ID',inplace=True)

In [31]:
dete_survey.head(2)

| ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | Career move to public sector | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | True | ... | N | N | N | Male | 56-60 | | | | | Yes |
| 2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | False | ... | N | N | N | Male | 56-60 | | | | | |


In [55]:
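# Drop unwanted columns by position (the end of each slice is exclusive).
# DETE: ID is now the index, so position 0 is SeparationType.
dete_survey.drop(columns=dete_survey.columns[28:49], inplace=True)

tafe_survey.drop(columns=tafe_survey.columns[17:66], inplace=True)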
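Because the drop above selects columns by position rather than by name, it is worth printing the first and last label of each slice before running it, and running the cell only once: with `inplace=True`, a second run would drop whatever columns have shifted into those positions. A minimal sketch on a hypothetical frame with generic column names, sized like `dete_survey` after `set_index('ID')` (56 columns minus `ID` = 55):

```python
import pandas as pd

# Hypothetical 55-column frame; the real column names are not reproduced here.
frame = pd.DataFrame(columns=[f"col_{i}" for i in range(55)])

# Check exactly which labels the positional slice covers:
# 28:49 spans positions 28..48, i.e. 21 columns.
to_drop = frame.columns[28:49]
print(to_drop[0], to_drop[-1], len(to_drop))

# Dropping by the inspected labels, and reassigning instead of using
# inplace=True, keeps the cell safe to re-run.
frame = frame.drop(columns=to_drop)
print(frame.shape[1])
```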

In [None]:
## Renaming columns
