# Employee Exit Surveys

## Background

Exit interviews **may well reveal the need for a better learning and development strategy within the business**. If employees don't feel supported or challenged in their roles, they are more likely to leave. Exit interviews can also flag up opportunities in management development and succession planning.

<img src="https://media.istockphoto.com/photos/letter-of-resignation-silver-pen-picture-id535481420?k=20&m=535481420&s=612x612&w=0&h=Frev4FqB0FKRgMyoAaJSpNCKnfTYPgBU6yFsEJmx_0Q="/>

## Project and Data Overview

In this project, we'll work with exit surveys from employees of the [Department of Education, Training and Employment](https://en.wikipedia.org/wiki/Department_of_Education_and_Training_(Queensland)) (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. DETE is a ministerial department of the Queensland Government responsible for the administration and quality of education in the state of Queensland.

We can find the DETE exit survey data [here](https://data.gov.au/dataset/ds-qld-fe96ff30-d157-4a81-851d-215f2a0fe26d/details?q=exit%20survey). However, the original TAFE exit survey data is no longer available.
Therefore, we'll use modified versions of the original datasets, which have been changed to make them easier to work with. The changes include re-encoding the files as `UTF-8` (the originals were encoded in `cp1252`).
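
As a rough illustration of what such a re-encoding step involves, the sketch below reads a `cp1252` file and writes it back as `UTF-8`. The helper name `reencode_to_utf8` and the sample file are made up for this example; this is not necessarily how the modified datasets were produced.

```python
import tempfile
from pathlib import Path

import pandas as pd


def reencode_to_utf8(src: Path, dst: Path, src_encoding: str = "cp1252") -> None:
    """Read a text file in src_encoding and write it back out as UTF-8."""
    text = src.read_text(encoding=src_encoding)
    dst.write_text(text, encoding="utf-8")


# Demonstrate on a tiny invented file instead of the real survey data.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "sample_cp1252.csv"
    converted = Path(tmp) / "sample_utf8.csv"
    original.write_bytes("ID,Comment\n1,café\n".encode("cp1252"))

    reencode_to_utf8(original, converted)
    sample = pd.read_csv(converted, encoding="utf-8")
```

Alternatively, `pd.read_csv` accepts an `encoding` argument, so a `cp1252` file can also be read directly with `encoding='cp1252'`.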

## Data Dictionary

The data dictionary wasn't provided with the datasets. Therefore, we'll use our general knowledge to define the columns used in them.

Below is a preview of a few of the columns we'll work with from `dete_survey.csv`:

- `ID`: An id used to identify the participant of the survey
- `SeparationType`: The reason why the person's employment ended
- `Cease Date`: The year or month the person's employment ended
- `DETE Start Date`: The year the person began employment with the DETE

Below is a preview of a few of the columns we'll work with from `tafe_survey.csv`:

- `Record ID`: An id used to identify the participant of the survey
- `Reason for ceasing employment`: The reason why the person's employment ended
- `LengthofServiceOverall. Overall Length of Service at Institute (in years)`: The length of the person's employment (in years)

## Business Problem

In this project, we'll analyze these datasets (DETE & TAFE) to answer the following questions:

- Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
- Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

We'll combine the results from *both* surveys to answer these questions. However, although both used the same survey template, one of them customized some of the answers.
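
One possible way to combine the two surveys is sketched below on toy data. It assumes the matching columns are first renamed to shared names and then stacked with `pd.concat`. The shared names (`id`, `separation_type`, `institute`) and the sample values are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Toy stand-ins for the two surveys; the values are invented.
dete_toy = pd.DataFrame({
    "ID": [1, 2],
    "SeparationType": ["Age Retirement", "Resignation-Other reasons"],
})
tafe_toy = pd.DataFrame({
    "Record ID": [101, 102],
    "Reason for ceasing employment": ["Resignation", "Contract Expired"],
})

# Hypothetical shared column names chosen for this sketch.
dete_toy = dete_toy.rename(columns={"ID": "id", "SeparationType": "separation_type"})
tafe_toy = tafe_toy.rename(columns={
    "Record ID": "id",
    "Reason for ceasing employment": "separation_type",
})

# Tag each row with its source so the surveys stay distinguishable.
dete_toy["institute"] = "DETE"
tafe_toy["institute"] = "TAFE"

# Stack the two frames vertically and rebuild the index.
combined = pd.concat([dete_toy, tafe_toy], ignore_index=True)
```

Because one survey customized some of its answers, the values in the shared columns would still need to be harmonized before any comparison.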


## Importing Libraries

We'll start by importing some useful libraries we need in the project.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.style as style

# Display all columns in the dataframe
pd.set_option('display.max_columns', None)
# Plot style
style.use('fivethirtyeight')
# Enable the inline plotting
%matplotlib inline

## Load and Analyze Datasets

Next, we'll read the `dete_survey.csv` and `tafe_survey.csv` datasets into pandas and explore them.

In [2]:
# Read datasets
dete_survey = pd.read_csv('dete_survey.csv')
tafe_survey = pd.read_csv('tafe_survey.csv')

Now that our datasets are loaded, we'll gather some basic information about both dataframes using `DataFrame.info()` and take a look at the first few rows using `DataFrame.head()`.

## 1. DETE Survey Data

In [3]:
# Preview DETE dataset
dete_survey.info()
dete_survey.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 822 entries, 0 to 821
Data columns (total 56 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   ID                                   822 non-null    int64 
 1   SeparationType                       822 non-null    object
 2   Cease Date                           822 non-null    object
 3   DETE Start Date                      822 non-null    object
 4   Role Start Date                      822 non-null    object
 5   Position                             817 non-null    object
 6   Classification                       455 non-null    object
 7   Region                               822 non-null    object
 8   Business Unit                        126 non-null    object
 9   Employment Status                    817 non-null    object
 10  Career move to public sector         822 non-null    bool  
 11  Career move to private sector        822 non-

Unnamed: 0,ID,SeparationType,Cease Date,DETE Start Date,Role Start Date,Position,Classification,Region,Business Unit,Employment Status,Career move to public sector,Career move to private sector,Interpersonal conflicts,Job dissatisfaction,Dissatisfaction with the department,Physical work environment,Lack of recognition,Lack of job security,Work location,Employment conditions,Maternity/family,Relocation,Study/Travel,Ill Health,Traumatic incident,Work life balance,Workload,None of the above,Professional Development,Opportunities for promotion,Staff morale,Workplace issue,Physical environment,Worklife balance,Stress and pressure support,Performance of supervisor,Peer support,Initiative,Skills,Coach,Career Aspirations,Feedback,Further PD,Communication,My say,Information,Kept informed,Wellness programs,Health & Safety,Gender,Age,Aboriginal,Torres Strait,South Sea,Disability,NESB
0,1,Ill Health Retirement,08/2012,1984,2004,Public Servant,A01-A04,Central Office,Corporate Strategy and Peformance,Permanent Full-time,True,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,True,A,A,N,N,N,A,A,A,A,N,N,N,A,A,A,N,A,A,N,N,N,Male,56-60,,,,,Yes
1,2,Voluntary Early Retirement (VER),08/2012,Not Stated,Not Stated,Public Servant,AO5-AO7,Central Office,Corporate Strategy and Peformance,Permanent Full-time,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,A,A,N,N,N,N,A,A,A,N,N,N,A,A,A,N,A,A,N,N,N,Male,56-60,,,,,
2,3,Voluntary Early Retirement (VER),05/2012,2011,2011,Schools Officer,,Central Office,Education Queensland,Permanent Full-time,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,A,A,N,N,N,N,Male,61 or older,,,,,
3,4,Resignation-Other reasons,05/2012,2005,2006,Teacher,Primary,Central Queensland,,Permanent Full-time,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,A,N,N,N,A,A,N,N,A,A,A,A,A,A,A,A,A,A,A,N,A,Female,36-40,,,,,
4,5,Age Retirement,05/2012,1970,1989,Head of Curriculum/Head of Special Education,,South East,,Permanent Full-time,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,A,A,N,N,D,D,N,A,A,A,A,A,A,SA,SA,D,D,A,N,A,M,Female,61 or older,,,,,


### Initial Notes

- The dataset has **822 rows** and **56 columns**.
- The column names are inconsistent and do not follow the recommended Python naming convention (lowercase words separated by underscores).
- 18 of the 56 columns are stored as `bool`, only the `ID` column is stored as `int64`, and the remaining columns have the `object` dtype.
- 32 of the 56 columns have missing values. The `Classification` column is missing about 45% of its values (367 of 822), whereas `Business Unit`, `Aboriginal`, `Torres Strait`, `South Sea`, `Disability`, and `NESB` are each missing more than 50%.
- Date columns such as `Cease Date`, `DETE Start Date`, and `Role Start Date` are stored as strings (`object` dtype) instead of datetime.
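
Figures like the ones in these notes can be derived directly from the dataframe. The sketch below shows one way to do it on a small invented frame named `toy`. The same calls should apply to `dete_survey`, although the numbers here are illustrative only.

```python
import numpy as np
import pandas as pd

# Small invented frame standing in for dete_survey.
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Classification": ["Primary", np.nan, np.nan, "A01-A04"],
    "Job dissatisfaction": [True, False, False, False],
    "Cease Date": ["08/2012", "05/2012", "2012", "Not Stated"],
})

# Number of columns per dtype
dtype_counts = toy.dtypes.value_counts()

# Share of missing values per column, in percent, largest first
missing_pct = (toy.isnull().mean() * 100).round(1).sort_values(ascending=False)

# Number of columns that contain at least one missing value
n_cols_with_missing = int(toy.isnull().any().sum())
```

Note that placeholder strings such as `Not Stated` are not counted as missing by `isnull()` until they are converted to `NaN`.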

In [14]:
dete_survey.describe(include='all')

Unnamed: 0,ID,SeparationType,Cease Date,DETE Start Date,Role Start Date,Position,Classification,Region,Business Unit,Employment Status,Career move to public sector,Career move to private sector,Interpersonal conflicts,Job dissatisfaction,Dissatisfaction with the department,Physical work environment,Lack of recognition,Lack of job security,Work location,Employment conditions,Maternity/family,Relocation,Study/Travel,Ill Health,Traumatic incident,Work life balance,Workload,None of the above,Professional Development,Opportunities for promotion,Staff morale,Workplace issue,Physical environment,Worklife balance,Stress and pressure support,Performance of supervisor,Peer support,Initiative,Skills,Coach,Career Aspirations,Feedback,Further PD,Communication,My say,Information,Kept informed,Wellness programs,Health & Safety,Gender,Age,Aboriginal,Torres Strait,South Sea,Disability,NESB
count,822.0,822,822.0,822,822,817,455,822,126,817,822,822,822,822,822,822,822,822,822,822,822,822,822,822,822,822,822,822,808,735,816,788,817,815,810,813,812,813,811,767,746,792,768,814,812,816,813,766,793,798,811,16,3,7,23,32
unique,,9,25.0,51,46,15,8,9,14,5,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,2,10,1,1,1,1,1
top,,Age Retirement,2012.0,Not Stated,Not Stated,Teacher,Primary,Metropolitan,Education Queensland,Permanent Full-time,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,A,Female,61 or older,Yes,Yes,Yes,Yes,Yes
freq,,285,344.0,73,98,324,161,135,54,434,800,742,788,733,761,806,765,794,795,788,760,754,785,710,794,605,735,605,413,242,335,357,467,359,342,349,401,396,372,345,246,348,293,399,400,436,401,253,386,573,222,16,3,7,23,32
mean,411.693431,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
std,237.70582,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
min,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
25%,206.25,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
50%,411.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
75%,616.75,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


### Additional Notes

- The most frequent reason for leaving DETE is *Age Retirement* (285 of 822 responses), as seen in the `SeparationType` column.
- The most common value in the `Age` column is *61 or older* (222 of 811 responses), which is consistent with age retirement being the most frequent separation type.
- `DETE Start Date` and `Role Start Date` have *Not Stated* as their most frequent value; these entries should be treated as **NaN**.
- `Aboriginal`, `Torres Strait`, `South Sea`, `Disability`, and `NESB` each have a single unique value, **Yes**, and every other entry is stored as **NaN**; the missing entries most likely stand for **No**. This is why these columns have such a high percentage of missing values.
- The columns from `Professional Development` to `Health & Safety` have *A* as their most common value. This looks unusual because it is not obvious what 'A' represents. We'll explore these columns further.
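
One way to have *Not Stated* treated as **NaN** is to pass it to the `na_values` parameter of `pd.read_csv` when loading the file. The sketch below uses an in-memory excerpt modelled on the preview rows above rather than the real file, and the variable names are illustrative.

```python
import io

import pandas as pd

# Small CSV excerpt that mimics the layout of dete_survey.csv.
raw_csv = io.StringIO(
    "ID,Cease Date,DETE Start Date,Role Start Date\n"
    "1,08/2012,1984,2004\n"
    "2,08/2012,Not Stated,Not Stated\n"
    "3,05/2012,2011,2011\n"
)

# na_values tells read_csv to parse the placeholder string as NaN.
toy = pd.read_csv(raw_csv, na_values="Not Stated")

# Count the placeholders that are now recognised as missing.
missing_start = int(toy["DETE Start Date"].isnull().sum())
```

With the placeholder removed, the start-date columns in this excerpt are parsed as numbers instead of strings.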

To investigate unusual entries such as **'A'** in the columns from `Professional Development` to `Health & Safety`, we need to find all of their unique values. First, we'll use the `DataFrame.values` attribute on these columns to convert the **DataFrame** into a NumPy **array**, then call the array's `ravel()` method to flatten it. Finally, we'll pass the flattened array to `pd.unique()` to get the unique values across all of these columns.

The workflow may sound confusing but the implementation is fairly straightforward.

In [50]:
column_values = dete_survey.loc[:, 'Professional Development':'Health & Safety'].values.ravel()
unique_values = pd.unique(column_values)
unique_values

array(['A', 'N', 'D', 'SA', 'M', 'SD', nan], dtype=object)
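
To make the flattening step easier to follow, here is the same workflow on a tiny invented frame, broken into separate steps. It also counts how often each response code occurs, which is an extra step not in the cell above. The frame `toy` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Invented responses in three of the survey's rating columns.
toy = pd.DataFrame({
    "Professional Development": ["A", "N", "SA"],
    "Staff morale": ["N", np.nan, "D"],
    "Health & Safety": ["A", "A", "SD"],
})

# Label-based slicing with .loc includes both end columns.
block = toy.loc[:, "Professional Development":"Health & Safety"]

# .values gives a 2-D array; ravel() flattens it row by row.
flat = block.values.ravel()

# pd.unique keeps values in order of first appearance.
unique_values = pd.unique(flat)

# Frequency of each response code, counting NaN as well
code_counts = pd.Series(flat).value_counts(dropna=False)
```

The counts give a sense of how the codes are distributed, which may help when deciding what each code stands for.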

### Observation


## 2. TAFE Survey Data

In [None]:
# Preview TAFE dataset
tafe_survey.info()
tafe_survey.head()

Before making our observations about these datasets, let's explore the data further and count the number of missing values (`NaN`) in each column:

In [None]:
# Count missing values in 'dete_survey'
dete_survey.isnull().sum()

In [None]:
# Count missing values in 'tafe_survey'
tafe_survey.isnull().sum()