PURPOSE: To support HR decision-making on cost reduction and employee job satisfaction by identifying the main predictors of excessive absenteeism at work.
AIM: To predict excessive absenteeism at work (an absence of 3 or more hours during work hours) based on historical absenteeism data.
METHOD: Use Python for exploratory data analysis (EDA) and feature engineering to prepare the data for a pre-built logistic regression model, then visualise the insights in Tableau Public.
DATA: Primary raw data provided by the company's HR department.
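As a rough illustration of the AIM and METHOD above, here is a minimal sketch of how the binary target and the dummy variables could be built with pandas. The toy data and the column names (`reason`, `absence_hours`, `excessive_absence`) are assumptions made for this example and are not taken from the HR dataset or the notebooks.

```python
import pandas as pd

# Hypothetical toy data: the column names and values are illustrative only.
df = pd.DataFrame({
    "reason": ["illness", "dental", "illness", "family"],
    "absence_hours": [1, 3, 8, 2],
})

# Binary target: 1 when the absence is 3 hours or more, otherwise 0.
THRESHOLD_HOURS = 3
df["excessive_absence"] = (df["absence_hours"] >= THRESHOLD_HOURS).astype(int)

# One dummy column per absence reason. The first category (alphabetically)
# is dropped so the remaining dummies are not perfectly collinear.
reason_dummies = pd.get_dummies(
    df["reason"], prefix="reason", drop_first=True, dtype=int
)

print(df)
print(reason_dummies)
```

Dropping one category is a common choice when the dummies feed a logistic regression, since the dropped category then acts as the baseline that the other coefficients are compared against.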
The project is carried out in two Jupyter notebooks, "Predict Absenteeism Project Part I.ipynb" and "Predict Absenteeism Project Part II.ipynb". Each step is numbered and annotated so you can follow along with the exploratory analysis, cleanup, and creation of dummy variables. The visualisation can be accessed through this link:
This is one of my first projects, so please let me know what you think and if you have any suggestions.
DS