Employee Attrition Problem

Table of Content

Overview
Data Preparation and Exploration
Data Modeling
Important Features and Conclusion

Overview

I am exploring the employee dataset to solve following research questions:

Why employee are prone to leave?
What type of employee are prone to leave the company?
Predict the future employee who would tend to leave the company.

Data Preparation and Exploration

Histogram of Current Employee

Histogram of ex-employee

A few observations can be made based on the information and histograms for numerical features. Many histograms are tail-heavy; indeed several distributions are right-skewed (e.g.time_spend_company, promotion_last_5years). Data transformation methods may be required to approach a normal distribution prior to fitting a model to the data.

Data Modeling

I use a range of baseline algorithms before we move on to more sophisticated solutions. The algorithms considered in this section are: Logistic Regression, Random Forest, and, SVM.

Since this is my real project during my internship, I don't mention the exact result here, but I outlined some inportant features to predict the employee turnover. Based on ROC graph, the logistic regression showed higher AUC score compared to random forest.

Important Features and Conclusion

As a concluded remark, as the company generates more data on its employees (on New Joiners and recent Leavers) the algorithm can be re-trained using the additional data and theoritically generate more accurate predictions to identify high-risk employees of leaving based on the probabilistic label assigned to each feature variable (i.e. employee) by the algorithm.

Employees can be assigning a "Risk Category" based on the predicted label such that:

Low-risk for employees with label < 0.6
Medium-risk for employees with label between 0.6 and 0.8
High-risk for employees with label > 0.8

A strategic "Retention Plan" should be drawn for each Risk Category group. The face-to-face meetings between a HR representative and employees can be initiated for medium- and high-risk employees to discuss work conditions. Also, a meeting with those employee's Line Manager would allow to discuss the work environment within the team and whether steps can be taken to improve it.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
Chiranjibi Ghimire-POC.pptx		Chiranjibi Ghimire-POC.pptx
Chiranjibi Ghimire-PROJECTREPORT.pptx		Chiranjibi Ghimire-PROJECTREPORT.pptx
Project_POC.py		Project_POC.py
README.md		README.md
TakenMind-Python-Analytics-Problem-case-study-1-1.xlsx		TakenMind-Python-Analytics-Problem-case-study-1-1.xlsx
df_existing-histogram.png		df_existing-histogram.png
df_nonexisting_histogram_plots.png		df_nonexisting_histogram_plots.png
ex-employee_mean.png		ex-employee_mean.png
ex-emplyee-dept_mean.png		ex-emplyee-dept_mean.png
ex-emplyee-salary_mean.png		ex-emplyee-salary_mean.png
existing-salary_mean.png		existing-salary_mean.png
existingEmployee_mean.png		existingEmployee_mean.png
existing_dept_mean.png		existing_dept_mean.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Employee Attrition Problem

Table of Content

Overview

Data Preparation and Exploration

Data Modeling

Important Features and Conclusion

About

Releases

Packages

Languages

cghimire/Employee-Attrition-Problem

Folders and files

Latest commit

History

Repository files navigation

Employee Attrition Problem

Table of Content

Overview

Data Preparation and Exploration

Data Modeling

Important Features and Conclusion

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages