## A CLASSIFICATION PROJECT - PREDICTING EMPLOYEE ATTRITION

#### BUSINESS UNDERSTANDING

Employee attrition is the departure of employees from an organization, whether voluntary or involuntary. High attrition is costly for a business: it lowers productivity and morale and raises recruitment and training expenses.
The primary objective of predicting employee attrition is to identify employees who are at risk of leaving in the near future, so that the company can take proactive measures to improve retention, raise employee satisfaction, and reduce the overall cost of turnover.

##### PROJECT GOAL
The goal of this project is to develop a machine learning pipeline that predicts whether a given employee is likely to leave the company. Modeling follows an in-depth exploratory analysis of the dataset.
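As a rough illustration of what such a pipeline could look like, here is a minimal sketch built from scikit-learn components. The choice of feature columns, the imputation strategies, and the use of `LogisticRegression` are assumptions made for illustration, not the project's final design. The toy frame only mimics the dataset's column names with a handful of rows.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Assumed feature split; the real project may select different columns.
numeric_cols = ["Age", "Gross Salary", "Length of Service"]
categorical_cols = ["Department", "Marital Status"]

# Numeric features: fill gaps with the median, then scale robustly to outliers.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", RobustScaler()),
])

# Categorical features: fill gaps with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Illustrative classifier; any scikit-learn classifier could take its place.
model = Pipeline([
    ("preprocess", preprocessor),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Toy rows for demonstration only (not the full dataset).
toy = pd.DataFrame({
    "Age": [41, 37, 33, 27, 32],
    "Gross Salary": [5993.0, 2090.0, 2909.0, 3468.0, 3068.0],
    "Length of Service": [6, 7, 8, 2, 7],
    "Department": ["Personal Finance"] * 5,
    "Marital Status": ["Single", "Single", "Married", "Married", "Single"],
    "Attrition": ["Yes", "Yes", "No", "No", "Yes"],
})

features = toy[numeric_cols + categorical_cols]
model.fit(features, toy["Attrition"])
predictions = model.predict(features)
print(predictions)
```

Keeping preprocessing and the classifier in one `Pipeline` means the same transformations are applied at training and prediction time, which helps avoid leakage when the pipeline is later cross-validated.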

##### ANALYTICAL QUESTIONS
1. What is the overall attrition percentage?
2. How satisfied are employees after 3 years at the company?
3. Does marital status affect the attrition rate?
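The three questions above could be approached with a few pandas aggregations. The sketch below is one possible reading, not the analysis itself: the helper names are hypothetical, "after 3 years" is assumed to mean `Length of Service >= 3`, and satisfaction is assumed to be measured by the `Job Satisfaction` column. The small frame is for demonstration only.

```python
import pandas as pd

def attrition_rate(df):
    """Percentage of rows where Attrition is 'Yes'."""
    return (df["Attrition"] == "Yes").mean() * 100

def satisfaction_after(df, years=3):
    """Percentage share of each Job Satisfaction score among employees
    with at least `years` of service (assumed interpretation)."""
    tenured = df[df["Length of Service"] >= years]
    shares = tenured["Job Satisfaction"].value_counts(normalize=True)
    return shares.sort_index() * 100

def attrition_by_marital_status(df):
    """Attrition percentage within each Marital Status group."""
    left = df["Attrition"].eq("Yes")
    return left.groupby(df["Marital Status"]).mean() * 100

# Demonstration rows using the dataset's column names.
sample = pd.DataFrame({
    "Marital Status": ["Single", "Single", "Married", "Married", "Single"],
    "Job Satisfaction": [4, 3, 3, 2, 4],
    "Length of Service": [6, 7, 8, 2, 7],
    "Attrition": ["Yes", "Yes", "No", "No", "Yes"],
})

print(attrition_rate(sample))
print(satisfaction_after(sample))
print(attrition_by_marital_status(sample))
```

A difference in group rates does not by itself show that marital status causes attrition; a statistical test or the fitted model's feature importance would be needed to support that claim.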

#### DATA UNDERSTANDING

#### Loading the Necessary Libraries

In [2]:
# Data handling and visualization
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Data preparation
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import RobustScaler, OneHotEncoder, LabelEncoder
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Class imbalance handling
from imblearn.pipeline import Pipeline as imbpipeline
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Models
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Evaluation
from sklearn.metrics import classification_report, confusion_matrix, roc_curve, auc

# Model persistence
import joblib

In [5]:
# Load the dataset
data = pd.read_excel('Data/Attrition Dataset.xlsx')
data.head()

| Unnamed: 0 | Employee ID | Age | Department | Education Level | EducationField | Environment Satisfaction | Job Satisfaction | Marital Status | Gross Salary | Work Life Balance | Length of Service | Attrition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 41 | Personal Finance | 2 | Finance | 2 | 4 | Single | 5993.0 | 1 | 6 | Yes |
| 1 | 1002 | 37 | Personal Finance | 1 | Finance | 4 | 3 | Single | 2090.0 | 3 | 7 | Yes |
| 2 | 1003 | 33 | Personal Finance | 1 | Finance | 4 | 3 | Married | 2909.0 | 3 | 8 | No |
| 3 | 1004 | 27 | Personal Finance | 1 | Finance | 1 | 2 | Married | 3468.0 | 3 | 2 | No |
| 4 | 1005 | 32 | Personal Finance | 1 | Finance | 4 | 4 | Single | 3068.0 | 2 | 7 | Yes |
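Once the data is loaded, a quick structural check is a common next step in data understanding. The sketch below shows one way to do it; the `summarize` helper is hypothetical, and the small `preview` frame stands in for the full `data` frame so the example runs on its own.

```python
import pandas as pd

def summarize(df, target="Attrition", id_col="Employee ID"):
    """Collect basic structural facts about a frame (hypothetical helper)."""
    return {
        "shape": df.shape,                                    # (rows, columns)
        "missing_per_column": df.isna().sum().to_dict(),      # null counts
        "duplicate_ids": int(df[id_col].duplicated().sum()),  # repeated IDs
        "target_counts": df[target].value_counts().to_dict(), # class balance
    }

# Stand-in for the loaded dataset, limited to a few columns and rows.
preview = pd.DataFrame({
    "Employee ID": [1001, 1002, 1003, 1004, 1005],
    "Age": [41, 37, 33, 27, 32],
    "Gross Salary": [5993.0, 2090.0, 2909.0, 3468.0, 3068.0],
    "Attrition": ["Yes", "Yes", "No", "No", "Yes"],
})

summary = summarize(preview)
for key, value in summary.items():
    print(f"{key}: {value}")
```

On the real dataset the same call would be `summarize(data)`. The target counts are worth checking early, since a strong class imbalance would motivate the oversampling tools imported above.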
