# HR Analytics Project: Understanding Attrition in HR

**Problem Statement:**

Every year, companies hire many employees and invest time and money in training them. Many companies also run training programs for their existing employees. The aim of these programs is to increase employee effectiveness. But where does HR analytics fit in, and is it only about improving employee performance?

HR Analytics

Human resource analytics (HR analytics) is an area in the field of analytics that refers to applying analytic processes to the human resource department of an organization in the hope of improving employee performance and therefore getting a better return on investment. HR analytics does not just deal with gathering data on employee efficiency. Instead, it aims to provide insight into each process by gathering data and then using it to make relevant decisions about how to improve these processes.

Attrition in HR

Attrition in human resources refers to the gradual loss of employees over time. In general, relatively high attrition is problematic for companies. HR professionals often assume a leadership role in designing company compensation programs, work culture, and motivation systems that help the organization retain top employees.

Attrition affecting Companies

A major problem with high employee attrition is its cost to an organization. Job postings, hiring processes, paperwork, and new hire training are some of the common expenses of losing employees and replacing them. Additionally, regular employee turnover prevents your organization from increasing its collective knowledge base and experience over time. This is especially concerning if your business is customer-facing, as customers often prefer to interact with familiar people. Errors and issues are more likely if you constantly have new workers.

How does attrition affect companies, and how does HR analytics help in analyzing it? We will write the code and work through the process step by step.

# Importing Libraries 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns 
from scipy.stats import zscore
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings("ignore")


# Getting the Data


In [None]:
adf = pd.read_csv('Attrition_data.csv')

# Exploratory Data Analysis (EDA)

In [None]:
adf.head(10)

Checking for Null Values

In [None]:
adf.isnull().sum()

Extracting Information about the dataset

In [None]:
adf.info()

Describing the dataset to obtain the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

In [None]:
adf.describe()

In [None]:
adf.columns

In [None]:
adf.head()

Plotting a *countplot* of each feature, split by the target variable.

A count plot can be thought of as a histogram across a **categorical**, instead of quantitative, variable.

In [None]:
plt.figure(figsize = (16,9))
sns.countplot(x='Age', hue='Attrition', data=adf)

In [None]:
plt.figure(figsize = (16,9))
sns.countplot(x='BusinessTravel', hue='Attrition', data=adf)

In [None]:
plt.figure(figsize = (16,9))
sns.countplot(x='Department', hue='Attrition', data=adf)

In [None]:
plt.figure(figsize = (16,9))
sns.countplot(x='EducationField', hue='Attrition', data=adf)

In [None]:
plt.figure(figsize = (16,9))
sns.countplot(x='Gender', hue='Attrition', data=adf)

In [None]:
plt.figure(figsize = (20,9))
sns.countplot(x='JobRole', hue='Attrition', data=adf)

In [None]:
plt.figure(figsize = (20,9))
sns.countplot(x='MaritalStatus', hue='Attrition', data=adf)

In [None]:
plt.figure(figsize = (20,9))
sns.countplot(x='OverTime', hue='Attrition', data=adf)
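Count plots show raw counts, so groups of different sizes can be hard to compare by eye. As a rough companion to the plots, the sketch below uses `pd.crosstab` with row normalisation to turn counts into an attrition rate per category. The `toy` DataFrame is made-up illustration data, not the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for two columns of the real dataset.
toy = pd.DataFrame({
    "OverTime":  ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
    "Attrition": ["Yes", "No",  "No", "No", "Yes", "Yes", "No", "No"],
})

# Row-normalised cross-tabulation: each row sums to 1, so the "Yes"
# column is the attrition rate within that OverTime group.
rates = pd.crosstab(toy["OverTime"], toy["Attrition"], normalize="index")
print(rates)
```

The same call should work on the real data by replacing `toy` with `adf`, provided it is run before the columns are encoded.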

Here our target variable is Attrition, and the other columns are our predictor variables.


# Data Preprocessing

In our dataset there are two datatypes: **Object** and **Integer**.

Finding the columns with the **Object** datatype.

In [None]:
object_type = [feature for feature in adf.columns if adf[feature].dtypes =='O']
print(object_type)
print("Number of columns with object data type in adf is :" , len(object_type))

The columns found above have the **Object** datatype.

Finding the columns with the **Integer** datatype.

In [None]:
integer_type = [feature for feature in adf.columns if adf[feature].dtypes !='O']
print(integer_type)
print("Number of columns with int64 data type is : " , len(integer_type))

From the above we can observe that 9 columns have the object datatype and 25 columns have the integer datatype.

Counting the values in each object column.

In [None]:
object_col = ['Attrition', 'BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'Over18', 'OverTime']
for col in object_col:
    print(col,'\n',adf[col].value_counts() )

We could use **LabelEncoder** here, which would also turn each category into an integer code, but I have used the .replace method to assign the values explicitly.
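To illustrate the difference between the two approaches, here is a small sketch on a made-up Series (`travel` is hypothetical, not the real column). `LabelEncoder` assigns codes in sorted order of the labels, so the integers it produces need not match a hand-written mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of category labels.
travel = pd.Series(["Travel_Rarely", "Non-Travel", "Travel_Frequently", "Travel_Rarely"])

# LabelEncoder sorts the distinct labels and numbers them 0, 1, 2, ...
encoder = LabelEncoder()
auto_codes = encoder.fit_transform(travel)
print(list(encoder.classes_))
print(list(auto_codes))

# An explicit mapping lets us choose the order ourselves.
explicit = {"Non-Travel": 0, "Travel_Rarely": 1, "Travel_Frequently": 2}
manual_codes = travel.map(explicit)
print(list(manual_codes))
```

An explicit mapping is convenient when the categories have a natural order that should be reflected in the codes.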

**Attrition**

Attrition is our target variable and has two categories, so we'll transform its categories (Yes and No) to 1 and 0.

In [None]:
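# Assign the result back; in-place replace on a column accessor may not update adf in newer pandas
adf['Attrition'] = adf['Attrition'].replace({'Yes': 1, 'No': 0})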

In [None]:
adf.head()

**Business Travel**

This column has three values:

1. **Non-Travel**
2. **Travel_Rarely**
3. **Travel_Frequently**

So we will replace them with 0, 1, and 2.

In [None]:
adf['BusinessTravel'] = adf['BusinessTravel'].replace({'Non-Travel': 0, 'Travel_Rarely': 1, 'Travel_Frequently': 2})

In [None]:
adf.head()

**Department**

This column has three values:

1. **Sales**
2. **Research & Development**
3. **Human Resources**

So we will replace them with 0, 1, and 2.

In [None]:
adf['Department'] = adf['Department'].replace({'Sales': 0, 'Research & Development': 1, 'Human Resources': 2})

In [None]:
adf.head()

**EducationField**

This column has six values:

1. **Life Sciences**
2. **Medical**
3. **Marketing**
4. **Technical Degree**
5. **Human Resources**
6. **Other**

So we will replace these values with 0, 1, 2, 3, 4, and 5.

In [None]:
adf['EducationField'] = adf['EducationField'].replace({'Life Sciences': 0, 'Medical': 1, 'Marketing': 2,
                                                       'Technical Degree': 3, 'Human Resources': 4, 'Other': 5})

In [None]:
adf['EducationField'].head(8)

**Gender**

This column has two values:

1. **Male**
2. **Female**

So we will replace them with 1 and 0.

In [None]:
# Male -> 1, Female -> 0 (the two categories must map to different codes)
adf['Gender'] = adf['Gender'].replace({'Male': 1, 'Female': 0})

In [None]:
adf['Gender'].head(8)

**JobRole**

This column has nine values:

1. **Sales Executive**
2. **Research Scientist**
3. **Laboratory Technician**
4. **Manufacturing Director**
5. **Healthcare Representative**
6. **Manager**
7. **Sales Representative**
8. **Research Director**
9. **Human Resources**

So we will replace them with 0, 1, 2, 3, 4, 5, 6, 7, and 8.


In [None]:
adf['JobRole'] = adf['JobRole'].replace({'Sales Executive': 0, 'Research Scientist': 1, 'Laboratory Technician': 2,
                                         'Manufacturing Director': 3, 'Healthcare Representative': 4, 'Manager': 5,
                                         'Sales Representative': 6, 'Research Director': 7, 'Human Resources': 8})

In [None]:
adf['JobRole']

**MaritalStatus**

This column has three values:

1. **Single**
2. **Married**
3. **Divorced**

So we will replace them with 0, 1, and 2.

In [None]:
adf['MaritalStatus'] = adf['MaritalStatus'].replace({'Single': 0, 'Married': 1, 'Divorced': 2})

In [None]:
adf['MaritalStatus']

**Over18**

In [None]:
adf.shape

In [None]:
adf['Over18'].value_counts()

Here we can see that all employees are over 18, so we can drop this column.

In [None]:
adf.drop('Over18' , axis = 1 , inplace = True)

**OverTime**

This column has two values:

1. **Yes**
2. **No**

So we will replace them with 1 and 0.

In [None]:
adf['OverTime'] = adf['OverTime'].replace({'Yes': 1, 'No': 0})

In [None]:
adf['OverTime']

Checking for object values in the dataset if any:

In [None]:
object_type = [feature for feature in adf.columns if adf[feature].dtypes =='O']
print(object_type)
print("Number of columns with object data type in adf is :" , len(object_type))

We have successfully converted all our **Object** datatype columns.

In [None]:
adf

**Visualising Outliers**

*Boxplot*

A boxplot is a standardized way of displaying the distribution of data based on a five number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”).
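The five-number summary can also be computed directly. The sketch below uses made-up values and assumes the common convention that whiskers extend at most 1.5 times the interquartile range beyond the quartiles, with points outside drawn as outliers. That convention is why "minimum" and "maximum" appear in quotes above.

```python
import numpy as np

# Hypothetical sample with one obviously extreme value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# Quartiles and median (NumPy's default linear interpolation).
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed whisker rule: anything beyond 1.5 * IQR from the box is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```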

In [None]:
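collist = adf.columns.values
ncol = 8
nrows = 5
plt.figure(figsize=(16, 9))
for i in range(len(collist)):
    plt.subplot(nrows, ncol, i + 1)
    # Pass the column as x explicitly so the box is drawn horizontally
    sns.boxplot(x=adf[collist[i]], color='darkcyan')
# Adjust spacing once, after every subplot has been drawn
plt.tight_layout()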

We can observe that some of the columns have outliers, so we need to deal with them.

In [None]:
adf.describe()

From the summary above we can again observe that there are outliers in our dataset, so we need to remove them.

**Removing Outliers**

Finding the Z-score of our data.

A  **Z-score** is a numerical measurement that describes a value's relationship to the mean of a group of values. Z-score is measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score. A Z-score of 1.0 would indicate a value that is one standard deviation from the mean.
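As a quick check of this definition, the sketch below computes Z-scores by hand as (value - mean) / standard deviation on a small made-up array and compares the result with `scipy.stats.zscore`, which uses the population standard deviation by default.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical sample: mean 5, population standard deviation 2.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Z-score by hand: distance from the mean in units of standard deviation.
manual = (data - data.mean()) / data.std()

print(manual)
print(np.allclose(manual, zscore(data)))
```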

In [None]:
z_score = np.abs(zscore(adf))

In [None]:
print(np.where(z_score>3))

In the output above, the first array holds the row indices and the second array holds the column indices of the values with a Z-score greater than 3.

adf_wo is our DataFrame with the outlier rows removed.

In [None]:
adf_wo = adf.drop([  28,   45,   62,   62,   63,   64,   85,   98,   98,  110,  123,
        123,  123,  126,  126,  126,  153,  178,  187,  187,  190,  190,
        218,  231,  231,  237,  237,  270,  270,  281,  326,  386,  386,
        401,  411,  425,  425,  427,  445,  466,  473,  477,  535,  561,
        561,  584,  592,  595,  595,  595,  616,  624,  635,  653,  653,
        677,  686,  701,  716,  746,  749,  752,  799,  838,  861,  861,
        875,  875,  894,  914,  914,  918,  922,  926,  926,  937,  956,
        962,  976,  976, 1008, 1024, 1043, 1078, 1078, 1086, 1086, 1093,
       1111, 1116, 1116, 1135, 1138, 1138, 1156, 1184, 1221, 1223, 1242,
       1295, 1301, 1301, 1303, 1327, 1331, 1348, 1351, 1401, 1414, 1430])

In [None]:
adf_wo

Here above we have successfully removed all the rows having outliers.
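The row numbers passed to drop above were copied by hand from the np.where output. A possible alternative, sketched here on a made-up DataFrame (`toy` and `toy_wo` are hypothetical names), is to derive the rows programmatically so the list stays in sync with the data.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical data: column "a" has one extreme value in the last row.
toy = pd.DataFrame({"a": [10.0] * 19 + [1000.0],
                    "b": np.arange(20, dtype=float)})

# Absolute Z-score of every cell, computed column by column.
z = np.abs(zscore(toy.to_numpy(), axis=0))

# Rows containing at least one |z| > 3. NaN scores (from constant
# columns) compare as False, so they never mark a row as an outlier.
outlier_rows = np.unique(np.where(z > 3)[0])

# Translate positions to index labels before dropping.
toy_wo = toy.drop(index=toy.index[outlier_rows])
print(outlier_rows, toy_wo.shape)
```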

In [None]:
columnss = ['Attrition','Age', 'DailyRate', 'DistanceFromHome', 'Education', 'EmployeeCount', 'EmployeeNumber', 'EnvironmentSatisfaction', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobSatisfaction', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager']

**Correlation**

Visualizing the correlation between our predictor variables and the target variable.

In [None]:
plt.figure(figsize = (24,12))
sns.heatmap(adf_wo.corr()  , cmap = 'YlGnBu_r')

In [None]:
corr_df = adf_wo.corr()
# Keep only each column's correlation with the target, selected by name
corr_df = corr_df[['Attrition']]
corr_df

From above we can observe that some columns are correlated with each other.

**Skewness**

Checking for skewness in the features of our dataset.

In [None]:
x_predictor = adf_wo.drop('Attrition', axis = 1)
x_predictor

In [None]:
x_predictor.skew()

*DistPlot*

This function combines the matplotlib hist function (with automatic calculation of a good default bin size) with the seaborn kdeplot() and rugplot() functions.

In [None]:
for feature in x_predictor :
    sns.distplot(x_predictor[feature] , kde = True , color = 'darkcyan' )
    plt.xlabel(feature)
    plt.ylabel("count")
    plt.title(feature)
    plt.show()

A common rule of thumb: if the skewness is between -0.5 and 0.5, the data are fairly symmetrical. If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed. If the skewness is less than -1 or greater than 1, the data are highly skewed.
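The rule of thumb can be written as a small helper. This is only a sketch: `describe_skew` is a hypothetical function, the data are made up, and treating values of exactly 0.5 or 1 as belonging to the lower band is an assumption, since the rule does not say.

```python
import pandas as pd

def describe_skew(value):
    """Label a skewness value using the rule of thumb above."""
    size = abs(value)
    if size <= 0.5:
        return "fairly symmetrical"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"

# Hypothetical columns: one symmetric, one with a long right tail.
toy = pd.DataFrame({
    "uniform":   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "long_tail": [1, 1, 1, 1, 2, 2, 2, 3, 4, 40],
})

labels = toy.skew().apply(describe_skew)
print(labels)
```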

Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. 
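To see the effect in isolation, the sketch below applies a Yeo-Johnson power transform to synthetic, right-skewed data (a log-normal sample chosen purely for illustration) and compares the skewness before and after.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed sample; the seed and size are arbitrary.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=1000).reshape(-1, 1)

# Fit the transform and apply it to the same data.
transformer = PowerTransformer(method="yeo-johnson", standardize=False)
transformed = transformer.fit_transform(sample)

skew_before = skew(sample.ravel())
skew_after = skew(transformed.ravel())
print("skew before:", skew_before)
print("skew after: ", skew_after)
```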

In [None]:
powert = PowerTransformer( method = 'yeo-johnson' , standardize = False)
x_t = powert.fit_transform(x_predictor)

In [None]:
x_t

In [None]:
x_trans = pd.DataFrame(x_t , columns = x_predictor.columns)
x_trans

In [None]:
x_trans.skew()

In [None]:
# Plot the transformed features (x_trans) to see the effect of the power transform
for feature in x_trans :
    sns.distplot(x_trans[feature] , kde = True , color = 'darkcyan' )
    plt.xlabel(feature)
    plt.ylabel("count")
    plt.title(feature)
    plt.show()

Here we can see that the transformation has reduced the skewness of our dataset.

Checking the range (maximum minus minimum) of each column

In [None]:
for i in x_trans :
    print(i , max(x_trans[i]) - min(x_trans[i]))

We can see that there is a vast difference between the ranges of different columns, so we will scale them.

Standard scaling rescales each feature to zero mean and unit variance.
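As a minimal sketch of what this means, the example below scales a small made-up array by hand, subtracting each column's mean and dividing by its standard deviation, and checks that `StandardScaler` gives the same result.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales.
features = np.array([[1.0, 100.0],
                     [2.0, 300.0],
                     [3.0, 500.0],
                     [4.0, 700.0]])

# Manual standard scaling, column by column.
manual = (features - features.mean(axis=0)) / features.std(axis=0)

# The same operation through scikit-learn.
scaled = StandardScaler().fit_transform(features)

print(np.allclose(manual, scaled))
print(scaled.mean(axis=0), scaled.std(axis=0))
```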

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_s = scaler.fit_transform(x_trans)
x_s

In [None]:
x_sc = pd.DataFrame(x_s , columns = x_trans.columns)
x_sc

Here we can drop the **EmployeeNumber** column, as it is only an identifier and not worth taking forward.

In [None]:
x_sc = x_sc.drop('EmployeeNumber' , axis = 1)

In [None]:
x_sc.head()

In [None]:
plt.figure(figsize= (20,9))
sns.heatmap(x_sc.corr() , cmap = 'Spectral')

From the heatmap above we can observe that some **predictor columns** are highly correlated, so we will use PCA for dimensionality reduction.

**Principal Component Analysis (PCA)**

In [None]:
pca = PCA(n_components = 'mle' , svd_solver = 'full' )
xpca = pca.fit_transform(x_sc)

In [None]:
xpca

In [None]:
x_f = pd.DataFrame(xpca )
x_f

In [None]:
print(pca.components_)

In [None]:
x_f.shape

From above we can observe that the columns are reduced to 28 from 32.
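To see why correlated columns allow this kind of reduction, the sketch below runs PCA on synthetic data in which the third column is almost a copy of the first. The data and names (`toy`, `pca_demo`) are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: two independent columns plus a near-duplicate of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
toy = np.column_stack([base[:, 0],
                       base[:, 1],
                       base[:, 0] + 0.01 * rng.normal(size=200)])

# Keep all components so we can inspect how much variance each one carries.
pca_demo = PCA(n_components=3).fit(toy)
ratios = pca_demo.explained_variance_ratio_

print(ratios)
print(ratios.cumsum())
```

The last component should carry almost no variance, which is the kind of redundancy PCA can remove.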

In [None]:
# Select the target by name as a 1-D Series, which is what scikit-learn estimators expect
y = adf_wo['Attrition']
y

# Machine Learning Models

**Finding best Random State**

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
maxAccu=0
maxRS=0
for i in range(0,200):
    X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=i)
    rf=RandomForestClassifier()
    rf.fit(X_train,y_train)
    predrf=rf.predict(X_test)
    acc=accuracy_score(y_test,predrf)
    if acc>maxAccu:
        maxAccu=acc
        maxRS=i
    
print('Best accuracy is ',maxAccu, 'on random state ',maxRS)
    

In [None]:
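def rmse_cv(model, x_train, y):
    # cross_val_score returns the negated MSE, so flip the sign to get the MSE.
    # Despite the function name, no square root is taken; for 0/1 labels the
    # MSE equals the misclassification rate.
    mse = -cross_val_score(model, x_train, y, scoring='neg_mean_squared_error', cv=5)
    return mse * 100  # per-fold error, in percent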

models = [LogisticRegression(),
             KNeighborsClassifier(),
             SVC(),
             RandomForestClassifier(),
             AdaBoostClassifier(),
             DecisionTreeClassifier(),
             GaussianNB()
         ]

names = ['LogisticRegression','K Nearest Neighbor','Support Vector Classifier','Random Forest','AdaBoost Classifier',
         'Decision Tree Classifier' , 'GaussianNB' ]

for model, name in zip(models, names):
    model.fit(X_train, y_train)
    y_predicted = model.predict(X_test)
    train_score = model.score(X_train, y_train)  # accuracy on the training split
    print(name, "- training accuracy:", train_score)
    print("Test accuracy:", accuracy_score(y_test, y_predicted))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_predicted))
    print("Classification Report:\n", classification_report(y_test, y_predicted))

From above we can conclude that **Support Vector Classifier, AdaBoost Classifier, Random Forest Classifier and Decision Tree Classifier** have the best scores, so we will use these algorithms for our future predictions.
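Scores from a single train/test split depend on which rows land in the test set. One possible complement, sketched here on synthetic data rather than on the notebook's own features, is to average accuracy over stratified folds with `cross_val_score`. The dataset, class weights, and seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary problem (arbitrary illustration values).
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     weights=[0.84, 0.16], random_state=0)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
print("fold accuracies:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```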

**Support Vector Classifiers**

In [None]:
maxAccu=0
maxRS=0
for i in range(0,200):
    X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=i)
    sv = SVC()
    sv.fit(X_train,y_train)
    predsv = sv.predict(X_test)
    acc=accuracy_score(y_test,predsv)
    if acc>maxAccu:
        maxAccu=acc
        maxRS=i
    
print('Best accuracy is ',maxAccu, 'on random state ',maxRS)
    

In [None]:
X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=99)
sv = SVC()
sv.fit(X_train , y_train)
sv_predicted = sv.predict(X_test)
score = sv.score(X_train , y_train)

print("SVC training accuracy:", score)
# scikit-learn metrics expect (y_true, y_pred) in that order
print("Accuracy:", accuracy_score(y_test, sv_predicted))
print("Confusion Matrix:\n", confusion_matrix(y_test, sv_predicted))
print("\t\tclassification report")
print("-" * 52)
print(classification_report(y_test, sv_predicted))

**Random Forest Classifier**

In [None]:
maxAccu=0
maxRS=0
for i in range(0,200):
    X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=i)
    rf = RandomForestClassifier()
    rf.fit(X_train,y_train)
    predrf = rf.predict(X_test)
    acc=accuracy_score(y_test,predrf)
    if acc>maxAccu:
        maxAccu=acc
        maxRS=i
    
print('Best accuracy is ',maxAccu, 'on random state ',maxRS)
    

In [None]:
X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=106)
rf = RandomForestClassifier()
rf.fit(X_train , y_train)
rf_predicted = rf.predict(X_test)
score = rf.score(X_train , y_train)

print("RandomForestClassifier training accuracy:", score)
# scikit-learn metrics expect (y_true, y_pred) in that order
print("Accuracy:", accuracy_score(y_test, rf_predicted))
print("Confusion Matrix:\n", confusion_matrix(y_test, rf_predicted))
print("\t\tclassification report")
print("-" * 52)
print(classification_report(y_test, rf_predicted))

**Decision Tree Classifier**

In [None]:
maxAccu=0
maxRS=0
for i in range(0,200):
    X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=i)
    dtr = DecisionTreeClassifier()
    dtr.fit(X_train,y_train)
    preddtr = dtr.predict(X_test)
    acc=accuracy_score(y_test,preddtr)
    if acc>maxAccu:
        maxAccu=acc
        maxRS=i
    
print('Best accuracy is ',maxAccu, 'on random state ',maxRS)
    

In [None]:
X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=106)
dtr = DecisionTreeClassifier()
dtr.fit(X_train , y_train)
dtr_predicted = dtr.predict(X_test)
score = dtr.score(X_train , y_train)

print("DecisionTreeClassifier training accuracy:", score)
# scikit-learn metrics expect (y_true, y_pred) in that order
print("Accuracy:", accuracy_score(y_test, dtr_predicted))
print("Confusion Matrix:\n", confusion_matrix(y_test, dtr_predicted))
print("\t\tclassification report")
print("-" * 52)
print(classification_report(y_test, dtr_predicted))

**AdaBoost Classifier**

In [None]:
maxAccu=0
maxRS=0
for i in range(0,200):
    X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=i)
    ab = AdaBoostClassifier()
    ab.fit(X_train,y_train)
    predab = ab.predict(X_test)
    acc=accuracy_score(y_test,predab)
    if acc>maxAccu:
        maxAccu=acc
        maxRS=i
    
print('Best accuracy is ',maxAccu, 'on random state ',maxRS)
    

In [None]:
X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=166)
ab = AdaBoostClassifier()
ab.fit(X_train , y_train)
ab_predicted = ab.predict(X_test)
score = ab.score(X_train , y_train)

print("AdaBoostClassifier training accuracy:", score)
# scikit-learn metrics expect (y_true, y_pred) in that order
print("Accuracy:", accuracy_score(y_test, ab_predicted))
print("Confusion Matrix:\n", confusion_matrix(y_test, ab_predicted))
print("\t\tclassification report")
print("-" * 52)
print(classification_report(y_test, ab_predicted))

AdaBoost Classifier **Hyperparameter tuning**

In [None]:
param_grid = {
    # Candidate learning rates on a log scale (0.01 is an assumed value)
    'learning_rate': [0.001, 0.01, 0.1, 1],

             'n_estimators':range(50, 400, 50)
             }


ab = AdaBoostClassifier( random_state = 166)


grid_ab = GridSearchCV(ab , param_grid, scoring = 'accuracy')
grid_ab.fit(X_train, y_train)

print("Best Hyper Parameters:\n",grid_ab.best_params_)
# best_score_ is the mean cross-validated accuracy of the best parameter set
print("Mean cross-validated accuracy:\n", grid_ab.best_score_)
ab_grid_pred = grid_ab.best_estimator_.predict(X_test)

# scikit-learn metrics expect (y_true, y_pred) in that order
print("Accuracy:", accuracy_score(y_test, ab_grid_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, ab_grid_pred))
print("\t\tclassification report")
print("-" * 52)
print(classification_report(y_test, ab_grid_pred))


From above we can observe that **Random Forest Classifier** is giving the best scores for our predictions.


**Training Random Forest Classifier again**

In [None]:
X_train,X_test,y_train,y_test=train_test_split(x_f,y,test_size=.30,random_state=106)

random_forest = RandomForestClassifier(n_estimators=100, oob_score=True)
random_forest.fit(X_train, y_train)
Y_prediction = random_forest.predict(X_test)
# Accuracy on the training split, as a percentage
acc_random_forest = round(random_forest.score(X_train, y_train) * 100, 2)
print("Training accuracy:", acc_random_forest, "%")

**Random Forest Hyperparameter Tuning**

In [None]:
param_grid = { "criterion" : ["gini", "entropy"], 
              "min_samples_leaf" : [1, 5, 10, 25, 50, 70], 
              "min_samples_split" : [2, 4, 10, 12, 16, 18, 25, 35], 
              "n_estimators": [100, 400, 700, 1000, 1500]}

# 'sqrt' is what 'auto' meant for classifiers; 'auto' is no longer accepted by recent scikit-learn
rf = RandomForestClassifier(n_estimators=100, max_features='sqrt', oob_score=True, random_state=1, n_jobs=-1)
clf = GridSearchCV(estimator=rf, param_grid=param_grid, n_jobs=-1)
clf.fit(X_train, y_train)
clf.best_params_

**Confusion Matrix**

In [None]:
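from sklearn.model_selection import cross_val_predict

# Out-of-fold predictions for every training row
predictions = cross_val_predict(random_forest, X_train, y_train, cv=3)
confusion_matrix(y_train, predictions)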

**Precision and Recall**

In [None]:
from sklearn.metrics import precision_score, recall_score

# Precision and Recall:
print("Precision:", precision_score(y_train, predictions))
print("Recall:", recall_score(y_train, predictions))
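Both scores can be read straight off a confusion matrix: precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below checks this on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical) against scikit-learn's own functions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions.
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision_manual = tp / (tp + fp)  # share of predicted positives that are correct
recall_manual = tp / (tp + fn)     # share of actual positives that were found

print(precision_manual, precision_score(y_true_demo, y_pred_demo))
print(recall_manual, recall_score(y_true_demo, y_pred_demo))
```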

**Getting the Probabilities**

In [None]:
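from sklearn.metrics import precision_recall_curve

# Probability of the positive class (Attrition = 1) for each training row
y_scores = random_forest.predict_proba(X_train)
y_scores = y_scores[:,1]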

precision, recall, threshold = precision_recall_curve(y_train, y_scores)
def plot_precision_and_recall(precision, recall, threshold):
    plt.plot(threshold, precision[:-1], "r-", label="precision", linewidth=5)
    plt.plot(threshold, recall[:-1], "b", label="recall", linewidth=5)
    plt.xlabel("threshold", fontsize=19)
    plt.legend(loc="upper right", fontsize=19)
    plt.ylim([0, 1])

plt.figure(figsize=(14, 7))
plot_precision_and_recall(precision, recall, threshold)
plt.show()

**ROC AUC Curve**

This curve plots the true positive rate (also called recall) against the false positive rate (ratio of incorrectly classified negative instances), instead of plotting the precision versus the recall.

In [None]:
from sklearn.metrics import roc_curve

false_positive_rate, true_positive_rate, thresholds = roc_curve(y_train, y_scores)

# plotting them against each other

def plot_roc_curve(false_positive_rate, true_positive_rate, label=None):
    plt.plot(false_positive_rate, true_positive_rate, linewidth=2, label=label)
    plt.plot([0, 1], [0, 1], 'r', linewidth=4)
    plt.axis([0, 1, 0, 1])
    plt.xlabel('False Positive Rate (FPR)', fontsize=16)
    plt.ylabel('True Positive Rate (TPR)', fontsize=16)

plt.figure(figsize=(14, 7))
plot_roc_curve(false_positive_rate, true_positive_rate)
plt.show()
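The ROC curve is often summarised by the area under it (AUC), where 0.5 corresponds to the diagonal line and 1.0 to a perfect ranking. As a small sketch on made-up labels and scores, `roc_auc_score` computes this number directly.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels and predicted probabilities for the positive class.
y_true_demo = [0, 0, 1, 1]
scores_demo = [0.1, 0.4, 0.35, 0.8]

# Points of the ROC curve and the area under it.
fpr, tpr, _ = roc_curve(y_true_demo, scores_demo)
auc = roc_auc_score(y_true_demo, scores_demo)

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)
```

Here three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.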

In [None]:
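import joblib
# Save a fitted model; rf above is only the unfitted template passed to GridSearchCV
joblib.dump(clf.best_estimator_, 'RandomForestClassifier.pkl')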