## HR Analytics Employee Attrition & Performance

### What does employee attrition mean?
- The attrition rate measures how many people leave a business over a specified time frame, usually expressed as a percentage of its workforce.
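
As a minimal sketch of that calculation, assuming the rate is taken as leavers divided by the average headcount over the period (one common convention, not something the dataset itself defines). The function name, argument names, and figures are hypothetical:

```python
def attrition_rate(leavers: int, headcount_start: int, headcount_end: int) -> float:
    """Return attrition as a percentage of the average headcount in the period."""
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return 100 * leavers / average_headcount


# Illustrative numbers only: 15 leavers with a steady headcount of 150.
rate = attrition_rate(leavers=15, headcount_start=150, headcount_end=150)
print(f"{rate:.1f}%")
```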

### What is the difference between employee turnover and attrition?
Both terms describe employees leaving a company, whether through discharge, termination, resignation, or retirement. The difference lies in what happens to the position afterwards.
- Attrition occurs when an employee retires or the employer eliminates the position, so the vacancy is not filled.
- Turnover occurs when an employee leaves and the employer looks for a replacement.

### What is a good employee attrition rate?
A healthy attrition rate is generally taken to be around 10%.

### What are some effective methods to reduce a high attrition rate?
* Recruit and hire the right people.
* Offer benefits and compensation.
* Offer flexibility.
* Pay attention to employee engagement.
* Recognize and reward employees.

Let's get down to work and see what's going on at IBM!

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams

%config InlineBackend.figure_format = 'retina'
sns.set_style("white")
rcParams['figure.figsize'] = 6,4
import matplotlib.ticker as ticker

import os
from IPython.display import Image
import warnings    # We want to suppress warnings
warnings.filterwarnings("ignore")    # Ignore warnings

## 'retina' renders inline figures at high resolution (HD format)
## filterwarnings("ignore") hides any warnings that occur

In [3]:
df=pd.read_csv(r'C:\Files\HR-Employee-Attrition- IBM.csv')

In [4]:
df

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,...,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,1,...,1,80,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,1,2,...,4,80,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,1,4,...,2,80,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,1,5,...,3,80,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,591,Research & Development,2,1,Medical,1,7,...,4,80,1,6,3,3,2,2,2,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1465,36,No,Travel_Frequently,884,Research & Development,23,2,Medical,1,2061,...,3,80,1,17,3,3,5,2,0,3
1466,39,No,Travel_Rarely,613,Research & Development,6,1,Medical,1,2062,...,1,80,1,9,5,3,7,7,1,7
1467,27,No,Travel_Rarely,155,Research & Development,4,3,Life Sciences,1,2064,...,2,80,1,6,0,3,6,2,0,3
1468,49,No,Travel_Frequently,1023,Sales,2,3,Medical,1,2065,...,4,80,0,17,3,2,9,6,0,8



## Check for null values in the dataset

In [6]:
df.isna()
## No null values

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,...,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1465,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1466,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1467,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1468,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [8]:
df.isnull().sum()

# There are no null values in the DataFrame

Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeCount               0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
Over18                      0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StandardHours               0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSince
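
Reading the full boolean table or the per-column counts by eye gets tedious on a wide dataset. A minimal sketch of a one-number check follows, using a small made-up frame in place of the HR data (the `toy` frame and its values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the HR data, with two deliberately missing cells.
toy = pd.DataFrame({
    "Age": [41, 49, np.nan],
    "Department": ["Sales", None, "Sales"],
    "DailyRate": [1102, 279, 1373],
})

total_missing = int(toy.isna().sum().sum())   # missing cells in the whole frame
has_missing = bool(toy.isna().any().any())    # True if at least one cell is missing
cols_with_missing = toy.columns[toy.isna().any()].tolist()

print(total_missing, has_missing, cols_with_missing)
```

On the real frame, `df.isna().sum().sum()` returning 0 would confirm the conclusion above in a single line.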

## Convert the output variable (Attrition) from categorical to numerical using the map function


In [9]:
a={"Yes":1,"No":0}
df["AttritionNumeric"]=df["Attrition"].map(a)

In [10]:
df["AttritionNumeric"]

0       1
1       0
2       1
3       0
4       0
       ..
1465    0
1466    0
1467    0
1468    0
1469    0
Name: AttritionNumeric, Length: 1470, dtype: int64
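
One caveat with `map` and a dictionary: any label missing from the dictionary silently becomes `NaN` instead of raising an error. A small sketch on made-up labels (not the HR data) shows one way to catch that:

```python
import pandas as pd

labels = pd.Series(["Yes", "No", "yes", "No"])   # note the lowercase "yes"
mapping = {"Yes": 1, "No": 0}

encoded = labels.map(mapping)

# Labels the dictionary did not cover end up as NaN in the encoded series.
unmapped = labels[encoded.isna()].unique().tolist()

print(unmapped)
```

An empty list here means every label was covered by the dictionary.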

In [11]:
df["AttritionNumeric"].value_counts()

0    1233
1     237
Name: AttritionNumeric, dtype: int64
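
Because the column is coded 0/1, its mean is the attrition rate. A quick check using only the counts printed above (1233 stayed, 237 left), rebuilt here as a stand-in series so the snippet runs on its own:

```python
import pandas as pd

# Stand-in for df["AttritionNumeric"], rebuilt from the counts shown above.
attrition = pd.Series([0] * 1233 + [1] * 237, name="AttritionNumeric")

rate = attrition.mean()                          # share of employees who left
shares = attrition.value_counts(normalize=True)  # same information as proportions

print(f"Attrition rate: {rate:.1%}")
```

That works out to roughly 16%, which is above the 10% figure quoted earlier.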

## Remove any columns that are not needed for the analysis from the dataset

In [None]:
df.drop(['EmployeeCount', 'EmployeeNumber', 'Over18', 'StandardHours', 'Attrition'], axis="columns", inplace=True)

# Drops the given columns from the dataset; inplace=True modifies df directly instead of returning a new DataFrame
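
Columns such as `EmployeeCount` and `StandardHours` hold a single value in every row displayed earlier (1 and 80), which is why they add nothing to the analysis. A hedged sketch of how such columns could be found automatically, on a toy frame with made-up values:

```python
import pandas as pd

# Made-up rows; only Age varies between employees.
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "EmployeeCount": [1, 1, 1],
    "StandardHours": [80, 80, 80],
    "Over18": ["Y", "Y", "Y"],
})

# A column with one distinct value cannot explain differences between employees.
constant_cols = toy.columns[toy.nunique() == 1].tolist()
reduced = toy.drop(columns=constant_cols)   # returns a new frame; toy is unchanged

print(constant_cols)
print(reduced.columns.tolist())
```

Unlike `inplace=True`, this variant leaves the original frame intact, which makes it easier to re-run a notebook cell without errors about already-dropped columns.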

In [19]:
df.head()

Unnamed: 0,Age,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EnvironmentSatisfaction,Gender,HourlyRate,...,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager,AttritionNumeric
0,41,Travel_Rarely,1102,Sales,1,2,Life Sciences,2,Female,94,...,1,0,8,0,1,6,4,0,5,1
1,49,Travel_Frequently,279,Research & Development,8,1,Life Sciences,3,Male,61,...,4,1,10,3,3,10,7,1,7,0
2,37,Travel_Rarely,1373,Research & Development,2,2,Other,4,Male,92,...,2,0,7,3,3,0,0,0,0,1
3,33,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,4,Female,56,...,3,0,8,3,3,8,7,3,0,0
4,27,Travel_Rarely,591,Research & Development,2,1,Medical,1,Male,40,...,4,1,6,3,3,2,2,2,2,0


In [24]:
df.describe()

Unnamed: 0,Age,DailyRate,DistanceFromHome,Education,EnvironmentSatisfaction,HourlyRate,JobInvolvement,JobLevel,JobSatisfaction,MonthlyIncome,...,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager,AttritionNumeric
count,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,...,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0
mean,36.92381,802.485714,9.192517,2.912925,2.721769,65.891156,2.729932,2.063946,2.728571,6502.931293,...,2.712245,0.793878,11.279592,2.79932,2.761224,7.008163,4.229252,2.187755,4.123129,0.161224
std,9.135373,403.5091,8.106864,1.024165,1.093082,20.329428,0.711561,1.10694,1.102846,4707.956783,...,1.081209,0.852077,7.780782,1.289271,0.706476,6.126525,3.623137,3.22243,3.568136,0.367863
min,18.0,102.0,1.0,1.0,1.0,30.0,1.0,1.0,1.0,1009.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,30.0,465.0,2.0,2.0,2.0,48.0,2.0,1.0,2.0,2911.0,...,2.0,0.0,6.0,2.0,2.0,3.0,2.0,0.0,2.0,0.0
50%,36.0,802.0,7.0,3.0,3.0,66.0,3.0,2.0,3.0,4919.0,...,3.0,1.0,10.0,3.0,3.0,5.0,3.0,1.0,3.0,0.0
75%,43.0,1157.0,14.0,4.0,4.0,83.75,3.0,3.0,4.0,8379.0,...,4.0,1.0,15.0,3.0,3.0,9.0,7.0,3.0,7.0,0.0
max,60.0,1499.0,29.0,5.0,4.0,100.0,4.0,5.0,4.0,19999.0,...,4.0,3.0,40.0,6.0,4.0,40.0,18.0,15.0,17.0,1.0


## To swap the rows and columns, we use transpose (.T)


In [25]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Age,1470.0,36.92381,9.135373,18.0,30.0,36.0,43.0,60.0
DailyRate,1470.0,802.485714,403.5091,102.0,465.0,802.0,1157.0,1499.0
DistanceFromHome,1470.0,9.192517,8.106864,1.0,2.0,7.0,14.0,29.0
Education,1470.0,2.912925,1.024165,1.0,2.0,3.0,4.0,5.0
EnvironmentSatisfaction,1470.0,2.721769,1.093082,1.0,2.0,3.0,4.0,4.0
HourlyRate,1470.0,65.891156,20.329428,30.0,48.0,66.0,83.75,100.0
JobInvolvement,1470.0,2.729932,0.711561,1.0,2.0,3.0,3.0,4.0
JobLevel,1470.0,2.063946,1.10694,1.0,1.0,2.0,3.0,5.0
JobSatisfaction,1470.0,2.728571,1.102846,1.0,2.0,3.0,4.0,4.0
MonthlyIncome,1470.0,6502.931293,4707.956783,1009.0,2911.0,4919.0,8379.0,19999.0


* The average age is about 37 years. The youngest employees are 18 (young trainees or interns?) and the oldest are 60 (managers with many years of work?).
* Some people live very close to work, almost next door (a distance of 1), while others live up to 29 units away, which could mean close to an hour's drive.
* Salaries are a world of their own! The average monthly income is about 6,500 USD; the lowest earners make about 1,000 USD (the young trainees?) and the highest almost 20,000 USD (that 60-year-old boss with many years behind him?).
* Some employees have been at the company for as long as 40 years and others have just started! The average, however, is 7 years (not bad at all).

## We are going to separate the workers into two dataframes according to Attrition and compare their averages

In [26]:
at_yes=df.loc[df["AttritionNumeric"]==1]
at_No=df.loc[df["AttritionNumeric"]==0]


In [27]:
at_yes

Unnamed: 0,Age,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EnvironmentSatisfaction,Gender,HourlyRate,...,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager,AttritionNumeric
0,41,Travel_Rarely,1102,Sales,1,2,Life Sciences,2,Female,94,...,1,0,8,0,1,6,4,0,5,1
2,37,Travel_Rarely,1373,Research & Development,2,2,Other,4,Male,92,...,2,0,7,3,3,0,0,0,0,1
14,28,Travel_Rarely,103,Research & Development,24,3,Life Sciences,3,Male,50,...,2,0,6,4,3,4,2,0,3,1
21,36,Travel_Rarely,1218,Sales,9,4,Life Sciences,3,Male,82,...,2,0,10,4,3,5,3,0,3,1
24,34,Travel_Rarely,699,Research & Development,6,1,Medical,2,Male,83,...,3,0,8,2,3,4,2,1,3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1438,23,Travel_Frequently,638,Sales,9,3,Marketing,4,Male,33,...,1,1,1,3,2,1,0,1,0,1
1442,29,Travel_Rarely,1092,Research & Development,1,4,Medical,1,Male,36,...,2,3,4,3,4,2,2,2,2,1
1444,56,Travel_Rarely,310,Research & Development,7,2,Technical Degree,4,Male,72,...,4,1,14,4,1,10,9,9,8,1
1452,50,Travel_Frequently,878,Sales,1,4,Life Sciences,2,Male,94,...,4,2,12,3,3,6,3,0,1,1


In [28]:
at_No

Unnamed: 0,Age,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EnvironmentSatisfaction,Gender,HourlyRate,...,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager,AttritionNumeric
1,49,Travel_Frequently,279,Research & Development,8,1,Life Sciences,3,Male,61,...,4,1,10,3,3,10,7,1,7,0
3,33,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,4,Female,56,...,3,0,8,3,3,8,7,3,0,0
4,27,Travel_Rarely,591,Research & Development,2,1,Medical,1,Male,40,...,4,1,6,3,3,2,2,2,2,0
5,32,Travel_Frequently,1005,Research & Development,2,2,Life Sciences,4,Male,79,...,3,0,8,2,2,7,7,3,6,0
6,59,Travel_Rarely,1324,Research & Development,3,3,Medical,3,Female,81,...,1,3,12,3,2,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1465,36,Travel_Frequently,884,Research & Development,23,2,Medical,3,Male,41,...,3,1,17,3,3,5,2,0,3,0
1466,39,Travel_Rarely,613,Research & Development,6,1,Medical,4,Male,42,...,1,1,9,5,3,7,7,1,7,0
1467,27,Travel_Rarely,155,Research & Development,4,3,Life Sciences,2,Male,87,...,2,1,6,0,3,6,2,0,3,0
1468,49,Travel_Frequently,1023,Sales,2,3,Medical,4,Male,63,...,4,0,17,3,2,9,6,0,8,0


## Now join the two sets of averages, rename the columns, and add a color gradient

In [30]:
## Compute the column means for each group (Yes / No) and place them side by side
df_mean=pd.concat((at_yes.mean(numeric_only=True),at_No.mean(numeric_only=True)),axis=1)
df_mean.columns=['Yes','No']
df_mean.round(2).style.background_gradient(cmap="tab20_r",axis=1)

Unnamed: 0,Yes,No
Age,33.61,37.56
DailyRate,750.36,812.5
DistanceFromHome,10.63,8.92
Education,2.84,2.93
EnvironmentSatisfaction,2.46,2.77
HourlyRate,65.57,65.95
JobInvolvement,2.52,2.77
JobLevel,1.64,2.15
JobSatisfaction,2.47,2.78
MonthlyIncome,4787.09,6832.74
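
The same comparison can be produced without splitting the frame by hand. A minimal sketch with `groupby`, run on the first five rows copied from the tables above (the five-row `sample` is only for illustration, so its means differ from the full-data table):

```python
import pandas as pd

sample = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27],
    "Department": ["Sales", "Research & Development", "Research & Development",
                   "Research & Development", "Research & Development"],
    "DistanceFromHome": [1, 8, 2, 3, 2],
    "AttritionNumeric": [1, 0, 1, 0, 0],
})

# numeric_only=True skips text columns such as Department.
group_means = (
    sample.groupby("AttritionNumeric")
    .mean(numeric_only=True)
    .rename(index={1: "Yes", 0: "No"})
    .T[["Yes", "No"]]
)

print(group_means.round(2))
```

The result has one row per numeric column and one column per group, the same layout as `df_mean`.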


### Characteristics of employees who leave IBM

* They tend to be younger (about 34 years old on average, versus about 38 for those who stay), live farther from work, are less satisfied with the work environment, and have lower incomes. These factors are related: employees typically earn more the longer they stay with the company.
* They have worked fewer years and hold fewer stock options (again, the longer the tenure, the more options an employee tends to have).
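
The link between tenure and income suggested above can be checked with a correlation coefficient. A minimal sketch of the mechanics follows; the values are made up for illustration and are not taken from the HR data:

```python
import pandas as pd

# Made-up values, chosen only to show how the check works.
toy = pd.DataFrame({
    "YearsAtCompany": [1, 3, 5, 8, 12],
    "MonthlyIncome": [2500, 3200, 4800, 6100, 9000],
})

# Series.corr computes the Pearson correlation by default.
r = toy["YearsAtCompany"].corr(toy["MonthlyIncome"])
print(round(r, 2))
```

On the real frame, the equivalent call would be `df["YearsAtCompany"].corr(df["MonthlyIncome"])`; a value close to 1 would support the explanation, while a value near 0 would not.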
