<h1>IBM HR Attrition - EDA and Modelling</h1>
<p><b>Task : </b>Build a dashboard-style exploratory analysis of the dataset and train models that predict whether an employee will leave (attrition).</p>
<img src="https://www.jigsawacademy.com/wp-content/uploads/2017/12/Topic-1.png" style="width : 100%;">

In [None]:
!pip install imbalanced-learn
!pip install delayed
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import confusion_matrix

<h1>Loading and Preparing the Data</h1>

In [None]:
data = pd.read_csv("../input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv")
data

In [None]:
data.isna().sum()

<p>Let us look at the columns and remove unwanted ones such as the employee ID.</p>

In [None]:
data.columns
data.drop(['EmployeeNumber'],axis=1,inplace=True)
data.columns
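<p>Besides the ID, columns that hold a single value for every row carry no information for a model. The sketch below shows one way to find and drop them. The toy frame and its column names are illustrative assumptions, so check with <code>nunique()</code> which columns, if any, are constant in the real data.</p>

```python
import pandas as pd

# Toy frame for illustration only; these values are not taken from the dataset.
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
    "Department": ["Sales", "Research & Development", "Sales"],
})

# A column with at most one distinct value is constant.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) <= 1]

# Drop the constant columns and keep the rest.
cleaned = toy.drop(columns=constant_cols)
print(constant_cols)
print(list(cleaned.columns))
```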

In [None]:
data.describe()

<h1>Exploratory Data Analysis - General Dataset Analysis</h1>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

plt.subplot(2,2,1)
xaxis,counts = np.unique(data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
xaxis,counts = np.unique(data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Category")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,4)
plt.xticks(rotation=45)
xaxis,counts = np.unique(data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')
plt.show()

<p><b>Inference : </b>The majority of employees travel only <b>rarely</b> for business. The <b>R&D</b> department has the highest number of employees, and <b>Life Sciences</b> and <b>Medical</b> are the most common education fields. Employees who left (attrition "Yes") are <b>far fewer</b> than those who stayed.</p>
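<p>The same bar-chart grid is drawn many times in this notebook with only the columns changing. A small helper could remove that repetition. This is a sketch under assumptions: <code>plot_category_counts</code> is a hypothetical name, and the toy frame stands in for <code>data</code> or any filtered subset.</p>

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_category_counts(frame, columns, ncols=2):
    """Draw one value-count bar chart per column on a shared grid (hypothetical helper)."""
    nrows = -(-len(columns) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(15, 5 * nrows), squeeze=False)
    fig.subplots_adjust(wspace=0.4, hspace=0.4)
    flat_axes = axes.ravel()
    for ax, column in zip(flat_axes, columns):
        counts = frame[column].value_counts().sort_index()
        ax.bar(list(counts.index.astype(str)), counts.values,
               zorder=3, color='deepskyblue')
        ax.set_facecolor('whitesmoke')
        ax.tick_params(left=False, bottom=False)
        ax.set_title(column)
        ax.set_xlabel(column)
        ax.set_ylabel("Count")
        ax.grid(True, linewidth=2.0, zorder=0, color='white')
    # Hide grid cells that have no chart.
    for ax in flat_axes[len(columns):]:
        ax.set_visible(False)
    return fig

# Toy data for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    "BusinessTravel": ["Travel_Rarely", "Travel_Rarely", "Travel_Frequently", "Non-Travel"],
    "Attrition": ["No", "No", "Yes", "No"],
    "Department": ["Sales", "Research & Development", "Sales", "Human Resources"],
})
fig = plot_category_counts(toy, ["BusinessTravel", "Attrition", "Department"])
```

<p>In the notebook, a call such as <code>plot_category_counts(data[data['Department'] == "Sales"], [...])</code> followed by <code>plt.show()</code> would take the place of one of the repeated cells.</p>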

<h1>Exploratory Data Analysis - Category Wise Analysis</h1>
<p>Business Travel Type : Non-Travel</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['BusinessTravel'] == "Non-Travel"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Category")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')
plt.show()

<p>Business Travel Type : Travel_Frequently</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['BusinessTravel'] == "Travel_Frequently"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Category")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')
plt.show()

<p>Business Travel Type : Travel_Rarely</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['BusinessTravel'] == "Travel_Rarely"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Category")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')
plt.show()

<p><b>Inference : </b>For all three business travel types (Non-Travel, Travel_Frequently and Travel_Rarely), the <b>attrition count is low</b>, and the majority of employees work in the <b>R&D department</b> with <b>Life Sciences</b> or <b>Medical</b> as their education field.</p>

<p>Attrition Type : No</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['Attrition'] == "No"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Category")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')
plt.show()

<p>Attrition Type : Yes</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['Attrition'] == "Yes"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Category")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')
plt.show()

<p><b>Inference : </b>In comparison with the employees who stayed, attrition appears relatively more common among people who <b>travel frequently for business purposes</b> and among those who work in the Sales department.</p>
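<p>Bar heights show raw counts, which also depend on how large each group is. To compare groups directly, the share of leavers within each category can be computed with a row-normalised cross-tabulation. The sketch below uses a made-up toy frame (not values from the dataset); on the real data the same call would take <code>data["BusinessTravel"]</code> and <code>data["Attrition"]</code> before label encoding.</p>

```python
import pandas as pd

# Toy data for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    "BusinessTravel": ["Travel_Rarely"] * 4 + ["Travel_Frequently"] * 4 + ["Non-Travel"] * 2,
    "Attrition": ["No", "No", "No", "Yes",
                  "No", "No", "Yes", "Yes",
                  "No", "No"],
})

# normalize="index" turns each row into shares that sum to 1.
rates = pd.crosstab(toy["BusinessTravel"], toy["Attrition"], normalize="index")

# Share of employees who left, per travel type, highest first.
attrition_rate = rates["Yes"].sort_values(ascending=False)
print(attrition_rate)
```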

<p>Department : Human Resources</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['Department'] == "Human Resources"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')


plt.show()

<p><b>Inference : </b>Employees of the <b>HR</b> department mainly have <b>Human Resources</b> or <b>Life Sciences</b> as their education field.</p>

<p>Department : Research & Development</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['Department'] == "Research & Development"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')


plt.show()

<p><b>Inference : </b>Employees of the <b>R&D</b> department mostly have <b>Life Sciences</b> or <b>Medical</b> as their education field, with <b>Technical Degree</b> being the third most common.</p>

<p>Department : Sales</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['Department'] == "Sales"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['EducationField'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Education Field Types")
plt.xlabel("Field")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')


plt.show()

<p><b>Inference : </b>Employees of the <b>Sales</b> department have <b>Life Sciences</b>, <b>Marketing</b> and <b>Medical</b> as their main education fields.</p>

<p>Education Field : Life Sciences</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['EducationField'] == "Life Sciences"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Types")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.show()

<p><b>Inference : </b>Employees with a <b>Life Sciences</b> education field mainly work in the <b>R&D</b> department.</p>

<p>Education Field : Marketing</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['EducationField'] == "Marketing"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Types")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')


plt.show()

<p><b>Inference : </b>Employees with a <b>Marketing</b> education field mainly work in the <b>Sales</b> department.</p>

<p>Education Field : Medical</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['EducationField'] == "Medical"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Types")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')


plt.show()

<p><b>Inference : </b>Employees with a <b>Medical</b> education field mainly work in the <b>R&D</b> department.</p>

<p>Education Field : Other</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['EducationField'] == "Other"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Types")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')


plt.show()

<p><b>Inference : </b>Employees with <b>Other</b> education fields mainly work in the <b>R&D</b> and <b>Sales</b> departments.</p>

<p>Education Field : Technical Degree</p>

In [None]:
plt.rcParams.update({'axes.facecolor':'whitesmoke'})
plt.figure(figsize=(15,10))
plt.subplots_adjust(left=0.1,
                    bottom=0.1, 
                    right=0.9, 
                    top=0.9, 
                    wspace=0.4, 
                    hspace=0.4)

fitlered_data = data[data['EducationField'] == "Technical Degree"]
plt.subplot(2,2,1)
xaxis,counts = np.unique(fitlered_data['BusinessTravel'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Business Travel Type")
plt.xlabel("Travel Type")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,2)
xaxis,counts = np.unique(fitlered_data['Attrition'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Attrition")
plt.xlabel("Attrition")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')

plt.subplot(2,2,3)
plt.xticks(rotation=45)
xaxis,counts = np.unique(fitlered_data['Department'],return_counts=True)
plt.bar(xaxis,counts,zorder=3,color=['deepskyblue','skyblue','lightskyblue'])
plt.tick_params(left=False,bottom=False)
plt.title("Department Types")
plt.xlabel("Department")
plt.ylabel("Count")
plt.grid(True,linewidth=2.0,alpha=1,zorder=0,color='white')


plt.show()

<p><b>Inference : </b>Employees with a <b>Technical Degree</b> education field mainly work in the <b>R&D</b> department.</p>

In [None]:
data.info()

<p>Use LabelEncoder() to encode the Text-based features into Integers.</p>

In [None]:
data["Attrition"] = LabelEncoder().fit_transform(data['Attrition'])
data["BusinessTravel"] = LabelEncoder().fit_transform(data['BusinessTravel'])
data["Department"] = LabelEncoder().fit_transform(data['Department'])
data["EducationField"] = LabelEncoder().fit_transform(data['EducationField'])
data["Gender"] = LabelEncoder().fit_transform(data['Gender'])
data["JobRole"] = LabelEncoder().fit_transform(data['JobRole'])
data["MaritalStatus"] = LabelEncoder().fit_transform(data['MaritalStatus'])
data["Over18"] = LabelEncoder().fit_transform(data['Over18'])
data["OverTime"] = LabelEncoder().fit_transform(data['OverTime'])
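<p>The encoding cell above discards each fitted encoder, so the integer codes cannot be mapped back to their labels later. A possible variant, sketched on a toy frame with assumed values, loops over the non-numeric columns and keeps one encoder per column in a dictionary (the <code>encoders</code> name is hypothetical).</p>

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "OverTime": ["Yes", "No", "Yes"],
    "Age": [41, 49, 37],
})

# Pick every column that is not numeric.
text_columns = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]

# Fit one encoder per column and keep it for later decoding.
encoders = {}
for column in text_columns:
    encoders[column] = LabelEncoder()
    toy[column] = encoders[column].fit_transform(toy[column])

# LabelEncoder assigns codes in sorted label order; inverse_transform restores the labels.
original_gender = encoders["Gender"].inverse_transform(toy["Gender"])
print(toy)
print(list(original_gender))
```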

In [None]:
data

<h1>Modelling</h1>
<ul>
    <li>Oversample the dataset using SMOTE</li>
    <li>Model the oversampled dataset</li>
</ul>

In [None]:
cols = list(data.columns)
cols.remove("Attrition")
sampled,target = SMOTE().fit_resample(data[cols],data["Attrition"])

In [None]:
X_train,X_test,Y_train,Y_test = train_test_split(sampled[cols],
                                                 target,
                                                 test_size = 0.3,
                                                 shuffle=True)

In [None]:
print("Train Feature Size : ",len(X_train))
print("Train Label Size : ",len(Y_train))
print("Test Feature Size : ",len(X_test))
print("Test Label Size : ",len(Y_test))

<h1>Logistic Regression Model</h1>

In [None]:
logistic_model = LogisticRegression(solver='liblinear',random_state=0).fit(X_train,Y_train)
# accuracy_score returns a fraction in [0, 1]; multiply by 100 to print a percentage
print("Train Accuracy : {:.2f} %".format(100 * accuracy_score(Y_train,logistic_model.predict(X_train))))
print("Test Accuracy : {:.2f} %".format(100 * accuracy_score(Y_test,logistic_model.predict(X_test))))

cm = confusion_matrix(Y_test,logistic_model.predict(X_test))
classes = ["0","1"]
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=classes)
fig, ax = plt.subplots(figsize=(10,10))
plt.title("Confusion Matrix")
disp = disp.plot(ax=ax)
plt.show()

<h1>Random Forest Model</h1>

In [None]:
random_forest = RandomForestClassifier(n_estimators=590,
                                       random_state=0).fit(X_train,Y_train)
# accuracy_score returns a fraction in [0, 1]; multiply by 100 to print a percentage
print("Train Accuracy : {:.2f} %".format(100 * accuracy_score(Y_train,random_forest.predict(X_train))))
print("Test Accuracy : {:.2f} %".format(100 * accuracy_score(Y_test,random_forest.predict(X_test))))

cm = confusion_matrix(Y_test,random_forest.predict(X_test))
classes = ["0","1"]
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=classes)
fig, ax = plt.subplots(figsize=(10,10))
plt.title("Confusion Matrix")
disp = disp.plot(ax=ax)
plt.show()

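<p><b>Inference : </b>The model performs well, with about 91% test accuracy and relatively few misclassifications. Note that SMOTE was applied before the train/test split, so the test set contains synthetic samples and this figure may be optimistic for real, imbalanced data.</p>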
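<p>Accuracy alone does not show how well the attrition class itself is detected. Precision and recall for class 1 can be read from the same predictions. The labels below are made up for illustration and are not results from this notebook; on the real data <code>y_true</code> and <code>y_pred</code> would be <code>Y_test</code> and a model's predictions on <code>X_test</code>.</p>

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels for illustration only (1 = attrition, 0 = no attrition).
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # share of all predictions that are correct
precision = precision_score(y_true, y_pred)  # share of predicted leavers who really left
recall = recall_score(y_true, y_pred)        # share of real leavers the model found

print("Accuracy  : {:.2f} %".format(100 * accuracy))
print("Precision : {:.2f} %".format(100 * precision))
print("Recall    : {:.2f} %".format(100 * recall))
```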
