# EMPLOYEE ATTRITION CONTROL

### CASE STUDY
The project aims to help a company that is trying to control employee attrition. There are two sets of data: "Existing employees" and "Employees who have left". The following attributes are available for every employee:
- Satisfaction level
- Last evaluation
- Number of projects
- Average monthly hours
- Time spent at the company
- Whether they have had a work accident
- Whether they have had a promotion in the last 5 years
- Department (column `dept` in this data)
- Salary

Use your analytics skills to answer the following questions:
1. What type of employees are leaving?
2. Which employees are prone to leave next?
3. Recommendations

# DATA EXPLORATION

In [1]:
```python
import numpy as np
import pandas as pd
```

In [2]:
```python
train = pd.read_csv(r"C:\Users\DELL\Documents\train.csv")
test = pd.read_csv(r"C:\Users\DELL\Documents\test2.csv")
```

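In [3]:
```python
train.head()
```

| | Emp ID | satisfaction_level | last_evaluation | number_project | average_montly_hours | Work_accident | promotion_last_5years | dept | salary | Status | YearOfRecruitment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.58 | 0.74 | 4 | 215 | 0 | 0 | sales | low | 0 | 2016 |
| 1 | 4 | 0.78 | 0.82 | 5 | 247 | 0 | 0 | sales | low | 0 | 2016 |
| 2 | 5 | 0.49 | 0.6 | 3 | 214 | 0 | 0 | sales | low | 0 | 2017 |
| 3 | 6 | 0.36 | 0.95 | 3 | 206 | 0 | 0 | sales | low | 0 | 2015 |
| 4 | 7 | 0.54 | 0.37 | 2 | 176 | 0 | 0 | sales | low | 0 | 2017 |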


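In [4]:
```python
test.head()
```

| | Emp ID | satisfaction_level | last_evaluation | number_project | average_montly_hours | Work_accident | promotion_last_5years | dept | salary | YearOfRecruitment |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0.82 | 0.67 | 2 | 202 | 0 | 0 | sales | low | 2016 |
| 1 | 3 | 0.45 | 0.69 | 5 | 193 | 0 | 0 | sales | low | 2016 |
| 2 | 8 | 0.99 | 0.91 | 5 | 136 | 0 | 0 | sales | low | 2015 |
| 3 | 10 | 0.74 | 0.64 | 4 | 268 | 0 | 0 | sales | low | 2016 |
| 4 | 13 | 0.48 | 0.94 | 5 | 255 | 0 | 0 | accounting | medium | 2013 |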


In [5]:
```python
train.isnull().mean()
```

```
Emp ID                   0.0
satisfaction_level       0.0
last_evaluation          0.0
number_project           0.0
average_montly_hours     0.0
Work_accident            0.0
promotion_last_5years    0.0
dept                     0.0
salary                   0.0
Status                   0.0
YearOfRecruitment        0.0
dtype: float64
```

In [6]:
```python
test.isnull().mean()
```

```
Emp ID                   0.0
satisfaction_level       0.0
last_evaluation          0.0
number_project           0.0
average_montly_hours     0.0
Work_accident            0.0
promotion_last_5years    0.0
dept                     0.0
salary                   0.0
YearOfRecruitment        0.0
dtype: float64
```
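With `Status` available in `train`, a first pass at question 1 (what type of employees are leaving) is to compare the average of each attribute for leavers and stayers. The sketch below uses a small made-up DataFrame in place of `train.csv`, and it assumes `Status` is 1 for employees who have left and 0 for those still employed; the notebook does not state this coding explicitly.

```python
import pandas as pd

# Hypothetical sample rows standing in for `train`.
# Assumption: Status == 1 means the employee has left, 0 means still employed.
sample = pd.DataFrame({
    "satisfaction_level": [0.38, 0.11, 0.72, 0.80, 0.41, 0.65],
    "number_project": [2, 6, 4, 3, 2, 4],
    "average_montly_hours": [157, 286, 200, 180, 150, 210],
    "promotion_last_5years": [0, 0, 1, 0, 0, 0],
    "Status": [1, 1, 0, 0, 1, 0],
})

# Mean of every numeric attribute, split by whether the employee left;
# transposed so each attribute is a row and each Status value is a column.
profile = sample.groupby("Status").mean(numeric_only=True)
print(profile.T)

# Share of employees in the sample who have left.
attrition_rate = sample["Status"].mean()
print(f"Attrition rate: {attrition_rate:.0%}")
```

On the real data the equivalent call would be `train.groupby('Status').mean(numeric_only=True)`, where `numeric_only=True` skips the text columns `dept` and `salary`.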

In [7]:
```python
# Drop the department column from both frames; it is not used as a model feature.
train.drop(['dept'], axis=1, inplace=True)
test.drop(['dept'], axis=1, inplace=True)
```

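In [8]:
```python
train.head()
```

| | Emp ID | satisfaction_level | last_evaluation | number_project | average_montly_hours | Work_accident | promotion_last_5years | salary | Status | YearOfRecruitment |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.58 | 0.74 | 4 | 215 | 0 | 0 | low | 0 | 2016 |
| 1 | 4 | 0.78 | 0.82 | 5 | 247 | 0 | 0 | low | 0 | 2016 |
| 2 | 5 | 0.49 | 0.6 | 3 | 214 | 0 | 0 | low | 0 | 2017 |
| 3 | 6 | 0.36 | 0.95 | 3 | 206 | 0 | 0 | low | 0 | 2015 |
| 4 | 7 | 0.54 | 0.37 | 2 | 176 | 0 | 0 | low | 0 | 2017 |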


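In [9]:
```python
test.head()
```

| | Emp ID | satisfaction_level | last_evaluation | number_project | average_montly_hours | Work_accident | promotion_last_5years | salary | YearOfRecruitment |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0.82 | 0.67 | 2 | 202 | 0 | 0 | low | 2016 |
| 1 | 3 | 0.45 | 0.69 | 5 | 193 | 0 | 0 | low | 2016 |
| 2 | 8 | 0.99 | 0.91 | 5 | 136 | 0 | 0 | low | 2015 |
| 3 | 10 | 0.74 | 0.64 | 4 | 268 | 0 | 0 | low | 2016 |
| 4 | 13 | 0.48 | 0.94 | 5 | 255 | 0 | 0 | medium | 2013 |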


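In [10]:
```python
# One-hot encode salary. drop_first=True drops the alphabetically first
# salary level, which becomes the all-zero baseline for salary_low/salary_medium.
column = ['salary']
train = pd.get_dummies(train, columns=column, drop_first=True)
test = pd.get_dummies(test, columns=column, drop_first=True)
```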

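In [11]:
```python
train.head()
```

| | Emp ID | satisfaction_level | last_evaluation | number_project | average_montly_hours | Work_accident | promotion_last_5years | Status | YearOfRecruitment | salary_low | salary_medium |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.58 | 0.74 | 4 | 215 | 0 | 0 | 0 | 2016 | 1 | 0 |
| 1 | 4 | 0.78 | 0.82 | 5 | 247 | 0 | 0 | 0 | 2016 | 1 | 0 |
| 2 | 5 | 0.49 | 0.6 | 3 | 214 | 0 | 0 | 0 | 2017 | 1 | 0 |
| 3 | 6 | 0.36 | 0.95 | 3 | 206 | 0 | 0 | 0 | 2015 | 1 | 0 |
| 4 | 7 | 0.54 | 0.37 | 2 | 176 | 0 | 0 | 0 | 2017 | 1 | 0 |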


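In [12]:
```python
test.head()
```

| | Emp ID | satisfaction_level | last_evaluation | number_project | average_montly_hours | Work_accident | promotion_last_5years | YearOfRecruitment | salary_low | salary_medium |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0.82 | 0.67 | 2 | 202 | 0 | 0 | 2016 | 1 | 0 |
| 1 | 3 | 0.45 | 0.69 | 5 | 193 | 0 | 0 | 2016 | 1 | 0 |
| 2 | 8 | 0.99 | 0.91 | 5 | 136 | 0 | 0 | 2015 | 1 | 0 |
| 3 | 10 | 0.74 | 0.64 | 4 | 268 | 0 | 0 | 2016 | 1 | 0 |
| 4 | 13 | 0.48 | 0.94 | 5 | 255 | 0 | 0 | 2013 | 0 | 1 |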
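Because `get_dummies` is called separately on `train` and `test`, the two frames only end up with matching salary columns if both contain the same salary levels, which is the case here. The sketch below uses small hypothetical frames to show what happens when the test data lacks a level, and how `DataFrame.reindex` can align the test columns with the training features.

```python
import pandas as pd

# Hypothetical frames: the test frame happens to contain no "medium" salaries.
train_df = pd.DataFrame({"salary": ["low", "medium", "high", "low"], "Status": [1, 0, 0, 1]})
test_df = pd.DataFrame({"salary": ["low", "high", "low"]})

# drop_first=True removes the alphabetically first level ("high"),
# which becomes the baseline encoded as all-zero dummy columns.
train_enc = pd.get_dummies(train_df, columns=["salary"], drop_first=True)
test_enc = pd.get_dummies(test_df, columns=["salary"], drop_first=True)

print(list(train_enc.columns))  # ['Status', 'salary_low', 'salary_medium']
print(list(test_enc.columns))   # ['salary_low']

# Align the test frame to the training feature columns; missing dummies become 0.
feature_cols = [c for c in train_enc.columns if c != "Status"]
test_enc = test_enc.reindex(columns=feature_cols, fill_value=0)
print(list(test_enc.columns))   # ['salary_low', 'salary_medium']
```

Reindexing also fixes the column order, which matters because `LogisticRegression.predict` matches features by position.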


# DATA MODELLING

In [13]:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
```

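In [14]:
```python
# Features are every column except the Status label (this includes 'Emp ID').
X = train.drop(['Status'], axis=1)
y = train['Status']
# Hold out 30% of the labelled rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```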
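The classes are imbalanced: in the hold-out confusion matrix, roughly three employees stayed for every one who left. Passing `stratify=y` to `train_test_split` keeps that ratio identical in the training and hold-out splits. The notebook's split does not use it, so the sketch below is an optional variant, shown on made-up labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 stayers (0) and 30 leavers (1).
y = np.array([0] * 70 + [1] * 30)
X = np.arange(100).reshape(-1, 1)  # placeholder single feature

# stratify=y preserves the 30% leaver share in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(len(y_tr), len(y_te))   # 70 30
print(y_tr.mean(), y_te.mean())
```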

In [15]:
```python
log = LogisticRegression()
```

In [16]:
```python
log.fit(X_train, y_train)
```

```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
```
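Two details of this fit deserve a closer look. `X` still contains `Emp ID`, which is an identifier rather than an employee attribute, and the features sit on very different scales (satisfaction between 0 and 1, monthly hours in the hundreds, recruitment years around 2015). The printed `solver='warn'` points to an older scikit-learn release; in current releases the default `lbfgs` solver with `max_iter=100` may stop early with a `ConvergenceWarning` on unscaled inputs like these. The sketch below, on synthetic data that borrows a few of the notebook's column names, drops the identifier and standardises the features inside a pipeline. It is a possible variant, not the model the notebook trained, and its score says nothing about the notebook's accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training frame mirroring a few of the notebook's columns.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Emp ID": np.arange(1, n + 1),
    "satisfaction_level": rng.uniform(0.1, 1.0, n),
    "average_montly_hours": rng.integers(96, 311, n),
    "YearOfRecruitment": rng.integers(2010, 2018, n),
})
# Synthetic label: low satisfaction makes leaving (Status == 1) more likely.
noise = rng.normal(0, 0.1, n)
df["Status"] = (df["satisfaction_level"] + noise < 0.45).astype(int)

# Keep the identifier out of the features: an ID number is not an attribute.
X = df.drop(columns=["Emp ID", "Status"])
y = df["Status"]

# Standardising puts hours (~100-300) and years (~2010s) on the same scale
# as satisfaction (0-1), which helps the solver converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))
```

Fitting the scaler inside the pipeline means it learns its means and variances from the training rows only, so the same transformation is reused unchanged at prediction time.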

# DATA PREDICTION

In [17]:
```python
prediction = log.predict(X_test)
```

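In [18]:
```python
# scikit-learn metrics take the true labels first, then the predictions.
accuracy_score(y_test, prediction)
```

```
0.9987361769352291
```

In [19]:
```python
# Rows are actual classes (0, 1); columns are predicted classes (0, 1).
confusion_matrix(y_test, prediction)
```

```
array([[2427,    1],
       [   3,  734]], dtype=int64)
```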
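Accuracy alone hides how the errors split between the classes. The four cells of the matrix give 2,427 stayers correctly predicted to stay, 734 leavers correctly predicted to leave, 3 leavers predicted to stay and 1 stayer predicted to leave (assuming `Status == 1` means "left"). The sketch below rebuilds label arrays with exactly those counts, which are not the real rows, and derives precision and recall for the leaver class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Label arrays rebuilt from the hold-out counts:
# 2427 true negatives, 1 false positive, 3 false negatives, 734 true positives.
y_true = np.array([0] * 2427 + [0] * 1 + [1] * 3 + [1] * 734)
y_pred = np.array([0] * 2427 + [1] * 1 + [0] * 3 + [1] * 734)

# With y_true first, ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2427 1 3 734

precision = precision_score(y_true, y_pred)  # tp / (tp + fp) = 734 / 735
recall = recall_score(y_true, y_pred)        # tp / (tp + fn) = 734 / 737
accuracy = accuracy_score(y_true, y_pred)    # (tn + tp) / 3165, as reported above
print(precision, recall, accuracy)
```

For an attrition model, recall on the leaver class is usually the figure to watch, since a missed leaver is a retention opportunity lost.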

In [20]:
```python
id = test['Emp ID']
```

In [21]:
```python
id.head(30)
```

```
0      2
1      3
2      8
3     10
4     13
5     17
6     18
7     19
8     23
9     29
10    36
11    39
12    40
13    42
14    45
15    46
16    50
17    51
18    54
19    55
20    58
21    59
22    64
23    68
24    71
25    74
26    75
27    76
28    77
29    78
Name: Emp ID, dtype: int64
```

In [22]:
```python
final_prediction = log.predict(test)
```

In [23]:
```python
final_prediction.shape
```

```
(4450,)
```

In [24]:
```python
submission = pd.DataFrame({'Emp ID': id, 'Status': final_prediction})
```

In [26]:
```python
# Write Emp ID and the predicted Status for every current employee to a CSV file.
submission.to_csv('Employee Model.csv', index=False)
```

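In [27]:
```python
submission
```

| | Emp ID | Status |
|---|---|---|
| 0 | 2 | 0 |
| 1 | 3 | 0 |
| 2 | 8 | 0 |
| 3 | 10 | 0 |
| 4 | 13 | 0 |
| 5 | 17 | 0 |
| 6 | 18 | 0 |
| 7 | 19 | 0 |
| 8 | 23 | 0 |
| 9 | 29 | 0 |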
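Question 2 asks which employees are prone to leave next, but `log.predict(test)` returns only a hard 0/1 label per employee. Ranking current employees by the predicted probability of leaving, via `predict_proba`, gives a prioritised list instead. The sketch below assumes `Status == 1` means "left"; the training rows, the satisfaction-only model and the employee IDs 101 to 104 are all hypothetical. With the notebook's objects, the corresponding column would be `log.predict_proba(test)[:, 1]`.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training data with a single feature; assumption: 1 = left.
train_X = pd.DataFrame({"satisfaction_level": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]})
train_y = pd.Series([1, 1, 1, 0, 0, 0])

# Hypothetical current employees to score.
current = pd.DataFrame({
    "Emp ID": [101, 102, 103, 104],
    "satisfaction_level": [0.82, 0.45, 0.99, 0.15],
})

clf = LogisticRegression().fit(train_X, train_y)

# Column 1 of predict_proba is the probability of class 1 (leaving),
# because clf.classes_ is sorted as [0, 1].
current["leave_probability"] = clf.predict_proba(current[["satisfaction_level"]])[:, 1]

# Highest-risk employees first; where to draw the line is a policy choice.
at_risk = current.sort_values("leave_probability", ascending=False)
print(at_risk[["Emp ID", "leave_probability"]])
```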


# Conclusion

In this employee attrition control model, we have seen that many of the company's employees are likely to leave because they have not been promoted in the past 5 years.
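A claim about which factor drives attrition can be checked against the fitted model. In logistic regression, each coefficient is the change in the log-odds of the positive class for a one-unit increase in that feature, holding the others fixed; after standardising, the magnitudes become comparable across features. The sketch below uses synthetic data in which promotion lowers the odds of leaving by construction, so its output illustrates the method only and says nothing about this company. For the notebook's model, `pd.Series(log.coef_[0], index=X_train.columns)` lists the learned coefficients, but they were fitted on unscaled features, so their signs can be read directly while their sizes cannot be compared.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data; assumption: label 1 = left.
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "satisfaction_level": rng.uniform(0.1, 1.0, n),
    "promotion_last_5years": rng.integers(0, 2, n),
})
# Synthetic ground truth: low satisfaction and no promotion raise the odds of leaving.
logit = 2.0 - 5.0 * X["satisfaction_level"] - 1.5 * X["promotion_last_5years"]
p_leave = 1 / (1 + np.exp(-logit.to_numpy()))
y = (rng.uniform(size=n) < p_leave).astype(int)

# Standardise so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(X_scaled, y)

# Positive coefficient raises the odds of leaving; negative lowers them.
coefs = pd.Series(clf.coef_[0], index=X.columns).sort_values()
print(coefs)
print(np.exp(coefs))  # odds ratio per one-standard-deviation increase
```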

# Recommendation

The company should endeavour to promote its employees to reduce the number of employees leaving the company.