# Attrition Prediction App

What is **attrition**?
Attrition is the turnover rate of a company's employees. It is measured as a percentage of the total workforce, calculated over a specific period such as a month, a quarter, or a year.
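
As a rough sketch of that definition, the rate for one period can be computed as departures divided by headcount. The function name `attrition_rate` and the use of average headcount as the denominator are assumptions made for illustration; some organizations divide by the opening headcount instead.

```python
def attrition_rate(departures: int, headcount_start: int, headcount_end: int) -> float:
    """Return attrition for one period as a percentage of average headcount."""
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return 100.0 * departures / average_headcount


# Hypothetical quarter: 200 employees at the start, 190 at the end, 15 departures.
quarterly_rate = attrition_rate(15, 200, 190)
print(f"{quarterly_rate:.2f}%")  # prints 7.69%
```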

Reasons for **attrition**:
- Employees looking for better opportunities
- A negative work environment
- Lack of work-life balance
- Bad management
- Employee sickness

### Problem Statement
Create a web app that takes employee details as input from a user online and predicts whether the employee will quit.

### Dataset Description
The dataset contains the following attributes:
1. satisfaction_level - A numerical score of the employee's overall satisfaction
2. last_evaluation - The score of the employee's last evaluation
3. number_project - The number of projects handled by the employee
4. average_montly_hours - Average monthly hours worked by the employee
5. time_spend_company - Years the employee has spent at the company
6. Work_accident - Whether the employee had a workplace accident (1=Yes/0=No)
7. quit - Whether the employee has quit the job (1=Yes/0=No)
8. promotion_last_5years - Whether the employee received a promotion in the last 5 years (1=Yes/0=No)
9. department - Department of the employee
10. salary - Salary level of the employee (low, medium, or high)

### Tasks to be performed
1. Import necessary libraries
2. Data Exploration
3. Data Cleaning
4. EDA
5. Data Preprocessing
6. Fit and evaluate the model
7. Optimize the model
8. Interpret the model
9. Deploy the model


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
import pickle
import streamlit 

In [3]:
data=pd.read_csv('employee_data .csv')
data.head()

Unnamed: 0,satisfaction_level,last_evaluation,number_project,average_montly_hours,time_spend_company,Work_accident,quit,promotion_last_5years,department,salary
0,0.38,0.53,2,157,3,0,1,0,sales,low
1,0.8,0.86,5,262,6,0,1,0,sales,medium
2,0.11,0.88,7,272,4,0,1,0,sales,medium
3,0.72,0.87,5,223,5,0,1,0,sales,low
4,0.37,0.52,2,159,3,0,1,0,sales,low


In [4]:
data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
satisfaction_level,14999.0,0.612834,0.248631,0.09,0.44,0.64,0.82,1.0
last_evaluation,14999.0,0.716102,0.171169,0.36,0.56,0.72,0.87,1.0
number_project,14999.0,3.803054,1.232592,2.0,3.0,4.0,5.0,7.0
average_montly_hours,14999.0,201.050337,49.943099,96.0,156.0,200.0,245.0,310.0
time_spend_company,14999.0,3.498233,1.460136,2.0,3.0,3.0,4.0,10.0
Work_accident,14999.0,0.14461,0.351719,0.0,0.0,0.0,0.0,1.0
quit,14999.0,0.238083,0.425924,0.0,0.0,0.0,0.0,1.0
promotion_last_5years,14999.0,0.021268,0.144281,0.0,0.0,0.0,0.0,1.0


In [5]:
data.describe(include='O')

Unnamed: 0,department,salary
count,14999,14999
unique,10,3
top,sales,low
freq,4140,7316


In [6]:
data.dtypes

satisfaction_level       float64
last_evaluation          float64
number_project             int64
average_montly_hours       int64
time_spend_company         int64
Work_accident              int64
quit                       int64
promotion_last_5years      int64
department                object
salary                    object
dtype: object

In [7]:
data.duplicated().sum()

3008

In [9]:
data.quit.unique()

array([1, 0], dtype=int64)

In [10]:
data.shape

(14999, 10)

In [12]:
df=data.drop_duplicates(keep='first')

In [13]:
df.shape

(11991, 10)
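
The counts above are consistent: 14999 rows minus 3008 duplicates leaves 11991. The same check can be sketched on a toy frame; the values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame (hypothetical values) with one exact duplicate row.
toy = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.38],
    "number_project": [2, 5, 2],
    "quit": [1, 1, 1],
})

n_duplicates = int(toy.duplicated().sum())   # rows identical to an earlier row
deduped = toy.drop_duplicates(keep="first")  # keep the first occurrence of each row

print(n_duplicates, toy.shape, deduped.shape)
```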

### Preprocessing

In [14]:
df.head(2)

Unnamed: 0,satisfaction_level,last_evaluation,number_project,average_montly_hours,time_spend_company,Work_accident,quit,promotion_last_5years,department,salary
0,0.38,0.53,2,157,3,0,1,0,sales,low
1,0.8,0.86,5,262,6,0,1,0,sales,medium


In [15]:
X=df.drop('quit', axis=1)
y=df.quit

In [16]:
features=pd.get_dummies(X, dtype=float)
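
The column order that `pd.get_dummies` produces matters later, because the app has to pass values to the model in the same order. A small sketch on a toy frame (hypothetical rows, only three of the real columns) shows the pattern: numeric columns stay in place, and each categorical column is expanded into one column per category in sorted order.

```python
import pandas as pd

# Toy frame (hypothetical rows) with one numeric and two categorical columns.
toy_X = pd.DataFrame({
    "number_project": [2, 5, 7],
    "department": ["sales", "IT", "hr"],
    "salary": ["low", "medium", "high"],
})

toy_features = pd.get_dummies(toy_X, dtype=float)
print(list(toy_features.columns))
```

Uppercase names sort before lowercase ones, which is why `department_IT` comes ahead of `department_hr`.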

In [17]:
output, uniques=pd.factorize(y)
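
`pd.factorize` assigns integer codes in order of first appearance, so the codes need not match the original labels. A minimal sketch with a made-up series, in which the label 1 appears first (as `data.quit.unique()` suggests is the case here):

```python
import pandas as pd

# Hypothetical target values; 1 is seen before 0.
toy_y = pd.Series([1, 1, 0, 1, 0], name="quit")

codes, toy_uniques = pd.factorize(toy_y)
print(codes)        # integer codes, assigned in order of first appearance
print(toy_uniques)  # original labels, indexed by code

# Map codes back to the original labels.
decoded = toy_uniques[codes]
```

Under that assumption, code 0 stands for the original label 1 (quit) and code 1 for the original label 0 (stayed).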

In [18]:
x_train, x_test, y_train, y_test=train_test_split(features, output, test_size=0.3)
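
The split above is random on every run, so the metrics can vary slightly between runs. One possible variation, sketched here on synthetic stand-in data, fixes the seed and stratifies on the target so both parts keep the same class ratio. The seed value and the synthetic arrays are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 100 rows, 20% positive class.
rng = np.random.default_rng(0)
toy_features = rng.random((100, 3))
toy_output = np.array([1] * 20 + [0] * 80)

tr_X, te_X, tr_y, te_y = train_test_split(
    toy_features, toy_output,
    test_size=0.3,
    random_state=42,      # fixed seed (arbitrary choice) so the split is repeatable
    stratify=toy_output,  # keep the class ratio the same in both parts
)
print(tr_y.mean(), te_y.mean())
```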

In [19]:
from sklearn.ensemble import RandomForestClassifier
# Fit a random forest with default hyperparameters
rfc=RandomForestClassifier()
rfc.fit(x_train, y_train)
# Evaluate on the held-out 30% of the data
y_pred=rfc.predict(x_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.98      0.91      0.94       612
           1       0.98      1.00      0.99      2986

    accuracy                           0.98      3598
   macro avg       0.98      0.95      0.97      3598
weighted avg       0.98      0.98      0.98      3598
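
The rows of the report are labelled with the factorized codes, not the original `quit` values. Assuming the codes follow the order returned by `data.quit.unique()` (1 first, then 0), row 0 would be the employees who quit and row 1 those who stayed, which fits the smaller support of row 0. A sketch of making this explicit with `target_names`, using made-up predictions:

```python
from sklearn.metrics import classification_report

# Hypothetical mapping and predictions: code 0 maps to the original label 1
# (quit) and code 1 maps to the original label 0 (stayed).
uniques_demo = [1, 0]
y_true_demo = [0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 1, 1, 1, 1, 1]

names = [f"quit={label}" for label in uniques_demo]  # name for code 0, then code 1
report = classification_report(y_true_demo, y_pred_demo, target_names=names)
print(report)
```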



In [20]:
# Save the trained classifier; the file is closed automatically
with open('rfc.pickle', 'wb') as model:
    pickle.dump(rfc, model)

In [21]:
# Save the code-to-label mapping returned by pd.factorize
with open('map.pickle', 'wb') as map_pickle:
    pickle.dump(uniques, map_pickle)

In [22]:
%%writefile att.py
import streamlit as st
import pickle

# Load the trained classifier and the code-to-label mapping saved by the notebook
with open('rfc.pickle', 'rb') as model:
    clf=pickle.load(model)

with open('map.pickle', 'rb') as map_pickle:
    unique_mapping=pickle.load(map_pickle)

satisfaction_level=st.number_input('satisfaction_level', 0.0, 1.0, 0.0)
last_evaluation=st.number_input('last_evaluation', 0.0, 1.0, 0.0)
number_project= st.number_input('number_project', 2,7,2)
average_montly_hours=st.number_input('average_montly_hours',96, 310, 96)
time_spend_company=st.number_input('time_spend_company', 2, 10, 2)
Work_accident=st.selectbox('Work_accident', options=[0,1])
promotion_last_5years=st.selectbox('promotion_last_5years', options=[0,1])
department=st.selectbox('department', options=['sales', 'accounting', 'hr', 'technical', 'support', 'management',
       'IT', 'product_mng', 'marketing', 'RandD'])
salary=st.selectbox('salary', options=['low', 'medium', 'high'])

# One-hot flags for the categorical inputs; all start at 0
department_IT, department_RandD, department_accounting, department_hr, department_management, department_marketing, department_product_mng, department_sales, department_support, department_technical=0,0,0,0,0,0,0,0,0,0

salary_high,salary_low,salary_medium=0,0,0

# Set the flag that matches the selected salary level
if salary=='high':
    salary_high=1
elif salary=='medium':
    salary_medium=1
elif salary=='low':
    salary_low=1

# Set the flag that matches the selected department
if department=='sales':
    department_sales=1
elif department=='accounting':
    department_accounting=1
elif department=='hr':
    department_hr=1
elif department=='technical':
    department_technical=1
elif department=='support':
    department_support=1
elif department=='management':
    department_management=1
elif department=='IT':
    department_IT=1
elif department=='product_mng':
    department_product_mng=1
elif department=='marketing':
    department_marketing=1
elif department=='RandD':
    department_RandD=1
    
new_pred=clf.predict([[satisfaction_level, last_evaluation, number_project,
       average_montly_hours, time_spend_company, Work_accident,
       promotion_last_5years, department_IT, department_RandD,
       department_accounting, department_hr, department_management,
       department_marketing, department_product_mng, department_sales,
       department_support, department_technical, salary_high,
       salary_low, salary_medium]])

prediction=unique_mapping[new_pred][0]
st.write(prediction)

Writing att.py
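
The long `if`/`elif` chains in `att.py` could also be replaced by a small helper that builds the feature row by name. The following is a sketch only: `FEATURE_ORDER` copies the order passed to `clf.predict` above, while the helper name `build_row` and the example values are hypothetical.

```python
# Column order as passed to clf.predict in att.py.
FEATURE_ORDER = [
    "satisfaction_level", "last_evaluation", "number_project",
    "average_montly_hours", "time_spend_company", "Work_accident",
    "promotion_last_5years", "department_IT", "department_RandD",
    "department_accounting", "department_hr", "department_management",
    "department_marketing", "department_product_mng", "department_sales",
    "department_support", "department_technical", "salary_high",
    "salary_low", "salary_medium",
]


def build_row(numeric: dict, department: str, salary: str) -> list:
    """Return one feature row in FEATURE_ORDER with the one-hot flags set."""
    row = {name: 0.0 for name in FEATURE_ORDER}
    row.update({name: float(value) for name, value in numeric.items()})
    for key in (f"department_{department}", f"salary_{salary}"):
        if key not in row:
            raise ValueError(f"unknown category column: {key}")
        row[key] = 1.0
    return [row[name] for name in FEATURE_ORDER]


# Hypothetical input resembling the first row of the dataset.
example = build_row(
    {"satisfaction_level": 0.38, "last_evaluation": 0.53, "number_project": 2,
     "average_montly_hours": 157, "time_spend_company": 3, "Work_accident": 0,
     "promotion_last_5years": 0},
    department="sales", salary="low",
)
print(example)
```

In the app, the result would be passed as `clf.predict([example])`. The app itself is started from a terminal with `streamlit run att.py`.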


# END