
# 🏢 Predicting Employee Attrition: A Decision-Support Tool for HR Strategy

This project uses HR data to identify patterns in employee attrition and predict who is at risk of leaving. It aims to help HR departments proactively improve retention using data-driven insights.


## 📂 Load Dataset

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv("HR_capstone_dataset.csv")
df.head()
```


## 🧹 Clean and Prepare the Data

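```python
# Fix misspelled and inconsistently cased column names
df = df.rename(columns={
    'Work_accident': 'work_accident',
    'average_montly_hours': 'average_monthly_hours',
    'time_spend_company': 'tenure',
    'Department': 'department'
})
# Remove exact duplicate rows so identical records cannot land in both train and test sets
df = df.drop_duplicates()

# Encode salary as ordered integers: low = 0, medium = 1, high = 2
df['salary'] = df['salary'].map({'low': 0, 'medium': 1, 'high': 2})

# One-hot encode department; drop_first=True drops the first label in sorted order as the baseline
df = pd.get_dummies(df, columns=['department'], drop_first=True)
df.head()
```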
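`pd.get_dummies` sorts the category labels before `drop_first=True` removes the first one, and in Python's default string ordering uppercase letters sort before lowercase. Because of that, a label such as `IT` would be dropped ahead of `accounting` or `sales`, and any hand-built model input must contain one dummy per *remaining* department in this sorted order. A minimal sketch on a hypothetical toy column (not the real data) shows which label becomes the all-zeros baseline:

```python
import pandas as pd

# Hypothetical toy data; only the department labels matter here
toy = pd.DataFrame({'department': ['sales', 'IT', 'hr', 'RandD', 'sales']})

# Labels are sorted ('IT' < 'RandD' < 'hr' < 'sales') and the first one is dropped
dummies = pd.get_dummies(toy, columns=['department'], drop_first=True)

print(list(dummies.columns))  # → ['department_RandD', 'department_hr', 'department_sales']
# The 'IT' row (index 1) is the baseline: every remaining dummy is 0/False
print(dummies.loc[1].tolist())
```

The baseline department is therefore represented by all dummies being zero, not by a column of its own.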


## 📊 Visual Insights

```python
# Attrition by salary level (salary is encoded 0 = low, 1 = medium, 2 = high)
plt.figure(figsize=(10, 4))
sns.countplot(data=df, x='salary', hue='left')
plt.xticks([0, 1, 2], ['low', 'medium', 'high'])
plt.title("Attrition by Salary Level")
plt.grid(True)
plt.show()

# Satisfaction vs. attrition (left: 0 = stayed, 1 = left)
plt.figure(figsize=(8, 4))
sns.boxplot(data=df, x='left', y='satisfaction_level')
plt.title("Satisfaction Level by Attrition Status")
plt.grid(True)
plt.show()
```
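A countplot shows raw head-counts, which mixes the size of each salary band with how many people left it. Because `left` is coded 0/1, its mean within a group is the fraction of that group who left, which makes bands of different sizes directly comparable. The sketch below uses hypothetical toy values; on the real data the equivalent call would be `df.groupby('salary')['left'].mean()`.

```python
import pandas as pd

# Hypothetical toy data: salary uses the 0/1/2 (low/medium/high) encoding, left is 0/1
toy = pd.DataFrame({
    'salary': [0, 0, 0, 1, 1, 2],
    'left':   [1, 1, 0, 1, 0, 0],
})

# Mean of a 0/1 column per group = share of that group who left
attrition_rate = toy.groupby('salary')['left'].mean()
print(attrition_rate)
```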


## 🤖 Train the Model

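```python
# Features are every column except the target 'left'
X = df.drop(columns='left')
y = df['left']

# Hold out 20% for testing; stratify=y keeps the stay/leave ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall and F1 on the held-out set, then the confusion matrix
print(classification_report(y_test, y_pred))
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.show()
```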
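`stratify=y` asks `train_test_split` to give the train and test sets the same class proportions as the full data, which matters when one class (here, presumably the leavers) is much rarer than the other. A minimal sketch with a hypothetical 80/20 label vector:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 80 stayers (0) and 20 leavers (1)
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)  # placeholder single feature

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Both splits keep the 20% leaver share: 16 of 80 in train, 4 of 20 in test
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the share of leavers in a small test set can drift by chance, which makes per-class recall in the classification report less reliable.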


## 📌 Feature Importance

```python
importances = pd.Series(model.feature_importances_, index=X.columns)
importances.sort_values().plot(kind='barh', figsize=(10,6), title='Feature Importances')
plt.grid(True)
plt.show()
```
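A random forest's `feature_importances_` are impurity-based: they are computed from training-set statistics and can overstate features with many distinct values, such as continuous hour counts. A common cross-check is permutation importance on held-out data, available as `sklearn.inspection.permutation_importance`. The sketch below uses hypothetical synthetic data in which only the first feature drives the label; on this notebook's data the analogous call would be `permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical data: feature 0 determines the label, feature 1 is pure noise
X = rng.random((400, 2))
y = (X[:, 0] > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle one held-out column at a time and record the average drop in accuracy
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)  # large for feature 0, near zero for feature 1
```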


## 🧠 Predict Attrition for a Hypothetical Employee

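```python
def predict_attrition(model, feature_columns, satisfaction_level, last_evaluation,
                      number_project, average_monthly_hours, tenure, work_accident,
                      promotion_last_5years, salary, department):
    # salary uses the 0/1/2 (low/medium/high) encoding; department is a raw name such as 'sales'
    row = pd.DataFrame(0.0, index=[0], columns=feature_columns)  # training columns, training order
    numeric = {'satisfaction_level': satisfaction_level, 'last_evaluation': last_evaluation,
               'number_project': number_project, 'average_monthly_hours': average_monthly_hours,
               'tenure': tenure, 'work_accident': work_accident,
               'promotion_last_5years': promotion_last_5years, 'salary': salary}
    for col, value in numeric.items():
        row.loc[0, col] = value
    # Set the matching dummy; a name with no dummy (the dropped baseline, or a typo) stays all zeros
    dept_col = f'department_{department}'
    if dept_col in row.columns:
        row.loc[0, dept_col] = 1.0
    proba = model.predict_proba(row)[0]
    print(f"Likelihood of staying: {proba[0]*100:.1f}%, leaving: {proba[1]*100:.1f}%")

# Example call: a sales employee with low satisfaction and a low salary (0)
predict_attrition(model, X.columns, 0.4, 0.6, 4, 180, 3, 0, 0, 0, 'sales')
```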
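When scoring several employees at once, or when new rows arrive from a form or file, another robust way to build model input is to one-hot encode the raw rows and then `reindex` them onto the training columns: dummies that are missing become 0, dummies the model never saw are dropped, and the column order matches training. The frames below are hypothetical toy data. Note that `drop_first=True` must not be used on the new rows, because with a single row it would drop that row's only department dummy.

```python
import pandas as pd

# Hypothetical training frame, encoded the same way as in the cleaning step
train_raw = pd.DataFrame({'salary': [0, 1, 2], 'department': ['IT', 'sales', 'hr']})
train_cols = pd.get_dummies(train_raw, columns=['department'], drop_first=True).columns

# New employee(s) to score, in raw form
new_raw = pd.DataFrame({'salary': [1], 'department': ['sales']})

# Encode WITHOUT drop_first, then align to the training columns:
# absent dummies are filled with 0, unknown ones are dropped, order matches training
new_X = (pd.get_dummies(new_raw, columns=['department'])
           .reindex(columns=train_cols, fill_value=0))
print(new_X)
```

On the real data, the same pattern with `X.columns` in place of `train_cols` yields rows that `model.predict_proba` can score directly. A department name never seen during training silently falls into the baseline, so validating names first is worthwhile.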



## ✅ Conclusion

This notebook demonstrates how HR teams can use data to identify potential attrition risks and take preemptive action. The visualizations and the feature-importance chart help make the model's predictions transparent and actionable.
