# Employee Turnover Analytics

## Project Overview

Portobello Tech, an app innovator, wants to predict employee turnover from historical HR data. This project uses machine learning to analyze the factors that influence employee attrition and to propose retention strategies.

## Data Source

The dataset is sourced from [HR Analytics on Kaggle](https://www.kaggle.com/liujiaqi/hr-comma-sepcsv). Its features include satisfaction level, last evaluation rating, number of projects, average monthly hours, years spent at the company, work accidents, promotions, department, and salary.
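As a minimal sketch of a first data quality check on these features, the snippet below summarizes the type, missing-value count, and number of distinct values for each column. The helper `quality_report`, the column names, and the sample values are all assumptions made for illustration; in practice the frame would come from the downloaded CSV, and the names should be checked against its header.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing-value count, and distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })


# Made-up rows standing in for the real data (hypothetical column names).
# With the real file this would be something like pd.read_csv("<path to CSV>").
sample = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, None, 0.72],
    "number_project": [2, 5, 7, 5],
    "salary": ["low", "medium", "medium", None],
    "left": [1, 1, 1, 0],
})

report = quality_report(sample)
print(report)
```

Columns with a non-zero `missing` count would then need a handling decision, such as dropping rows or imputing values, before the later steps.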

## Project Goals

1. **Data Quality Checks**
   - Identify and handle missing values.
   
2. **Exploratory Data Analysis (EDA)**
   - Determine factors contributing to employee turnover.
   - Visualize correlations and distributions of key features.
   - Analyze project involvement of employees who left vs. stayed.

3. **Clustering Analysis**
   - Cluster employees who left based on satisfaction and evaluation.
   - Interpret clusters to understand employee segments.

4. **Handling Class Imbalance**
   - Use SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance in the dataset.

5. **Model Building and Evaluation**
   - Train logistic regression, random forest, and gradient boosting classifiers.
   - Perform 5-fold cross-validation and evaluate model performance.
   - Select the best model based on evaluation metrics.

6. **Model Performance Metrics**
   - Assess ROC/AUC, confusion matrices, and classification reports.
   - Determine appropriate metrics (e.g., recall or precision) for model evaluation.

7. **Retention Strategies**
   - Predict turnover probabilities using the best model.
   - Categorize employees into risk zones (Safe, Low-Risk, Medium-Risk, High-Risk).
   - Recommend retention strategies tailored to each risk zone.
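The risk-zone step in the last goal can be sketched as a simple binning of predicted turnover probabilities. The cut-off values below (0.2, 0.6, 0.9) and the function name `assign_risk_zone` are assumptions chosen for illustration, since this README does not state the thresholds; they should be tuned to the business context.

```python
import pandas as pd

# Hypothetical thresholds: this README does not specify the actual cut-offs.
ZONE_BINS = [0.0, 0.2, 0.6, 0.9, 1.0]
ZONE_LABELS = ["Safe", "Low-Risk", "Medium-Risk", "High-Risk"]


def assign_risk_zone(probabilities: pd.Series) -> pd.Series:
    """Map turnover probabilities in [0, 1] to one of four risk zones."""
    # Bins are right-inclusive; include_lowest=True keeps a probability of 0.0 in "Safe".
    return pd.cut(
        probabilities,
        bins=ZONE_BINS,
        labels=ZONE_LABELS,
        include_lowest=True,
    )


# Example probabilities, e.g. the positive-class column of predict_proba.
probs = pd.Series([0.0, 0.05, 0.35, 0.75, 0.95, 1.0])
zones = assign_risk_zone(probs)
print(zones.tolist())
```

Each zone can then be paired with its own retention action, with the most intensive interventions reserved for the High-Risk group.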

## Usage Instructions

1. **Data Preparation**
   - Install the required Python libraries (Scikit-Learn, Imbalanced-Learn, Matplotlib, Seaborn).
   - Download the dataset from Kaggle and apply the project's preprocessing steps.

2. **Executing the Analysis**
   - Run scripts/notebooks sequentially for data cleaning, EDA, clustering, and modeling.
   - Review visualizations and analysis outputs to derive insights.

3. **Interpreting Results**
   - Refer to classification reports, ROC curves, and retention strategy recommendations.
   - Adjust strategies based on model predictions and business context.

4. **Conclusion**
   - Summarize findings, implications for employee retention, and future enhancements.

## Contributors

- Priyadharshini Shankar - ML Developer, HR Department

## Acknowledgments

- Kaggle (Data Source)
- Imbalanced-Learn (SMOTE Implementation)
- Scikit-Learn, Matplotlib, Seaborn (Python Libraries)
