---

# **Part 2: Construct and Execute Stages**  
---

### **Transition from Part 1**  
In **Part 1: Plan and Analyze Stages**, I established the project scope, aligned with business objectives, prepared and cleaned the dataset, and performed comprehensive exploratory data analysis (EDA) to uncover key patterns in employee attrition. This phase provided insights into workload, satisfaction, and turnover trends.

Now, in **Part 2**, I will transition from **EDA** to **predictive modeling**, implementing both **statistical analysis techniques** and **machine learning methods**. This stage will focus on developing and evaluating models, including multiple logistic regression and machine learning models like Decision Tree and Random Forest, to predict employee turnover risk. I will also compare model performance and select the best-performing model for turnover prediction.

### **Objectives of Part 2** 

![stages_part2.png](attachment:stages_part2.png)

- **Construct Stage – Statistical Analysis and Regression Modeling:** Build and evaluate a series of predictive models, starting with logistic regression as a baseline and progressing to machine learning models such as decision trees and random forests. Select modeling approaches based on performance metrics and interpretability, and confirm that the data contains no post-attrition information, to avoid data leakage.

- **Execute Stage – Machine Learning Models:** Present findings and recommendations to stakeholders through visualizations and an executive summary. Highlight the benefits and limitations of each model, explain the key drivers of employee attrition, and provide actionable recommendations for HR to improve retention and satisfaction. Incorporate stakeholder feedback to refine the final recommendations.

These stages will be crucial in refining the classification model and ensuring robust performance for employee turnover detection.

### **Looking Back: Part 1**  
To explore the project foundation (dataset structuring, advanced exploratory data analysis, and key data visualizations), refer to:

➡ **[Part 1: Salifort_Motors_Turnover_Part1_Plan_and_Analyze_Includes_EDA.ipynb](https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Project%20Parts/Salifort_Motors_Turnover_Part1_Plan_and_Analyze_Includes_EDA.ipynb)**  


---

<img src="https://i.ibb.co/fgz5hQq/Construct.png" alt="Construct" align=left style="margin-right: 15px;">

# **Construct Stage**

## Milestone 1: Determine which models are most appropriate - Analyze and Construct

### Import packages

I will start by importing the required Python libraries:

- **pandas** for data manipulation and analysis  
- **numpy** for numerical computations and handling arrays  
- **matplotlib** and **seaborn** for data visualization and creating plots  
- **scikit-learn** estimators for modeling: **LogisticRegression**, **DecisionTreeClassifier**, and **RandomForestClassifier**
- **train_test_split** for splitting the data and **GridSearchCV** for hyperparameter tuning
- **scikit-learn metrics** such as **accuracy_score**, **precision_score**, **recall_score**, **f1_score**, and **roc_auc_score** for evaluating the models
- **pickle** for saving and loading models

**Versions:**  
- **NumPy:** 2.2.4  
- **Pandas:** 2.2.3  
- **Matplotlib:** 3.10.1  
- **Seaborn:** 0.13.2
- **Scikit-learn:** 1.6.1
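
Before running the notebook, it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch rather than part of the original analysis: the helper name `check_versions` is hypothetical, and the package names are assumed to be the standard PyPI distribution names.

```python
import importlib.metadata as md

# Versions listed in this notebook, treated here as the expected environment.
# Keys are assumed to be the PyPI distribution names.
EXPECTED_VERSIONS = {
    "numpy": "2.2.4",
    "pandas": "2.2.3",
    "matplotlib": "3.10.1",
    "seaborn": "0.13.2",
    "scikit-learn": "1.6.1",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}.

    Hypothetical helper for illustration only.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = md.version(name)
        except md.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions(EXPECTED_VERSIONS).items():
    status = "OK" if ok else "differs"
    print(f"{name}: installed={installed}, expected={EXPECTED_VERSIONS[name]} ({status})")
```

A mismatch does not necessarily break the notebook, but results such as tuned hyperparameters or plots may differ slightly across versions.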

In [None]:
# For data manipulation
import numpy as np
import pandas as pd

# For data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# For displaying all of the columns in dataframes
pd.set_option('display.max_columns', None)


# For data modeling
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# For metrics and helpful functions
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report,
                             roc_auc_score, roc_curve, precision_recall_curve,
                             average_precision_score, auc, ConfusionMatrixDisplay)

# For visualizing fitted decision trees
from sklearn.tree import plot_tree

# For saving models
import pickle
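
To illustrate how the imported pieces fit together, here is a minimal sketch of the split, tune, evaluate, and save workflow. It is not the model built later in this notebook: the data is synthetic (generated with `make_classification`), and the parameter grid, the F1 scoring choice, and all variable names are assumptions chosen only to keep the example small.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced binary data standing in for the HR dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Hold out a stratified test set so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; a real search would typically cover more values.
param_grid = {"max_depth": [3, 5], "n_estimators": [50, 100]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best estimator on the held-out test set.
y_pred = search.best_estimator_.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {accuracy:.3f}, test F1: {f1:.3f}")

# Round-trip the fitted model through pickle (in memory here; writing the
# bytes to a file works the same way with pickle.dump / pickle.load).
restored = pickle.loads(pickle.dumps(search.best_estimator_))
```

Pickling the fitted estimator avoids repeating the grid search in later sessions. Pickle files should only be loaded from trusted sources, ideally with the same scikit-learn version that created them.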