| ![salifort_logo-2.png](attachment:salifort_logo-2.png) | 
|--------------------------------------------------------|

# Salifort Motors: Predicting Employee Turnover and Improving Retention

## **Introduction**

Salifort Motors, a leader in alternative energy vehicle manufacturing, has brought me on board as a newly hired data analytics professional. As part of their data team, I am tasked with analyzing employee turnover trends to support the Human Resources department in increasing retention and reducing the high cost of attrition.

In this capstone project, I will put into practice the full range of data analytics skills I’ve developed throughout the course. This includes setting up workflows, conducting exploratory data analysis (EDA), building predictive models, and communicating key insights to stakeholders.

Using Python, I will build and evaluate both statistical and machine learning models—including logistic regression, decision trees, and random forests—to predict whether an employee is likely to leave the company. My ultimate goal is to select a champion model based on performance evaluation and provide actionable recommendations that can help Salifort Motors improve employee satisfaction and retention.

### **Company Background**

Salifort Motors is a fictional French-based company at the forefront of the alternative energy vehicle industry. With a global workforce of over 100,000 employees, Salifort is involved in every stage of the vehicle lifecycle—from research and design to production and distribution. Their innovative focus on electric, solar, algae, and hydrogen-powered vehicles has positioned them as a world leader in sustainable transportation.

Through vertical integration and a commitment to employee development, Salifort Motors aims to build a strong organizational culture. However, a recent rise in employee turnover has prompted leadership to take action. By leveraging data and predictive modeling, I will help the company identify key drivers of attrition and support efforts to foster a more engaged, satisfied, and stable workforce.

### **The Salifort Motors Scenario**

As part of Salifort Motors’ data analytics team, I have been tasked with addressing a growing concern: the high rate of employee turnover. Turnover at Salifort includes both voluntary resignations and involuntary terminations. Leadership is particularly concerned because of the high financial and operational costs associated with employee turnover (churn). The company invests heavily in recruiting, onboarding, and upskilling its workforce, so losing employees means a loss of time, money, and talent.

To better understand this issue, Salifort's Human Resources department conducted an employee survey aimed at uncovering potential drivers behind the departures. Now, it’s my role as a data analytics professional to analyze that data and build a model that can help predict whether an employee is likely to leave. By identifying the key factors that contribute to turnover, the company hopes to improve employee satisfaction and retention, reduce hiring costs, and maintain a stable workforce.

### **Project Scope**

This project focuses on developing a predictive model that helps Salifort Motors proactively address employee turnover. The model will use variables such as department, number of projects, average monthly hours, satisfaction level, and other relevant features to estimate the likelihood of an employee leaving the company.

The project involves the full data analytics lifecycle—starting with exploratory data analysis (EDA) to uncover patterns and trends, followed by the development and evaluation of both statistical and machine learning models. Specifically, I will build a logistic regression model and two tree-based machine learning models: decision tree and random forest. The final step will be to evaluate each model's performance and select a champion model that offers the best predictive accuracy, which can then be used to generate actionable business insights.

By integrating both EDA and model evaluation, this approach ensures the chosen solution is not only statistically sound but also actionable for business decision-makers. Insights from this project will empower HR to make informed, data-driven decisions to support a more engaged and retained workforce.

### **Business Impact**

An effective predictive model will help Salifort Motors identify employees who are at risk of leaving and understand the key reasons driving their decisions. With this knowledge, the company can implement targeted strategies to improve job satisfaction, support professional development, and foster a positive corporate culture.

Ultimately, reducing employee turnover will lead to significant cost savings by minimizing recruitment, training, and onboarding expenses. Moreover, retaining top talent will help maintain productivity and ensure continuity across teams—strengthening Salifort’s position as a leader in the alternative energy vehicle industry.

## **Overview**

### **Capstone Project Overview**

![process.png](attachment:process.png)

The **Google Advanced Data Analytics Capstone Project** serves as the integrative experience of the certificate program, enabling me to apply the full range of skills and knowledge acquired across all previous courses. As seen in the infographic above, the capstone follows a path—starting from certification, progressing through a real-world scenario-based project, and ultimately equipping me to apply these skills in the real-world projects I’m currently working on, as well as in future personal and professional projects.

Throughout this project, I had the opportunity to:

- Gather information related to a real business problem centered on employee retention.  
- Answer data-centric questions using **Python programming**.  
- Conduct thorough **advanced exploratory data analysis (EDA)**.  
- Build and evaluate **predictive models**, including **logistic regression** and machine learning algorithms like **decision trees** and **random forests**.  
- Reflect on and consider ethical implications in data handling and model deployment.  
- Communicate insights in a clear, professional manner to a general audience of stakeholders.  

This capstone not only provided valuable hands-on experience but also demonstrates my ability to approach data projects holistically, from identifying the problem and analyzing the data to building models and presenting results effectively.

### **Project Methodology and Documentation Strategy**

- This project follows the **PACE (Plan, Analyze, Construct, Execute) framework** throughout. Each stage of the project is structured to align with the PACE methodology, which is described in further detail in the **Project Stages Overview** section.

- **Executive Summary**: At both the start and the end of this project, an **executive summary** is provided in the **Important Documents** section. This one-page summary is designed to **communicate essential insights** and **project milestones** to stakeholders at Salifort Motors. It keeps cross-functional and leadership team members up to date, especially those with limited time to review the complete analysis.

- **PACE Strategy Document**: At the beginning and end of each stage, the stage-specific PACE strategy document is linked. Additionally, a **comprehensive PACE strategy document**, which includes all stage-specific PACE documents, is provided in the **Important Documents** section at both the beginning and end of the project. These documents outline my structured approach to each stage and address the key questions necessary for progressing through the project. The **Data Project Questions & Considerations** section within the strategy document is used to deepen analytical thinking and guide all decisions and actions in the current stage. Completing these documents is essential before drafting the executive summary, as it ensures coherent and concise communication of insights.

- Each stage includes the following sequence:
  - **Execute the defined tasks** outlined for the project stage in the main notebook.
  - **Complete the PACE strategy document** to clearly define the stage’s approach and reflect on important considerations.
  - **Create the executive summary** to share findings, analysis, and recommendations with project stakeholders and collaborators.

This structured approach helps ensure **strong project management, thoughtful problem-solving, and effective communication** throughout the employee retention modeling project at Salifort Motors.

## **Stakeholders & Team Members**

**Salifort Motors – Core Stakeholders:**

- **Senior Leadership Team**
  The primary audience for this analysis, they initiated the project due to increasing concern over the rising rate of employee turnover. They have tasked me with analyzing employee data to uncover actionable insights and design a predictive model that identifies employees at risk of departure. Their strategic decisions, based on the model’s insights, will directly influence organizational policies and employee engagement strategies aimed at improving retention and supporting long-term growth.
 
- **HR Department**  
  The Human Resources team is a key collaborator in this project. They provided the dataset, collected through employee surveys, and now seek the expertise of the data analytics team to interpret the results and recommend next steps. As the team responsible for employee satisfaction initiatives, HR will play a central role in validating model outcomes, executing recommended actions, and tracking and reporting on the impact of these interventions on retention. Their partnership ensures that insights are actionable, practical, and aligned with organizational goals.

- **Team Managers**  
  Managers are crucial for applying model insights in daily operations and validating data context. They bring valuable perspectives on employee engagement, stress levels, and morale within their teams. Their feedback helps confirm the timing and accuracy of key variables, ensuring that data used in the model was available before any employee decided to leave or was flagged for termination. This is essential for preventing data leakage and building a robust, predictive model. Managers will also use the findings to tailor interventions that foster better team dynamics and improve employee satisfaction.

### **Effective Communication in Each Stage**

Each stage of the capstone project not only emphasizes technical skill but also sharpens essential **data communication and project management abilities**. Throughout the project, I will:

- **Ask questions** to clarify goals, expectations, and available resources.  
- **Share updates** through well-timed executive summaries for alignment with stakeholders.  
- **Communicate analysis clearly** to both technical and non-technical audiences.  
- **Receive and incorporate feedback** from stakeholders to refine models and strategy.  
- **Foster collaboration** with cross-functional teams to maintain momentum and improve outcomes.

By maintaining strong communication and adhering to the PACE methodology, this project will deliver both analytical rigor and practical value to Salifort Motors’ retention strategy.

## **Project Stages Overview**  

![stages.png](attachment:stages.png)

- **Plan Stage:** Define the scope and objectives of the employee attrition project for Salifort Motors. This includes identifying the informational needs of the HR department and senior leadership, developing a clear project workflow, and outlining key questions to be answered through data analysis and modeling.  

- **Analyze Stage:** Gather and explore the dataset provided by HR, which includes employee-related information such as department, average monthly hours, number of projects, and more. Conduct data cleaning, formatting, and exploratory data analysis (EDA) to understand trends and identify variables that may influence employee turnover.

- **Construct Stage:** Build and evaluate a series of predictive models, beginning with logistic regression as a baseline and advancing to machine learning models such as decision tree, random forest, and XGBoost. Select modeling approaches based on performance metrics and interpretability, ensuring that the data used does not include any post-attrition information to avoid data leakage.

- **Execute Stage:** Present findings and recommendations to stakeholders through visualizations and an executive summary. Summarize the benefits and limitations of each model, explain key drivers of attrition identified in the analysis, and provide actionable next steps for the HR department to improve employee satisfaction and retention. Stakeholder feedback will be incorporated into the final presentation.

Each stage follows the **PACE framework** and includes corresponding **PACE strategy documents, stakeholder communications, and an executive summary** to ensure a clear, structured, and well-documented approach.

## **Project Structure**  
This project is divided into two logical parts to streamline the workflow from initial planning and data exploration through model construction and stakeholder reporting.

| Part      | Stages                                                                 |
|-----------|------------------------------------------------------------------------|
| **Part 1**| - **Plan Stage**<br> - **Analyze Stage**                               |
| **Part 2**| - **Construct Stage**<br> - **Execute Stage**                         |

The split also improves performance when viewing the notebooks on GitHub, since the original single notebook had become too long.

In **Part 1**, I focused on laying the groundwork for the employee turnover project by defining objectives, aligning stakeholders, and conducting exploratory data analysis. This part established a foundation for understanding data quality issues, baseline attrition rates, and key relationships between workplace factors and turnover risk.

![stages_part1.png](attachment:stages_part1.png)

- **Plan Stage – Define Scope & Goals:** Identified HR, senior leadership, and managers as key stakeholders; defined the problem of predicting turnover for proactive retention; and performed initial data exploration to assess quality and bias.

- **Analyze Stage – Exploratory Data Analysis:** Cleaned and formatted the dataset; visualized distributions and correlations (e.g., satisfaction vs. attrition, workload vs. performance); and uncovered insights such as higher turnover in specific departments and a link between low satisfaction or high hours and increased attrition.
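
As a rough illustration of the Analyze-stage checks summarized above, the sketch below computes a baseline attrition rate, a per-department breakdown, and average satisfaction by outcome using pandas. The rows are invented purely for illustration and are not results from the HR dataset; only the column names (`department`, `satisfaction_level`, `left`) come from this project.

```python
import pandas as pd

# Invented rows for illustration only; NOT taken from the real HR dataset.
df = pd.DataFrame({
    "department": ["sales", "sales", "sales", "technical",
                   "technical", "hr", "hr", "hr"],
    "satisfaction_level": [0.11, 0.40, 0.75, 0.90, 0.82, 0.38, 0.66, 0.70],
    "left": [1, 1, 0, 0, 0, 1, 0, 0],
})

# Baseline attrition rate: the share of rows where `left` equals 1.
baseline_rate = df["left"].mean()

# Attrition rate per department, highest first.
rate_by_department = (
    df.groupby("department")["left"].mean().sort_values(ascending=False)
)

# Average satisfaction for employees who stayed (0) versus left (1).
mean_satisfaction = df.groupby("left")["satisfaction_level"].mean()

print(f"Baseline attrition rate: {baseline_rate:.1%}")
print(rate_by_department)
print(mean_satisfaction)
```

On the real data, the same three summaries would be a starting point for the department-level and satisfaction-related patterns described in the Analyze Stage.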

For **Part 2** of the project, refer to:

[Salifort_Motors_Turnover_Part2_Construct_and_Execute_Includes_Modeling.ipynb](https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Project%20Parts/Salifort_Motors_Turnover_Part2_Construct_and_Execute_Includes_Modeling.ipynb)

https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Project%20Parts/Salifort_Motors_Turnover_Part2_Construct_and_Execute_Includes_Modeling.ipynb

### **Understanding the PACE Framework**
<img src="https://i.ibb.co/tpvBbjDJ/PACE.png" alt="PACE_workflow" align=left style="margin-right: 15px;">

The **PACE** framework (**Plan, Analyze, Construct, Execute**) provides a structured yet flexible approach to managing data analysis projects. It ensures clear organization, facilitates communication, and supports an iterative workflow by allowing revisiting of previous stages without disruption. Communication is essential throughout PACE, much like electricity flowing through a circuit, enabling continuous collaboration—whether through asking questions, gathering data, updating stakeholders, or presenting findings and receiving feedback.

#### **Importance of Workflow Structures** 
Large-scale data projects require structured workflows to manage tasks efficiently. Identifying potential blockers early in the process allows for better resource planning and minimizes disruptions. A well-defined workflow promotes **efficiency, collaboration, and streamlined decision-making**.

#### **Applying PACE to This Project**  
The **project as a whole is divided into four primary stages** based on the **PACE framework**: **Plan, Analyze, Construct, and Execute**. Within each of these main stages, the work is further organized into one or more **milestones** to provide a clear, structured progression through the project. Each **milestone task** is then individually categorized into one or more **PACE stages**, as outlined in the **project proposal**. 

This means that while a milestone exists within a specific main project stage, its component tasks may align with multiple PACE stages to reflect the nuanced, iterative nature of real-world data analytics projects. This structure ensures a balanced, organized workflow that maintains alignment with both the **PACE methodology** and the specific requirements of the **Salifort Motors Employee Turnover Prediction Project**.

For example, **Milestone 1 of the Plan Stage** falls entirely under the **Plan** stage. Two visual cues mark this mapping. First, an **image representing the primary PACE stage** is placed at the beginning of each main stage; for instance, the **Plan** image appears at the start of the **Plan Stage**. Second, each milestone heading ends with a note naming the PACE stage(s) its tasks belong to, so the heading for Milestone 1 ends with **Plan**. As outlined in the **project proposal available in the Important Documents section**, each milestone is mapped to its corresponding **PACE stage(s)**, keeping the approach consistent and methodical throughout the project.

### **Overview of PACE Stages**  

Let’s take a closer look at each stage of the PACE model, along with the images that represent each stage:

<img src="https://i.ibb.co/TxW1pmpM/Plan.png" alt="Plan" align=left style="margin-right: 15px;">

#### **Plan**  


This stage establishes a solid foundation by defining the project scope, gathering requirements, and setting objectives. Key activities include:  
- Researching business data  
- Defining project scope  
- Developing a workflow  
- Assessing stakeholder needs  

<img src="https://i.ibb.co/bjR9hZ07/Analyze.png" alt="Analyze" align=left style="margin-right: 15px;">

#### **Analyze**  

Here, data is acquired, cleaned, and explored through **exploratory data analysis (EDA)**. Key activities include:  
- Data collection and formatting  
- Handling missing values and inconsistencies  
- Performing initial statistical analysis  

<img src="https://i.ibb.co/fgz5hQq/Construct.png" alt="Construct" align=left style="margin-right: 15px;">

#### **Construct**

This stage focuses on building and refining models, often incorporating machine learning techniques. Key activities include:  
- Selecting modeling approaches  
- Building and training models  
- Evaluating model performance  

<img src="https://i.ibb.co/rfcQGQ6s/Execute.png" alt="Execute" align=left style="margin-right: 15px;">

#### **Execute**  

Findings are communicated to stakeholders, incorporating feedback and refining outputs. Key activities include:  
- Presenting results  
- Addressing stakeholder feedback  
- Finalizing reports  

#### **Communication in PACE**  
Effective **communication** is essential across all PACE stages. Whether clarifying project goals, presenting findings, or incorporating feedback, ongoing dialogue ensures alignment and enhances decision-making.

#### **Adaptability of PACE**  
While PACE is presented sequentially, real-world projects require flexibility. It’s common to revisit earlier stages as new insights emerge. This adaptability prepares professionals for dynamic, evolving data projects.

By following the **PACE framework**, I will structure this project efficiently, ensuring each stage progresses smoothly while maintaining the flexibility to adapt as needed.

## **Important Documents**

**Project Proposal**

A structured document that organizes tasks into clear milestones, maps them to the PACE framework, and outlines stakeholder roles and key deliverables for the project.

- [Project Proposal](https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/Salifort%20Motors%20Employee%20Turnover%20Prediction%20Project%20Proposal.pdf)

https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/Salifort%20Motors%20Employee%20Turnover%20Prediction%20Project%20Proposal.pdf

**Executive Summary**

A one-page document summarizing key insights, findings, and milestones for Salifort Motors, keeping leadership and teams aligned on actionable outcomes.

- [Executive Summary](https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/Salifort%20Project%20-%20Executive%20summary.pdf)

https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/Salifort%20Project%20-%20Executive%20summary.pdf

**PACE Strategy Document**

A consolidated document outlining my structured approach, the key project questions, and my decisions at each stage to ensure clear, objective-driven progress.

- [All Stages - PACE Strategy Document](https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/All%20Stages%20-%20PACE%20strategy%20document.pdf)

https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/All%20Stages%20-%20PACE%20strategy%20document.pdf

## **Dataset Structure**

This dataset, titled **HR_capstone_dataset.csv**, contains information on **15,000 employees** at a multinational vehicle manufacturing company. Each row represents a unique employee and captures their **attrition status** through the `left` variable. It includes **performance and workload indicators** such as `last_evaluation` (performance score), `number_project` (projects handled), and `average_monthly_hours` (average hours worked per month). **Tenure information** is provided via `time_spend_company` (years at the company). The dataset also includes **binary events**, such as `work_accident` and `promotion_last_5years`, and **categorical attributes** like `department` (functional area) and `salary` (low, medium, high), representing the employee's pay scale.

| Column Name             | Type    | Description                                                             |
|-------------------------|---------|-------------------------------------------------------------------------|
| satisfaction_level      | float64 | The employee’s self-reported satisfaction level [0–1]                   |
| last_evaluation         | float64 | Score of employee's last performance review [0–1]                       |
| number_project          | int64   | Number of projects employee contributes to                              |
| average_monthly_hours   | int64   | Average number of hours employee worked per month                       |
| time_spend_company      | int64   | How long the employee has been with the company (in years)              |
| work_accident           | int64   | Whether or not the employee experienced an accident while at work       |
| left                    | int64   | Whether or not the employee left the company                            |
| promotion_last_5years   | int64   | Whether or not the employee was promoted in the last 5 years            |
| department              | str     | The employee's department                                               |
| salary                  | str     | The employee's salary (low, medium, or high)                            |

This dataset offers a structured view of employee characteristics and outcomes, forming the foundation for analysis and predictive modeling in the employee turnover project.
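
Before any analysis, it can help to confirm that a loaded DataFrame actually matches the structure described above. The following is a minimal sketch of such a check, assuming pandas is available. The `check_schema` helper and the constant names are hypothetical and introduced only for illustration, and the two sample rows are made up rather than taken from `HR_capstone_dataset.csv`.

```python
import pandas as pd

# Column names as listed in the table above.
EXPECTED_COLUMNS = [
    "satisfaction_level", "last_evaluation", "number_project",
    "average_monthly_hours", "time_spend_company", "work_accident",
    "left", "promotion_last_5years", "department", "salary",
]
BINARY_COLUMNS = ["work_accident", "left", "promotion_last_5years"]
UNIT_RANGE_COLUMNS = ["satisfaction_level", "last_evaluation"]
SALARY_LEVELS = {"low", "medium", "high"}


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable schema problems (empty if none)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        # Later checks need every column, so stop at the first failure.
        return [f"missing columns: {missing}"]
    problems = []
    for col in BINARY_COLUMNS:
        if not df[col].isin([0, 1]).all():
            problems.append(f"{col} has values other than 0/1")
    for col in UNIT_RANGE_COLUMNS:
        if not df[col].between(0, 1).all():
            problems.append(f"{col} has values outside [0, 1]")
    unknown = set(df["salary"]) - SALARY_LEVELS
    if unknown:
        problems.append(f"unexpected salary levels: {sorted(unknown)}")
    return problems


# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80],
    "last_evaluation": [0.53, 0.86],
    "number_project": [2, 5],
    "average_monthly_hours": [157, 262],
    "time_spend_company": [3, 6],
    "work_accident": [0, 0],
    "left": [1, 0],
    "promotion_last_5years": [0, 0],
    "department": ["sales", "technical"],
    "salary": ["low", "medium"],
})

problems = check_schema(sample)
print(problems or "schema looks as described")
```

In the notebook itself, the same helper could be applied to the result of `pd.read_csv("HR_capstone_dataset.csv")`. If the raw file's headers differ from the names in the table, they would be reported as missing columns until renamed.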

---

# **Part 1: Plan and Analyze**  
---

<img src="https://i.ibb.co/TxW1pmpM/Plan.png" alt="Plan" align=left style="margin-right: 15px;">

# **Plan Stage**

### **Introduction**

In the **Plan stage**, as a data analyst on **Salifort Motors’ data team**, I will develop a **project proposal** based on the business challenge presented by the leadership team: predicting employee turnover and identifying actionable insights to improve retention. I will define **realistic milestones** for the required data analytical tasks to ensure a structured and methodical approach to the **employee turnover prediction project**. I will complete both the **Salifort Motors Employee Turnover Project Proposal** and the **Plan - PACE strategy document** to formally launch the project in an organized, thoughtful manner.

### **Task**

For this first task, I will create a **project proposal** that outlines clear **milestones** for the employee turnover prediction project. While planning the deliverable, I will carefully consider how each milestone and its associated tasks align with the **PACE (Plan, Analyze, Construct, Execute)** framework, as detailed in the **Project Stages Overview**. Each milestone task will be categorized into one or more PACE stages, as specified in the project proposal.

This initial stage will set the foundation for the project by establishing a clear, structured roadmap for efficient project management and data analysis execution.

### **Overview**

In this stage, I will demonstrate my understanding of a complete, structured data analytics workflow by developing a **project proposal**. This proposal will outline essential tasks, expected deliverables, and project milestones to guide the **Salifort Motors Employee Turnover Prediction Project**. To support a structured, transparent approach, I will also complete the **PACE strategy document**—a resource designed to document my task breakdown, strategic reasoning, and reflections as I work through each stage of the project.

### **Project Background**

**Salifort Motors’ data team** is in the early planning stages of the **employee turnover prediction project**. Before beginning any data analysis, the team requires:

- A detailed, structured **project proposal** that:
  - Divides project tasks into clearly defined **milestones**.
  - Categorizes each task according to the **PACE framework**.
  - Outlines stakeholder roles and expectations throughout the project.

This plan stage will ensure that the project proceeds in a logical, well-documented, and efficient manner while maintaining alignment with organizational goals and team communication best practices.

### **Stage 1 Tasks**
- Assign **PACE stages** to each requested task in the employee turnover prediction project.
- Organize project tasks into realistic, manageable **milestones**.
- Develop a **formal project proposal** for the Salifort Motors data team and leadership.


### **Stage 1 Deliverables**

This stage will allow me to apply essential **project planning** and communication skills by completing:

- **Project Proposal** – A comprehensive document outlining the project’s core tasks, milestones, deliverables, and stakeholder considerations.
- **Plan - PACE Strategy Document** – A structured planning tool that captures my task approach, decision rationale, and insights at each stage of the turnover prediction project.

### **Review the PACE Strategy Document**

📎 [Plan – PACE Strategy Document](https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/Plan%20Stage%20-%20PACE%20Strategy%20Document.pdf)

https://github.com/Cyberoctane29/Salifort-Motors-Predicting-Employee-Turnover-and-Improving-Retention/blob/main/Salifort%20Motors%20Predicting%20Employee%20Turnover%20and%20Improving%20Retention%20Documents/Plan%20Stage%20-%20PACE%20Strategy%20Document.pdf

## Milestone 1: Understand the business scenario, define the problem, and prepare the project proposal - Plan