## Understanding the Project Scenario

**Problem Statement:**
Salifort Motors is facing a high employee turnover rate, leading to increased costs and decreased productivity. The leadership team seeks to understand the underlying factors contributing to turnover and develop strategies to improve retention.

**Objective:**
Build a model that predicts whether an employee will leave the company, using factors such as department, number of projects, average monthly hours, tenure, and satisfaction level.

**Data:**
* Employee survey data: self-reported satisfaction, last evaluation score, workload, tenure, promotion history, department, and salary band.
* Relevant variables: department, number of projects, average monthly hours, and tenure are the obvious starting points; others may prove useful during exploration.

**Model Approach:**
* **Statistical Model:** Logistic regression suits the binary outcome (leave or stay) and yields coefficients that are easy to interpret.
* **Machine Learning Models:** Decision trees, random forests, and XGBoost can capture non-linear relationships and interactions between variables, which may improve predictive performance.

**Evaluation:**
* Use appropriate metrics such as accuracy, precision, recall, F1-score, and AUC-ROC to assess the model's performance.
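
As a minimal sketch of how these metrics could be computed with scikit-learn: the labels and predicted probabilities below are made up for illustration, and the `evaluate` helper and its 0.5 threshold are assumptions rather than part of the project.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Return the five metrics for binary labels and predicted probabilities."""
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # AUC-ROC is threshold-free, so it uses the probabilities directly.
        "auc_roc": roc_auc_score(y_true, y_prob),
    }


# Made-up example: 1 = left, 0 = stayed.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.4, 0.8, 0.7, 0.35]
metrics = evaluate(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting several metrics together guards against relying on accuracy alone, which can look strong when one class dominates even if few leavers are identified.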

**Recommendations:**
* Based on the model's findings, identify key factors driving turnover.
* Propose strategies to address these factors and improve employee retention.

Together, these steps give leadership both a prediction of who is likely to leave and an explanation of the factors behind turnover at Salifort Motors.


## PACE Strategy Table for the Salifort Motors Project

| Milestone | Task | PACE Stage |
|---|---|---|
| **Data Acquisition and Exploration** | Collect employee survey data | Plan |
| | Clean and preprocess data | Analyze |
| | Explore data relationships and distributions | Analyze |
| | Identify relevant variables | Analyze |
| **Model Development and Selection** | Build logistic regression model | Construct |
| | Build decision tree, random forest, and XGBoost models | Construct |
| | Evaluate model performance using appropriate metrics | Construct |
| | Select the best-performing model | Construct |
| **Model Interpretation and Insights** | Analyze model coefficients or feature importance | Execute |
| | Identify key factors driving turnover | Execute |
| | Generate actionable recommendations | Execute |
| | Communicate findings to leadership | Execute |
| **Model Deployment and Monitoring** | Deploy model into production environment | Execute |
| | Monitor model performance and retrain as needed | Execute |
| | Continuously evaluate and refine the model | Execute |
| | Provide ongoing insights to leadership | Execute |



## Analyzing the Salifort Motors Employee Data

### Data Understanding

**Dataset:** HR_capstone_dataset.csv

**Rows:** 14,999 (representing individual employees)

**Columns:** 10 (containing various employee attributes)

**Column Descriptions:**

| Column Name | Type | Description |
|---|---|---|
| satisfaction_level | float64 | Self-reported satisfaction level (0-1) |
| last_evaluation | float64 | Score of last performance review (0-1) |
| number_project | int64 | Number of projects contributed to |
| average_monthly_hours | int64 | Average monthly working hours |
| time_spend_company | int64 | Years with the company |
| work_accident | int64 | Whether the employee had a work accident (1 = yes, 0 = no) |
| left | int64 | Whether the employee left the company (1 = yes, 0 = no) |
| promotion_last_5years | int64 | Whether promoted in the last 5 years (1 = yes, 0 = no) |
| department | str | Employee's department |
| salary | str | Salary level (low, medium, high) |
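
Before relying on the table above, it is worth confirming the shape and the types pandas actually infers. The sketch below reads three made-up rows laid out with the same column names; for the real analysis, the path to `HR_capstone_dataset.csv` would be passed to `pd.read_csv` instead (the file location is an assumption).

```python
import io

import pandas as pd

# Three invented rows in the column layout described above.
sample_csv = io.StringIO(
    "satisfaction_level,last_evaluation,number_project,average_monthly_hours,"
    "time_spend_company,work_accident,left,promotion_last_5years,department,salary\n"
    "0.38,0.53,2,157,3,0,1,0,sales,low\n"
    "0.80,0.86,5,262,6,0,1,0,sales,medium\n"
    "0.72,0.64,4,180,2,1,0,0,technical,high\n"
)
df = pd.read_csv(sample_csv)

print(df.shape)         # (rows, columns)
print(df.dtypes)        # inferred type of each column
print(df.isna().sum())  # missing values per column
```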

### Initial Observations

* **Target Variable:** `left` (binary indicator of employee attrition)
* **Predictor Variables:** `satisfaction_level`, `last_evaluation`, `number_project`, `average_monthly_hours`, `time_spend_company`, `work_accident`, `promotion_last_5years`, `department`, and `salary`
* **Data Types:** Most variables are numerical, while `department` and `salary` are categorical strings that need encoding before modeling.
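
The split into target and predictors, and into numerical and categorical columns, could be sketched as follows. The five rows are invented and use only a subset of the columns; the variable names (`X`, `y`, `categorical_cols`, `numeric_cols`) are illustrative choices.

```python
import pandas as pd

# Invented rows using a subset of the columns listed above.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.72, 0.45, 0.91],
    "department": ["sales", "sales", "technical", "support", "hr"],
    "salary": ["low", "medium", "high", "low", "medium"],
    "left": [1, 1, 0, 0, 0],
})

y = df["left"]              # target
X = df.drop(columns="left")  # predictors

# Separate columns by type so each group can be preprocessed appropriately.
numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

# Share of leavers vs stayers; a strong imbalance affects metric choice.
class_balance = y.value_counts(normalize=True)
print(numeric_cols, categorical_cols)
print(class_balance)
```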

### Potential Relationships and Hypotheses

Based on the data, we can explore the following relationships and hypotheses:

* **Satisfaction and Attrition:** Employees with lower `satisfaction_level` scores may be more likely to leave.
* **Performance and Attrition:** Employees with poor performance reviews (`last_evaluation`), or with a heavy project load (`number_project`), might be more likely to leave.
* **Tenure and Attrition:** Employees with shorter tenures (`time_spend_company`) may be more likely to leave due to lack of commitment or fit.
* **Promotions and Attrition:** Employees who have not been promoted in the last five years (`promotion_last_5years`) may feel undervalued and be more likely to leave.
* **Work-Life Balance and Attrition:** Employees with high `average_monthly_hours` may be more likely to leave.
* **Department and Attrition:** Certain departments might have higher turnover rates.
* **Salary and Attrition:** Employees in the low `salary` band may be more likely to leave.
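
A first check of hypotheses like these is a grouped comparison of the attrition rate. The sketch below uses eight invented rows, so the numbers it prints say nothing about the real data; only the `groupby` pattern carries over.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "satisfaction_level": [0.10, 0.35, 0.40, 0.75, 0.80, 0.90, 0.55, 0.20],
    "salary": ["low", "low", "medium", "medium", "high", "high", "low", "medium"],
    "left": [1, 1, 1, 0, 0, 0, 0, 1],
})

# Because `left` is coded 0/1, its mean within a group is the attrition rate.
rate_by_salary = df.groupby("salary")["left"].mean()

# Average satisfaction of stayers (0) versus leavers (1).
satisfaction_by_outcome = df.groupby("left")["satisfaction_level"].mean()

print(rate_by_salary)
print(satisfaction_by_outcome)
```

The same pattern applies to `department`, `promotion_last_5years`, or binned versions of the numerical columns.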

### Next Steps

1. **Data Cleaning and Preprocessing:**
   * Handle missing values (if any).
   * Check for outliers and inconsistencies.
   * Convert categorical variables to numerical format (e.g., one-hot encoding for `department`, ordinal encoding for `salary`, whose levels are ordered).

2. **Exploratory Data Analysis (EDA):**
   * Visualize the distribution of variables (histograms, box plots).
   * Calculate summary statistics (mean, median, mode, standard deviation).
   * Explore correlations between variables.

3. **Feature Engineering:**
   * Consider creating new features based on existing variables (e.g., calculate a work-life balance index).

4. **Model Building and Evaluation:**
   * Build and evaluate various models (logistic regression, decision trees, random forests, XGBoost).
   * Use appropriate metrics (accuracy, precision, recall, F1-score, AUC-ROC) to assess model performance.

5. **Interpretation and Recommendations:**
   * Analyze the model's results to identify key factors influencing attrition.
   * Provide actionable recommendations to improve employee retention.
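
The five steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the data is synthetic (the rule generating `left` is invented), the `overworked` flag and its 175-hour cut-off are hypothetical, and XGBoost is left out so that the example depends only on pandas, NumPy, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the HR data. The rule that generates `left` is
# invented for this sketch and says nothing about the real dataset.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "satisfaction_level": rng.uniform(0, 1, n).round(2),
    "average_monthly_hours": rng.integers(96, 311, n),
    "department": rng.choice(["sales", "technical", "support"], n),
    "salary": rng.choice(["low", "medium", "high"], n),
})
logit = 3 - 6 * df["satisfaction_level"] + 0.01 * (df["average_monthly_hours"] - 200)
df["left"] = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

# Step 1: drop exact duplicate rows, count missing values, encode categoricals.
df = df.drop_duplicates().reset_index(drop=True)
n_missing = int(df.isna().sum().sum())
df["salary"] = df["salary"].map({"low": 0, "medium": 1, "high": 2})  # ordered
df = pd.get_dummies(df, columns=["department"], dtype=int)  # unordered

# Step 2: summary statistics and correlation of each column with the target.
summary = df.describe()
target_corr = df.corr()["left"].drop("left")

# Step 3: a hypothetical overwork flag; the 175-hour cut-off is an assumption.
df["overworked"] = (df["average_monthly_hours"] > 175).astype(int)

# Step 4: hold out a stratified test set and compare two model families.
X = df.drop(columns="left")
y = df["left"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "f1": f1_score(y_test, model.predict(X_test)),
        "auc_roc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
    }

# Step 5: rank features by random-forest importance.
importances = pd.Series(
    models["random_forest"].feature_importances_, index=X.columns
).sort_values(ascending=False)

print(pd.DataFrame(results).T.round(3))
print(importances.round(3))
```

Scaling is applied only inside the logistic regression pipeline because tree-based models are insensitive to feature scale. Any additional classifier that follows the scikit-learn `fit`/`predict_proba` interface could be added to the `models` dictionary and compared in the same loop.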

