# 🛠️ Feature Selection Guidelines

## **1️⃣ Drop Unique and Constant Features**
- **Dropped `EmployeeNumber`**:  
  - This is a unique identifier for each row, so it carries no predictive signal.  
- **Dropped `YearsSinceLastPromotion`**:  
  - The raw column was removed because a new feature derived from it with a **Square Root Transformation** replaces it; keeping both would be redundant.
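
The drops above can be sketched as follows. This is a minimal illustration on made-up data, not the original notebook code: the names `ConstantCol` and `YearsSinceLastPromotion_sqrt` are hypothetical, since the source does not name the constant columns or the transformed feature.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; all values are made up.
df = pd.DataFrame({
    "EmployeeNumber": [1, 2, 3, 4],            # unique identifier
    "YearsSinceLastPromotion": [0, 1, 4, 9],
    "ConstantCol": [80, 80, 80, 80],           # hypothetical constant column
    "Age": [29, 41, 35, 50],
})

# Create the square-root-transformed feature before dropping the raw column.
# The new column name is an assumption for illustration.
df["YearsSinceLastPromotion_sqrt"] = np.sqrt(df["YearsSinceLastPromotion"])

# A column with a single distinct value cannot help separate the classes.
constant_cols = [c for c in df.columns if df[c].nunique() == 1]

df = df.drop(columns=["EmployeeNumber", "YearsSinceLastPromotion", *constant_cols])
print(df.columns.tolist())
```

Detecting constant columns with `nunique()` rather than listing them by hand keeps the step reusable if the dataset changes.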

---

## **2️⃣ Checking Correlation**
- Used a **heatmap** of the pairwise correlation matrix to look for highly correlated feature pairs.  
- **No highly correlated pairs were found**, so no additional features were dropped.  

### **📌 Heatmap Explanation**
- A **heatmap** is a grid in which each cell's color encodes a numeric value; here, each cell shows the correlation between two features.
- It makes strong **feature relationships** easy to spot, which helps in identifying potential multicollinearity.
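
A heatmap is read by eye, so it can help to pair it with a numeric check. The sketch below lists feature pairs whose absolute correlation exceeds a cutoff. The function names and the `0.8` threshold are assumptions for illustration; the source does not state which cutoff was used.

```python
import numpy as np
import pandas as pd

def high_correlation_pairs(df, threshold=0.8):
    """Return (feature_a, feature_b, r) for pairs with |r| above threshold."""
    corr = df.select_dtypes("number").corr()
    # Keep only the upper triangle so each pair is reported once
    # and the diagonal (always 1.0) is ignored.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    pairs = pairs[pairs.abs() > threshold]
    return [(a, b, float(r)) for (a, b), r in pairs.items()]

def plot_heatmap(df):
    """Draw the correlation heatmap (needs matplotlib and seaborn)."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.heatmap(df.select_dtypes("number").corr(), cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()

# Synthetic demo: "b" is almost a multiple of "a", "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})
pairs = high_correlation_pairs(demo)
print(pairs)
```

An empty list from `high_correlation_pairs` corresponds to the "no highly correlated pairs" outcome reported above, for whatever threshold is chosen.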

---

## **3️⃣ Checking for Duplicates**
- **No duplicate records** were found in the dataset.
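
A duplicate check of this kind is usually a one-liner in pandas. The snippet below is a small sketch on made-up rows, not the project's actual data.

```python
import pandas as pd

# Toy data: the first and last rows are identical.
df = pd.DataFrame({
    "Age": [29, 41, 29],
    "Department": ["Sales", "R&D", "Sales"],
})

# duplicated() flags every repeat of an earlier row.
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)

# If any were found, they could be removed like this.
deduped = df.drop_duplicates().reset_index(drop=True)
```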

---

## **4️⃣ Principal Component Analysis (PCA)**
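- **PCA was applied to reduce dimensionality**.  
- **Input feature count:** 27 (after dropping unique and constant features).  
- **PCA Results:**  
  - **25 principal components were retained**, since together they capture almost all of the variance in the data.  
  - **The remaining 2 components were dropped** as they contributed very little to the overall variance.  
  - Each component is a linear combination of the original features, so PCA does not keep or discard individual columns.  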
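
A reduction from 27 inputs to 25 components could look roughly like this with scikit-learn. The data here is synthetic, and the standardization step is an assumption: the source does not say whether the features were scaled before PCA.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 500 rows, 27 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 27))

# PCA is sensitive to feature scale, so standardize first (assumed step).
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=25)
X_reduced = pca.fit_transform(X_scaled)

# Fraction of total variance kept by the 25 retained components.
retained = float(pca.explained_variance_ratio_.sum())
print(X_reduced.shape)
print(f"variance retained: {retained:.3f}")
```

Instead of fixing the count, `PCA(n_components=0.95)` asks scikit-learn to pick the smallest number of components that explains at least 95% of the variance, which can be a more principled way to choose the cutoff.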

### **📌 What is PCA?**
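- **Principal Component Analysis (PCA)** is a technique that reduces dimensionality by projecting the data onto a smaller set of uncorrelated components, each a linear combination of the original features.  
- It **preserves as much variance as possible** for a given number of components, at the cost of components that are harder to interpret than the original features.  
- It can help in **reducing overfitting** and **improving model performance**, though the benefit depends on the model and the data.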
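
For reference, "variance preserved" has a precise meaning. If $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ are the eigenvalues of the covariance matrix of the $p$ input features, the share of variance explained by component $i$, and the total kept by the first $k$ components, are:

$$
\mathrm{EVR}_i = \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j},
\qquad
\mathrm{retained}(k) = \sum_{i=1}^{k} \mathrm{EVR}_i
$$

In this project $p = 27$ and $k = 25$, so the variance lost is the combined share of the two smallest eigenvalues.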

---

## **5️⃣ Saving Pre-Processed Data**
- After applying **feature selection**, the processed dataset was saved in a new file.  
- The **target feature** (`Attrition`) was added back to the final dataset.  
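
The save step can be sketched as below. The helper `save_processed`, the `PC1`/`PC2` column names, and the file name `processed.csv` are all hypothetical; the source does not give the actual output path or column names.

```python
import os
import tempfile

import pandas as pd

def save_processed(features, target, path):
    """Attach the target column to the features and write them to CSV."""
    # Resetting both indexes aligns rows by position; this assumes the
    # features and the target are still in the same row order.
    final = pd.concat(
        [features.reset_index(drop=True), target.reset_index(drop=True)],
        axis=1,
    )
    final.to_csv(path, index=False)
    return final

# Toy stand-ins for the PCA output and the target column.
features = pd.DataFrame({"PC1": [0.1, -0.3], "PC2": [1.2, 0.4]})
target = pd.Series(["Yes", "No"], name="Attrition")

out_path = os.path.join(tempfile.mkdtemp(), "processed.csv")  # hypothetical name
final = save_processed(features, target, out_path)
print(final.columns.tolist())
```

Keeping the target out of the transformation steps and re-attaching it only at the end avoids leaking it into the PCA components.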

---

### ✅ **Summary**
✔ **Dropped unnecessary unique and constant features**.  
✔ **Checked correlation using a heatmap** and found **no high correlations**.  
✔ **Confirmed no duplicate records** in the dataset.  
✔ **Applied PCA** to reduce 27 features to 25 components while retaining **most of the variance**.  
✔ **Saved the final pre-processed dataset** for model training.  

