
# Encoding
| Type            | When to Use                 |
| --------------- | --------------------------- |
| Label / Ordinal | Ordered categories          |
| One-Hot         | No order (Linear, SVM, KNN) |
| Target Encoding | High-cardinality + Boosting |
| Frequency       | Many categories             |


# Scaling
| Method         | Use With                   |
| -------------- | -------------------------- |
| StandardScaler | Linear, Logistic, SVM, PCA |
| MinMaxScaler   | KNN, Neural Networks       |
| RobustScaler   | Data with outliers         |
| No Scaling     | Tree, RF, XGBoost          |

# Feature Selection
| Method     | Example                 |
| ---------- | ----------------------- |
| Filter     | Correlation, Chi-square |
| Wrapper    | RFE                     |
| Embedded   | L1, Tree importance     |



# Feature Engineering Guide

Feature engineering often matters more than the choice of model. It is the process of transforming raw data into features that better represent the underlying problem to a predictive model.

---

## 1. Encoding (Categorical $\rightarrow$ Numeric)

**Why Encoding?**
Most machine learning models require numeric input. They cannot perform matrix multiplication on strings like "Red" or "Blue".

### Types of Encoding

#### A. Label Encoding / Ordinal Encoding
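Assigns a unique integer to each category. (In scikit-learn, use `OrdinalEncoder` for input features; `LabelEncoder` is intended for target labels.)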
* **Mapping:** $\text{Red} \rightarrow 0, \quad \text{Blue} \rightarrow 1, \quad \text{Green} \rightarrow 2$
* **Use When:** Categories have an inherent **order** (e.g., Low $<$ Medium $<$ High).
* **Warning:** If used on non-ordered data (nominal), the model might infer false relationships (e.g., assuming $2 > 1$, so Green is "greater" than Blue).
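
A minimal from-scratch sketch of ordinal encoding, assuming the category order is known up front; `ordinal_encode` is a hypothetical helper, not a library function. Passing the order explicitly matters: scikit-learn's `OrdinalEncoder` defaults to `categories="auto"`, which sorts the categories, so "High" would get 0 and "Low" 1 unless you pass `categories=[["Low", "Medium", "High"]]`.

```python
def ordinal_encode(values, order):
    """Map each value to its position in `order` (hypothetical helper)."""
    rank = {category: i for i, category in enumerate(order)}
    # Raises KeyError for a category missing from `order` instead of guessing a rank.
    return [rank[v] for v in values]


sizes = ["Medium", "Low", "High", "Medium"]
encoded = ordinal_encode(sizes, order=["Low", "Medium", "High"])
print(encoded)  # [1, 0, 2, 1]
```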

#### B. One-Hot Encoding (The Standard)
Creates a new binary column for each category.
* **Mapping:**
$$
\begin{matrix}
\text{Color} & \rightarrow & \text{Is\_Red} & \text{Is\_Blue} & \text{Is\_Green} \\
\text{Red} & \rightarrow & 1 & 0 & 0 \\
\text{Blue} & \rightarrow & 0 & 1 & 0 \\
\text{Green} & \rightarrow & 0 & 0 & 1
\end{matrix}
$$
* **Use When:** Nominal data (no natural order). Essential for **Linear Models, Logistic Regression, SVM, KNN**.
* **Warning:** Adds one column per category, so a high-cardinality feature (e.g., thousands of Zip Codes) causes **high dimensionality** (the Curse of Dimensionality).
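
The same mapping as a from-scratch sketch; `one_hot_encode` is a hypothetical helper. Each category gets its own 0/1 column, so a feature with $k$ categories becomes $k$ columns. Here a category not in the fitted list is encoded as all zeros, which is an assumed design choice that mirrors `handle_unknown="ignore"` in scikit-learn's `OneHotEncoder`.

```python
def one_hot_encode(values, categories):
    """Return column names and one binary row per value (hypothetical helper)."""
    columns = [f"Is_{c}" for c in categories]
    # An unknown category matches no column, so its row is all zeros.
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


categories = ["Red", "Blue", "Green"]
columns, rows = one_hot_encode(["Red", "Blue", "Green", "Purple"], categories)
print(columns)  # ['Is_Red', 'Is_Blue', 'Is_Green']
print(rows)     # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
```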



#### C. Target Encoding (Mean Encoding)
Replaces the category with the average target value for that category.
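$$\text{Value}(c) = \text{mean}(y \mid x = c) = \frac{1}{n_c} \sum_{i \,:\, x_i = c} y_i$$
where $n_c$ is the number of training rows in category $c$.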
* **Use When:** High-cardinality features (e.g., Zip Codes, Product IDs).
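* **Warning:** High risk of **Data Leakage**: a row's own label feeds into its encoding, so the model can partly "see" the answer. Compute the encoding out-of-fold within Cross-Validation (means from the training folds, applied to the held-out fold).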
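
A minimal sketch of out-of-fold target encoding in plain Python, assuming a numeric (e.g., 0/1) target and simple round-robin folds where row $i$ goes to fold $i \bmod K$; `target_encode_oof` and `category_means` are hypothetical helpers. Each row is encoded with category means computed on the *other* folds only, and categories unseen in those folds fall back to the training folds' overall mean. scikit-learn 1.3+ provides `TargetEncoder`, whose `fit_transform` applies a similar cross-fitting scheme (with smoothing) internally.

```python
from collections import defaultdict


def category_means(xs, ys):
    """Mean target value per category (hypothetical helper)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    return {c: sums[c] / counts[c] for c in sums}


def target_encode_oof(xs, ys, n_folds=2):
    """Encode each row using category means from the other folds (n_folds >= 2)."""
    n = len(xs)
    encoded = [0.0] * n
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        val_idx = [i for i in range(n) if i % n_folds == fold]
        train_y = [ys[i] for i in train_idx]
        means = category_means([xs[i] for i in train_idx], train_y)
        fallback = sum(train_y) / len(train_y)  # for categories unseen in these folds
        for i in val_idx:
            encoded[i] = means.get(xs[i], fallback)
    return encoded


zips = ["A", "A", "B", "B", "A", "B"]
clicked = [1, 0, 1, 1, 1, 0]
print(target_encode_oof(zips, clicked))  # [0.0, 1.0, 0.5, 1.0, 0.0, 1.0]
```

Row 0's encoding is computed without its own label, so changing `clicked[0]` leaves it unchanged. On tiny data the out-of-fold values are noisy, which is one reason practical encoders add smoothing. At inference time, new rows are encoded with means from the full training set.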

#### D. Frequency / Count Encoding
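Replaces the category with how often it appears in the training data (as a raw count or a proportion).
* **Use When:** High cardinality; Tree-based models (XGBoost, LightGBM).
* **Warning:** Categories that appear equally often become indistinguishable.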
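
A minimal sketch with `collections.Counter`, assuming frequencies are learned from the training rows only and unseen categories map to 0; `frequency_encode` is a hypothetical helper.

```python
from collections import Counter


def frequency_encode(train_values, values, normalize=False):
    """Replace each value by its training count (or proportion if normalize=True)."""
    counts = Counter(train_values)  # Counter returns 0 for unseen categories
    total = len(train_values)
    return [counts[v] / total if normalize else counts[v] for v in values]


train_cities = ["NY", "NY", "LA", "SF", "NY", "LA"]
print(frequency_encode(train_cities, ["NY", "LA", "SF", "Boston"]))  # [3, 2, 1, 0]
print(frequency_encode(train_cities, ["NY", "SF"], normalize=True))  # [0.5, 0.16666666666666666]
```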

---

## 2. Scaling (Feature Magnitude Control)

**Why Scaling?**
Models based on **Distance** (KNN, SVM) or **Gradient Descent** (Linear, Logistic, Neural Networks) are sensitive to the scale of input features.
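* *Example:* Age ($0$ to $100$) vs. Salary ($10\text{k}$ to $1\text{M}$). Without scaling, Salary dominates Euclidean distances, and the loss surface becomes badly conditioned (steep along the Salary weight, flat along the Age weight), which slows gradient descent.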
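
A worked version of the Age vs. Salary example with three illustrative people (the numbers are made up, not data). Person `a` is 40 years younger than `b` but only 1 year younger than `c`, yet the unscaled Euclidean distance is almost entirely the salary gap.

```python
import math

# (age, salary) for three illustrative people
a = (25, 50_000)
b = (65, 51_000)
c = (26, 52_000)

print(round(math.dist(a, b), 1))  # 1000.8
print(round(math.dist(a, c), 1))  # 2000.0
```

A KNN model would pick `b` as the nearest neighbor of `a`, because the 40-year age gap adds less than 0.1% to the distance.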

### Types of Scaling

#### A. Standardization (Z-Score Normalization)
Rescales each feature to mean 0 and standard deviation 1.
$$x' = \frac{x - \mu}{\sigma}$$
* **Use When:** Linear Regression, Logistic Regression, SVM, PCA, Neural Networks.
* **Note:** Standardization does not require Gaussian data, but the mean and standard deviation summarize a feature best when its distribution is roughly symmetric without heavy outliers.
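
A from-scratch sketch of z-score standardization with the `statistics` module; `standardize` is a hypothetical helper. It uses the population standard deviation (`pstdev`), as scikit-learn's `StandardScaler` does, and estimates $\mu$ and $\sigma$ on the training data only, so test rows are transformed with training statistics.

```python
import statistics


def standardize(train, values):
    """z-score `values` using mean and std estimated on `train`."""
    mu = statistics.fmean(train)
    sigma = statistics.pstdev(train) or 1.0  # constant feature: avoid dividing by zero
    return [(x - mu) / sigma for x in values]


ages = [20, 30, 40, 50, 60]
z = standardize(ages, ages)
print([round(v, 3) for v in z])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```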

#### B. Min-Max Scaling (Normalization)
Rescales data into a fixed range, usually $[0, 1]$.
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
* **Use When:** Neural Networks, Image Data (pixels), KNN.
* **Warning:** Very sensitive to **outliers**. A single large outlier becomes $1$ and compresses all other values into a narrow band near $0$.
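
A from-scratch sketch (hypothetical helper `min_max_scale`, fit on training data) showing the outlier problem on illustrative salaries: one value of $1{,}000{,}000$ pushes the other four into $[0, 0.02)$.

```python
def min_max_scale(train, values):
    """Rescale `values` to [0, 1] using the min and max of `train`."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # constant feature: avoid dividing by zero
    return [(x - lo) / span for x in values]


salaries = [30_000, 35_000, 40_000, 45_000, 1_000_000]  # one extreme outlier
print([round(v, 3) for v in min_max_scale(salaries, salaries)])  # [0.0, 0.005, 0.01, 0.015, 1.0]
```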



#### C. Robust Scaling
Centers on the median and scales by the interquartile range, two statistics that outliers barely move.
$$x' = \frac{x - \text{median}(x)}{\text{IQR}(x)}, \qquad \text{IQR} = Q_3 - Q_1$$
* **Use When:** The dataset contains significant **outliers**.
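
A from-scratch sketch of robust scaling on outlier-heavy illustrative data; `robust_scale` is a hypothetical helper. Quartile conventions differ between libraries; this sketch assumes the linear-interpolation convention of `statistics.quantiles(..., method="inclusive")`.

```python
import statistics


def robust_scale(train, values):
    """Center by the training median, scale by the training IQR."""
    q1, median, q3 = statistics.quantiles(train, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero IQR
    return [(x - median) / iqr for x in values]


data = [10, 12, 14, 16, 18, 1000]  # one extreme outlier
print([round(v, 2) for v in robust_scale(data, data)])  # [-1.0, -0.6, -0.2, 0.2, 0.6, 197.0]
```

The five ordinary values keep their spread (from -1 to 0.6) while the outlier stays visibly extreme, instead of everything else being crushed toward 0 as min-max scaling would do.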

### No Scaling Needed For:
* **Tree-Based Models:** Decision Trees, Random Forests, XGBoost, LightGBM. Each split is a threshold on a single feature, so any monotonic rescaling preserves the sort order and leaves the learned partitions unchanged.
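
A small check of this claim using a from-scratch regression stump (hypothetical helpers `best_split` and `sse`) that picks the threshold minimizing the children's summed squared error. The search depends only on the feature's sort order, so dividing salary by 1000 or taking its logarithm selects the same partition of rows; only the numeric threshold would change. The data are illustrative.

```python
import math


def sse(vals):
    """Sum of squared errors around the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def best_split(xs, ys):
    """Row indices sent left by the best single-threshold split on feature xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_err, best_left = float("inf"), None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # equal values cannot be separated by a threshold
        left, right = order[:k], order[k:]
        err = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if err < best_err:
            best_err, best_left = err, set(left)
    return best_left


salary = [30_000, 45_000, 52_000, 90_000, 150_000]
bought = [0, 0, 0, 1, 1]
raw = best_split(salary, bought)
scaled = best_split([s / 1000 for s in salary], bought)
logged = best_split([math.log(s) for s in salary], bought)
print(raw == scaled == logged, sorted(raw))  # True [0, 1, 2]
```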

---

## 3. Feature Selection (Keep Useful Features)

**Why?** Removing irrelevant or redundant features reduces overfitting, speeds up training, and improves interpretability.

### Methods

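1.  **Filter Methods (Pre-processing):**
    * Score each feature with a statistical measure *before* (and independently of) any model.
    * **Techniques:** Correlation Matrix, Chi-Square ($\chi^2$), ANOVA, Mutual Information.
    * *Pros:* Fast, model-agnostic. *Cons:* Univariate scores ignore feature interactions.

2.  **Wrapper Methods (Iterative):**
    * Trains the model multiple times on different feature subsets and keeps the subset that scores best.
    * **Techniques:** Recursive Feature Elimination (RFE), Forward Selection, Backward Elimination.
    * *Pros:* Tailored to the chosen model; captures feature interactions. *Cons:* Computationally expensive/slow.

3.  **Embedded Methods (During Training):**
    * The model selects features as part of the learning process.
    * **Techniques:** Lasso (L1 Regularization, which drives some coefficients exactly to zero), Tree Feature Importance. Ridge (L2) only shrinks coefficients toward zero, so it regularizes but does not select features by itself.
    * *Pros:* A good balance of accuracy and speed, since selection happens within a single training run.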
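
A minimal filter-method sketch: score each feature by its absolute Pearson correlation with the target and keep the top $k$. `pearson` and `correlation_filter` are hypothetical helpers, and the feature values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def correlation_filter(features, target, k):
    """Keep the k feature names with the largest |r| against the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


price = [1, 2, 3, 4, 5]
features = {
    "rooms": [1, 2, 3, 4, 5],       # perfectly correlated with price
    "distance": [10, 8, 6, 4, 2],   # perfectly anti-correlated with price
    "noise": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
}
print(sorted(correlation_filter(features, price, k=2)))  # ['distance', 'rooms']
```

`rooms` and `distance` are exact linear functions of each other, yet both are kept: this filter scores features one at a time and never checks redundancy among them. scikit-learn offers the same pattern through `SelectKBest` with score functions such as `f_regression`, `chi2`, or `mutual_info_classif`.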

---

## The Golden Table: When to Use What

| Model | Encoding | Scaling | Feature Selection |
| :--- | :--- | :--- | :--- |
| **Linear Regression** | One-Hot | **Standard** | L1 (Lasso) / Correlation |
| **Logistic Regression** | One-Hot | **Standard** | L1 / RFE |
| **SVM** | One-Hot | **Standard** | RFE (linear kernel) / PCA (extraction, not selection) |
| **KNN** | One-Hot | **MinMax** | Filter / Correlation (critical: irrelevant features distort distances) |
| **Neural Networks** | One-Hot / Embedding | **Standard / MinMax** | L1 (selection); Dropout / L2 (regularization only) |
| **Decision Trees** | Label / Ordinal | None | Impurity-based importance (Gini / Entropy) |
| **Random Forest** | Label / Ordinal | None | Feature Importance |
| **Boosting (XGB)** | Target / Frequency | None | Feature Importance |