- Normalization linearly rescales each feature into a fixed range, typically [0, 1], using `x' = (x - min) / (max - min)`.
- It is also known as min-max scaling.
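
As a minimal sketch of min-max scaling in plain Python: the function name `min_max_scale`, its `new_min`/`new_max` parameters, and the salary values are made up for illustration. Library implementations such as scikit-learn's `MinMaxScaler` may handle edge cases differently.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        # Mapping everything to new_min is an arbitrary choice for this sketch.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


salaries = [30_000, 45_000, 60_000, 120_000]  # illustrative values, not real data
scaled = min_max_scale(salaries)
print(scaled)
```

The smallest value maps to 0 and the largest to 1; everything else lands in between in proportion to its original position.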

#### When to use Normalization:
- When you know the distribution is not Gaussian (not normal).
    - It doesn't assume any shape for the distribution.
    - It simply rescales based on the min and max, which works well for bounded input features. Because it depends only on those two values, a single outlier can compress the remaining data into a narrow band.
    - It is a linear transformation, so the relative spacing between data points stays intact, regardless of skewness.

- When you want to preserve the shape of the original distribution but just rescale it.
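- When algorithms rely on distance measurements or gradient-based training, like:
    - K-Nearest Neighbors (KNN), which compares points by distance
    - Support Vector Machines (SVM), which use distances or kernel values between points
    - Neural Networks, which are not distance-based but train more reliably when inputs share a similar scale

- These models are sensitive to differences in feature magnitude: a feature with a large numeric range can dominate the result.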
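
To illustrate why feature magnitude matters for distance-based models, here is a small sketch using Euclidean distance. The three records and the helper `min_max_scale` are hypothetical and exist only for this example.

```python
import math


def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] (assumes they are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical people described as (age in years, salary in dollars).
people = [(25, 50_000), (60, 51_000), (26, 58_000)]

# Distances on the raw features: salary differences swamp age differences.
raw_ab = math.dist(people[0], people[1])
raw_ac = math.dist(people[0], people[2])

# Scale each feature separately, then rebuild the points.
ages = min_max_scale([p[0] for p in people])
salaries = min_max_scale([p[1] for p in people])
scaled = list(zip(ages, salaries))

scaled_ab = math.dist(scaled[0], scaled[1])
scaled_ac = math.dist(scaled[0], scaled[2])

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw data, the 35-year age gap between the first two people barely registers next to a salary gap of a few thousand dollars. After scaling, both features contribute on a comparable footing and the two distances come out nearly equal.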


| Case                                                            | What Happens                                                     | Better Choice       |
| --------------------------------------------------------------- | ---------------------------------------------------------------- | ------------------- |
| Data is bounded or not Gaussian (skewed, uniform)               | Min–max scaling keeps shape and bounds consistent                | **Normalization**   |
| Data is roughly Gaussian (bell-shaped)                          | Z-score scaling makes the distribution centered and standardized | **Standardization** |
| Neural networks (especially sigmoid/tanh)                       | Bounded inputs in [0,1] or [-1,1] help avoid saturation          | **Normalization**   |
| Scale-sensitive methods (Linear/Logistic Regression, PCA)       | Centering and unit variance make features comparable             | **Standardization** |


#### Standardization

- Standardization transforms each feature so it has a mean of 0 and a standard deviation of 1, using `z = (x - mean) / std`.
- It is also known as Z-score scaling.

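- When to use Standardization:
   - When your data follows a normal or near-normal distribution.
   - When you use algorithms that work best with centered, unit-variance features, such as:
      - Linear Regression (especially with regularization or gradient descent; it does not require normally distributed features)
      - Logistic Regression (for the same reasons)
      - Principal Component Analysis (PCA), which is driven by feature variance
      - Linear Discriminant Analysis (LDA), which does assume Gaussian class distributions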
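
The z-score transformation can be sketched with the standard library alone. The function name `standardize` and the sample ages are illustrative assumptions. This sketch uses the population standard deviation, which is also the convention of scikit-learn's `StandardScaler`.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (divides by n)
    if std == 0:
        # A constant feature has no spread; returning zeros is a choice made for this sketch.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [22, 30, 38, 46, 54]  # illustrative values, not real data
z = standardize(ages)
print([round(v, 3) for v in z])
```

After the transformation the values have mean 0 and standard deviation 1, but unlike min-max scaling they are not confined to a fixed interval.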

| Feature             | Nature                                                       | Recommended Scaling | Reason                                                            |
| ------------------- | ------------------------------------------------------------ | ------------------- | ----------------------------------------------------------------- |
| **Age**             | Often roughly normal within a population                     | **Standardization** | Z-scores center the feature; suits regression and PCA.            |
| **Salary**          | Positively skewed (usually large variations)                 | **Normalization**   | Neural networks or KNN benefit from bounded scale (0–1).          |
| **YearsExperience** | Small numeric range                                          | **Standardization** | Z-scores keep it comparable with other standardized features.     |
| **CreditScore**     | Typically bounded (300–850)                                  | **Normalization**   | Already limited range; normalization maintains interpretability.  |
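
The per-feature choices in the table could be applied along these lines. This is a sketch under stated assumptions: the records are invented, and `min_max` and `z_score` are hypothetical helpers, not a library API. For `CreditScore`, the sketch scales against the stated 300 to 850 bounds instead of the observed minimum and maximum.

```python
import statistics


def min_max(values, lo=None, hi=None):
    """Scale to [0, 1]; fall back to the observed min/max when bounds are not given."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Scale to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up records for illustration only.
data = {
    "Age": [25, 35, 45, 55],
    "Salary": [30_000, 40_000, 50_000, 200_000],
    "YearsExperience": [1, 5, 10, 20],
    "CreditScore": [300, 575, 700, 850],
}

scaled = {
    "Age": z_score(data["Age"]),
    "Salary": min_max(data["Salary"]),
    "YearsExperience": z_score(data["YearsExperience"]),
    # Known bounds of the score, not the observed min and max.
    "CreditScore": min_max(data["CreditScore"], lo=300, hi=850),
}

for name, column in scaled.items():
    print(name, [round(v, 3) for v in column])
```

In this made-up data the single large salary pushes the other three salaries below 0.12, which shows how sensitive min-max scaling is to outliers in a skewed feature.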
