
# Interview Questions and Answers



## 1. What are Normalization and Standardization, and how are they helpful?

**Normalization** and **Standardization** are two techniques used to rescale numerical features in a dataset:

- **Normalization** (Min-Max Scaling) linearly maps data to a fixed range, typically [0,1] (or [-1,1] after a further rescaling). For the [0,1] range the formula is:
  \[
  X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}
  \]
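  This is useful when features have different scales and the algorithm is sensitive to feature magnitude (e.g., distance-based methods such as k-NN) or trains better on bounded inputs (e.g., neural networks). Because it depends on the minimum and maximum, it is sensitive to outliers.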

- **Standardization** (Z-score Scaling) rescales data to have a mean of 0 and a standard deviation of 1:
  \[
  X_{std} = \frac{X - \mu}{\sigma}
  \]
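  This is useful for algorithms that are sensitive to the scale or variance of features, such as PCA and regularized linear regression. Standardization only shifts and rescales the data; it does not make the data normally distributed.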
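
The two formulas above can be sketched in a few lines of standard-library Python. The function names and the `ages` sample are hypothetical, and using the population standard deviation for \(\sigma\) is an assumption, since the formula does not say which estimator is meant.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std, an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("standardization is undefined for a constant feature")
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical feature values
ages_norm = min_max_scale(ages)
ages_std = standardize(ages)

print(ages_norm)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(ages_std)
```

In a real pipeline the minimum, maximum, mean, and standard deviation would normally be computed on the training split only and then reused on validation and test data.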

**Benefits:**
- Improves numerical stability of models
- Gives features comparable weight, so those with large numeric ranges do not dominate distance or penalty terms
- Speeds up convergence of gradient-based optimizers
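
The point about feature weighting can be illustrated with a distance computation, as used by k-NN. This is a minimal sketch: the three records and the feature ranges are made up for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_point(point, ranges):
    """Min-max scale each coordinate using its (min, max) range."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, ranges))


# Hypothetical records: (age in years, income in dollars)
p = (25, 50_000)
q = (60, 51_000)  # very different age, similar income
r = (26, 58_000)  # similar age, different income

# Assumed feature ranges, as if taken from a training set
ranges = [(18, 70), (20_000, 120_000)]

raw_pq, raw_pr = euclidean(p, q), euclidean(p, r)
ps, qs, rs = (min_max_point(pt, ranges) for pt in (p, q, r))
scaled_pq, scaled_pr = euclidean(ps, qs), euclidean(ps, rs)

print(f"raw:    d(p,q)={raw_pq:.1f}  d(p,r)={raw_pr:.1f}")
print(f"scaled: d(p,q)={scaled_pq:.3f}  d(p,r)={scaled_pr:.3f}")
```

On the raw data, income dominates and `q` looks closer to `p` than `r` does, even though `q` is 35 years older. After scaling, both features contribute on the same footing and `r` becomes the nearer neighbor.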


In [None]:

# Print a plain-text summary of the two scaling techniques.
print("Normalization and Standardization are two techniques used to rescale numerical features.")
print("\nNormalization (Min-Max Scaling) transforms data to a fixed range, typically [0,1] or [-1,1].")
print("Formula: X_norm = (X - X_min) / (X_max - X_min)")
print("\nStandardization (Z-score Scaling) rescales data to have a mean of 0 and a standard deviation of 1.")
print("Formula: X_std = (X - mean) / std_dev")
print(
    "\nBenefits:\n"
    "- Improves numerical stability of models\n"
    "- Ensures fair weightage for different features\n"
    "- Enhances performance of gradient-based optimizers"
)



## 2. What techniques can be used to address multicollinearity in multiple linear regression?

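Multicollinearity occurs when two or more independent variables are highly correlated. In multiple linear regression this inflates the variance of the coefficient estimates, making them unstable and hard to interpret even when overall predictions remain accurate.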

**Techniques to address multicollinearity:**

1. **Variance Inflation Factor (VIF):** Compute the VIF for each predictor and consider removing variables with a high VIF (a common rule of thumb is > 10; some practitioners use > 5).
2. **Correlation Matrix:** Identify highly correlated pairs and drop or combine them.
3. **Principal Component Analysis (PCA):** Transform correlated features into uncorrelated principal components.
4. **Feature Selection:** Use stepwise regression, Lasso regression, or domain knowledge to select key predictors.
5. **Remove Redundant Variables:** Drop one of the correlated variables manually if the information overlap is high.
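6. **Centering the Variables:** Mean-centering (subtracting the mean) can reduce collinearity between a variable and its own polynomial or interaction terms. It does not change the correlation between two distinct predictors.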
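
The first two techniques can be sketched together in standard-library Python. This example relies on the identity that the VIF of predictor \(j\) equals the \(j\)-th diagonal entry of the inverse of the predictors' correlation matrix. The synthetic data, the helper names, and the threshold of 10 are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))


def correlation_matrix(columns):
    """Pairwise Pearson correlations between all predictor columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity: [A | I] becomes [I | A^-1]
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [v - factor * w for v, w in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Synthetic predictors: x2 is nearly a copy of x1, x3 is independent.
rng = random.Random(0)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + 0.1 * rng.gauss(0, 1) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(n)]

corr = correlation_matrix([x1, x2, x3])
vifs = vif([x1, x2, x3])
flagged = [name for name, v in zip(["x1", "x2", "x3"], vifs) if v > 10]

print("corr(x1, x2):", round(corr[0][1], 3))
print("VIFs:", [round(v, 1) for v in vifs])
print("High-VIF predictors:", flagged)
```

Both `x1` and `x2` are flagged because each is almost fully explained by the other. Dropping either one (or combining them) would bring the remaining VIFs close to 1.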

These techniques help in improving the stability and interpretability of the regression model.
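
Centering helps most when a predictor appears alongside its own square or an interaction term. The sketch below uses a made-up predictor `x` to compare the correlation between a variable and its square before and after mean-centering.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical predictor
x_sq = [v ** 2 for v in x]           # quadratic term built from the raw values
raw_corr = pearson(x, x_sq)

x_mean = mean(x)
xc = [v - x_mean for v in x]         # mean-centered predictor
xc_sq = [v ** 2 for v in xc]         # quadratic term built from centered values
centered_corr = pearson(xc, xc_sq)

print(f"corr(x, x^2) before centering: {raw_corr:.3f}")
print(f"corr(x, x^2) after centering:  {centered_corr:.3f}")
```

Here the raw predictor and its square are almost perfectly correlated, while the centered versions are uncorrelated because the sample is symmetric around its mean. With skewed data the reduction is usually smaller.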
