## Linear Regression — Complete Guide

### 1. What is Linear Regression?

Linear regression is a supervised learning algorithm that predicts a continuous numerical value by modeling a linear relationship between the input features (X) and the target (y).

> Goal: Find the best-fit straight line (a hyperplane when there are several features) that minimizes prediction error.

### 2. Types of Linear Regression
**Simple Linear Regression**
- One independent variable
- Equation: $y = mx + b$, where $m$ is the slope and $b$ is the intercept

**Multiple Linear Regression**
- Multiple independent variables
- Equation: $Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \dots + \beta_kX_{ik} + \epsilon_i$, where $\epsilon_i$ is the error term for observation $i$
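
As a small illustration of the two equations above, the sketch below evaluates a prediction from given coefficients. The function name `predict` and all numbers are illustrative assumptions, not part of the original notes.

```python
def predict(features, coefficients, intercept):
    """Return intercept + sum(beta_j * x_j) for a single observation."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))

# Simple linear regression: one feature, y = m*x + b with m = 2 and b = 1
simple = predict([3.0], [2.0], 1.0)

# Multiple linear regression: three features, each with its own coefficient
multiple = predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 4.0)
```

The simple case is just the multiple case with a single coefficient, which is why one function covers both.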

### 3. Key Terminology
- Independent Variable (X) → Input / Feature
- Dependent Variable (y) → Output / Target
- Coefficient (β) → Expected change in y for a one-unit increase in X, holding the other features fixed
- Intercept (β₀) → Predicted value of y when all features are 0
- Residual → Actual − Predicted value

### 4. How Linear Regression Works (Intuition)
- Draw a line that best represents the trend in the data
- Minimize the vertical distance between actual and predicted points
- Use the least squares method: choose the line with the smallest sum of squared residuals
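
For a single feature, the least squares line has a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below applies it to made-up data; the function name `fit_simple` and the sample values are assumptions for illustration.

```python
def fit_simple(xs, ys):
    """Least squares slope and intercept for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1, no noise
slope, intercept = fit_simple(xs, ys)
```

Because the sample data lie exactly on a line, the fit recovers that line; with noisy data it returns the line with the smallest sum of squared residuals.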

### 5. Cost Function (Loss Function)
Mean Squared Error (MSE)

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

**Purpose:**
- Squaring penalizes large errors more heavily than small ones
- Differentiable → easy to optimize with gradient-based methods
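
To make the "penalizes large errors" point concrete, the sketch below computes MSE for two sets of predictions that differ only in the size of one error. The function name and the sample values are illustrative assumptions.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n

actual = [3.0, 5.0, 7.0]
small_miss = mean_squared_error(actual, [3.0, 5.0, 6.0])  # one error of size 1
large_miss = mean_squared_error(actual, [3.0, 5.0, 4.0])  # one error of size 3
```

Tripling the single error multiplies the MSE by nine, because each error enters the sum squared.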

### 6. Optimization Techniques

**1. Normal Equation**

$\mathbf{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$

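- Closed-form solution: no iterations and no learning rate
- Fast for small datasets
- Computationally expensive when there are many features, since solving the system with $\mathbf{X}^T \mathbf{X}$ costs roughly $O(k^3)$ for $k$ features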
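
The normal equation can be sketched in NumPy as follows. The toy data and variable names are illustrative assumptions. `np.linalg.solve` is used instead of an explicit inverse because it solves the same system and is usually the numerically safer choice.

```python
import numpy as np

# Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise)
features = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

# Prepend a column of ones so theta[0] plays the role of the intercept
X = np.column_stack([np.ones(len(features)), features])

# Solve (X^T X) theta = X^T y, which is the normal equation rearranged
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

Here `theta` holds the intercept followed by one coefficient per feature. If two features were perfectly collinear, `X.T @ X` would be singular and the solve would fail.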

**2. Gradient Descent**

$\theta_{new} = \theta_{old} - \alpha \cdot \nabla J(\theta_{old})$

- Iterative optimization
- Works well for large datasets
- Requires tuning the learning rate $\alpha$: too large diverges, too small converges slowly
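
A minimal sketch of the update rule for one feature, using MSE as the cost $J$. The function name, the default learning rate, and the step count are assumptions chosen so that this toy problem converges; they are not general recommendations.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on MSE."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every observation at the current parameters
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to slope and intercept
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # theta_new = theta_old - alpha * gradient
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = gradient_descent(xs, ys)
```

On this data the loop approaches slope 2 and intercept 1. A noticeably larger learning rate would make the updates diverge, which is the tuning issue mentioned above.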

### 7. Assumptions of Linear Regression (Very Important)
- Linearity → The relationship between X and y is linear
- Independence → Observations (and their errors) are independent of each other
- Homoscedasticity → Errors have constant variance across all values of X
- Normality of errors → Residuals are normally distributed
- No multicollinearity → Features are not highly correlated with each other

> Violating these assumptions makes coefficient estimates, confidence intervals, and predictions unreliable
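
As one example of checking an assumption, the sketch below runs a crude homoscedasticity check: fit a line, then compare the spread of residuals in the lower and upper halves of the feature range. The synthetic data, the split point, and the name `spread_ratio` are assumptions. This is an informal diagnostic, not a formal statistical test.

```python
import numpy as np

# Synthetic data whose noise grows with x, so the constant-variance assumption is violated
x = np.arange(1.0, 21.0)
signs = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)
y = 1.0 + 2.0 * x + 0.1 * x * signs

# Fit a straight line and compute residuals (actual minus predicted)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Compare residual spread for small x against large x
lower_spread = residuals[x <= 10].std()
upper_spread = residuals[x > 10].std()
spread_ratio = upper_spread / lower_spread
```

A ratio far from 1 suggests the error variance changes across the range. Plotting residuals against fitted values shows the same pattern visually.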

### 8. EDA Before Linear Regression
**EDA helps check:**
- Linearity (scatter plots)
- Outliers (boxplots)
- Multicollinearity (correlation heatmap)
- Skewness (distribution plots)
- Missing values
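
Two of these checks, missing values and multicollinearity, can be sketched numerically with NumPy. The feature matrix below (columns standing for size, rooms, and age) is hypothetical data invented for illustration.

```python
import numpy as np

# Hypothetical features: columns are size, rooms, age; np.nan marks a missing value
data = np.array([
    [50.0, 2.0, 30.0],
    [80.0, 3.0, 12.0],
    [120.0, 4.0, np.nan],
    [65.0, 2.0, 25.0],
    [150.0, 5.0, 3.0],
])

# Count missing values in each column
missing_per_column = np.isnan(data).sum(axis=0)

# Drop incomplete rows, then compute pairwise feature correlations
complete_rows = data[~np.isnan(data).any(axis=1)]
correlation = np.corrcoef(complete_rows, rowvar=False)
```

Off-diagonal entries of `correlation` close to 1 or -1 (here, size versus rooms) flag feature pairs that may cause multicollinearity. A heatmap is a visual rendering of this same matrix.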

### 9. Feature Scaling (When Needed)
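- Needed in practice for gradient descent (features on very different scales slow or destabilize convergence)
- Not required for the normal equation
- Needed for regularized models (section 13), because the penalty depends on coefficient size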

**Methods:**
- Standardization (rescale to mean 0 and standard deviation 1)
- Normalization (min-max rescaling to the range [0, 1])
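
Both methods can be sketched in a few lines of plain Python. The function names and the sample `incomes` list are illustrative assumptions. The standardization here uses the population standard deviation.

```python
def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (population std)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def min_max_normalize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20.0, 40.0, 60.0, 80.0, 100.0]
standardized = standardize(incomes)
normalized = min_max_normalize(incomes)
```

In practice the mean, standard deviation, minimum, and maximum should be computed on the training data only and then reused for the test data.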

### 10. Handling Categorical Variables
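- One-Hot Encoding (one binary column per category)
- Label Encoding (rare; only sensible for ordinal categories, since it imposes an order)

> Avoid the dummy variable trap: drop one one-hot column per categorical feature, otherwise the dummy columns sum to the intercept column and are perfectly collinear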
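
A minimal sketch of one-hot encoding that sidesteps the dummy variable trap by dropping the first category, which then acts as the baseline. The function name `one_hot_drop_first` and the `colors` data are assumptions for illustration.

```python
def one_hot_drop_first(values):
    """One-hot encode a list of categories, dropping the first (sorted) category."""
    categories = sorted(set(values))
    kept = categories[1:]  # the dropped category is encoded as all zeros
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows

colors = ["red", "green", "blue", "green"]
columns, encoded = one_hot_drop_first(colors)
```

With three categories only two columns remain. A row of all zeros means the dropped baseline category ("blue" here).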

### 11. Model Evaluation Metrics
- **MAE**: Mean Absolute Error  
- **MSE**: Mean Squared Error  
- **RMSE**: Root Mean Squared Error  
- **R²**: Proportion of the variance in y explained by the model  
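
The four metrics can be computed directly from their definitions, as in the sketch below. The function name and the sample values are illustrative assumptions.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(actual, predicted)
```

MAE and RMSE are in the same units as y, while MSE is in squared units. R² is unitless, and it can be negative when the model predicts worse than always guessing the mean.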

### 12. Overfitting & Underfitting
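- Underfitting → Model too simple; high error on both training and test data
- Overfitting → Model too complex; low training error but high test error

**Solutions:**
- Underfitting: add informative features
- Overfitting: remove features or apply regularization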

### 13. Regularization (Important)
**Ridge Regression (L2)**
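- Penalizes the sum of squared coefficients
- Shrinks coefficients toward zero (rarely exactly to zero), which reduces overfitting

**Lasso Regression (L1)**
- Penalizes the sum of absolute coefficients
- Can set coefficients exactly to zero, so it performs feature selection

**Elastic Net**
- Combination of the L1 and L2 penalties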
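
Ridge regression has a closed-form solution, $(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$, which the sketch below applies to centered data so that the intercept is not penalized. The function name `ridge_fit`, the parameter name `penalty` (standing for $\lambda$), and the toy data are assumptions. Lasso has no comparable closed form and is usually solved iteratively, so it is not shown.

```python
import numpy as np

def ridge_fit(X, y, penalty):
    """Closed-form ridge regression on centered data (intercept left unpenalized)."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    k = X.shape[1]
    # Solve (Xc^T Xc + penalty * I) coef = Xc^T yc
    coef = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(k), Xc.T @ yc)
    intercept = y_mean - X_mean @ coef
    return coef, intercept

# Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

ols_coef, _ = ridge_fit(X, y, penalty=0.0)     # penalty 0 reduces to ordinary least squares
ridge_coef, _ = ridge_fit(X, y, penalty=10.0)  # a positive penalty shrinks the coefficients
```

Comparing `np.linalg.norm(ridge_coef)` with `np.linalg.norm(ols_coef)` shows the shrinkage: the ridge coefficient vector is shorter.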

### 14. Interpreting Coefficients
- Positive β → y increases as X increases
- Negative β → y decreases as X increases
- Larger |β| → stronger impact, but only when features are on the same scale (e.g. after standardization)
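
Coefficient size depends on the units of the feature, which matters when comparing magnitudes. The sketch below fits the same relationship with distance expressed in kilometres and in metres. The data and names are made up for illustration.

```python
def fit_slope(xs, ys):
    """Least squares slope for a single feature."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

distance_km = [1.0, 2.0, 3.0, 4.0]
fare = [5.0, 7.0, 9.0, 11.0]  # fare = 3 + 2 per kilometre
distance_m = [d * 1000.0 for d in distance_km]

slope_per_km = fit_slope(distance_km, fare)
slope_per_m = fit_slope(distance_m, fare)
```

The two slopes differ by a factor of 1000 even though the underlying relationship is identical, so raw |β| values are only comparable across features that share a scale.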

### 15. When NOT to Use Linear Regression
- Non-linear relationships
- High multicollinearity
- Complex interactions
- Classification problems

### 16. Common Mistakes
- Ignoring assumptions
- Not checking residuals
- Using linear regression for classification
- Forgetting feature scaling (for gradient descent or regularization)
- Blindly trusting R²

### 17. Interview Questions (Must-Know)
- What is linear regression?
- What are its assumptions?
- Difference between MAE & MSE?
- What is multicollinearity?
- Ridge vs Lasso?