# Regression Learning

### 1. What is Regression?
- Regression is a supervised learning technique used to predict a continuous numeric value from one or more input features.
- Example:

| Input: Years of Experience | Output: Salary (\$) |
| -------------------------- | ------------------- |
| 1                          | 30,000              |
| 3                          | 45,000              |
| 5                          | 60,000              |

- Here, salary is the continuous value we predict from years of experience.
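
As a rough illustration, the three rows of the table can be fitted with a straight line using the closed-form least-squares formulas. This is only a sketch: the helper `fit_line` and the 4-year query are made up for this example and are not part of the original notes.

```python
# Data copied from the example table above.
years = [1, 3, 5]
salary = [30_000, 45_000, 60_000]


def fit_line(x, y):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of centered cross-products and sum of centered squares.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line(years, salary)
predicted = intercept + slope * 4  # hypothetical query: 4 years of experience
print(slope, intercept, predicted)
```

Because the three points happen to lie exactly on a line, the fit recovers a slope of 7,500 per year and an intercept of 22,500, so the 4-year prediction is 52,500. Real data would not fit this cleanly.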

### 2. Mathematical Foundation of Regression
- Assume a training dataset:
  $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$
- Where:
  - $x_i \in \mathbb{R}^d$: input features
  - $y_i \in \mathbb{R}$: continuous output/label
- We want to find a function $f(x)$ that minimizes the prediction error:
![image.png](attachment:image.png)
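
In case the image does not render, one common way to write such an objective is sketched below. It assumes squared error as the measure of prediction error; the symbols $\hat{f}$ (the chosen function) and $\mathcal{F}$ (the set of candidate functions) are notation introduced here, not taken from the original.

$$
\hat{f} = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
$$

Other choices of error measure lead to different objectives with the same overall shape.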

### Types of Regression Algorithms

| **Type**                                | **Description**                                                                                       | **Use Case**                                             | **Key Assumptions**                                                              |
| --------------------------------------- | ----------------------------------------------------------------------------------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------- |
| **1. Linear Regression**                | Predicts a continuous dependent variable using one (simple) or more (multiple) independent variables. | House price prediction based on size, location, etc.     | Linearity, homoscedasticity, independence, normality of residuals                |
| **2. Logistic Regression**              | Used for binary classification problems (output is categorical: 0 or 1).                              | Spam vs. not spam, disease vs. no disease                | No multicollinearity, large sample size, linear relationship with logit function |
| **3. Polynomial Regression**            | Extends linear regression by fitting a polynomial equation to the data.                               | Modeling growth curves, price trends                     | Same as linear regression but allows curved relationships                        |
| **4. Ridge Regression**                 | Linear regression with L2 regularization to reduce model complexity and multicollinearity.            | When data has many features or multicollinearity         | Features are standardized; helps avoid overfitting                               |
| **5. Lasso Regression**                 | Linear regression with L1 regularization; can reduce coefficients to zero.                            | Feature selection and sparsity                           | Ideal when you suspect many variables are irrelevant                             |
| **6. Elastic Net Regression**           | Combines Ridge and Lasso; balances between L1 and L2 penalties.                                       | Useful when there are multiple correlated features       | Best when you want both feature selection and multicollinearity handling         |
| **7. Stepwise Regression**              | Automatically selects variables by iteratively adding/removing predictors.                            | Automated model building for complex datasets            | Can be biased; assumes linear relationship                                       |
| **8. Quantile Regression**              | Predicts conditional quantiles (like median) of the response variable.                                | Income analysis, predicting medians or percentiles       | Doesn’t assume constant variance (heteroscedasticity-friendly)                   |
| **9. Poisson Regression**               | Used when dependent variable is count data.                                                           | Modeling number of calls to a call center, disease cases | Assumes the dependent variable follows a Poisson distribution                    |
| **10. Ordinal Regression**              | Targets dependent variables with ordered categories.                                                  | Satisfaction ratings (e.g., poor to excellent)           | Dependent variable must be ordinal                                               |
| **11. Multinomial Regression**          | Extends logistic regression for multi-class classification problems.                                  | Predicting types of cuisine based on ingredients         | Independent variables should not be highly correlated                            |
| **12. Robust Regression**               | Designed to be resistant to outliers.                                                                 | Data with many outliers or anomalies                     | Fewer assumptions; robust to violations of normality                             |
| **13. Bayesian Regression**             | Uses Bayes' theorem to estimate distribution of model parameters.                                     | Probabilistic modeling and decision making               | Requires prior distributions; suitable for uncertainty quantification            |
| **14. Nonlinear Regression**            | Models relationships that are not linear in nature.                                                   | Drug dose-response modeling, enzyme kinetics             | Must specify a correct nonlinear function                                        |
| **15. Support Vector Regression (SVR)** | Fits a function that ignores errors inside an ε-wide margin and penalizes only points outside it.     | High-dimensional, non-linear data                        | Works well with kernel functions; sensitive to feature scaling                   |
| **16. Decision Tree Regression**        | Uses decision trees for modeling non-linear relationships.                                            | Predicting values with rule-based conditions             | Prone to overfitting; doesn’t require assumptions about data distribution        |
| **17. Random Forest Regression**        | Ensemble method of decision trees to reduce variance.                                                 | More accurate predictions with noisy data                | Less interpretable but more robust                                               |
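
To make the difference between the L2 and L1 penalties in the table concrete, here is a minimal single-feature sketch. It assumes an unpenalized intercept and the objectives written in the comments; the function names, the toy data, and the `lam` values are all invented for illustration. With one feature, ridge has a closed form and lasso reduces to soft-thresholding.

```python
def centered_sums(x, y):
    """Return (Sxy, Sxx): centered cross-product sum and centered square sum."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy, sxx


def ols_slope(x, y):
    # Plain least squares: no penalty on the slope.
    sxy, sxx = centered_sums(x, y)
    return sxy / sxx


def ridge_slope(x, y, lam):
    # Minimizes 0.5 * sum((y - b - w*x)^2) + 0.5 * lam * w^2.
    sxy, sxx = centered_sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    # Minimizes 0.5 * sum((y - b - w*x)^2) + lam * |w| (soft-thresholding).
    sxy, sxx = centered_sums(x, y)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


# Hypothetical toy data, invented for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.1, 2.9, 4.0]

w_ols = ols_slope(x, y)
w_ridge = ridge_slope(x, y, lam=5.0)
w_lasso_small = lasso_slope(x, y, lam=1.0)
w_lasso_large = lasso_slope(x, y, lam=50.0)
print(w_ols, w_ridge, w_lasso_small, w_lasso_large)
```

Ridge shrinks the slope toward zero but never reaches it for a finite penalty, while a large enough lasso penalty sets the slope to exactly zero. That is the behaviour behind the "feature selection" use case listed for Lasso.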


### Loss Functions in Regression

| **Loss Function**               | **Formula**                                                                                          | **Description**                                                                                 | **Typical Use Case**                             |
|--------------------------------|------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|--------------------------------------------------|
| **1. Mean Squared Error (MSE)**   | `MSE = (1/n) * Σ(yᵢ - ŷᵢ)²`                                                  | Measures the average squared difference between actual and predicted values. Penalizes large errors more. | Linear regression, general-purpose models        |
| **2. Mean Absolute Error (MAE)**  | `MAE = (1/n) * Σ[yᵢ - ŷᵢ]`                                                 | Calculates the average of the absolute differences. Less sensitive to outliers than MSE.       | Robust regression, real-world error measurement  |
| **3. Huber Loss**                 | `Huber = { 0.5*(yᵢ - ŷᵢ)²  if [yᵢ - ŷᵢ] ≤ δ; δ*([yᵢ - ŷᵢ] - 0.5*δ) otherwise }` | Combines MSE and MAE; behaves like MSE for small errors and MAE for large ones.                | Regression with outliers                         |
| **4. Log-Cosh Loss**             | `LogCosh = Σ log(cosh(yᵢ - ŷᵢ))`                                                    | Roughly quadratic for small errors and linear for large ones; smooth everywhere, unlike Huber.  | Deep learning models, smoother regression        |
| **5. Quantile Loss**             | `QLoss = Σ max(q*(yᵢ - ŷᵢ), (q - 1)*(yᵢ - ŷᵢ))`                               | Optimizes for the q-th quantile (q = 0.5 gives the median); asymmetric error treatment.         | Quantile regression, percentile forecasting      |
| **6. Mean Squared Log Error (MSLE)** | `MSLE = (1/n) * Σ(log(1 + yᵢ) - log(1 + ŷᵢ))²`                | Penalizes under-predictions more than over-predictions. Suitable when target values grow exponentially. | Population or financial growth prediction        |
| **7. Tweedie Loss**              | `Tweedie deviance (form depends on power p)`                                                         | Deviance of the Tweedie family: p = 0 is normal (squared error), p = 1 Poisson, p = 2 Gamma. Values 1 < p < 2 suit non-negative data with many exact zeros. | Insurance, claims modeling, GLM applications     |

- In the formulas above, square brackets `[ ... ]` denote the absolute value (modulus).
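
The losses in the table can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: every loss is averaged over samples (which rescales the summed forms by 1/n), the quantile loss uses the standard pinball form, Tweedie is omitted, and the sample values are hypothetical.

```python
import math


def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = abs(y - p)
        if err <= delta:
            total += 0.5 * err ** 2            # quadratic zone, like MSE
        else:
            total += delta * (err - 0.5 * delta)  # linear zone, like MAE
    return total / len(y_true)


def log_cosh(y_true, y_pred):
    return sum(math.log(math.cosh(y - p)) for y, p in zip(y_true, y_pred)) / len(y_true)


def quantile_loss(y_true, y_pred, q=0.5):
    # Pinball loss: q * e when e >= 0, (q - 1) * e when e < 0, with e = y - p.
    return sum(max(q * (y - p), (q - 1) * (y - p))
               for y, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    # Requires y and p to be greater than -1; log1p(v) computes log(1 + v).
    return sum((math.log1p(y) - math.log1p(p)) ** 2
               for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values, invented for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

for loss in (mse, mae, huber, log_cosh, quantile_loss, msle):
    print(loss.__name__, loss(y_true, y_pred))
```

On this data the single error of size 2 dominates MSE much more than MAE or Huber, which is the outlier sensitivity the table describes. With `q=0.5` the quantile loss is exactly half the MAE.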

### Evaluation Metrics

| **Metric**                  | **Formula**                                                                 | **Description**                                                                 | **Typical Use Case**                        |
|----------------------------|-----------------------------------------------------------------------------|----------------------------------------------------------------------------------|---------------------------------------------|
| **1. Mean Absolute Error (MAE)**  | `MAE = (1/n) * Σ[yᵢ - ŷᵢ]`                                          | Average of absolute differences between actual and predicted values.             | When outliers are present; robust models    |
| **2. Mean Squared Error (MSE)**   | `MSE = (1/n) * Σ(yᵢ - ŷᵢ)²`                                          | Measures average squared differences; penalizes large errors more.               | Common in regression and optimization       |
| **3. Root Mean Squared Error (RMSE)** | `RMSE = sqrt((1/n) * Σ(yᵢ - ŷᵢ)²)`                               | Square root of MSE; interpretable in original units.                             | Reporting final model performance           |
| **4. Mean Absolute Percentage Error (MAPE)** | `MAPE = (100/n) * Σ[(yᵢ - ŷᵢ) / yᵢ]`                          | Expresses error as a percentage of the actual value; undefined when any yᵢ = 0.  | Financial forecasting, sales prediction     |
| **5. R-squared (R² Score)**       | `R² = 1 - [Σ(yᵢ - ŷᵢ)² / Σ(yᵢ - ȳ)²]`                              | Proportion of variance in the target variable explained by the model.            | Model fit assessment                        |
| **6. Adjusted R-squared**         | `Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - k - 1)]`              | R² corrected for the number of predictors k (n = samples); penalizes predictors that do not improve the fit. | Comparing models with different variables   |
| **7. Mean Bias Deviation (MBD)**  | `MBD = (1/n) * Σ(ŷᵢ - yᵢ)`                                        | Shows average prediction bias; positive means over-prediction, negative under-prediction. | Calibration of forecasting models           |
| **8. Explained Variance Score**   | `EVS = 1 - Var(y - ŷ) / Var(y)`                                   | Measures how much of the variance is captured by the model.                      | Regression diagnostics                      |

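- In the formulas above, square brackets `[ ... ]` denote the absolute value (modulus), except in the R² and Adjusted R² rows, where they are ordinary grouping brackets.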
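
A compact sketch of these metrics in plain Python is given below. The function name `regression_metrics`, the parameter `k` (number of predictors), and the sample values are assumptions made for this example. MAPE is computed as the mean of the absolute relative errors and is undefined if any actual value is zero.

```python
import math


def regression_metrics(y_true, y_pred, k):
    """Return a dict of metrics; k is the number of predictors in the model."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    y_mean = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)             # residual sum of squares
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)  # total sum of squares
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    e_mean = sum(errors) / n
    var_err = sum((e - e_mean) ** 2 for e in errors) / n
    var_y = ss_tot / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mape": 100 * sum(abs(e / y) for e, y in zip(errors, y_true)) / n,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "mbd": sum(-e for e in errors) / n,  # mean of (y_pred - y_true)
        "evs": 1 - var_err / var_y,
    }


# Hypothetical values, invented for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 33.0, 37.0, 52.0]

metrics = regression_metrics(y_true, y_pred, k=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here the predictions are slightly too high on average, so MBD is positive and the explained variance score comes out a little above R². The two coincide only when the mean error is zero.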

![image.png](attachment:image.png)