# 🧠 Regression – Basic Notes

## 📌 What is Regression?

**Regression** is a supervised machine learning technique used to predict continuous numerical values based on input features (independent variables).

**Example use cases:**
- Predicting house prices
- Forecasting stock values
- Estimating salaries

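As a minimal sketch (the toy data and the rule y = 3x + 2 are made up for illustration), a regression model learns a mapping from features to a continuous target and can then output values it never saw during training:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature, target follows y = 3x + 2 exactly
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3 * X.ravel() + 2

model = LinearRegression()
model.fit(X, y)

# The prediction is a continuous number, not a class label
pred = model.predict(np.array([[4.5]]))
print(pred)  # approximately [15.5]
```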
---

## 🔧 Common Regression Models

| Model                     | Description |
|---------------------------|-------------|
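| **Linear Regression**      | Fits a straight line (a hyperplane when there are several features) by minimizing the sum of squared errors (ordinary least squares). Best for linear relationships. |
| **Ridge Regression**       | Linear model with an L2 penalty (sum of squared coefficients) that shrinks coefficients to reduce overfitting. |
| **Lasso Regression**       | Linear model with an L1 penalty (sum of absolute coefficients). Can shrink some coefficients to exactly zero, which acts as feature selection. |
| **SVR (Support Vector Regression)** | Fits a function that keeps most points within a margin (the epsilon-tube) around it. With a non-linear kernel such as RBF it can model non-linear data, but it needs tuning (`C`, `epsilon`, `gamma`) and is sensitive to the scale of the data. |
| **KNN Regression**         | Predicts the average target of the *k* nearest training points. Good for local patterns; sensitive to feature scaling. |
| **Decision Tree Regressor** | Splits the data into if/else rules on feature thresholds and predicts the mean target in each leaf. Handles non-linear data, but can overfit. |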

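The difference between the L2 (Ridge) and L1 (Lasso) penalties shows up directly in the fitted coefficients. A minimal sketch, assuming synthetic data where only 3 of 10 features carry signal and an illustrative `alpha=1.0` for both models:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, but only 3 actually influence y
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge shrinks coefficients toward zero but does not make them exactly zero;
# Lasso can set uninformative coefficients to exactly zero (feature selection)
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
```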
---

## 📥 Importing Models

To use these models, import them from their `sklearn` submodules:
- `LinearRegression`, `Ridge`, `Lasso` from `sklearn.linear_model`
- `SVR` from `sklearn.svm`
- `KNeighborsRegressor` from `sklearn.neighbors`
- `DecisionTreeRegressor` from `sklearn.tree`

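For reference, the corresponding import statements, with one instance of each model created using library defaults (the `models` list is just an illustrative container):

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# One instance of each model with library-default hyperparameters
models = [
    LinearRegression(),
    Ridge(),
    Lasso(),
    SVR(),
    KNeighborsRegressor(),
    DecisionTreeRegressor(),
]

# All six follow the same estimator interface: fit(X, y), then predict(X)
for model in models:
    print(type(model).__name__, hasattr(model, "fit"), hasattr(model, "predict"))
```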
---

## 🔁 Training, Testing, and Predicting

1. **Training a model** involves fitting it to training data using known inputs and outputs.
2. **Testing** involves predicting on a separate test set and comparing predictions to actual outcomes.
3. **Prediction** is the process of using a trained model to estimate the output for new, unseen data.

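The three steps map onto scikit-learn's `fit` and `predict` methods. A minimal sketch, using synthetic data from `make_regression` and a made-up new sample for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data with 3 features (illustrative)
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 1. Training: learn the coefficients from known inputs and outputs
model = LinearRegression()
model.fit(X_train, y_train)

# 2. Testing: predict on the held-out set and compare with the true targets
y_pred = model.predict(X_test)
print("Test MAE:", mean_absolute_error(y_test, y_pred))

# 3. Prediction: new inputs must be 2-D, shape (n_samples, n_features)
new_sample = np.array([[0.5, -1.2, 0.3]])  # hypothetical unseen data point
new_pred = model.predict(new_sample)
print("Prediction for new sample:", new_pred)
```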
---

## 📏 Evaluation Metrics for Regression

### 1. **MAE (Mean Absolute Error)**

- **Definition:** The average of the absolute differences between predicted and actual values, ignoring the direction of the errors. It is in the same units as the target.
- **Best Case:** MAE close to 0, indicating predictions are on average very close to actual values.
- **Worst Case:** High MAE, indicating large average error.
- **Parameters:** `mean_absolute_error(y_true, y_pred)` takes the true values and the predicted values, in that order.

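A worked example with four illustrative values computes MAE by hand with NumPy and checks it against `sklearn.metrics.mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MAE = mean of |y_true - y_pred|; absolute errors are 0.5, 0.5, 0.0, 1.0
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)  # 0.5 0.5
```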
---

### 2. **MSE (Mean Squared Error)**

- **Definition:** The average of the squared errors, expressed in squared units of the target. Because each error is squared, large errors have a disproportionately large effect.
- **Best Case:** MSE close to 0, indicating predictions are accurate with small errors.
- **Worst Case:** High MSE, indicating large errors.
- **Parameters:** `mean_squared_error(y_true, y_pred)` takes the true values and the predicted values, in that order.

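The sensitivity to large errors is easiest to see with two hypothetical prediction sets that have the same total absolute error: their MAE is identical, but the set with a single large miss has a much higher MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 10.0, 10.0, 10.0])
pred_even = np.array([11.0, 9.0, 11.0, 9.0])     # four errors of size 1
pred_spiky = np.array([10.0, 10.0, 10.0, 14.0])  # one error of size 4

for name, y_pred in [("even", pred_even), ("spiky", pred_spiky)]:
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    print(f"{name:5s}  MAE={mae:.2f}  MSE={mse:.2f}")
# even   MAE=1.00  MSE=1.00
# spiky  MAE=1.00  MSE=4.00
```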
---

### 3. **RMSE (Root Mean Squared Error)**

- **Definition:** The square root of MSE, which brings the error back into the same units as the target variable. Like MSE, it weights large errors more heavily than MAE does, so RMSE is always greater than or equal to MAE.
- **Best Case:** RMSE close to 0, indicating that predictions are very close to actual values.
- **Worst Case:** High RMSE, indicating significant deviations in predictions.
- **Parameters:** Computed from the true values (`y_true`) and predicted values (`y_pred`); in these notes as `np.sqrt(mean_squared_error(y_true, y_pred))`.

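A short worked example with illustrative values (think of prices in thousands) shows the unit difference: MSE comes out in squared units, RMSE back in the target's units. Recent scikit-learn releases also provide a `root_mean_squared_error` function; this sketch uses `np.sqrt` so it does not depend on the library version:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative targets, e.g. prices in thousands
y_true = np.array([200.0, 250.0, 300.0])
y_pred = np.array([210.0, 240.0, 330.0])  # errors: 10, -10, 30

mse = mean_squared_error(y_true, y_pred)  # (100 + 100 + 900) / 3, squared units
rmse = np.sqrt(mse)                       # back in the target's units
print(f"MSE={mse:.2f}  RMSE={rmse:.2f}")  # MSE=366.67  RMSE=19.15
```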
---

## 📌 Summary of Best and Worst Case Evaluations

- **Best Case:** For **MAE**, **MSE**, and **RMSE**, the best case scenario is when the values are **close to 0**. This means the model has accurately predicted the target values with minimal error.
  
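- **Worst Case:** A **high MAE**, **MSE**, or **RMSE** indicates poor model performance, with predictions deviating significantly from the actual values. "High" is relative to the scale of the target: MAE and RMSE are in the target's units and MSE is in squared units, so an RMSE of 10 is small for targets in the thousands but large for targets between 0 and 1.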

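One practical way to decide whether an error is high is to compare it with a trivial baseline. A minimal sketch, assuming synthetic data, uses scikit-learn's `DummyRegressor(strategy="mean")`, which ignores the features and always predicts the mean of the training targets; a useful model should beat it clearly:

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: ignores X and always predicts the mean of y_train
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
model_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Baseline MAE: {baseline_mae:.2f}")
print(f"Model MAE:    {model_mae:.2f}")
```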
---

This covers the basics of regression, the common models, and how to evaluate them using popular metrics like **MAE**, **MSE**, and **RMSE**.


## Example Run on Dummy Data

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np
```

```python
from sklearn.datasets import make_regression

# Synthetic linear data: 1000 samples, 12 features (10 informative by default),
# with Gaussian noise of standard deviation 10 added to the target
X, y = make_regression(n_samples=1000, n_features=12, noise=10, random_state=42)
X  # notebook cell: displays the feature matrix
```

```text
array([[-0.42064532,  0.40405086, -0.34271452, ..., -0.80227727,
         0.17457781,  0.00511346],
       [ 0.32707237, -0.2387526 , -0.58206133, ...,  0.39904414,
        -0.05706129,  2.82422043],
       [ 0.4559042 , -1.75182881, -0.45909031, ..., -0.69460037,
         0.15805349,  1.32864083],
       ...,
       [ 1.16929559,  0.14671369,  1.38215899, ...,  0.64870989,
        -0.81693567,  0.82048218],
       [ 0.62563093, -0.53884186, -0.69677182, ...,  0.58202657,
        -1.96262569,  0.84817421],
       [ 0.02310435,  2.02681712, -0.97793884, ..., -0.61242289,
        -0.8300705 ,  0.29216883]])
```

### Train/Test Split

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### Linear Regression

```python
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
lr_pred = lr_model.predict(X_test)

lr_mse = mean_squared_error(y_test, lr_pred)
lr_mae = mean_absolute_error(y_test, lr_pred)
lr_rmse = np.sqrt(lr_mse)

print("Linear Regression")
print(f"  MSE : {lr_mse:.2f}")
print(f"  MAE : {lr_mae:.2f}")
print(f"  RMSE: {lr_rmse:.2f}")
print("-" * 40)
```

```text
Linear Regression
  MSE : 102.48
  MAE : 8.14
  RMSE: 10.12
----------------------------------------
```

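Linear Regression does well here because `make_regression` builds the target as a linear combination of the features plus Gaussian noise. A minimal sketch, assuming the same data-generating call as above with `coef=True` added so the true coefficients are also returned, compares them with the fitted ones:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same settings as the example run, plus coef=True to get the true coefficients
X, y, true_coef = make_regression(n_samples=1000, n_features=12, noise=10,
                                  random_state=42, coef=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr_model = LinearRegression().fit(X_train, y_train)

# With the default n_informative=10, two of the 12 true coefficients are zero
print("True:  ", np.round(true_coef, 1))
print("Fitted:", np.round(lr_model.coef_, 1))
```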

### Ridge Regression

```python
from sklearn.linear_model import Ridge

# alpha controls the strength of the L2 penalty
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
ridge_pred = ridge_model.predict(X_test)

ridge_mse = mean_squared_error(y_test, ridge_pred)
ridge_mae = mean_absolute_error(y_test, ridge_pred)
ridge_rmse = np.sqrt(ridge_mse)

print("Ridge Regression")
print(f"  MSE : {ridge_mse:.2f}")
print(f"  MAE : {ridge_mae:.2f}")
print(f"  RMSE: {ridge_rmse:.2f}")
print("-" * 40)
```

```text
Ridge Regression
  MSE : 102.20
  MAE : 8.12
  RMSE: 10.11
----------------------------------------
```


### Lasso Regression

```python
from sklearn.linear_model import Lasso

# alpha controls the strength of the L1 penalty
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)
lasso_pred = lasso_model.predict(X_test)

lasso_mse = mean_squared_error(y_test, lasso_pred)
lasso_mae = mean_absolute_error(y_test, lasso_pred)
lasso_rmse = np.sqrt(lasso_mse)

print("Lasso Regression")
print(f"  MSE : {lasso_mse:.2f}")
print(f"  MAE : {lasso_mae:.2f}")
print(f"  RMSE: {lasso_rmse:.2f}")
print("-" * 40)
```

```text
Lasso Regression
  MSE : 102.41
  MAE : 8.14
  RMSE: 10.12
----------------------------------------
```


### Support Vector Regression

```python
from sklearn.svm import SVR

# Library defaults: RBF kernel, C=1.0, epsilon=0.1
svr_model = SVR()
svr_model.fit(X_train, y_train)
svr_pred = svr_model.predict(X_test)

svr_mse = mean_squared_error(y_test, svr_pred)
svr_mae = mean_absolute_error(y_test, svr_pred)
svr_rmse = np.sqrt(svr_mse)

print("Support Vector Regression (SVR)")
print(f"  MSE : {svr_mse:.2f}")
print(f"  MAE : {svr_mae:.2f}")
print(f"  RMSE: {svr_rmse:.2f}")
print("-" * 40)
```

```text
Support Vector Regression (SVR)
  MSE : 28369.17
  MAE : 135.41
  RMSE: 168.43
----------------------------------------
```


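SVR's poor score here is likely a tuning issue rather than a hard limit of the method: with the defaults (`C=1.0`, `epsilon=0.1`), the model is strongly regularized relative to a target whose values span hundreds of units, so it underfits. A minimal sketch, assuming that standardizing the target is an acceptable fix for this demo, wraps `SVR` in scikit-learn's `TransformedTargetRegressor` with a `StandardScaler` on `y`:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Recreate the same data and split as in the example run
X, y = make_regression(n_samples=1000, n_features=12, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Default SVR on the raw target
svr_raw = SVR().fit(X_train, y_train)
raw_rmse = np.sqrt(mean_squared_error(y_test, svr_raw.predict(X_test)))

# Same SVR trained on a standardized target; predictions are
# transformed back to the original units automatically
svr_scaled = TransformedTargetRegressor(regressor=SVR(), transformer=StandardScaler())
svr_scaled.fit(X_train, y_train)
scaled_rmse = np.sqrt(mean_squared_error(y_test, svr_scaled.predict(X_test)))

print(f"SVR, raw target:    RMSE={raw_rmse:.2f}")
print(f"SVR, scaled target: RMSE={scaled_rmse:.2f}")
```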
### Decision Tree Regression

```python
from sklearn.tree import DecisionTreeRegressor

# random_state fixes the tree's internal randomness so results are reproducible
dt_model = DecisionTreeRegressor(random_state=42)
dt_model.fit(X_train, y_train)
dt_pred = dt_model.predict(X_test)

dt_mse = mean_squared_error(y_test, dt_pred)
dt_mae = mean_absolute_error(y_test, dt_pred)
dt_rmse = np.sqrt(dt_mse)

print("Decision Tree Regression")
print(f"  MSE : {dt_mse:.2f}")
print(f"  MAE : {dt_mae:.2f}")
print(f"  RMSE: {dt_rmse:.2f}")
print("-" * 40)
```

```text
Decision Tree Regression
  MSE : 17404.23
  MAE : 102.77
  RMSE: 131.93
----------------------------------------
```


### K-Nearest Neighbors Regression

```python
from sklearn.neighbors import KNeighborsRegressor

# Each prediction is the mean target of the 5 nearest training points
knn_model = KNeighborsRegressor(n_neighbors=5)
knn_model.fit(X_train, y_train)
knn_pred = knn_model.predict(X_test)

knn_mse = mean_squared_error(y_test, knn_pred)
knn_mae = mean_absolute_error(y_test, knn_pred)
knn_rmse = np.sqrt(knn_mse)

print("K-Nearest Neighbors Regression")
print(f"  MSE : {knn_mse:.2f}")
print(f"  MAE : {knn_mae:.2f}")
print(f"  RMSE: {knn_rmse:.2f}")
print("-" * 40)
```

```text
K-Nearest Neighbors Regression
  MSE : 7936.17
  MAE : 70.97
  RMSE: 89.09
----------------------------------------
```

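The six model cells repeat the same fit, predict, and score pattern. A compact sketch of the same comparison as a loop, assuming the same data, split, and hyperparameters as the example run (the `evaluate` helper is hypothetical, introduced here for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=12, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.1),
    "Support Vector Regression (SVR)": SVR(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=42),
    "K-Nearest Neighbors Regression": KNeighborsRegressor(n_neighbors=5),
}


def evaluate(model):
    """Fit on the training split and return (MSE, MAE, RMSE) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return mse, mean_absolute_error(y_test, pred), np.sqrt(mse)


results = {name: evaluate(model) for name, model in models.items()}

# Print models from lowest to highest RMSE
for name, (mse, mae, rmse) in sorted(results.items(), key=lambda item: item[1][2]):
    print(f"{name:32s} MSE={mse:10.2f}  MAE={mae:8.2f}  RMSE={rmse:8.2f}")
```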