`Linear Regression`

### Types of Regression in Machine Learning (ML) – With Examples & Complete Explanation

Regression is a **supervised machine learning technique** used to predict **continuous numeric values** (as opposed to classification, which predicts categories). It estimates the relationships among variables.

---

## ✅ 1. **Linear Regression**

**Definition:**
Predicts a dependent variable (Y) based on one or more independent variables (X) by fitting a straight line (linear equation).

**Equation:**
$Y = a + bX$
Where:

* $a$: intercept
* $b$: slope (coefficient)

**Example:**
Predict house price based on its size:

```text
Size (sq ft): 1000 → Price: $200,000
Size (sq ft): 1500 → Price: $300,000
```

Model learns:
$\text{Price} = 200 \times \text{Size}$ (the line through both points; the intercept is 0)

---

## ✅ 2. **Multiple Linear Regression**

**Definition:**
Like linear regression, but uses **more than one independent variable**.

**Equation:**
$Y = a + b_1X_1 + b_2X_2 + ... + b_nX_n$

**Example:**
Predict house price based on size, location, and number of bedrooms.
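
As a rough illustration of the equation above, the sketch below fits scikit-learn's `LinearRegression` on a small made-up dataset. The feature columns (size, bedrooms, and distance to the city centre as a stand-in for location) and the price rule are assumptions invented for this example, not real housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: size (sq ft), bedrooms, distance to city centre (km)
X = np.array([
    [1000, 2, 10.0],
    [1500, 3, 8.0],
    [1200, 2, 5.0],
    [1800, 4, 12.0],
    [2000, 4, 3.0],
    [900, 1, 7.0],
])
# Prices generated from a known rule so the fitted coefficients can be checked:
# price = 50_000 + 150*size + 10_000*bedrooms - 2_000*distance
y = 50_000 + 150 * X[:, 0] + 10_000 * X[:, 1] - 2_000 * X[:, 2]

model = LinearRegression().fit(X, y)
print("intercept a:", round(model.intercept_, 2))
print("coefficients b1..b3:", np.round(model.coef_, 2))
```

Because the targets follow the rule exactly, the fitted intercept and coefficients should match the values used to generate them, up to floating-point error.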

---

## ✅ 3. **Polynomial Regression**

**Definition:**
Models a **non-linear** relationship by adding higher-degree terms of the input variable.

**Equation:**
$Y = a + b_1X + b_2X^2 + b_3X^3 + ... + b_nX^n$

**Example:**
Predict car price from age: value usually drops quickly in the first years, then levels off, and can rise again for vintage models, so the relationship is a curve rather than a straight line.
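
One common way to fit such a curve in scikit-learn is to expand the input into polynomial terms and then run ordinary linear regression on them. The sketch below assumes made-up data that follows an exact quadratic; the coefficients are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up data following an exact quadratic: y = 2 + 0.5*x - 0.1*x^2
x = np.arange(0, 10, dtype=float).reshape(-1, 1)
y = 2 + 0.5 * x.ravel() - 0.1 * x.ravel() ** 2

# Expand x into [x, x^2], then fit a linear model on those columns
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(x, y)

lin = poly_model.named_steps["linearregression"]
print("intercept a:", round(lin.intercept_, 3))
print("coefficients [b1, b2]:", np.round(lin.coef_, 3))
```

The model is still linear in its coefficients; only the features are non-linear in `x`.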

---

## ✅ 4. **Ridge Regression**

**Definition:**
Linear regression with **L2 regularization** (adds a penalty on large coefficients to prevent overfitting).

**Cost Function:**
$\text{Loss} = \sum (Y - \hat{Y})^2 + \lambda \sum b_i^2$

**Use Case:**
When features are highly correlated or dataset has multicollinearity.

---

## ✅ 5. **Lasso Regression**

**Definition:**
Linear regression with **L1 regularization** (can shrink coefficients to zero, useful for feature selection).

**Cost Function:**
$\text{Loss} = \sum (Y - \hat{Y})^2 + \lambda \sum |b_i|$

**Use Case:**
Useful when you want a simpler model that automatically eliminates less important variables.

---

## ✅ 6. **Elastic Net Regression**

**Definition:**
Combines **Ridge (L2)** and **Lasso (L1)** penalties.

**Use Case:**
When you have many correlated features and need both regularization and variable selection.
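
To make the difference between the three penalties concrete, here is a minimal sketch on synthetic data where only the first of five features matters. The data, the seed, and the `alpha` values are assumptions chosen for illustration. In scikit-learn, `alpha` plays the role of $\lambda$, and `Lasso` and `ElasticNet` scale the squared-error term by $1/(2n)$, so the numbers are not directly comparable to the formulas above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first feature drives the target; the other four are noise
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)                     # L2 penalty
lasso = Lasso(alpha=0.5).fit(X, y)                     # L1 penalty
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)   # mix of L1 and L2

print("ridge:", np.round(ridge.coef_, 3))
print("lasso:", np.round(lasso.coef_, 3))
print("enet: ", np.round(enet.coef_, 3))
```

With these settings Ridge keeps small non-zero weights on the noise features, while Lasso sets them exactly to zero, which is the feature-selection behaviour described above.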

---

## ✅ 7. **Logistic Regression** *(technically classification)*

**Definition:**
Used to predict **binary or categorical outcomes**, not continuous values.

**Equation (Sigmoid):**
$P(Y=1) = \frac{1}{1 + e^{-(a + bX)}}$

**Example:**
Will a customer buy (yes/no) based on income, age, and previous purchases?

👉 **Note:** Despite its name, **logistic regression is not used for regression tasks**—it’s a classification algorithm.
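
The sigmoid equation above can be sketched in a few lines of NumPy. The parameters `a` and `b` and the income values are hypothetical, picked only to show how a probability turns into a yes/no label.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted parameters, for illustration only
a, b = -5.0, 0.125
income = np.array([20.0, 50.0, 80.0])   # e.g. income in thousands

p_buy = sigmoid(a + b * income)          # P(Y=1) for each customer
labels = (p_buy >= 0.5).astype(int)      # threshold at 0.5 to get a class
print(np.round(p_buy, 3), labels)
```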

---

## ✅ 8. **Stepwise Regression**

**Definition:**
An automatic process that selects features by adding/removing predictors based on their statistical significance.

**Use Case:**
When you have many potential predictors and want a smaller subset. The search is greedy, so the selected subset is not guaranteed to be the optimal one.

---

## ✅ 9. **Quantile Regression**

**Definition:**
Predicts a specific **quantile (e.g., 90th percentile)** of the target variable instead of the mean.

**Use Case:**
Used when the impact of variables differs across the distribution (e.g., income levels).

---

## ✅ 10. **Support Vector Regression (SVR)**

**Definition:**
An extension of SVM (Support Vector Machine) for regression. Tries to fit the best line within a margin of tolerance.

**Use Case:**
When the data is high-dimensional or the relationship is non-linear (handled through kernels).

---

## ✅ 11. **Decision Tree Regression**

**Definition:**
Splits data into branches using decision rules and predicts the average value in a leaf node.

**Use Case:**
Non-linear relationships, easily interpretable.

---

## ✅ 12. **Random Forest Regression**

**Definition:**
Ensemble of multiple decision trees. Averages predictions to reduce variance.

**Use Case:**
High accuracy, robust to overfitting, works well with both linear and non-linear data.
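
A small sketch comparing a single tree with a forest on a noisy non-linear target. The sine-shaped data, the seed, and the hyperparameters are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)   # noisy non-linear target

# A shallow tree predicts one average value per leaf
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
# A forest averages many trees trained on bootstrap samples
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_new = np.array([[1.5], [4.5]])   # sin(x) is close to +1 and -1 at these points
print("tree:  ", np.round(tree.predict(X_new), 2))
print("forest:", np.round(forest.predict(X_new), 2))
```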

---

## ✅ 13. **Gradient Boosting Regression (GBR / XGBoost / LightGBM / CatBoost)**

**Definition:**
Ensemble method that builds trees sequentially, where each new tree corrects errors from previous ones.

**Use Case:**
Highly effective for structured/tabular data, often wins ML competitions.

---

## Summary Table:

| Type                | Linear/Non-linear | Regularized | Use Case                        |
| ------------------- | ----------------- | ----------- | ------------------------------- |
| Linear Regression   | Linear            | No          | Simple problems                 |
| Multiple Linear     | Linear            | No          | Multiple features               |
| Polynomial          | Non-linear        | No          | Curved relationships            |
| Ridge               | Linear            | L2          | Multicollinearity               |
| Lasso               | Linear            | L1          | Feature selection               |
| Elastic Net         | Linear            | L1 + L2     | Combo of Ridge & Lasso          |
| Quantile            | Linear/Non-linear | No          | Quantile prediction             |
| SVR                 | Non-linear        | Yes         | High-dimensional                |
| Decision Tree       | Non-linear        | No          | Interpretable                   |
| Random Forest       | Non-linear        | No          | Ensemble, less overfitting      |
| Gradient Boosting   | Non-linear        | No          | High performance                |
| Logistic Regression | --                | --          | Classification (not regression) |

---



**`1. Simple Linear Regression`** <br>
In simple linear regression we have only two columns: one for the input and the other for the output. <br>
### ✅ Simple Linear Regression – Full Explanation with Example

---

### 📘 What is Simple Linear Regression?

**Simple Linear Regression** is a statistical method used in machine learning to model the relationship between:

* **One independent variable (X)**
* **One dependent variable (Y)**

It fits a **straight line** to predict values of Y from X.

---

### 📌 Mathematical Formula

$$
Y = a + bX
$$

Where:

* $Y$: Target / dependent variable
* $X$: Feature / independent variable
* $a$: Intercept (value of Y when X = 0)
* $b$: Slope (change in Y for a unit change in X)

---

### 🔍 Example Scenario

> **Problem:** Predict a student’s test score based on study hours.

| Hours Studied (X) | Test Score (Y) |
| ----------------- | -------------- |
| 1                 | 50             |
| 2                 | 55             |
| 3                 | 65             |
| 4                 | 70             |
| 5                 | 75             |

You want to find the line $Y = a + bX$ that best fits this data.

---

### 🛠️ Python Implementation (Using Scikit-learn)

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([[1], [2], [3], [4], [5]])  # Study hours
y = np.array([50, 55, 65, 70, 75])       # Test scores

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Get parameters
a = model.intercept_   # Intercept
b = model.coef_[0]      # Slope

print(f"Equation: Y = {a:.2f} + {b:.2f}X")

# Predict
X_new = np.array([[6]])  # Predict score for 6 hours study
predicted = model.predict(X_new)
print(f"Predicted score for 6 hours: {predicted[0]:.2f}")

# Plot
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red')  # Regression line
plt.xlabel("Hours Studied")
plt.ylabel("Test Score")
plt.title("Simple Linear Regression")
plt.show()
```

---

### 📈 Output Explanation

* **Line Equation** (least-squares fit to the data above):

  $$
  Y = 43.5 + 6.5X
  $$

  → If you study 6 hours, predicted score = $43.5 + 6.5 \times 6 = 82.5$

* **Graph**: You'll see:

  * Blue dots: Original data points
  * Red line: Best fit line

---

### ✅ Key Assumptions

1. Linear relationship between X and Y
2. Homoscedasticity (equal variance)
3. Independence of errors
4. Normally distributed residuals

---

### 🔁 Use Cases

* Predicting salary based on experience
* Predicting house price based on size
* Forecasting demand based on price

---



In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression

In [7]:
import plotly.express as px

In [2]:
model = LinearRegression()

In [None]:
x = [13,11,9,14,5]
y =[44,56,33,65,23]

In [15]:
px.scatter(x=x,y=y)

In [6]:
model.fit(x,y)

ValueError: Expected 2D array, got 1D array instead:
array=[3 4 2 4 5].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
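
The error above comes from passing a flat list where scikit-learn expects a 2D array of shape `(n_samples, n_features)`. One possible fix, assuming the same `x` and `y` values as in the earlier cell, is to reshape the single feature into a column:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([13, 11, 9, 14, 5])
y = np.array([44, 56, 33, 65, 23])

X = x.reshape(-1, 1)        # shape (5,) -> (5, 1): five samples, one feature
model = LinearRegression().fit(X, y)
print(X.shape, model.coef_, model.intercept_)
```

The target `y` can stay one-dimensional; only the feature matrix needs the extra axis.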

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train,y_train)


In [None]:
lr.predict(X_test)

In [1]:
import pandas as pd

In [4]:
df = pd.read_csv('https://github.com/ybifoundation/Dataset/raw/main/Salary%20Data.csv')

In [48]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Experience Years  40 non-null     float64
 1   Salary            40 non-null     int64  
dtypes: float64(1), int64(1)
memory usage: 772.0 bytes


In [5]:
df.head()

|   | Experience Years | Salary |
| - | ---------------- | ------ |
| 0 | 1.1              | 39343  |
| 1 | 1.2              | 42774  |
| 2 | 1.3              | 46205  |
| 3 | 1.5              | 37731  |
| 4 | 2.0              | 43525  |


In [56]:
X = df[['Experience Years']]

In [50]:
y = df['Salary']

In [57]:
X

|   | Experience Years |
| - | ---------------- |
| 0 | 1.1              |
| 1 | 1.2              |
| 2 | 1.3              |
| 3 | 1.5              |
| 4 | 2.0              |
| 5 | 2.2              |
| 6 | 2.5              |
| 7 | 2.9              |
| 8 | 3.0              |
| 9 | 3.2              |


In [51]:
y

0      39343
1      42774
2      46205
3      37731
4      43525
5      39891
6      48266
7      56642
8      60150
9      54445
10     64445
11     60000
12     57189
13     60200
14     63218
15     55794
16     56957
17     57081
18     59095
19     61111
20     64500
21     67938
22     66029
23     83088
24     82200
25     81363
26     93940
27     91000
28     90000
29     91738
30     98273
31    101302
32    113812
33    111620
34    109431
35    105582
36    116969
37    112635
38    122391
39    121872
Name: Salary, dtype: int64

In [12]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [32]:
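# Not needed for fitting: scikit-learn expects the feature as a column of a 2D X, not as the index.
# X.set_index('Experience Years',inplace=True)

In [41]:
# Not valid: a pandas Series has no set_index method; y can be used as it is.
# y.set_index('Salary',inplace=True)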

In [52]:
X

0      1.1
1      1.2
2      1.3
3      1.5
4      2.0
5      2.2
6      2.5
7      2.9
8      3.0
9      3.2
10     3.2
11     3.5
12     3.7
13     3.8
14     3.9
15     4.0
16     4.0
17     4.1
18     4.3
19     4.5
20     4.7
21     4.9
22     5.1
23     5.3
24     5.5
25     5.9
26     6.0
27     6.2
28     6.5
29     6.8
30     7.1
31     7.9
32     8.2
33     8.5
34     8.7
35     9.0
36     9.5
37     9.6
38    10.3
39    10.5
Name: Experience Years, dtype: float64

In [58]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.1)

In [54]:
y_train

15     55794
31    101302
8      60150
36    116969
6      48266
13     60200
30     98273
12     57189
4      43525
1      42774
23     83088
32    113812
29     91738
35    105582
19     61111
38    122391
14     63218
34    109431
27     91000
20     64500
9      54445
22     66029
16     56957
37    112635
2      46205
18     59095
5      39891
11     60000
21     67938
26     93940
25     81363
39    121872
33    111620
3      37731
17     57081
24     82200
Name: Salary, dtype: int64

In [43]:
lr = LinearRegression()

In [59]:
lr.fit(X_train,y_train)

LinearRegression(fit_intercept=True, copy_X=True, tol=1e-06, n_jobs=None, positive=False)


In [62]:
y_predict = lr.predict(X_test)

In [64]:
df.sample()

|   | Experience Years | Salary |
| - | ---------------- | ------ |
| 6 | 2.5              | 48266  |


In [61]:
y_test

35    105582
25     81363
18     59095
34    109431
Name: Salary, dtype: int64

In [63]:
import plotly.express as px

In [None]:
px.scatter(df,x='Experience Years',y='Salary')


Create a custom class for simple linear regression

In [67]:
df.sample()

|    | Experience Years | Salary |
| -- | ---------------- | ------ |
| 36 | 9.5              | 116969 |


In [70]:
X = df.iloc[:,0].values

In [72]:
y = df.iloc[:,1].values

In [73]:
X
y

array([ 39343,  42774,  46205,  37731,  43525,  39891,  48266,  56642,
        60150,  54445,  64445,  60000,  57189,  60200,  63218,  55794,
        56957,  57081,  59095,  61111,  64500,  67938,  66029,  83088,
        82200,  81363,  93940,  91000,  90000,  91738,  98273, 101302,
       113812, 111620, 109431, 105582, 116969, 112635, 122391, 121872])

In [91]:
class MeraLr:
    """Simple linear regression fitted with the closed-form least-squares formulas."""

    def __init__(self):
        self.m = None  # slope
        self.b = None  # intercept

    def fit(self, X_train, y_train):
        # Compute the means once instead of on every loop iteration
        x_mean = X_train.mean()
        y_mean = y_train.mean()

        num = 0
        den = 0
        for i in range(X_train.shape[0]):
            num += (X_train[i] - x_mean) * (y_train[i] - y_mean)
            den += (X_train[i] - x_mean) ** 2

        self.m = num / den
        self.b = y_mean - (self.m * x_mean)

        print(self.m, self.b)

    def predict(self, X_test):
        return self.m * X_test + self.b
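
The `fit` method above implements the closed-form least-squares formulas $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$. The same calculation can be sketched without an explicit loop; `fit_line` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares slope and intercept for 1D inputs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    m = np.sum(dx * dy) / np.sum(dx ** 2)
    b = y.mean() - m * x.mean()
    return m, b

# Study-hours data from the earlier scikit-learn example
m, b = fit_line([1, 2, 3, 4, 5], [50, 55, 65, 70, 75])
print(m, b)   # slope 6.5, intercept 43.5
```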
        

In [74]:
from sklearn.model_selection import train_test_split

In [75]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [76]:
X_test

array([6.2, 3.2, 3.9, 1.1, 1.3, 7.1, 3.8, 9.5])

In [95]:
lr = MeraLr()

In [96]:
lr.fit(X_train,y_train)

9629.895616355 24469.05453811407


In [97]:
lr.m

np.float64(9629.895616355)

In [82]:
lr.b

np.float64(24469.05453811407)

In [98]:
y_test

array([ 91000,  54445,  63218,  39343,  46205,  98273,  60200, 116969])

In [86]:
3.9*9629.895616355 + 24469.05453811407

62025.64744189857

In [115]:
y_predict = (lr.predict(X_test))

In [121]:
px.scatter(x=X_test,y=y_test,color=y_test)
#px.scatter(x=X_test,y=y_predict,color=X_test)

In [120]:
px.scatter(x=X_test,y=y_predict,color=X_test)

The standard **regression evaluation metrics** below give a quick check of how well a model performs. Each entry lists the **formula** and **how to interpret** it:

---

## ✅ 1. **MAE (Mean Absolute Error)**

**Formula:**

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$

**Interpretation:**

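* Average absolute difference between predicted and actual values.
* Lower is better.
* Depends on the scale of the target, but is less sensitive to outliers than MSE.
* Expressed in the same units as the target.
* Not differentiable at zero error, which makes it less convenient as a loss for gradient-based optimization.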

---

## ✅ 2. **MSE (Mean Squared Error)**

**Formula:**

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

**Interpretation:**

* Average of squared errors.
* Heavily penalizes large errors, so it is sensitive to outliers (not robust to them).
* Lower is better.
* Commonly used as the loss function when training regression models.
* Differentiable everywhere.
* Expressed in squared units of the target (for example, squared dollars if $y$ is in dollars).

---

## ✅ 3. **RMSE (Root Mean Squared Error)**

**Formula:**

$$
\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }
$$

**Interpretation:**

* Same units as the target variable.
* Easier to interpret than MSE.
* Lower is better.
* Still sensitive to outliers, because the errors are squared before averaging.
* Widely used to report error, including in deep learning.
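
The three error metrics can be written directly from their formulas. The sketch below uses made-up numbers to show how a single large error affects each one; the helper names and data are assumptions for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_good = np.array([12.0, 18.0, 33.0, 40.0])   # small errors everywhere
y_outl = np.array([12.0, 18.0, 33.0, 60.0])   # same, plus one error of 20

for name, y_pred in [("small errors", y_good), ("one outlier", y_outl)]:
    print(name, mae(y_true, y_pred), mse(y_true, y_pred), round(rmse(y_true, y_pred), 2))
```

On this data the single outlier raises MAE from 1.75 to 6.75 (roughly 4 times) but raises MSE from 4.25 to 104.25 (roughly 25 times), which is why squared-error metrics are considered sensitive to outliers.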

---

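MAE, MSE, and RMSE above are error measures that can also serve as loss functions. The scores below measure goodness of fit instead.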

---

## ✅ 4. **R² Score (Coefficient of Determination) (Goodness of Fit)**

**Formula:**

$$
R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
$$

**Interpretation:**

* Proportion of variance in $y$ explained by the model.
* $R^2 = 1$: perfect fit
* $R^2 = 0$: model explains no variance
* Can be **negative** if model is worse than mean predictor.
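
A minimal sketch of the formula, using a tiny made-up target to reproduce the three cases listed above (perfect fit, mean predictor, and a model worse than the mean). The function name `r2` is a hypothetical helper.

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r2(y_true, y_true))                       # perfect predictions
print(r2(y_true, np.full(4, y_true.mean())))    # always predicting the mean
print(r2(y_true, y_true[::-1]))                 # reversed predictions, worse than the mean
```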

---

## ✅ 5. **Adjusted R² Score**

**Why it is needed:**

Plain R² never decreases when more predictors (columns) are added, even irrelevant ones, so it can overstate how good a larger model is. Adjusted R² was introduced to correct for this.

**Formula:**

$$
R^2_{adj} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - p - 1}
$$

Where:

* $n$ = number of observations
* $p$ = number of predictors

**Interpretation:**

* Adjusts R² for number of predictors.
* Penalizes unnecessary complexity.
* Only increases if new predictor improves model more than by chance.
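
The formula translates into a one-line helper. `adjusted_r2` is a hypothetical function name, and the R² of 0.90 with 20 observations is a made-up example used to show the penalty growing with the number of predictors.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors (requires n > p + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R², more predictors: the adjusted score drops
for p in (1, 3, 5):
    print(p, round(adjusted_r2(0.90, n=20, p=p), 4))
```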

---

## 📌 Summary Table

| Metric      | Good Value    | Sensitive to Outliers | Notes                   |
| ----------- | ------------- | --------------------- | ----------------------- |
| MAE         | ↓ Lower       | No                    | More robust to outliers |
| MSE         | ↓ Lower       | Yes                   | Penalizes large errors  |
| RMSE        | ↓ Lower       | Yes                   | Interpretable scale     |
| R²          | ↑ Closer to 1 | Yes                   | Variance explained      |
| Adjusted R² | ↑ Closer to 1 | Yes                   | Penalizes overfitting   |

---



In [122]:
# Metrics: MAE, MSE and R2 from scikit-learn
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


In [125]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

In [130]:
X_train
y_train

array([ 57081, 112635, 122391,  91738,  82200,  57189,  56957,  42774,
       111620,  83088,  81363,  61111, 113812,  64445,  43525,  48266,
        37731, 109431,  39891,  90000,  64500,  93940, 121872,  67938,
       105582, 101302,  56642,  60000,  59095,  66029,  60150,  55794])

In [131]:
lr.fit(X_train.reshape(-1,1),y_train)

LinearRegression(fit_intercept=True, copy_X=True, tol=1e-06, n_jobs=None, positive=False)


In [136]:
y_predict = lr.predict(X_test.reshape(-1,1))

In [138]:
y_predict

array([ 84174.40735952,  55284.72051045,  62025.6474419 ,  35061.9397161 ,
        36987.91883938,  92841.31341423,  61062.65788026, 115953.06289349])

In [135]:
y_test

array([ 91000,  54445,  63218,  39343,  46205,  98273,  60200, 116969])

In [137]:
print('MAE',mean_absolute_error(y_test,y_predict))

MAE 3708.261090762284


In [139]:
print('MSE',mean_squared_error(y_test,y_predict))

MSE 22909642.289620496
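
RMSE is not computed in these cells. One way to get it, sketched here with the test targets and predictions copied from the outputs above, is to take the square root of the MSE:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Test targets and predictions copied from the cell outputs above
y_test = np.array([91000, 54445, 63218, 39343, 46205, 98273, 60200, 116969])
y_predict = np.array([84174.40735952, 55284.72051045, 62025.6474419, 35061.9397161,
                      36987.91883938, 92841.31341423, 61062.65788026, 115953.06289349])

rmse = np.sqrt(mean_squared_error(y_test, y_predict))
print('RMSE', rmse)
```

Unlike the MSE, the result is in the same units as `Salary`, so it can be compared directly with the MAE.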


In [141]:
print('R2',r2_score(y_test,y_predict))
r2 = r2_score(y_test,y_predict)

R2 0.9655807830897453


In [142]:
X_test.shape

(8,)

In [144]:
n, p = X_test.shape[0], 1  # 8 test samples, 1 predictor
a = 1 - (1 - r2) * (n - 1) / (n - p - 1)

In [145]:
print('Adjusted R2 score',a)

Adjusted R2 score 0.9598442469380362
