
# **🎯 Linear Regression  🎯**

### 📌 **Definition:**
Linear Regression predicts a continuous target variable from one or more input features by fitting a straight line (a plane or hyperplane when there are several features) to the data.

### 📈 **Equation:**
**Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε**

→ Y = Target

→ X = Features

→ β = Coefficients

→ ε = Error term
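
To make the equation concrete, here is a minimal sketch that evaluates the deterministic part, β₀ + β₁X₁ + ... + βₙXₙ, for a single observation. The function name `predict_one` and the coefficient values are made up for illustration; ε is left out because it is the unobserved error.

```python
def predict_one(beta0, betas, features):
    """Return beta0 + sum(beta_i * x_i) for a single observation."""
    if len(betas) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return beta0 + sum(b * x for b, x in zip(betas, features))


# Hypothetical model: salary = 30000 + 2000 * experience + 100 * age
y_hat = predict_one(30000, [2000, 100], [5, 30])
print(y_hat)  # 43000
```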



### 🧠 **Key Assumptions (VERY IMPORTANT!)**

✅ Linearity – Relationship between X and Y is linear

✅ Independence – Observations are independent

✅ Homoscedasticity – Constant variance of errors

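✅ Normality – Errors are normally distributed (matters mainly for confidence intervals and p-values, not for the coefficient estimates themselves)

✅ No multicollinearity – Features shouldn't be highly correlated with each other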
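
Several of these assumptions can be roughly checked from the residuals (actual minus predicted). The sketch below uses made-up data and NumPy's `np.polyfit` for a one-feature fit. The half-split comparison of residual spread is an informal heuristic, not a formal test, and the Durbin-Watson statistic only looks at first-order correlation between consecutive residuals.

```python
import numpy as np

# Hypothetical data: one feature, roughly linear target
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Fit y = intercept + slope * x by least squares (degree-1 polynomial)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Sanity check: with an intercept, OLS residuals average to (numerically) zero
print("mean residual:", residuals.mean())

# Homoscedasticity (informal): spread should be similar for low and high x
half = len(x) // 2
print("std low-x :", residuals[:half].std())
print("std high-x:", residuals[half:].std())

# Independence: Durbin-Watson statistic, between 0 and 4;
# values near 2 suggest little correlation between consecutive residuals
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print("Durbin-Watson:", durbin_watson)
```

In practice a residuals-vs-predictions plot is often the quickest way to spot non-linearity or changing variance.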



### 🧮 **Types:**

* **Simple Linear Regression** → 1 independent variable (predictor) and 1 dependent variable (response)

* **Multiple Linear Regression** → 2 or more independent variables (predictors) and 1 dependent variable (response)



### 📊 **Evaluation Metrics:**

* **R² (R-squared)** – Goodness of fit: the share of the variance in Y that the model explains

* **Adjusted R²** – R² penalized for extra features

* **MSE / RMSE / MAE** – Error metrics (lower is better)
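
For reference, these are the standard definitions of the metrics, written with n observations, p features, actual values Yᵢ, predictions Ŷᵢ, and the mean of the actual values Ȳ:

```latex
\begin{aligned}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \\
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
\text{MSE} &= \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \\
\text{RMSE} &= \sqrt{\text{MSE}} \\
\text{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert Y_i - \hat{Y}_i \rvert
\end{aligned}
```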



# **🎯 OLS Method – Ordinary Least Squares 🎯**

### 📌 **Definition:**

- OLS is the most common method used to estimate the coefficients (**β**) in **linear regression** by **minimizing the sum of squared errors** between actual and predicted values.

- OLS finds the **best-fitting line** by choosing **β₀, β₁, ..., βₙ** that minimize the Sum of Squared Residuals (SSR).

---

### 🧮 **Objective Function:**

👉 Minimize: **Σ (Yᵢ − Ŷᵢ)²**

➡️ Yᵢ = Actual value

➡️ Ŷᵢ = Predicted value

➡️ This is the **Sum of Squared Residuals (SSR)**, also called the residual sum of squares (RSS)

✅ The `LinearRegression` class in scikit-learn uses OLS (Ordinary Least Squares) internally.
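
The minimizer of this objective has a closed form: setting the gradient of the SSR to zero gives the normal equations (XᵀX)β = Xᵀy. The sketch below solves them with NumPy for the same five experience/salary pairs used in the simple regression example of this notebook. It is meant to show the math, not how any particular library is implemented; libraries typically use more numerically robust least-squares solvers.

```python
import numpy as np

# Same five (experience, salary) pairs as the simple regression example
experience = np.array([1, 2, 4, 2, 10], dtype=float)
salary = np.array([200000, 400000, 500000, 300000, 800000], dtype=float)

# Design matrix: a column of ones (for the intercept β₀) plus the feature
X = np.column_stack([np.ones_like(experience), experience])

# Normal equations: (XᵀX) β = Xᵀy, solved without forming an explicit inverse
beta = np.linalg.solve(X.T @ X, X.T @ salary)
intercept, slope = beta

# Sum of squared residuals at the solution
ssr = np.sum((salary - X @ beta) ** 2)

print("Intercept (β₀):", intercept)
print("Slope (β₁):", slope)
print("SSR:", ssr)
```

Up to floating-point rounding, the result should match the intercept (about 206818.18) and slope (about 61363.64) that `LinearRegression` reports for this data.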


In [7]:
# SIMPLE LINEAR REGRESSION:
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy dataset: salary vs. years of experience
DATA = pd.DataFrame({
    "salary": [200000, 400000, 500000, 300000, 800000],
    "experience": [1, 2, 4, 2, 10],
})
x = DATA[['experience']]  # Independent variable (2D, as scikit-learn expects)
y = DATA['salary']        # Dependent variable

# Fit the model: estimates the intercept (β₀) and slope (β₁) by OLS
LR = LinearRegression()
LR.fit(x, y)


In [10]:
y_pred=LR.predict(x)
y_pred

array([268181.81818182, 329545.45454545, 452272.72727273, 329545.45454545,
       820454.54545455])

In [18]:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Note: every metric below is computed on the same data the model was
# fitted on, so these are in-sample (training) scores.

# 1. R² Score
r_squared = LR.score(x, y)
print("R² Score:", r_squared)

# 2. Intercept (β₀) and Slope (β₁) of the model
print("Intercept (β₀):", LR.intercept_)
print("Slope (β₁):", LR.coef_[0])

# 3. Adjusted R²
n = len(y)      # Number of data points
p = x.shape[1]  # Number of features
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print("Adjusted R²:", adjusted_r_squared)

# 4. Mean Squared Error (MSE); y_pred comes from the previous cell
mse = mean_squared_error(y, y_pred)
print("MSE (Mean Squared Error):", mse)

# 5. Root Mean Squared Error (RMSE), in the same units as salary
rmse = np.sqrt(mse)
print("RMSE (Root Mean Squared Error):", rmse)

# 6. Mean Absolute Error (MAE)
mae = mean_absolute_error(y, y_pred)
print("MAE (Mean Absolute Error):", mae)


R² Score: 0.9378216123499142
Intercept (β₀): 206818.18181818182
Slope (β₁): 61363.63636363637
Adjusted R²: 0.917095483133219
MSE (Mean Squared Error): 2636363636.363635
RMSE (Root Mean Squared Error): 51345.53180524704
MAE (Mean Absolute Error): 47272.72727272726


| **Metric** | **Definition** | **Code** | **Interpretation** |
| --- | --- | --- | --- |
| **R² Score** 📊 | Share of the variance in the target that the model explains. | `model.score(X, y)` | - **R² = 1**: Perfect model 🎯<br> - **R² = 0**: No better than predicting the mean 📉<br> - **Negative R²**: Worse than predicting the mean ❌ |
| **Adjusted R²** 🔧 | Adjusts R² by penalizing models with too many features. | `adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)` | - **Close to R²**: The features are pulling their weight 🌟<br> - **Well below R²**: Some features add little; potential overfitting ⚠️ |
| **MSE (Mean Squared Error)** 🧮 | Average squared difference between actual and predicted values (in squared units of the target). | `mean_squared_error(y, y_pred)` | - **Low MSE**: Better fit 🔧<br> - **High MSE**: Poor model 🛑<br> - **MSE = 0**: Perfect model (rare) 🎯 |
| **RMSE (Root Mean Squared Error)** √ | The square root of MSE, showing error in the original units of the target. | `np.sqrt(mse)` | - **Low RMSE**: Small errors, good model 👍<br> - **High RMSE**: Large errors, needs improvement 🔴 |
| **MAE (Mean Absolute Error)** 📏 | Average absolute difference between actual and predicted values; less sensitive to outliers than MSE. | `mean_absolute_error(y, y_pred)` | - **Low MAE**: More accurate predictions 📈<br> - **High MAE**: Large error margin 🛑<br> - **MAE = 0**: Perfect model (rare) 🌟 |
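
To show what these metrics compute, here is a from-scratch sketch in plain Python. It reuses the five salaries and the fitted intercept and slope printed earlier in this notebook; the variable names are only illustrative.

```python
import math

# Actual salaries and the fitted line from the simple regression example
y_true = [200000, 400000, 500000, 300000, 800000]
experience = [1, 2, 4, 2, 10]
intercept, slope = 206818.18181818182, 61363.63636363637
y_hat = [intercept + slope * x for x in experience]

n = len(y_true)
p = 1  # number of features
y_mean = sum(y_true) / n

ss_res = sum((yt - yh) ** 2 for yt, yh in zip(y_true, y_hat))  # unexplained
ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)              # total

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse = ss_res / n
rmse = math.sqrt(mse)
mae = sum(abs(yt - yh) for yt, yh in zip(y_true, y_hat)) / n

print(f"R²: {r2:.4f}, Adjusted R²: {adj_r2:.4f}")
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, MAE: {mae:.2f}")
```

The values should agree with the scikit-learn output above (R² of about 0.9378, RMSE of about 51345.53).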


In [21]:
# MULTIPLE LINEAR REGRESSION:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Example dataset: Experience (in years), Age (in years) and Salary
data = {
    'Experience': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Age': [25, 28, 30, 35, 38, 40, 45, 50, 55, 60],
    'Salary': [30000, 32000, 34000, 36000, 38000, 42000, 45000, 48000, 51000, 55000]
}
# Creating a DataFrame
df = pd.DataFrame(data)

# Features (X) - Experience and Age
# Note: these two columns rise together in this toy data, so the individual
# coefficients are hard to interpret separately (multicollinearity)
X = df[['Experience', 'Age']]

# Target variable (y) - Salary
y = df['Salary']

# Splitting data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initializing the Linear Regression model
model = LinearRegression()

# Fitting the model to the training data
model.fit(X_train, y_train)

# Making predictions on the held-out test rows
y_pred = model.predict(X_test)

# Model Evaluation
print(f"Intercept (β₀): {model.intercept_}")
print(f"Coefficients (β₁, β₂): {model.coef_}")

# R² Score (only 2 test rows here, so treat this as a rough estimate)
print(f"R² Score: {r2_score(y_test, y_pred)}")

# Mean Squared Error (MSE)
print(f"MSE: {mean_squared_error(y_test, y_pred)}")

# Output the predictions
print(f"Predicted Salaries: {y_pred}")



Intercept (β₀): 16444.790046656304
Coefficients (β₁, β₂): [982.50388802 474.33903577]
R² Score: 0.9986888847387978
MSE: 118328.1523235004
Predicted Salaries: [51375.97200622 31691.29082426]
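
One caveat about this example: Experience and Age increase together, which is exactly the multicollinearity the assumptions warn about. A quick way to quantify it is the correlation between the two predictors and the variance inflation factor (VIF), which for two predictors reduces to 1 / (1 − r²). The sketch below uses only NumPy; the "VIF above roughly 10" threshold mentioned in the comment is a common rule of thumb, not a hard limit.

```python
import numpy as np

# Feature columns from the multiple regression example
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
age = np.array([25, 28, 30, 35, 38, 40, 45, 50, 55, 60], dtype=float)

# Pearson correlation between the two predictors
r = np.corrcoef(experience, age)[0, 1]

# Variance inflation factor for two predictors: VIF = 1 / (1 - r²)
# Rule of thumb (an assumption, not a strict cutoff): VIF > 10 is a warning sign
vif = 1.0 / (1.0 - r ** 2)

print(f"corr(Experience, Age) = {r:.3f}")
print(f"VIF = {vif:.1f}")
```

With predictors this strongly correlated, the fitted model can still predict well, but the split of the effect between β₁ and β₂ is unstable.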




## 🎯 **Types of Linear Regression** 🎯


---

### 1️⃣ **Simple Linear Regression**

📌 1 independent variable

🧮 Y = β₀ + β₁X + ε

📊 Example: Predict salary using experience
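
For a single feature, the OLS estimates reduce to two scalar formulas: β₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β₀ = ȳ − β₁x̄. A minimal sketch in plain Python, using the salary/experience data from earlier in this notebook (the 5-year query is a made-up example):

```python
experience = [1, 2, 4, 2, 10]
salary = [200000, 400000, 500000, 300000, 800000]

n = len(experience)
x_mean = sum(experience) / n
y_mean = sum(salary) / n

# β₁ = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(experience, salary))
s_xx = sum((x - x_mean) ** 2 for x in experience)
beta1 = s_xy / s_xx

# β₀ = ȳ - β₁ x̄, so the line passes through the point of means
beta0 = y_mean - beta1 * x_mean

# Predicted salary for a hypothetical candidate with 5 years of experience
print(beta0 + beta1 * 5)
```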



### 2️⃣ **Multiple Linear Regression**

📌 2 or more independent variables

🧮 Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε

📊 Example: Predict house price using area, rooms, and location
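
A minimal sketch of the house-price example, assuming made-up data and NumPy's `np.linalg.lstsq`. The prices are generated from a known rule without noise so the recovered coefficients can be checked by eye. Location is left out because it is categorical and would first need an encoding such as one-hot columns.

```python
import numpy as np

# Hypothetical houses: area in m² and number of rooms
area = np.array([50, 60, 80, 100, 120, 150], dtype=float)
rooms = np.array([2, 3, 3, 4, 4, 5], dtype=float)

# Prices generated from a known rule (in thousands, no noise):
# price = 50 + 3 * area + 10 * rooms
price = 50 + 3 * area + 10 * rooms

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(area), area, rooms])

# Least-squares solution of X β ≈ price
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
print(beta)  # approximately [50, 3, 10]
```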



### 3️⃣ **Ridge Regression** (L2)

🔧 Adds an L2 penalty on the size of the coefficients to reduce overfitting

➕ Minimizes: MSE + λΣβ²

✅ Use when features are correlated
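
Ridge also has a closed form, β = (XᵀX + λI)⁻¹Xᵀy. The sketch below assumes synthetic data with two nearly identical features and penalizes the sum of squared errors rather than the MSE, which is the same objective up to a rescaling of λ. The helper name `ridge_fit` and the value λ = 5 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with two highly correlated features
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

# Center so the intercept drops out and is not penalized
Xc = X - X.mean(axis=0)
yc = y - y.mean()


def ridge_fit(Xc, yc, lam):
    """Solve (XᵀX + λI) β = Xᵀy, the minimizer of SSE + λ Σβ²."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


beta_ols = ridge_fit(Xc, yc, lam=0.0)   # λ = 0 reduces to plain OLS
beta_ridge = ridge_fit(Xc, yc, lam=5.0)

print("OLS  :", beta_ols)
print("Ridge:", beta_ridge)
```

The ridge coefficient vector always has a smaller norm than the OLS one; with correlated features this usually shows up as the two coefficients being pulled toward each other instead of taking large offsetting values.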



### 4️⃣ **Lasso Regression** (L1)

✂️ Shrinks some coefficients to exactly 0

➕ Minimizes: MSE + λΣ|β|

✅ Good for feature selection
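
Lasso has no closed form, but coordinate descent with a soft-thresholding step is a common way to solve it, and the thresholding is what produces exact zeros. The sketch below is a minimal version assuming centered data and the objective (1/(2n))·SSE + λΣ|β|, which matches the formula above up to a factor of 2 in λ. The function names and synthetic data are illustrative.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * ||y - X b||² + lam * Σ|b| on centered data."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual: remove every feature's contribution except j
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=n)  # only feature 0 matters

Xc = X - X.mean(axis=0)
yc = y - y.mean()

beta = lasso_coordinate_descent(Xc, yc, lam=0.2)
print("Lasso coefficients:", beta)
```

On data like this, the coefficients of the irrelevant features typically end up at exactly 0 while the useful one is kept (slightly shrunk), which is why Lasso doubles as a feature selector.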



### 5️⃣ **Elastic Net**

⚖️ Combo of Ridge + Lasso

➕ Minimizes: MSE + λ₁Σ|β| + λ₂Σβ²

✅ Works well with many correlated features
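
In scikit-learn, Elastic Net is available as `ElasticNet`. It does not take λ₁ and λ₂ directly: `alpha` sets the overall penalty strength and `l1_ratio` sets the mix between the L1 and L2 parts. A minimal sketch on synthetic data (the data and the parameter values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)

# Hypothetical data: two correlated features plus one irrelevant feature
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 2 * x2 + 0.5 * rng.normal(size=n)

# scikit-learn parameterizes the penalty as
#   alpha * l1_ratio * Σ|β|  +  0.5 * alpha * (1 - l1_ratio) * Σβ²
# so l1_ratio=1 is pure Lasso and l1_ratio=0 is pure Ridge.
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
```

Both `alpha` and `l1_ratio` are usually tuned by cross-validation rather than fixed by hand.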




### 🧠 **Remember:**

* Use **Simple/Multiple** for basic problems

* Use **Ridge/Lasso/Elastic Net** to handle **overfitting** or **multicollinearity**

