# 🔹 Linear Regression

**Linear Regression** is one of the simplest and most widely used algorithms in **supervised machine learning**.  
It is mainly used for **regression tasks** → predicting a **continuous value**.

---

## ✨ Key Idea

Linear regression tries to model the relationship between:

- **Independent variable(s) (X):** Input(s)  
- **Dependent variable (Y):** Output (target)  

using a **straight-line equation**:

$$
Y = mX + c
$$

where:  
- $Y$ = predicted output  
- $X$ = input feature  
- $m$ = slope (coefficient/weight)  
- $c$ = intercept (bias)  

For multiple features, each input $X_i$ gets its own weight $w_i$, and the intercept is written as the bias $b$:

$$
Y = w_1X_1 + w_2X_2 + \dots + w_nX_n + b
$$
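As a minimal sketch in plain Python, the multi-feature equation is a weighted sum of the inputs plus the bias. The function name `predict` and the two-feature example (size in sqft plus a hypothetical bedroom count, with made-up weights) are illustrative assumptions, not values from these notes:

```python
def predict(features: list[float], weights: list[float], bias: float) -> float:
    """Compute Y = w1*X1 + w2*X2 + ... + wn*Xn + b for one example."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    # Weighted sum of the inputs, then shift by the intercept (bias).
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical example: [size in sqft, number of bedrooms] with invented weights.
print(predict([1500.0, 3.0], weights=[0.03, 2.0], bias=10.0))  # 61.0
```

With a single feature and weight, this reduces to $Y = mX + c$.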

---

## 🔸 Example: House Price Prediction

Suppose we want to predict **house price** based on its **size (sqft)**.

| Size (sqft) | Price (Lakhs) |
|-------------|---------------|
| 1000        | 50            |
| 1500        | 65            |
| 2000        | 80            |

- The algorithm fits a **straight line** through the data points.  
- These three points lie exactly on one line, so the best-fit equation is:  

$$
\text{Price} = 0.03 \times \text{Size} + 20
$$

- For a **1700 sqft** house → predicted price $= 0.03 \times 1700 + 20 =$ **71 Lakhs**.  
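The fitted line can be checked with the closed-form least-squares solution for a single feature: slope $= \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$. This is a sketch in plain Python; `fit_line` is an illustrative name, not a library function:

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


sizes = [1000.0, 1500.0, 2000.0]  # sqft, from the table above
prices = [50.0, 65.0, 80.0]       # Lakhs
m, c = fit_line(sizes, prices)
print(f"Price = {m:.2f} * Size + {c:.2f}")       # Price = 0.03 * Size + 20.00
print(f"1700 sqft -> {m * 1700 + c:.2f} Lakhs")  # 1700 sqft -> 71.00 Lakhs
```

Because the three points are exactly collinear, the fit recovers slope 0.03 and intercept 20 with zero error; on noisy data the same formulas give the line with the smallest sum of squared errors.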


# 📈 Linear Regression Concepts

---

## 🔹 Best Fit Line
- The line that minimizes the total squared error between the actual points and the predicted values (the sum of squared residuals).
- Equation:  
  $$
  y = mx + c
  $$
- General hypothesis form:  
  $$
  h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2 + \dots + \theta_nx_n
  $$

---

## 🔹 Hypothesis vs Actual
- **Hypothesis (Prediction):** $h_\theta(x)$ → predicted value from the model  
- **Actual Point:** $y$ → true data value  

---

## 🔹 Parameters
- **Slope ($m$ or $\theta_1$):** Change in $y$ for a one-unit change in $x$  
- **Intercept ($c$ or $\theta_0$):** Value of $y$ when $x = 0$  

---

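## 🔹 Cost Function (Error)
Measures how far the predictions are from the actual values.  
For Linear Regression, we use the **Mean Squared Error (MSE)**, scaled by $\frac{1}{2}$ so that the factor of 2 cancels when we take derivatives:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \Big(h_\theta(x^{(i)}) - y^{(i)}\Big)^2
$$

where $m$ is the number of training examples (not the slope $m$ from $y = mx + c$) and $(x^{(i)}, y^{(i)})$ is the $i$-th example.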
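A direct translation of $J(\theta)$ for a single feature, as a sketch in plain Python (`cost` is an illustrative name). It uses the house table with the line from the example, $h_\theta(x) = 20 + 0.03x$, and a shifted line that is 2 Lakhs too low everywhere:

```python
def cost(theta0: float, theta1: float, xs: list[float], ys: list[float]) -> float:
    """J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2) for h(x) = theta0 + theta1 * x."""
    m = len(xs)  # number of training examples
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


sizes = [1000.0, 1500.0, 2000.0]  # sqft
prices = [50.0, 65.0, 80.0]       # Lakhs
print(cost(20.0, 0.03, sizes, prices))  # 0.0  (best-fit line)
print(cost(18.0, 0.03, sizes, prices))  # 2.0  (intercept too low by 2)
```

For the shifted line each error is 2, contributing $2^2 = 4$, so $J = \frac{3 \times 4}{2 \times 3} = 2$.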

---

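## 🔹 Residuals
The error for each data point $i$:  

$$
\text{Residual}^{(i)} = h_\theta(x^{(i)}) - y^{(i)}
$$

Many statistics texts define it with the opposite sign, $y^{(i)} - h_\theta(x^{(i)})$; the cost function squares it, so the sign does not change $J(\theta)$.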
$$

---

## 🔹 Gradient Descent
An iterative optimization algorithm that minimizes the cost function.  
Update rule for each parameter $\theta_j$ (all parameters are updated simultaneously):

$$
\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
$$

- $\alpha$ = Learning rate (step size)  
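For the cost $J(\theta)$ with a single feature, differentiating gives $\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)$ and $\frac{\partial J}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x^{(i)}$ (the $\frac{1}{2}$ in $\frac{1}{2m}$ cancels the 2 from the square). The sketch below applies that update rule in plain Python to the house data. It assumes the sizes are rescaled to thousands of sqft, because with raw sqft values the learning rate would have to be tiny to avoid diverging; `gradient_step` is an illustrative name, and `alpha=0.5` with 5000 iterations are choices that happen to work for this data:

```python
def gradient_step(
    theta0: float, theta1: float, xs: list[float], ys: list[float], alpha: float
) -> tuple[float, float]:
    """One batch gradient-descent update for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]  # h(x_i) - y_i
    grad0 = sum(errors) / m                                     # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m          # dJ/dtheta1
    # Both parameters use the same (old) gradients: a simultaneous update.
    return theta0 - alpha * grad0, theta1 - alpha * grad1


sizes = [1.0, 1.5, 2.0]      # house size in thousands of sqft (rescaled)
prices = [50.0, 65.0, 80.0]  # price in Lakhs

theta0, theta1 = 0.0, 0.0    # arbitrary starting point
for _ in range(5000):
    theta0, theta1 = gradient_step(theta0, theta1, sizes, prices, alpha=0.5)
print(f"theta0 = {theta0:.4f}, theta1 = {theta1:.4f}")
```

The parameters should approach $\theta_0 = 20$ and $\theta_1 = 30$. Since the feature is in thousands of sqft, 30 Lakhs per 1000 sqft corresponds to the slope 0.03 per sqft from the house price example.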

---

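## 🔹 Global Minimum
- The point where the cost function $J(\theta)$ reaches its **lowest possible value**.  
- For linear regression with MSE, $J(\theta)$ is convex (bowl-shaped), so every local minimum is also a global minimum.  
- Our goal: adjust the parameters until we reach this point.  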

---

## 🔹 Convergence
- Repeat the gradient descent updates until the cost function stops decreasing, i.e. until the change in $J(\theta)$ between iterations falls below a small tolerance.  
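The "repeat until the cost stops decreasing" rule can be sketched as a loop with a tolerance on the change in $J(\theta)$. This is an illustrative sketch, not a fixed recipe: `tol=1e-12`, `max_iters`, `alpha=0.5`, and the rescaled sizes (thousands of sqft) are assumptions chosen for this small dataset:

```python
def cost(theta0: float, theta1: float, xs: list[float], ys: list[float]) -> float:
    """J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs: list[float], ys: list[float], alpha: float = 0.5,
                     tol: float = 1e-12, max_iters: int = 100_000):
    """Update theta until |J_old - J_new| < tol; returns (theta0, theta1, iterations)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    prev_cost = cost(theta0, theta1, xs, ys)
    for iteration in range(1, max_iters + 1):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        new_cost = cost(theta0, theta1, xs, ys)
        if abs(prev_cost - new_cost) < tol:
            return theta0, theta1, iteration  # converged: cost barely changed
        prev_cost = new_cost
    return theta0, theta1, max_iters          # hit the safety cap


theta0, theta1, iters = gradient_descent([1.0, 1.5, 2.0], [50.0, 65.0, 80.0])
print(f"stopped after {iters} iterations: theta0 = {theta0:.3f}, theta1 = {theta1:.3f}")
```

The `max_iters` cap is a safety net: if the learning rate is too large, the cost never settles and the loop would otherwise run forever.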

---

## 🔹 Learning Rate ($\alpha$)
- Controls the step size in gradient descent.  
- Too small → very slow convergence  
- Too large → steps overshoot the minimum, so the cost oscillates or diverges (grows instead of shrinking)  
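To see the effect of $\alpha$, the sketch below runs 50 gradient-descent steps on the house data (sizes rescaled to thousands of sqft, an assumption) for three learning rates and reports the final cost; `cost_after` is an illustrative helper name. Starting from $\theta = (0, 0)$, the initial cost is 2187.5:

```python
def cost_after(alpha: float, steps: int = 50) -> float:
    """Run `steps` gradient-descent updates from theta = (0, 0) and return the final J."""
    xs = [1.0, 1.5, 2.0]      # house size in thousands of sqft (rescaled)
    ys = [50.0, 65.0, 80.0]   # price in Lakhs
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


for alpha in (0.001, 0.5, 0.7):  # too small, reasonable, too large (for this data)
    print(f"alpha = {alpha}: J after 50 steps = {cost_after(alpha):.4g}")
```

With these settings, $\alpha = 0.001$ barely reduces the cost, $\alpha = 0.5$ brings it close to 0, and $\alpha = 0.7$ makes it blow up. For this particular data and scaling the divergence threshold is roughly $\alpha \approx 0.59$; it depends on the data, which is one reason features are often rescaled before running gradient descent.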

---

## 🔹 Why adjust parameters ($\theta$)?
- To move step by step towards the **global minimum**  
- At the global minimum, we get the **best-fit line** with the **minimum error**  

---




