<a href="https://colab.research.google.com/github/GavishKapoor/ML-Assignment-Regression-Gavish-Kapoor.ipynb/blob/main/ML_Assignement_Regression_Gavish_Kapoor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Simple Linear Regression**

### 1. **What is Simple Linear Regression?**

It is a method for predicting one variable (Y, the dependent variable) from a single other variable (X, the independent variable) by fitting a straight line through the data.

---

### 2. **Key assumptions of Simple Linear Regression:**

* There is a **linear** relationship between X and Y.
* The errors (residuals) are **normally distributed**.
* Errors have **constant variance** (homoscedasticity).
* Observations are **independent**.

---

### 3. **What does 'm' represent in Y = mX + c?**

**m is the slope**, meaning how much Y changes when X increases by 1 unit.

---

### 4. **What does 'c' represent in Y = mX + c?**

**c is the intercept**, the value of Y when X = 0.

---

### 5. **How do we calculate slope (m) in Simple Linear Regression?**

Using the formula:

$$
m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
$$
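A minimal NumPy sketch of the slope formula above, on made-up example data. The intercept uses the standard companion formula c = ȳ − m·x̄. That formula is not stated above, but it follows from least squares because the fitted line passes through (x̄, ȳ). `np.polyfit` is used only as a cross-check.

```python
import numpy as np

# Hypothetical example data (not from the original assignment)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by the sum of squared x-deviations
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: the least-squares line passes through the point (x_bar, y_bar)
c = y_bar - m * x_bar

# Cross-check with NumPy's degree-1 least-squares fit (returns [slope, intercept])
m_np, c_np = np.polyfit(x, y, 1)
print(f"m = {m:.3f}, c = {c:.3f}")  # → m = 1.970, c = 0.090
```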

---

### 6. **Purpose of the least squares method:**

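It finds the **best-fitting line** by minimizing the **sum of squared errors**, i.e. the sum of the squared differences between the actual and predicted values of Y. Squaring makes positive and negative errors count equally and penalizes large errors more heavily.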
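To make "minimizing the sum of squared errors" concrete, the sketch below uses the same kind of made-up data and a hypothetical helper `sse`. It checks that nudging the least-squares slope or intercept in any direction makes the SSE larger.

```python
import numpy as np

# Hypothetical example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

def sse(m, c):
    """Sum of squared errors (residuals) for the line y = m*x + c."""
    residuals = y - (m * x + c)
    return np.sum(residuals ** 2)

# Least-squares estimates of slope and intercept
m_hat, c_hat = np.polyfit(x, y, 1)
best = sse(m_hat, c_hat)

# Any small change to the slope or intercept gives a larger SSE
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
for dm, dc in perturbations:
    print(dm, dc, sse(m_hat + dm, c_hat + dc) > best)  # last column is True each time
```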

---

### 7. **What is R² (coefficient of determination)?**

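It measures the **proportion of the variance in Y** that the model explains. For a least-squares fit with an intercept, it lies between 0 and 1 on the training data.

* **R² = 1** → perfect fit: the model explains all of the variance
* **R² = 0** → the model explains none of the variance (no linear relationship is captured; it does no better than always predicting the mean of Y)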
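A short sketch of how R² is computed, using the standard definition R² = 1 − SS_res / SS_tot. The data are made up. scikit-learn's `r2_score` is called only to confirm that the hand computation agrees.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

m, c = np.polyfit(x, y, 1)
y_pred = m * x + c

ss_res = np.sum((y - y_pred) ** 2)    # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation of y around its mean
r2 = 1 - ss_res / ss_tot

print(round(r2, 4), round(r2_score(y, y_pred), 4))  # both values are about 0.9977
```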

---

## **Multiple Linear Regression**

### 8. **What is Multiple Linear Regression?**

It predicts Y using **two or more** input variables (X₁, X₂, X₃...).
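A minimal scikit-learn sketch with two inputs. The data are simulated from a known equation (an assumption for illustration) so you can see the fitted coefficients land near the true ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y = 3 + 2*x1 - 1.5*x2 + small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))  # columns: x1, x2
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, 200)

model = LinearRegression().fit(X, y)
print(model.intercept_)  # close to 3
print(model.coef_)       # close to [2, -1.5]: one coefficient per input variable
```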

---

### 9. **Main difference between Simple and Multiple Linear Regression:**

* **Simple**: 1 input variable
* **Multiple**: 2 or more input variables

---

### 10. **Key assumptions of Multiple Linear Regression:**

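The same assumptions as simple linear regression, plus:

* No **multicollinearity**: the input variables should not be highly correlated with each other
* A linear relationship between each X and Y (with the other variables held constant)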
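One common way to check for multicollinearity is the variance inflation factor, VIF_j = 1 / (1 − R²_j). Here R²_j comes from regressing predictor j on all the other predictors. The helper `vif` below is a hypothetical sketch built on scikit-learn, and the data are simulated. A common rule of thumb treats VIF values above roughly 5 to 10 as a warning sign.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """Variance inflation factor for each column of X (hypothetical helper)."""
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # all predictors except column j
        r2_j = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        scores.append(1.0 / (1.0 - r2_j))
    return np.array(scores)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)                  # unrelated to the others

v = vif(np.column_stack([x1, x2, x3]))
print(v)  # x1 and x2 get large VIFs; x3 stays near 1
```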

---

### 11. **What is heteroscedasticity?**

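When the variance of the errors **changes** across the range of the data (for example, the errors get larger as X grows).
The coefficient estimates stay unbiased, but their standard errors become unreliable, so p-values and confidence intervals can be misleading.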
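A rough numeric sketch of what heteroscedasticity looks like. The data are simulated so that the noise grows with x (an assumption). The sketch then compares the residual spread for small and large x. This is an informal check only; formal tests such as Breusch-Pagan exist (for example in statsmodels).

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(1, 10, 400))
# Hypothetical heteroscedastic data: the noise standard deviation is 0.2 * x
y = 2 + 0.5 * x + rng.normal(0, 0.2 * x)

m, c = np.polyfit(x, y, 1)
residuals = y - (m * x + c)

# x is sorted, so the first half holds small x and the second half large x
low, high = residuals[:200], residuals[200:]
print(low.std(), high.std())  # the spread for large x is clearly bigger
```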

---

### 12. **How to improve model with high multicollinearity?**

* **Remove** highly correlated variables
* Use **Principal Component Analysis (PCA)**
* Use **Ridge/Lasso** regression
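A sketch of the Ridge remedy on two almost identical predictors, using simulated data. Ridge adds an L2 penalty controlled by `alpha`; the value 10 is an arbitrary choice for illustration. The penalty keeps the coefficients from blowing up and tends to share the effect between the correlated columns. The features here are already on the same scale, so no standardization step is included.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # almost identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha sets the strength of the L2 penalty

print("OLS:  ", ols.coef_)    # individual coefficients can be unstable
print("Ridge:", ridge.coef_)  # shrunk and shared roughly equally between x1 and x2
```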

---

### 13. **Transforming categorical variables:**

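* **Label (ordinal) encoding**: replace each category with an integer; appropriate only when the categories have a natural order (e.g. small < medium < large)
* **One-hot encoding**: create a 0/1 dummy column for each category; usually one column is dropped to avoid perfect multicollinearity (the "dummy variable trap")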
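A small pandas sketch of both encodings on a hypothetical table. The columns `size` and `city` and their values are made up. The size order is mapped by hand, and `pd.get_dummies` with `drop_first=True` creates the dummy columns.

```python
import pandas as pd

# Hypothetical data: one ordered category (size) and one unordered category (city)
df = pd.DataFrame({
    "size": ["S", "M", "L", "M"],
    "city": ["Delhi", "Mumbai", "Pune", "Delhi"],
})

# Label (ordinal) encoding: integers that respect the natural order S < M < L
df["size_code"] = df["size"].map({"S": 0, "M": 1, "L": 2})

# One-hot encoding: one 0/1 column per city; drop_first=True drops the first
# level (Delhi, the baseline) to avoid perfect multicollinearity with the intercept
encoded = pd.get_dummies(df.drop(columns="size"), columns=["city"],
                         drop_first=True, dtype=int)
print(encoded)
```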

---

### 14. **Role of interaction terms:**

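An interaction term lets the effect of one variable on Y depend on the value of another.
It is added as the product of the two variables, e.g. X₁ \* X₂ in Y = b₀ + b₁X₁ + b₂X₂ + b₃X₁X₂. A non-zero b₃ means the two variables together have an extra effect beyond their separate effects.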
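A minimal sketch of adding an interaction term by hand. The data are generated without noise from a known equation (an assumption), so ordinary least squares recovers the interaction coefficient b₃ = 4 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 5, 300)
x2 = rng.uniform(0, 5, 300)
# Hypothetical data: the effect of x1 on y depends on x2 through the x1*x2 term
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Add the interaction as an extra column, then fit an ordinary linear model
X = np.column_stack([x1, x2, x1 * x2])
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # approximately 1 and [2, 3, 4]
```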

---

### 15. **Intercept interpretation (Simple vs Multiple):**

* **Simple**: Value of Y when X = 0
* **Multiple**: Y when **all** Xs = 0 (not always meaningful)

---

### 16. **Significance of slope in regression:**

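It shows **how much Y is expected to change** when X increases by 1 unit (in multiple regression, with the other variables held constant). Its sign gives the direction of the relationship and its size gives the strength of the effect, which is what makes it useful for **describing and predicting trends**.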

---

### 17. **How intercept provides context:**

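The intercept gives the **baseline value** of Y when all inputs are zero. It is only directly meaningful when zero is a realistic value for every input. Otherwise it mainly serves to position the line correctly.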

---

### 18. **Limitations of R² as the only measure:**

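* It never decreases when a variable is added, even if that variable doesn’t help
* It is computed on the training data, so it says nothing about **overfitting** or about performance on new data

Use **Adjusted R²** or **cross-validation** alongside it (or instead of it).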
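A sketch of the standard Adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The helper `adjusted_r2` is hypothetical and the data are simulated. The example adds 10 pure-noise predictors to show that plain R² cannot go down, while Adjusted R² charges a penalty for each extra predictor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 1))
y = 2 * x[:, 0] + rng.normal(size=n)
X_big = np.hstack([x, rng.normal(size=(n, 10))])  # plus 10 predictors unrelated to y

r2_small = LinearRegression().fit(x, y).score(x, y)
r2_big = LinearRegression().fit(X_big, y).score(X_big, y)

# R² can only go up (or stay equal) when predictors are added
print(r2_small, adjusted_r2(r2_small, n, 1))
print(r2_big, adjusted_r2(r2_big, n, 11))
```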

---

### 19. **Large standard error for a coefficient:**

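It means the estimate is **imprecise**: the data cannot pin down the coefficient's value. Its t-statistic (coefficient ÷ standard error) is then small, so the coefficient may be statistically **insignificant**. Common causes are multicollinearity, a small sample, or noisy data.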
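A NumPy sketch of where coefficient standard errors come from in ordinary least squares. It uses the standard result SE = sqrt(diag(σ̂² (XᵀX)⁻¹)), where σ̂² = SSE / (n − k). The data are simulated so that `x2` has no real effect. Expect a large |t| for `x1` and, typically, a small one for `x2`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)  # hypothetical data: x2 has no real effect

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients

residuals = y - X @ beta
k = X.shape[1]                                # number of estimated coefficients
sigma2 = residuals @ residuals / (n - k)      # estimated error variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t = beta / se                                 # small |t| means weak evidence of an effect

print(np.round(beta, 2), np.round(se, 3), np.round(t, 1))
```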

---

### 20. **Heteroscedasticity in residual plots:**

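You’ll see a **fan or cone shape** (the spread of the residuals widens or narrows as the fitted values grow) instead of an even band around zero.
It should be addressed so that standard errors and confidence intervals are reliable. Common remedies are transforming Y (e.g. with a log), weighted least squares, or robust standard errors.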

---

### 21. **High R² but low Adjusted R²?**

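It means the model includes **predictors that add little or no explanatory power**. Adjusted R² penalizes every extra predictor, so it falls further below R² when added variables don’t improve the fit enough to justify their cost.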

---

### 22. **Why scale variables in Multiple Regression?**

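To put all variables on a **comparable scale** (e.g. mean 0, standard deviation 1).
Scaling does not change the predictions of ordinary least squares, but it makes coefficients easier to compare and helps gradient-based solvers converge faster. It is essential for regularized models like Ridge/Lasso, whose penalty would otherwise depend on each variable's units.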
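A sketch showing that standardization changes the coefficients of plain OLS but not its predictions. The two predictors are simulated on very different scales, loosely labeled income and age for illustration only. `StandardScaler` is chained in front of `LinearRegression` with a pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical predictors on very different scales (e.g. income vs. age)
X = np.column_stack([rng.normal(50_000, 10_000, 100), rng.normal(40, 10, 100)])
y = 0.001 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

raw = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

print(raw.coef_)        # depends on each feature's units
print(scaled[-1].coef_) # change in y per one standard deviation of each feature
print(np.allclose(raw.predict(X), scaled.predict(X)))  # True: same fitted values
```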

---

## **Polynomial Regression**

### 23. **What is Polynomial Regression?**

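It fits a **curved line** instead of a straight one by adding powers of X (like X², X³...) as extra features. The model is still linear in its coefficients, so it is fitted with ordinary least squares.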

---

### 24. **Difference from Linear Regression:**

* Linear: Straight line
* Polynomial: **Curved line** (can handle complex relationships)

---

### 25. **When is Polynomial Regression used?**

When the data has a **non-linear trend** that a straight line can’t fit well.

---

### 26. **General equation for Polynomial Regression:**

$$
Y = a + b_1X + b_2X^2 + b_3X^3 + \dots + b_nX^n
$$
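A NumPy sketch that fits the general equation above with n = 3. The data are simulated from chosen values of a, b₁, b₂, b₃ (an assumption) and the fit recovers them. Note that `np.polyfit` returns coefficients from the highest power down.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
# Hypothetical cubic data: a = 1, b1 = -2, b2 = 0.5, b3 = 0.3, plus small noise
y = 1 - 2 * x + 0.5 * x**2 + 0.3 * x**3 + rng.normal(0, 0.1, x.size)

# np.polyfit orders coefficients from the highest power down: [b3, b2, b1, a]
b3, b2, b1, a = np.polyfit(x, y, deg=3)
print(round(a, 2), round(b1, 2), round(b2, 2), round(b3, 2))
```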

---

### 27. **Can it be applied to multiple variables?**

Yes, you can apply polynomial terms to multiple variables (like X₁², X₁X₂, etc.)

---

### 28. **Limitations of Polynomial Regression:**

* **Overfitting** if degree is too high
* **Hard to interpret**
* Sensitive to outliers

---

### 29. **How to evaluate model fit for polynomial degree?**

* Use **R² and Adjusted R²**
* Use **Cross-validation**
* **Plot the curve** to see fit visually
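A sketch of choosing the degree by cross-validation. The data are simulated from a quadratic (an assumption). The sketch compares 5-fold mean squared error for degrees 1 to 6 and picks the lowest. The fold settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, 120)  # hypothetical quadratic data

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # scikit-learn maximizes scores, so MSE is reported as a negative number
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best = min(cv_mse, key=cv_mse.get)
print(cv_mse, "best degree:", best)
```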

---

### 30. **Why is visualization important?**

It helps you **see** whether the curve follows the data well, underfits (too simple to capture the trend), or overfits (wiggles to chase noise).

---

### 31. **How is Polynomial Regression implemented in Python?**

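```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Example data (replace with your own): a noisy quadratic; X must be 2-D
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(100, 1))
y_train = 1 + 2 * X_train[:, 0] + 0.5 * X_train[:, 0] ** 2 + rng.normal(0, 0.3, 100)
# PolynomialFeatures expands X into [1, X, X²]; LinearRegression fits OLS on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
```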