### **Regression Assignment**

1. **What is Simple Linear Regression?**
   
   A statistical method used to model the relationship between a single independent variable and a dependent variable using a linear equation of the form $Y = mX + c$.


2. **What are the key assumptions of Simple Linear Regression?**

   * Linearity
   * Independence of errors
   * Homoscedasticity (constant variance of errors)
   * Normality of errors

3. **What does the coefficient m represent in the equation $Y = mX + c$?**

   It represents the **slope** of the regression line: the expected change in Y for a one-unit increase in X.

4. **What does the intercept c represent in the equation $Y = mX + c$?**

   It is the **predicted value of Y when X = 0**, that is, the point where the line crosses the Y-axis.

5. **How do we calculate the slope m in Simple Linear Regression?**

   $$
   m = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2}
   $$
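The summation formula above can be applied directly. The following is a minimal sketch in plain Python; the function name `slope_and_intercept` and the sample data are illustrative assumptions, and the intercept is obtained from the standard identity $c = \bar{y} - m\bar{x}$.

```python
def slope_and_intercept(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denominator
    c = (sum_y - m * sum_x) / n  # same as mean(y) - m * mean(x)
    return m, c


# Illustrative data lying exactly on y = 2x + 1 (an assumption for the demo).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, c = slope_and_intercept(xs, ys)
print(m, c)
```

Because the sample points lie exactly on a line, the function recovers that line's slope and intercept.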

6. **What is the purpose of the least squares method in Simple Linear Regression?**

   To minimize the sum of squared residuals (differences between actual and predicted values).
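To make the idea concrete, the sketch below compares the sum of squared residuals for the least-squares line against two nearby alternatives. The data are made up for illustration, and the helper name `sum_squared_residuals` is an assumption; for this data the least-squares estimates work out to m = 0.6 and c = 2.2.

```python
def sum_squared_residuals(xs, ys, m, c):
    """Sum of squared differences between actual y and predicted m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (not from a real dataset).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# The least-squares line for this data, followed by two perturbed lines.
best = sum_squared_residuals(xs, ys, m=0.6, c=2.2)
steeper = sum_squared_residuals(xs, ys, m=0.8, c=2.2)
shifted = sum_squared_residuals(xs, ys, m=0.6, c=2.5)

print(best, steeper, shifted)
```

Any other choice of slope or intercept gives a larger sum than the least-squares line, which is what "least squares" means.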

7. **How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**

   It measures the proportion of variance in the dependent variable explained by the independent variable.
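As a rough illustration, R² can be computed as $1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the total sum of squares around the mean of Y. The function name `r_squared` and the data below are assumptions made for the example.

```python
def r_squared(ys, predictions):
    """Proportion of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("ys are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Illustrative data and its least-squares line y = 0.6x + 2.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predictions = [0.6 * x + 2.2 for x in xs]

r2 = r_squared(ys, predictions)
print(round(r2, 3))
```

Here the line explains about 60% of the variation in Y; the remaining 40% is left in the residuals.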



8. **What is Multiple Linear Regression?**
   
   It models the relationship between two or more independent variables and a single dependent variable.

9. **What is the main difference between Simple and Multiple Linear Regression?**
   
   Simple uses one independent variable; multiple uses two or more.

10. **What are the key assumptions of Multiple Linear Regression?**

    * Linearity
    * Independence of errors
    * Homoscedasticity
    * No (or low) multicollinearity among predictors
    * Normality of residuals

11. **What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?**
   
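    It refers to unequal variance of the residuals across the range of the predictors. The coefficient estimates remain unbiased, but the usual standard errors are biased, which makes t-tests, p-values, and confidence intervals unreliable.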

12. **How can you improve a Multiple Linear Regression model with high multicollinearity?**

    * Remove or combine correlated predictors
    * Use dimensionality reduction (e.g., PCA)
    * Use regularization (Ridge, Lasso)
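The effect of regularization can be sketched with the closed-form ridge solution $(X^TX + \alpha I)^{-1}X^Ty$. This is a simplified illustration using NumPy, without an intercept; the function name `ridge_coefficients`, the penalty value `alpha=1.0`, and the synthetic data are all assumptions.

```python
import numpy as np


def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution without an intercept; alpha=0 gives OLS."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic data with two almost identical predictors (an assumption).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=50)

ols = ridge_coefficients(X, y, alpha=0.0)
ridge = ridge_coefficients(X, y, alpha=1.0)

print("OLS:  ", ols)
print("Ridge:", ridge)
```

With nearly collinear predictors, the unpenalized coefficients tend to be large and unstable, while the penalty shrinks them toward a more stable solution.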

13. **What are some common techniques for transforming categorical variables for use in regression models?**

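    * One-hot encoding (one binary column per category; suits nominal variables)
    * Label encoding (integer codes; implies an order that nominal categories lack)
    * Ordinal encoding (integer codes that follow a meaningful category order)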
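As a small illustration of the first technique, one-hot encoding can be written in a few lines of plain Python. The helper name `one_hot_encode` and the sample values are assumptions; in practice a library encoder would normally be used.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Illustrative categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for row in encoded:
    print(row)
```

When the regression includes an intercept, one of the columns is usually dropped, because the full set of indicator columns sums to one and is perfectly collinear with the intercept.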

14. **What is the role of interaction terms in Multiple Linear Regression?**
    
    They capture combined effects of two or more predictors on the dependent variable.
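An interaction term is simply the product of two predictors added as an extra column. The sketch below, using NumPy, builds such a column and fits it by least squares; the noiseless relationship and its coefficients are hypothetical values chosen so the result is easy to check.

```python
import numpy as np

# Hypothetical predictors and a noiseless response with an interaction effect.
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 5, size=30)
x2 = rng.uniform(0, 5, size=30)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2

# Design matrix columns: intercept, x1, x2, and the interaction x1 * x2.
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)

print(coefficients)
```

With the interaction column present, the effect of `x1` on `y` is no longer a single number: it equals `2.0 + 1.5 * x2`, so it depends on the value of `x2`.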

15. **How can the interpretation of intercept differ between Simple and Multiple Linear Regression?**
    
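    In simple linear regression, the intercept is the predicted value of Y when the single predictor is zero. In MLR, it is the predicted value of Y when all predictors are zero at the same time, which may not be realistic or interpretable.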

16. **What is the significance of the slope in regression analysis, and how does it affect predictions?**
    
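    It shows how much the dependent variable is expected to change for a one-unit change in the independent variable (holding other predictors constant in MLR). Its sign gives the direction of the relationship, and its magnitude determines how strongly predictions respond to changes in X.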

17. **How does the intercept in a regression model provide context for the relationship between variables?**
    
    It gives the baseline value of the dependent variable when all predictors are zero.

18. **What are the limitations of using R² as a sole measure of model performance?**

    * It never decreases when predictors are added, so it can reward overfitting
    * It does not indicate causality
    * It can be misleading with many predictors; adjusted R² or validation error is a better guide

19. **How would you interpret a large standard error for a regression coefficient?**
    
    The coefficient estimate is imprecise: it would vary substantially from sample to sample, so its confidence interval is wide and it may not be statistically significant.

20. **How can heteroscedasticity be identified in residual plots, and why is it important to address it?**
   
    Look for an uneven spread, such as a funnel or fan shape, in the residuals vs. fitted values plot. It is important to address because it undermines the validity of p-values and confidence intervals.

21. **What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?**
    
    The model may be overfitting with irrelevant predictors.
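The gap arises because adjusted R² penalizes each added predictor: $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors. A minimal sketch follows; the function name and the example numbers are assumptions.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 for a model with an intercept and n_predictors predictors."""
    degrees_of_freedom = n_samples - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / degrees_of_freedom


# Same R^2 of 0.90 on 30 observations, with 2 versus 20 predictors.
few = adjusted_r_squared(0.90, n_samples=30, n_predictors=2)
many = adjusted_r_squared(0.90, n_samples=30, n_predictors=20)

print(round(few, 3), round(many, 3))
```

The same R² looks much less impressive once the number of predictors is taken into account.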

22. **Why is it important to scale variables in Multiple Linear Regression?**
    
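    Scaling puts predictors on a comparable range, so coefficient magnitudes can be compared and the penalty in regularized models (like Ridge or Lasso) applies evenly to every predictor. The predictions of plain OLS are not affected by scaling.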
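One common way to scale is standardization (z-scores), sketched below with NumPy. The function name `standardize` and the sample matrix are assumptions for illustration.

```python
import numpy as np


def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    if np.any(std == 0):
        raise ValueError("a constant column cannot be standardized")
    return (X - mean) / std


# Two illustrative predictors on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])
X_scaled = standardize(X)

print(X_scaled)
```

When a model is evaluated on held-out data, the mean and standard deviation should come from the training set only and then be reused on the test set.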


23. **What is polynomial regression?**
    
    A form of regression in which the relationship between the independent variable and the dependent variable is modeled as an nth-degree polynomial.

24. **How does polynomial regression differ from linear regression?**
    
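    Linear regression fits a straight line; polynomial regression fits a curve by adding powers of X as predictors. The model is still linear in its coefficients, so it can be fitted with ordinary least squares.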

25. **When is polynomial regression used?**
    
    When the data shows a nonlinear relationship between variables.

26. **What is the general equation for polynomial regression?**

    
    $$
    Y = b_0 + b_1X + b_2X^2 + \dots + b_nX^n
    $$

27. **Can polynomial regression be applied to multiple variables?**
    
    Yes, multivariate polynomial regression includes terms like $X_1^2, X_1X_2, X_2^2$, etc.
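To see which terms appear, the sketch below enumerates all monomials up to a given degree using only the standard library. The function name `polynomial_terms` and the `*` notation for products are assumptions made for this example.

```python
from itertools import combinations_with_replacement


def polynomial_terms(names, degree):
    """List every product of the named variables with total degree 1..degree."""
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(names, d):
            terms.append("*".join(combo))
    return terms


terms = polynomial_terms(["X1", "X2"], degree=2)
print(terms)
```

For two variables and degree 2 this yields the two linear terms, both squares, and the cross term `X1*X2`; the number of terms grows quickly with more variables or higher degrees.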

28. **What are the limitations of polynomial regression?**

    * Overfitting with high degrees
    * Poor extrapolation
    * Sensitive to outliers

29. **What methods can be used to evaluate model fit when selecting the degree of a polynomial?**

    * Cross-validation
    * Adjusted R²
    * AIC/BIC
    * Visualization of fit and residuals
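The first option can be sketched with a single hold-out split, a simplified stand-in for full cross-validation. The synthetic quadratic data, the alternating split, and the helper name `validation_mse` are all assumptions.

```python
import numpy as np

# Synthetic data from a quadratic relationship plus noise (an assumption).
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 60)
y = 1 + 2 * x - 0.5 * x ** 2 + rng.normal(scale=0.3, size=x.size)

# Alternate points between a training half and a validation half.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]


def validation_mse(degree):
    """Fit a polynomial on the training half and score it on the other half."""
    coefficients = np.polyfit(x_train, y_train, degree)
    predictions = np.polyval(coefficients, x_val)
    return float(np.mean((y_val - predictions) ** 2))


errors = {degree: validation_mse(degree) for degree in range(1, 7)}
best_degree = min(errors, key=errors.get)

for degree, error in errors.items():
    print(degree, round(error, 4))
print("lowest validation error at degree", best_degree)
```

A straight line underfits this data, so its validation error is much larger than that of degree 2 and above. Among the higher degrees the errors are typically close, which is a reason to prefer the simplest degree that performs well.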

30. **Why is visualization important in polynomial regression?**
    
    To verify how well the curve fits the data and to identify overfitting or underfitting.

31. **How is polynomial regression implemented in Python?**

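```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Illustrative data (an assumption, not from the assignment): y = 1 + 2x + 3x^2
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2
X_train, X_test, y_train = X[::2], X[1::2], y[::2]

# Expand X into [1, x, x^2], then fit ordinary least squares on those columns
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```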