# Regression

1. What is Simple Linear Regression?
- Simple Linear Regression is a method used to model the relationship between a dependent variable Y and a single independent variable X using the equation:
  - Y = mX + c
- Here, m is the slope and c is the intercept.

2. What are the key assumptions of Simple Linear Regression?
- Linearity: The relationship between X and Y is linear.
- Independence: Observations are independent of each other.
- Homoscedasticity: Constant variance of residuals across all levels of X.
- Normality of Residuals: Residuals are normally distributed.
- No significant outliers: extreme observations can pull the fitted line disproportionately.

3. What does the coefficient m represent in the equation Y = mX + c?
- m is the slope of the line. It shows the amount by which Y changes for each one-unit increase in X.

4. What does the intercept c represent in the equation Y = mX + c?
- c is the value of Y when X = 0. It represents the starting point of the line on the Y-axis.

5. How do we calculate the slope m in Simple Linear Regression?
- $m = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$
  - Where n is the number of observations.
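
The least-squares slope, m = (n∑XY − ∑X∑Y) / (n∑X² − (∑X)²), can be checked numerically. The sketch below uses made-up data (not from these notes), computes the intercept from the standard companion formula c = (∑Y − m∑X) / n, and compares both values with `numpy.polyfit`.

```python
import numpy as np

# Made-up illustrative data (an assumption, not from the notes).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(X)

# Closed-form least-squares slope and intercept.
m = (n * np.sum(X * Y) - np.sum(X) * np.sum(Y)) / (n * np.sum(X**2) - np.sum(X) ** 2)
c = (np.sum(Y) - m * np.sum(X)) / n

# numpy.polyfit with degree 1 returns [slope, intercept].
m_ref, c_ref = np.polyfit(X, Y, 1)

print(f"closed form: m={m:.3f}, c={c:.3f}")
print(f"polyfit:     m={m_ref:.3f}, c={c_ref:.3f}")
```

Both routes should agree up to floating-point error, since `polyfit` solves the same least-squares problem.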

6. What is the purpose of the least squares method in Simple Linear Regression?
- To minimize the sum of the squared differences between actual and predicted values of Y (i.e., residuals).

7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
- R² represents the proportion of the variance in Y explained by X.

- R² = 1 → perfect fit

- R² = 0 → no linear relationship
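
As a small illustration, R² can be computed directly as 1 − SS_res / SS_tot. The helper name `r_squared` and the data are hypothetical, chosen only to show the two extreme cases listed above alongside an ordinary fit.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for the given predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Made-up illustrative data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

m, c = np.polyfit(X, Y, 1)
r2_fit = r_squared(Y, m * X + c)                      # fitted line
r2_perfect = r_squared(Y, Y)                          # predictions equal the data
r2_mean = r_squared(Y, np.full_like(Y, Y.mean()))     # always predict the mean

print(r2_fit, r2_perfect, r2_mean)
```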

8. What is Multiple Linear Regression?
- It is an extension of Simple Linear Regression involving two or more independent variables:
  - $Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$

9. What is the main difference between Simple and Multiple Linear Regression?
- Simple: One independent variable

- Multiple: Two or more independent variables
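
A minimal sketch of the multiple-predictor case, using scikit-learn's `LinearRegression` on synthetic, noise-free data. The coefficients 1, 2 and −3 are assumptions chosen so the recovered values are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                     # two independent variables

# Synthetic target with known coefficients: b0 = 1, b1 = 2, b2 = -3 (no noise).
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
```

With a single column in `X`, the same call fits a simple linear regression.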

10. What are the key assumptions of Multiple Linear Regression?
- Linearity

- Independence

- Homoscedasticity

- Normality of residuals

- No multicollinearity between predictors

11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?
- Heteroscedasticity occurs when the residuals have unequal variance across levels of an independent variable. The coefficient estimates remain unbiased, but their standard errors become unreliable, which distorts confidence intervals and hypothesis tests.

12. How can you improve a Multiple Linear Regression model with high multicollinearity?
- Remove or combine correlated predictors

- Use Principal Component Analysis (PCA)

- Apply regularization (e.g., Ridge, Lasso)
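
To illustrate the regularization option, the sketch below fits ordinary least squares and Ridge to two nearly identical synthetic predictors. The data, the `alpha=1.0` penalty, and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)      # nearly a copy of x1 (high multicollinearity)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)          # L2 penalty shrinks the coefficient vector

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

With predictors this correlated, the OLS coefficients tend to be unstable, while Ridge shrinks them toward a shared, smaller solution.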

13. What are some common techniques for transforming categorical variables for use in regression models?
- One-Hot Encoding

- Label Encoding

- Ordinal Encoding

- Binary Encoding
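
Two of these encodings can be sketched with pandas. The toy columns `city` and `size` and the category order are hypothetical. `drop_first=True` drops one dummy level, a common choice to avoid perfectly collinear columns in a regression with an intercept.

```python
import pandas as pd

# Hypothetical toy data; column names and categories are illustrative.
df = pd.DataFrame({
    "city": ["Paris", "Lima", "Paris", "Oslo"],
    "size": ["small", "large", "medium", "small"],
})

# One-hot encoding: one indicator column per category, minus the first level.
one_hot = pd.get_dummies(df["city"], prefix="city", drop_first=True)

# Ordinal encoding: map categories to integers using an explicit order.
size_order = {"small": 0, "medium": 1, "large": 2}
size_encoded = df["size"].map(size_order)

print(one_hot)
print(size_encoded.tolist())
```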

14. What is the role of interaction terms in Multiple Linear Regression?
- They help model situations where the effect of one variable on Y depends on the level of another variable (e.g., $X_1 \times X_2$).
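
One way to include an interaction term is to add the product of the two predictors as an extra column. The sketch below does this on synthetic, noise-free data with assumed coefficients (1, 2, 3 and 0.5) and solves the least-squares problem with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)

# Synthetic target: the effect of x1 depends on x2 through the 0.5 * x1 * x2 term.
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 0.5 * x1 * x2

# Design matrix: intercept, x1, x2, and the interaction column x1 * x2.
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coef)   # order: intercept, x1, x2, x1*x2
```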

15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?
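- Simple: Intercept is the predicted value of Y when X = 0

- Multiple: Intercept is the predicted value of Y when all predictors equal 0, which may not be meaningful (for example, when 0 lies outside the observed range of a predictor)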

16. What is the significance of the slope in regression analysis, and how does it affect predictions?
- The slope indicates how much the dependent variable changes with a one-unit increase in the independent variable. It’s crucial for prediction and interpretation.

17. How does the intercept in a regression model provide context for the relationship between variables?
- It gives the baseline value of the dependent variable when all independent variables are zero.

18. What are the limitations of using R² as a sole measure of model performance?
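- A high R² does not by itself show that the model is appropriate; a misspecified model can still score well

- It never decreases when predictors are added, so it can be artificially high with many variables

- It is measured on the training data, so it doesn't capture overfitting

- Not useful for comparing models with different numbers of predictors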

19. How would you interpret a large standard error for a regression coefficient?
- It means the coefficient is estimated imprecisely: its confidence interval is wide, and the coefficient may not be statistically distinguishable from zero.
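
For reference, coefficient standard errors in ordinary least squares are commonly computed as the square roots of the diagonal of σ²(AᵀA)⁻¹, where A is the design matrix and σ² is the residual variance. The helper `ols_standard_errors` and the data below are hypothetical, intended only as a sketch.

```python
import numpy as np

def ols_standard_errors(x, y):
    """Fit y = c + m*x by least squares; return coefficients and their standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(x)), x])     # intercept column plus predictor
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = A.shape[0] - A.shape[1]                 # n minus number of estimated parameters
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance matrix of the coefficients
    return coef, np.sqrt(np.diag(cov))

# Made-up illustrative data.
coef, se = ols_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("coefficients (c, m):", coef)
print("standard errors:    ", se)
```

Dividing each coefficient by its standard error gives the t-statistic used in the usual significance test.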

20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?
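- It appears as a funnel shape, or any other systematic change in the spread of the residuals, in a residuals vs. fitted values plot. It is important to address because it invalidates inference: the standard errors are biased, so confidence intervals and p-values cannot be trusted.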
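
A rough numeric stand-in for reading the residual plot is to compare the residual spread at low and high fitted values. The sketch below uses synthetic data whose noise grows with x by construction. The half-and-half split is an informal check assumed here for illustration, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 400)

# Synthetic data: noise standard deviation grows with x (heteroscedastic by design).
y = 2.0 * x + rng.normal(scale=0.5 * x)

m, c = np.polyfit(x, y, 1)
fitted = m * x + c
resid = y - fitted

# Compare residual spread in the lower and upper halves of the fitted values.
order = np.argsort(fitted)
low, high = resid[order[:200]], resid[order[200:]]
ratio = high.std() / low.std()

print(f"spread ratio (high / low fitted values): {ratio:.2f}")
```

A ratio well above 1 corresponds to the funnel opening to the right in a residuals vs. fitted plot.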

21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
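- Some predictors may be irrelevant, or the model may be overfitting the data. Adjusted R² accounts for the number of predictors and penalizes added complexity, so a large gap between the two suggests that some predictors add little explanatory power.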
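
The usual adjusted R² formula is 1 − (1 − R²)(n − 1) / (n − p − 1), with n observations and p predictors. The function name and the example numbers below are illustrative assumptions.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R²: penalizes R² for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same R² of 0.90 on 20 observations, with 2 versus 10 predictors.
adj_few = adjusted_r2(0.90, n_samples=20, n_predictors=2)
adj_many = adjusted_r2(0.90, n_samples=20, n_predictors=10)

print(f"2 predictors:  {adj_few:.3f}")
print(f"10 predictors: {adj_many:.3f}")
```

The same raw R² yields a noticeably lower adjusted value when more predictors were needed to reach it.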

22. Why is it important to scale variables in Multiple Linear Regression?
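- Scaling puts predictors on a comparable footing so that coefficient magnitudes can be compared, and it helps gradient-based solvers converge. It matters most when regularization is used, because the penalty depends on the scale of each coefficient; predictions from plain ordinary least squares do not change under rescaling.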
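
A minimal sketch of scaling before a regularized fit, using scikit-learn's `StandardScaler` and `Ridge` in a pipeline. The two synthetic predictors are deliberately on very different scales; the data and `alpha=1.0` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic predictors on very different scales (std of about 1 and about 1000).
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = 2.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 200)

# Standardize each column, then fit Ridge on the scaled features.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
scaled_coefs = model.named_steps["ridge"].coef_
print(scaled_coefs)
```

After standardization the two coefficients come out at similar magnitudes, reflecting that both predictors contribute about equally per standard deviation.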

23. What is polynomial regression?
- A form of regression where the relationship between the independent and dependent variable is modeled as a polynomial equation.

24. How does polynomial regression differ from linear regression?
- Linear: Models a straight-line relationship

- Polynomial: Models curved relationships using powers of the independent variable

25. When is polynomial regression used?
- When the data shows a nonlinear pattern that a straight line cannot capture.

26. What is the general equation for polynomial regression?
- $Y = b_0 + b_1 X + b_2 X^2 + \cdots + b_n X^n$

27. Can polynomial regression be applied to multiple variables?
- Yes, it can include polynomial terms for each variable and their interactions.

28. What are the limitations of polynomial regression?
- Risk of overfitting

- Poor extrapolation

- Sensitive to outliers

- Complexity increases rapidly with degree

29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?
- Cross-validation

- Adjusted R²

- AIC/BIC

- Residual analysis

- Visualization
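
As one possible way to apply cross-validation here, the sketch below scores polynomial degrees 1 to 5 with 5-fold cross-validated R² on synthetic quadratic data. The data, the degree range, and the fold count are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data with a true quadratic relationship plus noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 0] ** 2 + rng.normal(0, 0.5, 200)

scores = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Mean 5-fold cross-validated R² (the default score for regressors).
    scores[degree] = cross_val_score(model, X, y, cv=5).mean()

best_degree = max(scores, key=scores.get)
for degree, score in scores.items():
    print(degree, round(score, 4))
print("best degree by CV:", best_degree)
```

Scores typically jump from degree 1 to degree 2 and then level off; when higher degrees give only marginal gains, the lower degree is usually the safer choice.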

30. Why is visualization important in polynomial regression?
- It helps you assess how well the model fits the data and detect overfitting or underfitting.



31. How is polynomial regression implemented in Python?

In [1]:
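import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Illustrative synthetic data (an assumption; substitute your own X and y)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))  # 2-D array of shape (n_samples, n_features)
y = 1 + 2 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.3, size=100)

# Example: 2nd-degree polynomial regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
predictions = model.predict(X)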