1. What is Simple Linear Regression?
Simple Linear Regression is a statistical method used to model the relationship between two continuous variables: one independent variable (X) and one dependent variable (Y). The goal is to find a linear equation that best fits the data, typically in the form:
\[ Y = mX + c \]
where m is the slope and c is the intercept.


2. What are the key assumptions of Simple Linear Regression?
- Linearity: The relationship between X and Y is linear.
- Independence: Observations are independent of each other.
- Homoscedasticity: The variance of residuals is constant across all levels of X.
- Normality of Errors: Residuals (errors) should be normally distributed.
- No Perfect Multicollinearity: Not a concern in the simple case; with only one independent variable, there is no other predictor for X to be collinear with.


3. What does the coefficient m represent in the equation (Y = mX + c)?
m (the slope) is the change in predicted Y for a one-unit increase in X. A positive m means Y tends to rise as X increases, a negative m means it tends to fall, and m = 0 means there is no linear relationship.


4. What does the intercept c represent in the equation (Y = mX + c)?
c (the intercept) is the predicted value of Y when X = 0, i.e., the point where the line crosses the Y-axis. It is only directly interpretable when X = 0 is a realistic value close to the observed data.


5. How do we calculate the slope m in Simple Linear Regression?
The slope is calculated using:
\[ m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \]
where X̄ and Ȳ are the mean values of X and Y, respectively.
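A minimal sketch of the slope formula in NumPy, using a handful of made-up data points. It also computes the intercept with the standard companion formula c = Ȳ - mX̄ (the fitted line always passes through the point of means), then cross-checks both against `np.polyfit`, which returns the slope first for a degree-1 fit.

```python
import numpy as np

# Toy data invented for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

x_bar, y_bar = X.mean(), Y.mean()

# Slope: sum of cross-deviations divided by the sum of squared deviations of X
m = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)

# Intercept: the least-squares line passes through (x_bar, y_bar)
c = y_bar - m * x_bar

print(f"m = {m:.3f}, c = {c:.3f}")

# Cross-check against NumPy's degree-1 least-squares fit (returns [slope, intercept])
m_np, c_np = np.polyfit(X, Y, deg=1)
print("matches np.polyfit:", np.isclose(m, m_np) and np.isclose(c, c_np))
```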


6. What is the purpose of the least squares method in Simple Linear Regression?
The Least Squares Method minimizes the sum of squared residuals (differences between actual and predicted values) to find the best-fitting line.
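To make "minimizes the sum of squared residuals" concrete, here is a small sketch on invented data: it takes the least-squares line from `np.polyfit` and shows that nudging the slope or the intercept away from it always increases the sum of squared errors (SSE). The helper `sse` is a name chosen for this example.

```python
import numpy as np

# Toy data invented for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

def sse(m, c):
    """Sum of squared residuals for the line Y_hat = m*X + c."""
    residuals = Y - (m * X + c)
    return np.sum(residuals ** 2)

# Least-squares estimates (degree-1 polynomial fit)
m_ls, c_ls = np.polyfit(X, Y, deg=1)
best = sse(m_ls, c_ls)

# Any small move away from the least-squares solution gives a larger SSE
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2)]
for dm, dc in nudges:
    print(f"m={m_ls + dm:.2f}, c={c_ls + dc:.2f}: SSE={sse(m_ls + dm, c_ls + dc):.4f} (least squares: {best:.4f})")
```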


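7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
R² measures the proportion of the variability in Y that the model explains: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of Y from its mean.
- R² = 1 → Perfect fit: every point lies on the regression line
- R² = 0 → The model explains none of the variability; it predicts no better than the mean of Y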
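A short sketch computing R² by hand from its definition, R² = 1 - SS_res / SS_tot, on invented data, and checking the result against scikit-learn's `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy data invented for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

m, c = np.polyfit(X, Y, deg=1)
Y_hat = m * X + c

ss_res = np.sum((Y - Y_hat) ** 2)     # variation the line fails to explain
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation around the mean of Y
r2 = 1 - ss_res / ss_tot

print(f"R² (manual)  = {r2:.4f}")
print(f"R² (sklearn) = {r2_score(Y, Y_hat):.4f}")
```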


8. What is Multiple Linear Regression?
Multiple Linear Regression extends Simple Linear Regression by using multiple independent variables to predict a dependent variable:
\[ Y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n \]
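A minimal sketch of fitting this equation with scikit-learn's `LinearRegression`, on synthetic data generated (as an assumption for the example) from Y = 1 + 2X₁ + 3X₂ plus small noise. `intercept_` holds b₀ and `coef_` holds b₁ and b₂, so the fitted values should land close to 1, 2 and 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: two predictors, true relation Y = 1 + 2*X1 + 3*X2 plus noise
X = rng.uniform(0, 10, size=(100, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)
print("b_0 (intercept):", round(model.intercept_, 2))
print("b_1, b_2 (coefficients):", np.round(model.coef_, 2))
```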


9. What is the main difference between Simple and Multiple Linear Regression?
- Simple Linear Regression: One independent variable.
- Multiple Linear Regression: Two or more independent variables.


10. What are the key assumptions of Multiple Linear Regression?
- Linearity
- Independence
- Homoscedasticity
- Normality of errors
- No multicollinearity


11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?
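Heteroscedasticity means the variance of the residuals changes across levels of the predictors (or of the fitted values). The OLS coefficient estimates remain unbiased, but they are no longer the most efficient, and the usual standard errors are biased, so t-tests, p-values, and confidence intervals become unreliable. It can also signal model misspecification, such as a missing variable or a needed transformation of Y.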


12. How can you improve a Multiple Linear Regression model with high multicollinearity?
- Remove highly correlated predictors
- Use Principal Component Analysis (PCA)
- Apply Ridge or Lasso Regression
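A sketch of the Ridge option on synthetic data, assuming a deliberately extreme case where the second predictor is x1 plus tiny noise. Plain OLS has to split the effect between two near-identical columns and can do so erratically; Ridge's L2 penalty (strength `alpha`, set to 1.0 here purely for illustration) pulls the coefficients toward a stable, roughly even split whose sum stays near the true effect of 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly an exact copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS coefficients can be large and of opposite sign; Ridge keeps them moderate
print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Sum of Ridge coefficients:", round(ridge.coef_.sum(), 2))
```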


13. What are some common techniques for transforming categorical variables for use in regression models?
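- One-Hot Encoding: one binary (0/1) column per category
- Label Encoding: each category is mapped to an integer; this imposes an order and equal spacing, so in linear regression it suits ordinal variables rather than nominal ones
- Dummy Variables: one-hot encoding with one category dropped (k - 1 columns for k categories), which avoids perfect collinearity with the intercept (the "dummy variable trap")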
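A small sketch of dummy encoding with pandas `get_dummies` on a hypothetical table. `drop_first=True` drops the first category alphabetically ("London" here), which becomes the baseline absorbed by the intercept; `dtype=int` asks for 0/1 integers instead of booleans.

```python
import pandas as pd

# Hypothetical data: one nominal (unordered) feature and one numeric feature
df = pd.DataFrame({
    "city": ["Paris", "London", "Tokyo", "Paris"],
    "size_m2": [50, 70, 65, 40],
})

# k - 1 dummy columns for k categories; "London" is the dropped baseline
encoded = pd.get_dummies(df, columns=["city"], drop_first=True, dtype=int)
print(encoded)
```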


14. What is the role of interaction terms in Multiple Linear Regression?
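Interaction terms let the effect of one independent variable on Y depend on the value of another. In Y = b_0 + b_1X_1 + b_2X_2 + b_3X_1X_2, a one-unit increase in X_1 changes Y by b_1 + b_3X_2 rather than by a constant amount.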


15. How can the interpretation of the intercept differ between Simple and Multiple Linear Regression?
In Simple Linear Regression, the intercept is the predicted Y when X = 0.
In Multiple Linear Regression, it is the predicted Y when all predictors are simultaneously zero, a combination that may never occur in the data.


16. What is the significance of the slope in regression analysis, and how does it affect predictions?
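The slope indicates how much the predicted Y changes when a predictor increases by one unit (holding the other predictors constant, in multiple regression). Its sign sets the direction of the effect, and its magnitude sets how strongly predictions respond to changes in that predictor.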
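A quick check of this with scikit-learn on a small invented dataset with two predictors: raising only the first predictor by one unit, while the second stays fixed, shifts the prediction by exactly the first coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data invented for illustration: two predictors
X = np.array([[1, 3], [2, 1], [3, 4], [4, 2], [5, 5], [6, 1]], dtype=float)
y = np.array([5.0, 4.2, 9.1, 7.8, 12.9, 8.1])

model = LinearRegression().fit(X, y)

x_base = np.array([[3.0, 2.0]])
x_plus = x_base + np.array([[1.0, 0.0]])  # +1 on the first predictor only

delta = model.predict(x_plus)[0] - model.predict(x_base)[0]
print(f"coefficient of X1 = {model.coef_[0]:.3f}, change in prediction = {delta:.3f}")
```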


17. How does the intercept in a regression model provide context for the relationship between variables?
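The intercept serves as the baseline prediction of Y when all independent variables are zero; the slopes then describe departures from that baseline. If zero lies far outside the range of the data, the intercept has little practical meaning; centering the predictors (subtracting their means) turns it into the predicted Y at average predictor values.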
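A sketch of the centering idea on hypothetical height/weight data (the numbers are invented). With raw heights, the intercept is the "predicted weight at 0 cm", which is meaningless; after centering, the intercept equals the mean of Y, i.e. the prediction at average height, while the slope is unchanged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
# Hypothetical predictor far from zero, so X = 0 is not a realistic value
height_cm = rng.normal(170, 10, size=200)
weight_kg = -100 + 1.0 * height_cm + rng.normal(0, 5, size=200)

raw = LinearRegression().fit(height_cm.reshape(-1, 1), weight_kg)
centered_x = (height_cm - height_cm.mean()).reshape(-1, 1)
centered = LinearRegression().fit(centered_x, weight_kg)

print("intercept (raw X):     ", round(raw.intercept_, 2))       # prediction at 0 cm
print("intercept (centered X):", round(centered.intercept_, 2))  # prediction at mean height
print("mean of Y:             ", round(weight_kg.mean(), 2))
print("same slope:", np.isclose(raw.coef_[0], centered.coef_[0]))
```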


18. What are the limitations of using R² as a sole measure of model performance?
- It does not indicate causality.
- It never decreases when predictors are added, so a high R² can hide overfitting.
- It does not tell if coefficients are statistically significant.


19. How would you interpret a large standard error for a regression coefficient?
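It means the coefficient is estimated imprecisely: its confidence interval is wide and its t-statistic (coefficient ÷ standard error) is small, so it may not be distinguishable from zero, and predictions that rely on it are correspondingly uncertain. Common causes include a small sample, noisy data, little variation in the predictor, and multicollinearity.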
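A sketch of the slope's standard error in simple regression, using the standard formula SE(m) = sqrt(σ̂² / Σ(X_i - X̄)²) with σ̂² = SSE / (n - 2). Two synthetic samples share the same true slope (2.0), but the small, noisy one gets a much larger standard error. The helper `slope_and_se` is named for this example; the result is cross-checked against SciPy's `linregress`, whose `stderr` field is the slope's standard error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def slope_and_se(x, y):
    """OLS slope and its standard error for simple linear regression."""
    n = len(x)
    m, c = np.polyfit(x, y, deg=1)
    residuals = y - (m * x + c)
    sigma2 = np.sum(residuals ** 2) / (n - 2)  # residual variance estimate
    se = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))
    return m, se

# Same true slope, different sample size and noise level
x_a = rng.uniform(0, 10, 200)
y_a = 2 * x_a + rng.normal(0, 1, 200)
x_b = rng.uniform(0, 10, 10)
y_b = 2 * x_b + rng.normal(0, 10, 10)

for label, x, y in [("large, low-noise sample", x_a, y_a), ("small, noisy sample", x_b, y_b)]:
    m, se = slope_and_se(x, y)
    print(f"{label}: slope = {m:.2f}, SE = {se:.3f}, t = {m / se:.1f}")

print("matches scipy:", np.isclose(slope_and_se(x_a, y_a)[1], stats.linregress(x_a, y_a).stderr))
```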


20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?
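Plot the residuals against the fitted values (or against a predictor). A funnel shape, where the spread of the residuals widens or narrows as the fitted values grow, indicates heteroscedasticity. Addressing it matters because it biases the usual standard errors, which undermines t-tests and confidence intervals; common remedies are transforming Y (e.g., taking logs), weighted least squares, or heteroscedasticity-robust standard errors.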
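A numeric stand-in for eyeballing the funnel, on synthetic data whose noise grows with x. It compares the residual spread in the lower and upper halves of the fitted values, then runs a Breusch-Pagan-style LM test written by hand (Koenker's n·R² form: regress the squared residuals on X and compare n·R² with a chi-squared distribution). For an actual plot, `plt.scatter(fitted, residuals)` from matplotlib would show the funnel; statsmodels also ships a ready-made `het_breuschpagan` if you prefer a library implementation.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, n)
# Noise standard deviation grows with x: a funnel-shaped (heteroscedastic) pattern
y = 3 + 2 * x + rng.normal(0, 0.5 * x)

X = x.reshape(-1, 1)
fit = LinearRegression().fit(X, y)
fitted = fit.predict(X)
residuals = y - fitted

# Spread of residuals for low vs high fitted values
low = fitted < np.median(fitted)
print("residual std, low fitted values: ", round(residuals[low].std(), 2))
print("residual std, high fitted values:", round(residuals[~low].std(), 2))

# Breusch-Pagan-style LM test: regress squared residuals on X, LM = n * R^2
aux = LinearRegression().fit(X, residuals ** 2)
lm_stat = n * aux.score(X, residuals ** 2)
p_value = stats.chi2.sf(lm_stat, df=X.shape[1])
print(f"LM statistic = {lm_stat:.1f}, p-value = {p_value:.2g}")
```

A small p-value rejects the hypothesis of constant residual variance.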


21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
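Adding predictors never lowers R², whereas adjusted R² penalizes each extra predictor: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A large gap between the two suggests the model contains predictors that add little explanatory power, a warning sign of overfitting.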
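A sketch on synthetic data (the helper `adjusted_r2` is named for this example): one genuine predictor, then the same model with 15 pure-noise columns appended. R² goes up with the noise columns, but the gap between R² and adjusted R² becomes much larger, which is the "high R², low adjusted R²" pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0, 10, size=(n, 1))
y = 2 * x[:, 0] + rng.normal(0, 3, n)

# Append 15 predictors of pure noise, unrelated to y
X_big = np.hstack([x, rng.normal(size=(n, 15))])

for label, X in [("1 real predictor", x), ("1 real + 15 noise predictors", X_big)]:
    r2 = LinearRegression().fit(X, y).score(X, y)
    print(f"{label}: R² = {r2:.3f}, adjusted R² = {adjusted_r2(r2, n, X.shape[1]):.3f}")
```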


22. Why is it important to scale variables in Multiple Linear Regression?
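For ordinary least squares, scaling does not change the predictions or R²; the coefficients simply rescale. It matters when comparing coefficient magnitudes across predictors measured in different units, when using regularized models such as Ridge or Lasso (whose penalty treats all coefficients alike, so large-magnitude features would otherwise be penalized unevenly), and when fitting with gradient-based optimizers, which converge faster on features of similar magnitude.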
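A sketch illustrating both halves of this on synthetic data with two predictors on very different scales (hypothetically income in dollars and age in years). With plain `LinearRegression`, standardizing via `StandardScaler` leaves the predictions unchanged; only the coefficients change, becoming "effect per one standard deviation", which makes them comparable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical predictors on very different scales
X = np.column_stack([rng.normal(50_000, 15_000, 100), rng.normal(40, 12, 100)])
y = 0.0002 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 100)

raw = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Plain OLS: identical predictions with or without scaling
print("same predictions:", np.allclose(raw.predict(X), scaled.predict(X)))
# Standardized coefficients = raw coefficients * feature standard deviations
print("raw coefficients:         ", raw.coef_)
print("standardized coefficients:", scaled[-1].coef_)
```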


23. What is polynomial regression?
Polynomial regression extends linear regression by modeling curved relationships:
\[ Y = b_0 + b_1X + b_2X^2 + \dots + b_nX^n \]


24. How does polynomial regression differ from linear regression?
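Linear regression models straight-line relationships.
Polynomial regression captures non-linear trends by adding powers of X as extra features; because the model is still linear in its coefficients, it is fitted with the same least squares method.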


25. When is polynomial regression used?
When data exhibits a curved relationship.


26. What is the general equation for polynomial regression?
\[ Y = b_0 + b_1X + b_2X^2 + \dots + b_nX^n \]

27. Can polynomial regression be applied to multiple variables?
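Yes. With several predictors, the polynomial features include each variable's powers plus the cross-products (interaction terms) between variables, up to the chosen degree.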

28. What are the limitations of polynomial regression?
- Overfitting risk
- Higher complexity
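- Erratic behavior: high-degree polynomials can oscillate between data points and swing wildly at the edges of, or beyond, the observed range, so extrapolation is unreliable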


29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?
- R² on held-out data (training R² always rises with the degree, so it cannot choose the degree on its own)
- Cross-validation
- Residual plots
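A sketch of choosing the degree by cross-validation, assuming synthetic data with a true quadratic relationship plus noise. Each candidate degree is scored with 5-fold `cross_val_score`; scikit-learn reports MSE as `neg_mean_squared_error` (higher is better), so the sign is flipped. The degree with the lowest cross-validated error is a reasonable pick, and a straight line (degree 1) should clearly lose here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
# Synthetic data: true quadratic relationship plus noise
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(0, 0.5, 60)

cv_mse = {}
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Scores are negative MSE, so negate the mean to get MSE
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()
    print(f"degree {degree}: CV MSE = {cv_mse[degree]:.3f}")

best_degree = min(cv_mse, key=cv_mse.get)
print("degree with lowest CV error:", best_degree)
```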


30. Why is visualization important in polynomial regression?
It helps detect patterns, overfitting, and relationships in the data.




# 31. How is polynomial regression implemented in Python?

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = [[1], [2], [3], [4], [5]]
y = [2, 3, 5, 7, 11]

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[6]]))  # Predict for X=6

[15.2]
