Simple Linear Regression
1. What is Simple Linear Regression?
Simple Linear Regression is a statistical method that models the relationship between a dependent variable (Y) and a single independent variable (X) with a straight-line equation. It describes how Y changes on average as X changes; by itself it does not establish that X causes Y.
2. What are the key assumptions of Simple Linear Regression?
Key assumptions include:
- Linearity: The relationship between X and Y is linear.
- Independence: Observations are independent.
- Homoscedasticity: Constant variance in residuals.
- Normality: Residuals follow a normal distribution.
3. What does the coefficient m represent in the equation Y = mX + c?
The coefficient m (slope) represents the expected change in Y for every one-unit increase in X. A positive m indicates an increasing trend, while a negative m indicates a decreasing trend.
4. What does the intercept c represent in the equation Y = mX + c?
The intercept c is the predicted value of Y when X = 0. It acts as the baseline of the model, and is only meaningful in practice when X = 0 lies within or near the observed data.
5. How do we calculate the slope m in Simple Linear Regression?
The slope m is calculated as:
\[ m = \frac{\sum (X_i - \bar{X}) (Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \]
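The numerator measures how X and Y vary together and the denominator measures how much X varies, so m is the sample covariance of X and Y divided by the sample variance of X. The intercept is then the mean of Y minus m times the mean of X, which makes the fitted line pass through the point of means.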
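As a rough illustration, the slope formula can be evaluated directly with NumPy. The data values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical toy data, used only for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = X.mean(), Y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations.
m = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)

# Intercept: mean of Y minus slope times mean of X.
c = y_bar - m * x_bar

print(f"slope m = {m:.3f}, intercept c = {c:.3f}")
```

For this data the slope comes out at about 1.99 and the intercept at about 0.05, which matches what `np.polyfit(X, Y, 1)` returns.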
6. What is the purpose of the least squares method in Simple Linear Regression?
The least squares method chooses m and c to minimize the sum of squared differences between the observed and predicted values of Y. The resulting line is the best fit in that squared-error sense.
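One way to see the minimization is to compare the sum of squared errors (SSE) of the least squares line with that of slightly nudged lines. This is a small sketch on hypothetical data; the helper `sse` and the list `nudges` are names introduced here for illustration.

```python
import numpy as np

# Hypothetical toy data, used only for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sse(m, c):
    """Sum of squared errors of the line Y = m*X + c on the toy data."""
    return float(np.sum((Y - (m * X + c)) ** 2))

# np.polyfit with degree 1 returns the least squares [slope, intercept].
m_hat, c_hat = np.polyfit(X, Y, 1)
best = sse(m_hat, c_hat)

# Small changes to the slope or intercept (arbitrary sizes).
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2)]
for dm, dc in nudges:
    nudged = sse(m_hat + dm, c_hat + dc)
    print(f"dm={dm:+.1f}, dc={dc:+.1f}: SSE = {nudged:.4f} (least squares: {best:.4f})")
```

Every nudged line has a larger SSE than the fitted one, which is what "least squares" means.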
7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
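R² is the proportion of the variance in Y that the model explains, ranging from 0 to 1 for a least squares fit with an intercept. A higher R² means the line fits the observed data more closely; it does not by itself guarantee accurate predictions on new data.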
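R² can be computed by hand as one minus the ratio of unexplained to total variation. The sketch below uses hypothetical data and the variable names `ss_res` and `ss_tot`, which are introduced here for illustration.

```python
import numpy as np

# Hypothetical toy data, used only for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

m, c = np.polyfit(X, Y, 1)
Y_pred = m * X + c

ss_res = np.sum((Y - Y_pred) ** 2)    # variation the line does not explain
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation of Y around its mean
r2 = 1 - ss_res / ss_tot

print(f"R^2 = {r2:.4f}")
```

In simple linear regression this value equals the squared correlation coefficient between X and Y.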

Multiple Linear Regression
8. What is Multiple Linear Regression?
Multiple Linear Regression extends Simple Linear Regression by modeling the relationship between a dependent variable and two or more independent variables, so that each coefficient describes the effect of one variable while the others are held constant.
9. What is the main difference between Simple and Multiple Linear Regression?
Simple Linear Regression has one independent variable, while Multiple Linear Regression uses two or more independent variables.
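A minimal sketch of a two-variable fit with scikit-learn follows. The data are hypothetical and generated without noise from Y = 1 + 2*x1 + 3*x2, so the fit should recover those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from Y = 1 + 2*x1 + 3*x2.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

model = LinearRegression().fit(X, y)

print("intercept:", model.intercept_)   # close to 1
print("coefficients:", model.coef_)     # close to [2, 3]
```

The only change from the single-variable case is that `X` has one column per independent variable and `coef_` holds one slope per column.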
10. What are the key assumptions of Multiple Linear Regression?
Same as Simple Linear Regression, plus:
- No multicollinearity: Independent variables should not be highly correlated.
- No autocorrelation: Errors should not be correlated across observations.
11. What is heteroscedasticity, and how does it affect Multiple Linear Regression results?
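Heteroscedasticity occurs when the variance of the residuals is not constant across the range of the predictors. Ordinary least squares coefficient estimates remain unbiased, but their standard errors become unreliable, so confidence intervals and hypothesis tests can be misleading. It can be identified using residual plots.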
12. How can you improve a Multiple Linear Regression model with high multicollinearity?
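Options include removing or combining redundant variables (feature selection), replacing correlated variables with uncorrelated components via Principal Component Analysis (PCA), and using a regularized model such as ridge regression.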
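To decide which variables are redundant, one commonly used diagnostic (not mentioned above, so treat it as an assumption of this sketch) is the variance inflation factor: VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The helper `vif` and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                  # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress it on the remaining columns."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print("VIF with all variables:", np.round(vifs, 1))

# Removing the redundant variable x2 brings the remaining VIFs back near 1.
X_reduced = X[:, [0, 2]]
vifs_reduced = [vif(X_reduced, j) for j in range(X_reduced.shape[1])]
print("VIF after dropping x2:", np.round(vifs_reduced, 2))
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity, though the threshold is a convention rather than a strict rule.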
13. What are common techniques for transforming categorical variables for use in regression models?
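Methods include One-Hot Encoding, Label Encoding, and Binary Encoding. One-Hot Encoding is the usual choice for unordered categories in regression, since Label Encoding assigns integers that a linear model would treat as an ordered, evenly spaced scale.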
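As a small sketch of One-Hot Encoding with scikit-learn, using a hypothetical `colors` column with three levels:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three levels.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()  # returns a sparse matrix by default
one_hot = encoder.fit_transform(colors).toarray()

print(encoder.categories_[0])  # categories are sorted: blue, green, red
print(one_hot)                 # one column per category, a single 1 per row
```

In a regression with an intercept, one level is often dropped (for example with `OneHotEncoder(drop="first")`) so that the dummy columns are not perfectly collinear with the intercept.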
14. What is the role of interaction terms in Multiple Linear Regression?
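Interaction terms (for example the product X1·X2) model how two independent variables jointly influence the dependent variable, so that the effect of one variable on Y can depend on the level of the other.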
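In practice an interaction term is just an extra feature column holding the product of two variables. The sketch below uses hypothetical noise-free data generated from Y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data from Y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2.
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = 1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2

# The interaction term is an extra column containing the product x1*x2.
X = np.column_stack([x1, x2, x1 * x2])
model = LinearRegression().fit(X, y)

print("coefficients [x1, x2, x1*x2]:", model.coef_)  # close to [2, 3, 0.5]
```

With the interaction included, the effect of a one-unit increase in x1 is 2 + 0.5*x2 rather than a single constant.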
15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?
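In Simple Regression, the intercept is the predicted Y when X = 0. In Multiple Regression, it is the predicted Y when all independent variables are zero at the same time, a combination that may be unrealistic or outside the data, so it often has no direct interpretation.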
16. What is the significance of the slope in regression analysis, and how does it affect predictions?
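The slope is the expected change in Y for a one-unit increase in X, so it directly scales predictions: the larger its absolute value, the more the predicted Y moves per unit of X. Because its size depends on the units of X, slopes of different variables are only comparable after standardization.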
17. How does the intercept in a regression model provide context for the relationship between variables?
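The intercept anchors the regression line: it is the predicted Y when every independent variable is zero (or, for encoded categorical variables, at the reference category). When zero lies outside the observed range of the data, the intercept is a fitting constant with no practical interpretation.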
18. What are the limitations of using R² as a sole measure of model performance?
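R² never decreases when predictors are added, so a high value can reflect overfitting rather than genuine predictive power. It also says nothing about whether the model assumptions hold, so it should be read alongside adjusted R², residual plots, and error on held-out data.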
19. How would you interpret a large standard error for a regression coefficient?
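A large standard error means the coefficient is estimated imprecisely: its value would vary widely from sample to sample. This leads to a wide confidence interval and a small t-statistic, so the coefficient may not be distinguishable from zero. Common causes are a small sample, noisy data, or multicollinearity.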
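For the simple one-variable case, the standard error of the slope can be computed from the residuals as sqrt(s² / Σ(X_i − X̄)²), where s² is the residual variance with n − 2 degrees of freedom. The sketch below uses hypothetical data; the names `se_m` and `t_ratio` are introduced for illustration.

```python
import numpy as np

# Hypothetical toy data, used only for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

m, c = np.polyfit(X, Y, 1)
residuals = Y - (m * X + c)

# Residual variance with n - 2 degrees of freedom (two fitted parameters).
s2 = np.sum(residuals ** 2) / (n - 2)

# Standard error of the slope, and the slope measured in standard-error units.
se_m = np.sqrt(s2 / np.sum((X - X.mean()) ** 2))
t_ratio = m / se_m

print(f"slope = {m:.3f}, standard error = {se_m:.4f}, slope / SE = {t_ratio:.1f}")
```

Here the standard error is small relative to the slope, so the estimate is precise; a standard error of similar size to the slope itself would indicate the opposite.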
20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?
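In a plot of residuals against fitted values, heteroscedasticity appears as a non-uniform spread, typically a funnel or fan shape in which the residuals widen or narrow as the fitted values increase. Addressing it (for example by transforming Y, using weighted least squares, or using robust standard errors) matters because otherwise standard errors, confidence intervals, and p-values cannot be trusted.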
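A residual plot is the usual tool, but the same pattern can be seen numerically by comparing the residual spread at low and high values of x. This is a rough sketch on simulated data whose noise is assumed to grow with x; it is a stand-in for inspecting the plot, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(1, 10, 200)

# Assumed data-generating process: the noise standard deviation grows with x.
y = 3 + 2 * x + rng.normal(scale=0.5 * x)

m, c = np.polyfit(x, y, 1)
residuals = y - (m * x + c)

# Compare the residual spread in the lower and upper halves of the x range.
low_spread = residuals[x < 5.5].std()
high_spread = residuals[x >= 5.5].std()

print(f"residual std for x < 5.5:  {low_spread:.2f}")
print(f"residual std for x >= 5.5: {high_spread:.2f}")
```

A clearly larger spread in one half corresponds to the funnel shape in a residual plot; under constant variance the two numbers would be similar.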
21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
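Adjusted R² penalizes each added predictor, so a large gap between R² and adjusted R² suggests the model contains variables that add little explanatory power. Such variables inflate R² on the training data and tend to reduce generalization.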
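The usual formula is adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p predictors. The sketch below fits a model with one real predictor and then adds ten pure-noise predictors; the data are simulated and the helper `adjusted_r2` is a name introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=(n, 1))
y = 2 * x[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 10))  # ten predictors unrelated to y

results = {}
designs = [("1 real predictor", x), ("1 real + 10 noise", np.hstack([x, noise]))]
for name, X in designs:
    r2 = LinearRegression().fit(X, y).score(X, y)
    adj = adjusted_r2(r2, n=X.shape[0], p=X.shape[1])
    results[name] = (r2, adj)
    print(f"{name}: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

Training R² cannot go down when the noise columns are added, while adjusted R² charges a penalty for each of them, so the gap between the two widens.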
22. Why is it important to scale variables in Multiple Linear Regression?
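Scaling puts variables on a comparable range. Ordinary least squares predictions do not change under scaling, but scaling makes coefficient magnitudes comparable across variables, speeds up convergence in gradient-based fitting, and is needed for regularized models such as ridge or lasso, whose penalties depend on coefficient size.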
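The effect of scaling on an ordinary least squares fit can be checked with `StandardScaler`. The features `income` and `age` below are hypothetical, simulated on deliberately different scales.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical features on very different scales.
income = rng.normal(50_000, 10_000, size=100)
age = rng.normal(40, 10, size=100)
X = np.column_stack([income, age])
y = 0.001 * income + 2.0 * age + rng.normal(size=100)

raw = LinearRegression().fit(X, y)

# Standardize each column to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
scaled = LinearRegression().fit(X_scaled, y)

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled.coef_)
print("same predictions:", np.allclose(raw.predict(X), scaled.predict(X_scaled)))
```

The predictions are identical, but the raw coefficients differ by orders of magnitude purely because of units, whereas the scaled coefficients express the change in y per standard deviation of each feature.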

Polynomial Regression
23. What is polynomial regression?
Polynomial regression extends linear regression by introducing higher-degree polynomial terms to better fit curved data trends.
24. How does polynomial regression differ from linear regression?
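Linear regression fits a straight line, while polynomial regression fits a curve to capture non-linear relationships. The model is still linear in its coefficients, so it can be fitted with the same least squares method.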
25. When is polynomial regression used?
It is used when the data show a curved (non-linear) trend that a straight line fits poorly. In that case a polynomial of suitable degree can give better predictions than a straight line.
26. What is the general equation for polynomial regression?
\[ Y = b_0 + b_1 X + b_2 X^2 + \dots + b_n X^n \]
where n is the polynomial degree and b_0, ..., b_n are the coefficients to be estimated.
27. Can polynomial regression be applied to multiple variables?
Yes. Polynomial terms are added for each independent variable, usually together with cross terms (products such as X1·X2) up to the chosen degree.
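With scikit-learn, `PolynomialFeatures` generates these terms automatically. As a small sketch, a single hypothetical row with x1 = 2 and x2 = 3 expands at degree 2 into the columns [1, x1, x2, x1², x1·x2, x2²].

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical observation with two features: x1 = 2, x2 = 3.
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Columns: 1, x1, x2, x1^2, x1*x2, x2^2
print(X_poly)  # [[1. 2. 3. 4. 6. 9.]]
```

The number of generated columns grows quickly with the number of variables and the degree, which is one reason high degrees are rarely used with many predictors.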
28. What are the limitations of polynomial regression?
Excessively high-degree polynomials may cause overfitting, reducing generalization.
29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?
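Candidate degrees can be compared using cross-validation, R², adjusted R², and Mean Squared Error (MSE). Metrics computed on held-out data are the most reliable guide, because R² on the training data keeps rising as the degree increases.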
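A possible way to compare degrees is to cross-validate a small pipeline for each candidate and record the held-out MSE. The sketch below assumes simulated data with a quadratic trend; the dictionary `cv_mse` and the choice of five shuffled folds are illustrative, not prescribed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)

# Assumed data-generating process: a quadratic trend plus noise.
y = 1 + 2 * x - 1.5 * x**2 + rng.normal(scale=1.0, size=x.size)
X = x.reshape(-1, 1)

folds = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=folds,
                             scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()  # flip the sign back to a positive MSE
    print(f"degree {degree}: cross-validated MSE = {cv_mse[degree]:.3f}")

best_degree = min(cv_mse, key=cv_mse.get)
print("degree with lowest cross-validated MSE:", best_degree)
```

The straight line (degree 1) shows a much higher error than degree 2, and higher degrees bring little or no further improvement, which is the pattern to look for when choosing the degree.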
30. Why is visualization important in polynomial regression?
Visualization helps assess if the fitted polynomial curve aligns well with data trends.
31. How is polynomial regression implemented in Python?
Using Scikit-Learn:


In [1]:
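from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data: one feature per row; here y = 2x exactly, so the data are linear.
X = [[1], [2], [3], [4], [5]]
y = [2, 4, 6, 8, 10]

# Expand each x into the columns [1, x, x^2].
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Polynomial regression is ordinary linear regression on the expanded features.
model = LinearRegression()
model.fit(X_poly, y)

print("Predicted values:", model.predict(X_poly))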

Predicted values: [ 2.  4.  6.  8. 10.]
