1. What is Simple Linear Regression?

  - Simple Linear Regression is a statistical method used to model the relationship between a dependent variable (Y) and a single independent variable (X) using a straight line. The equation is:
  - Y = mX + c, where m is the slope and c is the intercept.

2. What are the key assumptions of Simple Linear Regression?

  - Linearity: The relationship between X and Y is linear.

  - Independence: Observations are independent of each other.

  - Homoscedasticity: Constant variance of residuals.

  - Normality: Residuals are normally distributed.

3. What does the coefficient m represent in the equation Y = mX + c?

  - m is the slope of the line. It represents the change in the dependent variable (Y) for a one-unit increase in the independent variable (X).

4. What does the intercept c represent in the equation Y = mX + c?

 - c is the intercept. It is the value of Y when X = 0. It shows where the regression line crosses the Y-axis.

5. How do we calculate the slope m in Simple Linear Regression?

 - The slope m is calculated using the formula:
 - m = Σ[(X - X̄)(Y - Ȳ)] / Σ[(X - X̄)²]
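
The formula can be checked by hand on a small dataset. The sketch below uses made-up values chosen so that Y = 2X + 1 exactly, and it also computes the intercept as c = Ȳ - mX̄, which follows from the fitted line passing through the point (X̄, Ȳ).

```python
import numpy as np

# Toy data chosen so that Y = 2X + 1 exactly (illustrative values).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

x_bar, y_bar = X.mean(), Y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations.
m = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)

# Intercept: the fitted line passes through the point (x_bar, y_bar).
c = y_bar - m * x_bar

print("slope m:", m)
print("intercept c:", c)
```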

6. What is the purpose of the least squares method in Simple Linear Regression?

 - The least squares method minimizes the sum of the squares of the residuals (differences between observed and predicted values) to find the best-fitting line.

7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

 - R² is the proportion of the variance in the dependent variable that the model explains.

 - R² = 1 means perfect fit

 - R² = 0 means no explanatory power
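
As a rough illustration, R² can be computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. The helper name `r_squared` and the data are assumptions made for this sketch.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])

perfect = r_squared(y, y)                            # predictions match exactly
baseline = r_squared(y, np.full_like(y, y.mean()))   # always predict the mean

print("perfect fit:", perfect)
print("mean-only model:", baseline)
```

A model that always predicts the mean of Y scores 0, which is why R² = 0 is read as "no explanatory power".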

8. What is Multiple Linear Regression?

 - Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables. The equation is:
   - Y = b₀ + b₁X₁ + b₂X₂ + ... + bₙXₙ
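
A minimal sketch of fitting this equation with two predictors, using NumPy's least-squares solver. The data are generated without noise from Y = 1 + 2X₁ + 3X₂ (an assumption for illustration), so the recovered coefficients should match those values.

```python
import numpy as np

# Illustrative data generated from Y = 1 + 2*X1 + 3*X2 with no noise.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept b0.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution for [b0, b1, b2].
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef
print("b0, b1, b2:", coef)
```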

9. What is the main difference between Simple and Multiple Linear Regression?

 - Simple Linear Regression has one independent variable.

 - Multiple Linear Regression has two or more independent variables.

10. What are the key assumptions of Multiple Linear Regression?

 - Linearity

 - Independence

 - Homoscedasticity

 - Normality of residuals

 - No multicollinearity

11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

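 - Heteroscedasticity refers to non-constant variance of residuals. The least-squares coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are biased, which makes hypothesis tests and confidence intervals unreliable.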

12. How can you improve a Multiple Linear Regression model with high multicollinearity?

 - Remove or combine correlated variables

 - Use Principal Component Analysis (PCA)

 - Use regularization techniques like Ridge or Lasso
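
To illustrate the regularization option, the sketch below compares plain least squares with the closed-form ridge solution on two nearly identical predictors. The data are simulated, the helper name `ridge_coefficients` is hypothetical, and the penalty `alpha=10.0` is an arbitrary choice for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nearly identical predictors: a deliberately collinear toy setup.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=200)

# Center the data so that no intercept term is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

ols = ridge_coefficients(Xc, yc, alpha=0.0)     # alpha = 0 is plain least squares
ridge = ridge_coefficients(Xc, yc, alpha=10.0)  # alpha = 10 is an arbitrary choice

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

With collinear predictors the unpenalized coefficients tend to be large and to offset each other, while the ridge penalty shrinks them toward a more stable split of the shared effect.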

13. What are some common techniques for transforming categorical variables for use in regression models?

 - One-Hot Encoding

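 - Label Encoding (assigns an arbitrary integer to each category; in a regression model this implies an ordering, so it is rarely appropriate for nominal features)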

 - Ordinal Encoding
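
A small sketch of label and one-hot encoding using only NumPy. The `colors` column is a hypothetical example; libraries such as pandas and scikit-learn provide their own encoders for the same purpose.

```python
import numpy as np

# Hypothetical categorical column.
colors = np.array(["red", "green", "blue", "green"])

# Sorted unique categories and, for each row, the index of its category.
categories, codes = np.unique(colors, return_inverse=True)

# Label encoding: one integer per category (implies an order that may not exist).
label_encoded = codes

# One-hot encoding: one 0/1 column per category.
one_hot = np.eye(len(categories), dtype=int)[codes]

print(categories)     # ['blue' 'green' 'red']
print(label_encoded)  # [2 1 0 1]
print(one_hot)
```

In a regression that includes an intercept, one of the one-hot columns is usually dropped, because the full set of columns sums to the intercept column and is therefore perfectly collinear with it.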

14. What is the role of interaction terms in Multiple Linear Regression?

 - Interaction terms allow the effect of one independent variable to depend on the level of another. They help model more complex relationships.
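
As a sketch, an interaction term is simply the product of two predictors added as an extra column. The data below are generated without noise from Y = 1 + 2X₁ + 3X₂ + 4X₁X₂ (values assumed for illustration), so the slope of Y in X₁ is 2 when X₂ = 0 and 6 when X₂ = 1.

```python
import numpy as np

# Illustrative noiseless data: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2.
x1 = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
x2 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2

# Design matrix: intercept, main effects, and the product term X1*X2.
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2, b12 = coef

# With an interaction, the slope of Y in X1 depends on X2: b1 + b12 * X2.
slope_x1_when_x2_is_0 = b1 + b12 * 0.0
slope_x1_when_x2_is_1 = b1 + b12 * 1.0
print(slope_x1_when_x2_is_0, slope_x1_when_x2_is_1)
```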

15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

 - In Simple Linear Regression, the intercept is the value of Y when X = 0.
 - In Multiple Linear Regression, it’s the predicted Y when all X variables = 0, which might not always be meaningful.

16. What is the significance of the slope in regression analysis, and how does it affect predictions?

 - The slope shows how much the predicted Y changes for each one-unit increase in X. A positive slope means Y increases as X increases; a negative slope means Y decreases as X increases.

17. What are the limitations of using R² as a sole measure of model performance?

 - It never decreases when variables are added (even irrelevant ones)

 - It doesn’t detect overfitting

 - It doesn’t tell if predictors are significant

18. How would you interpret a large standard error for a regression coefficient?

 - A large standard error, relative to the size of the coefficient, means the estimate is imprecise and may not be significantly different from zero. Common causes are a small sample, noisy data, or multicollinearity.
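
A minimal sketch of how coefficient standard errors are obtained under the usual least-squares assumptions: the residual variance is estimated from the fit, multiplied by (A'A)⁻¹, and the square roots of the diagonal are taken. The data are simulated and all variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed for illustration): Y = 2 + 0.5*X + noise.
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

A = np.column_stack([np.ones(n), x])   # intercept column + predictor
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - A @ coef
dof = n - A.shape[1]                   # n minus number of estimated coefficients
sigma2 = residuals @ residuals / dof   # estimated residual variance

# Var(coef) = sigma^2 * (A'A)^-1; standard errors are the square roots of its diagonal.
cov = sigma2 * np.linalg.inv(A.T @ A)
std_err = np.sqrt(np.diag(cov))
t_stats = coef / std_err               # estimate relative to its uncertainty

print("coefficients:", coef)
print("standard errors:", std_err)
print("t statistics:", t_stats)
```

The t statistic makes "large" concrete: a standard error is large when it is comparable to, or bigger than, the coefficient it belongs to.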

19. What is polynomial regression?

 - Polynomial regression is a form of regression in which the relationship between the independent and dependent variables is modeled as an nth-degree polynomial.

20. When is polynomial regression used?

 - It’s used when the relationship between variables is non-linear, but you still want to use a linear model with transformed features.

21. How does the intercept in a regression model provide context for the relationship between variables?

 - The intercept is the predicted value of Y when all predictors are 0. It anchors where the line (or curve) begins, and it is directly interpretable only when 0 is a plausible value for every predictor.

22. How can heteroscedasticity be identified in residual plots, and why is it important to address it?

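 - In a plot of residuals against fitted values, heteroscedasticity appears as a funnel shape: the spread of the residuals widens or narrows as the fitted values change. It’s important to address because it violates the constant-variance assumption, which biases the standard errors and makes confidence intervals and significance tests unreliable.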
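
The funnel shape can also be checked numerically. The sketch below simulates data whose noise grows with x (an assumption made for illustration), fits a line, and compares the residual spread for small and large x. It is an informal check, not a formal test such as Breusch-Pagan.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed): the noise standard deviation grows with x.
n = 2000
x = rng.uniform(1, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x, size=n)

A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef

# Compare residual spread for small versus large x; a funnel shape in a
# residual plot corresponds to these two spreads being clearly different.
low = residuals[x < np.median(x)]
high = residuals[x >= np.median(x)]
print("residual std, low x: ", low.std())
print("residual std, high x:", high.std())
```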

23. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

 - It suggests that the model includes predictors that add little explanatory power. Adjusted R² penalizes each additional predictor, so a large gap between the two values indicates the model may be overfitted.
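
A sketch of the effect on simulated data: ten pure-noise columns are appended to a one-predictor model. R² cannot go down when columns are added, while adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) applies a penalty that grows with the number of predictors p. The helper names `fit_r2` and `adjusted_r2` are hypothetical.

```python
import numpy as np

def fit_r2(X, y):
    """Fit least squares with an intercept and return R²."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)

# Append 10 pure-noise columns that have nothing to do with y.
noise = rng.normal(size=(n, 10))
X_big = np.column_stack([x, noise])

r2_small, r2_big = fit_r2(x, y), fit_r2(X_big, y)
print("R²:         ", r2_small, "->", r2_big)
print("adjusted R²:", adjusted_r2(r2_small, n, 1), "->", adjusted_r2(r2_big, n, 11))
```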

24. Why is it important to scale variables in Multiple Linear Regression?

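 - Scaling does not change the predictions of ordinary least squares, but it puts coefficients on a comparable scale and improves numerical stability. It is essential with regularization techniques such as Ridge and Lasso, whose penalties depend on the magnitude of the coefficients.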
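
One common form of scaling is standardization, sketched below with NumPy: each column is shifted to mean 0 and divided by its standard deviation. The feature values (age and income) are hypothetical.

```python
import numpy as np

# Hypothetical features on very different scales: age in years, income in dollars.
X = np.array([
    [25.0, 30_000.0],
    [35.0, 60_000.0],
    [45.0, 90_000.0],
    [55.0, 120_000.0],
])

# Standardize each column to mean 0 and standard deviation 1 (z-scores).
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

When a model is evaluated on held-out data, the mean and standard deviation should be computed on the training set only and then reused to transform the test set.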

25. How does polynomial regression differ from linear regression?

 - Linear regression models straight-line relationships.
 - Polynomial regression models curved (non-linear) relationships using powers of X.

26. What is the general equation for polynomial regression?

 - Y = b₀ + b₁X + b₂X² + b₃X³ + ... + bₙXⁿ

27. Can polynomial regression be applied to multiple variables?

 - Yes, it can. This is known as multivariate polynomial regression, where multiple features can have powers and interaction terms.
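
To make the "powers and interaction terms" concrete, the sketch below expands two features into every monomial up to degree 2. The helper `polynomial_terms` is a hypothetical, hand-rolled illustration of the kind of expansion that a feature transformer performs; it is not a library function.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_terms(X, degree):
    """Expand columns of X into all monomials up to the given total degree."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]  # degree-0 term (bias)
    for d in range(1, degree + 1):
        # Each index tuple picks d features (with repetition) to multiply together.
        for idx in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

# One sample with X1 = 2 and X2 = 3.
X = np.array([[2.0, 3.0]])
expanded = polynomial_terms(X, degree=2)
print(expanded)  # columns: 1, X1, X2, X1², X1*X2, X2²
```

The number of generated columns grows quickly with both the degree and the number of features, which is one reason high-degree models with many inputs overfit easily.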

28. What are the limitations of polynomial regression?

 - Overfitting with high degrees

 - Sensitive to outliers

 - Hard to interpret for higher-order models

29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?

 - Cross-validation

 - Adjusted R²

 - Mean Squared Error (MSE)

 - Visualization of residuals
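
A rough sketch of degree selection using validation MSE. It uses a single hold-out split as a simplified stand-in for full cross-validation, with simulated data whose true trend is quadratic. The split sizes and the range of degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumed): a quadratic trend plus noise.
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.5, size=200)

# Simple hold-out split: first 150 points for training, last 50 for validation.
x_train, x_val = x[:150], x[150:]
y_train, y_val = y[:150], y[150:]

val_mse = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # fit on training data only
    pred = np.polyval(coeffs, x_val)                   # predict held-out points
    val_mse[degree] = np.mean((y_val - pred) ** 2)

best_degree = min(val_mse, key=val_mse.get)
for degree, mse in val_mse.items():
    print(f"degree {degree}: validation MSE = {mse:.3f}")
print("lowest validation MSE at degree", best_degree)
```

The validation error typically drops sharply once the degree is high enough to capture the trend and then flattens or rises, so the smallest degree near the minimum is usually a reasonable choice.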

30. Why is visualization important in polynomial regression?

 - Visualization helps understand how well the model captures the pattern, detect overfitting, and interpret the curve's shape.

31. How is polynomial regression implemented in Python?

 - Using PolynomialFeatures from sklearn.preprocessing, then fitting a LinearRegression model on the transformed features.


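from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Toy data where y = x² exactly.
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 4, 9, 16])

# Expand X into the columns [1, x, x²].
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit an ordinary linear model on the expanded features.
model = LinearRegression()
model.fit(X_poly, y)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)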
