# Theory Questions

1. What is Simple Linear Regression?
    
    - Simple Linear Regression is a statistical method used to model the relationship between one independent variable (X) and one dependent variable (Y) using a straight line. It predicts Y from X using the equation `Y = mX + c`, where `m` is the slope and `c` is the intercept.

2. What are the key assumptions of Simple Linear Regression?

    - Linearity between X and Y

    - Homoscedasticity (constant variance of errors)

    - Independence of errors

    - Normality of residuals

    - Multicollinearity is not a concern, since there is only one independent variable



3. What does the coefficient m represent in the equation Y = mX + c?

    - The coefficient **m** is the **slope** of the line. It represents the change in Y for a one-unit increase in X. For example, if m = 2, then Y increases by 2 for each 1 unit increase in X.



4. What does the intercept c represent in the equation Y = mX + c?

    - The intercept **c** is the value of Y when X = 0. It shows where the regression line crosses the Y-axis.



5. How do we calculate the slope m in Simple Linear Regression?

    - Using the least squares method:  
        **m = Σ((X - X̄)(Y - Ȳ)) / Σ((X - X̄)²)**  
        Together with the intercept c = Ȳ - m·X̄, this value of m minimizes the sum of squared differences between predicted and actual Y values.
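
The slope formula above can be sketched directly in NumPy. The data values are made up for illustration, and the intercept is recovered as c = Ȳ - m·X̄, which holds because the least squares line passes through the point of means.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration (exactly Y = 2X + 1)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

x_mean, y_mean = X.mean(), Y.mean()

# m = Σ((X - X̄)(Y - Ȳ)) / Σ((X - X̄)²)
m = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)

# The least squares line passes through (X̄, Ȳ), so c = Ȳ - m·X̄
c = y_mean - m * x_mean

print(m, c)
```

Because the sample points lie exactly on a line, the recovered slope and intercept match the line used to build them.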

6. What is the purpose of the least squares method in Simple Linear Regression?

    - It minimizes the sum of squared errors (residuals) between the actual and predicted Y values, helping to find the best-fitting line.

7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
    
    - R² is the proportion of the variance in Y that the model explains.  
        For example, R² = 0.85 means 85% of the variation in Y is explained by X.
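
As a rough illustration, R² can be computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of Y. The function name `r_squared` and the numbers below are assumptions made for this sketch.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted values
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(y_true, y_pred)
print(r2)
```

Here SS_res is 1.0 and SS_tot is 20.0, so the predictions account for 95% of the variation in these made-up values.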


8. What is Multiple Linear Regression?

    - Multiple Linear Regression involves more than one independent variable to predict a dependent variable. The equation is:  
        **Y = b₀ + b₁X₁ + b₂X₂ + ... + bₙXₙ**
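
One possible way to estimate the coefficients b₀, b₁, b₂ is ordinary least squares on a design matrix with a leading column of ones. The sketch below uses `numpy.linalg.lstsq` on noise-free, made-up data generated from Y = 1 + 2X₁ + 3X₂, so the fitted coefficients should recover those values.

```python
import numpy as np

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 (no noise)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
Y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept b0
A = np.column_stack([np.ones(len(X)), X])

# Least squares solution of A @ coef ≈ Y
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)
```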

9. What is the main difference between Simple and Multiple Linear Regression?

    - Simple Linear Regression uses **one independent variable**, while Multiple Linear Regression uses **two or more** independent variables to predict the outcome.

10. What are the key assumptions of Multiple Linear Regression?

    - Linear relationship

    - No multicollinearity

    - Homoscedasticity

    - Independence of residuals

    - Normality of residuals



11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

    - Heteroscedasticity means the variance of the residuals is not constant across values of X. The coefficient estimates remain unbiased, but their standard errors are wrong, so confidence intervals and hypothesis tests become unreliable.



12. How can you improve a Multiple Linear Regression model with high multicollinearity?

    - Remove highly correlated predictors

    - Use Principal Component Analysis (PCA)

    - Combine variables

    - Use regularization methods like Ridge or Lasso
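
Before removing predictors, it helps to know which ones are involved. A common diagnostic is the variance inflation factor, VIF = 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing predictor j on the other predictors. The helper `vif` and the simulated columns below are assumptions for this sketch, not part of any library.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R²_j)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                   # unrelated predictor

vif_all = vif(np.column_stack([x1, x2, x3]))
vif_reduced = vif(np.column_stack([x1, x3]))  # after dropping x2
print(vif_all, vif_reduced)
```

In this simulation `x1` and `x2` get very large factors while `x3` stays near 1, and dropping `x2` brings the remaining factors back near 1. A threshold of 5 or 10 is often used as a rule of thumb, though that cutoff is a convention rather than a strict rule.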

13. What are some common techniques for transforming categorical variables for use in regression models?
    
    - **One-hot encoding**: creates one binary column per category

    - **Label encoding**: assigns an integer to each category, which implies an order, so it suits ordinal categories best  
        For example, "Red", "Blue" becomes [1, 2]
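
Both encodings can be sketched with plain NumPy. The color values are made up, and the integer codes here start at 0 and follow alphabetical order, which is an artifact of `np.unique` rather than anything meaningful about the categories.

```python
import numpy as np

# Hypothetical categorical column
colors = np.array(["Red", "Blue", "Green", "Blue"])

# Label encoding: one integer code per category (sorted alphabetically)
categories, labels = np.unique(colors, return_inverse=True)

# One-hot encoding: pick the row of the identity matrix matching each code
one_hot = np.eye(len(categories), dtype=int)[labels]

print(categories)
print(labels)
print(one_hot)
```

In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job. When the model includes an intercept, one of the one-hot columns is usually dropped so the columns are not perfectly collinear.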




14. What is the role of interaction terms in Multiple Linear Regression?

    - Interaction terms capture the combined effect of two variables on the dependent variable.  
        For example, adding `X1 * X2` to the model shows how the effect of X1 on Y changes with X2.
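
To make this concrete, the sketch below fits a model with an `X1 * X2` column on simulated, noise-free data whose true relationship is assumed to be Y = 1 + 2X₁ + 3X₂ + 4X₁X₂. With an interaction, the effect of a one-unit increase in X1 is b₁ + b₃·X2, so it depends on the value of X2.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)

# Hypothetical ground truth with an interaction term
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Design matrix: intercept, X1, X2, and the interaction X1 * X2
A = np.column_stack([np.ones(100), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Effect of a one-unit increase in X1 at two different values of X2
effect_at_x2_0 = coef[1] + coef[3] * 0.0
effect_at_x2_1 = coef[1] + coef[3] * 1.0
print(effect_at_x2_0, effect_at_x2_1)
```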



15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

    - In Simple Linear Regression, the intercept is the value of Y when X = 0.  
        In Multiple Regression, it's the value of Y when all X variables = 0, which may not always be meaningful.



16. What is the significance of the slope in regression analysis, and how does it affect predictions?

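    - The slope gives the direction of the relationship and the size of the change in Y per one-unit change in X, so each prediction shifts by the slope for every unit of X. The strength of the relationship is measured by the correlation or R², not by the slope.  
        A positive slope means Y increases with X, and a negative slope means Y decreases with X.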



17. How does the intercept in a regression model provide context for the relationship between variables?

    - It gives the baseline value of the outcome variable when all inputs are zero, helping to understand the starting point or reference level.



18. What are the limitations of using R² as a sole measure of model performance?

    - Doesn’t show if predictions are biased

    - Never decreases when variables are added, even if they are not useful

    - Doesn’t detect overfitting  
        It is better to also use adjusted R², RMSE, and similar measures.
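
Adjusted R² can be sketched from its usual formula, 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The function name and the numbers below are illustrative assumptions.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same R² and sample size, but different numbers of predictors
few = adjusted_r2(0.85, n_samples=50, n_predictors=2)
many = adjusted_r2(0.85, n_samples=50, n_predictors=20)
print(few, many)
```

With the same R² of 0.85, the model with 20 predictors ends up with a noticeably lower adjusted R² than the model with 2, which is the penalty for extra predictors at work.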



19. How would you interpret a large standard error for a regression coefficient?

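    - A large standard error means the coefficient estimate is imprecise: a different sample could give a noticeably different value. When it is large relative to the coefficient itself, the t-statistic is small, so the variable's effect may not be statistically distinguishable from zero.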
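
For simple linear regression, the standard error of the slope can be sketched as sqrt(s² / Σ(X - X̄)²), where s² is the residual variance with n - 2 degrees of freedom. The data below are made up for illustration.

```python
import numpy as np

# Hypothetical sample data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

n = len(X)
sxx = np.sum((X - X.mean()) ** 2)
m = np.sum((X - X.mean()) * (Y - Y.mean())) / sxx
c = Y.mean() - m * X.mean()

residuals = Y - (m * X + c)
# Residual variance uses n - 2 degrees of freedom (slope and intercept are estimated)
s2 = np.sum(residuals ** 2) / (n - 2)

se_m = np.sqrt(s2 / sxx)   # standard error of the slope
t_stat = m / se_m          # slope measured in units of its standard error
print(m, se_m, t_stat)
```

The ratio `t_stat` is what matters for interpretation: the same standard error is small for a large coefficient and large for a small one.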

20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?

    - In a plot of residuals against fitted values (or against X), a spread that widens or narrows across the plot, often in a funnel shape, suggests heteroscedasticity.  
        It’s important to address because it distorts the standard errors, which makes confidence intervals and hypothesis tests unreliable.
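
A numeric stand-in for eyeballing the funnel shape is to compare the residual spread in the lower and upper halves of X. The sketch below simulates data whose noise is assumed to grow with X; it is an informal check, not a formal test such as Breusch-Pagan.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)

# Simulated data: the noise standard deviation grows in proportion to x
y = 2 * x + 1 + rng.normal(scale=0.5 * x)

# Fit a straight line and compute residuals
m, c = np.polyfit(x, y, deg=1)
residuals = y - (m * x + c)

# Compare residual spread for small and large x
low_spread = residuals[x <= np.median(x)].std()
high_spread = residuals[x > np.median(x)].std()
print(low_spread, high_spread)
```

A clearly larger spread in the upper half is the numeric counterpart of the funnel in the residual plot.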

21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

    - It suggests the model includes predictors that add little explanatory power. Adjusted R² penalizes extra predictors, so a large gap between the two indicates those predictors don’t add value.



22. Why is it important to scale variables in Multiple Linear Regression?

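    - Scaling puts all features on a comparable footing. Plain least squares predictions do not change when features are rescaled, but the coefficients become easier to compare, and regularization techniques like Ridge or Lasso need it because their penalty depends on the size of the coefficients.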
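
Standardization, one common form of scaling, subtracts each column's mean and divides by its standard deviation. The feature values below (income and age) are hypothetical.

```python
import numpy as np

# Hypothetical features on very different scales: income (dollars) and age (years)
X = np.array([
    [30000.0, 25.0],
    [52000.0, 40.0],
    [81000.0, 33.0],
    [45000.0, 58.0],
])

mean = X.mean(axis=0)
std = X.std(axis=0)

# Each column now has mean 0 and standard deviation 1
X_scaled = (X - mean) / std
print(X_scaled)
```

scikit-learn's `StandardScaler` performs the same transformation. To avoid leaking information, the mean and standard deviation should come from the training data only and then be reused on test data.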

23. What is polynomial regression?

    - Polynomial regression fits a nonlinear relationship using a polynomial equation.  
        Example: Y = b₀ + b₁X + b₂X² + b₃X³

24. How does polynomial regression differ from linear regression?

    - Linear regression fits a straight line, while polynomial regression fits a curve by including higher-degree terms (X², X³, etc.). The model is still linear in its coefficients, so it can be fitted with ordinary least squares.



25. When is polynomial regression used?

    - It’s used when the data shows a **nonlinear** trend, like curves or bends, which linear models can’t capture.

26. What is the general equation for polynomial regression?

    - Y = b₀ + b₁X + b₂X² + b₃X³ + ... + bₙXⁿ  
        Where **n** is the degree of the polynomial.

27. Can polynomial regression be applied to multiple variables?

    - Yes, but the number of terms grows quickly. You can include powers of each feature and interactions between them (e.g., X1², X1*X2).
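
For two features, a degree-2 expansion can be written out by hand as the columns 1, X1, X2, X1², X1·X2, X2². The values below are made up for illustration.

```python
import numpy as np

# Hypothetical values for two features
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])

# Degree-2 expansion: bias, linear terms, squares, and the interaction
X_poly = np.column_stack([
    np.ones_like(x1),  # 1
    x1,                # X1
    x2,                # X2
    x1 ** 2,           # X1²
    x1 * x2,           # X1 * X2
    x2 ** 2,           # X2²
])
print(X_poly.shape)
```

Two features already give six columns at degree 2, and the count grows quickly with more features or a higher degree. scikit-learn's `PolynomialFeatures(degree=2)` generates the same set of columns automatically.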



28. What are the limitations of polynomial regression?

    - Prone to **overfitting** with high-degree polynomials

    - Harder to interpret

    - Can lead to **unstable** models for extrapolation



29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?

    - Use **cross-validation**

    - Check **adjusted R²**

    - Look at **RMSE (Root Mean Squared Error)**

    - Use **learning curves**
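
A minimal sketch of choosing the degree by k-fold cross-validation, using only NumPy. The helper `cv_rmse`, the fold count, and the simulated data (assumed to follow a quadratic with noise) are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
# Simulated data: a quadratic trend plus noise
y = 1 + 2 * x - 1.5 * x**2 + rng.normal(scale=1.0, size=x.size)

def cv_rmse(x, y, degree, k=5):
    """Cross-validated RMSE of a polynomial fit of the given degree."""
    idx = np.random.default_rng(1).permutation(len(x))  # fixed shuffle for all degrees
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)                  # indices not in this fold
        coefs = np.polyfit(x[train], y[train], degree)   # fit on training folds
        pred = np.polyval(coefs, x[fold])                # predict on held-out fold
        errors.append(np.mean((y[fold] - pred) ** 2))
    return np.sqrt(np.mean(errors))

scores = {d: cv_rmse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

The straight-line fit (degree 1) should score much worse than degree 2 here, while higher degrees bring little or no further improvement. That flattening is the usual signal to stop increasing the degree.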

30. Why is visualization important in polynomial regression?

    - It helps identify overfitting, underfitting, and the shape of the curve, making it easier to understand how well the model fits the data.



31. How is polynomial regression implemented in Python?

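    - Using `PolynomialFeatures` and `LinearRegression` from scikit-learn (`sklearn`):

        ```python
        import numpy as np
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.linear_model import LinearRegression
        # Illustrative data (assumed for this example): y is quadratic in X
        X = np.arange(10).reshape(-1, 1)
        y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2
        poly = PolynomialFeatures(degree=2)   # builds the columns 1, X, X²
        X_poly = poly.fit_transform(X)
        model = LinearRegression().fit(X_poly, y)
        ```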