# Regression

1. What is Simple Linear Regression?
  - A technique that models the relationship between a dependent variable (Y) and one independent variable (X) using a straight line:
  - Y = mX + c


2. What are the key assumptions of Simple Linear Regression?
  Key Assumptions of Simple Linear Regression:

  - Linearity: The relationship between X and Y is linear
  - Independence: Observations are independent of each other
  - Homoscedasticity: Constant variance of residuals across all levels of X
  - Normality: Residuals are normally distributed
  - No outliers: Extreme values don't unduly influence the model

3. What does the coefficient m represent in the equation Y = mX + c?
  - The coefficient m is the slope of the regression line. It represents the average change in Y for each one-unit increase in X.

4. What does the intercept c represent in the equation Y = mX + c?
  - The intercept c is the predicted value of Y when X equals zero. It's where the regression line crosses the Y-axis.

5. How do we calculate the slope m in Simple Linear Regression?
  - The slope is calculated using: m = Σ[(Xi - X̄)(Yi - Ȳ)] / Σ[(Xi - X̄)²], where X̄ and Ȳ are the means of X and Y respectively.
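
The slope formula can be checked by hand on a small dataset. The sketch below uses made-up values for `X` and `Y` (an assumption, not data from these notes) and also computes the intercept with the standard identity c = Ȳ - mX̄, which the answer above does not state explicitly.

```python
# Made-up illustrative data (assumption)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

x_mean = sum(X) / len(X)
y_mean = sum(Y) / len(Y)

# Numerator: sum of (Xi - X̄)(Yi - Ȳ); denominator: sum of (Xi - X̄)²
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))
denominator = sum((x - x_mean) ** 2 for x in X)

m = numerator / denominator   # slope
c = y_mean - m * x_mean       # intercept, from c = Ȳ - mX̄

print(f"slope m = {m:.2f}, intercept c = {c:.2f}")
# prints: slope m = 0.60, intercept c = 2.20
```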

6. What is the purpose of the least squares method in Simple Linear Regression?
  - The least squares method finds the best-fitting line by minimizing the sum of squared residuals (differences between actual and predicted values). This ensures the line is as close as possible to all data points collectively.

7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
  - R² represents the proportion of variance in the dependent variable explained by the independent variable. For example, R² = 0.80 means the model explains 80% of the variance in Y.
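
A minimal sketch of the calculation, using R² = 1 - SS_res / SS_tot. The data and the line Y = 0.6X + 2.2 (the least-squares fit to that data) are made up for illustration, and `r_squared` is a hypothetical helper name.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)               # total variation
    return 1 - ss_res / ss_tot

# Made-up data and its least-squares line Y = 0.6X + 2.2 (assumption)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
predictions = [0.6 * x + 2.2 for x in X]

r2 = r_squared(Y, predictions)
print(f"R² = {r2:.2f}")
# prints: R² = 0.60
```

Here the line explains 60% of the variance in Y; the remaining 40% is left in the residuals.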

8. What is Multiple Linear Regression?
  - A regression model with two or more independent variables predicting the dependent variable.

9. What is the main difference between Simple and Multiple Linear Regression?
  - Simple regression uses one predictor variable, while multiple regression uses two or more predictor variables simultaneously. Multiple regression can capture more complex relationships and can improve predictions when the added predictors carry relevant information.

10. What are the key assumptions of Multiple Linear Regression?
  - Linearity

  - Independence

  - Homoscedasticity

  - Normality of errors

  - No multicollinearity

11. What is heteroscedasticity, and how does it affect the results?
  - Heteroscedasticity occurs when the variance of residuals changes across different levels of independent variables. Effects include:

  - Standard errors become unreliable
  - Confidence intervals and hypothesis tests built on those standard errors can be misleading
  - Coefficient estimates remain unbiased but are no longer efficient (they lose the minimum-variance property)

12. How can you improve a Multiple Linear Regression model with high multicollinearity?
  - Improving models with high multicollinearity:

    - Remove highly correlated variables
    - Use Ridge or Lasso regression
    - Apply Principal Component Analysis (PCA)
    - Combine correlated variables into composite scores
    - Collect more data, which can make the coefficient estimates more stable even if the predictors stay correlated
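
As a rough illustration of the first option, the sketch below flags highly correlated predictor pairs with a correlation matrix and drops one column from each pair. The synthetic data and the 0.9 cutoff are assumptions chosen for the example, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (assumption): x2 is nearly a copy of x1, x3 is unrelated
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between columns
corr = np.corrcoef(X, rowvar=False)

threshold = 0.9  # illustrative cutoff, not a universal rule
high_pairs = [
    (i, j)
    for i in range(X.shape[1])
    for j in range(i + 1, X.shape[1])
    if abs(corr[i, j]) > threshold
]
print(high_pairs)

# Drop the second column of each flagged pair
to_drop = sorted({j for _, j in high_pairs})
X_reduced = np.delete(X, to_drop, axis=1)
print(X_reduced.shape)
```

Which column of a correlated pair to keep is a modelling decision; dropping the second one here is only for simplicity.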

13.  What are some common techniques for transforming categorical variables for use in regression models?
  - One-hot encoding

  - Label encoding (assigns an arbitrary integer to each category, so it can imply a false order for nominal data)

  - Ordinal encoding (if categories have order)
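
A short sketch of one-hot and ordinal encoding with pandas. The `color` and `size` columns and the `size_order` mapping are hypothetical, chosen only to show one nominal and one ordered variable.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],     # nominal: no natural order
    "size": ["small", "large", "medium", "small"],  # ordinal: small < medium < large
})

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding: map categories to integers that respect their order
size_order = {"small": 0, "medium": 1, "large": 2}
size_encoded = df["size"].map(size_order)

print(one_hot.columns.tolist())
print(size_encoded.tolist())
```

In a regression with an intercept, one of the one-hot columns is usually dropped (for example with `drop_first=True`) so the indicators are not perfectly collinear.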

14.  What is the role of interaction terms in Multiple Linear Regression?

  - Interaction terms capture how the effect of one variable depends on the level of another variable. For example, X₁ × X₂ shows whether the impact of X₁ on Y changes based on the value of X₂.
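
To make this concrete, the sketch below builds a noise-free synthetic response with a known X₁ × X₂ effect (the coefficients 1, 2, 3, 4 are assumptions) and recovers it by adding the product column to the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)

# Synthetic response (assumption): the effect of x1 depends on x2 through the x1*x2 term
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2

# Design matrix: intercept, x1, x2, and the interaction column x1*x2
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(coef, 3))
```

With an interaction in the model, the slope of X₁ is no longer a single number: at a given X₂ it equals `coef[1] + coef[3] * x2`.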

15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?
  - Intercept interpretation differences:

    - Simple regression: Y-value when X = 0
    - Multiple regression: Y-value when all independent variables equal zero (often not meaningful in practice)

16.  What is the significance of the slope in regression analysis, and how does it affect predictions?
  - The slope quantifies the direction and magnitude of the relationship between a predictor and the response. It's crucial for:

    - Understanding variable importance
    - Making predictions
    - Determining effect sizes
    - Comparing impact across different variables

  - In predictions, slopes determine how much the predicted value changes when input variables change, making them essential for scenario analysis and decision-making.

17.  How does the intercept in a regression model provide context for the relationship between variables?
  - In a regression model, the intercept provides context by representing the predicted value of the dependent variable when all independent variables are zero. It essentially tells you where the regression line crosses the y-axis.

18. What are the limitations of using R² as a sole measure of model performance?
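  - Never decreases when predictors are added, so it doesn't penalize overfitting

  - Doesn't show whether relationships are statistically significant

  - Can be high even for a poorly specified model, for example one whose residuals show a clear pattern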


19. How would you interpret a large standard error for a regression coefficient?
  - A large standard error for a regression coefficient indicates high uncertainty in the estimate. This suggests the coefficient is imprecisely estimated, possibly due to insufficient data, multicollinearity, or high variability in the data.


20.  How can heteroscedasticity be identified in residual plots, and why is it important to address it?
  - In residual plots, heteroscedasticity appears as a funnel or cone shape where residuals spread out (or contract) as fitted values increase. You might see patterns like increasing variance with larger predictions. It's crucial to address because it violates the constant variance assumption, leading to inefficient estimates and incorrect standard errors, which affects hypothesis testing and confidence intervals.
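
A numerical stand-in for eyeballing the funnel shape: the sketch below fits a line to synthetic data whose noise grows with x (an assumption built into the example) and compares the residual spread below and above the median fitted value. It is a rough check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (assumption): noise standard deviation grows with x
x = rng.uniform(1, 10, size=n)
y = 2.0 * x + rng.normal(scale=0.5 * x)

# Fit a straight line and compute residuals
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

# Compare residual spread below and above the median fitted value
median_fitted = np.median(fitted)
spread_low = residuals[fitted <= median_fitted].std()
spread_high = residuals[fitted > median_fitted].std()

print(f"low-half std: {spread_low:.2f}, high-half std: {spread_high:.2f}")
```

A clearly larger spread in one half corresponds to the widening funnel you would see when plotting `residuals` against `fitted`.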

21.  What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
  - This indicates the model includes too many variables relative to the sample size, with many variables likely being irrelevant. The adjusted R² penalizes for the number of predictors, so this pattern suggests overfitting where the model fits the training data well but may not generalize effectively.
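
The penalty can be seen with the usual formula, adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the sample size and p the number of predictors. The numbers below are a hypothetical scenario, and `adjusted_r_squared` is an illustrative helper name.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical scenario: the same R² of 0.90 on 20 observations
few = adjusted_r_squared(0.90, n_samples=20, n_predictors=2)
many = adjusted_r_squared(0.90, n_samples=20, n_predictors=15)

print(f"2 predictors:  {few:.3f}")
print(f"15 predictors: {many:.3f}")
# prints:
# 2 predictors:  0.888
# 15 predictors: 0.525
```

The same raw R² looks far less impressive once 15 predictors are charged against only 20 observations.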

22. Why is it important to scale variables in Multiple Linear Regression?
  - Scaling puts variables on a comparable range. It stabilizes numerical computations, makes coefficients easier to compare when variables have different units or ranges, and prevents variables with larger scales from dominating in regularized models such as Ridge or Lasso. In ordinary least squares, scaling changes the coefficients but not the predictions.
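
One common way to scale is standardization (zero mean, unit variance per column). A minimal sketch with scikit-learn's `StandardScaler`, using hypothetical age and income values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: age (years) and income (dollars)
X = np.array([
    [25, 40_000],
    [32, 52_000],
    [47, 81_000],
    [51, 95_000],
], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column: subtract its mean, divide by its std

print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

In practice the scaler is fitted on the training data only, and the same fitted `scaler` is then applied to new data with `scaler.transform`.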

23.  What is polynomial regression?
  - Polynomial regression is a type of regression analysis where the relationship between the independent variable (x) and the dependent variable (y) is modeled as an nth degree polynomial. It extends linear regression to capture non-linear relationships by fitting a curve to the data instead of a straight line.

24.  How does polynomial regression differ from linear regression?
  - While linear regression assumes straight-line relationships, polynomial regression can model curved relationships. However, it's still "linear" in the coefficients, meaning the parameters appear linearly in the equation.

25. When is polynomial regression used?
  - Apply polynomial regression when scatter plots reveal curved patterns, residual analysis of linear models shows systematic patterns, or domain knowledge suggests non-linear relationships exist.


26.  What is the general equation for polynomial regression?
  - For a single variable: y = β₀ + β₁x + β₂x² + β₃x³ + ... + βₙxⁿ + ε, where β₀ through βₙ are the coefficients, n is the degree, and ε is the error term.
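
As a small worked instance of this equation with n = 2, the sketch below fits noise-free synthetic data generated from y = 1 + 2x + 3x² (the coefficients are assumptions) and recovers β₀, β₁, β₂ with NumPy's polynomial module.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Synthetic noise-free data (assumption): y = 1 + 2x + 3x²
x = np.linspace(-2, 2, 9)
y = 1 + 2 * x + 3 * x**2

# polyfit returns coefficients ordered from lowest degree to highest: [β₀, β₁, β₂]
betas = P.polyfit(x, y, deg=2)
print(np.round(betas, 6))

# Evaluate the fitted polynomial at a new point
y_new = P.polyval(1.5, betas)
print(round(float(y_new), 6))
```

Note that the older `np.polyfit` function returns the same coefficients in the opposite order, highest degree first.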

27.  Can polynomial regression be applied to multiple variables?
  - Yes, polynomial regression can handle multiple variables by including polynomial terms for each variable and their interactions, though complexity increases rapidly with the number of variables and degree.
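
The growth in complexity can be seen by counting generated terms. This sketch uses scikit-learn's `PolynomialFeatures` on small placeholder arrays (the values themselves are arbitrary).

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two hypothetical input variables, three observations
X2 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

poly = PolynomialFeatures(degree=2)
X2_poly = poly.fit_transform(X2)
# Columns: 1, x1, x2, x1², x1·x2, x2²
print(X2_poly.shape)

# The term count grows quickly: five variables at degree 3
X5 = np.ones((1, 5))
n_terms = PolynomialFeatures(degree=3).fit_transform(X5).shape[1]
print(n_terms)
```

Two variables at degree 2 give 6 columns, while five variables at degree 3 already give 56, which is why the degree and the number of inputs need to be chosen with care.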


28.  What are the limitations of polynomial regression?
  - Polynomial regression is prone to overfitting with high degrees, can be unstable at boundaries, requires careful degree selection, may not extrapolate well beyond the training data range, and becomes computationally expensive with multiple variables.

29.  What methods can be used to evaluate model fit when selecting the degree of a polynomial?
  - R² / Adjusted R²

  - Cross-validation

  - AIC/BIC

  - Residual analysis
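
Of the methods listed, cross-validation is straightforward to sketch with scikit-learn. The example below compares degrees 1 to 5 on synthetic data with a quadratic trend; the data, the 5-fold split, and the degree range are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data (assumption): a quadratic trend plus noise
x = rng.uniform(-3, 3, size=100)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(scale=1.0, size=100)
X = x.reshape(-1, 1)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Mean cross-validated R² for each candidate degree
scores = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores[degree] = cross_val_score(model, X, y, cv=cv).mean()

best_degree = max(scores, key=scores.get)
for degree, score in scores.items():
    print(f"degree {degree}: mean CV R² = {score:.3f}")
```

When several degrees score almost the same, choosing the lowest of them is a common way to keep the model simple.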

30.   Why is visualization important in polynomial regression?
  - Plots help identify the appropriate degree, reveal overfitting through comparison of training vs. validation curves, show model behavior at boundaries, and communicate results effectively to stakeholders.

31. How is polynomial regression implemented in Python?
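  - One common approach uses scikit-learn, chaining `PolynomialFeatures` and `LinearRegression` in a pipeline. The small quadratic dataset below is illustrative only:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Illustrative data (assumed): y follows a quadratic in x
X = np.arange(10).reshape(-1, 1)  # scikit-learn expects a 2D feature array
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2

# Expand X into [1, x, x²], then fit ordinary least squares on those terms
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[10]]))  # prediction for a new x value
```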


















