**1- What is Simple Linear Regression?**

A) Simple Linear Regression is a statistical method used to model the relationship between a single independent variable (X) and a dependent variable (Y) by fitting a straight line in the form **Y = mX + c**. It helps in predicting the value of Y based on X.

**2- What are the key assumptions of Simple Linear Regression?**

A)

* Linearity: Relationship between X and Y is linear.
* Independence: Observations are independent.
* Homoscedasticity: Constant variance of residuals.
* Normality: Residuals are normally distributed.
* No multicollinearity (trivial here since only one predictor).

**3- What does the coefficient m represent in the equation Y = mX + c?**

A) The coefficient **m** represents the **slope** of the regression line, i.e., the change in Y for a one-unit change in X.

**4- What does the intercept c represent in the equation Y = mX + c?**

A) The intercept **c** is the predicted value of Y when X = 0, i.e., the point where the regression line crosses the Y-axis.

**5- How do we calculate the slope m in Simple Linear Regression?**

A) The slope is calculated as:

$$
m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
$$
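
As a minimal sketch of the formula above, the slope can be computed directly from the deviations around the means; the intercept then follows from the standard identity c = Ȳ − m·X̄. The helper name `fit_line` and the toy data are assumptions made for illustration.

```python
# Toy data, assumed for illustration: the points lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Return (m, c) for the least-squares line Y = mX + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations; denominator: sum of squared X deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, c

m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```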

**6- What is the purpose of the least squares method in Simple Linear Regression?**

A) The least squares method minimizes the sum of squared residuals (errors) between observed and predicted values, ensuring the best-fit line.

**7- How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**

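A) R² measures the proportion of variance in the dependent variable explained by the independent variable. For a least-squares fit with an intercept it lies between 0 and 1, and values closer to 1 mean the line accounts for more of the variation in Y.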
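
A short sketch of how R² can be computed from observed and predicted values, using the usual definition R² = 1 − SS_res / SS_tot. The function name `r_squared` and the sample values are assumptions for illustration; NumPy is assumed to be available.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(y_true, y_true)        # predictions match exactly
baseline = r_squared(y_true, [6.0] * 4)    # always predicting the mean of y_true
```

Here `perfect` is 1 and `baseline` is 0, which are the two reference points for reading R²: a model that only predicts the mean explains none of the variance.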

**8- What is Multiple Linear Regression?**

A) Multiple Linear Regression models the relationship between a dependent variable (Y) and two or more independent variables (X₁, X₂, …, Xₙ).
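
One way to sketch this, assuming NumPy is available, is to stack a column of ones (for the intercept) next to the predictors and solve the least-squares problem with `np.linalg.lstsq`. The data below is synthetic and noise-free, chosen only so the recovered coefficients are easy to check.

```python
import numpy as np

# Synthetic data, assumed for illustration: y = 1.5 + 2.0*x1 - 3.0*x2 with no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Design matrix: a leading column of ones lets the first coefficient act as the intercept.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

intercept, slopes = coef[0], coef[1:]
print(intercept, slopes)
```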

**9- What is the main difference between Simple and Multiple Linear Regression?**

A) Simple Linear Regression has only one independent variable, while Multiple Linear Regression has two or more.

**10- What are the key assumptions of Multiple Linear Regression?**

A)

* Linearity between predictors and outcome.
* Independence of observations.
* Homoscedasticity of residuals.
* Normal distribution of residuals.
* No or low multicollinearity.

**11- What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?**

A) Heteroscedasticity occurs when residuals have non-constant variance. The coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are biased, so hypothesis tests and confidence intervals become unreliable.

**12- How can you improve a Multiple Linear Regression model with high multicollinearity?**

A)

* Remove highly correlated variables.
* Use dimensionality reduction (e.g., PCA).
* Apply regularization (Ridge or Lasso regression).
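
To illustrate the regularization option, here is a rough sketch of ridge regression in closed form, β = (XᵀX + αI)⁻¹Xᵀy, applied to two nearly identical predictors. The helper `ridge_fit`, the penalty value, and the synthetic data are assumptions for illustration; the sketch omits the intercept by centering X and y first.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients (no intercept; assumes centered X and y)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)  # nearly a copy of x1: strong multicollinearity
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)
y = 3.0 * x1 + rng.normal(size=100)
y = y - y.mean()

beta_ols = ridge_fit(X, y, alpha=0.0)    # alpha = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, alpha=1.0)  # the penalty shrinks the coefficient vector
print(beta_ols, beta_ridge)
```

With collinear predictors the unpenalized coefficients tend to be large and offsetting; the penalty pulls them toward smaller, more stable values.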

**13- What are some common techniques for transforming categorical variables for use in regression models?**

A)

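* One-hot encoding: one 0/1 indicator column per category.
* Label encoding: integer codes per category; best reserved for ordinal categories, since regression treats the codes as numeric.
* Dummy variables: indicator columns with one category dropped as the baseline, which avoids perfect collinearity with the intercept.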
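
The difference between one-hot and dummy encoding can be sketched in plain Python. The helper `one_hot` and its `drop_first` flag are hypothetical names used only for this example; columns follow sorted category order.

```python
def one_hot(values, drop_first=False):
    """Return (categories, rows) where each row holds 0/1 indicators per category."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # the dropped category becomes the baseline
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows

colors = ["red", "green", "blue", "green"]

cats, encoded = one_hot(colors)
# cats    -> ['blue', 'green', 'red']
# encoded -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

dummy_cats, dummies = one_hot(colors, drop_first=True)
# dummy_cats -> ['green', 'red']
# dummies    -> [[0, 1], [1, 0], [0, 0], [1, 0]]   ('blue' is all zeros)
```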

**14- What is the role of interaction terms in Multiple Linear Regression?**

A) Interaction terms capture the combined effect of two variables on the outcome when their joint effect differs from the sum of individual effects.
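
As a small sketch, an interaction term is simply the product of two predictors added as an extra column. The data below is synthetic and noise-free (an assumption for illustration), generated from y = 1 + 2·x1 + 3·x2 + 4·x1·x2, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2

# Columns: intercept, x1, x2, and the interaction x1*x2.
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [1, 2, 3, 4]
```

With the interaction included, the effect of a one-unit change in x1 is 2 + 4·x2, so it depends on the level of x2.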

**15- How can the interpretation of intercept differ between Simple and Multiple Linear Regression?**

A) In Simple Linear Regression, the intercept represents Y when X=0. In Multiple Linear Regression, it represents Y when all independent variables are zero, which may or may not be meaningful.

**16- What is the significance of the slope in regression analysis, and how does it affect predictions?**

A) The slope indicates how much the dependent variable changes with a one-unit increase in the independent variable, holding others constant. It directly affects predictions.

**17- How does the intercept in a regression model provide context for the relationship between variables?**

A) The intercept provides the baseline value of the dependent variable when predictors are zero, serving as a reference point for interpretation.

**18- What are the limitations of using R² as a sole measure of model performance?**

A)

* R² never decreases when variables are added, even if they add no predictive power.
* It does not indicate overfitting.
* It does not confirm causation.

**19- How would you interpret a large standard error for a regression coefficient?**

A) A large standard error means the coefficient is estimated imprecisely. When it is large relative to the coefficient itself, the estimate may not be significantly different from zero, which lowers confidence in the predictor's effect.
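
For reference, a sketch of how coefficient standard errors are usually obtained under the classical assumptions: estimate the residual variance as RSS / (n − p), then take the square roots of the diagonal of σ²(XᵀX)⁻¹. The function name `ols_with_se` and the four data points are assumptions for illustration.

```python
import numpy as np

def ols_with_se(X, y):
    """Return (coefficients, standard errors) for OLS with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = A.shape[0] - A.shape[1]          # n observations minus p estimated coefficients
    sigma2 = resid @ resid / dof           # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)  # covariance matrix of the estimates
    return coef, np.sqrt(np.diag(cov))

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
coef, se = ols_with_se(x, y)
print(coef)       # intercept 1.3, slope 0.8
print(coef / se)  # t-statistics: estimate divided by its standard error
```

The ratio of an estimate to its standard error is what the significance test is based on, so the same coefficient can be convincing or not depending on how large its standard error is.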

**20- How can heteroscedasticity be identified in residual plots, and why is it important to address it?**

A) Heteroscedasticity is identified when residual plots show a funnel or pattern rather than random scatter. Addressing it is important to ensure valid statistical inference.
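
Alongside a visual check, a rough numeric version of the funnel test is to compare the residual variance for the upper half of the fitted values with that for the lower half. This is only a heuristic sketch, not a formal test; the helper `spread_ratio` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def spread_ratio(x, y):
    """Residual variance in the upper half of fitted values divided by the lower half."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    resid = y - fitted
    order = np.argsort(fitted)
    half = len(x) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return high.var() / low.var()

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=400)
y_const = 2.0 + 3.0 * x + rng.normal(scale=2.0, size=x.size)  # constant noise level
y_funnel = 2.0 + 3.0 * x + rng.normal(size=x.size) * x        # noise grows with x

ratio_const = spread_ratio(x, y_const)
ratio_funnel = spread_ratio(x, y_funnel)
print(ratio_const, ratio_funnel)
```

A ratio near 1 is consistent with constant variance, while a ratio well above 1 corresponds to the funnel shape seen in a residual plot.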

**21- What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?**

A) It means that some predictors contribute little useful information. Adjusted R² penalizes each additional predictor, so a large gap between the two suggests the model may be overfitting.
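
The gap can be seen from the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The function name and the example numbers below are assumptions for illustration.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2, which penalizes R^2 for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same R^2 of 0.90 on 30 observations, with 2 versus 20 predictors.
few = adjusted_r2(0.90, n_samples=30, n_predictors=2)    # about 0.893
many = adjusted_r2(0.90, n_samples=30, n_predictors=20)  # about 0.678
print(few, many)
```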

**22- Why is it important to scale variables in Multiple Linear Regression?**

A) Scaling helps when predictors have different units or magnitudes, ensuring coefficients are comparable and avoiding numerical instability.
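
A minimal sketch of standardization (subtract each column's mean and divide by its standard deviation), assuming NumPy is available. The helper `standardize` and the example matrix are assumptions for illustration.

```python
import numpy as np

def standardize(X):
    """Rescale each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Two predictors on very different scales (values assumed for illustration).
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])
Z = standardize(X)
print(Z.mean(axis=0), Z.std(axis=0))  # roughly [0, 0] and [1, 1]
```

When a model is later applied to new data, the means and standard deviations computed on the training data should be reused rather than recomputed.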

**23- What is polynomial regression?**

A) Polynomial regression is a type of regression where the relationship between X and Y is modeled as an nth-degree polynomial.

**24- How does polynomial regression differ from linear regression?**

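A) Linear regression models a straight-line relationship, while polynomial regression models curved relationships by adding higher-order terms (X², X³, …). Polynomial regression is still linear in its coefficients, so it can be fitted with ordinary least squares.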

**25- When is polynomial regression used?**

A) It is used when data shows a nonlinear relationship between X and Y that cannot be captured by a straight line.

**26- What is the general equation for polynomial regression?**

A)

$$
Y = b_0 + b_1X + b_2X^2 + b_3X^3 + \dots + b_nX^n + \epsilon
$$

**27- Can polynomial regression be applied to multiple variables?**

A) Yes, polynomial regression can be extended to multiple predictors with polynomial combinations of variables.
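
To make "polynomial combinations" concrete, the sketch below builds all monomials up to a given degree for several predictors. For two variables a and b at degree 2 this yields 1, a, b, a², ab, b². The helper `poly_terms` is a hypothetical name for this example, not a library function.

```python
from itertools import combinations_with_replacement

import numpy as np

def poly_terms(X, degree):
    """Return a matrix whose columns are all monomials of the inputs up to `degree`."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]  # constant (bias) term
    for d in range(1, degree + 1):
        # Each combination picks d feature indices, with repeats allowed (e.g. a*a).
        for combo in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(columns)

features = poly_terms([[2.0, 3.0]], degree=2)
print(features)  # [[1. 2. 3. 4. 6. 9.]]
```

The resulting matrix can be passed to an ordinary linear regression, which is essentially what a polynomial-features pipeline does.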

**28- What are the limitations of polynomial regression?**

A)

* Overfitting with high-degree polynomials.
* Poor extrapolation beyond observed data.
* Increased computational complexity.

**29- What methods can be used to evaluate model fit when selecting the degree of a polynomial?**

A)

* Cross-validation.
* Adjusted R².
* AIC/BIC criteria.
* Residual analysis.
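
As an illustration of the cross-validation option, the sketch below scores each candidate degree by k-fold mean squared error using `np.polyfit`. The helper `cv_mse`, the fold count, and the synthetic quadratic data are assumptions for illustration.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error of a degree-`degree` polynomial fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every index not in the held-out fold
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic data, assumed for illustration: a quadratic trend plus small noise.
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 60)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.3, size=x.size)

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

Degrees that are too low show a high held-out error from underfitting, while higher degrees often stop improving or get worse; a common choice is the lowest degree whose error is close to the minimum.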

**30- Why is visualization important in polynomial regression?**

A) Visualization helps identify nonlinear trends, assess curve fitting, and avoid overfitting by visually inspecting how well the polynomial fits the data.

**31- How is polynomial regression implemented in Python?**

A) Polynomial regression can be implemented using:

```python
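import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Example data (assumed for illustration): X must be 2-D, shape (n_samples, n_features)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 3 - 2 * X.ravel()

# Expand X into polynomial terms up to degree 3, then fit ordinary least squares
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X, y)
y_pred = model.predict(X)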
```

