### 1. What is Simple Linear Regression?

Simple Linear Regression is a statistical method where we predict one dependent variable (Y) using one independent variable (X). It tries to find the best straight-line relationship between X and Y.

### 2. What are the key assumptions of Simple Linear Regression?

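- The relationship between X and Y is linear.
- Residuals (errors) are independent of each other.
- Homoscedasticity: the residuals have constant variance across all values of X.
- Residuals are normally distributed (this matters mostly for p-values and confidence intervals).

Multicollinearity is not a concern here: with only one predictor, there is no other independent variable for X to be correlated with.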

### 3. What does the coefficient m represent in the equation Y=mX+c?

The coefficient 'm' is the **slope** of the line. It tells how much the predicted Y changes when X increases by one unit: a positive m means Y rises with X, a negative m means it falls.

### 4. What does the intercept c represent in the equation Y=mX+c?

'c' is the **intercept**. It's the value of Y when X is 0. Basically, it's where the line cuts the Y-axis.

### 5. How do we calculate the slope m in Simple Linear Regression?

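m = (mean(XY) - mean(X)*mean(Y)) / (mean(X²) - mean(X)²)

This is the covariance of X and Y divided by the variance of X, so it measures how Y changes with respect to X. The intercept then follows as c = mean(Y) - m × mean(X).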
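
As a quick check of the formula, here is a minimal NumPy sketch on a small made-up dataset. It computes m from the means, derives c from the fact that the least squares line passes through (mean(X), mean(Y)), and compares both against `np.polyfit`.

```python
import numpy as np

# Small made-up dataset for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Slope: (mean(XY) - mean(X)*mean(Y)) / (mean(X²) - mean(X)²)
m = (np.mean(X * Y) - np.mean(X) * np.mean(Y)) / (np.mean(X**2) - np.mean(X) ** 2)
# Intercept: the fitted line passes through the point (mean(X), mean(Y))
c = np.mean(Y) - m * np.mean(X)

# Cross-check with NumPy's degree-1 least squares fit, which returns [slope, intercept]
m_ref, c_ref = np.polyfit(X, Y, deg=1)

print(f"m = {m:.3f}, c = {c:.3f}")  # m = 1.970, c = 0.090
print(np.isclose(m, m_ref), np.isclose(c, c_ref))  # True True
```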

### 6. What is the purpose of the least squares method in Simple Linear Regression?

Least squares finds the best-fit line by minimizing the **sum of squared residuals** (difference between actual and predicted Y values).
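
A minimal sketch of the least squares idea, assuming NumPy and a small made-up dataset: `np.linalg.lstsq` solves for the line that minimizes the sum of squared residuals, and nudging either coefficient away from that solution only makes the sum larger. The helper `ssr` is a hypothetical name used just for this example.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

def ssr(m, c):
    """Sum of squared residuals for the line Y_hat = m*X + c."""
    residuals = Y - (m * X + c)
    return np.sum(residuals ** 2)

# Design matrix: one column for X, one column of ones for the intercept
A = np.column_stack([X, np.ones_like(X)])
(m_ls, c_ls), *_ = np.linalg.lstsq(A, Y, rcond=None)

# Moving away from the least squares solution increases the SSR
print(ssr(m_ls, c_ls), ssr(m_ls + 0.1, c_ls), ssr(m_ls, c_ls - 0.2))
```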

### 7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

R² shows how much of the variation in Y is explained by X. If R² = 0.85, it means 85% of the variation in Y is explained by the model.
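
R² can be computed directly as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares of Y around its mean. A minimal sketch, assuming NumPy and made-up data; `r_squared` is a hypothetical helper, not a library function.

```python
import numpy as np

def r_squared(y, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)      # total variation around the mean
    return 1 - ss_res / ss_tot

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

m, c = np.polyfit(X, Y, deg=1)
r2 = r_squared(Y, m * X + c)
print(round(r2, 4))
```

For a simple linear regression with an intercept, this value also equals the squared Pearson correlation between X and Y, which `np.corrcoef(X, Y)[0, 1] ** 2` reproduces.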

### 8. What is Multiple Linear Regression?

Multiple Linear Regression predicts a dependent variable using **two or more independent variables**. It extends simple linear regression to handle more predictors.
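
A minimal sketch of fitting a Multiple Linear Regression with scikit-learn's `LinearRegression`, assuming synthetic data generated from a known relationship so the fitted coefficients can be compared with the true ones. Each column of `X` is one predictor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))  # two predictors: columns X1 and X2
# Hypothetical "true" relationship, used only to generate data: Y = 3 + 2*X1 - 1.5*X2 + noise
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)  # one slope per predictor
```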

### 9. What is the main difference between Simple and Multiple Linear Regression?

Simple Linear Regression uses **one** independent variable.
Multiple Linear Regression uses **two or more** independent variables.

### 10. What are the key assumptions of Multiple Linear Regression?

- Linearity
- No perfect multicollinearity (and ideally low multicollinearity) among the predictors
- Homoscedasticity
- Independence of residuals
- Normally distributed residuals

### 11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

Heteroscedasticity means that the variance of the residuals (errors) is not constant across all levels of the independent variables. The coefficient estimates remain unbiased, but their standard errors are biased, so p-values and confidence intervals become unreliable and the model's conclusions less trustworthy.

### 12. How can you improve a Multiple Linear Regression model with high multicollinearity?

To deal with multicollinearity:
- Remove or combine highly correlated variables.
- Use techniques like Principal Component Analysis (PCA).
- Use Ridge or Lasso Regression to reduce the impact.
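
Before choosing a remedy, it helps to measure multicollinearity. One common diagnostic is the variance inflation factor (VIF): regress each predictor on all the others and compute 1 / (1 - R²). The sketch below is a hand-rolled NumPy version on synthetic data (`vif` is a hypothetical helper name); values above roughly 5 or 10 are often quoted as a warning sign, though that cut-off is only a rule of thumb.

```python
import numpy as np

def vif(X):
    """VIF for each column of X: 1 / (1 - R²_j), where R²_j comes from
    regressing column j on all other columns plus an intercept."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # nearly a copy of x1: high multicollinearity
x3 = rng.normal(size=300)                  # unrelated predictor
X = np.column_stack([x1, x2, x3])
print(vif(X).round(1))
```

Dropping the near-duplicate column, for example with `vif(X[:, [0, 2]])`, brings the remaining VIFs back close to 1.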

### 13. What are some common techniques for transforming categorical variables for use in regression models?

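We usually use:
- One-Hot Encoding
- Label Encoding
- Ordinal Encoding

These turn non-numeric categories into numbers a regression model can use. Label and ordinal encoding imply an order, so they only suit categories with a natural ranking (e.g. small < medium < large). Nominal categories, which have no order, should be one-hot encoded, usually dropping one category to avoid the dummy variable trap.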
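
A minimal pandas sketch of the difference, assuming a made-up DataFrame with one nominal column (`color`, no natural order) and one ordinal column (`size`, S < M < L). `pd.get_dummies` with `drop_first=True` creates 0/1 columns and drops one level, and the ordinal column is mapped with an explicit order.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "blue", "red"],  # nominal: no natural order
    "size": ["S", "L", "M", "M"],              # ordinal: S < M < L
})

# One-hot encode the nominal column; drop_first=True drops one level ("blue" here)
encoded = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

# Ordinal-encode the ordered column with an explicit mapping
encoded["size"] = df["size"].map({"S": 0, "M": 1, "L": 2})
print(encoded)
```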

### 14. What is the role of interaction terms in Multiple Linear Regression?

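Interaction terms capture how two variables together affect the outcome. An interaction term is usually the product of two predictors (e.g. X₁ × X₂), and it lets the model represent situations where the effect of one variable depends on the value of another.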
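
A minimal sketch, assuming synthetic data in which the effect of `x1` depends on `x2`: the interaction is added as an extra column `x1 * x2`, and an ordinary `LinearRegression` then estimates its coefficient like any other.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical data-generating rule: the slope of x1 is (1 + 2*x2)
y = 1 + 1 * x1 + 0.5 * x2 + 2 * x1 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([x1, x2, x1 * x2])  # last column is the interaction term
model = LinearRegression().fit(X, y)
print(model.coef_.round(2))  # estimates for x1, x2, x1*x2
```

Here the fitted model implies that the slope of `x1` is roughly `1 + 2 * x2`, so it changes with the value of `x2`.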

### 15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

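In Simple Linear Regression, the intercept is the predicted value of Y when X = 0.
In Multiple Linear Regression, it is the predicted value of Y when all independent variables are 0 at the same time. That combination may not make real-world sense (e.g. a house with zero rooms and zero floor area), in which case the intercept has no practical interpretation.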

### 16. What is the significance of the slope in regression analysis, and how does it affect predictions?

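The slope shows the effect of an independent variable on the dependent one: it is the change in predicted Y for a one-unit increase in that X. Its sign gives the direction of the effect, and its absolute size shows how sensitive Y is to changes in X. In Multiple Linear Regression, each slope is the effect of its variable while holding the other variables constant.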

### 17. How does the intercept in a regression model provide context for the relationship between variables?

The intercept gives a baseline for predictions: the value of Y when all Xs are zero. Each predictor's contribution (its slope times its value) is then added on top of this baseline to form the prediction.

### 18. What are the limitations of using R² as a sole measure of model performance?

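R² doesn’t tell us:
- Whether the model is correctly specified (a straight line fitted to a curved relationship can still have a high R²).
- Whether the individual predictors are statistically significant.
- How well the model performs on new data.

Also, R² never decreases when more variables are added, even if they’re not useful.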
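
A minimal sketch of the last two points, assuming scikit-learn and synthetic data: adding 30 pure-noise predictors cannot lower the training R², but it typically does not help (and often hurts) R² on held-out data. `LinearRegression.score` returns R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=(n, 1))
y = 2 * x[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 30))  # 30 predictors unrelated to y
X_big = np.hstack([x, noise])

# Split both feature sets with the same rows in train and test
X_tr, X_te, Xb_tr, Xb_te, y_tr, y_te = train_test_split(
    x, X_big, y, test_size=0.3, random_state=0
)

small = LinearRegression().fit(X_tr, y_tr)
big = LinearRegression().fit(Xb_tr, y_tr)

print("train R²:", small.score(X_tr, y_tr), big.score(Xb_tr, y_tr))
print("test R²: ", small.score(X_te, y_te), big.score(Xb_te, y_te))
```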

### 19. How would you interpret a large standard error for a regression coefficient?

A large standard error means there's more uncertainty in that coefficient. It might not be statistically significant, and its real effect could be weak or unclear.
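
A minimal NumPy sketch of where coefficient standard errors come from, assuming ordinary least squares with an intercept: SE = sqrt(diag(s² (AᵀA)⁻¹)), where A is the design matrix and s² = SSR / (n - k). In the synthetic data below, adding a predictor that is almost a copy of `x1` inflates the standard error of `x1`'s coefficient. `ols_with_se` is a hypothetical helper.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and standard errors: SE = sqrt(diag(s² (AᵀA)⁻¹)), s² = SSR / (n - k)."""
    A = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n, k = A.shape
    s2 = (resid @ resid) / (n - k)
    cov = s2 * np.linalg.inv(A.T @ A)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear with x1
y = 1 + 2 * x1 + rng.normal(size=n)

beta1, se1 = ols_with_se(x1[:, None], y)
beta2, se2 = ols_with_se(np.column_stack([x1, x2]), y)
print("x1 alone:      coef", beta1[1].round(2), "SE", se1[1].round(3))
print("x1 next to x2: coef", beta2[1].round(2), "SE", se2[1].round(3))
```

Dividing a coefficient by its standard error gives its t-statistic; as a rough guide, |t| below about 2 means the coefficient is not significant at the 5% level.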

### 20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?

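In a plot of residuals against fitted values (or against X), heteroscedasticity looks like a funnel: the spread of the residuals grows or shrinks as the fitted values change. It is important to address because it distorts the standard errors, which makes confidence intervals and hypothesis tests unreliable.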
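
Besides eyeballing the plot, heteroscedasticity can be tested. The sketch below hand-rolls the n·R² form of the Breusch–Pagan test (Koenker's studentized variant) with NumPy and SciPy on synthetic funnel-shaped data: regress the squared residuals on the predictors and compare n·R² of that auxiliary regression to a chi-square distribution. statsmodels also ships a ready-made version, `het_breuschpagan`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, size=n)
# Hypothetical heteroscedastic data: noise standard deviation grows with x (funnel shape)
y = 2 + 3 * x + rng.normal(scale=0.5 * x)

# Main regression of y on x
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Auxiliary regression: squared residuals on the same predictors
e2 = resid ** 2
gamma, *_ = np.linalg.lstsq(A, e2, rcond=None)
fitted = A @ gamma
r2_aux = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# LM statistic: chi-square with (number of predictors) = 1 degree of freedom under H0
lm_stat = n * r2_aux
p_value = stats.chi2.sf(lm_stat, df=A.shape[1] - 1)
print(f"LM = {lm_stat:.1f}, p = {p_value:.2g}")
```

A small p-value suggests the residual variance depends on X. Common fixes include transforming Y (for example, taking logs), weighted least squares, or heteroscedasticity-robust standard errors.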

### 21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

It means the model probably has unnecessary variables. R² never decreases when variables are added, even useless ones. Adjusted R² penalizes each extra predictor, so a large gap between R² and adjusted R² suggests that many predictors do not actually improve the model.
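
Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. A minimal Python sketch (`adjusted_r2` is a hypothetical helper) shows how the same R² is penalized more heavily when it takes more predictors to reach it.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1), with n samples and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R² = 0.80 on 50 observations, reached with 3 versus 20 predictors
print(round(adjusted_r2(0.80, n=50, p=3), 3))   # 0.787
print(round(adjusted_r2(0.80, n=50, p=20), 3))  # 0.662
```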

### 22. Why is it important to scale variables in Multiple Linear Regression?

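Scaling (for example, standardizing each variable to mean 0 and standard deviation 1) puts all variables on a comparable scale. This is important when:
- We’re comparing coefficients.
- Using regularization (like Lasso or Ridge).

Without scaling, coefficient sizes mostly reflect units of measurement, and a regularization penalty treats variables unequally: a variable measured in large units needs only a small coefficient, so the penalty barely touches it. For plain least squares without a penalty, scaling changes the coefficients but not the predictions.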
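
A minimal scikit-learn sketch, assuming two hypothetical features on very different scales (`income` in the tens of thousands, `age` in tens). `StandardScaler` inside a pipeline standardizes each column before `Ridge` applies its penalty, so the fitted coefficients measure the effect of a one-standard-deviation change and can be compared directly.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 200
income = rng.normal(50_000, 15_000, size=n)  # large-valued feature (hypothetical)
age = rng.normal(40, 10, size=n)             # small-valued feature (hypothetical)
X = np.column_stack([income, age])
y = 0.0002 * income + 0.5 * age + rng.normal(size=n)

# StandardScaler gives each column mean 0 and std 1 before Ridge applies its penalty
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
coef = model.named_steps["ridge"].coef_
print(coef.round(2))  # effect of a one-standard-deviation change in each feature
```

In the raw units the true coefficients are 0.0002 and 0.5, which says little about which feature matters more; after standardizing, both are on the same footing.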

### 23. What is polynomial regression?

Polynomial regression is a type of regression where the relationship between X and Y is modeled as an nth-degree polynomial instead of a straight line. It fits curves to the data.

### 24. How does polynomial regression differ from linear regression?

In linear regression, we fit a straight line.
In polynomial regression, we fit a curved line using powers of X (like X², X³, etc.). The model is still linear in its coefficients, so it is fit with the same least squares method.

### 25. When is polynomial regression used?

It’s used when the data shows a non-linear relationship: if a straight line fits poorly and the scatter plot (or the residual plot) shows a clear curve, polynomial regression can fit the data better.

### 26. What is the general equation for polynomial regression?

Y = b₀ + b₁X + b₂X² + b₃X³ + ... + bₙXⁿ
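
Because the equation is linear in b₀ to bₙ, it can be fit by ordinary least squares on a design matrix whose columns are 1, X, X², and so on up to Xⁿ. A minimal NumPy sketch, assuming synthetic cubic data: `np.vander` with `increasing=True` builds those columns and `np.linalg.lstsq` estimates the coefficients.

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.linspace(-3, 3, 50)
# Hypothetical cubic data: Y = 1 + 0.5X - 2X² + 0.3X³ + noise
Y = 1 + 0.5 * X - 2 * X**2 + 0.3 * X**3 + rng.normal(scale=0.5, size=X.size)

degree = 3
# Columns 1, X, X², X³ (increasing=True puts the constant column first, matching b₀..bₙ)
design = np.vander(X, N=degree + 1, increasing=True)
b, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(b.round(2))  # estimates of b₀, b₁, b₂, b₃
```

`np.polyfit(X, Y, 3)` solves the same problem and returns the coefficients in the reverse order (highest power first).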

### 27. Can polynomial regression be applied to multiple variables?

Yes, we can apply it to multiple variables; this is called Multivariate Polynomial Regression. The features then include higher-order powers of each variable as well as interaction terms between the variables.

### 28. What are the limitations of polynomial regression?

- It can easily overfit if the degree is too high.
- It’s sensitive to outliers.
- Doesn’t work well outside the range of training data (poor extrapolation).
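
A minimal NumPy sketch of the extrapolation problem, assuming noisy samples of sin(x) on [0, 3]: a degree-9 polynomial fits well inside that range, but its prediction at x = 5 can be far off, because nothing in the training data constrains the curve there.

```python
import numpy as np

rng = np.random.default_rng(8)
x_train = np.linspace(0, 3, 20)
y_train = np.sin(x_train) + rng.normal(scale=0.1, size=x_train.size)

coefs = np.polyfit(x_train, y_train, deg=9)  # high-degree fit
inside = np.polyval(coefs, 1.5)               # inside the training range [0, 3]
outside = np.polyval(coefs, 5.0)              # well outside the training range

print(f"x=1.5: pred {inside:.2f} vs true {np.sin(1.5):.2f}")
print(f"x=5.0: pred {outside:.2f} vs true {np.sin(5.0):.2f}")
```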

### 29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?

We can use:
- Adjusted R²
- Cross-validation scores
- AIC/BIC values
- Train-test split performance
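
A minimal sketch of choosing the degree with 5-fold cross-validation in scikit-learn, assuming synthetic data from a cubic curve. Each candidate degree is wrapped in a `PolynomialFeatures` + `LinearRegression` pipeline and scored with `cross_val_score` (R² by default for regressors); the degree with the best mean score is picked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(9)
X = rng.uniform(-3, 3, size=(120, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=120)  # hypothetical cubic data

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), LinearRegression())
    # Mean R² across the 5 held-out folds; higher is better
    scores[degree] = cross_val_score(model, X, y, cv=cv).mean()

best = max(scores, key=scores.get)
print({d: round(s, 3) for d, s in scores.items()})
print("best degree:", best)
```

Degrees beyond the true one usually score about the same or slightly worse, so when scores are close it is reasonable to prefer the simpler model.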

### 30. Why is visualization important in polynomial regression?

Visualization helps us see the curve and check if it fits the data well. It makes it easier to understand whether the model is underfitting, overfitting, or just right.
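
A minimal matplotlib sketch, assuming made-up quadratic data: plotting degree 1, 2 and 10 fits over the scatter makes underfitting (a straight line through a curve), a good fit, and overfitting (a wiggly high-degree curve) easy to tell apart. The `Agg` backend and the output file name `poly_fits.png` are arbitrary choices for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(10)
x = np.sort(rng.uniform(-3, 3, 40))
y = x**2 + rng.normal(scale=1.0, size=x.size)  # hypothetical quadratic data

grid = np.linspace(-3, 3, 200)
fig, ax = plt.subplots()
ax.scatter(x, y, s=15, label="data")
for degree in (1, 2, 10):
    coefs = np.polyfit(x, y, deg=degree)
    ax.plot(grid, np.polyval(coefs, grid), label=f"degree {degree}")
ax.set_ylim(y.min() - 3, y.max() + 3)  # keep wild high-degree swings from squashing the plot
ax.legend()
fig.savefig("poly_fits.png")
```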

### 31. How is polynomial regression implemented in Python?

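```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Example data: X must be 2-D (n_samples, n_features); y is 1-D
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.1])

# Expand X into [X, X²]; include_bias=False because LinearRegression fits its own intercept
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)

# New inputs must go through the same transform before predicting
print(model.predict(poly.transform([[6.0]])))
```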