## Polynomial and Linear Regression Q&A
This notebook contains a collection of questions and answers related to regression analysis.

### What is Simple Linear Regression?

Simple Linear Regression is a statistical method used to model the relationship between a dependent variable and a single independent variable using a linear equation.

### What are the key assumptions of Simple Linear Regression?

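The key assumptions are linearity of the relationship, independence of errors, homoscedasticity (constant error variance), and normality of residuals (needed mainly for valid confidence intervals and hypothesis tests). Multicollinearity cannot arise with a single independent variable, so the "no multicollinearity" assumption only becomes relevant in Multiple Linear Regression.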

### What does the coefficient m represent in the equation Y=mx+c?

The coefficient 'm' is the slope: the predicted change in the dependent variable for a one-unit increase in the independent variable.

### What does the intercept c represent in the equation Y=mx+c?

The intercept 'c' represents the value of the dependent variable when the independent variable is zero.

### How do we calculate the slope m in Simple Linear Regression?

The slope is calculated using the formula: m = (Σ(xi - x̄)(yi - ȳ)) / (Σ(xi - x̄)²).
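A minimal NumPy sketch of the slope formula. It also computes the intercept as c = ȳ - m·x̄, the standard least-squares result that the fitted line passes through the point of means (x̄, ȳ). The function name `simple_linear_fit` and the sample data are illustrative.

```python
import numpy as np

def simple_linear_fit(x, y):
    """Return (m, c) for the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through (x_mean, y_mean)
    c = y_mean - m * x_mean
    return m, c

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # lies exactly on y = 2x + 1
m, c = simple_linear_fit(x, y)
print(m, c)  # 2.0 1.0
```

On noisy data the result should agree with `np.polyfit(x, y, 1)`, which solves the same least squares problem.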

### What is the purpose of the least squares method in Simple Linear Regression?

The least squares method minimizes the sum of the squared differences between observed and predicted values to find the best-fitting line.

### How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

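R² measures the proportion of variance in the dependent variable explained by the independent variable. For a least-squares fit with an intercept it lies between 0 and 1 on the training data, and a higher value means the line accounts for more of the variation in the observed values. A high R² alone does not guarantee a good model.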
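R² is commonly computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A small sketch with made-up data; the helper name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(y, y_pred):
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)     # variation left unexplained by the line
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean of y
    return 1.0 - ss_res / ss_tot

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
m, c = np.polyfit(x, y, 1)               # least-squares slope and intercept
print(round(r_squared(y, m * x + c), 3))  # 0.6
```

In Simple Linear Regression this value equals the squared Pearson correlation between x and y.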

### What is Multiple Linear Regression?

Multiple Linear Regression is an extension of Simple Linear Regression where multiple independent variables predict a dependent variable.
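Multiple Linear Regression can be fitted by prepending a column of ones (for the intercept) to the predictor matrix and solving the least squares problem. A minimal NumPy sketch on synthetic data; the "true" coefficients 1.5, 2.0, and -3.0 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))  # two independent variables
# Hypothetical true model: y = 1.5 + 2*x1 - 3*x2 + small noise
y = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Column of ones first, so coef[0] is the intercept and coef[1:] are the slopes
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coef, 2))  # estimates should be close to 1.5, 2.0, -3.0
```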

### What is the main difference between Simple and Multiple Linear Regression?

Simple Linear Regression uses one independent variable, whereas Multiple Linear Regression uses two or more independent variables.

### What are the key assumptions of Multiple Linear Regression?

The assumptions include linearity, independence, homoscedasticity, normality of residuals, and no multicollinearity.

### What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

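Heteroscedasticity means the variance of the residuals is not constant across the range of the predictors (or fitted values). The OLS coefficient estimates remain unbiased but are no longer the most efficient, and the usual standard errors are biased, so t-tests, p-values, and confidence intervals become unreliable.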

### How can you improve a Multiple Linear Regression model with high multicollinearity?

Techniques include removing correlated variables, using Principal Component Analysis (PCA), or applying Ridge/Lasso regression.
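To illustrate the Ridge option, here is a closed-form sketch, assuming centered data so the intercept is not penalized: β = (XᵀX + αI)⁻¹Xᵀy. The helper `ridge_fit`, the penalty values, and the nearly duplicated predictor are hypothetical choices for demonstration; in practice a library implementation such as scikit-learn's `Ridge` would typically be used.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centered data; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering keeps the intercept out of the penalty
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1: severe multicollinearity
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

# alpha = 0 is plain OLS: the two coefficients are typically large and unstable.
# A positive alpha shrinks them toward a stable split of the combined effect.
for alpha in (0.0, 1.0, 10.0):
    _, beta = ridge_fit(X, y, alpha)
    print(alpha, np.round(beta, 2))
```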

### What are some common techniques for transforming categorical variables for use in regression models?

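Techniques include one-hot (dummy) encoding, label encoding, and ordinal encoding. For nominal categories, one-hot encoding with one category dropped as the reference level is the usual choice, because label encoding imposes an arbitrary numeric order that a linear model treats as meaningful. Ordinal encoding is appropriate only when the categories have a natural order.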
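A small NumPy sketch of one-hot (dummy) encoding that drops the first category as the reference level, which avoids perfect collinearity with the intercept column. The helper `one_hot` and the color labels are illustrative; libraries such as pandas (`get_dummies`) and scikit-learn (`OneHotEncoder`) provide ready-made versions.

```python
import numpy as np

def one_hot(values, drop_first=True):
    """Dummy-code a 1-D array of category labels; returns (matrix, column_names)."""
    categories, codes = np.unique(values, return_inverse=True)  # labels sorted alphabetically
    matrix = np.eye(len(categories), dtype=int)[codes]            # one row per observation
    names = [str(c) for c in categories]
    if drop_first:
        # The dropped category becomes the reference level absorbed by the intercept
        return matrix[:, 1:], names[1:]
    return matrix, names

colors = np.array(["red", "green", "blue", "green"])
X_dummies, names = one_hot(colors)
print(names)      # ['green', 'red']
print(X_dummies)  # a "blue" row is all zeros, since blue is the reference level
```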

### What is the role of interaction terms in Multiple Linear Regression?

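Interaction terms capture how the effect of one independent variable on the dependent variable depends on the value of another. For example, in Y = b0 + b1X1 + b2X2 + b3X1X2, the slope with respect to X1 is b1 + b3X2 rather than a constant.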
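An interaction term is just an extra column equal to the product of two predictors, so it can be fitted with ordinary least squares. A sketch on noiseless synthetic data, where the coefficients 1.0, 2.0, 0.5, and 0.3 are made up so that recovery is exact.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
# Hypothetical model: the slope with respect to x1 is 2.0 + 0.3 * x2
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 0.3 * x1 * x2

design = np.column_stack([np.ones(n), x1, x2, x1 * x2])  # last column is the interaction term
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coef, 3))  # recovers the true coefficients 1.0, 2.0, 0.5, 0.3
```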

### How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

In Simple Linear Regression, the intercept represents the predicted value when x=0. In Multiple Linear Regression, it represents the predicted value when all independent variables are zero.

### What is the significance of the slope in regression analysis, and how does it affect predictions?

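The sign of the slope gives the direction of the relationship, and its magnitude gives the predicted change in the dependent variable for a one-unit increase in the independent variable (holding the other predictors fixed in Multiple Linear Regression). Predictions move along the line by this amount per unit of x. Because the magnitude depends on the units of measurement, the slope alone is not a unit-free measure of the strength of the relationship.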

### How does the intercept in a regression model provide context for the relationship between variables?

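The intercept provides a baseline value for the dependent variable when all independent variables are zero. This baseline is only meaningful if zero lies within or near the range of the observed data; otherwise the intercept mainly serves to position the regression line.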

### What are the limitations of using R² as a sole measure of model performance?

R² does not indicate causality, is sensitive to outliers, and does not account for model complexity or overfitting: it never decreases when more predictors are added, even irrelevant ones.

### How would you interpret a large standard error for a regression coefficient?

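A large standard error means the coefficient is estimated imprecisely: its confidence interval is wide, and it may not be significantly different from zero. Common causes are multicollinearity, a small sample, or little variation in the predictor. It weakens conclusions about that particular coefficient, but does not by itself imply that the model's predictions are poor.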

### How can heteroscedasticity be identified in residual plots, and why is it important to address it?

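Heteroscedasticity typically shows up in a plot of residuals against fitted values as a funnel shape, with the spread of the residuals widening or narrowing as the fitted values change. Addressing it (for example with a transformation of the dependent variable, weighted least squares, or heteroscedasticity-robust standard errors) matters because otherwise standard errors, and therefore hypothesis tests and confidence intervals, are unreliable.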
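A residual plot can be complemented by a formal test. The sketch below hand-codes Koenker's studentized form of the Breusch-Pagan test, a technique not named in the answer above: regress the squared OLS residuals on the predictors and compare n·R² with a chi-squared distribution whose degrees of freedom equal the number of predictors. The data and the helper name `koenker_bp_statistic` are hypothetical; statsmodels provides a packaged version as `het_breuschpagan` in `statsmodels.stats.diagnostic`.

```python
import numpy as np

def koenker_bp_statistic(X, residuals):
    """n * R^2 from regressing squared residuals on the predictors (Koenker's Breusch-Pagan form)."""
    n = len(residuals)
    design = np.column_stack([np.ones(n), X])
    e2 = residuals ** 2
    coef, *_ = np.linalg.lstsq(design, e2, rcond=None)
    fitted = design @ coef
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)  # noise spread grows with x: a "funnel"

design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef

lm = koenker_bp_statistic(x, residuals)
# With one predictor, compare against chi-squared(1); 3.841 is its 5% critical value.
print(lm, lm > 3.841)
```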

### What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

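Adjusted R² penalizes R² for the number of predictors, so a large gap between the two suggests that the model contains independent variables that add little explanatory power relative to the complexity they introduce. Such a model may be overfitting.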
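Adjusted R² is usually computed as 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors excluding the intercept. A sketch comparing a one-predictor model with the same model plus ten pure-noise predictors; the helpers and data are illustrative. Plain R² cannot decrease when predictors are added, while adjusted R² charges for each one.

```python
import numpy as np

def fit_r2(X, y):
    """R^2 of an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, p):
    """p = number of predictors, excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(4)
n = 30
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)
noise_cols = rng.normal(size=(n, 10))  # predictors unrelated to y
X_big = np.hstack([x, noise_cols])

r2_small, r2_big = fit_r2(x, y), fit_r2(X_big, y)
print(r2_small, adjusted_r2(r2_small, n, 1))
print(r2_big, adjusted_r2(r2_big, n, 11))
```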

### Why is it important to scale variables in Multiple Linear Regression?

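For ordinary least squares, scaling does not change the fitted values, R², or the t-statistics; each coefficient simply rescales with its variable's units, so unscaled data do not bias the estimates. Scaling becomes important when coefficient magnitudes need to be compared across predictors, when regularized models such as Ridge or Lasso are used (the penalty treats all coefficients as if they were on the same scale), when the model is fitted with gradient-based optimization, and for numerical stability with high-degree polynomial terms.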
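A quick check of the OLS claim: standardizing the predictors leaves the fitted values unchanged and only rescales the slopes (standardized slope = raw slope × the feature's standard deviation). The income/age predictors and their coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
income = rng.uniform(20_000, 120_000, size=n)  # hypothetical predictor in dollars
age = rng.uniform(20, 70, size=n)               # hypothetical predictor in years
y = 5.0 + 0.0001 * income + 0.2 * age + rng.normal(size=n)

def ols_fit(X, y):
    """Return (coefficients, fitted values) of an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

X_raw = np.column_stack([income, age])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)  # z-score standardization

coef_raw, pred_raw = ols_fit(X_raw, y)
coef_std, pred_std = ols_fit(X_std, y)
print(np.allclose(pred_raw, pred_std))  # True: identical fitted values
print(coef_raw[1:], coef_std[1:])        # standardized slope = raw slope * feature std
```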

### What is polynomial regression?

Polynomial regression is a type of regression analysis that models the relationship between the dependent and independent variable as an nth-degree polynomial.

### How does polynomial regression differ from linear regression?

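Polynomial regression captures nonlinear relationships by adding powers of the independent variable as extra features. The model is still linear in its coefficients, so it can be fitted with the same least squares machinery as Multiple Linear Regression.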

### When is polynomial regression used?

Polynomial regression is used when the relationship between variables is nonlinear but can be approximated using polynomial functions.

### What is the general equation for polynomial regression?

The general equation is Y = b0 + b1X + b2X² + ... + bnXⁿ.
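NumPy's `np.polyfit` fits this equation by least squares, but it returns coefficients from the highest power down, so they are reversed below to match the b0, b1, ... ordering. The cubic and its coefficients are made up for illustration.

```python
import numpy as np

x = np.linspace(-3, 3, 20)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + 0.25 * x ** 3  # b0=1, b1=-2, b2=0.5, b3=0.25

coeffs = np.polyfit(x, y, deg=3)  # returned highest power first: [b3, b2, b1, b0]
b = coeffs[::-1]                  # reorder to [b0, b1, b2, b3] to match the equation
print(np.round(b, 3))
```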

### Can polynomial regression be applied to multiple variables?

Yes, polynomial regression can be extended to multiple variables, creating interaction and higher-order terms.
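For several variables, the feature set is every monomial up to the chosen degree, including cross (interaction) terms. A hand-rolled sketch of that expansion using `itertools.combinations_with_replacement`; the helper name `polynomial_terms` and the `x1*x2` naming scheme are hypothetical, and scikit-learn's `PolynomialFeatures` performs a similar expansion.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_terms(X, degree):
    """Columns for every monomial of the input features up to `degree` (bias column included)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    columns, names = [np.ones(n)], ["1"]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(p), d):
            columns.append(np.prod(X[:, list(combo)], axis=1))  # product of the chosen features
            names.append("*".join(f"x{j + 1}" for j in combo))
    return np.column_stack(columns), names

X = np.array([[2.0, 3.0]])
features, names = polynomial_terms(X, degree=2)
print(names)     # ['1', 'x1', 'x2', 'x1*x1', 'x1*x2', 'x2*x2']
print(features)  # [[1. 2. 3. 4. 6. 9.]]
```

The expanded matrix can then be passed to any ordinary least squares solver.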

### What are the limitations of polynomial regression?

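Limitations include a tendency to overfit at high degrees, high sensitivity to outliers, poor behavior when extrapolating beyond the range of the data, and reduced interpretability of the coefficients.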

### What methods can be used to evaluate model fit when selecting the degree of a polynomial?

Methods include cross-validation, adjusted R², and Akaike Information Criterion (AIC).
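A sketch of k-fold cross-validation for choosing the degree, using only NumPy: each candidate degree is scored by its mean squared error on held-out folds, and the lowest score wins. The quadratic test data, the fold count, and the helper `cv_mse_for_degree` are illustrative assumptions.

```python
import numpy as np

def cv_mse_for_degree(x, y, degree, k=5, seed=0):
    """Mean squared validation error of a polynomial of the given degree under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)  # fit on training folds only
        pred = np.polyval(coeffs, x[val_idx])                     # predict the held-out fold
        errors.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(6)
x = rng.uniform(-3, 3, size=60)
y = 1.0 + 0.5 * x - 1.5 * x ** 2 + rng.normal(scale=0.5, size=60)  # hypothetical quadratic data

scores = {d: cv_mse_for_degree(x, y, d) for d in range(1, 8)}
best = min(scores, key=scores.get)
print(scores)
print("selected degree:", best)
```

The same fold split is reused for every degree (fixed `seed`), so the scores are compared on equal terms.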

### Why is visualization important in polynomial regression?

Visualization helps in understanding the fit of the model and detecting underfitting or overfitting.

### How is polynomial regression implemented in Python?

Polynomial regression is implemented using `PolynomialFeatures` from scikit-learn, followed by linear regression on the transformed features.
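A minimal sketch of that approach, chaining the two steps with `make_pipeline`. `include_bias=False` is set because `LinearRegression` already fits its own intercept; the cubic data are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(50, 1))  # scikit-learn expects a 2-D feature matrix
y = 0.5 + X[:, 0] - 2.0 * X[:, 0] ** 3 + rng.normal(scale=0.1, size=50)

# Expand x into [x, x^2, x^3], then fit ordinary least squares on those columns
model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False), LinearRegression())
model.fit(X, y)

lin = model.named_steps["linearregression"]
print(lin.intercept_, lin.coef_)          # estimates of b0 and [b1, b2, b3]
print(model.predict(np.array([[1.0]])))  # close to 0.5 + 1 - 2 = -0.5
```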