## Regression

### Question 1 : What is Simple Linear Regression?

Answer : Simple Linear Regression is a statistical method used to model the relationship between one independent variable (X) and one dependent variable (Y) by fitting a linear equation to observed data. The equation is of the form:

Y = mX + c

where:

Y is the dependent variable.

X is the independent variable.

m is the slope (coefficient).

c is the intercept.

### Question 2 : What are the key assumptions of Simple Linear Regression?

Answer : The key assumptions are:

Linearity: The relationship between X and Y is linear.

Independence: The residuals (errors) are independent of each other.

Homoscedasticity: The residuals have constant variance across all levels of X.

Normality: The residuals are normally distributed (especially important for small sample sizes).

No Multicollinearity: This concerns correlation among several independent variables, so with a single independent variable it is automatically satisfied.

### Question 3 : What does the coefficient m represent in the equation Y=mX+c ?

Answer : The coefficient m represents the slope of the regression line. It indicates the change in the dependent variable Y for a one-unit increase in the independent variable X.

###  Question 4 : What does the intercept c represent in the equation Y=mX+c?

Answer : The intercept c represents the value of Y when X=0. It is the point where the regression line crosses the Y-axis.

###  Question 5 : How do we calculate the slope m in Simple Linear Regression?

Answer : The slope m is calculated using the formula:

$$m = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$$

where:

$X_i$ and $Y_i$ are individual data points.

$\bar{X}$ and $\bar{Y}$ are the means of X and Y respectively.
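
As a minimal sketch, the slope formula above can be evaluated directly with NumPy. The data values are made up for illustration and lie exactly on the line Y = 2X + 1. The intercept line uses the standard result that the fitted line passes through the point of means.

```python
import numpy as np

# Hypothetical data lying exactly on the line Y = 2X + 1
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

x_mean, y_mean = X.mean(), Y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations
m = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)

# Intercept: the fitted line passes through (x_mean, y_mean)
c = y_mean - m * x_mean

print(m, c)  # 2.0 1.0
```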

### Question 6 : What is the purpose of the least squares method in Simple Linear Regression?

Answer : The least squares method is used to find the best-fitting line by minimizing the sum of the squared differences (residuals) between the observed values and the values predicted by the model.
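
To make the idea concrete, here is a small sketch on made-up data. It fits a line with `np.polyfit` (which performs a least-squares fit) and checks that shifting the slope or the intercept away from the fitted values increases the sum of squared residuals. The helper name `sse` is an illustrative choice, not a library function.

```python
import numpy as np

# Hypothetical noisy data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sse(m, c):
    """Sum of squared residuals for the line Y = mX + c."""
    return float(np.sum((Y - (m * X + c)) ** 2))

# For deg=1, np.polyfit returns the least-squares slope and intercept
m_hat, c_hat = np.polyfit(X, Y, deg=1)
best = sse(m_hat, c_hat)

# Moving away from the fitted line makes the squared error larger
print(best < sse(m_hat + 0.1, c_hat))
print(best < sse(m_hat, c_hat - 0.2))
```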

### Question 7 : How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

Answer : R² measures the proportion of variance in the dependent variable Y that is explained by the independent variable X. For a least-squares fit with an intercept, it ranges from 0 to 1, where:

R²=1: The model explains all the variability in Y.

R²=0: The model explains none of the variability in Y.
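
A short sketch of how this quantity can be computed by hand, using made-up data: it is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of Y.

```python
import numpy as np

# Hypothetical data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least-squares line
m, c = np.polyfit(X, Y, deg=1)
Y_pred = m * X + c

ss_res = np.sum((Y - Y_pred) ** 2)    # variation left unexplained
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation in Y
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))  # 0.6
```

In this example 60% of the variance in Y is explained by X.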

### Question 8 : What is Multiple Linear Regression?

Answer : Multiple Linear Regression extends Simple Linear Regression by modeling the relationship between multiple independent variables and one dependent variable. The equation is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$$
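
One way to estimate the coefficients is ordinary least squares on a design matrix with a leading column of ones. The sketch below uses `np.linalg.lstsq` on made-up, noise-free data generated from Y = 1 + 2·X1 + 3·X2, so the recovered coefficients should match those values.

```python
import numpy as np

# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

print(np.round(beta, 6))  # intercept, then one coefficient per variable
```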

### Question 9 : What is the main difference between Simple and Multiple Linear Regression?

Answer : 

Simple Linear Regression: Uses one independent variable.

Multiple Linear Regression: Uses two or more independent variables.

### Question 10 : What are the key assumptions of Multiple Linear Regression?

Answer : The assumptions are similar to Simple Linear Regression but extended to multiple variables:

Linearity.

Independence of residuals.

Homoscedasticity.

Normality of residuals.

No multicollinearity (independent variables should not be highly correlated).

###  Question 11 : What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

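Answer : Heteroscedasticity occurs when the variance of residuals is not constant across all levels of the independent variables. The coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are wrong, which makes hypothesis tests and confidence intervals unreliable.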

### Question 12 : How can you improve a Multiple Linear Regression model with high multicollinearity?

Answer : Remove highly correlated variables.

Use dimensionality reduction techniques like PCA.

Apply regularization methods (e.g., Ridge or Lasso Regression).
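
As a rough sketch of the first option, highly correlated pairs can be flagged from the correlation matrix. The data is simulated, and the cutoff of 0.9 is an arbitrary assumption rather than a standard value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between columns
corr = np.corrcoef(X, rowvar=False)

threshold = 0.9  # assumed cutoff for "highly correlated"
k = X.shape[1]
pairs = [(i, j) for i in range(k) for j in range(i + 1, k)
         if abs(corr[i, j]) > threshold]

print(pairs)  # column index pairs that are candidates for removal
```

Dropping one column from each flagged pair is the simplest fix; PCA or regularization keep the information from both columns while reducing the instability.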

### Question 13 : What are some common techniques for transforming categorical variables for use in regression models?

Answer : One-Hot Encoding: Create binary columns for each category.

Label Encoding: Assign integer values to categories (for ordinal data).

Target Encoding: Replace categories with the mean of the target variable.
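
The three encodings can be sketched with plain NumPy on a made-up column. The alphabetical integer codes below are only a stand-in for label encoding: for truly ordinal data you would supply the order yourself. Target encoding should be computed on training data only, because it uses the target values.

```python
import numpy as np

# Hypothetical categorical column and target
colors = np.array(["red", "green", "blue", "green", "red"])
y = np.array([10.0, 20.0, 60.0, 40.0, 30.0])

# Label encoding: integer code per category (alphabetical order here)
categories, codes = np.unique(colors, return_inverse=True)

# One-hot encoding: one binary column per category
one_hot = np.eye(len(categories), dtype=int)[codes]

# Target encoding: replace each category with the mean of y for that category
target_means = np.array([y[codes == k].mean() for k in range(len(categories))])
target_encoded = target_means[codes]

print(categories)      # ['blue' 'green' 'red']
print(codes)           # [2 1 0 1 2]
print(one_hot)
print(target_encoded)  # [20. 30. 60. 30. 20.]
```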

### Question 14 : What is the role of interaction terms in Multiple Linear Regression?

Answer : Interaction terms capture the effect of two or more independent variables acting together on the dependent variable. They are added to the model as a product of the variables (e.g., X1×X2).
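
A minimal sketch of adding an interaction term: the product X1×X2 is appended as an extra column before fitting. The data is made up and generated without noise from Y = 1 + 2·X1 + 3·X2 + 0.5·X1·X2, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical predictors
x1 = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 2.0, 3.0, 1.0])
x2 = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0])

# Response with a true interaction effect of 0.5
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 0.5 * x1 * x2

# Design matrix: intercept, main effects, and the product term
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(beta, 6))  # intercept, X1, X2, X1*X2
```

With an interaction in the model, the effect of X1 on Y is no longer a single number: here it is 2 + 0.5·X2.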

### Question 15 : How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

Answer : In Simple Linear Regression, the intercept is the predicted value of Y when X=0.

In Multiple Linear Regression, the intercept is the predicted value of Y when all independent variables are 0.

### Question 16 : What is the significance of the slope in regression analysis, and how does it affect predictions?

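Answer : The slope indicates the direction of the relationship between X and Y and the expected change in Y for a one-unit change in X. Its magnitude depends on the units of X and Y, so a larger slope does not by itself mean a stronger relationship; strength is measured by the correlation coefficient or R². The slope directly determines the predicted values of Y.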

### Question 17 : How does the intercept in a regression model provide context for the relationship between variables?

Answer : The intercept provides a baseline value for Y when all predictors are zero, helping to contextualize the starting point of the regression line.

### Question 18 : What are the limitations of using R² as a sole measure of model performance?

Answer :
R² does not indicate causality.

It never decreases when more variables are added, even irrelevant ones, so it can be artificially inflated.

It does not account for overfitting.

### Question 19 : How would you interpret a large standard error for a regression coefficient?

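Answer : A large standard error indicates that the coefficient is estimated imprecisely. Relative to the size of the coefficient, it gives a small t-statistic and a wide confidence interval, so the variable's effect may not be distinguishable from zero and it may not be a reliable predictor.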
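
For simple linear regression the standard error of the slope can be computed by hand. This sketch uses made-up data and the textbook formula: residual variance (with n − 2 degrees of freedom) divided by the sum of squared deviations of X, then square-rooted.

```python
import numpy as np

# Hypothetical data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(X)

m, c = np.polyfit(X, Y, deg=1)
resid = Y - (m * X + c)

# Residual variance: two parameters were estimated, hence n - 2
s2 = np.sum(resid ** 2) / (n - 2)

# Standard error of the slope and the corresponding t-statistic
se_m = np.sqrt(s2 / np.sum((X - X.mean()) ** 2))
t_stat = m / se_m

print(round(se_m, 3), round(t_stat, 2))  # 0.283 2.12
```

Here the slope is 0.6 with a standard error of about 0.28, so the estimate is only about two standard errors away from zero.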

### Question 20 : How can heteroscedasticity be identified in residual plots, and why is it important to address it?

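Answer: Heteroscedasticity can be identified if the residuals fan out (or narrow) as the fitted values or a predictor increase, instead of forming a band of roughly constant width in a residual plot. It is important to address because it violates a regression assumption: the coefficient estimates stay unbiased, but their standard errors are biased, so p-values and confidence intervals can be misleading.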
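
A numeric stand-in for eyeballing the residual plot is to compare the spread of residuals at low and high values of X. The sketch below simulates data whose noise grows with X (an assumption made for illustration); it is an informal check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400

# Simulated data: the noise standard deviation grows with X
X = np.linspace(1, 10, n)
Y = 2 + 3 * X + rng.normal(scale=0.5 * X)

m, c = np.polyfit(X, Y, deg=1)
resid = Y - (m * X + c)

# Residual spread in the lower and upper halves of X
low_spread = resid[: n // 2].std()
high_spread = resid[n // 2:].std()

print(low_spread, high_spread)  # a clearly larger second value suggests fanning out
```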

### Question 21 : What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

Answer: It suggests that the model may be overfitted, with too many variables that do not contribute meaningfully to explaining the variance in Y.
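
Adjusted R² is commonly defined as 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sketch below, on simulated data, adds ten pure-noise predictors to a model with one real predictor and compares both measures. The helper name `r2_and_adjusted` is illustrative.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept and return (R^2, adjusted R^2)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=(n, 1))             # one real predictor
y = 2 * x[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 10))        # ten irrelevant predictors

r2_small, adj_small = r2_and_adjusted(x, y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x, noise]), y)

print(r2_small, adj_small)
print(r2_big, adj_big)  # R^2 cannot go down; adjusted R^2 is penalized for p
```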

### Question 22 : Why is it important to scale variables in Multiple Linear Regression?

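Answer: Scaling puts all variables on a comparable scale. Ordinary least squares predictions do not change when variables are rescaled, but scaling matters when using regularization techniques (the penalty depends on coefficient size, which depends on units) and when comparing the magnitudes of coefficients.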
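
A small sketch on simulated data (the variable names `income` and `age` are hypothetical): standardizing the predictors changes the coefficients, which become per-standard-deviation effects, but leaves the ordinary least squares predictions unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
income = rng.normal(50_000, 10_000, size=n)  # hypothetical, in dollars
age = rng.normal(40, 10, size=n)             # hypothetical, in years
X = np.column_stack([income, age])
y = 0.001 * income + 0.5 * age + rng.normal(size=n)

def fit_ols(X, y):
    """Return OLS coefficients (intercept first) and fitted values."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, A @ beta

# Standardize each column to mean 0 and standard deviation 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

beta_raw, pred_raw = fit_ols(X, y)
beta_std, pred_std = fit_ols(X_std, y)

print(beta_raw[1:])  # per dollar and per year: very different magnitudes
print(beta_std[1:])  # per standard deviation: directly comparable
print(np.allclose(pred_raw, pred_std, atol=1e-6))
```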

### Question 23 : What is polynomial regression?

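Answer : Polynomial regression models the relationship between X and Y as an n-th degree polynomial in X. It is used when the relationship is curved rather than a straight line. The model is still linear in its coefficients, so it can be fitted with ordinary least squares.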

### Question 24 : How does polynomial regression differ from linear regression?

Answer : Linear Regression: Models a straight-line relationship.

Polynomial Regression: Models curved relationships using higher-degree terms (e.g., X^2, X^3).

### Question 25 : When is polynomial regression used?

Answer: It is used when the relationship between X and Y is nonlinear and cannot be adequately modeled by a straight line.

### Question 26 : What is the general equation for polynomial regression?

Answer : $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_n X^n + \epsilon$

###  Question 27 : Can polynomial regression be applied to multiple variables?

Answer : Yes, polynomial regression can include multiple independent variables and interaction terms.

### Question 28 : What are the limitations of polynomial regression?

Answer : It can lead to overfitting, especially with high-degree polynomials.

It may not generalize well to new data.

### Question 29 : What methods can be used to evaluate model fit when selecting the degree of a polynomial?

Answer :
Cross-validation.

Adjusted R².

AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion).
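
As an illustration of the cross-validation option, the sketch below runs a hand-rolled 5-fold cross-validation with `np.polyfit` for degrees 1 to 6. The data is simulated from a quadratic, and the fold count and degree range are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60

# Simulated data with a truly quadratic relationship plus noise
x = np.linspace(-3, 3, n)
y = 1 + 2 * x - 0.5 * x ** 2 + rng.normal(scale=0.3, size=n)

# Shuffle the indices once and split them into 5 folds
folds = np.array_split(rng.permutation(n), 5)

def cv_mse(degree):
    """Mean squared error on held-out folds for a given polynomial degree."""
    errors = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coefs, x[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

scores = {d: cv_mse(d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)

print(scores)
print(best_degree)  # degree with the lowest held-out error
```

The straight-line fit should show a much larger held-out error than degree 2, while higher degrees bring little or no further improvement.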

### Question 30 : Why is visualization important in polynomial regression?

Answer : Visualization helps identify the appropriate degree of the polynomial and assess the fit of the model to the data.

### Question 31 : How is polynomial regression implemented in Python?

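Answer : A common approach with scikit-learn is to expand the features with PolynomialFeatures and then fit a LinearRegression on the result. X and y below are placeholder data; replace them with your own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Placeholder data following y = 1 + 2x + 3x^2; replace with your own X and y
X = np.arange(10, dtype=float).reshape(-1, 1)  # shape (n_samples, 1)
y = 1 + 2 * X[:, 0] + 3 * X[:, 0] ** 2

# Create polynomial features (columns: 1, x, x^2)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit a linear model on the expanded features
model = LinearRegression()
model.fit(X_poly, y)

# Transform new inputs with the same fitted transformer before predicting
print(model.predict(poly.transform([[10.0]])))
```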
