# Regression Assignment

### Q1: What is Simple Linear Regression?

Simple Linear Regression is a statistical technique that models the relationship between a dependent variable (Y) and a single independent variable (X) using a straight line. The relationship is defined by the equation Y = mX + c, where m is the slope and c is the intercept.

### Q2: What are the key assumptions of Simple Linear Regression?

- Linearity: The relationship between the independent and dependent variable is linear.
- Independence: Observations are independent of each other.
- Homoscedasticity: Constant variance of residuals across all levels of the independent variable.
- Normality: Residuals should be approximately normally distributed.
- No multicollinearity: Applies only to multiple regression, where predictors should not be too highly correlated. With a single predictor it is not a concern.

### Q3: What does the coefficient m represent in the equation Y=mX+c?

The coefficient **m** in the equation Y = mX + c represents the **slope** of the regression line. It shows the rate at which Y changes with respect to X. In other words, it indicates how much Y will increase or decrease when X increases by one unit.

### Q4: What does the intercept c represent in the equation Y=mX+c?

The intercept **c** in the equation Y = mX + c represents the **value of Y when X is 0**. It is the point where the regression line crosses the Y-axis.

### Q5: How do we calculate the slope m in Simple Linear Regression?

The slope **m** in Simple Linear Regression is calculated using the formula:

$$
m = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}
$$

Where X̄ is the mean of X and Ȳ is the mean of Y. This is the slope of the best-fit line, the one that minimizes the sum of squared errors. The intercept then follows as c = Ȳ - mX̄.
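The slope formula can be checked with a few lines of plain Python. This is a minimal sketch: the function name `fit_simple_linear_regression` and the data points are made up for illustration, and the intercept is computed as c = Ȳ - mX̄.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (m, c) for the least squares line Y = mX + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of the deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of X from its mean.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c


# Made-up points that lie exactly on Y = 2X + 1.
m, c = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)
```

Because the sample points lie exactly on a line, the recovered slope and intercept match that line; with noisy data they would be the least squares estimates instead.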

### Q6: What is the purpose of the least squares method in Simple Linear Regression?

The least squares method finds the slope and intercept that minimize the sum of squared differences between the actual and predicted values. Averaging those squared differences gives the cost function of Simple Linear Regression, the **Mean Squared Error (MSE)**.
Formula:
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{y}_i)^2
$$

Where:
* $y_i$ is the actual value
* $\hat{y}_i$ is the predicted value
* $n$ is the number of observations

The cost function is used to evaluate how well the linear regression model fits the data. The goal is to **minimize the cost function** to find the best-fitting line.
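As a small illustration of the MSE formula, the sketch below computes it for three made-up observations. The function name `mean_squared_error` and the values are assumptions for the example only.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

# Squared errors are 1, 0 and 4, so the MSE is 5 / 3.
mse = mean_squared_error(actual, predicted)
print(mse)
```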

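### Q7: How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

R² is the proportion of the variance in Y that the fitted line explains. For a least squares line with an intercept it lies between 0 and 1: a value of 0.80 means the model accounts for 80% of the variability in Y, while a value near 0 means X explains almost none of it. In Simple Linear Regression, R² equals the square of the correlation coefficient between X and Y.

The line being evaluated can also be fitted iteratively. Gradient Descent is an optimization algorithm used to minimize the cost function in Linear Regression. It updates the model’s parameters (slope and intercept) step by step to reduce the error.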

**Steps:**

1. Initialize slope and intercept.
2. Calculate the gradient (partial derivatives of the cost function).
3. Update the parameters using:

   $$
   m = m - \alpha \cdot \frac{\partial \text{Cost}}{\partial m}
   $$

   $$
   c = c - \alpha \cdot \frac{\partial \text{Cost}}{\partial c}
   $$
4. Repeat until convergence.

Here, $\alpha$ is the **learning rate**, controlling the size of steps taken toward the minimum.
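The update steps can be sketched in plain Python with MSE as the cost. This is an illustrative sketch, not a tuned implementation: the learning rate, the fixed iteration count (used here in place of a convergence test), and the data are all assumptions.

```python
def gradient_descent(xs, ys, alpha=0.05, n_iters=5000):
    """Fit Y = mX + c by gradient descent on the Mean Squared Error."""
    m, c = 0.0, 0.0  # Step 1: initialize slope and intercept.
    n = len(xs)
    for _ in range(n_iters):
        # Prediction errors with the current parameters.
        errors = [(m * x + c) - y for x, y in zip(xs, ys)]
        # Step 2: partial derivatives of the MSE with respect to m and c.
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_c = (2 / n) * sum(errors)
        # Step 3: move against the gradient, scaled by the learning rate.
        m -= alpha * grad_m
        c -= alpha * grad_c
    return m, c


# Made-up points that lie exactly on Y = 2X + 1.
m, c = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(m, 4), round(c, 4))
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more iterations, so in practice the value of `alpha` depends on the scale of the data.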

### Q8: What is Multiple Linear Regression?

**Multiple Linear Regression** is an extension of Simple Linear Regression where there are **two or more independent variables** predicting a dependent variable.

**Equation:**

$$
Y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n
$$

**Difference:**

* Simple Linear Regression uses **1 predictor variable**.
* Multiple Linear Regression uses **2 or more predictor variables**.

This allows modeling more complex relationships in the data.
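A two-predictor fit can be sketched with NumPy's least squares solver. The data below is made up and noise-free so that the recovered coefficients are easy to verify; using `np.linalg.lstsq` is one of several ways to fit the model.

```python
import numpy as np

# Made-up data generated from Y = 1 + 2*X1 + 3*X2 with no noise.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so that b_0 is estimated as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares solution for (b_0, b_1, b_2).
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coeffs
print(b0, b1, b2)
```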


### Q9: What is the main difference between Simple and Multiple Linear Regression?

The main difference is the number of independent variables. Simple Linear Regression fits a line using one predictor (Y = mX + c), while Multiple Linear Regression fits a plane or hyperplane using two or more predictors. In the multiple case, each coefficient describes the effect of its predictor while the other predictors are held constant.


### Q10: What are the key assumptions of Multiple Linear Regression?

- Linearity: The dependent variable is a linear function of the predictors.
- Independence: Observations (and their errors) are independent of each other.
- Homoscedasticity: Constant variance of residuals across all levels of the predictors.
- Normality: Residuals should be approximately normally distributed.
- No multicollinearity: Predictors should not be too highly correlated with each other.

### Q11: What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

Heteroscedasticity means the variance of the residuals is not constant across the levels of the predictors. The least squares coefficient estimates remain unbiased, but they are no longer the most efficient, and the usual standard errors become unreliable. As a result, t-tests, p-values, and confidence intervals for the coefficients can be misleading.

### Q12: How can you improve a Multiple Linear Regression model with high multicollinearity?

- Detect it first, for example with a correlation matrix or the Variance Inflation Factor (VIF).
- Remove one of each pair of highly correlated predictors, or combine them into a single feature.
- Use regularization such as Ridge or Lasso regression, which shrinks unstable coefficients.
- Apply dimensionality reduction such as Principal Component Analysis (PCA).
- Collect more data where possible, which reduces the variance of the estimates.
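Multicollinearity is often quantified with the Variance Inflation Factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below computes it with NumPy; the function name `variance_inflation_factors` and the data are made up for illustration, and the first two columns are deliberately almost identical.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n), others])
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coeffs
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs


# Made-up predictors: the second column is nearly a copy of the first.
X = np.array([
    [1.0, 1.1, 5.0],
    [2.0, 1.9, 1.0],
    [3.0, 3.2, 4.0],
    [4.0, 3.9, 2.0],
    [5.0, 5.1, 6.0],
    [6.0, 6.0, 3.0],
])
vifs = variance_inflation_factors(X)
print(vifs)
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity, although the threshold is a convention rather than a strict rule.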

### Q13: What are some common techniques for transforming categorical variables for use in regression models?

- One-hot (dummy) encoding: Create one 0/1 column per category, usually dropping one category as the baseline to avoid perfect multicollinearity.
- Ordinal (label) encoding: Map categories to integers, which is appropriate only when the categories have a natural order.
- Target (mean) encoding: Replace each category with the mean of the dependent variable for that category, computed on the training data only to avoid leakage.
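One-hot (dummy) encoding, a common way to feed a categorical variable into a regression model, can be sketched in plain Python. The function name `one_hot_encode` and the color values are made up; the first category in sorted order is dropped as the baseline.

```python
def one_hot_encode(values, drop_first=True):
    """Return (column_names, rows) of 0/1 indicators for a categorical list."""
    categories = sorted(set(values))
    # Dropping one category avoids perfect collinearity with the intercept.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == cat else 0 for cat in kept] for v in values]
    return kept, rows


columns, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(columns)
print(encoded)
```

Here "blue" becomes the baseline, so a row of all zeros stands for "blue" and each coefficient is read relative to it.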

### Q14: What is the role of interaction terms in Multiple Linear Regression?

An interaction term is the product of two predictors, such as X₁X₂, added to the model as an extra variable. It lets the effect of one predictor on Y depend on the level of another, which a model with only separate terms cannot capture. When an interaction is present, the individual coefficients can no longer be interpreted in isolation.

### Q15: How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

In Simple Linear Regression, the intercept is the expected value of Y when the single predictor X is 0. In Multiple Linear Regression, it is the expected value of Y when all predictors are 0 at the same time. That combination often lies outside the range of the data, so the intercept may have no practical meaning even though it is still needed to position the fitted surface.

### Q16: What is the significance of the slope in regression analysis, and how does it affect predictions?

The slope gives the direction and size of the relationship: its sign shows whether Y rises or falls as X increases, and its magnitude shows by how much per unit of X. For predictions, a change of ΔX in the predictor changes the predicted Y by m·ΔX. In Multiple Linear Regression, each slope describes this effect with the other predictors held constant.

### Q17: How does the intercept in a regression model provide context for the relationship between variables?

The intercept provides the baseline value of Y from which the effect of the predictors is measured. It anchors the regression line, so the slope alone describes the change in Y while the intercept fixes its level. It is directly interpretable only when X = 0 is a plausible value within the range of the data.

### Q18: What are the limitations of using R² as a sole measure of model performance?

- It never decreases when predictors are added, even irrelevant ones, so it can reward overfitting.
- It does not show whether the model is correctly specified; a high R² can coexist with a clear pattern in the residuals.
- It says nothing about causation or about whether individual coefficients are reliable.
- It is computed on the training data and does not measure predictive accuracy on new data.

### Q19: How would you interpret a large standard error for a regression coefficient?

A large standard error means the coefficient is estimated imprecisely. Its confidence interval is wide and its t-statistic is small, so the coefficient may not be statistically distinguishable from zero. Common causes are a small sample, noisy data, or multicollinearity among the predictors.

### Q20: How can heteroscedasticity be identified in residual plots, and why is it important to address it?

In a plot of residuals against fitted values, heteroscedasticity shows up as a spread that changes across the plot, typically a funnel or cone shape, instead of a band of roughly constant width. It is important to address because the usual standard errors, and therefore the p-values and confidence intervals, become unreliable. Common remedies include transforming the dependent variable (for example with a logarithm), using weighted least squares, or reporting heteroscedasticity-robust standard errors.

### Q21: What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

It suggests that the model contains predictors that add little real explanatory power. R² never decreases when a predictor is added, whereas adjusted R² penalizes each additional predictor relative to the sample size. A large gap between the two is therefore a sign of overfitting, often with too many predictors for the number of observations.
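The penalty that adjusted R² applies for extra predictors can be made concrete with its standard formula, adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses made-up values of R², n, and p.

```python
def r_squared(y_true, y_pred):
    """Proportion of the variance in y_true explained by y_pred."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust R² for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R² and sample size, different numbers of predictors (made-up values).
few = adjusted_r_squared(0.90, n=12, p=2)
many = adjusted_r_squared(0.90, n=12, p=9)
print(few, many)
```

With only 12 observations, the same R² of 0.90 stays close to 0.88 with two predictors but drops to about 0.45 with nine.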

### Q22: Why is it important to scale variables in Multiple Linear Regression?

- Comparability: With predictors on the same scale, coefficient magnitudes can be compared directly.
- Regularization: Ridge and Lasso penalize coefficient size, so unscaled predictors are penalized unevenly.
- Optimization: Gradient descent converges faster when predictors have similar ranges.
- Numerical stability: Very different scales can make the computations ill-conditioned.

Plain least squares predictions do not change when predictors are rescaled, so scaling matters mainly for interpretation, regularization, and iterative fitting.
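Standardization (z-score scaling) is one common way to scale predictors: subtract the mean and divide by the standard deviation. The sketch below uses made-up income and age columns to show that variables with very different units end up on the same scale.

```python
def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    # Population standard deviation (divides by n).
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]


incomes = [30000.0, 50000.0, 70000.0]  # large values
ages = [25.0, 35.0, 45.0]              # small values

scaled_incomes = standardize(incomes)
scaled_ages = standardize(ages)
print(scaled_incomes)
print(scaled_ages)
```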

### Q23: What is polynomial regression?

**Polynomial Regression** is a form of regression where the relationship between the independent variable and the dependent variable is modeled as an **nth-degree polynomial**. For degree 2:

$$
Y = b_0 + b_1X + b_2X^2
$$

### Q24: How does polynomial regression differ from linear regression?

- Linear Regression models straight-line relationships.
- Polynomial Regression models **curved** (non-linear) relationships by adding powers of X as extra predictors.

Polynomial Regression is still a linear model in terms of its coefficients, so it can be fitted with the same least squares method.

### Q25: When is polynomial regression used?

Polynomial regression is used when the data shows a non-linear trend that a straight line cannot capture, for example when a scatter plot bends or when the residuals of a straight-line fit show a curved pattern. It is most suitable when the curve is smooth and a low degree is enough to describe it.

### Q26: What is the general equation for polynomial regression?

For a polynomial of degree n in a single variable X:

$$
Y = b_0 + b_1X + b_2X^2 + \dots + b_nX^n
$$

Where $b_0$ is the intercept and $b_1, \dots, b_n$ are the coefficients of the powers of X.

### Q27: Can polynomial regression be applied to multiple variables?

Yes. With several predictors, the model includes powers of each variable and products between them. For two variables and degree 2:

$$
Y = b_0 + b_1X_1 + b_2X_2 + b_3X_1^2 + b_4X_1X_2 + b_5X_2^2
$$

The number of terms grows quickly with the number of variables and the degree, which increases the risk of overfitting.

### Q28: What are the limitations of polynomial regression?

- Overfitting: High degrees can follow the noise in the training data.
- Poor extrapolation: Predictions outside the observed range of X can swing to extreme values.
- Sensitivity to outliers: A few unusual points can bend the curve noticeably.
- Multicollinearity: Powers of X are often highly correlated with each other.
- Interpretability: Individual coefficients are harder to interpret than a single slope.

### Q29: What methods can be used to evaluate model fit when selecting the degree of a polynomial?

- Cross-validation or a hold-out set: Compare prediction error (for example RMSE) on data not used for fitting.
- Adjusted R²: Rewards fit while penalizing extra terms.
- Information criteria: AIC and BIC balance goodness of fit against model complexity.
- Residual plots: A remaining curved pattern suggests the degree is too low.
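One way to compare candidate degrees is to fit each on part of the data and measure the error on the rest. The sketch below is illustrative only: the data is simulated from a quadratic with noise, every third point is held out, and the candidate degrees 1 to 5 are an arbitrary choice. It uses NumPy's `polyfit` and `polyval`.

```python
import numpy as np

# Simulated data: a quadratic trend plus noise (made up for illustration).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x - 0.5 * x ** 2 + rng.normal(scale=0.3, size=x.size)

# Hold out every third point for evaluation.
test_mask = np.arange(x.size) % 3 == 0
x_train, y_train = x[~test_mask], y[~test_mask]
x_test, y_test = x[test_mask], y[test_mask]


def holdout_rmse(degree):
    """Fit a polynomial on the training points and score it on held-out points."""
    coeffs = np.polyfit(x_train, y_train, degree)
    preds = np.polyval(coeffs, x_test)
    return float(np.sqrt(np.mean((y_test - preds) ** 2)))


errors = {degree: holdout_rmse(degree) for degree in range(1, 6)}
best_degree = min(errors, key=errors.get)
print(errors)
print(best_degree)
```

A single hold-out split can be noisy, so k-fold cross-validation is usually a more stable basis for the choice.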

### Q30: Why is visualization important in polynomial regression?

Plotting the fitted curve over the data shows at a glance whether the chosen degree follows the trend, misses the curvature (underfitting), or wiggles to chase individual points (overfitting). Residual plots add a further check, and plotting beyond the data range reveals how the curve behaves when extrapolating. These problems are hard to spot from summary numbers such as R² alone.

### Q31: How is polynomial regression implemented in Python?

Polynomial regression is commonly implemented in Python by expanding X into its powers and fitting an ordinary linear regression on the expanded features. Typical tools are scikit-learn's `PolynomialFeatures` followed by `LinearRegression`, or NumPy's `numpy.polyfit` for a single variable.
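A minimal sketch of polynomial regression in Python, assuming only NumPy is available: the powers of X are built by hand and fitted with ordinary least squares. The data is made up and noise-free so the result is easy to check. scikit-learn's `PolynomialFeatures` with `LinearRegression` is a common alternative.

```python
import numpy as np

# Made-up, noise-free data from Y = 1 + 2X + 3X^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

degree = 2
# Design matrix with columns 1, X, X^2: the model is linear in b_0, b_1, b_2.
X_poly = np.column_stack([x ** k for k in range(degree + 1)])

# Ordinary least squares on the expanded features.
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
b0, b1, b2 = coeffs

# Predict at a new point by expanding it the same way.
x_new = 5.0
y_new = float(coeffs @ np.array([x_new ** k for k in range(degree + 1)]))
print(b0, b1, b2, y_new)
```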