## Q1. Explain the difference between simple linear regression and multiple linear regression. Provide an example of each.

**Answer:**

Simple linear regression involves one independent variable and one dependent variable. It models the relationship between these two variables by fitting a linear equation. The general form of a simple linear regression equation is:
$$ y = \beta_0 + \beta_1x + \epsilon $$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon$ is the error term.

- **Example:** Predicting a person's weight (`y`) based on their height (`x`).

Multiple linear regression, on the other hand, involves two or more independent variables. The relationship is modeled using a linear equation that fits multiple predictors. The general form of a multiple linear regression equation is:  
$$ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n + \epsilon $$

- **Example:** Predicting a person's weight (`y`) based on their height (`x_1`), age (`x_2`), and exercise frequency (`x_3`).
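For the simple case, the least-squares estimates have a closed form. This is a minimal sketch, assuming NumPy; the function name `fit_simple_linear_regression` and the height/weight numbers are hypothetical, for illustration only.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Closed-form least-squares estimates (beta_0, beta_1) for y = beta_0 + beta_1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by the sum of squared x deviations.
    beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through the point of means (x_mean, y_mean).
    beta_0 = y_mean - beta_1 * x_mean
    return float(beta_0), float(beta_1)

# Hypothetical heights (cm) and weights (kg).
height = [150, 160, 170, 180, 190]
weight = [52, 58, 69, 77, 84]
b0, b1 = fit_simple_linear_regression(height, weight)
print(f"weight = {b0:.2f} + {b1:.3f} * height")
```

With several predictors there is no single-ratio formula like this, so the coefficients are usually obtained with a matrix least-squares solver instead.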

## Q2. Discuss the assumptions of linear regression. How can you check whether these assumptions hold in a given dataset?

**Answer:**

The main assumptions of linear regression are:

1. **Linearity:** The relationship between the independent and dependent variables is linear.
2. **Independence:** The observations are independent of each other.
3. **Homoscedasticity:** The residuals (errors) have constant variance at every level of the independent variables.
4. **Normality:** The residuals are normally distributed.
5. **No Multicollinearity:** In multiple linear regression, the independent variables are not highly correlated with each other.

To check whether these assumptions hold:

- **Linearity:** Use scatterplots of the independent variables against the dependent variable.
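- **Independence:** If the data are ordered in time or sequence, plot the residuals against that order and look for patterns. The Durbin-Watson test can also be used to detect first-order autocorrelation in the residuals.
- **Homoscedasticity:** Plot residuals against predicted values (or against each independent variable); the vertical spread should look roughly constant, with no funnel shape.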
- **Normality:** Use Q-Q plots or histograms of residuals.
- **No Multicollinearity:** Check the Variance Inflation Factor (VIF) or correlation matrix.
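Several of these checks start from the residuals of a fitted model. The sketch below, assuming NumPy and hypothetical simulated data, fits ordinary least squares and computes the Durbin-Watson statistic by its definition; `fit_ols` and `durbin_watson` are illustrative helper names, not a library API.

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; return (fitted values, residuals)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single predictor column
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    fitted = design @ coef
    return fitted, np.asarray(y, dtype=float) - fitted

def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2).

    d lies between 0 and 4: values near 2 suggest no first-order autocorrelation,
    values toward 0 suggest positive and values toward 4 negative autocorrelation.
    """
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Hypothetical data: a linear trend plus independent Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=x.size)

fitted, resid = fit_ols(x, y)
print(f"Durbin-Watson: {durbin_watson(resid):.2f}")  # independent noise should give a value near 2
# Visual checks: plot `resid` against `fitted` (homoscedasticity) and
# draw a Q-Q plot of `resid` (normality), e.g. with matplotlib and scipy.stats.probplot.
```

In practice, statsmodels also ships a `durbin_watson` function in `statsmodels.stats.stattools`, which computes the same statistic.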

## Q3. How do you interpret the slope and intercept in a linear regression model? Provide an example using a real-world scenario.

**Answer:**

In a linear regression model:
- The **intercept** $(\beta_0)$ is the predicted value of the dependent variable when all independent variables are zero.
- The **slope** coefficients ($\beta_1, \beta_2$, etc.) represent the change in the dependent variable for a one-unit change in the respective independent variable, holding the other independent variables constant.

**Example:** If we are predicting salary based on years of experience:

- **Intercept:** The starting salary when experience is zero years.
- **Slope:** The increase in salary for each additional year of experience.

Suppose the equation is:
$$ \text{Salary} = 30{,}000 + 5{,}000 \times (\text{Years of Experience}) $$
- The intercept is 30,000, the predicted starting salary for someone with zero years of experience.
- The slope is 5,000, meaning each additional year of experience increases the predicted salary by \$5,000.
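To see the interpretation numerically, the sketch below (assuming NumPy) fits a line to hypothetical data generated exactly from the example equation. With `deg=1`, `np.polyfit` returns the coefficients highest power first, i.e. `[slope, intercept]`; `predict_salary` is a hypothetical helper.

```python
import numpy as np

# Hypothetical data generated exactly from Salary = 30,000 + 5,000 * years.
years = np.array([0, 1, 2, 5, 10], dtype=float)
salary = 30_000 + 5_000 * years

# With deg=1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(years, salary, deg=1)

def predict_salary(years_of_experience):
    """Predicted salary from the fitted line."""
    return intercept + slope * years_of_experience

print(int(round(intercept)), int(round(slope)))           # 30000 5000
print(int(round(predict_salary(3))))                      # 45000
print(int(round(predict_salary(4) - predict_salary(3))))  # 5000: one extra year adds the slope
```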

## Q4. Explain the concept of gradient descent. How is it used in machine learning?

**Answer:**

Gradient descent is an optimization algorithm used to minimize the cost function by iteratively adjusting the model parameters (weights). It involves calculating the gradient of the cost function with respect to the parameters and moving the parameters in the direction of the negative gradient to reduce the error.

In machine learning, gradient descent is commonly used in training models, particularly in linear regression, logistic regression, and neural networks.

- **Steps:**
  1. Initialize parameters (weights) randomly.
  2. Compute the gradient of the cost function.
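  3. Update the parameters by stepping in the opposite direction of the gradient, scaled by a learning rate $\alpha$: $\theta \leftarrow \theta - \alpha \nabla_\theta J(\theta)$.
  4. Repeat steps 2 and 3 until convergence (the cost or the parameter updates become negligibly small) or until a maximum number of iterations is reached.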
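The four steps can be sketched for linear regression with a mean-squared-error cost. This is a minimal batch gradient descent under stated assumptions: NumPy, a hypothetical function name, and default learning rate, iteration cap, and tolerance chosen for illustration rather than tuned.

```python
import numpy as np

def gradient_descent_linear_regression(X, y, learning_rate=0.1, n_iters=5000, tol=1e-10, seed=0):
    """Batch gradient descent on J(theta) = (1/n) * ||X_b @ theta - y||^2."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = len(y)
    X_b = np.column_stack([np.ones(n), X])  # leading column of ones for the intercept
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=X_b.shape[1])  # step 1: random initialization
    for _ in range(n_iters):
        grad = (2.0 / n) * X_b.T @ (X_b @ theta - y)   # step 2: gradient of the MSE
        theta_new = theta - learning_rate * grad       # step 3: move against the gradient
        if np.max(np.abs(theta_new - theta)) < tol:    # step 4: stop once updates are tiny
            return theta_new
        theta = theta_new
    return theta  # iteration cap reached

# Hypothetical noise-free data with beta_0 = 2 and beta_1 = 3.
x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x
theta = gradient_descent_linear_regression(x, y)
print(np.round(theta, 4))  # should be close to beta_0 = 2, beta_1 = 3
```

If the learning rate is too large the updates overshoot and diverge; if it is too small convergence takes many more iterations.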

## Q5. Describe the multiple linear regression model. How does it differ from simple linear regression?

**Answer:**

Multiple linear regression models the relationship between one dependent variable and two or more independent variables. It extends simple linear regression, which only involves one independent variable.

- **Equation for Simple Linear Regression:**
  $y = \beta_0 + \beta_1x + \epsilon$

- **Equation for Multiple Linear Regression:**
  $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n + \epsilon$

The key difference is that multiple linear regression can capture the effect of several predictors on the dependent variable, allowing for more complex models and potentially better predictions.
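A single least-squares routine covers both models: add a column of ones for $\beta_0$ and pass one predictor column or several. This is a sketch assuming NumPy's `np.linalg.lstsq`; `fit_linear_regression` and the simulated height/age/exercise data are hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns [beta_0, beta_1, ..., beta_n]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # one predictor: simple linear regression as a special case
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones carries beta_0
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Hypothetical predictors for the weight example in Q1.
rng = np.random.default_rng(0)
n = 100
height = rng.uniform(150, 200, n)   # cm
age = rng.uniform(18, 70, n)        # years
exercise = rng.integers(0, 7, n)    # sessions per week
# Noise-free responses from known coefficients, so the multiple fit should recover them.
weight = -80 + 0.9 * height + 0.1 * age - 1.5 * exercise

multi = fit_linear_regression(np.column_stack([height, age, exercise]), weight)
simple = fit_linear_regression(height, weight)  # same function, one predictor
print("multiple:", np.round(multi, 3))   # close to [-80, 0.9, 0.1, -1.5]
print("simple:  ", np.round(simple, 3))  # estimates differ because age and exercise are omitted
```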

## Q6. Explain the concept of multicollinearity in multiple linear regression. How can you detect and address this issue?

**Answer:**

Multicollinearity occurs when two or more independent variables in a multiple linear regression model are highly correlated, making it difficult to isolate the effect of each variable on the dependent variable. This can lead to inflated standard errors and unreliable estimates of the coefficients.

- **Detection:**
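  - **Variance Inflation Factor (VIF):** $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the $R^2$ from regressing predictor $j$ on the other predictors. A VIF above 10 (some practitioners use 5) is a common rule of thumb for problematic multicollinearity.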
  - **Correlation Matrix:** A high correlation coefficient (close to 1 or -1) between independent variables suggests multicollinearity.

- **Addressing Multicollinearity:**
  - **Remove one of the correlated variables.**
  - **Combine correlated variables into a single predictor.**
  - **Use regularization techniques like Ridge or Lasso regression.**
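VIF can be computed directly from its definition by regressing each predictor on the others. This sketch assumes NumPy; `variance_inflation_factors` is a hypothetical helper, and the data are simulated so that `x2` is nearly a copy of `x1`.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        ss_res = np.sum((target - design @ coef) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        # A perfect linear dependence (R^2 = 1) gives an infinite VIF.
        vifs[j] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return vifs

# Hypothetical predictors: x2 is nearly a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))  # x1 and x2 should be far above 10, x3 near 1
```

statsmodels offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; it uses the columns of the matrix you pass as-is, so include a constant column yourself.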

## Q7. Describe the polynomial regression model. How is it different from linear regression?

**Answer:**

Polynomial regression is a type of regression that models the relationship between the dependent and independent variables as an nth-degree polynomial. It is used when the relationship between the variables is not linear.

- **Equation for Polynomial Regression:**
  $y = \beta_0 + \beta_1x + \beta_2x^2 + \cdots + \beta_nx^n + \epsilon$

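The key difference from simple linear regression is that the model includes powers of the independent variable ($x^2, \ldots, x^n$), allowing it to fit nonlinear relationships. The model is still linear in the coefficients $\beta$, so it can be fit with ordinary least squares, exactly like a multiple linear regression whose predictors are $x, x^2, \ldots, x^n$.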
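Because the model is linear in the coefficients, fitting it amounts to building a design matrix of powers of $x$ and solving an ordinary least-squares problem. A minimal sketch, assuming NumPy (`np.vander` with `increasing=True` builds the columns $x^0, x^1, \ldots$); `fit_polynomial_regression` and the quadratic data are hypothetical.

```python
import numpy as np

def fit_polynomial_regression(x, y, degree):
    """Fit y = beta_0 + beta_1 x + ... + beta_degree x^degree by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    # Columns x^0, x^1, ..., x^degree: nonlinear in x, but linear in the betas.
    design = np.vander(x, N=degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Hypothetical noise-free quadratic: y = 1 - 2x + 0.5x^2.
x = np.linspace(-2, 2, 30)
y = 1.0 - 2.0 * x + 0.5 * x**2
coef = fit_polynomial_regression(x, y, degree=2)
print(np.round(coef, 6))  # close to [1, -2, 0.5]
```

`np.polyfit(x, y, 2)` gives the same fit, but returns the coefficients highest power first.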

## Q8. What are the advantages and disadvantages of polynomial regression compared to linear regression? In what situations would you prefer to use polynomial regression?

**Answer:**

**Advantages of Polynomial Regression:**
- Can model nonlinear relationships between variables.
- Provides a better fit for data that exhibits curvature.

**Disadvantages of Polynomial Regression:**
- Can lead to overfitting, especially with high-degree polynomials.
- More complex to interpret than linear regression.

**When to Use Polynomial Regression:**
- When the relationship between the independent and dependent variables shows curvature.
- When linear regression does not provide a good fit and a higher-degree polynomial better captures the trend in the data.
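To make the overfitting trade-off concrete, the sketch below fits polynomials of a few degrees to hypothetical noisy quadratic data and compares training and validation error, assuming NumPy's `np.polyfit`/`np.polyval`. The degrees, noise level, and sample sizes are arbitrary choices, and the printed numbers depend on the random draw. Training error cannot increase as the degree grows (each model contains the lower-degree ones), while validation error typically worsens once the degree exceeds what the data support.

```python
import numpy as np

rng = np.random.default_rng(42)

def true_curve(x):
    """Hypothetical quadratic trend used to simulate data."""
    return 1.0 + 2.0 * x - 3.0 * x**2

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

# Small training set and a larger held-out validation set, both with Gaussian noise.
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = true_curve(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_val = np.sort(rng.uniform(-1, 1, 200))
y_val = true_curve(x_val) + rng.normal(scale=0.3, size=x_val.size)

results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    val_err = mse(y_val, np.polyval(coeffs, x_val))
    results[degree] = (train_err, val_err)
    print(f"degree={degree}: train MSE={train_err:.4f}, validation MSE={val_err:.4f}")
```

Choosing the degree by validation (or cross-validated) error rather than training error is what guards against the overfitting listed above.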
