Q1. Explain the difference between simple linear regression and multiple linear regression. Provide an
example of each.

- Simple Linear Regression
    - Models the relationship between one dependent and one independent variable.
    - Equation: Y = C0 + C1X + e
    - Suitable when there is one clear predictor.
    - Assumptions: Linearity, Independence, Homoscedasticity, Normality.
    - Visualization: Typically visualized with a 2D scatter plot and a line of best fit.
    - Risk of Overfitting: Lower, as it deals with only one predictor.
    - Applications: Basic research, simple predictions, understanding a singular relationship.
    
- Multiple Linear Regression
    - Models the relationship between one dependent and two or more independent variables.
    - Equation: Y = C0 + C1X1 + C2X2 + C3X3 + ... + CnXn + e
    - Suitable when multiple factors affect the outcome.
    - Assumptions: Same as simple linear regression, with the added concern of multicollinearity.
    - Visualization: Requires 3D or multi-dimensional space, often represented using partial regression plots.
    - Risk of Overfitting: Higher, especially if too many predictors are used without adequate data.
    - Applications: Complex research, multifactorial predictions, studying interrelated systems. 
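As a concrete example of each model, the sketch below fits both to the same synthetic data. The house-price scenario, the variable names (`size`, `age`, `price`), the generating coefficients, and the helper `fit_ols` are all assumptions made up for illustration; they are not real data.

```python
import numpy as np

# Synthetic, made-up data: house size (m^2), age (years), and price.
rng = np.random.default_rng(0)
size = rng.uniform(50, 200, 100)
age = rng.uniform(0, 40, 100)
price = 50 + 2.0 * size - 1.5 * age + rng.normal(0, 5, 100)

def fit_ols(X, y):
    """Least-squares fit; returns [C0, C1, ..., Cn] with the intercept first."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Simple linear regression: price = C0 + C1*size
simple = fit_ols(size, price)

# Multiple linear regression: price = C0 + C1*size + C2*age
multiple = fit_ols(np.column_stack([size, age]), price)

print("simple:  ", simple)
print("multiple:", multiple)
```

The simple model returns two coefficients and the multiple model returns three, one slope per predictor plus the intercept.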

Q2. Discuss the assumptions of linear regression. How can you check whether these assumptions hold in
a given dataset?

Linear regression relies on several key assumptions. Ensuring these assumptions hold is crucial for the validity of the regression model. Here are the main assumptions and how to check them:

1. Linearity: The relationship between the independent variables and the dependent variable is linear.

How to check: Scatter plot. Plot the dependent variable against each independent variable to check for a linear relationship. A plot of residuals against fitted values should also show no systematic curve.

2. Independence: Observations should be independent of each other.

How to check: Durbin-Watson test. This test checks for autocorrelation in the residuals. A statistic close to 2 suggests no first-order autocorrelation.

3. Homoscedasticity: The residuals have constant variance at every level of the independent variables.

How to check: Residual plot. Plot the residuals against the predicted values. The spread of residuals should be consistent across all levels of predicted values, with no funnel shape.

4. Normality of Residuals:  The residuals are normally distributed.

How to check: Histogram or Q-Q plot. Plot a histogram or a normal Q-Q plot of the residuals to visually check for normality.

5. No Multicollinearity:  The independent variables are not highly correlated with each other.

How to check: Correlation matrix. Calculate and inspect the correlation matrix of the independent variables. High absolute correlations (e.g., above 0.8) indicate multicollinearity.
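Several of these checks can also be computed numerically. The following is a minimal sketch on synthetic data, assuming NumPy only. The data, the variable names, and the use of skewness and a spread correlation as rough numeric stand-ins for the plots are illustrative assumptions, not a standard diagnostic suite.

```python
import numpy as np

# Synthetic data that satisfies the assumptions by construction.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Fit by least squares and compute residuals.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Independence: Durbin-Watson statistic (near 2 suggests no autocorrelation).
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Homoscedasticity: rough stand-in for the residual plot. A strong correlation
# between |residual| and fitted value would hint at non-constant variance.
spread_trend = np.corrcoef(fitted, np.abs(resid))[0, 1]

# Normality: sample skewness of the residuals (near 0 for a normal shape).
z = (resid - resid.mean()) / resid.std()
skew = np.mean(z ** 3)

# Multicollinearity: correlation matrix of the predictors.
corr = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)

print(f"Durbin-Watson: {dw:.2f}")
print(f"|resid| vs fitted correlation: {spread_trend:.2f}")
print(f"residual skewness: {skew:.2f}")
print("predictor correlation matrix:\n", corr)
```

These numbers complement the plots rather than replace them; a residual plot can reveal patterns that a single summary statistic hides.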

Q3. How do you interpret the slope and intercept in a linear regression model? Provide an example using
a real-world scenario.

In a linear regression model, the slope and intercept have specific interpretations that relate to the relationship between the independent variable(s) and the dependent variable.

Intercept: The intercept is the expected value of the dependent variable when all the independent variables are equal to zero. It is the point at which the regression line crosses the y-axis. In the equation y = β0 + β1x1 + β2x2 + ... + βnxn, the intercept β0 represents the value of y when x1, x2, x3, ..., xn are all zero.

Slope: The slope represents the change in the dependent variable for a one-unit change in the independent variable, holding all other variables constant. In the equation y = β0 + β1x1 + β2x2 + ... + βnxn, the slope βi for a particular xi indicates how much y changes for a one-unit increase in xi, assuming all other predictors are constant. Its sign shows the direction of the relationship between that independent variable and the dependent variable, and its magnitude shows how large the change is per unit of xi.

Example: Researchers might administer various dosages of a certain drug to patients and observe how their blood pressure responds. They might fit a simple linear regression model using dosage as the predictor variable and blood pressure as the response variable. The regression model would take the following form:

blood pressure = β0 + β1(dosage)

The coefficient β0 would represent the expected blood pressure when dosage is zero.

The coefficient β1 would represent the average change in  blood pressure when dosage is increased by one unit.

If β1 is negative, it would mean that an increase in dosage is associated with a decrease in blood pressure.

If β1 is close to zero, it would mean that an increase in dosage is associated with little or no change in blood pressure.

If β1 is positive, it would mean that an increase in dosage is associated with an increase in blood pressure.

Depending on the value of β1, researchers may decide to change the dosage given to a patient.
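To make the interpretation concrete, here is a small sketch that fits this model to invented numbers. The dosage and blood pressure values below are made up for illustration and are not real clinical data.

```python
import numpy as np

# Invented dosage (mg) and blood pressure (mmHg) values, NOT real clinical data.
dosage = np.array([0, 10, 20, 30, 40, 50], dtype=float)
blood_pressure = np.array([150, 146, 141, 137, 131, 128], dtype=float)

# np.polyfit with deg=1 returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(dosage, blood_pressure, deg=1)

print(f"intercept (β0) = {intercept:.1f}")
print(f"slope (β1)     = {slope:.2f}")
```

With these made-up numbers the fitted slope is negative, so each additional unit of dosage is associated with a lower expected blood pressure, and the intercept is the expected blood pressure at zero dosage.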

Q4. Explain the concept of gradient descent. How is it used in machine learning?

Gradient descent is one of the most commonly used optimization algorithms for training machine learning models. It works by minimizing the error between actual and predicted results.

The main objective of gradient descent is to minimize the cost function iteratively. To achieve this goal, it repeats two steps:

- Calculate the first-order derivative of the cost function with respect to each parameter to obtain the gradient (slope) at the current point.
- Move in the direction opposite to the gradient, by a step equal to alpha times the gradient, where alpha is the learning rate. The learning rate is a tuning parameter that controls the length of each step.

The cost function measures the difference, or error, between actual and predicted values at the current parameter values, expressed as a single real number. It gives the model feedback so that it can adjust its parameters to reduce the error and move toward a local or global minimum.

![image.png](attachment:e181dd77-721b-485f-9913-8fc5af34ef00.png)

The starting point (shown in the figure above) is an arbitrary point at which the model's performance is first evaluated. At this point, we compute the first derivative, that is, the slope of the tangent line, which measures how steep the cost function is. This slope then informs the updates to the parameters (weights and bias).

The slope is typically steepest near the starting point. As the parameters are updated, the steepness gradually decreases, and the slope approaches zero at the lowest point of the curve, which is called the point of convergence.

The main objective of gradient descent is to minimize the cost function, that is, the error between predicted and actual values.

If the learning rate is high, the algorithm takes larger steps but risks overshooting the minimum. A low learning rate gives small steps, which makes convergence slower but more precise.
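The two steps described above can be sketched for a one-variable linear model. This is a minimal illustration, assuming mean squared error as the cost function; the function name `gradient_descent`, its default values, and the toy data are hypothetical choices, not part of any library.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.1, n_iters=1000):
    """Fit y ≈ w*x + b by minimizing mean squared error (batch gradient descent)."""
    w, b = 0.0, 0.0  # arbitrary starting point
    n = len(x)
    for _ in range(n_iters):
        error = (w * x + b) - y                 # prediction minus actual
        grad_w = (2.0 / n) * np.dot(error, x)   # d(MSE)/dw
        grad_b = (2.0 / n) * np.sum(error)      # d(MSE)/db
        w -= learning_rate * grad_w             # step opposite to the gradient
        b -= learning_rate * grad_b
    return w, b

# Noise-free toy data generated from y = 3x + 2.
x = np.linspace(0, 1, 50)
y = 3.0 * x + 2.0

w, b = gradient_descent(x, y)
print(f"w = {w:.4f}, b = {b:.4f}")
```

On this toy data the estimates approach the generating values w = 3 and b = 2. Raising `learning_rate` too far makes the updates overshoot and diverge, while a much smaller value needs many more iterations to get equally close.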
 
 

Q5. Describe the multiple linear regression model. How does it differ from simple linear regression?

In machine learning and data analysis, multiple linear regression (MLR) is a statistical technique used to model the relationship between one dependent variable and two or more independent variables. The main objective of MLR is to predict the dependent variable's value from the values of the independent variables.

Let k represent the number of independent variables, denoted by x1, x2, x3, ..., xk.
Furthermore, we assume that Y depends linearly on these variables according to

Y = β0 + β1x1 + β2x2 + · · · + βkxk + ε

- Y is the dependent (predicted) variable.
- β0 is the y-intercept: when x1, x2, ..., xk are all zero, the expected value of Y is β0.
- The regression coefficients β1, β2, ..., βk are the slope coefficients. Each βj represents the change in Y for a one-unit change in xj, with the other independent variables held constant.
- The ε term describes the random error in the model, which the residuals estimate.

Assumptions of Multiple Linear Regression:

- Linearity: The relationship between the dependent variable and each independent variable is linear. This means the change in the dependent variable is proportional to the change in each independent variable.
- Independence: The observations are independent of each other. This assumption ensures that the value of the dependent variable for one observation is not influenced by the value for another.
- Homoscedasticity: The variance of the residuals (errors) is constant across all levels of the independent variables. This means that the spread of residuals should be roughly the same for all predicted values.
- Normality of Residuals: The residuals (differences between observed and predicted values) are normally distributed. This is particularly important for hypothesis testing and constructing confidence intervals.
- No Multicollinearity: The independent variables are not too highly correlated with each other. High multicollinearity makes it difficult to determine the individual effect of each independent variable.

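A multiple regression model uses more than one independent variable, whereas simple linear regression uses exactly one. This lets it account for several factors at once and estimate the effect of each while holding the others constant. It can also fit curved relationships when transformed predictors, such as squared terms, are included, although the model remains linear in its coefficients. The following are the uses of multiple linear regression.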

- Planning and Control.
- Prediction or Forecasting.

Q6. Explain the concept of multicollinearity in multiple linear regression. How can you detect and
address this issue?

Multicollinearity refers to a situation in multiple linear regression where two or more independent variables are highly correlated with each other. This correlation means that one independent variable can be linearly predicted from the others with a substantial degree of accuracy. 

Detecting Multicollinearity

1. Correlation Matrix: A correlation matrix shows the correlation coefficients between pairs of variables. High correlation coefficients (close to 1 or -1) between independent variables indicate multicollinearity.

2. Variance Inflation Factor (VIF): VIF quantifies how much the variance of an estimated regression coefficient is inflated due to multicollinearity. VIF values above 10 are often taken as indicating significant multicollinearity, though some researchers use a threshold of 5.
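The VIF of a predictor can be computed by regressing it on the remaining predictors and taking 1 / (1 - R²). The sketch below does this with NumPy only; the helper name `vif` and the synthetic columns `a`, `b`, `c` are assumptions for illustration, and statistics packages offer their own implementations.

```python
import numpy as np

def vif(X):
    """VIF of each column of X (shape: n_samples x n_features, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors: b is nearly a copy of a, c is unrelated.
rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = a + rng.normal(scale=0.1, size=300)
c = rng.normal(size=300)

vifs = vif(np.column_stack([a, b, c]))
print("VIF for a, b, c:", np.round(vifs, 1))
```

Here `a` and `b` receive very large VIFs because each is almost fully predictable from the other, while `c` stays close to 1.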

Addressing Multicollinearity

1. Remove Highly Correlated Predictors
If two or more predictors are highly correlated, consider removing one of them from the model.
2. Combine Predictors
Create a new predictor by combining the highly correlated predictors. For instance, you can take their average or use principal component analysis (PCA) to create uncorrelated components.
3. Regularization Techniques
Use regularization techniques like Ridge Regression or Lasso Regression, which can handle multicollinearity by adding a penalty term to the regression.
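As one illustration of the regularization option, the sketch below implements ridge regression in closed form and compares it with an unpenalized fit on two nearly identical predictors. The helper `ridge_fit`, the penalty value `alpha=1.0`, and the synthetic data are assumptions chosen for the example; in practice the penalty is usually tuned, for instance by cross-validation.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data: solve (X'X + alpha*I) w = X'y.

    The intercept is left unpenalized. alpha=0 gives ordinary least squares.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    k = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(k), Xc.T @ yc)
    intercept = y_mean - x_mean @ w
    return intercept, w

# Synthetic data with two almost identical predictors.
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
y = 1.0 + x1 + x2 + rng.normal(scale=0.5, size=200)
X = np.column_stack([x1, x2])

_, w_ols = ridge_fit(X, y, alpha=0.0)
_, w_ridge = ridge_fit(X, y, alpha=1.0)

print("OLS coefficients:  ", w_ols)
print("Ridge coefficients:", w_ridge)
```

The unpenalized coefficients can be large and of opposite sign because the data cannot separate the two predictors, whereas the ridge penalty shrinks them toward a stable split while keeping their sum close to the combined effect.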

Q7. Describe the polynomial regression model. How is it different from linear regression?

Polynomial regression is a type of regression analysis where the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial. It extends linear regression by allowing for non-linear relationships between the predictors and the outcome variable.

A polynomial regression model of degree n is given by:

y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

where, y = dependent variable,
x = independent variable,
b0,b1,b2,... = coefficients

Steps to Implement Polynomial Regression

1. Generate Polynomial Features: Create new features that are powers of the original feature(s).
2. Fit a Linear Model: Fit a linear regression model using the polynomial features.
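These two steps can be sketched with NumPy as follows. The helper name `polynomial_features` and the noise-free quadratic used as data are assumptions for illustration.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return columns [1, x, x^2, ..., x^degree] for a 1-D input x."""
    return np.vander(x, N=degree + 1, increasing=True)

# Noise-free toy data from y = 1 - 2x + 0.5x^2.
x = np.linspace(-2, 2, 30)
y = 1.0 - 2.0 * x + 0.5 * x ** 2

# Step 1: generate polynomial features.
features = polynomial_features(x, degree=2)

# Step 2: fit an ordinary linear model on those features.
coef, *_ = np.linalg.lstsq(features, y, rcond=None)  # [b0, b1, b2]
print("b0, b1, b2 =", np.round(coef, 3))
```

Because the model is linear in b0, b1, and b2, the same least-squares solver used for linear regression recovers the coefficients of the curve.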

The main difference between linear regression and polynomial regression is the shape of the relationship assumed between the dependent variable and the independent variable.
In linear regression, the relationship is assumed to be a straight line, while in polynomial regression it can be curved. The model is still linear in its coefficients, which is why it can be fitted with ordinary linear regression on the polynomial features. Polynomial regression allows for more flexibility in modeling the relationship between the variables and can provide a better fit to the data when the relationship is non-linear.

Q8. What are the advantages and disadvantages of polynomial regression compared to linear
regression? In what situations would you prefer to use polynomial regression?

Advantages of Polynomial regression:
    
- Flexibility: Polynomial regression can model non-linear relationships between the independent and dependent variables, providing more flexibility than linear regression. It can fit a wider variety of curves, capturing more complex patterns in the data.
- Better Fit for Non-Linear Data: When the true relationship between variables is non-linear, polynomial regression can provide a better fit and more accurate predictions than linear regression.
- Captures Changing Effects: Polynomial regression can capture curvature that linear regression misses, such as an effect of a predictor on the response variable that strengthens, weakens, or reverses at different levels of the predictor.

Disadvantages of Polynomial regression:

- Overfitting: With higher-degree polynomials, the model can become overly complex and start to fit the noise in the data rather than the underlying trend, leading to overfitting. Overfitting reduces the model's generalizability to new data.
- Extrapolation Issues: Polynomial models can produce extreme and unreliable predictions outside the range of the observed data (extrapolation), particularly for high-degree polynomials.
- Multicollinearity: Polynomial features can be highly correlated with each other, leading to multicollinearity, which can make the coefficient estimates unstable and difficult to interpret.
- Computationally Intensive: As the degree of the polynomial increases, the computational complexity increases, which can be a concern for very large datasets or very high-degree polynomials.

Situations to Prefer Polynomial Regression

- Use polynomial regression when there is a known or suspected non-linear relationship between the independent and dependent variables, and this relationship can be well-approximated by a polynomial function.
- When a simple linear model is inadequate, but the relationship is not overly complex, polynomial regression of a reasonable degree can provide a good balance between fit and complexity.
- Use it when you have enough data points to support fitting a polynomial model without overfitting. A larger number of data points can justify higher-degree polynomials.
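The trade-off between fit and complexity can be explored by comparing training and test error across degrees. The sketch below is one way to do that, assuming a sine-shaped true relationship, a noise level of 0.2, and the degrees 1, 3, and 12; all of these are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def true_curve(x):
    # Assumed non-linear relationship for this illustration.
    return np.sin(3 * x)

# Small training set and a larger test set, both with added noise.
x_train = np.linspace(-1, 1, 20)
y_train = true_curve(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(-1, 1, 200)
y_test = true_curve(x_test) + rng.normal(scale=0.2, size=x_test.size)

def mse(coefs, x, y):
    """Mean squared error of the polynomial with coefficients `coefs`."""
    return np.mean((np.polyval(coefs, x) - y) ** 2)

results = {}
for degree in (1, 3, 12):
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

Training error can only go down as the degree increases, so it is the test error that indicates whether the extra flexibility is capturing the trend or the noise.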