Q1. Difference Between Simple Linear Regression and Multiple Linear Regression
Simple Linear Regression:

Definition: A statistical method to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered the independent variable (predictor), and the other is the dependent variable (response).

Equation: $y = b_0 + b_1 x$

Example: Predicting a person's height (y) based on their shoe size (x).

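from sklearn.linear_model import LinearRegression
import numpy as np

# Example data: shoe sizes (x) and heights (y)
# scikit-learn expects a 2-D feature array, hence reshape(-1, 1)
x = np.array([5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([150, 160, 170, 180, 190])

# Simple linear regression
model = LinearRegression()
model.fit(x, y)

# Fitted slope (b1) and intercept (b0); this data lies exactly on y = 100 + 10x
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)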
Multiple Linear Regression:

Definition: An extension of simple linear regression that uses multiple predictors to model the relationship with the dependent variable.

Equation: $y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$

Example: Predicting a person's weight (y) based on their height (x1) and age (x2).

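# Example data: each row is [height, age]. The ages are chosen so that the two
# columns are not perfectly collinear; otherwise the coefficients are not unique.
x = np.array([[5, 20], [6, 31], [7, 27], [8, 42], [9, 36]])
y = np.array([55, 60, 65, 70, 75])

# Multiple linear regression
model = LinearRegression()
model.fit(x, y)

# One coefficient per predictor, plus the intercept
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)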

Q2. Assumptions of Linear Regression
Linearity: The relationship between the predictors and the response is linear.
Independence: Observations are independent of each other.
Homoscedasticity: The residuals have constant variance at every level of the predictor.
Normality: The residuals of the model are normally distributed.
No multicollinearity: Predictors are not highly correlated with each other.

Checking Assumptions:
Linearity: Scatter plots and residual plots.
Independence: Durbin-Watson test (checks for autocorrelation in the residuals).
Homoscedasticity: Residual vs. fitted values plot.
Normality: Q-Q plot and Shapiro-Wilk test.
Multicollinearity: Variance Inflation Factor (VIF).
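
As a rough illustration of the independence check, the sketch below fits a line with NumPy, computes the residuals, and evaluates the Durbin-Watson statistic by hand. The data and variable names are made up for this example. Values near 2 suggest little autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation.

```python
import numpy as np

# Hypothetical data: a noisy linear trend (seeded for reproducibility)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

# Fit y = b0 + b1 * x by least squares; polyfit returns the highest degree first
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the sum of squared residuals (always between 0 and 4)
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# With an intercept in the model, least-squares residuals average to zero
print("Durbin-Watson:", dw)
print("Mean residual:", residuals.mean())
```

The same residuals array could also feed a residual vs. fitted values plot or a Q-Q plot for the other checks.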

Q3. Interpreting the Slope and Intercept in a Linear Regression Model
Intercept (b0): The expected value of the dependent variable when all predictors are zero. It is the point where the fitted line crosses the y-axis, and it is only meaningful when zero is a plausible value for the predictors.
Slope (b1): The change in the dependent variable for a one-unit change in the predictor variable, holding all other predictors constant.
Example:
Predicting a person's salary based on their years of experience.

Intercept: The expected salary with 0 years of experience.
Slope: The increase in salary for each additional year of experience.
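
The salary example can be sketched in a few lines. The numbers below are invented so that the answer is known in advance: a base salary of 30,000 and a raise of 5,000 per year of experience. `np.polyfit` is used here only as a convenient least-squares fitter.

```python
import numpy as np

# Hypothetical data: salary rises by exactly 5,000 per year from a base of 30,000
years = np.array([0, 1, 2, 3, 4, 5], dtype=float)
salary = 30000.0 + 5000.0 * years

# polyfit returns coefficients from the highest degree down: [slope, intercept]
slope, intercept = np.polyfit(years, salary, deg=1)

print(f"Intercept (expected salary at 0 years): {intercept:.0f}")
print(f"Slope (salary increase per extra year): {slope:.0f}")
```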

Q4. Concept of Gradient Descent in Machine Learning
Gradient Descent: An iterative optimization algorithm used to minimize the cost function of a machine learning model, for example the mean squared error in linear regression.

Process:

Initialize the parameters (weights and biases) randomly.
Calculate the gradient of the cost function with respect to each parameter.
Update the parameters by subtracting the gradient scaled by a learning rate.
Repeat until convergence (i.e., the change in the cost function is below a certain threshold).
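
The steps above can be sketched for simple linear regression with a mean squared error cost. This is a minimal illustration, not a tuned implementation: the data, the learning rate, and the stopping tolerance are all assumed values, and the parameters start at zero rather than at random values to keep the run reproducible.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

b0, b1 = 0.0, 0.0          # start from zeros (random values also work)
learning_rate = 0.05       # assumed step size
tolerance = 1e-12          # stop when the cost changes by less than this
prev_cost = np.inf

for step in range(100_000):
    error = (b0 + b1 * x) - y
    cost = np.mean(error ** 2)          # mean squared error
    if abs(prev_cost - cost) < tolerance:
        break
    prev_cost = cost
    # Gradients of the MSE with respect to b0 and b1
    grad_b0 = 2.0 * np.mean(error)
    grad_b1 = 2.0 * np.mean(error * x)
    # Move each parameter against its gradient, scaled by the learning rate
    b0 -= learning_rate * grad_b0
    b1 -= learning_rate * grad_b1

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, steps = {step}")
```

A learning rate that is too large makes the updates diverge, while one that is too small makes convergence slow, so in practice it usually needs some tuning.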

Q5. Multiple Linear Regression Model
Definition: A regression model that uses multiple predictors to model the relationship with the dependent variable.

Equation: $y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$

Differences from Simple Linear Regression:

Uses multiple predictors.
Accounts for the combined effect of several variables on the response variable.
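
To make the equation concrete, the sketch below recovers b0, b1, and b2 from data generated by a known model, using NumPy's least-squares solver. The data and the true coefficients (1, 2, 3) are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0],
              [5.0, 4.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem; coeffs holds [b0, b1, b2]
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coeffs
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Each coefficient is the change in y for a one-unit change in its predictor with the other predictor held fixed.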

Q6. Multicollinearity in Multiple Linear Regression
Definition: A situation where two or more predictors in the model are highly correlated, making it difficult to isolate the individual effect of each predictor.

Detection:

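Variance Inflation Factor (VIF): Measures how much the variance of a coefficient is inflated by correlation with the other predictors. A common rule of thumb treats VIF > 10 as a sign of high multicollinearity.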
from statsmodels.stats.outliers_influence import variance_inflation_factor

# X is assumed to be a pandas DataFrame with one column per predictor
vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
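
For intuition about what the VIF measures, here is a from-scratch sketch using only NumPy. It regresses each predictor on the others (with an intercept) and applies VIF_j = 1 / (1 - R_j^2). The helper `vif_scores` and the data are hypothetical, and the values may differ somewhat from a library routine depending on how the intercept is handled.

```python
import numpy as np

def vif_scores(X):
    """Return the VIF of each column of X, where VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = []
    for j in range(n_features):
        target = X[:, j]
        # Regress column j on the remaining columns plus an intercept
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coeffs
        r_squared = 1.0 - residuals.var() / target.var()
        scores.append(1.0 / (1.0 - r_squared))
    return np.array(scores)

# Hypothetical predictors: x2 is almost a copy of x1, x3 is unrelated noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
vifs = vif_scores(np.column_stack([x1, x2, x3]))
print(vifs)
```

The two nearly identical columns should receive large scores, while the unrelated column should stay close to 1.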
Addressing Multicollinearity:

Remove highly correlated predictors.
Use dimensionality reduction techniques like PCA.
Use regularization methods like Ridge Regression, which shrink the coefficients and stabilize the estimates.
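
The effect of ridge regularization on correlated predictors can be sketched with its closed-form solution. The helper `ridge_fit`, the data, and the penalty strength `alpha=1.0` are assumptions for illustration. Setting `alpha=0.0` reduces the same formula to ordinary least squares.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data; returns (intercept, coefficients)."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) b = Xc^T yc; the intercept is not penalized
    coefs = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    return y_mean - X_mean @ coefs, coefs

# Hypothetical data: two nearly identical predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

_, coefs_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 is ordinary least squares
_, coefs_ridge = ridge_fit(X, y, alpha=1.0)   # alpha = 1.0 is an arbitrary choice
print("OLS coefficients:  ", coefs_ols)
print("Ridge coefficients:", coefs_ridge)
```

With nearly duplicated predictors the unpenalized coefficients tend to be unstable, while the ridge coefficients are smaller in norm and spread the effect across both columns.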

Q7. Polynomial Regression Model
Definition: A type of regression model that fits a polynomial equation to the data. It can model non-linear relationships between the predictors and the response variable.

Equation: $y = b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n$

Differences from Linear Regression:

Polynomial regression can capture non-linear relationships.
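Linear regression fits a straight line, while polynomial regression fits a curve. The model is still linear in its coefficients, so it can be fitted with ordinary least squares on the polynomial terms.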
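
The difference can be seen by fitting both a line and a quadratic to the same curved data and comparing their errors. The sketch below uses `np.polyfit` on made-up data that follows y = x^2 exactly. The helper `fit_mse` is introduced only for this example.

```python
import numpy as np

# Hypothetical data that follows y = x^2 exactly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 2

def fit_mse(degree):
    """Fit a polynomial of the given degree and return its mean squared error."""
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

mse_line = fit_mse(1)    # straight line: cannot follow the curvature
mse_curve = fit_mse(2)   # quadratic: matches the data up to rounding error
print(f"Degree 1 MSE: {mse_line:.4f}")
print(f"Degree 2 MSE: {mse_curve:.4f}")
```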

Q8. Advantages and Disadvantages of Polynomial Regression
Advantages:

Can model non-linear relationships.
More flexible than linear regression.
Disadvantages:

Prone to overfitting, especially with high-degree polynomials.
Computationally more expensive, since the number of polynomial terms grows quickly with the degree and the number of predictors.
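
The overfitting risk can be illustrated by fitting polynomials of increasing degree to noisy data that is really linear, then comparing the error on the training points with the error on held-out points. The data, the noise level, and the degrees tried are all assumptions for this sketch.

```python
import numpy as np

# Hypothetical data: a linear trend plus noise, split into train and test sets
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 12)
x_test = np.linspace(0.04, 0.96, 12)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)
y_test = 2.0 * x_test + rng.normal(scale=0.2, size=x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

train_mse, test_mse = {}, {}
for degree in (1, 3, 8):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

The training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The test error has no such guarantee, and a widening gap between the two is the usual sign of overfitting.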
When to Use:

When the relationship between the predictors and the response variable is non-linear.
When the data shows curvature that cannot be captured by a linear model.
Example of Polynomial Regression in Python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25])

# Polynomial features
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# Polynomial regression
model = LinearRegression()
model.fit(x_poly, y)