### Simple Linear Regression
Simple Linear Regression is a statistical method that models the relationship between a dependent variable (Y) and a single independent variable (X) by fitting a straight line.

Equation:
Y = b0 + b1*X + ε

where:
- b0 = intercept
- b1 = slope
- ε = error term
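For a single predictor, the least-squares estimates of b0 and b1 have a closed form: b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) and b0 = y_mean - b1 * x_mean. The minimal NumPy sketch below computes them directly, reusing the five data points from the scikit-learn example in this notebook. The helper name `fit_simple_ols` is hypothetical.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y = b0 + b1*x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 6.1, 7.9, 10.2]
b0, b1 = fit_simple_ols(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 0.18, b1 = 1.98
```

These values should agree with `model.intercept_` and `model.coef_[0]` from `LinearRegression` on the same data.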


### Key assumptions of Simple Linear Regression
1. Linearity – The relationship between X and Y is linear.
2. Independence – Observations are independent of each other.
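3. Homoscedasticity – The errors have constant variance across all values of X.
4. Normality of errors – The errors are normally distributed (needed mainly for valid t-tests and confidence intervals, especially in small samples).
5. No multicollinearity – Not applicable with a single predictor; it becomes relevant in multiple regression.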
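Some of these assumptions can be checked numerically from the residuals. The sketch below uses simulated data that satisfies the assumptions by construction. It computes the Durbin-Watson statistic (values near 2 suggest no first-order autocorrelation, i.e. independence) and the Jarque-Bera statistic (large values suggest non-normal errors; the 5% critical value of its chi-square reference distribution with 2 degrees of freedom is about 5.99). Both are hand-rolled in NumPy here for illustration. statsmodels provides library versions as `durbin_watson` and `jarque_bera` in `statsmodels.stats.stattools`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data meeting the assumptions: linear mean, i.i.d. normal errors
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=200)

# Fit y = b0 + b1*x by least squares and compute residuals
b1, b0 = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]
residuals = y - (b0 + b1 * x)


def durbin_watson(e):
    """Sum of squared successive differences divided by the residual sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


def jarque_bera(e):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), from sample skewness S and kurtosis K."""
    n = e.size
    c = e - e.mean()
    m2 = np.mean(c ** 2)
    skew = np.mean(c ** 3) / m2 ** 1.5
    kurt = np.mean(c ** 4) / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


dw = durbin_watson(residuals)
jb = jarque_bera(residuals)
print(f"Durbin-Watson: {dw:.2f} (about 2 suggests no autocorrelation)")
print(f"Jarque-Bera:   {jb:.2f} (above ~5.99 rejects normality at 5%)")
```

Linearity is usually judged visually from a residual plot, and homoscedasticity has its own tests.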


### Heteroscedasticity
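Heteroscedasticity occurs when the variance of the errors is not constant but changes with the value of X (for example, residuals that fan out as X grows).

Impact: The OLS coefficient estimates remain unbiased but become inefficient. The usual standard errors are also biased, so t-tests, F-tests, and confidence intervals can be misleading.

Importance: Detecting and correcting it helps ensure accurate prediction intervals and valid significance tests.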
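One common way to detect heteroscedasticity is the Breusch-Pagan test. It regresses the squared residuals on X. If X explains a meaningful share of their variation, the residual variance depends on X. The minimal NumPy sketch below implements Koenker's studentized form: LM = n * R^2 from that auxiliary regression, compared with a chi-square distribution with 1 degree of freedom, whose 5% critical value is about 3.84. The data is simulated, with a noise standard deviation that is assumed to grow with X. statsmodels offers a library version as `het_breuschpagan` in `statsmodels.stats.diagnostic`.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 500
x = rng.uniform(1, 10, size=n)
# Heteroscedastic by construction: the noise standard deviation is 0.5 * x
y = 2.0 + 1.5 * x + rng.normal(0, 0.5 * x)


def r_squared(design, target):
    """R^2 of an OLS fit of target on design (design includes a constant column)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)


design = np.column_stack([np.ones(n), x])

# Step 1: fit the main regression and collect residuals
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef

# Step 2: auxiliary regression of squared residuals on the same regressors
lm_stat = n * r_squared(design, residuals ** 2)

CRITICAL_5PCT_CHI2_1DF = 3.841
print(f"Breusch-Pagan LM = {lm_stat:.1f}")
print("Heteroscedasticity detected" if lm_stat > CRITICAL_5PCT_CHI2_1DF
      else "No evidence of heteroscedasticity")
```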


### Multiple Linear Regression
Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables.

Equation:
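Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn + ε

where:
- b0 = intercept
- b1, ..., bn = coefficients of X1, ..., Xn (each is the expected change in Y per unit change in that predictor, holding the others fixed)
- ε = error term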
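The coefficients b0 through bn are usually estimated by ordinary least squares. OLS solves the normal equations (X^T X) b = X^T y after a column of ones is prepended to X for the intercept. The minimal NumPy sketch below uses made-up, noise-free data generated from known coefficients (1, 2, -3), so the fit should recover them up to rounding.

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 1 + 2*X1 - 3*X2
X1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
Y = 1 + 2 * X1 - 3 * X2

# Design matrix: a column of ones for the intercept b0, then one column per predictor
design = np.column_stack([np.ones_like(X1), X1, X2])

# lstsq solves the least-squares problem without explicitly inverting X^T X,
# which is numerically safer than np.linalg.inv(design.T @ design) @ design.T @ Y
b, *_ = np.linalg.lstsq(design, Y, rcond=None)
print("b0, b1, b2 =", np.round(b, 6))
```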


### Polynomial Regression
Polynomial Regression is an extension of linear regression in which the relationship between the independent and dependent variables is modeled as an nth-degree polynomial in X.

Equation:
Y = b0 + b1*X + b2*X^2 + ... + bn*X^n + ε

- Linear regression fits a straight line.
- Polynomial regression fits a curved line to capture non-linear trends.
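The fitted curve is non-linear in X, but the model is still linear in the coefficients b0 through bn. It can therefore be fit with the same least-squares method by treating X, X^2, up to X^n as separate predictors. The minimal NumPy sketch below uses made-up data from a known quadratic. It builds those power columns explicitly with `np.vander` and cross-checks the result against `np.polyfit`.

```python
import numpy as np

# Hypothetical data from the quadratic Y = 1 + 0.5*X + 2*X^2 (no noise)
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
Y = 1 + 0.5 * X + 2 * X ** 2

degree = 2
# Columns [1, X, X^2]: each power of X becomes its own "feature"
design = np.vander(X, N=degree + 1, increasing=True)

# Ordinary least squares on the expanded features
b, *_ = np.linalg.lstsq(design, Y, rcond=None)
print("b0, b1, b2 =", np.round(b, 6))

# np.polyfit fits the same model but returns coefficients highest power first
print("polyfit    =", np.round(np.polyfit(X, Y, degree)[::-1], 6))
```

scikit-learn's `PolynomialFeatures` followed by `LinearRegression` follows the same idea: expand the features, then fit an ordinary linear model.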


```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data: scikit-learn expects X as a 2-D array of shape (n_samples, n_features)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2.1, 4.3, 6.1, 7.9, 10.2])

# Fit Y = b0 + b1*X by ordinary least squares
model = LinearRegression()
model.fit(X, Y)

y_pred = model.predict(X)

# Observed points in blue, fitted line in red
plt.scatter(X, Y, color='blue')
plt.plot(X, y_pred, color='red')
plt.title("Simple Linear Regression")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

print("Intercept:", model.intercept_)  # b0
print("Slope:", model.coef_[0])        # b1
```


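```python
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor

Area = [1200, 1500, 1800, 2000]
Rooms = [2, 3, 3, 4]
Price = [250000, 300000, 320000, 370000]

df = pd.DataFrame({'Area': Area, 'Rooms': Rooms, 'Price': Price})

X = df[['Area', 'Rooms']]
y = df['Price']

# Fit Price = b0 + b1*Area + b2*Rooms
model = LinearRegression()
model.fit(X, y)
print("Intercept:", model.intercept_)
print("Coefficients:", dict(zip(X.columns, model.coef_)))

# VIF calculation: variance_inflation_factor does not add an intercept itself,
# so a constant column is added first; without it the VIFs are inflated.
X_const = sm.add_constant(X)
vif_data = pd.DataFrame()
vif_data["Feature"] = X.columns
# Index i + 1 skips the 'const' column at position 0
vif_data["VIF"] = [variance_inflation_factor(X_const.values, i + 1) for i in range(X.shape[1])]

print("VIF Results:\n", vif_data)
```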
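The VIF for predictor j is defined as 1 / (1 - R^2_j), where R^2_j comes from regressing that predictor on all the other predictors (with an intercept). The NumPy-only sketch below implements that definition and applies it to the same Area/Rooms data. The helper name `vif_by_hand` is hypothetical. With only two predictors, both VIFs equal 1 / (1 - r^2), where r is the correlation between Area and Rooms.

```python
import numpy as np


def vif_by_hand(X):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        # Regressors: intercept column plus every predictor except column j
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    return np.array(vifs)


# Columns: Area, Rooms (same values as the house-price example)
X = np.array([[1200, 2], [1500, 3], [1800, 3], [2000, 4]])
print("VIF (Area, Rooms):", np.round(vif_by_hand(X), 3))
```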


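```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2.2, 4.8, 7.5, 11.2, 14.7])

# Expand X into the columns [1, X, X^2] so a linear model can fit a quadratic
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, Y)

# Evaluate the fitted curve on a fine grid so it plots as a smooth curve
X_grid = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_grid = model.predict(poly.transform(X_grid))

plt.scatter(X, Y, color='blue')
plt.plot(X_grid, y_grid, color='red')
plt.title("Polynomial Regression (Degree 2)")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
```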


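```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

X = np.array([10, 20, 30, 40, 50]).reshape(-1, 1)
Y = np.array([15, 35, 40, 50, 65])

model = LinearRegression()
model.fit(X, Y)
y_pred = model.predict(X)

# Residual = observed - predicted. A random scatter around 0 with no funnel
# shape is consistent with linearity and homoscedasticity.
residuals = Y - y_pred

plt.scatter(X, residuals, color='purple')
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel("X")
plt.ylabel("Residuals")
plt.title("Residuals Plot")
plt.show()
```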


### Handling heteroscedasticity & multicollinearity
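1. Addressing heteroscedasticity:
   - Use weighted least squares (WLS) regression, giving less weight to observations with higher error variance.
   - Transform the dependent variable (e.g. log or square root).
   - Investigate outliers, and remove them only if they are genuine errors.

2. Addressing multicollinearity:
   - Check VIF values; a common rule of thumb flags variables with VIF > 10 for removal.
   - Combine correlated variables (e.g. into a single index).
   - Use regularization methods such as Ridge or Lasso regression.

3. Ensuring model robustness:
   - Cross-validation.
   - Residual analysis.
   - Feature scaling (especially before Ridge or Lasso, whose penalties depend on the scale of each feature).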
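Two of the remedies above, WLS and Ridge, can be sketched in a few lines of NumPy. These are closed-form textbook versions for illustration, not replacements for library implementations such as statsmodels `WLS` or scikit-learn `Ridge`.

- Weighted least squares down-weights noisier observations. Here the weights 1/x^2 rest on a hypothetical assumption that the error standard deviation is proportional to x.
- Ridge regression adds a penalty lam * sum(b_j^2) that shrinks the coefficients, which stabilizes the estimates when predictors are highly correlated. The predictors are standardized first so the penalty treats them equally, and lam = 1.0 is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Weighted least squares -------------------------------------------------
n = 100
x = rng.uniform(1, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5 * x)  # noise sd grows with x (assumed)
design = np.column_stack([np.ones(n), x])
w = 1.0 / x ** 2  # assumed weights: inverse of the relative error variance

# WLS is equivalent to OLS after scaling each row by sqrt(weight)
sw = np.sqrt(w)
b_wls, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
print("WLS coefficients (b0, b1):", np.round(b_wls, 3))

# --- Ridge regression on two highly correlated predictors -------------------
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear with x1
target = 3.0 * x1 + rng.normal(size=n)

# Standardize predictors and center the target so the intercept is not penalized
Z = np.column_stack([x1, x2])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
t = target - target.mean()

lam = 1.0  # illustrative penalty strength
b_ols = np.linalg.solve(Z.T @ Z, Z.T @ t)
b_ridge = np.linalg.solve(Z.T @ Z + lam * np.eye(2), Z.T @ t)
print("OLS coefficients:  ", np.round(b_ols, 3))
print("Ridge coefficients:", np.round(b_ridge, 3))
```

With nearly collinear predictors, the OLS coefficients can split the effect between x1 and x2 erratically. The ridge penalty pulls them toward smaller, more stable values.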
