
# Regression Assignment Solutions

---

## Question 1: What is Simple Linear Regression?
**Answer:**  
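Simple Linear Regression is a statistical method that models the relationship between one independent variable (X) and one dependent variable (Y) with a straight-line equation:

Y = β0 + β1X + ε

Where:
- β0 = Intercept (the expected value of Y when X = 0)
- β1 = Slope (the expected change in Y for a one-unit increase in X)
- ε = Error term (the variation in Y that the line does not explain)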
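
For a single predictor, the least-squares estimates have a closed form: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept then follows from the sample means. A minimal sketch, assuming a small illustrative data set (the values and variable names are chosen for this example only):

```python
import numpy as np

# Illustrative data (an assumption for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.1, 7.9, 10.2])

# Slope: sum of cross-deviations divided by sum of squared x-deviations
x_dev = x - x.mean()
y_dev = y - y.mean()
slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)

# Intercept: forces the fitted line through the point of means
intercept = y.mean() - slope * x.mean()

print(f"β0 (intercept) = {intercept:.2f}")
print(f"β1 (slope)     = {slope:.2f}")
```

For this data the estimates come out at roughly β0 = 0.18 and β1 = 1.98.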

---

## Question 2: What are the key assumptions of Simple Linear Regression?
**Answer:**
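1. Linearity: the expected value of Y is a linear function of X.
2. Independence of errors: the error terms are uncorrelated with one another.
3. Homoscedasticity: the errors have constant variance at every level of X.
4. Normality of residuals: the errors are approximately normally distributed, which matters mainly for hypothesis tests and confidence intervals.
5. No multicollinearity: this applies only to multiple regression, where the independent variables should not be highly correlated with one another; it cannot arise with a single predictor.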
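
As one way to probe the independence assumption, the Durbin-Watson statistic compares successive residuals: values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. A minimal sketch, assuming hypothetical residual sequences that were picked to show the range of the statistic rather than taken from a fitted model:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    residuals = np.asarray(residuals, dtype=float)
    diffs = np.diff(residuals)
    return np.sum(diffs ** 2) / np.sum(residuals ** 2)

# Hypothetical residual sequences (assumptions for illustration only)
persistent = np.array([1.0, 1.0, 1.0, 1.0])      # each residual repeats the last
alternating = np.array([1.0, -1.0, 1.0, -1.0])   # each residual flips sign

print(durbin_watson(persistent))    # 0.0
print(durbin_watson(alternating))   # 3.0
```

The other assumptions are usually checked graphically, for example with a residual plot for linearity and homoscedasticity.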

---

## Question 3: What is heteroscedasticity?
**Answer:**  
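Heteroscedasticity occurs when the variance of the residuals is not constant across all levels of the independent variable.  
It matters because the ordinary least-squares coefficient estimates, although still unbiased, are no longer efficient, and the usual standard errors are biased, so hypothesis tests and confidence intervals can be misleading.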
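
The effect is easy to see on simulated data. The sketch below assumes a made-up data-generating process in which the noise standard deviation grows in proportion to X, fits an ordinary least-squares line, and compares the residual variance in the lower and upper halves of X. This is an informal comparison, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for this sketch): noise s.d. grows with x
x = np.linspace(1, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)

# Ordinary least-squares line and its residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Compare residual variance in the lower and upper halves of x
half = len(x) // 2
var_low = residuals[:half].var()
var_high = residuals[half:].var()

print(f"Residual variance, low x:  {var_low:.2f}")
print(f"Residual variance, high x: {var_high:.2f}")
```

A clearly larger variance at high X is the numeric counterpart of the funnel shape seen in a residual plot.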

---

## Question 4: What is Multiple Linear Regression?
**Answer:**  
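Multiple Linear Regression models the relationship between one dependent variable and two or more independent variables.

Y = β0 + β1X1 + β2X2 + ... + βnXn + ε

Each coefficient βi is the expected change in Y for a one-unit increase in Xi, holding the other independent variables fixed.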
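
In matrix form, the coefficients are found by least squares on a design matrix whose first column is all ones (for β0) and whose remaining columns are the predictors. A minimal sketch, assuming hypothetical noise-free data generated from known coefficients so that the recovered values can be checked by eye:

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for β0, then one column per predictor
design = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution for [β0, β1, β2]
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print("Estimated [β0, β1, β2]:", np.round(beta, 6))
```

Because the data contain no noise, the estimates match the generating coefficients 1, 2, and 3 up to floating-point error.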

---

## Question 5: What is Polynomial Regression?
**Answer:**  
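Polynomial Regression models nonlinear relationships by adding polynomial terms (X², X³, etc.) of the independent variable as extra predictors.  
Unlike simple linear regression, it can fit curved relationships, yet it remains linear in the coefficients and is therefore still estimated by ordinary least squares.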
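
To make the difference concrete, the sketch below fits a straight line and a degree-2 polynomial to the same curved data and compares the sum of squared errors. The data are hypothetical and noise-free, generated from a known quadratic.

```python
import numpy as np

# Hypothetical noise-free curved data: Y = 1 + 2X + 0.5X²
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2

def sum_squared_error(degree):
    """Fit a polynomial of the given degree and return its SSE on the data."""
    coeffs = np.polyfit(x, y, deg=degree)
    fitted = np.polyval(coeffs, x)
    return np.sum((y - fitted) ** 2)

sse_line = sum_squared_error(1)
sse_quadratic = sum_squared_error(2)

print(f"SSE, straight line: {sse_line:.4f}")
print(f"SSE, degree 2:      {sse_quadratic:.4f}")
```

The straight line leaves a systematic error, while the degree-2 fit reproduces the curve essentially exactly.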

---


## Question 6: Simple Linear Regression Implementation

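```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature array, hence the reshape to one column
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2.1, 4.3, 6.1, 7.9, 10.2])

# Fit the line by ordinary least squares and predict on the training inputs
model = LinearRegression()
model.fit(X, Y)
predictions = model.predict(X)

# Observed points together with the fitted line
plt.figure()
plt.scatter(X, Y, label="Observed")
plt.plot(X, predictions, label="Fitted line")
plt.title("Simple Linear Regression")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()

print("Intercept:", model.intercept_)
print("Slope:", model.coef_[0])
```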


## Question 7: Multiple Linear Regression & VIF

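```python
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor

data = pd.DataFrame({
    'Area': [1200, 1500, 1800, 2000],
    'Rooms': [2, 3, 3, 4],
    'Price': [250000, 300000, 320000, 370000]
})

X = data[['Area', 'Rooms']]
y = data['Price']

# Fit the multiple regression and report its coefficients
model = LinearRegression()
model.fit(X, y)
print("Intercept:", model.intercept_)
print("Coefficients:", dict(zip(X.columns, model.coef_)))

# variance_inflation_factor does not add an intercept itself, so include a
# constant column; without it the reported VIFs are uncentered and misleading
X_const = sm.add_constant(X)

# Column 0 is the constant, so start at 1 to report only the real features
vif_data = pd.DataFrame()
vif_data["Feature"] = X.columns
vif_data["VIF"] = [
    variance_inflation_factor(X_const.values, i)
    for i in range(1, X_const.shape[1])
]

print(vif_data)
```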


## Question 8: Polynomial Regression

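```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2.2, 4.8, 7.5, 11.2, 14.7])

# Expand X into [X, X²]; the bias column is omitted because
# LinearRegression already fits its own intercept
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# An ordinary linear model on the expanded features gives a quadratic fit
model = LinearRegression()
model.fit(X_poly, Y)

# Predict on a dense grid so the fitted curve is drawn smoothly
X_range = np.linspace(1, 5, 100).reshape(-1, 1)
X_range_poly = poly.transform(X_range)
Y_pred = model.predict(X_range_poly)

plt.figure()
plt.scatter(X, Y, label="Observed")
plt.plot(X_range, Y_pred, label="Degree-2 fit")
plt.title("Polynomial Regression (Degree 2)")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()
```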


## Question 9: Residual Plot & Heteroscedasticity Check

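```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

X = np.array([10, 20, 30, 40, 50]).reshape(-1, 1)
Y = np.array([15, 35, 40, 50, 65])

model = LinearRegression()
model.fit(X, Y)
predictions = model.predict(X)

# Residual = observed value minus fitted value
residuals = Y - predictions

# Residuals against fitted values: a roughly even band around zero is
# consistent with constant variance, while a funnel shape (spread that grows
# or shrinks with the fitted value) suggests heteroscedasticity
plt.figure()
plt.scatter(predictions, residuals)
plt.axhline(y=0, linestyle="--")
plt.title("Residual Plot")
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.show()
```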



## Question 10: Handling Heteroscedasticity & Multicollinearity

**Answer:**

Steps to address heteroscedasticity:
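1. Apply a log or Box-Cox transformation to the dependent variable (both require positive values)
2. Use Weighted Least Squares, giving less weight to observations with higher error variance
3. Use robust (heteroscedasticity-consistent) standard errors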
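
Weighted Least Squares can be sketched with NumPy alone: multiplying each row of the design matrix and the response by the square root of its weight turns the weighted problem into an ordinary one. The example assumes simulated data whose noise standard deviation is proportional to X, so the weights 1/X² are the inverse error variances up to a constant. The helper name `weighted_least_squares` is hypothetical.

```python
import numpy as np

def weighted_least_squares(design, y, weights):
    """Minimize the weighted sum of squared residuals by rescaling
    each row with the square root of its weight."""
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return beta

rng = np.random.default_rng(0)

# Simulated data (an assumption): noise s.d. proportional to x
x = np.linspace(1, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)
design = np.column_stack([np.ones_like(x), x])

# Weights are inverse error variances, assumed known up to a constant
weights = 1.0 / x ** 2

beta_ols = weighted_least_squares(design, y, np.ones_like(x))  # equal weights
beta_wls = weighted_least_squares(design, y, weights)

print("OLS [β0, β1]:", np.round(beta_ols, 3))
print("WLS [β0, β1]:", np.round(beta_wls, 3))
```

In practice the error variances are rarely known and have to be estimated, for example from the residuals of an initial unweighted fit.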

Steps to address multicollinearity:
1. Check VIF values (values above roughly 5 to 10 are a common rule-of-thumb warning sign)
2. Remove highly correlated features
3. Use Ridge or Lasso Regression
4. Apply PCA (Principal Component Analysis)
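
To illustrate the Ridge option, the sketch below uses the closed-form ridge solution on centered data, with simulated predictors in which the second is almost a copy of the first. The helper name `ridge_coefficients`, the data, and the penalty value are assumptions for this example; in practice the features are usually standardized first and the penalty is chosen by cross-validation.

```python
import numpy as np

def ridge_coefficients(features, y, penalty):
    """Closed-form ridge solution (XᵀX + λI)⁻¹ Xᵀy on centered data,
    so the intercept is left unpenalized."""
    x_centered = features - features.mean(axis=0)
    y_centered = y - y.mean()
    gram = x_centered.T @ x_centered
    identity = np.eye(features.shape[1])
    return np.linalg.solve(gram + penalty * identity, x_centered.T @ y_centered)

rng = np.random.default_rng(0)

# Simulated data (an assumption): x2 is almost a copy of x1
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
features = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=50)

coef_ols = ridge_coefficients(features, y, penalty=0.0)    # no shrinkage
coef_ridge = ridge_coefficients(features, y, penalty=1.0)  # mild shrinkage

print("OLS coefficients:  ", np.round(coef_ols, 3))
print("Ridge coefficients:", np.round(coef_ridge, 3))
```

With nearly collinear predictors, the unpenalized coefficients tend to be unstable, while the ridge penalty shrinks the coefficient vector and typically splits the effect more evenly across the two columns.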

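Transformations and Weighted Least Squares address the non-constant variance itself, whereas robust standard errors leave the coefficients unchanged and correct only the inference. The multicollinearity remedies stabilize the coefficient estimates, which makes the model easier to interpret and often improves predictions on new data.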
