# Multiple Linear Regression

This notebook fits a **Multiple Linear Regression** model with scikit-learn on the **California Housing dataset**.
We explore how **multiple features** together relate to **median house value**, which the dataset reports in units of $100,000. The notebook covers:

- Theory and assumptions of multiple linear regression
- Data loading and preprocessing
- Model training and evaluation
- Visualization and conclusion
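As a brief refresher on the theory, multiple linear regression predicts $\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$ and chooses the coefficients that minimize the sum of squared residuals. The sketch below is only an illustration on synthetic data (the data, seed, and variable names are assumptions, not part of the housing analysis). It solves the least-squares problem directly with NumPy and checks that `LinearRegression` arrives at the same estimates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the housing data):
# y = 1.5 + 2.0*x1 - 0.5*x2 + 0.8*x3 + small Gaussian noise
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 3))
true_intercept = 1.5
true_coefs = np.array([2.0, -0.5, 0.8])
y_toy = true_intercept + X_toy @ true_coefs + rng.normal(scale=0.1, size=500)

# Least-squares solution, with an explicit column of ones for the intercept
X_design = np.column_stack([np.ones(len(X_toy)), X_toy])
beta_hat, *_ = np.linalg.lstsq(X_design, y_toy, rcond=None)

# scikit-learn solves the same minimization, so the estimates should agree
toy_model = LinearRegression().fit(X_toy, y_toy)
sklearn_beta = np.r_[toy_model.intercept_, toy_model.coef_]

print("lstsq   [intercept, coefs]:", np.round(beta_hat, 3))
print("sklearn [intercept, coefs]:", np.round(sklearn_beta, 3))
```

Both lines should print values close to the true parameters, since the noise is small relative to the signal.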


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


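# Load the dataset and assemble a DataFrame of features plus the target column
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['label'] = data.target  # median house value, in units of $100,000
X = df[data.feature_names]
y = df['label']

# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit ordinary least squares on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out test split
y_pred = model.predict(X_test)

# Evaluate on the test split only
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", round(mse, 4))
print("R-squared (R²) Score:", round(r2, 4))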

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.4)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')  # Reference line
plt.xlabel('Actual Median House Value ($100,000s)')
plt.ylabel('Predicted Median House Value ($100,000s)')
plt.title('Multiple Linear Regression: Actual vs Predicted')
plt.grid(True)

plt.show()


![Multiple Linear Regression Output](multiple_linear_regression_output.png)

## Conclusion: Multiple Linear Regression

This experiment used **Multiple Linear Regression** to model the relationship between multiple features and median house prices.

### Key Findings

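- Using several features is expected to fit better than a single-feature (simple) linear regression. That baseline is not fitted in this notebook, so the comparison should be read as an expectation, not a measured result.
- Metrics, both computed on the held-out 20% test split:
  - **Mean Squared Error (MSE)**: the average squared prediction error, in squared target units. Lower values indicate a better fit.
  - **R-squared (R²) Score**: the fraction of variance in the test targets that the model explains. Higher values indicate more variance explained.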
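The comparison with a single-feature model can be sketched as follows. To keep the example self-contained it uses synthetic data, which is an assumption made purely for illustration; `evaluate` is a hypothetical helper, and the numbers it prints say nothing about the housing dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): three informative features
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = (3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] - 1.0 * X_demo[:, 2]
          + rng.normal(scale=0.5, size=1000))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

def evaluate(columns):
    """Fit on the given feature columns and return (MSE, R²) on the test split."""
    fitted = LinearRegression().fit(X_tr[:, columns], y_tr)
    pred = fitted.predict(X_te[:, columns])
    return mean_squared_error(y_te, pred), r2_score(y_te, pred)

mse_single, r2_single = evaluate([0])       # simple regression: one feature
mse_multi, r2_multi = evaluate([0, 1, 2])   # multiple regression: all features

print(f"single feature: MSE={mse_single:.3f}, R²={r2_single:.3f}")
print(f"all features:   MSE={mse_multi:.3f}, R²={r2_multi:.3f}")
```

To run the same comparison on the housing data, one option is to fit a second `LinearRegression` on `X_train[['MedInc']]` alone and score it on `X_test[['MedInc']]` with the same two metrics.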

### Limitations

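- Ordinary least squares assumes a linear relationship between the features and the target, independent errors with roughly constant variance, and no strong multicollinearity among the features. Multicollinearity mainly makes individual coefficients unstable and hard to interpret.
- Real-world housing prices depend on many factors that this model does not capture.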
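One common way to check the multicollinearity assumption is the variance inflation factor (VIF). For feature $j$ it is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing that feature on all the others. The sketch below is a minimal hand-rolled version; `variance_inflation_factors` is a hypothetical helper, not a scikit-learn function, and the demo columns are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def variance_inflation_factors(features):
    """Return VIF_j = 1 / (1 - R²_j) for each column j, where R²_j is the
    R² from regressing column j on all remaining columns."""
    features = np.asarray(features, dtype=float)
    vifs = []
    for j in range(features.shape[1]):
        others = np.delete(features, j, axis=1)
        target = features[:, j]
        r2_j = LinearRegression().fit(others, target).score(others, target)
        vifs.append(1.0 / (1.0 - r2_j))
    return np.array(vifs)

# Synthetic demo (an assumption): column c is nearly a copy of column a
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + rng.normal(scale=0.1, size=500)

vifs_demo = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs_demo, 1))  # large values for a and c, close to 1 for b
```

The helper accepts a DataFrame as well, so `variance_inflation_factors(X_train)` would give one value per housing feature. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign, though the threshold is a convention and not a hard rule.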

---

Further exploration can include polynomial regression, regularization, or more complex models.
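As a starting point for that exploration, the sketch below combines polynomial features with ridge regularization in a single scikit-learn pipeline. The synthetic curved data, the degree of 2, and `alpha=1.0` are all assumptions chosen for illustration; suitable values for the housing data would need to be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a curved relationship (an assumption for illustration)
rng = np.random.default_rng(1)
X_curve = rng.uniform(-2, 2, size=(600, 2))
y_curve = (X_curve[:, 0] ** 2 + X_curve[:, 0] * X_curve[:, 1]
           + rng.normal(scale=0.2, size=600))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_curve, y_curve, test_size=0.2, random_state=42
)

# Baseline: plain linear regression cannot represent the squared/interaction terms
linear_model = LinearRegression().fit(X_tr, y_tr)

# Scale, expand to degree-2 polynomial terms, then fit with an L2 penalty
poly_ridge = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    Ridge(alpha=1.0),
).fit(X_tr, y_tr)

r2_linear = linear_model.score(X_te, y_te)
r2_poly = poly_ridge.score(X_te, y_te)
print(f"linear R²:       {r2_linear:.3f}")
print(f"poly + ridge R²: {r2_poly:.3f}")
```

Because the pipeline exposes the same `fit` and `predict` interface as `LinearRegression`, it could be dropped into the notebook's training cell in place of `model` without changing the evaluation code.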
