# 🏡 🤖✨
# California House Price Prediction using Linear Regression

This project demonstrates how to build a simple **machine learning model** to predict house prices using the **California Housing Dataset** provided by `scikit-learn`.

It works as follows:
- Load and explore real-world housing data
- Use **Linear Regression** to model the relationship between features (income, house age, rooms, etc.) and house prices
- Evaluate the model's performance using metrics like **Mean Squared Error (MSE)** and **R² score**
  - **MSE**: the average squared difference between predicted and actual values. The smaller the MSE, the closer the predictions are to the actual values.
  - **R²**: the coefficient of determination, which tells you how much of the variance in the target variable the model explains. The closer R² is to 1, the better the model explains the variation in the data.
- Visualize the relationship between actual and predicted prices
  - If the dots are close to the diagonal (45-degree) line in the actual-vs-predicted plot, the MSE is low and the R² is high.
  - If the dots are far from the line, the MSE is high and the R² is low.
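
To make the two metrics concrete, here is a minimal sketch that computes MSE and R² by hand using their standard definitions. The helper names (`mean_squared_error_manual`, `r2_score_manual`) and the sample prices are hypothetical and only for illustration; the notebook itself uses the `scikit-learn` functions.

```python
def mean_squared_error_manual(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score_manual(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical prices in units of $100,000 (not taken from the dataset)
actual = [1.0, 2.0, 3.0, 4.0]
close_predictions = [1.1, 1.9, 3.2, 3.8]  # dots near the diagonal line
far_predictions = [2.5, 2.5, 2.5, 2.5]    # always predicts the mean price

mse_close = mean_squared_error_manual(actual, close_predictions)
r2_close = r2_score_manual(actual, close_predictions)
mse_far = mean_squared_error_manual(actual, far_predictions)
r2_far = r2_score_manual(actual, far_predictions)

print(f"close: MSE={mse_close:.3f}, R²={r2_close:.3f}")
print(f"far:   MSE={mse_far:.3f}, R²={r2_far:.3f}")
```

In this toy example, the close predictions give a low MSE and an R² near 1, while always predicting the mean gives a much higher MSE and an R² of 0.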


In [29]:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

## 📥 Load and Prepare Data

We load the dataset and convert it to a pandas DataFrame for easier handling.


In [None]:
# Load the California Housing dataset
data = fetch_california_housing()

# Convert to DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['MedianHouseValue'] = data.target

# Show first few rows
df.head()

## 📊 Visualize the Data

Let's see how **Median Income** relates to **House Prices**.

In [None]:
plt.scatter(df['MedInc'], df['MedianHouseValue'], alpha=0.3)
plt.xlabel('Median Income')
plt.ylabel('Median House Value')
plt.title('Income vs House Price')
plt.grid(True)
plt.show()

## 🧪 Prepare Training and Testing Data

We separate the features (`X`) and the target (`y`), then split the dataset into training and testing sets.


In [32]:
X = df.drop('MedianHouseValue', axis=1)  # Features
y = df['MedianHouseValue']               # Target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## 🧠 Train the Model

We train a **Linear Regression** model using the training data.

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

## 📈 Make Predictions

We use the trained model to predict house prices on the test set.

In [34]:
y_pred = model.predict(X_test)

## 🧮 Evaluate the Model

We check the model’s performance using:
- **Mean Squared Error (MSE)**
- **R² Score**

In [None]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.2f}")
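
Because the target is expressed in units of $100,000, the MSE is in squared units and can be hard to read. One common way to interpret it is to take the square root (RMSE) and convert it to dollars. The sketch below assumes that unit; the helper `rmse_in_dollars` and the example MSE value are hypothetical and are not the notebook's actual result.

```python
import math


def rmse_in_dollars(mse, unit_in_dollars=100_000):
    """Convert an MSE in squared target units to an RMSE in dollars."""
    return math.sqrt(mse) * unit_in_dollars


example_mse = 0.25  # hypothetical value, not the notebook's result
print(f"RMSE: ${rmse_in_dollars(example_mse):,.0f}")
```

In the notebook, the same helper could be called as `rmse_in_dollars(mse)` to get a rough sense of the typical prediction error in dollars.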

## 🖼️ Visualize Predictions

Compare actual vs predicted prices to see how well the model performs.

In [None]:
plt.figure(figsize=(6,6))
plt.scatter(y_test, y_pred, alpha=0.3, color='teal')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Actual vs Predicted House Prices')
plt.plot([0, 5], [0, 5], 'r--')  # 45-degree line
plt.grid(True)
plt.show()

## 🧾 Enter Custom House Features Manually
### House Price Prediction Formula

The formula to predict the house price using the linear regression model is:


\[ \text{Predicted Price} = w_1 \times \text{MedInc} + w_2 \times \text{HouseAge} + \dots + w_8 \times \text{Longitude} + b \]

Where:
- \( w_1, w_2, \dots, w_8 \) are the learned **coefficients** for each feature (e.g., **Median Income**, **House Age**, etc.).
- \( b \) is the **intercept** term (the constant added to the prediction).
- **MedInc, HouseAge, Longitude, etc.** are the input features (the values you provide for each feature).
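
The formula above is a weighted sum plus an intercept, which can be sketched in plain Python as follows. The function name `predict_linear`, the two weights, and the intercept are hypothetical stand-ins chosen for illustration; they are not the values learned by the model in this notebook.

```python
def predict_linear(weights, intercept, features):
    """Return w_1*x_1 + ... + w_n*x_n + b for one row of feature values."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + intercept


# Hypothetical weights and inputs for two features, for illustration only
weights = [0.5, 0.01]   # stand-ins for w_1 (MedInc) and w_2 (HouseAge)
intercept = 0.2         # stand-in for b
features = [8.0, 30.0]  # MedInc = 8.0, HouseAge = 30

prediction = predict_linear(weights, intercept, features)
print(f"Predicted value: {prediction:.2f}")
```

With the fitted `LinearRegression` model, the learned values are available as `model.coef_` and `model.intercept_`, so `predict_linear(model.coef_, model.intercept_, X_test.iloc[0])` should agree with `model.predict(X_test.iloc[[0]])[0]` up to floating-point rounding.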



In [None]:
print("🔢 Enter the values for the following features:")

MedInc = float(input("Median Income (Ex: 8.0): "))
HouseAge = float(input("House Age in years (Ex: 30): "))
AveRooms = float(input("Average number of rooms (Ex: 6): "))
AveBedrms = float(input("Average number of bedrooms (Ex: 1): "))
Population = float(input("Population in the area (Ex: 1000): "))
AveOccup = float(input("Average Occupancy (Ex: 3): "))
Latitude = float(input("Latitude (Ex: 34.0): "))
Longitude = float(input("Longitude (Ex: -118.0): "))

# 📦 Create DataFrame for prediction
custom_input = pd.DataFrame([{
    'MedInc': MedInc,         # Median income in the area (in tens of thousands of dollars)
    'HouseAge': HouseAge,     # Median age of houses in the area
    'AveRooms': AveRooms,     # Average number of rooms per household
    'AveBedrms': AveBedrms,   # Average number of bedrooms per household
    'Population': Population, # Total population in the area
    'AveOccup': AveOccup,     # Average number of people per household
    'Latitude': Latitude,     # Latitude of the area
    'Longitude': Longitude    # Longitude of the area
}])

# 🔮 Predict price
predicted_price = model.predict(custom_input)[0]
print(f"\n💰 Predicted House Price: ${predicted_price * 100000:.2f}")


In [38]:
import pickle

# Save Model to Pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
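
To reuse the saved model later, the file can be read back with `pickle.load`. The sketch below shows the round trip with a stand-in dictionary rather than the trained model, so that it runs on its own; the helper name `load_model` and the temporary file location are assumptions made for illustration.

```python
import os
import pickle
import tempfile


def load_model(path):
    """Load a pickled object; only use this on files you trust."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Round-trip demo with a stand-in object instead of the trained model
stand_in = {'coef': [0.5, 0.01], 'intercept': 0.2}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'model.pkl')
    with open(demo_path, 'wb') as f:
        pickle.dump(stand_in, f)
    restored = load_model(demo_path)

print(restored == stand_in)
```

In the notebook, `loaded_model = load_model('model.pkl')` followed by `loaded_model.predict(custom_input)` would be the equivalent usage. Unpickling can execute arbitrary code, so only load files from a trusted source, and it is safest to load the model with the same `scikit-learn` version that saved it.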
