<a href="https://colab.research.google.com/github/amrahmani/Machine-Learning/blob/main/Ch6_NeuralNetwork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Problem:** We have a dataset named House with four columns: House ID, Square Footage, Number of Bedrooms, and Price ($).

https://github.com/amrahmani/Machine-Learning/blob/main/house_data.csv

Using Python, first handle missing values by filling them with the column mean, then remove outliers using Z-scores, and finally scale the features.

Then, perform the following tasks:

1) Predict house prices based on square footage using an MLP.

2) Predict house prices based on the number of bedrooms using a simple linear regression model and an MLP.

3) Predict house prices based on square footage and number of bedrooms using linear regression and an MLP.

Calculate R² and Adjusted R² for each of the tasks above.

Finally, predict prices for new houses and evaluate prediction accuracy using MAE, MSE, and RMSE.
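
Adjusted R² corrects R² for the number of predictors $p$ given $n$ samples: $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The following is a minimal NumPy sketch of both metrics on toy numbers; the helper names `r2` and `adjusted_r2` and the toy arrays are illustrative assumptions, not part of the notebook's code.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2_value, n_samples, n_features):
    """Adjusted R²: penalizes R² for the number of predictors."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2_value) * (n_samples - 1) / dof

# Toy values for illustration only (not taken from the house dataset)
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9, 11.0])

score = r2(y_true, y_pred)
adj = adjusted_r2(score, n_samples=len(y_true), n_features=1)
print(f"R2 = {score:.4f}, adjusted R2 = {adj:.4f}")
```

For the tasks above, `n_samples` would be the size of the evaluation set, and `n_features` would be 1 for Tasks 1 and 2 and 2 for Task 3.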


import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from scipy.stats import zscore
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Load the data
url = 'https://raw.githubusercontent.com/amrahmani/Machine-Learning/main/house_data.csv'
df = pd.read_csv(url)
print("Original DataFrame:")
print(df.head())
print("\nOriginal DataFrame Info:")
df.info()

# --- Data Preprocessing ---
# 1. Handle missing values by filling with the mean
df_cleaned = df.copy()
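for col in df_cleaned.select_dtypes(include='number').columns:
    # Assign the result back to the column. Calling fillna(..., inplace=True)
    # on a column selection raises a pandas FutureWarning and will stop
    # modifying the original DataFrame in pandas 3.0.
    df_cleaned[col] = df_cleaned[col].fillna(df_cleaned[col].mean())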
print("\nDataFrame after handling missing values:")
df_cleaned.info()

# 2. Remove outliers using Z-scores (threshold of 3)
numerical_cols = ['Square Footage', 'Number of Bedrooms', 'Price ($)']
df_no_outliers = df_cleaned.copy()
for col in numerical_cols:
    z_scores = np.abs(zscore(df_no_outliers[col]))
    df_no_outliers = df_no_outliers[(z_scores < 3)]

print("\nDataFrame after removing outliers:")
df_no_outliers.info()

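# 3. Scale features (this scaled copy is only printed for inspection;
#    each task below fits its own StandardScaler on its training split)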
scaler = StandardScaler()
df_scaled = df_no_outliers.copy()
df_scaled[numerical_cols] = scaler.fit_transform(df_scaled[numerical_cols])
print("\nDataFrame after scaling features:")
print(df_scaled.head())

# Full feature matrix and target (unscaled); each task below selects its own columns
X = df_no_outliers.drop(['House ID', 'Price ($)'], axis=1)
y = df_no_outliers['Price ($)']

# --- PyTorch Model Definitions ---
class MLP(nn.Module):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 4)
        self.fc4 = nn.Linear(4, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.fc4(x)
        return x

class LinearRegression(nn.Module):
    def __init__(self, input_size):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(input_size, 1)

    def forward(self, x):
        return self.linear(x)

# Function to train a model
def train_model(model, X_train, y_train, epochs=1000, lr=0.001):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        outputs = model(X_train)
        loss = criterion(outputs, y_train.unsqueeze(1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return model

# Function to evaluate a model on the test set, print its R², and return the predictions
def evaluate_model(model, X_test, y_test, model_name):
    model.eval()
    with torch.no_grad():
        predicted = model(X_test)

    y_test_np = y_test.numpy()
    predicted_np = predicted.squeeze().numpy()

    r2 = r2_score(y_test_np, predicted_np)
    print(f"\n--- {model_name} R-squared: {r2:.4f} ---")

    return predicted_np

# --- Task 1: Predict price based on square footage using an MLP ---
print("\n--- Task 1: Predicting price based on Square Footage using an MLP ---")
X1 = df_no_outliers[['Square Footage']].values
y1 = df_no_outliers['Price ($)'].values
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, y1, test_size=0.2, random_state=42)

scaler_X1 = StandardScaler()
X1_train_scaled = scaler_X1.fit_transform(X1_train)
X1_test_scaled = scaler_X1.transform(X1_test)

X1_train_tensor = torch.FloatTensor(X1_train_scaled)
y1_train_tensor = torch.FloatTensor(y1_train)
X1_test_tensor = torch.FloatTensor(X1_test_scaled)
y1_test_tensor = torch.FloatTensor(y1_test)

mlp1 = MLP(input_size=1)
mlp1 = train_model(mlp1, X1_train_tensor, y1_train_tensor)
evaluate_model(mlp1, X1_test_tensor, y1_test_tensor, "MLP (Square Footage)")

# --- Task 2: Predict price based on the number of bedrooms using a linear regression model and an MLP ---
print("\n--- Task 2: Predicting price based on Number of Bedrooms using Linear Regression and an MLP ---")
X2 = df_no_outliers[['Number of Bedrooms']].values
y2 = df_no_outliers['Price ($)'].values
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.2, random_state=42)

scaler_X2 = StandardScaler()
X2_train_scaled = scaler_X2.fit_transform(X2_train)
X2_test_scaled = scaler_X2.transform(X2_test)

X2_train_tensor = torch.FloatTensor(X2_train_scaled)
y2_train_tensor = torch.FloatTensor(y2_train)
X2_test_tensor = torch.FloatTensor(X2_test_scaled)
y2_test_tensor = torch.FloatTensor(y2_test)

# Linear Regression
lr2 = LinearRegression(input_size=1)
lr2 = train_model(lr2, X2_train_tensor, y2_train_tensor)
evaluate_model(lr2, X2_test_tensor, y2_test_tensor, "Linear Regression (Bedrooms)")

# MLP
mlp2 = MLP(input_size=1)
mlp2 = train_model(mlp2, X2_train_tensor, y2_train_tensor)
evaluate_model(mlp2, X2_test_tensor, y2_test_tensor, "MLP (Bedrooms)")

# --- Task 3: Predict price based on square footage and number of bedrooms using linear regression and an MLP ---
print("\n--- Task 3: Predicting price based on Square Footage & Bedrooms using Linear Regression and an MLP ---")
X3 = df_no_outliers[['Square Footage', 'Number of Bedrooms']].values
y3 = df_no_outliers['Price ($)'].values
X3_train, X3_test, y3_train, y3_test = train_test_split(X3, y3, test_size=0.2, random_state=42)

scaler_X3 = StandardScaler()
X3_train_scaled = scaler_X3.fit_transform(X3_train)
X3_test_scaled = scaler_X3.transform(X3_test)

X3_train_tensor = torch.FloatTensor(X3_train_scaled)
y3_train_tensor = torch.FloatTensor(y3_train)
X3_test_tensor = torch.FloatTensor(X3_test_scaled)
y3_test_tensor = torch.FloatTensor(y3_test)

# Linear Regression
lr3 = LinearRegression(input_size=2)
lr3 = train_model(lr3, X3_train_tensor, y3_train_tensor)
evaluate_model(lr3, X3_test_tensor, y3_test_tensor, "Linear Regression (2 Features)")

# MLP
mlp3 = MLP(input_size=2)
mlp3 = train_model(mlp3, X3_train_tensor, y3_train_tensor)
predicted_mlp3 = evaluate_model(mlp3, X3_test_tensor, y3_test_tensor, "MLP (2 Features)")

# --- Final Predictions and Evaluation ---
print("\n--- Final Evaluation on Test Data (MLP with 2 features) ---")
y3_test_np = y3_test_tensor.numpy()
mae = mean_absolute_error(y3_test_np, predicted_mlp3)
mse = mean_squared_error(y3_test_np, predicted_mlp3)
rmse = np.sqrt(mse)
print(f"MAE: {mae:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")

# Predict prices for new houses
print("\n--- Predicting prices for new, hypothetical houses ---")
new_houses_data = pd.DataFrame([
    [2500, 4],
    [1500, 2],
    [3000, 3]
], columns=['Square Footage', 'Number of Bedrooms'])

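# Scale the new data with the scaler fitted on the Task 3 training split.
# Pass a NumPy array because scaler_X3 was fitted on an array without feature names.
new_houses_scaled = scaler_X3.transform(new_houses_data.values)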
new_houses_tensor = torch.FloatTensor(new_houses_scaled)

# Make predictions using the most comprehensive model (MLP with 2 features)
mlp3.eval()
with torch.no_grad():
    predicted_prices_tensor = mlp3(new_houses_tensor)

predicted_prices = predicted_prices_tensor.squeeze().numpy()
new_houses_data['Predicted Price ($)'] = predicted_prices

print(new_houses_data)

Original DataFrame:
   House ID  Square Footage  Number of Bedrooms  Price ($)
0         1            3176                   3     451231
1         2            3076                   2     411161
2         3            1241                   3     282771
3         4            3279                   2     435325
4         5            1775                   4     380753

Original DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   House ID            100 non-null    int64
 1   Square Footage      100 non-null    int64
 2   Number of Bedrooms  100 non-null    int64
 3   Price ($)           100 non-null    int64
dtypes: int64(4)
memory usage: 3.3 KB

DataFrame after handling missing values:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column              Non-Nul


--- MLP (Square Footage) R-squared: -23.4715 ---

--- Task 2: Predicting price based on Number of Bedrooms using Linear Regression and an MLP ---

--- Linear Regression (Bedrooms) R-squared: -23.7172 ---

--- MLP (Bedrooms) R-squared: -23.2756 ---

--- Task 3: Predicting price based on Square Footage & Bedrooms using Linear Regression and an MLP ---

--- Linear Regression (2 Features) R-squared: -23.7171 ---

--- MLP (2 Features) R-squared: -23.4150 ---

--- Final Evaluation on Test Data (MLP with 2 features) ---
MAE: 388041.5938
MSE: 156838232064.0000
RMSE: 396028.0698

--- Predicting prices for new, hypothetical houses ---
   Square Footage  Number of Bedrooms  Predicted Price ($)
0            2500                   4          2899.469971
1            1500                   2          1281.257812
2            3000                   3          2822.675781
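
The strongly negative R² values and the predicted prices of only a few thousand dollars suggest that the models did not converge. The inputs are standardized, but the target stays in raw dollars (hundreds of thousands), and Adam with `lr=0.001` changes each parameter by at most about `lr` per step, so 1000 epochs are unlikely to carry the output layer to the price scale. One common remedy is to standardize the target as well and invert the scaling after prediction. The sketch below illustrates this on synthetic data (not the house dataset); the data-generating formula, network size, learning rate, and variable names are all assumptions chosen for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not the house dataset)
sqft = rng.uniform(1000, 3500, size=200)
price = 100.0 * sqft + 150_000 + rng.normal(0, 10_000, size=200)

# Simple train/test split: first 160 rows for training, last 40 for testing
X_train, X_test = sqft[:160, None], sqft[160:, None]
y_train, y_test = price[:160], price[160:]

# Standardize inputs AND target using training statistics only
x_mean, x_std = X_train.mean(), X_train.std()
y_mean, y_std = y_train.mean(), y_train.std()
Xtr = torch.tensor((X_train - x_mean) / x_std, dtype=torch.float32)
Xte = torch.tensor((X_test - x_mean) / x_std, dtype=torch.float32)
ytr = torch.tensor((y_train - y_mean) / y_std, dtype=torch.float32).unsqueeze(1)

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Full-batch training on the standardized target
model.train()
for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(Xtr), ytr)
    loss.backward()
    optimizer.step()

# Predict in standardized units, then map back to dollars
model.eval()
with torch.no_grad():
    pred_scaled = model(Xte).squeeze(1).numpy()
pred_price = pred_scaled * y_std + y_mean

# R² computed in the original dollar units
ss_res = np.sum((y_test - pred_price) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_scaled_target = 1.0 - ss_res / ss_tot
print(f"R2 with standardized target: {r2_scaled_target:.4f}")
```

Because the predictions are mapped back to dollars before scoring, MAE, MSE, and RMSE computed on `pred_price` remain interpretable in the original units.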




**Practice:**

Use the following dataset containing information about customers (e.g., Customer ID, Gender, Age, Annual Income (k$), Score (1-100)).

https://github.com/amrahmani/Machine-Learning/blob/main/customers.csv

First, analyze the relationships between Annual Income (k$) and the other variables. Then, predict Annual Income using a multiple regression model and an MLP.
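
As a possible starting point for the relationship analysis, the sketch below computes the correlation of Annual Income (k$) with the other variables. The rows are made up for illustration, and both the exact column names and the `Male`/`Female` coding of Gender are assumptions that should be checked against the real `customers.csv`.

```python
import pandas as pd

# Made-up rows for illustration only; load customers.csv for the real analysis
customers = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 35, 52, 28, 44, 61],
    "Annual Income (k$)": [15, 48, 77, 33, 62, 54],
    "Score (1-100)": [39, 81, 12, 66, 45, 20],
})

# Encode Gender as 0/1 so it can enter a correlation matrix (assumed coding)
customers["Gender_Male"] = (customers["Gender"] == "Male").astype(int)

# Drop the identifier and the raw text column before computing correlations
numeric = customers.drop(columns=["Customer ID", "Gender"])

# Pearson correlation of every remaining variable with Annual Income
income_corr = numeric.corr()["Annual Income (k$)"].drop("Annual Income (k$)")
print(income_corr.sort_values(ascending=False))
```

Variables with a correlation near zero may contribute little to a linear model, although an MLP can still pick up nonlinear relationships that Pearson correlation does not capture.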