<a href="https://colab.research.google.com/github/amrahmani/Machine-Learning/blob/main/Ch6_NeuralNetwork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Problem:** We have a dataset named House with four columns: House ID, Square Footage, Number of Bedrooms, and Price ($).

https://github.com/amrahmani/Machine-Learning/blob/main/house_data.csv

Using Python, first handle missing values by filling them with the column mean, then remove outliers using Z-scores, and scale the features.

Then, perform the following tasks:

1) Predict house prices based on square footage using an MLP.

2) Predict house prices based on the number of bedrooms using a simple linear regression model and an MLP.

3) Predict house prices based on square footage and number of bedrooms using a linear regression model and an MLP.

Calculate R² and accuracy for each of the above tasks.

Finally, predict prices for new houses and evaluate prediction accuracy using MAE, MSE, and RMSE.
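
As a quick reference for the metrics named above, here is a minimal sketch that computes MAE, MSE, RMSE, and R² by hand with NumPy. The three target values and predictions are made up for illustration and are not taken from the House dataset.

```python
import numpy as np

# Hypothetical true prices and predictions (illustrative values only)
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

residuals = y_true - y_pred

mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error
rmse = np.sqrt(mse)                # root mean squared error, same units as y

# R² compares the model's squared error with that of always predicting the mean
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
# MAE=16.667  MSE=366.667  RMSE=19.149  R2=0.945
```

R² is 1 for a perfect fit, 0 for a model that does no better than predicting the mean, and negative when the model is worse than that baseline.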


import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from scipy.stats import zscore
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Load the data
url = 'https://raw.githubusercontent.com/amrahmani/Machine-Learning/main/house_data.csv'
df = pd.read_csv(url)

# --- Data Preprocessing ---
# 1. Handle missing values by filling with the mean
df_cleaned = df.copy()
for col in df_cleaned.columns:
    if df_cleaned[col].dtype in [np.float64, np.int64]:
        mean_val = df_cleaned[col].mean()
        # Assign the result back; chained fillna(..., inplace=True) is deprecated in pandas
        df_cleaned[col] = df_cleaned[col].fillna(mean_val)

# 2. Remove outliers using Z-scores (threshold of 3)
numerical_cols = ['Square Footage', 'Number of Bedrooms', 'Price ($)']
df_no_outliers = df_cleaned.copy()
for col in numerical_cols:
    z_scores = np.abs(zscore(df_no_outliers[col]))
    df_no_outliers = df_no_outliers[(z_scores < 3)]

# Define the dataset
X = df_no_outliers.drop(['House ID', 'Price ($)'], axis=1)
y = df_no_outliers['Price ($)']

# --- PyTorch Model Definitions ---
class MLP(nn.Module):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 4)
        self.fc4 = nn.Linear(4, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.fc4(x)
        return x

class LinearRegression(nn.Module):
    def __init__(self, input_size):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(input_size, 1)

    def forward(self, x):
        return self.linear(x)

# Function to train a model
def train_model(model, X_train, y_train, epochs=1000, lr=0.001):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        outputs = model(X_train)
        loss = criterion(outputs, y_train.unsqueeze(1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model

# Function to evaluate and get metrics
def get_metrics(model, X_test, y_test):
    model.eval()
    with torch.no_grad():
        predicted = model(X_test)

    y_test_np = y_test.numpy()
    predicted_np = predicted.squeeze().numpy()

    r2 = r2_score(y_test_np, predicted_np)
    mae = mean_absolute_error(y_test_np, predicted_np)
    mse = mean_squared_error(y_test_np, predicted_np)
    rmse = np.sqrt(mse)

    return r2, mae, mse, rmse

# Store results in a dictionary
results = {}

# --- Task 1: Predict price based on square footage using an MLP ---
X1 = df_no_outliers[['Square Footage']].values
y1 = df_no_outliers['Price ($)'].values
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, y1, test_size=0.2, random_state=42)
scaler_X1 = StandardScaler()
X1_train_scaled = scaler_X1.fit_transform(X1_train)
X1_test_scaled = scaler_X1.transform(X1_test)
X1_train_tensor = torch.FloatTensor(X1_train_scaled)
y1_train_tensor = torch.FloatTensor(y1_train)
X1_test_tensor = torch.FloatTensor(X1_test_scaled)
y1_test_tensor = torch.FloatTensor(y1_test)

mlp1 = MLP(input_size=1)
mlp1 = train_model(mlp1, X1_train_tensor, y1_train_tensor)
results['MLP (Square Footage)'] = get_metrics(mlp1, X1_test_tensor, y1_test_tensor)

# --- Task 2: Predict price based on the number of bedrooms ---
X2 = df_no_outliers[['Number of Bedrooms']].values
y2 = df_no_outliers['Price ($)'].values
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.2, random_state=42)
scaler_X2 = StandardScaler()
X2_train_scaled = scaler_X2.fit_transform(X2_train)
X2_test_scaled = scaler_X2.transform(X2_test)
X2_train_tensor = torch.FloatTensor(X2_train_scaled)
y2_train_tensor = torch.FloatTensor(y2_train)
X2_test_tensor = torch.FloatTensor(X2_test_scaled)
y2_test_tensor = torch.FloatTensor(y2_test)

lr2 = LinearRegression(input_size=1)
lr2 = train_model(lr2, X2_train_tensor, y2_train_tensor)
results['Linear Regression (Bedrooms)'] = get_metrics(lr2, X2_test_tensor, y2_test_tensor)

mlp2 = MLP(input_size=1)
mlp2 = train_model(mlp2, X2_train_tensor, y2_train_tensor)
results['MLP (Bedrooms)'] = get_metrics(mlp2, X2_test_tensor, y2_test_tensor)

# --- Task 3: Predict price based on square footage and number of bedrooms ---
X3 = df_no_outliers[['Square Footage', 'Number of Bedrooms']].values
y3 = df_no_outliers['Price ($)'].values
X3_train, X3_test, y3_train, y3_test = train_test_split(X3, y3, test_size=0.2, random_state=42)
scaler_X3 = StandardScaler()
X3_train_scaled = scaler_X3.fit_transform(X3_train)
X3_test_scaled = scaler_X3.transform(X3_test)
X3_train_tensor = torch.FloatTensor(X3_train_scaled)
y3_train_tensor = torch.FloatTensor(y3_train)
X3_test_tensor = torch.FloatTensor(X3_test_scaled)
y3_test_tensor = torch.FloatTensor(y3_test)

lr3 = LinearRegression(input_size=2)
lr3 = train_model(lr3, X3_train_tensor, y3_train_tensor)
results['Linear Regression (2 Features)'] = get_metrics(lr3, X3_test_tensor, y3_test_tensor)

mlp3 = MLP(input_size=2)
mlp3 = train_model(mlp3, X3_train_tensor, y3_train_tensor)
results['MLP (2 Features)'] = get_metrics(mlp3, X3_test_tensor, y3_test_tensor)

# --- Comparison of Results ---
results_df = pd.DataFrame.from_dict(
    results,
    orient='index',
    columns=['R-squared', 'MAE', 'MSE', 'RMSE']
)
print("\n--- Summary of Model Performance ---")
print(results_df.round(4))

# --- Predict prices for new houses ---
print("\n--- Predicting prices for new, hypothetical houses using best model ---")
# Use the MLP trained on both features; in the summary above it has the lowest error
new_houses_data = pd.DataFrame([
    [2500, 4],
    [1500, 2],
    [3000, 3]
], columns=['Square Footage', 'Number of Bedrooms'])

# scaler_X3 was fitted on a NumPy array, so pass .values to avoid a feature-name warning
new_houses_scaled = scaler_X3.transform(new_houses_data.values)
new_houses_tensor = torch.FloatTensor(new_houses_scaled)

mlp3.eval()
with torch.no_grad():
    predicted_prices_tensor = mlp3(new_houses_tensor)

predicted_prices = predicted_prices_tensor.squeeze().numpy()
new_houses_data['Predicted Price ($)'] = predicted_prices

print(new_houses_data)


--- Summary of Model Performance ---
                                R-squared          MAE           MSE         RMSE
MLP (Square Footage)             -23.2778  386827.3125  1.559570e+11  394913.8553
Linear Regression (Bedrooms)     -23.7171  390326.3438  1.587784e+11  398470.0854
MLP (Bedrooms)                   -23.7170  390326.0000  1.587782e+11  398469.8387
Linear Regression (2 Features)   -23.7170  390326.4062  1.587783e+11  398470.0032
MLP (2 Features)                 -22.9238  383928.1562  1.536827e+11  392023.8804

--- Predicting prices for new, hypothetical houses using best model ---
   Square Footage  Number of Bedrooms  Predicted Price ($)
0            2500                   4          5567.735352
1            1500                   2          6031.304688
2  


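Every R² in the summary is about -23, and the predicted prices are only a few thousand dollars while the MAE is close to 390,000. One plausible cause is that the features are standardized but the target is left in raw dollars, so Adam with `lr=0.001` and 1000 epochs cannot move the outputs far enough to reach the scale of the prices. A common remedy is to standardize the target as well and convert predictions back to dollars afterwards. The sketch below illustrates this on synthetic data, not the House dataset; the data, the small network, and `lr=0.01` are assumptions chosen for the example.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the House dataset): price is roughly linear in size
sqft = rng.uniform(800, 3500, size=(200, 1))
price = 150.0 * sqft[:, 0] + 50_000 + rng.normal(0, 10_000, size=200)

# Standardize both the feature and the target
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X = torch.FloatTensor(x_scaler.fit_transform(sqft))
y = torch.FloatTensor(y_scaler.fit_transform(price.reshape(-1, 1)))

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed learning rate
criterion = nn.MSELoss()

# Full-batch training in the standardized target space
for _ in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

# Predict in standardized units, then map back to dollars
model.eval()
with torch.no_grad():
    pred_scaled = model(X).numpy()
pred_dollars = y_scaler.inverse_transform(pred_scaled).ravel()

r2 = r2_score(price, pred_dollars)
print(f"R2 on the synthetic training data: {r2:.3f}")
```

In the notebook's own pipeline, the same idea would mean fitting a target scaler on `y1_train` (and likewise for the other tasks), training on the scaled target, and applying `inverse_transform` to the predictions before computing MAE, MSE, and RMSE.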

**Practice:**

**Task 1:** Try different hyperparameters of MLP.
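
One way to approach this task is to make the network's layer widths a parameter so that several architectures can be compared with the same training loop. The sketch below is a minimal example; the helper name `build_mlp`, the `dropout` option, and the candidate configurations are hypothetical and not part of the notebook.

```python
import torch
import torch.nn as nn

def build_mlp(input_size, hidden_sizes, dropout=0.0):
    """Build a regression MLP with one ReLU-activated layer per entry in hidden_sizes."""
    layers = []
    prev = input_size
    for width in hidden_sizes:
        layers.append(nn.Linear(prev, width))
        layers.append(nn.ReLU())
        if dropout > 0:
            layers.append(nn.Dropout(dropout))
        prev = width
    layers.append(nn.Linear(prev, 1))  # single output: the predicted price
    return nn.Sequential(*layers)

# Hypothetical candidates; [16, 8, 4] matches the hidden layers of the MLP class above
candidates = {"small": [8], "notebook": [16, 8, 4], "wide": [64, 32]}

param_counts = {}
for name, sizes in candidates.items():
    model = build_mlp(input_size=2, hidden_sizes=sizes)
    param_counts[name] = sum(p.numel() for p in model.parameters())
    print(name, sizes, "parameters:", param_counts[name])

# Each model maps a batch of shape (n, 2) to predictions of shape (n, 1)
out_shape = tuple(build_mlp(2, [16, 8, 4])(torch.zeros(5, 2)).shape)
print(out_shape)
```

Other hyperparameters worth varying are the learning rate and the number of epochs, which are the `lr` and `epochs` arguments of `train_model`.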

**Task 2:** For this multiclass classification task, fit a Multilayer Perceptron (MLP) to the following dataset:

https://github.com/amrahmani/Machine-Learning/blob/main/mobile_data.csv

The dataset contains multiple features, and the goal is to classify the price_range, which has four classes. Compare the results using accuracy, R², and confusion matrices. Also, compare the performance with other classifiers covered in the class.
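
A possible starting point is sketched below using scikit-learn's `MLPClassifier` rather than PyTorch, which keeps the example short. It runs on synthetic four-class data generated with `make_classification`, not on `mobile_data.csv`; the feature count, layer sizes, and `max_iter` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a four-class target such as price_range
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=4, random_state=0,
)

# Stratify so every class appears in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_train_scaled, y_train)

y_pred = clf.predict(X_test_scaled)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2, 3])

print(f"Accuracy: {acc:.3f}")
print(cm)  # rows are true classes, columns are predicted classes
```

With the real dataset, `X` would be the feature columns and `y` the `price_range` column; the same split, scaler, and metrics can then be reused for the other classifiers being compared.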

**Task 3:** Use the following dataset containing information about customers (e.g., Customer ID, Gender, Age, Annual Income (k$), Score (1-100)).

https://github.com/amrahmani/Machine-Learning/blob/main/customers.csv

First, analyze the relationships between Annual Income (k$) and other variables. Then, predict Annual Income using a multiple regression model and MLP.
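
The two steps above can be sketched as follows: a correlation table for the numeric columns, then a multiple regression on all predictors. The data here are randomly generated with the column names listed in the task, not loaded from `customers.csv`, and the relationship between age and income is an assumption built into the synthetic data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Synthetic customers (illustrative only): income depends on age plus noise
customers = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 70, size=n),
    "Score (1-100)": rng.integers(1, 101, size=n),
})
customers["Annual Income (k$)"] = (
    20 + 1.2 * customers["Age"] + rng.normal(0, 5, size=n)
)

# Step 1: pairwise correlations between the numeric columns
numeric_cols = ["Age", "Score (1-100)", "Annual Income (k$)"]
corr = customers[numeric_cols].corr()
print(corr["Annual Income (k$)"])

# Step 2: multiple regression, with Gender one-hot encoded
X = pd.get_dummies(
    customers[["Gender", "Age", "Score (1-100)"]],
    columns=["Gender"], drop_first=True,
).astype(float)
y = customers["Annual Income (k$)"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
reg = LinearRegression().fit(X_train, y_train)

age_coef = reg.coef_[list(X.columns).index("Age")]
r2 = r2_score(y_test, reg.predict(X_test))
print(f"Age coefficient: {age_coef:.3f}, test R2: {r2:.3f}")
```

The same `X_train` and `y_train` can then be scaled and passed to an MLP for the comparison the task asks for.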

**Task 4:** Find a new dataset on Kaggle and use an MLP for regression.