## Task 3: Multimodal Housing Price Prediction
Problem Statement & Objective: Predict real estate prices by fusing two different data modalities: structured tabular data and unstructured image data.

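Dataset Loading & Preprocessing: Used the Houses Dataset, taking the tabular attributes and the frontal photo of each house. Scaled the tabular features to [0, 1] with MinMaxScaler, resized each image to 64×64 pixels and scaled its pixel values to [0, 1], and divided prices by the maximum price in the dataset.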

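Model Development & Training: Built a dual-branch neural network: a CNN that extracts image features and an MLP that processes the tabular features. The two branch outputs are fused with the Keras concatenate layer and passed to a small regression head. The model was trained for 50 epochs with the Adam optimizer and MSE loss.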

Evaluation Metrics: Reported Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) on the test split, after rescaling predictions from normalized units back to dollar amounts.

Visualizations: Model architecture plot and loss curves over the 50 training epochs.

Final Summary / Insights: A multimodal model sees each house from two angles: visual features (house appearance) can complement numerical features (such as square footage) for price estimation. On the test split this model reached an MAE of $271,292.83 and an RMSE of $395,150.87.

In [None]:
!pip install tensorflow opencv-python pandas scikit-learn matplotlib



## Data Acquisition and Preprocessing

In [None]:
import os
import cv2
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# 1. Clone the dataset
if not os.path.exists("Houses-dataset"):
    !git clone https://github.com/emanhamed/Houses-dataset

# 2. Load Tabular Data
cols = ["bedrooms", "bathrooms", "area", "zipcode", "price"]
df = pd.read_csv("Houses-dataset/Houses Dataset/HousesInfo.txt", sep=" ", header=None, names=cols)

# 3. Load Images (Using the 'Frontal' image for each house)
images = []
for i in df.index.values:
    # Image files are numbered from 1, while the DataFrame index starts at 0
    basePath = os.path.join("Houses-dataset", "Houses Dataset", "{}_frontal.jpg".format(i + 1))
    image = cv2.imread(basePath)  # Returns None if the file is missing or unreadable
    if image is None:
        raise FileNotFoundError("Could not read image: {}".format(basePath))
    image = cv2.resize(image, (64, 64))  # Resize for memory efficiency
    images.append(image)

images = np.array(images) / 255.0  # Scale pixel values to [0, 1]

# 4. Preprocess Tabular Data
# Scale the features to [0, 1]; note the scaler is fitted on all rows before the split
scaler = MinMaxScaler()
X_tab = scaler.fit_transform(df.drop("price", axis=1))
y = df["price"].values / df["price"].max()  # Divide by the max price so targets lie in (0, 1]

# 5. Split into Train/Test
split = train_test_split(X_tab, images, y, test_size=0.2, random_state=42)
(trainTab, testTab, trainImg, testImg, trainY, testY) = split
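The cell above fits `MinMaxScaler` on every row before splitting, so the test rows influence the feature ranges. A stricter alternative is to split first and derive the minimum and range from the training rows only. Below is a minimal sketch of that idea in plain NumPy on made-up numbers. The helper names `fit_min_max` and `apply_min_max` are hypothetical and not part of the notebook; in the notebook itself the same effect would come from calling `scaler.fit_transform` on the training split and `scaler.transform` on the test split.

```python
import numpy as np

def fit_min_max(train):
    """Return per-column (min, range) computed from training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with statistics that were fitted elsewhere."""
    return (data - col_min) / col_range

# Toy stand-in for three tabular features: bedrooms, bathrooms, area
rng = np.random.default_rng(42)
features = np.column_stack([
    rng.integers(1, 6, size=10),
    rng.integers(1, 4, size=10),
    rng.integers(800, 5000, size=10),
]).astype(np.float64)

# Split first (80/20), then fit the scaling on the training part only
order = rng.permutation(len(features))
train_rows, test_rows = features[order[:8]], features[order[8:]]

col_min, col_range = fit_min_max(train_rows)
train_scaled = apply_min_max(train_rows, col_min, col_range)
test_scaled = apply_min_max(test_rows, col_min, col_range)

print("train range:", train_scaled.min(), "to", train_scaled.max())
print("test range: ", test_scaled.min(), "to", test_scaled.max())
```

With this approach the scaled test values can fall slightly outside [0, 1] when a test row exceeds the training range. That is expected and reflects how the model would see unseen data.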

## Build the Multimodal Architecture

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, concatenate

# BRANCH 1: CNN (Image processing)
img_input = Input(shape=(64, 64, 3))
x = Conv2D(16, (3, 3), activation='relu', padding="same")(img_input)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding="same")(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(16, activation="relu")(x)
cnn_branch = Model(inputs=img_input, outputs=x)

# BRANCH 2: MLP (Tabular processing)
tab_input = Input(shape=(4,))
y_dense = Dense(16, activation="relu")(tab_input)
y_dense = Dense(8, activation="relu")(y_dense)
mlp_branch = Model(inputs=tab_input, outputs=y_dense)

# FUSION: Combine both branches
combined = concatenate([cnn_branch.output, mlp_branch.output])

# Final Regression Layers
z = Dense(4, activation="relu")(combined)
z = Dense(1, activation="linear")(z)

# Final Multimodal Model
model = Model(inputs=[cnn_branch.input, mlp_branch.input], outputs=z)
model.compile(loss="mse", optimizer="adam", metrics=["mae"])
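To see where the model's capacity sits, the layer shapes and parameter counts can be worked out by hand. The sketch below is plain arithmetic based on the layer definitions above (3×3 kernels, "same" padding, 2×2 pooling); the helper names `conv2d_params` and `dense_params` are hypothetical, and the totals are worth checking against `model.summary()`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units

# CNN branch: 64x64x3 input, two conv + 2x2 max-pool stages with "same" padding
side = 64
conv1 = conv2d_params(3, 3, 3, 16)
side //= 2                      # 64 -> 32 after the first pooling layer
conv2 = conv2d_params(3, 3, 16, 32)
side //= 2                      # 32 -> 16 after the second pooling layer
flat_units = side * side * 32   # size of the Flatten output
cnn_dense = dense_params(flat_units, 16)

# MLP branch: 4 tabular features -> 16 -> 8
mlp1 = dense_params(4, 16)
mlp2 = dense_params(16, 8)

# Fusion head: concatenate(16 + 8) -> 4 -> 1
fused_units = 16 + 8
head1 = dense_params(fused_units, 4)
head2 = dense_params(4, 1)

total = conv1 + conv2 + cnn_dense + mlp1 + mlp2 + head1 + head2
print(f"Flatten output units: {flat_units}")
print(f"Fused vector length:  {fused_units}")
print(f"Total trainable parameters: {total}")
```

By this count the `Dense(16)` layer after `Flatten` holds 131,088 of the 136,497 parameters, so almost all of the model's weights sit in the image branch.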

## Train the Model

In [None]:
print("[INFO] Training multimodal model...")
history = model.fit(
    x=[trainImg, trainTab], y=trainY,
    validation_data=([testImg, testTab], testY),
    epochs=50, batch_size=8
)

[INFO] Training multimodal model...
Epoch 1/50
54/54 ━━━━━━━━━━━━━━━━━━━━ 4s 32ms/step - loss: 0.0119 - mae: 0.0823 - val_loss: 0.0043 - val_mae: 0.0510
Epoch 2/50
54/54 ━━━━━━━━━━━━━━━━━━━━ 3s 52ms/step - loss: 0.0094 - mae: 0.0590 - val_loss: 0.0040 - val_mae: 0.0475
Epoch 3/50
54/54 ━━━━━━━━━━━━━━━━━━━━ 2s 32ms/step - loss: 0.0102 - mae: 0.0601 - val_loss: 0.0039 - val_mae: 0.0459
Epoch 4/50
54/54 ━━━━━━━━━━━━━━━━━━━━ 2s 28ms/step - loss: 0.0074 - mae: 0.0548 - val_loss: 0.0038 - val_mae: 0.0482
Epoch 5/50
54/54 ━━━━━━━━━━━━━━━━━━━━ 3s 29ms/step - loss: 0.0116 - mae: 0.0592 - val_loss: 0.0036 - val_mae: 0.0444
Epoch 6/50
54/54 ━━━━━━━━━━━━━━━━━━━━ 2s 28ms/step - loss: 0.0073 - mae: 0.0536 - val_loss: 0.0036 - val_mae: 0.0472
Epoch 7/50
54/54 ━━━━━━━━━━━━━━━━━━━━

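The `54/54` counter in the training log is the number of batches per epoch, and it can be reproduced from the split and batch sizes. The sketch below assumes the dataset has 535 rows, which is an assumption about `HousesInfo.txt` and not something shown in the notebook (check it with `len(df)`). It also assumes `train_test_split` rounds the test share up and that `model.predict` uses the Keras default batch size of 32.

```python
import math

n_rows = 535          # assumed number of houses; verify with len(df)
test_fraction = 0.2   # test_size passed to train_test_split
batch_size = 8        # batch_size passed to model.fit
predict_batch = 32    # Keras default batch size for model.predict

# Test rows are rounded up; the remaining rows form the training split
n_test = math.ceil(n_rows * test_fraction)
n_train = n_rows - n_test

# The last batch may be smaller, so the step count is rounded up
train_steps = math.ceil(n_train / batch_size)
predict_steps = math.ceil(n_test / predict_batch)

print(f"train rows: {n_train}, test rows: {n_test}")
print(f"steps per epoch: {train_steps}")
print(f"prediction steps: {predict_steps}")
```

Under these assumptions the step counts agree with the `54/54` counter during training and the `4/4` counter printed by `model.predict`.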
## Evaluate Performance (MAE and RMSE)

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

# 1. Make predictions
preds = model.predict([testImg, testTab])

# 2. Rescale predictions and actual values back to original prices
maxPrice = df["price"].max()
final_preds = preds.flatten() * maxPrice
final_actual = testY * maxPrice

# 3. Calculate Metrics
mae = mean_absolute_error(final_actual, final_preds)
rmse = np.sqrt(mean_squared_error(final_actual, final_preds))

print(f"\n--- Model Evaluation ---")
print(f"Mean Absolute Error (MAE): ${mae:,.2f}")
print(f"Root Mean Squared Error (RMSE): ${rmse:,.2f}")

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 67ms/step

--- Model Evaluation ---
Mean Absolute Error (MAE): $271,292.83
Root Mean Squared Error (RMSE): $395,150.87
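Because prices were divided by a single constant before training, both metrics scale linearly when converted back to dollars. The sketch below illustrates this on made-up numbers, and also shows that RMSE is never smaller than MAE because squaring weights large errors more heavily. The value of `max_price` and the sample arrays are illustrative only and are not taken from the dataset.

```python
import numpy as np

def mae(actual, predicted):
    """Mean of the absolute errors."""
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

max_price = 1_000_000.0  # made-up scale factor standing in for df["price"].max()
actual_norm = np.array([0.20, 0.35, 0.50, 0.90])
pred_norm = np.array([0.25, 0.30, 0.45, 0.50])  # last prediction misses badly

# Metrics in normalized units
mae_norm = mae(actual_norm, pred_norm)
rmse_norm = rmse(actual_norm, pred_norm)

# Metrics in dollars after rescaling both arrays
mae_usd = mae(actual_norm * max_price, pred_norm * max_price)
rmse_usd = rmse(actual_norm * max_price, pred_norm * max_price)

print(f"normalized: MAE={mae_norm:.4f}  RMSE={rmse_norm:.4f}")
print(f"dollars:    MAE=${mae_usd:,.2f}  RMSE=${rmse_usd:,.2f}")
```

A wide gap between RMSE and MAE, as in the results above, suggests that a few houses with large errors dominate the squared-error term.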
