# Multimodal CNN + Tabular Classification Pipeline

This notebook defines a modular, extensible pipeline for classifying detailed Dutch housing types based on both **images** (e.g., front view photos from Funda) and **tabular features** (e.g., build year, surface area, monument status).

### 📌 Notebook Structure

1. **Setup**  
   Import libraries, define paths, check device (CPU/GPU), and set config options.

2. **Data Ingestion**  
   Load and merge image metadata with tabular housing data using `bag_id` as key.

3. **Preprocessing**  
   - Clean and encode label data (woningtype)
   - Normalize selected tabular features
   - Build image paths
   - Filter to valid samples
   - Train/validation split

4. **Dataset & DataLoader**  
   Define a custom PyTorch `Dataset` class to return paired image + tabular tensors, and create `DataLoader`s.

5. **Model Definition** *(placeholder)*  
   - CNN backbone (e.g., ResNet)
   - MLP for tabular features
   - Fusion layer and classifier head

6. **Training Loop** *(placeholder)*  
   Define the training pipeline using F1, accuracy, and recall as evaluation metrics.

7. **Evaluation & Export** *(placeholder)*  
   - Generate predictions and metrics
   - Save trained model and results to disk

This structure is modular and can be adjusted as needed. The modeling and training sections will be filled in once data loading and preprocessing are finalized.


In [1]:
# --- Section 1: Setup ---
import os
import pandas as pd
from PIL import Image

from dotenv import load_dotenv

from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, models

import torch.nn.functional as F
from sklearn.metrics import f1_score, accuracy_score, recall_score

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

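# Resolve the data root from the FILE_PATH variable in .env
load_dotenv()
BASE_DIR = os.getenv('FILE_PATH')
if BASE_DIR is None:
    raise RuntimeError("FILE_PATH is not set; define it in .env or in the environment.")

# Paths
IMG_DIR = os.path.join(BASE_DIR, 'images')
TABULAR_PATH = os.path.join(BASE_DIR, 'detailed_woning_type_sample.csv')
IMG_META_PATH = os.path.join(BASE_DIR, 'bag_image_summary.csv')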

Using device: cuda


In [2]:
# print("CUDA available:", torch.cuda.is_available())
# print("CUDA devices:", torch.cuda.device_count())
# print("Current device:", torch.cuda.current_device())
# print("Device name:", torch.cuda.get_device_name(0) if torch.cuda.device_count() > 0 else "None")


In [3]:
# --- Section 2: Data Ingestion ---

# Koen is also working on code that can be integrated here

# Load tabular data
tabular_df = pd.read_csv(TABULAR_PATH)
img_meta = pd.read_csv(IMG_META_PATH)

# Normalize column names
tabular_df.columns = [col.strip().lower() for col in tabular_df.columns]
img_meta.columns = [col.strip().lower() for col in img_meta.columns]

# Merge
tabular_df = tabular_df.rename(columns={'bag_nummeraanduidingid': 'bag_id'})

df = tabular_df.merge(img_meta, on='bag_id', how='left')

# Filter: keep only samples with frontview image path
df = df[df['frontview_exists'] == True]

# TODO
# Add logic for the image file paths once the paths in 'bag_image_summary' have been corrected.
# Possibly drop IDs that have no photo.
# Decide whether to drop those IDs in the baseline tabular model as well.
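
The open TODO on path logic could be approached roughly as follows. This is a minimal sketch on toy data, not the project's final logic: the `<bag_id>.jpg` file naming, the `build_image_paths` helper, and the toy values are assumptions. Only `bag_id`, `frontview_exists`, and the `img_path` column (read later by the dataset class) come from this notebook.

```python
import os
import tempfile

import pandas as pd


def build_image_paths(df, img_dir, ext=".jpg"):
    """Add an 'img_path' column and keep only rows whose file exists on disk.

    Assumes (hypothetically) that each image is stored as <bag_id><ext>.
    """
    out = df.copy()
    out["img_path"] = [os.path.join(img_dir, f"{bag_id}{ext}") for bag_id in out["bag_id"]]
    exists = out["img_path"].map(os.path.exists)
    return out[exists].reset_index(drop=True)


# Toy stand-ins for the two CSV files
tabular_toy = pd.DataFrame({"bag_id": [1, 2, 3], "woningtype": ["a", "b", "a"]})
img_meta_toy = pd.DataFrame({"bag_id": [1, 2], "frontview_exists": [True, True]})

# indicator=True adds a '_merge' column showing which rows found image metadata
merged = tabular_toy.merge(img_meta_toy, on="bag_id", how="left", indicator=True)
n_without_meta = int((merged["_merge"] == "left_only").sum())

with tempfile.TemporaryDirectory() as img_dir:
    # Only bag_id 1 has a file on disk in this toy setup
    open(os.path.join(img_dir, "1.jpg"), "wb").close()
    valid = build_image_paths(merged[merged["frontview_exists"] == True], img_dir)
    valid_ids = valid["bag_id"].tolist()

print("Rows without image metadata:", n_without_meta)
print("bag_ids with an image on disk:", valid_ids)
```

Counting the `left_only` rows before filtering shows how many samples the multimodal model would lose, which helps with the question of whether the tabular baseline should drop the same IDs.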

In [None]:
# --- Section 3: Preprocessing ---

# TODO
# Apply preprocessing and normalization to the data (Koen).

# TODO
# Add the train, validation, and test split.
# For example:
# ROSA'S CODE:

# 60% train; the remaining 40% is split evenly into validation and test (20% each)
test_size_train = 0.4
test_size_val = 0.5
rnd_state = 42

train_df, temp_df = train_test_split(df, test_size=test_size_train, random_state=rnd_state)
val_df, test_df = train_test_split(temp_df, test_size=test_size_val, random_state=rnd_state)

# Optional: write the splits to CSV. train_df, val_df, and test_df are already available in memory.

# train_df.to_csv("train.csv", index=False)
# val_df.to_csv("val.csv", index=False)
# test_df.to_csv("test.csv", index=False)


print("Train size:", len(train_df), "Val size:", len(val_df), "Num classes:", df['woningtype'].nunique())
 

Train size: 4235 Val size: 1412 Num classes: 15
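
The preprocessing that is still marked TODO (label encoding, normalization, and the split) might look roughly like this. It is a minimal sketch on toy data under stated assumptions: the toy values are invented, the use of `LabelEncoder`, `StandardScaler`, and stratified splitting is one possible choice rather than the project's decided approach, and the column names mirror the placeholders used elsewhere in this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame; column names mirror the placeholders used in this notebook
toy = pd.DataFrame({
    "woningtype": ["hoekwoning", "tussenwoning", "vrijstaand"] * 10,
    "build_year": [1900.0 + i * 4 for i in range(30)],
    "oppervlakte": [60.0 + i * 3 for i in range(30)],
})

# Encode the string classes as integer indices 0..n_classes-1
encoder = LabelEncoder()
toy["label"] = encoder.fit_transform(toy["woningtype"])

# Stratify so every split keeps the class proportions
train_toy, temp_toy = train_test_split(toy, test_size=0.4, random_state=42, stratify=toy["label"])
val_toy, test_toy = train_test_split(temp_toy, test_size=0.5, random_state=42, stratify=temp_toy["label"])
train_toy, val_toy, test_toy = train_toy.copy(), val_toy.copy(), test_toy.copy()

# Fit the scaler on the training split only, then reuse it for the other splits
num_cols = ["build_year", "oppervlakte"]
scaler = StandardScaler().fit(train_toy[num_cols])
for part in (train_toy, val_toy, test_toy):
    part[num_cols] = scaler.transform(part[num_cols])

print(len(train_toy), len(val_toy), len(test_toy))
print(list(encoder.classes_))
```

Fitting the scaler on the training split alone keeps validation and test statistics out of training. `encoder.classes_` also gives the class names in label-index order, which is the order a classification report needs.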


In [5]:
# --- Section 4: Dataset and DataLoader ---


class HousingDataset(Dataset):
    """Pairs each front-view image with its tabular features and class label.

    Expects 'img_path' and 'label' columns, which the preprocessing step must create.
    """

    def __init__(self, df, tabular_features, transform=None):
        self.df = df.reset_index(drop=True)
        self.transform = transform
        self.tabular_features = tabular_features

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]

        # Keep RGB: most pretrained models in these packages were trained on RGB input
        img = Image.open(row['img_path']).convert('RGB')
        if self.transform:
            img = self.transform(img)

        # Tabular features tensor; cast explicitly because a mixed-dtype row yields an object array
        tab_feats = torch.tensor(row[self.tabular_features].to_numpy(dtype='float32'))

        # Label tensor (integer class index, as expected by CrossEntropyLoss)
        label = torch.tensor(row['label'], dtype=torch.long)

        return img, tab_feats, label

# TODO
# PLACEHOLDER FEATURES: these depend on Koen's preprocessing and, above all, on the encoding
numeric_features = ['build_year', 'oppervlakte', 'is_monument']
categorical_features = ['encoded_build_type', 'encoded_postcode']
###############################################################################

# Combined features list to be used in the dataset
tabular_features = numeric_features + categorical_features

# Define transforms for images
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # ImageNet RGB channel statistics, matching the pretrained backbone
])

val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # ImageNet RGB channel statistics, matching the pretrained backbone
])

# Initialize datasets passing all tabular features
train_dataset = HousingDataset(train_df, tabular_features, transform=train_transforms)
val_dataset = HousingDataset(val_df, tabular_features, transform=val_transforms)
test_dataset = HousingDataset(test_df, tabular_features, transform=val_transforms)

batch_size = 64  # Adjust to fit GPU memory

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)
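
To see what the loaders hand to the training loop, the batching can be tried on random stand-in tensors. This is only an illustrative sketch: `TensorDataset` replaces `HousingDataset` here, and the sizes (10 samples, 5 tabular features, 15 classes, batch size 4) are arbitrary assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins: 10 "images", 5 tabular features each, 15 classes
images = torch.randn(10, 3, 224, 224)
tab_feats = torch.randn(10, 5)
labels = torch.randint(0, 15, (10,))

toy_loader = DataLoader(TensorDataset(images, tab_feats, labels), batch_size=4, shuffle=False)

# Each batch is a triple: (images, tabular features, labels)
batch_shapes = [(tuple(img.shape), tuple(tab.shape), tuple(lab.shape)) for img, tab, lab in toy_loader]
for shapes in batch_shapes:
    print(shapes)
```

The last batch is smaller than the rest whenever the dataset size is not a multiple of the batch size. If that remainder is ever a single sample, `nn.BatchNorm1d` raises an error in training mode, so `drop_last=True` on the training loader may be worth considering.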


In [None]:
# --- Section 5: Model Definition ---

###### THE MODELS WILL BE TUNED LATER; FOR NOW THIS ONLY SERVES AS A PIPELINE TEMPLATE #####

class MultimodalHousingClassifier(nn.Module):
    def __init__(self, tabular_input_dim, num_classes, cnn_output_dim=512, tabular_emb_dim=128, pretrained=True):
        super().__init__()
        
        # CNN backbone: ResNet18 without its final FC layer
        # TODO: this backbone can be swapped for e.g. EfficientNet
        # TODO: investigate which pretrained weights to use
        weights = models.ResNet18_Weights.IMAGENET1K_V1 if pretrained else None
        resnet = models.resnet18(weights=weights)
        # Remove last FC layer
        self.cnn_backbone = nn.Sequential(*list(resnet.children())[:-1])  # output: (batch, 512, 1, 1)
        
        # Width of the flattened CNN output; 512 for ResNet18
        self.cnn_output_dim = cnn_output_dim
        
        # MLP for tabular features
        # TODO: this branch can be swapped for e.g. TabNet
        self.tabular_mlp = nn.Sequential(
            nn.Linear(tabular_input_dim, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Linear(256, tabular_emb_dim),
            nn.ReLU(),
        )
        
        # Fusion layer: concatenate CNN + tabular embeddings
        fusion_input_dim = cnn_output_dim + tabular_emb_dim
        self.classifier = nn.Sequential(
            nn.Linear(fusion_input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes)
        )
        
    def forward(self, image, tabular_data):
        # Image forward pass through CNN backbone
        cnn_features = self.cnn_backbone(image)  # (batch, 512, 1, 1)
        cnn_features = cnn_features.view(-1, self.cnn_output_dim)  # flatten
        
        # Tabular features through MLP
        tabular_features = self.tabular_mlp(tabular_data)
        
        # Concatenate features
        combined = torch.cat([cnn_features, tabular_features], dim=1)
        
        # Classification head
        output = self.classifier(combined)
        return output
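
The fusion step can be checked in isolation without downloading pretrained weights. The sketch below uses a hypothetical stand-in, `TinyFusionNet`, with the same late-fusion layout (image branch, tabular branch, concatenation, classifier). The layer sizes and input shapes are assumptions chosen to keep it fast, and it is not the ResNet-based model defined in this cell.

```python
import torch
import torch.nn as nn


class TinyFusionNet(nn.Module):
    """Stand-in with the same late-fusion layout; not the notebook's ResNet model."""

    def __init__(self, tabular_input_dim, num_classes, img_emb_dim=16, tab_emb_dim=8):
        super().__init__()
        # Small conv stack ending in global average pooling: (B, img_emb_dim, 1, 1)
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, img_emb_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.tabular_branch = nn.Sequential(nn.Linear(tabular_input_dim, tab_emb_dim), nn.ReLU())
        self.classifier = nn.Linear(img_emb_dim + tab_emb_dim, num_classes)

    def forward(self, image, tabular_data):
        img_emb = torch.flatten(self.image_branch(image), start_dim=1)  # (B, img_emb_dim)
        tab_emb = self.tabular_branch(tabular_data)                     # (B, tab_emb_dim)
        fused = torch.cat([img_emb, tab_emb], dim=1)                    # (B, img_emb_dim + tab_emb_dim)
        return self.classifier(fused)                                   # (B, num_classes)


net = TinyFusionNet(tabular_input_dim=5, num_classes=15).eval()
with torch.no_grad():
    logits = net(torch.randn(2, 3, 32, 32), torch.randn(2, 5))
print(tuple(logits.shape))
```

`torch.flatten(x, start_dim=1)` keeps the batch dimension fixed, so a mismatch between the expected and actual feature width surfaces as a shape error in the next layer instead of silently changing the batch size.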


In [None]:
# --- Section 6: Typical Training Loop ---

from tqdm import tqdm

def train_one_epoch(model, dataloader, optimizer, criterion, device, log_interval=10):
    model.train()
    running_loss = 0
    all_preds = []
    all_labels = []

    # Wrap enumerate() so each iteration yields (batch_idx, batch) and tqdm still knows the total
    loop = tqdm(enumerate(dataloader), total=len(dataloader), leave=True, dynamic_ncols=True)

    print("Starting training loop...", flush=True)
    for batch_idx, (images, tabular_data, labels) in loop:

        images = images.to(device)
        tabular_data = tabular_data.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(images, tabular_data)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        preds = torch.argmax(outputs, dim=1).cpu().numpy()
        all_preds.extend(preds)
        all_labels.extend(labels.cpu().numpy())

        if batch_idx % log_interval == 0:
            current_loss = loss.item()
            current_acc = (preds == labels.cpu().numpy()).mean()
            print(f"[Batch {batch_idx}/{len(dataloader)}] Loss: {current_loss:.4f} | Acc: {current_acc:.4f}")

        loop.set_description(f"Train [{batch_idx+1}/{len(dataloader)}]")
        loop.set_postfix(loss=loss.item())

    epoch_loss = running_loss / len(dataloader.dataset)
    epoch_acc = accuracy_score(all_labels, all_preds)
    epoch_f1 = f1_score(all_labels, all_preds, average='weighted')
    epoch_recall = recall_score(all_labels, all_preds, average='weighted')

    return epoch_loss, epoch_acc, epoch_f1, epoch_recall

def validate_one_epoch(model, dataloader, criterion, device, log_interval=10):
    model.eval()
    running_loss = 0
    all_preds = []
    all_labels = []

    with torch.no_grad():
        loop = tqdm(enumerate(dataloader), total=len(dataloader), leave=False)
        for batch_idx, (images, tabular_data, labels) in loop:
            images = images.to(device)
            tabular_data = tabular_data.to(device)
            labels = labels.to(device)

            outputs = model(images, tabular_data)
            loss = criterion(outputs, labels)

            running_loss += loss.item() * images.size(0)
            preds = torch.argmax(outputs, dim=1).cpu().numpy()
            all_preds.extend(preds)
            all_labels.extend(labels.cpu().numpy())

            if batch_idx % log_interval == 0:
                current_loss = loss.item()
                current_acc = (preds == labels.cpu().numpy()).mean()
                print(f"[Val Batch {batch_idx}/{len(dataloader)}] Loss: {current_loss:.4f} | Acc: {current_acc:.4f}")

            loop.set_description(f"Val [{batch_idx+1}/{len(dataloader)}]")
            loop.set_postfix(loss=loss.item())

    epoch_loss = running_loss / len(dataloader.dataset)
    epoch_acc = accuracy_score(all_labels, all_preds)
    epoch_f1 = f1_score(all_labels, all_preds, average='weighted')
    epoch_recall = recall_score(all_labels, all_preds, average='weighted')

    return epoch_loss, epoch_acc, epoch_f1, epoch_recall


def train_model(model, train_loader, val_loader, optimizer, criterion, device, num_epochs=10):
    best_val_f1 = 0

    for epoch in range(1, num_epochs + 1):
        train_loss, train_acc, train_f1, train_recall = train_one_epoch(model, train_loader, optimizer, criterion, device)
        val_loss, val_acc, val_f1, val_recall = validate_one_epoch(model, val_loader, criterion, device)

        print(f"Epoch {epoch}/{num_epochs}", flush=True)
        print(f"Train Loss: {train_loss:.4f}, Acc: {train_acc:.4f}, F1: {train_f1:.4f}, Recall: {train_recall:.4f}")
        print(f"Val   Loss: {val_loss:.4f}, Acc: {val_acc:.4f}, F1: {val_f1:.4f}, Recall: {val_recall:.4f}")

        # Save best model based on validation F1
        if val_f1 > best_val_f1:
            best_val_f1 = val_f1
            torch.save(model.state_dict(), 'best_model.pth')
            print("Saved best model")

    print("Training complete.")
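
Both loops report F1 and recall with `average='weighted'`. With 15 housing types that may be imbalanced, it can help to see how that choice behaves next to `average='macro'`. The toy labels below are invented for illustration, and `zero_division=0` is set only to silence the warning for the class that is never predicted.

```python
from sklearn.metrics import f1_score

# Toy case: 8 samples of class 0, 2 of class 1, and a model that always predicts class 0
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"weighted F1: {weighted_f1:.3f}")
print(f"macro F1:    {macro_f1:.3f}")
```

The weighted score comes out near 0.71 while the macro score is about 0.44, because macro averaging gives the ignored minority class the same weight as the majority class. Weighted recall is also always equal to accuracy, so the recall column printed by these loops adds no information beyond the accuracy column; macro recall would.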


In [None]:
# --- Section 7: Evaluation & Export ---

import torch
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

def evaluate_model(model, dataloader, device, class_names=None):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for images, tabular_data, labels in dataloader:
            images = images.to(device)
            tabular_data = tabular_data.to(device)
            labels = labels.to(device)

            outputs = model(images, tabular_data)
            preds = torch.argmax(outputs, dim=1).cpu().numpy()
            all_preds.extend(preds)
            all_labels.extend(labels.cpu().numpy())

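    # Pin the label set so the report and matrix stay aligned with class_names
    # even when some classes are absent from this split (assumes labels are 0..n_classes-1)
    label_ids = list(range(len(class_names))) if class_names is not None else None

    print("Classification Report:")
    print(classification_report(all_labels, all_preds, labels=label_ids, target_names=class_names, zero_division=0))

    cm = confusion_matrix(all_labels, all_preds, labels=label_ids)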
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt="d", cmap='Blues', xticklabels=class_names, yticklabels=class_names)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

def save_model(model, path='trained_model.pth'):
    torch.save(model.state_dict(), path)
    print(f"Model saved to {path}")

def load_model(model, path='trained_model.pth', device='cpu'):
    model.load_state_dict(torch.load(path, map_location=device))
    model.to(device)
    model.eval()
    print(f"Model loaded from {path}")


In [None]:
# --- Section 8: Test the Full Pipeline ---

# Setup model parameters
num_numeric = len(numeric_features)
num_categorical = len(categorical_features)
tabular_input_dim = num_numeric + num_categorical
num_classes = df['woningtype'].nunique()

# Initialize model and move to device
model = MultimodalHousingClassifier(tabular_input_dim=tabular_input_dim, num_classes=num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Quick test: run one epoch of training
train_loss, train_acc, train_f1, train_recall = train_one_epoch(model, train_loader, optimizer, criterion, device)
print(f"Quick Train Epoch -> Loss: {train_loss:.4f}, Acc: {train_acc:.4f}, F1: {train_f1:.4f}, Recall: {train_recall:.4f}")

# Quick test: validate
val_loss, val_acc, val_f1, val_recall = validate_one_epoch(model, val_loader, criterion, device)
print(f"Quick Validation -> Loss: {val_loss:.4f}, Acc: {val_acc:.4f}, F1: {val_f1:.4f}, Recall: {val_recall:.4f}")

# Evaluate model with detailed metrics and confusion matrix
# Note: this assumes the integer labels follow the sorted order of the woningtype values
class_names = sorted(df['woningtype'].unique())
evaluate_model(model, val_loader, device, class_names=class_names)

# Save model checkpoint
save_model(model, 'final_model.pth')

# Optionally load the model and test on test set
load_model(model, 'final_model.pth', device=device)
evaluate_model(model, test_loader, device, class_names=class_names)
