# Test-Time Inference and Submission Generation

This notebook performs end-to-end inference on the test dataset using the
final multimodal model. Satellite image embeddings are extracted with a
pretrained CNN (ResNet-18) and combined with tabular features to generate
property price predictions for submission.

## 1. Setup and Imports

We import libraries required for image processing, feature extraction,
model loading, and submission file generation.

In [1]:
import os
import numpy as np
import pandas as pd
from tqdm import tqdm

import torch
import torch.nn as nn
from PIL import Image

from torchvision import models, transforms
from torchvision.models import ResNet18_Weights

import joblib

## 2. Loading Test Data

The raw tabular test dataset is loaded, and the directory containing the
satellite images downloaded earlier is defined.

In [2]:
TEST_IMG_DIR = "../data/images/test"
TEST_TAB_PATH = "../data/raw/test.csv"

test_df = pd.read_csv(TEST_TAB_PATH)

print("Test data shape:", test_df.shape)
test_df.head()

Test data shape: (5404, 20)


| | id | date | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2591820310 | 20141006T000000 | 4 | 2.25 | 2070 | 8893 | 2.0 | 0 | 0 | 4 | 8 | 2070 | 0 | 1986 | 0 | 98058 | 47.4388 | -122.162 | 2390 | 7700 |
| 1 | 7974200820 | 20140821T000000 | 5 | 3.0 | 2900 | 6730 | 1.0 | 0 | 0 | 5 | 8 | 1830 | 1070 | 1977 | 0 | 98115 | 47.6784 | -122.285 | 2370 | 6283 |
| 2 | 7701450110 | 20140815T000000 | 4 | 2.5 | 3770 | 10893 | 2.0 | 0 | 2 | 3 | 11 | 3770 | 0 | 1997 | 0 | 98006 | 47.5646 | -122.129 | 3710 | 9685 |
| 3 | 9522300010 | 20150331T000000 | 3 | 3.5 | 4560 | 14608 | 2.0 | 0 | 2 | 3 | 12 | 4560 | 0 | 1990 | 0 | 98034 | 47.6995 | -122.228 | 4050 | 14226 |
| 4 | 9510861140 | 20140714T000000 | 3 | 2.5 | 2550 | 5376 | 2.0 | 0 | 0 | 3 | 9 | 2550 | 0 | 2004 | 0 | 98052 | 47.6647 | -122.083 | 2250 | 4050 |


The output confirms that predictions must be generated for 5,404 test
rows, each with 20 tabular columns.

## 3. CNN Feature Extractor

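The same pretrained ResNet-18 architecture used during training is
initialized to extract satellite image embeddings for the test set. Its
final fully connected layer is replaced with `nn.Identity()`, so the
network returns the 512-dimensional pooled feature vector instead of
ImageNet class scores, and `eval()` switches batch normalization to
inference mode.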

In [3]:
cnn = models.resnet18(weights=ResNet18_Weights.DEFAULT)
cnn.fc = nn.Identity()
cnn.eval()

Output: the `ResNet` module summary (`conv1`, `bn1`, `relu`, `maxpool`, `layer1`, …), truncated in this export.

## 4. Image Preprocessing

Satellite images are resized and normalized using ImageNet statistics
to ensure consistency with the training pipeline.

In [4]:
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
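`ToTensor` scales 8-bit pixel values to [0, 1] and moves channels first, and `Normalize` then computes (x - mean) / std for each channel. The minimal NumPy sketch below reproduces that arithmetic for a pure white and a pure black image; the helper name `imagenet_normalize` is hypothetical and mirrors the math only, not torchvision's implementation (the `Resize` step is omitted).

```python
import numpy as np

# ImageNet channel statistics, as used in the transform above
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def imagenet_normalize(pixels_uint8):
    """Hypothetical helper: mimic ToTensor + Normalize for an H x W x 3 uint8 array."""
    x = pixels_uint8.astype(np.float32) / 255.0  # ToTensor: scale to [0, 1]
    x = (x - MEAN) / STD                         # Normalize: per-channel standardization
    return x.transpose(2, 0, 1)                  # ToTensor also puts channels first (C, H, W)

white = np.full((2, 2, 3), 255, dtype=np.uint8)
black = np.zeros((2, 2, 3), dtype=np.uint8)
print(imagenet_normalize(white)[:, 0, 0])  # approx. [2.249, 2.429, 2.640]
print(imagenet_normalize(black)[:, 0, 0])  # approx. [-2.118, -2.036, -1.804]
```

Normalized inputs therefore lie roughly in [-2.1, 2.6] rather than [0, 255], which is the range the pretrained ResNet-18 weights were fitted on.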

## 5. Robust Embedding Extraction

A safe extraction function wraps image loading and the forward pass in a
`try`/`except` block, so a missing or corrupted image returns `None`
instead of stopping inference.

In [5]:
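def extract_embedding_safe(img_path):
    """Return the 512-d ResNet-18 embedding for one image, or None on failure."""
    try:
        img = Image.open(img_path).convert("RGB")  # missing/corrupted files raise here
        x = transform(img).unsqueeze(0)            # add batch dim: (1, 3, 224, 224)

        with torch.no_grad():                      # inference only, no gradient tracking
            emb = cnn(x).squeeze(0).numpy()        # (1, 512) -> (512,)

        return emb
    except Exception:
        # Catch ordinary errors only; a bare `except:` would also swallow
        # KeyboardInterrupt and make the loop impossible to stop cleanly.
        return None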

## 6. Extracting Test Image Embeddings

Satellite image embeddings are extracted for each row of the test
dataset, with the image located by its `id`. Only IDs whose image was
processed successfully are kept, together with their embeddings.

In [6]:
embeddings = []
valid_ids = []

for _, row in tqdm(test_df.iterrows(), total=len(test_df)):
    img_path = f"{TEST_IMG_DIR}/{row['id']}.png"
    emb = extract_embedding_safe(img_path)

    if emb is not None:
        embeddings.append(emb)
        valid_ids.append(row["id"])

100%|███████████████████████████████████████| 5404/5404 [01:11<00:00, 75.74it/s]
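The loop above skips rows whose image fails to load without recording which ones they were. A small variation keeps a list of failed IDs as well, which makes it easy to report or impute them later. This is a sketch: the function `collect_embeddings` is hypothetical, and the extractor is passed in as an argument so the logic can be exercised with a stand-in instead of the real CNN.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

def collect_embeddings(
    ids: Iterable[int],
    extract: Callable[[str], Optional[Any]],
    img_dir: str,
) -> Tuple[List[Any], List[int], List[int]]:
    """Run `extract` on each image path; return embeddings, matching ids, and failed ids."""
    embeddings, valid_ids, failed_ids = [], [], []
    for pid in ids:
        emb = extract(f"{img_dir}/{pid}.png")  # same naming scheme as the loop above
        if emb is None:
            failed_ids.append(pid)             # record the failure instead of dropping silently
        else:
            embeddings.append(emb)
            valid_ids.append(pid)
    return embeddings, valid_ids, failed_ids

def fake_extract(path):
    # Stand-in extractor: pretend the image for id 2 is corrupted
    return None if path.endswith("/2.png") else [0.0, 1.0]

embs, ok, failed = collect_embeddings([1, 2, 3], fake_extract, "imgs")
print(ok, failed)  # [1, 3] [2]
```

In the notebook, `extract_embedding_safe` would take the place of `fake_extract`, and a non-empty `failed` list would show which test rows end up without an image embedding.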


## 7. Creating the Embedding Matrix

Extracted embeddings are stored in a DataFrame with one 512-dimensional
vector per property.

In [7]:
emb_df = pd.DataFrame(
    embeddings,
    columns=[f"img_emb_{i}" for i in range(512)]
)

emb_df["id"] = valid_ids

print("Test embeddings shape:", emb_df.shape)

Test embeddings shape: (5404, 513)
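Because every row must be a 512-dimensional vector, it can help to check the stacked matrix before naming the columns. A minimal sketch, where the helper `build_embedding_frame` is hypothetical and not part of the original notebook:

```python
import numpy as np
import pandas as pd

def build_embedding_frame(embeddings, ids, dim=512):
    """Hypothetical helper: stack 1-D embedding vectors into a DataFrame keyed by id."""
    mat = np.vstack(embeddings)  # shape (n, dim); raises ValueError if lengths differ
    if mat.shape[1] != dim:
        raise ValueError(f"expected {dim}-dim embeddings, got {mat.shape[1]}")
    if len(ids) != mat.shape[0]:
        raise ValueError("number of ids does not match number of embeddings")
    df = pd.DataFrame(mat, columns=[f"img_emb_{i}" for i in range(dim)])
    df["id"] = list(ids)
    return df

emb = build_embedding_frame([np.ones(512), np.zeros(512)], [101, 202])
print(emb.shape)  # (2, 513)
```

The extra `id` column accounts for the 513th column, matching the shape reported above.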


## 8. Integrity Check: Duplicate Embeddings

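We explicitly check whether any property ID appears more than once in the
embedding table. Since embeddings are extracted row by row from
`test.csv`, a repeated ID here means that the same ID occurs in more than
one test row.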

In [8]:
dup_ids = emb_df["id"][emb_df["id"].duplicated()]
print("Duplicate embedding IDs:", dup_ids.nunique())

Duplicate embedding IDs: 8


The 8 repeated IDs are dropped, keeping the first occurrence, so that
each property ID maps to exactly one feature vector. Rows that share an
ID read the same image file, so no image information is lost.

In [9]:
emb_df = emb_df.drop_duplicates(subset="id", keep="first")

print("Embeddings after dedup:", emb_df.shape)
print("Unique embedding IDs:", emb_df["id"].nunique())

Embeddings after dedup: (5396, 513)
Unique embedding IDs: 5396
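`keep="first"` is only lossless if rows with the same ID really carry the same embedding. The sketch below, built around a hypothetical helper `dedup_identical` and a made-up toy frame, checks that before dropping. It compares values exactly, which assumes the CNN produces identical outputs for the same image on repeated runs; a tolerance-based comparison would be needed otherwise.

```python
import pandas as pd

def dedup_identical(emb_df, key="id"):
    """Drop repeated ids, but only after confirming their embedding rows are identical."""
    feature_cols = [c for c in emb_df.columns if c != key]
    dup_rows = emb_df[emb_df[key].duplicated(keep=False)]  # every row whose id repeats
    # Count distinct values per column within each repeated id; 1 everywhere means identical
    distinct = dup_rows.groupby(key)[feature_cols].nunique()
    if (distinct > 1).any().any():
        raise ValueError("repeated ids carry different embeddings")
    return emb_df.drop_duplicates(subset=key, keep="first")

toy = pd.DataFrame({"img_emb_0": [0.1, 0.2, 0.1],
                    "img_emb_1": [0.5, 0.6, 0.5],
                    "id": [7, 8, 7]})
print(dedup_identical(toy))
```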


## 9. Aligning Tabular and Visual Features

The tabular test dataset is filtered to the properties that have a valid
satellite image embedding. All 5,404 rows are retained, which indicates
that every test image was processed successfully.

In [10]:
test_df_mm = test_df[test_df["id"].isin(valid_ids)].copy()

print("Filtered test data shape:", test_df_mm.shape)

Filtered test data shape: (5404, 20)


## 10. Constructing the Multimodal Test Dataset

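Tabular features and image embeddings are merged on `id` to form the
final multimodal feature matrix for inference. Because the embedding
table now has one row per ID, the inner merge attaches that embedding to
every test row with the ID, including repeated ones, so the result again
has 5,404 rows with 20 tabular and 512 embedding columns (532 in total).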

In [11]:
full_test_df = test_df_mm.merge(emb_df, on="id", how="inner")

print("Final test multimodal shape:", full_test_df.shape)

Final test multimodal shape: (5404, 532)
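The deduplication in Section 8 is what keeps this merge from multiplying rows. A toy sketch (the frames are made up for illustration) shows the effect and uses the `validate` argument of `DataFrame.merge` to make the one-embedding-per-ID guarantee explicit:

```python
import pandas as pd

tab = pd.DataFrame({"id": [1, 1, 2], "date": ["2014-05", "2015-01", "2014-07"]})
emb = pd.DataFrame({"id": [1, 1, 2], "img_emb_0": [0.3, 0.3, 0.8]})

# Without deduplicating the right side, id 1 matches 2 x 2 = 4 times (plus 1 row for id 2)
print(len(tab.merge(emb, on="id", how="inner")))  # 5

# validate="many_to_one" raises if the right-hand keys are not unique
try:
    tab.merge(emb, on="id", how="inner", validate="many_to_one")
except pd.errors.MergeError as err:
    print("MergeError:", err)

# After deduplication, each tabular row receives exactly one embedding
merged = tab.merge(emb.drop_duplicates("id"), on="id", how="inner", validate="many_to_one")
print(len(merged))  # 3
```

Adding `validate="many_to_one"` to the notebook's own merge would turn a silent row explosion into an immediate error.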


## 11. Preparing Features for Prediction

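Non-predictive columns, namely the `id` identifier and the `date`
timestamp, are dropped before the features are passed to the trained
model. This leaves 530 columns: 18 tabular features and 512 image
embedding dimensions.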

In [12]:
X_test = full_test_df.drop(columns=["id", "date"])

print("X_test shape:", X_test.shape)

X_test shape: (5404, 530)
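The model consumes features as a matrix, so the test columns should have the same names and order as the training features. This notebook does not show the training column list, so `train_cols` below is hypothetical (it could, for example, be saved alongside the model at training time), as is the helper `align_features`:

```python
import pandas as pd

def align_features(X, train_cols):
    """Reorder X to the training column order; fail loudly on missing or extra columns."""
    missing = [c for c in train_cols if c not in X.columns]
    extra = [c for c in X.columns if c not in train_cols]
    if missing or extra:
        raise ValueError(f"missing={missing}, extra={extra}")
    return X[list(train_cols)]

# Hypothetical training column order (the real list would have 530 entries)
train_cols = ["bedrooms", "sqft_living", "img_emb_0"]
X = pd.DataFrame({"img_emb_0": [0.2], "bedrooms": [3], "sqft_living": [1500]})
print(align_features(X, train_cols).columns.tolist())
```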


## 12. Loading the Trained Multimodal Model

The trained XGBoost multimodal model is loaded from disk for test-time
inference.

In [13]:
xgb_mm = joblib.load("../models/xgb_multimodal.pkl")

## 13. Generating Test Predictions

The multimodal model is used to predict property prices for the test set.

In [14]:
test_preds = xgb_mm.predict(X_test)

test_preds[:5]

array([ 347365.2 ,  805935.25, 1107798.5 , 1927297.8 ,  714824.06],
      dtype=float32)
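Before building the submission, a few cheap checks on the raw prediction array can catch shape or numeric problems early. A minimal sketch: the helper `check_predictions` is hypothetical, and requiring strictly positive prices is an assumption about what counts as a sane output.

```python
import numpy as np

def check_predictions(preds, n_expected):
    """Basic sanity checks on predicted prices; returns (min, max) if all pass."""
    preds = np.asarray(preds)
    if preds.shape != (n_expected,):
        raise ValueError(f"expected {n_expected} predictions, got shape {preds.shape}")
    if not np.isfinite(preds).all():
        raise ValueError("predictions contain NaN or inf")
    if not (preds > 0).all():
        raise ValueError("non-positive price predicted")  # assumption: prices must be > 0
    return float(preds.min()), float(preds.max())

sample = np.array([347365.2, 805935.25, 1107798.5], dtype=np.float32)
print(check_predictions(sample, 3))
```

In the notebook this would be called as `check_predictions(test_preds, len(X_test))`.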

## 14. Creating the Submission File

Predictions are combined with property IDs to create the final
submission file in the required format.

In [15]:
submission = pd.DataFrame({
    "id": full_test_df["id"],
    "predicted_price": test_preds
})

submission.head()

| | id | predicted_price |
|---|---|---|
| 0 | 2591820310 | 347365.2 |
| 1 | 7974200820 | 805935.2 |
| 2 | 7701450110 | 1107798.0 |
| 3 | 9522300010 | 1927298.0 |
| 4 | 9510861140 | 714824.1 |


## 15. Saving Predictions

The submission file is saved as a CSV for upload to the evaluation portal.

In [16]:
SUB_PATH = "../submissions/23119016_final.csv"
submission.to_csv(SUB_PATH, index=False)

print("Saved submission to:", SUB_PATH)

Saved submission to: ../submissions/23119016_final.csv


## 16. Final Validation Checks

We verify the submission shape and ensure that no missing values
are present in the prediction file.

In [17]:
submission.shape

(5404, 2)

In [18]:
submission.isnull().sum()

id                 0
predicted_price    0
dtype: int64
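The checks above confirm the shape and the absence of nulls. Comparing the submission against `test_df` can additionally confirm that rows appear in the same ID order as the test file (pandas' inner merge preserves the order of the left keys). A minimal sketch, with the helper `validate_submission` and the toy frames being hypothetical:

```python
import numpy as np
import pandas as pd

def validate_submission(submission, test_df):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(submission) != len(test_df):
        problems.append(f"row count {len(submission)} != {len(test_df)}")
    elif not np.array_equal(submission["id"].to_numpy(), test_df["id"].to_numpy()):
        problems.append("ids are not in the same order as the test set")
    prices = submission["predicted_price"].to_numpy(dtype=float)
    if not np.isfinite(prices).all():
        problems.append("predictions contain NaN or inf")
    return problems

test = pd.DataFrame({"id": [11, 22, 22]})
sub = pd.DataFrame({"id": [11, 22, 22], "predicted_price": [3.5e5, 8.1e5, 8.1e5]})
print(validate_submission(sub, test))  # []
```

For the real files this would be `validate_submission(submission, test_df)`.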

## Summary

This notebook completes the end-to-end multimodal inference pipeline.
Satellite image embeddings and tabular features are combined to generate
price predictions for all 5,404 test rows. The resulting submission file
has been checked for the expected shape and for missing values and is
ready for evaluation.