**V2 Testing**
# Critical implementation bugs (fix first)


1. pydicom API misuse

- AttributeError: module 'pydicom' has no attribute 'read_file'. Use the supported API: pydicom.dcmread(...) (older read_file deprecations).
- Verify pydicom version and use dcmread. Also use .pixel_array only after confirming hasattr.

2. Random projection inside forward pass (major bug)

- tab_proj = F.linear(tab_expanded, torch.randn(1024, 512).to(images.device))
- You are using a new random matrix every forward pass. That destroys learning of cross-modal attention. Replace with a trainable nn.Linear(512, 1024) (initialized once). This is probably the single biggest structural bug.

3. Inconsistent sigma / scale handling and arbitrary multipliers

- In training you clamp sigma = exp(log_var/2) to min 2.0 — while evaluation uses sigma floor 70.0. In submission you set confidence_val = max(exp(log_var/2) * 70, 70) (why multiply by 70?). This mismatch creates meaningless confidences and ruins LLL.
- Fix: keep internal sigma in natural units of FVC; do not multiply by arbitrary constants. Only apply the contest evaluation floor (70) at scoring time — not during training.

4. Submission pipeline clipping hides bug

- You clamp predictions to [800,6000] during submission — all predictions hitting 800 likely reflect upstream underprediction or wrong scale. Remove clipping while debugging; investigate raw predictions distribution first.

5. Hard-coded patient exclusions & path issues

- Filtering out two patient IDs silently in dataset init is suspicious. Document why or remove. Ensure mapping patient → image files is correct and complete.

6. ModelWithConfidence referenced but not defined → NameError later. Keep code consistent.

In [None]:
# Downgrade numpy and scipy to compatible versions for scikit-learn and scipy
!pip install --quiet numpy==1.26.4 scipy==1.13.0
import os
os._exit(00)  # Force kernel restart after pip install

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.0/61.0 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.6/60.6 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.3/18.3 MB[0m [31m90.1 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m38.6/38.6 MB[0m [31m46.8 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 2.8.0 requires google-cloud-bigquery-storage<3.0.0,>=2.30.0, which is not installed.
pylibjpeg-libjpeg 2.3.0 requires numpy<3.0,>=2.0, but you have numpy 1.26.4 which is incompatible.
datasets 3.6.0 requires fsspec[http]<=2025.3.0,>=2023.1.0, but you have fsspec 2025.5.1 which is incompa

In [3]:
# =============================================================================
# Revised Implementation: Tabular + Image Model with Laplace Log-Likelihood Loss
# =============================================================================

# Imports
import os
import cv2
import pydicom
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GroupKFold
import albumentations as albu
from albumentations.pytorch import ToTensorV2
import math
import random
from pathlib import Path

# Set device
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {DEVICE}")

# Seed for reproducibility
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

seed_everything(42)

# =============================================================================
# Load Data and Basic Processing
# =============================================================================

DATA_DIR = Path("/kaggle/input/osic-pulmonary-fibrosis-progression")
TRAIN_DIR = DATA_DIR / "train"
TRAIN_CSV = DATA_DIR / "train.csv"

# Load training csv
train_df = pd.read_csv(TRAIN_CSV)

# Compute baseline features for each patient
baseline_features = {}
patient_data_dict = {}
for patient in train_df['Patient'].unique():
    patient_data = train_df[train_df['Patient'] == patient].sort_values('Weeks')
    baseline = patient_data.iloc[0]
    patient_data_dict[patient] = patient_data
    baseline_features[patient] = {
        'Age': baseline['Age'],
        'Sex': baseline['Sex'],
        'SmokingStatus': baseline['SmokingStatus'],
        'BaselineFVC': baseline['FVC'],
        'BaselineWeeks': baseline['Weeks'],
        'Percent': baseline.get('Percent', 50.0)
    }

# Prepare scalers (8 features w/o week delta, 9 with)
tabular_data_8 = []
tabular_data_9 = []
for feats in baseline_features.values():
    row_8 = [
        feats['Age'],
        1 if feats['Sex']=='Female' else 0,
        1 if feats['SmokingStatus']=='Never smoked' else 0,
        1 if feats['SmokingStatus']=='Ex-smoker' else 0,
        1 if feats['SmokingStatus']=='Currently smokes' else 0,
        feats['BaselineFVC'],
        feats['BaselineWeeks'],
        feats['Percent']
    ]
    tabular_data_8.append(row_8)
    tabular_data_9.append(row_8 + [0.0])  # baseline delta = 0
scaler_8 = StandardScaler().fit(tabular_data_8)
scaler_9 = StandardScaler().fit(tabular_data_9)

# Function to get scaled tabular features
def get_tabular_features(patient_id, current_week):
    feats = baseline_features[patient_id]
    features = [
        feats['Age'],
        1 if feats['Sex']=='Female' else 0,
        1 if feats['SmokingStatus']=='Never smoked' else 0,
        1 if feats['SmokingStatus']=='Ex-smoker' else 0,
        1 if feats['SmokingStatus']=='Currently smokes' else 0,
        feats['BaselineFVC'],
        feats['BaselineWeeks'],
        feats['Percent']
    ]
    week_delta = current_week - feats['BaselineWeeks']
    features.append(week_delta)
    scaled = scaler_9.transform([features])
    return scaled.flatten().astype(np.float32)

# Create target dict: patient -> {week: FVC}
target_dict = {}
for patient, data in patient_data_dict.items():
    target_dict[patient] = {row.Weeks: row.FVC for _, row in data.iterrows()}

# =============================================================================
# Dataset: Tabular + Image
# =============================================================================

class TabularImageDataset(Dataset):
    def __init__(self, patient_data, target_dict, img_dir, transform=None):
        """
        patient_data: dict of patient -> DataFrame (weeks, FVC, etc)
        img_dir: base directory with patient subfolders of DICOM images
        transform: optional image transformation (albumentations)
        """
        self.samples = []
        self.img_dir = img_dir
        self.transform = transform
        self.patient_images = {}
        
        # Gather samples and baseline image
        for patient, data in patient_data.items():
            data = data.sort_values('Weeks')
            patient_folder = img_dir / patient
            if patient_folder.exists():
                files = sorted([str(f) for f in patient_folder.rglob("*.dcm")])
                if files:
                    # Choose middle slice as representative CT image
                    mid_idx = len(files)//2
                    self.patient_images[patient] = files[mid_idx]
                else:
                    self.patient_images[patient] = None
            else:
                self.patient_images[patient] = None
            
            for _, row in data.iterrows():
                self.samples.append((patient, row['Weeks'], row['FVC']))
        print(f"Dataset initialized: {len(self.samples)} samples for {len(self.patient_images)} patients")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        patient, week, fvc = self.samples[idx]
        # Tabular features
        tab_feats = get_tabular_features(patient, week)
        tab_tensor = torch.tensor(tab_feats, dtype=torch.float32)
        # Image features
        img_path = self.patient_images.get(patient, None)
        if img_path is not None:
            ds = pydicom.dcmread(img_path)
            img = ds.pixel_array.astype(np.float32)
            # Normalize to [0, 1]
            img -= img.min()
            img /= (img.max() + 1e-6)
            # Convert to 3 channels
            img = np.stack([img, img, img], axis=-1)
            if self.transform:
                img = self.transform(image=img)['image']  # Albumentations returns a dict
            else:
                img = torch.from_numpy(img.transpose(2, 0, 1)).float()
        else:
            # If no image found, use zeros
            img = torch.zeros((3, 224, 224), dtype=torch.float32)
        target = torch.tensor(fvc, dtype=torch.float32)
        return img, tab_tensor, target

# =============================================================================
# Model: CNN (ResNet) + Tabular MLP
# =============================================================================

class TabularImageModel(nn.Module):
    def __init__(self, tab_input_dim=9, pretrained=True):
        super().__init__()
        # CNN backbone (ResNet18)
        self.cnn = models.resnet18(pretrained=pretrained)
        num_features = self.cnn.fc.in_features
        self.cnn.fc = nn.Identity()  # Remove final classification layer
        
        # Tabular branch MLP
        self.tab_net = nn.Sequential(
            nn.Linear(tab_input_dim, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU()
        )
        
        # Combined head
        self.head = nn.Sequential(
            nn.Linear(num_features + 32, 64),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)
        )
        # Learnable log-scale (sigma) parameter for Laplace
        self.log_scale = nn.Parameter(torch.tensor(0.0))
    
    def forward(self, img, tab):
        img_feat = self.cnn(img)               # [batch, num_features]
        tab_feat = self.tab_net(tab)           # [batch, 32]
        combined = torch.cat([img_feat, tab_feat], dim=1)
        mean_pred = self.head(combined).squeeze(-1)
        log_scale = self.log_scale.expand(mean_pred.size(0))
        return mean_pred, log_scale

# =============================================================================
# Loss and Metrics (Laplace log-likelihood)
# =============================================================================

def laplace_nll_loss(mean_pred, log_scale, targets):
    scale = torch.exp(log_scale) + 1e-6
    return torch.mean(torch.abs(targets - mean_pred) / scale + log_scale + math.log(2))

def laplace_log_likelihood(y_true, y_pred, sigma):
    sigma = np.maximum(sigma, 1e-6)
    delta = np.abs(y_true - y_pred)
    return -np.sqrt(2) * delta / sigma - np.log(np.sqrt(2) * sigma)

def compute_metrics(mean_pred, log_scale, targets):
    mean_np = mean_pred.detach().cpu().numpy()
    targ_np = targets.detach().cpu().numpy()
    sigma_np = np.exp(log_scale.detach().cpu().numpy())
    mse = ((targ_np - mean_np)**2).mean()
    rmse = np.sqrt(mse)
    # R2 score
    r2 = 1 - np.sum((targ_np - mean_np)**2) / np.sum((targ_np - targ_np.mean())**2)
    lll = np.mean(laplace_log_likelihood(targ_np, mean_np, sigma_np))
    return {'mse': mse, 'rmse': rmse, 'r2': r2, 'lll': lll}

# =============================================================================
# Prepare DataLoaders
# =============================================================================

# Split patients into train/val groups using GroupKFold
patients = list(baseline_features.keys())
gkf = GroupKFold(n_splits=5)
train_idx, val_idx = next(gkf.split(patients, groups=patients))
train_patients = [patients[i] for i in train_idx]
val_patients = [patients[i] for i in val_idx]

train_data = {p: patient_data_dict[p] for p in train_patients}
val_data = {p: patient_data_dict[p] for p in val_patients}

# Image transformation: resize + normalize
transform = albu.Compose([
    albu.Resize(224, 224),
    albu.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0)),
    ToTensorV2()
])

train_dataset = TabularImageDataset(train_data, target_dict, TRAIN_DIR, transform=transform)
val_dataset = TabularImageDataset(val_data, target_dict, TRAIN_DIR, transform=transform)

BATCH_SIZE = 8
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

# =============================================================================
# Training Loop
# =============================================================================

class Trainer:
    def __init__(self, model, device, lr=1e-3):
        self.model = model.to(device)
        self.device = device
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=lr)
        self.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
            self.optimizer, mode='min', patience=3, factor=0.5, verbose=True)
        self.best_lll = -float('inf')
    
    def train_epoch(self, loader):
        self.model.train()
        total_loss = 0
        metrics_sum = {'mse': 0, 'rmse': 0, 'r2': 0, 'lll': 0}
        for images, tabular, targets in loader:
            images = images.to(self.device)
            tabular = tabular.to(self.device)
            targets = targets.to(self.device)
            self.optimizer.zero_grad()
            mean_pred, log_scale = self.model(images, tabular)
            loss = laplace_nll_loss(mean_pred, log_scale, targets)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
            self.optimizer.step()
            total_loss += loss.item()
            batch_metrics = compute_metrics(mean_pred, log_scale, targets)
            for k in metrics_sum:
                metrics_sum[k] += batch_metrics[k]
        n_batches = len(loader)
        avg_metrics = {k: v / n_batches for k,v in metrics_sum.items()}
        return total_loss / n_batches, avg_metrics
    
    def validate(self, loader):
        self.model.eval()
        total_loss = 0
        metrics_sum = {'mse': 0, 'rmse': 0, 'r2': 0, 'lll': 0}
        with torch.no_grad():
            for images, tabular, targets in loader:
                images = images.to(self.device)
                tabular = tabular.to(self.device)
                targets = targets.to(self.device)
                mean_pred, log_scale = self.model(images, tabular)
                loss = laplace_nll_loss(mean_pred, log_scale, targets)
                total_loss += loss.item()
                batch_metrics = compute_metrics(mean_pred, log_scale, targets)
                for k in metrics_sum:
                    metrics_sum[k] += batch_metrics[k]
        n_batches = len(loader)
        avg_metrics = {k: v / n_batches for k,v in metrics_sum.items()}
        return total_loss / n_batches, avg_metrics
    
    def train(self, train_loader, val_loader, epochs=10):
        for epoch in range(1, epochs+1):
            train_loss, train_metrics = self.train_epoch(train_loader)
            val_loss, val_metrics = self.validate(val_loader)
            self.scheduler.step(val_loss)
            print(f"Epoch {epoch}/{epochs}: Train Loss={train_loss:.3f}, Val Loss={val_loss:.3f}")
            print(f"  Train RMSE: {train_metrics['rmse']:.1f}, R2: {train_metrics['r2']:.3f}, LLL: {train_metrics['lll']:.3f}")
            print(f"  Val   RMSE: {val_metrics['rmse']:.1f}, R2: {val_metrics['r2']:.3f}, LLL: {val_metrics['lll']:.3f}")
            if val_metrics['lll'] > self.best_lll:
                self.best_lll = val_metrics['lll']
                torch.save(self.model.state_dict(), "best_model.pth")
                print("  --> New best model saved (improved LLL).")
                print("-" * 50)




Device: cuda
Dataset initialized: 1236 samples for 140 patients
Dataset initialized: 313 samples for 36 patients


In [4]:
# Instantiate model and train
model = TabularImageModel(tab_input_dim=9, pretrained=True)
trainer = Trainer(model, DEVICE, lr=1e-4)
trainer.train(train_loader, val_loader, epochs=10)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 229MB/s]


Epoch 1/10: Train Loss=2734.969, Val Loss=2333.327
  Train RMSE: 2867.8, R2: -17.617, LLL: -3866.810
  Val   RMSE: 2401.7, R2: -inf, LLL: -3299.186
  --> New best model saved (improved LLL).
--------------------------------------------------
Epoch 2/10: Train Loss=2616.116, Val Loss=2144.188
  Train RMSE: 2793.6, R2: -16.421, LLL: -3698.737
  Val   RMSE: 2246.7, R2: -inf, LLL: -3031.696
  --> New best model saved (improved LLL).
--------------------------------------------------
Epoch 3/10: Train Loss=2286.081, Val Loss=1650.843
  Train RMSE: 2501.8, R2: -12.086, LLL: -3232.047
  Val   RMSE: 1784.8, R2: -inf, LLL: -2333.994
  --> New best model saved (improved LLL).
--------------------------------------------------
Epoch 4/10: Train Loss=1545.156, Val Loss=879.873
  Train RMSE: 1818.6, R2: -6.423, LLL: -2184.350
  Val   RMSE: 1042.0, R2: -inf, LLL: -1243.672
  --> New best model saved (improved LLL).
--------------------------------------------------
Epoch 5/10: Train Loss=925.533, Va

In [5]:
# ============================================================
# OSIC: Tabular + 2.5D CT Model with Proper Laplace Objective
# ============================================================

import os, random, math, glob
from pathlib import Path

import numpy as np
import pandas as pd
import pydicom
import cv2
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler

import albumentations as albu
from albumentations.pytorch import ToTensorV2

# ---------------------------
# Reproducibility & Device
# ---------------------------
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

seed_everything(42)
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", DEVICE)

# ---------------------------
# Data paths
# ---------------------------
DATA_DIR = Path("/kaggle/input/osic-pulmonary-fibrosis-progression")
TRAIN_DIR = DATA_DIR / "train"
TRAIN_CSV = DATA_DIR / "train.csv"

df = pd.read_csv(TRAIN_CSV)

# ---------------------------
# Patient-level structures
# ---------------------------
patient_data = {}
baseline = {}
for pid in df["Patient"].unique():
    d = df[df["Patient"] == pid].sort_values("Weeks")
    patient_data[pid] = d.reset_index(drop=True)
    b = d.iloc[0]
    baseline[pid] = {
        "Age": b["Age"],
        "Sex": b["Sex"],  # "Male"/"Female"
        "SmokingStatus": b["SmokingStatus"],
        "BaselineFVC": b["FVC"],
        "BaselineWeeks": b["Weeks"],
        "Percent": b.get("Percent", 50.0),
    }

# ---------------------------
# KFold split by patient
# ---------------------------
patients = list(patient_data.keys())
gkf = GroupKFold(n_splits=5)
train_idx, val_idx = next(gkf.split(patients, groups=patients))
train_pids = [patients[i] for i in train_idx]
val_pids   = [patients[i] for i in val_idx]

train_pat_data = {p: patient_data[p] for p in train_pids}
val_pat_data   = {p: patient_data[p] for p in val_pids}

# ---------------------------
# Scalers: fit on TRAIN ONLY
# ---------------------------
def build_rows(feats, week_delta=0.0):
    return [
        feats["Age"],
        1 if feats["Sex"] == "Female" else 0,
        1 if feats["SmokingStatus"] == "Never smoked" else 0,
        1 if feats["SmokingStatus"] == "Ex-smoker" else 0,
        1 if feats["SmokingStatus"] == "Currently smokes" else 0,
        feats["BaselineFVC"],
        feats["BaselineWeeks"],
        feats["Percent"],
        week_delta,
    ]

X_train_for_scaler = []
for p in train_pids:
    X_train_for_scaler.append(build_rows(baseline[p], week_delta=0.0))
X_train_for_scaler = np.array(X_train_for_scaler, dtype=np.float32)
scaler = StandardScaler().fit(X_train_for_scaler)  # 9 features incl. delta

def get_features(pid, current_week):
    feats = baseline[pid]
    week_delta = float(current_week - feats["BaselineWeeks"])
    x = np.array(build_rows(feats, week_delta), dtype=np.float32).reshape(1, -1)
    x = scaler.transform(x).astype(np.float32).flatten()
    return x

# ---------------------------
# DICOM utils (HU-ish window)
# ---------------------------
def load_dicom(path):
    d = pydicom.dcmread(path, force=True)
    img = d.pixel_array.astype(np.float32)

    # HU conversion if possible
    intercept = getattr(d, "RescaleIntercept", 0.0)
    slope = getattr(d, "RescaleSlope", 1.0)
    img = img * float(slope) + float(intercept)

    # Window to lung-ish range and normalize to [-1, 1]
    img = np.clip(img, -1200, 600)
    img = (img + 300.0) / 900.0  # center ~ -300 HU
    img = np.clip(img, -1.0, 1.0)
    return img

def get_patient_slices(patient_folder):
    files = sorted(glob.glob(str(patient_folder / "*.dcm")))
    return files

def load_2p5d(patient_folder, target_hw=(224, 224)):
    files = get_patient_slices(patient_folder)
    if len(files) == 0:
        # 3 blank channels for missing patients
        return np.zeros((*target_hw, 3), dtype=np.float32)

    mid = len(files) // 2
    idxs = [max(0, mid - 1), mid, min(len(files) - 1, mid + 1)]
    slices = []
    for i in idxs:
        img = load_dicom(files[i])
        if img.shape != target_hw:
            img = cv2.resize(img, target_hw, interpolation=cv2.INTER_AREA)
        slices.append(img)
    arr = np.stack(slices, axis=-1)  # H x W x 3 (three adjacent slices)
    return arr

# ---------------------------
# Albumentations
# ---------------------------
train_tfms = albu.Compose([
    albu.HorizontalFlip(p=0.25),
    albu.ShiftScaleRotate(shift_limit=0.03, scale_limit=0.05, rotate_limit=5, p=0.3, border_mode=cv2.BORDER_REFLECT_101),
    ToTensorV2()
])

val_tfms = albu.Compose([ToTensorV2()])

# ---------------------------
# Dataset
# ---------------------------
class OSICDataset(Dataset):
    def __init__(self, pat_dict, split="train", img_dir=TRAIN_DIR, tfms=None):
        self.samples = []
        self.img_dir = img_dir
        self.tfms = tfms
        self.split = split

        # Pre-compute a per-patient representative 2.5D volume (mid ±1)
        self.patient_img = {}
        for p, d in pat_dict.items():
            folder = img_dir / p
            if folder.exists():
                arr = load_2p5d(folder, target_hw=(224, 224))  # H,W,3
            else:
                arr = np.zeros((224, 224, 3), dtype=np.float32)
            self.patient_img[p] = arr  # cached

            # build samples (one per visit)
            for _, row in d.iterrows():
                self.samples.append((p, int(row["Weeks"]), float(row["FVC"])))

        print(f"{split} dataset: {len(self.samples)} samples, {len(self.patient_img)} patients")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        pid, week, fvc = self.samples[idx]
        x_tab = get_features(pid, week)                       # (9,)
        img = self.patient_img[pid]                           # H,W,3
        if self.tfms is not None:
            img = self.tfms(image=img)["image"]               # 3,H,W
        else:
            img = torch.from_numpy(img.transpose(2, 0, 1))    # 3,H,W

        return img.float(), torch.tensor(x_tab, dtype=torch.float32), torch.tensor(fvc, dtype=torch.float32)

# ---------------------------
# Model: ResNet18 + Tabular MLP
# ---------------------------
class FusionNet(nn.Module):
    def __init__(self, tab_dim=9, pretrained=True):
        super().__init__()
        self.cnn = models.resnet18(pretrained=pretrained)
        n = self.cnn.fc.in_features
        self.cnn.fc = nn.Identity()

        self.tab = nn.Sequential(
            nn.Linear(tab_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU()
        )

        self.fuse = nn.Sequential(
            nn.Linear(n + 64, 128), nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU()
        )

        # Separate heads: mean and log_sigma
        self.head_mean = nn.Linear(64, 1)
        self.head_logsig = nn.Linear(64, 1)

    def forward(self, img, tab):
        f_img = self.cnn(img)          # [B, n]
        f_tab = self.tab(tab)          # [B, 64]
        f = torch.cat([f_img, f_tab], dim=1)
        f = self.fuse(f)               # [B, 64]
        mean = self.head_mean(f).squeeze(-1)
        log_sig = self.head_logsig(f).squeeze(-1)
        return mean, log_sig

# ---------------------------
# Loss/metrics (OSIC Laplace)
# ---------------------------
def laplace_nll_with_floor(mean, log_sigma, target, sigma_floor=70.0):
    # OSIC uses sigma_clipped = max(sigma, 70)
    sigma = torch.exp(log_sigma)
    sigma = torch.clamp(sigma, min=sigma_floor)
    # Negative of OSIC LLL to minimize:
    # LLL = -sqrt(2)*|e|/sigma - log(sqrt(2)*sigma)
    e = torch.abs(target - mean)
    loss = (math.sqrt(2.0) * e / sigma) + torch.log(math.sqrt(2.0) * sigma)
    return loss.mean()

@torch.no_grad()
def evaluate_loader(model, loader, device, sigma_floor=70.0):
    model.eval()
    all_mean = []
    all_sigma = []
    all_target = []
    for img, tab, y in loader:
        img, tab, y = img.to(device), tab.to(device), y.to(device)
        mean, log_sigma = model(img, tab)
        sigma = torch.exp(log_sigma)
        sigma = torch.clamp(sigma, min=sigma_floor)
        all_mean.append(mean.cpu().numpy())
        all_sigma.append(sigma.cpu().numpy())
        all_target.append(y.cpu().numpy())
    yhat = np.concatenate(all_mean)
    sigma = np.concatenate(all_sigma)
    y = np.concatenate(all_target)

    # Metrics across full validation set
    mse = np.mean((y - yhat) ** 2)
    rmse = float(np.sqrt(mse))
    var = np.var(y)
    r2 = float(1.0 - np.sum((y - yhat) ** 2) / (np.sum((y - np.mean(y)) ** 2) + 1e-12)) if var > 0 else 0.0
    # OSIC LLL (higher is better)
    lll = float(np.mean(-np.sqrt(2.0) * np.abs(y - yhat) / sigma - np.log(np.sqrt(2.0) * sigma)))
    return {"rmse": rmse, "r2": r2, "lll": lll}

# ---------------------------
# DataLoaders
# ---------------------------
train_ds = OSICDataset(train_pat_data, split="train", img_dir=TRAIN_DIR, tfms=train_tfms)
val_ds   = OSICDataset(val_pat_data,   split="val",   img_dir=TRAIN_DIR, tfms=val_tfms)

BATCH_SIZE = 8
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True,  num_workers=2, pin_memory=True)
val_loader   = DataLoader(val_ds,   batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)

# ---------------------------
# Trainer
# ---------------------------
class Trainer:
    def __init__(self, model, device, lr=1e-4):
        self.model = model.to(device)
        self.opt = torch.optim.Adam(self.model.parameters(), lr=lr, weight_decay=1e-4)
        self.sched = torch.optim.lr_scheduler.ReduceLROnPlateau(self.opt, mode="min", patience=3, factor=0.5, verbose=True)
        self.best_lll = -1e18

    def fit(self, train_loader, val_loader, epochs=10):
        for epoch in range(1, epochs+1):
            self.model.train()
            losses = []
            for img, tab, y in train_loader:
                img, tab, y = img.to(DEVICE), tab.to(DEVICE), y.to(DEVICE)
                self.opt.zero_grad()
                mean, log_sigma = self.model(img, tab)
                loss = laplace_nll_with_floor(mean, log_sigma, y, sigma_floor=70.0)
                loss.backward()
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
                self.opt.step()
                losses.append(loss.item())
            tr_loss = float(np.mean(losses)) if losses else 0.0

            # Validation
            with torch.no_grad():
                val_loss_batches = []
                for img, tab, y in val_loader:
                    img, tab, y = img.to(DEVICE), tab.to(DEVICE), y.to(DEVICE)
                    mean, log_sigma = self.model(img, tab)
                    vloss = laplace_nll_with_floor(mean, log_sigma, y, sigma_floor=70.0)
                    val_loss_batches.append(vloss.item())
                val_loss = float(np.mean(val_loss_batches)) if val_loss_batches else 0.0

                metrics = evaluate_loader(self.model, val_loader, DEVICE, sigma_floor=70.0)

            self.sched.step(val_loss)
            print(f"Epoch {epoch:02d} | Train NLL: {tr_loss:.3f} | Val NLL: {val_loss:.3f} | "
                  f"Val RMSE: {metrics['rmse']:.1f} | Val R²: {metrics['r2']:.3f} | Val LLL: {metrics['lll']:.3f}")

            if metrics["lll"] > self.best_lll:
                self.best_lll = metrics["lll"]
                torch.save(self.model.state_dict(), "best_model.pth")
                print("  -> Saved new best (LLL).")



Device: cuda
train dataset: 1236 samples, 140 patients
val dataset: 313 samples, 36 patients


In [6]:
# ---------------------------
# Run training
# ---------------------------
model = FusionNet(tab_dim=9, pretrained=True)
trainer = Trainer(model, DEVICE, lr=1e-4)
trainer.fit(train_loader, val_loader, epochs=10)


Epoch 01 | Train NLL: 57.405 | Val NLL: 11.168 | Val RMSE: 2446.6 | Val R²: -11.119 | Val LLL: -11.693
  -> Saved new best (LLL).
Epoch 02 | Train NLL: 10.494 | Val NLL: 9.890 | Val RMSE: 2451.3 | Val R²: -11.167 | Val LLL: -9.601
  -> Saved new best (LLL).
Epoch 03 | Train NLL: 9.976 | Val NLL: 9.728 | Val RMSE: 2457.4 | Val R²: -11.227 | Val LLL: -9.547
  -> Saved new best (LLL).
Epoch 04 | Train NLL: 9.836 | Val NLL: 9.761 | Val RMSE: 2458.6 | Val R²: -11.239 | Val LLL: -9.513
  -> Saved new best (LLL).
Epoch 05 | Train NLL: 9.767 | Val NLL: 9.582 | Val RMSE: 2460.6 | Val R²: -11.259 | Val LLL: -9.475
  -> Saved new best (LLL).
Epoch 06 | Train NLL: 9.747 | Val NLL: 9.598 | Val RMSE: 2460.8 | Val R²: -11.261 | Val LLL: -9.499
Epoch 07 | Train NLL: 9.711 | Val NLL: 9.598 | Val RMSE: 2460.9 | Val R²: -11.262 | Val LLL: -9.528
Epoch 08 | Train NLL: 9.708 | Val NLL: 9.573 | Val RMSE: 2460.1 | Val R²: -11.253 | Val LLL: -9.496
Epoch 09 | Train NLL: 9.700 | Val NLL: 9.582 | Val RMSE: 2459

In [9]:
"""
Tabular-only baseline training script for OSIC (Laplace NLL)
- Predicts FVC per visit (direct target)
- Outputs mean and log_scale (per-sample)
- Uses GroupKFold (patient-wise split)
- Fits scalers on training patients only
- Provides baseline comparisons and sigma calibration
"""

import os, math, random
from pathlib import Path
from collections import defaultdict

import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# --------------------------
# Repro / device
# --------------------------
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

seed_everything(42)
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", DEVICE)

# --------------------------
# Paths & load CSV
# --------------------------
DATA_DIR = Path("/kaggle/input/osic-pulmonary-fibrosis-progression")  # adapt path if needed
train_csv = DATA_DIR / "train.csv"
df = pd.read_csv(train_csv)
print("Loaded rows:", len(df))

# --------------------------
# Build patient-level baseline info and samples
# --------------------------
patients = df['Patient'].unique().tolist()
patient_samples = defaultdict(list)
baseline_info = {}

for p in patients:
    d = df[df['Patient'] == p].sort_values('Weeks').reset_index(drop=True)
    baseline = d.iloc[0]
    baseline_info[p] = {
        'Age': float(baseline['Age']),
        'Sex': baseline['Sex'],             # 'Male'/'Female'
        'SmokingStatus': baseline['SmokingStatus'],
        'BaselineFVC': float(baseline['FVC']),
        'BaselineWeeks': int(baseline['Weeks']),
        'Percent': float(baseline.get('Percent', 50.0))
    }
    for _, row in d.iterrows():
        patient_samples[p].append({'Weeks': int(row['Weeks']), 'FVC': float(row['FVC'])})

# --------------------------
# Feature builder (explicit order)
# --------------------------
def build_raw_features(pid, week):
    b = baseline_info[pid]
    week_delta = float(week - b['BaselineWeeks'])
    # explicit ordering:
    raw = [
        b['Age'],
        1.0 if b['Sex'] == 'Female' else 0.0,
        1.0 if b['SmokingStatus'] == 'Never smoked' else 0.0,
        1.0 if b['SmokingStatus'] == 'Ex-smoker' else 0.0,
        1.0 if b['SmokingStatus'] == 'Currently smokes' else 0.0,
        b['BaselineFVC'],
        b['BaselineWeeks'],
        b['Percent'],
        week_delta
    ]
    return np.array(raw, dtype=np.float32)

# --------------------------
# Build dataset lists (samples)
# --------------------------
samples = []  # (patient, week, fvc)
for p, recs in patient_samples.items():
    for rec in recs:
        samples.append((p, rec['Weeks'], rec['FVC']))

print("Total samples:", len(samples))
# Shuffle samples for robustness in DataLoader (split by patient later)
random.shuffle(samples)

# --------------------------
# GroupKFold split on patients (train-val)
# --------------------------
unique_patients = list(baseline_info.keys())
gkf = GroupKFold(n_splits=5)
train_pid_idx, val_pid_idx = next(gkf.split(unique_patients, groups=unique_patients))
train_pids = [unique_patients[i] for i in train_pid_idx]
val_pids   = [unique_patients[i] for i in val_pid_idx]
print("Train patients:", len(train_pids), "Val patients:", len(val_pids))

# Build sample lists per split
train_samples = [s for s in samples if s[0] in train_pids]
val_samples   = [s for s in samples if s[0] in val_pids]
print("Train samples:", len(train_samples), "Val samples:", len(val_samples))

# --------------------------
# Fit scalers on TRAIN patients only
# --------------------------
X_train_for_scaler = np.stack([build_raw_features(p, w) for p, w, _ in train_samples])
scaler = StandardScaler().fit(X_train_for_scaler)

def make_features(pid, week):
    x_raw = build_raw_features(pid, week).reshape(1, -1)
    x_scaled = scaler.transform(x_raw).astype(np.float32).flatten()
    return x_scaled

# --------------------------
# Dataset & DataLoader
# --------------------------
class TabularDataset(Dataset):
    def __init__(self, samples):
        self.samples = samples
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        pid, week, fvc = self.samples[idx]
        x = make_features(pid, week)
        return torch.from_numpy(x), torch.tensor(fvc, dtype=torch.float32), pid

BATCH = 64
train_loader = DataLoader(TabularDataset(train_samples), batch_size=BATCH, shuffle=True, num_workers=2, pin_memory=True)
val_loader   = DataLoader(TabularDataset(val_samples), batch_size=BATCH, shuffle=False, num_workers=2, pin_memory=True)

# --------------------------
# Simple MLP model: predicts mean and log_scale (per-sample)
# --------------------------
class TabularMLP(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        self.mean_head = nn.Linear(64, 1)
        self.log_scale_head = nn.Linear(64, 1)
        # Initialize log_scale bias so initial sigma is reasonable (e.g., ~300)
        nn.init.constant_(self.log_scale_head.bias, math.log(300.0))
    def forward(self, x):
        h = self.net(x)
        mean = self.mean_head(h).squeeze(-1)
        log_scale = self.log_scale_head(h).squeeze(-1)
        return mean, log_scale

INPUT_DIM = X_train_for_scaler.shape[1]
model = TabularMLP(INPUT_DIM).to(DEVICE)
print("Model params:", sum(p.numel() for p in model.parameters()))

# --------------------------
# Laplace NLL loss & utilities
# --------------------------
def laplace_nll_torch(mean, log_scale, target, sigma_floor=1e-6):
    # We train on raw sigma (no competition floor). But we avoid sigma -> 0.
    sigma = torch.exp(log_scale) + sigma_floor
    # Laplace NLL: |e|/b + log(2b), where b = sigma
    loss = (torch.abs(target - mean) / sigma) + torch.log(2.0 * sigma)
    return loss.mean()

@torch.no_grad()
def evaluate_model(model, loader, device, apply_competition_sigma_floor=True, sigma_floor_val=70.0):
    model.eval()
    preds = []
    sigs = []
    trues = []
    pids = []
    for x, y, pid in loader:
        x = x.to(device)
        y = y.to(device)
        mean, log_scale = model(x)
        mean_np = mean.detach().cpu().numpy()
        sigma_np = np.exp(log_scale.detach().cpu().numpy())
        if apply_competition_sigma_floor:
            sigma_np = np.maximum(sigma_np, sigma_floor_val)
        preds.append(mean_np)
        sigs.append(sigma_np)
        trues.append(y.detach().cpu().numpy())
        pids.extend(pid)
    preds = np.concatenate(preds)
    sigs = np.concatenate(sigs)
    trues = np.concatenate(trues)
    # Metrics across full val set
    rmse = float(np.sqrt(mean_squared_error(trues, preds)))
    # r2 safe:
    denom = np.sum((trues - trues.mean())**2)
    r2 = float(1.0 - np.sum((trues - preds)**2) / (denom + 1e-12))
    # Laplace LLL (per-sample), higher is better
    lll = np.mean(-np.sqrt(2.0) * np.abs(trues - preds) / sigs - np.log(np.sqrt(2.0) * sigs))
    return {'rmse': rmse, 'r2': r2, 'lll': float(lll)}

# --------------------------
# Baselines for sanity checks
#  - baseline predict = baseline FVC at each visit
#  - global mean
# --------------------------
def baseline_metrics(samples):
    # baseline predict baseline_FVC (per sample)
    y_true = []
    y_basepred = []
    for pid, week, fvc in samples:
        y_true.append(fvc)
        y_basepred.append(baseline_info[pid]['BaselineFVC'])
    y_true = np.array(y_true)
    y_basepred = np.array(y_basepred)
    rmse_base = float(np.sqrt(mean_squared_error(y_true, y_basepred)))
    mean_pred = np.array([y_true.mean()] * len(y_true))
    rmse_mean = float(np.sqrt(mean_squared_error(y_true, mean_pred)))
    return {'rmse_baseline': rmse_base, 'rmse_global_mean': rmse_mean}

print("Baselines (val):", baseline_metrics(val_samples))

# --------------------------
# Training loop
# --------------------------
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode='min', factor=0.5, patience=3, verbose=True)

EPOCHS = 20
best_lll = -1e18
for epoch in range(1, EPOCHS + 1):
    model.train()
    losses = []
    for x, y, pid in train_loader:
        x = x.to(DEVICE)
        y = y.to(DEVICE)
        opt.zero_grad()
        mean, log_scale = model(x)
        loss = laplace_nll_torch(mean, log_scale, y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        losses.append(loss.item())
    train_loss = float(np.mean(losses)) if losses else 0.0

    val_stats = evaluate_model(model, val_loader, DEVICE, apply_competition_sigma_floor=True, sigma_floor_val=70.0)
    sched.step(val_stats['lll'] * -1.0)  # scheduler expects loss (we invert LLL)
    print(f"Epoch {epoch:02d} | Train NLL: {train_loss:.4f} | Val RMSE: {val_stats['rmse']:.2f} | Val R²: {val_stats['r2']:.4f} | Val LLL: {val_stats['lll']:.4f}")

    # Save best by LLL
    if val_stats['lll'] > best_lll:
        best_lll = val_stats['lll']
        torch.save(model.state_dict(), "tabular_best.pth")
        print("Saved best tabular model (LLL).")

# --------------------------
# Load best and compute final metrics (including no-floor and with-floor)
# --------------------------
model.load_state_dict(torch.load("tabular_best.pth"))
m_no_floor = evaluate_model(model, val_loader, DEVICE, apply_competition_sigma_floor=False)
m_with_floor = evaluate_model(model, val_loader, DEVICE, apply_competition_sigma_floor=True, sigma_floor_val=70.0)
print("Final metrics (no floor):", m_no_floor)
print("Final metrics (with 70 floor):", m_with_floor)

# --------------------------
# Sigma calibration (learn scalar multiplier on sigma to maximize LLL on val)
# We fit alpha >= 0 such that sigma_cal = alpha * sigma_pred
# Optimize alpha by simple grid search on validation set (cheap)
# --------------------------
def calibrate_sigma_scalar(model, loader, device, grid=np.linspace(0.1, 10.0, 100)):
    model.eval()
    preds = []
    sigs = []
    trues = []
    with torch.no_grad():
        for x, y, pid in loader:
            x = x.to(device)
            y = y.to(device)
            mean, log_scale = model(x)
            preds.append(mean.cpu().numpy())
            sigs.append(np.exp(log_scale.cpu().numpy()))
            trues.append(y.cpu().numpy())
    preds = np.concatenate(preds)
    sigs = np.concatenate(sigs)
    trues = np.concatenate(trues)

    best_alpha = 1.0
    best_lll = -1e18
    for a in grid:
        s_cal = np.maximum(sigs * a, 70.0)  # competition floor
        lll = np.mean(-np.sqrt(2.0) * np.abs(trues - preds) / s_cal - np.log(np.sqrt(2.0) * s_cal))
        if lll > best_lll:
            best_lll = lll
            best_alpha = a
    return best_alpha, best_lll

alpha, best_lll = calibrate_sigma_scalar(model, val_loader, DEVICE)
print(f"Calibration alpha (sigma multiplier): {alpha:.4f}, calibrated LLL: {best_lll:.4f}")

# --------------------------
# If tabular baseline works (i.e., RMSE << 2000 and ideally < ~400), proceed to add images.
# --------------------------


Device: cuda
Loaded rows: 1549
Total samples: 1549
Train patients: 140 Val patients: 36
Train samples: 1236 Val samples: 313
Model params: 9666
Baselines (val): {'rmse_baseline': 253.36408156539483, 'rmse_global_mean': 702.7735537599037}
Epoch 01 | Train NLL: 11.3629 | Val RMSE: 2507.11 | Val R²: -11.7267 | Val LLL: -9.6789
Saved best tabular model (LLL).
Epoch 02 | Train NLL: 9.7148 | Val RMSE: 2506.54 | Val R²: -11.7209 | Val LLL: -9.5526
Saved best tabular model (LLL).
Epoch 03 | Train NLL: 9.6349 | Val RMSE: 2505.86 | Val R²: -11.7140 | Val LLL: -9.5085
Saved best tabular model (LLL).
Epoch 04 | Train NLL: 9.6181 | Val RMSE: 2505.19 | Val R²: -11.7072 | Val LLL: -9.4847
Saved best tabular model (LLL).
Epoch 05 | Train NLL: 9.6135 | Val RMSE: 2504.65 | Val R²: -11.7017 | Val LLL: -9.4984
Epoch 06 | Train NLL: 9.6054 | Val RMSE: 2504.11 | Val R²: -11.6963 | Val LLL: -9.5073
Epoch 07 | Train NLL: 9.6068 | Val RMSE: 2503.49 | Val R²: -11.6900 | Val LLL: -9.4969
Epoch 08 | Train NLL: 9.

In [16]:
# Fixed end-to-end tabular Laplace training script (no input_dim mismatch)
import os, math, random
from pathlib import Path
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# ----------------- config -----------------
SEED = 42
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
DATA_DIR = Path("/kaggle/input/osic-pulmonary-fibrosis-progression")
TRAIN_CSV = DATA_DIR / "train.csv"

BATCH_SIZE = 64
EPOCHS = 10
LR = 1e-3
WEIGHT_DECAY = 1e-5
SIGMA_REG = 1e-4
SIGMA_FLOOR = 70.0

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# ----------------- load data -----------------
df = pd.read_csv(TRAIN_CSV)
print(f"Loaded {len(df)} rows")

# ----------------- categorical encoding -----------------
# Factorize to numeric if present
for col in ["Sex", "SmokingStatus"]:
    if col in df.columns and df[col].dtype == "object":
        df[col], _ = pd.factorize(df[col])

# ----------------- baseline week per patient -----------------
patient_baseline_week = df.groupby("Patient")["Weeks"].first().to_dict()

# ----------------- engineered features -----------------
def make_row_features(row):
    pid = row["Patient"]
    base_w = patient_baseline_week[pid]
    rel_w = float(row["Weeks"] - base_w)
    percent = float(row["Percent"]) if "Percent" in row and not pd.isna(row["Percent"]) else 50.0
    age = float(row["Age"])
    # engineered
    return {
        "Age": age,
        "Percent": percent,
        "Sex": float(row["Sex"]),
        "SmokingStatus": float(row["SmokingStatus"]),
        "rel_week": rel_w,
        "rel_week_sq": rel_w ** 2,
        "rel_week_x_percent": rel_w * percent,
        "age_x_percent": age * percent,
    }

eng_list = []
for _, r in df.iterrows():
    eng_list.append(make_row_features(r))
eng_df = pd.DataFrame(eng_list)
df = pd.concat([df.reset_index(drop=True), eng_df.reset_index(drop=True)], axis=1)

# Explicit feature column order (you can add/remove features here)
feature_cols = ["Age", "Percent", "Sex", "SmokingStatus",
                "rel_week", "rel_week_sq", "rel_week_x_percent", "age_x_percent"]

print("Feature columns:", feature_cols)
print("Feature count:", len(feature_cols))

# ----------------- train/val split by patient -----------------
patients = df["Patient"].unique().tolist()
gkf = GroupKFold(n_splits=5)
train_idx, val_idx = next(gkf.split(patients, groups=patients))
train_pids = [patients[i] for i in train_idx]
val_pids = [patients[i] for i in val_idx]

train_df = df[df["Patient"].isin(train_pids)].reset_index(drop=True)
val_df = df[df["Patient"].isin(val_pids)].reset_index(drop=True)
print(f"Train rows: {len(train_df)}, Val rows: {len(val_df)}")

# ----------------- scalers fit on train only -----------------
scaler_X = StandardScaler().fit(train_df[feature_cols].values.astype(np.float32))
y_train = train_df["FVC"].values.astype(np.float32)
y_mean = y_train.mean()
y_std = y_train.std() if y_train.std() > 0 else 1.0

# ----------------- Dataset -----------------
class TabularDS(Dataset):
    def __init__(self, df, feature_cols):
        self.df = df.reset_index(drop=True)
        self.feature_cols = feature_cols
    def __len__(self):
        return len(self.df)
    def __getitem__(self, idx):
        r = self.df.iloc[idx]
        x_raw = r[self.feature_cols].values.astype(np.float32)
        x = scaler_X.transform(x_raw.reshape(1, -1)).astype(np.float32).flatten()
        y = np.float32(r["FVC"])
        y_scaled = (y - y_mean) / y_std
        return torch.from_numpy(x), torch.tensor(y_scaled, dtype=torch.float32)

train_ds = TabularDS(train_df, feature_cols)
val_ds = TabularDS(val_df, feature_cols)
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)

# Print first batch shapes for a sanity check
xb, yb = next(iter(train_loader))
print("First batch xb shape:", xb.shape, "yb shape:", yb.shape)
# xb.shape[1] is the actual input_dim from data

# ----------------- Model (input_dim derived from data) -----------------
class ResidualBlock(nn.Module):
    def __init__(self, dim, dropout=0.1):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(dim)
    def forward(self, x):
        r = x
        out = self.act(self.fc1(x))
        out = self.dropout(out)
        out = self.fc2(out)
        out = self.norm(out + r)
        return self.act(out)

class TabularLaplace(nn.Module):
    def __init__(self, input_dim, hidden_dim=64, num_blocks=2):
        super().__init__()
        self.input = nn.Linear(input_dim, hidden_dim)
        self.blocks = nn.ModuleList([ResidualBlock(hidden_dim) for _ in range(num_blocks)])
        self.mean_head = nn.Linear(hidden_dim, 1)
        self.log_s_head = nn.Linear(hidden_dim, 1)
        nn.init.constant_(self.log_s_head.bias, math.log(1.0))
    def forward(self, x):
        h = torch.relu(self.input(x))
        for b in self.blocks:
            h = b(h)
        mu = self.mean_head(h).squeeze(-1)
        log_s = self.log_s_head(h).squeeze(-1)
        return mu, log_s

# initialize with true input_dim
input_dim = xb.shape[1]
model = TabularLaplace(input_dim=input_dim, hidden_dim=64, num_blocks=2).to(DEVICE)
print("Initialized model with input_dim =", input_dim, " ; total params =", sum(p.numel() for p in model.parameters()))

# ----------------- loss with softplus sigma -----------------
softplus = nn.Softplus()
def laplace_nll_scaled(mu_s, log_s, y_s, sigma_reg=1e-4):
    b = softplus(log_s) + 1e-6         # positive scale (in scaled-space)
    loss = torch.abs(y_s - mu_s) / b + torch.log(2.0 * b)
    return loss.mean() + sigma_reg * (log_s ** 2).mean()

# ----------------- evaluation helper (unscale, compute RMSE, R2, LLL) -----------------
def evaluate(model, loader, device=DEVICE, sigma_floor=SIGMA_FLOOR):
    model.eval()
    preds, sigs, trues = [], [], []
    with torch.no_grad():
        for xb, yb_s in loader:
            xb = xb.to(device).float()
            yb_s = yb_s.to(device).float()
            mu_s, log_s = model(xb)
            b_s = softplus(log_s) + 1e-6
            # unscale
            mu = (mu_s.cpu().numpy() * y_std) + y_mean
            b = (b_s.cpu().numpy() * y_std)   # scale sigma to original units
            y_true = (yb_s.cpu().numpy() * y_std) + y_mean
            preds.append(mu)
            sigs.append(b)
            trues.append(y_true)
    preds = np.concatenate(preds)
    sigs = np.concatenate(sigs)
    trues = np.concatenate(trues)
    sigs_floor = np.maximum(sigs, sigma_floor)
    rmse = float(np.sqrt(np.mean((trues - preds) ** 2)))
    denom = np.sum((trues - trues.mean()) ** 2)
    r2 = float(1.0 - np.sum((trues - preds) ** 2) / (denom + 1e-12)) if denom > 0 else 0.0
    lll = float(np.mean(-math.sqrt(2.0) * np.abs(trues - preds) / sigs_floor - np.log(math.sqrt(2.0) * sigs_floor)))
    return {"rmse": rmse, "r2": r2, "lll": lll, "preds": preds, "sigs": sigs, "trues": trues}

# ----------------- optimizer / scheduler -----------------
opt = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="max", factor=0.5, patience=4, verbose=True)

# ----------------- train loop -----------------
best_lll = -1e18
for epoch in range(1, EPOCHS + 1):
    model.train()
    losses = []
    for step, (xb, yb_s) in enumerate(train_loader):
        xb = xb.to(DEVICE).float()
        yb_s = yb_s.to(DEVICE).float()
        mu_s, log_s = model(xb)
        loss = laplace_nll_scaled(mu_s, log_s, yb_s, sigma_reg=SIGMA_REG)
        opt.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        losses.append(loss.item())
        # debug print first batch shapes (only once)
        if epoch == 1 and step == 0:
            print("DEBUG shapes (batch0): xb", xb.shape, "mu_s", mu_s.shape, "log_s", log_s.shape, "yb_s", yb_s.shape)

    train_loss = float(np.mean(losses)) if losses else 0.0
    val_stats = evaluate(model, val_loader, device=DEVICE, sigma_floor=SIGMA_FLOOR)
    # scheduler steps on validation LLL (we want to maximize LLL -> use 'max' above)
    sched.step(val_stats["lll"])
    print(f"Epoch {epoch:02d} | TrainLoss: {train_loss:.6f} | Val RMSE: {val_stats['rmse']:.2f} | Val R2: {val_stats['r2']:.4f} | Val LLL: {val_stats['lll']:.4f}")

    if val_stats["lll"] > best_lll:
        best_lll = val_stats["lll"]
        torch.save(model.state_dict(), "best_tabular_laplace_fixed.pt")
        print(" -> saved best model (LLL)")

# ----------------- final calibrated LLL on val -----------------
res = evaluate(model, val_loader, device=DEVICE, sigma_floor=0.0)  # raw sigmas
preds, sigs, trues = res["preds"], res["sigs"], res["trues"]
sigs = np.maximum(sigs, 1e-6)

best_alpha, best_lll = 1.0, -1e18
for a in np.linspace(0.1, 5.0, 50):
    s_cal = np.maximum(sigs * a, SIGMA_FLOOR)
    lll = np.mean(-math.sqrt(2.0) * np.abs(trues - preds) / s_cal - np.log(math.sqrt(2.0) * s_cal))
    if lll > best_lll:
        best_lll = lll
        best_alpha = a

print("Final val RMSE, R2, LLL (before calib):", evaluate(model, val_loader, DEVICE, SIGMA_FLOOR))
print(f"Sigma calibration -> alpha: {best_alpha:.3f}, calibrated LLL: {best_lll:.6f}")


Loaded 1549 rows
Feature columns: ['Age', 'Percent', 'Sex', 'SmokingStatus', 'rel_week', 'rel_week_sq', 'rel_week_x_percent', 'age_x_percent']
Feature count: 8
Train rows: 1236, Val rows: 313
First batch xb shape: torch.Size([64, 12]) yb shape: torch.Size([64])
Initialized model with input_dim = 12  ; total params = 17858
DEBUG shapes (batch0): xb torch.Size([64, 12]) mu_s torch.Size([64]) log_s torch.Size([64]) yb_s torch.Size([64])
Epoch 01 | TrainLoss: 0.962912 | Val RMSE: 311.96 | Val R2: 0.8030 | Val LLL: -7.2297
 -> saved best model (LLL)
Epoch 02 | TrainLoss: 0.548413 | Val RMSE: 302.91 | Val R2: 0.8142 | Val LLL: -7.2821
Epoch 03 | TrainLoss: 0.438487 | Val RMSE: 272.39 | Val R2: 0.8498 | Val LLL: -7.1264
 -> saved best model (LLL)
Epoch 04 | TrainLoss: 0.393646 | Val RMSE: 256.50 | Val R2: 0.8668 | Val LLL: -7.0930
 -> saved best model (LLL)
Epoch 05 | TrainLoss: 0.355745 | Val RMSE: 249.76 | Val R2: 0.8737 | Val LLL: -7.0356
 -> saved best model (LLL)
Epoch 06 | TrainLoss: 0.

In [20]:
# Full corrected script — fixes scaler / feature mismatch (no more 12 vs 8 bug)
# Run as a single cell. Assumes /kaggle/input/osic-pulmonary-fibrosis-progression/train/ exists.
import os, time, math, random
from pathlib import Path
from glob import glob

import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

import torchvision.transforms as T
import torchvision.models as models
import cv2
import pydicom

# ---------- CONFIG ----------
SEED = 42
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", DEVICE)

DATA_DIR = Path("/kaggle/input/osic-pulmonary-fibrosis-progression")
TRAIN_DIR = DATA_DIR / "train"
TRAIN_CSV = DATA_DIR / "train.csv"

EMBED_CACHE_DIR = Path("./cache_embeddings")
EMBED_CACHE_DIR.mkdir(exist_ok=True)

BATCH_SIZE = 48
EPOCHS = 12
LR = 1e-3
WEIGHT_DECAY = 1e-5
SIGMA_REG = 1e-4
SIGMA_FLOOR = 70.0

# ---------- Helper: DICOM -> HU -> windowed image ----------
def load_dicom_hu(path):
    d = pydicom.dcmread(str(path), force=True)
    arr = d.pixel_array.astype(np.float32)
    intercept = float(getattr(d, "RescaleIntercept", 0.0))
    slope = float(getattr(d, "RescaleSlope", 1.0))
    arr = arr * slope + intercept
    return arr

def window_and_norm_uint8(img_hu, win_min=-1200, win_max=600, to_255=True):
    img = np.clip(img_hu, win_min, win_max)
    if to_255:
        img = (img - win_min) / (win_max - win_min)
        img = (img * 255.0).astype(np.uint8)
    return img

def build_3slice_image(patient_dir, target_size=(224,224)):
    files = sorted(glob(str(patient_dir / "*.dcm")))
    if len(files) == 0:
        return np.zeros((target_size[0], target_size[1], 3), dtype=np.uint8)
    mid = len(files) // 2
    idxs = [max(0, mid-1), mid, min(len(files)-1, mid+1)]
    channels = []
    for i in idxs:
        try:
            hu = load_dicom_hu(files[i])
            w = window_and_norm_uint8(hu)
            w = cv2.resize(w, target_size, interpolation=cv2.INTER_AREA)
        except Exception:
            w = np.zeros(target_size, dtype=np.uint8)
        channels.append(w)
    return np.stack(channels, axis=2)

# ---------- Embedding extractor (caches per patient) ----------
def extract_and_cache_embeddings(patient_list, target_size=(224,224), force=False):
    device = DEVICE
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    modules = list(backbone.children())[:-1]
    backbone = nn.Sequential(*modules).to(device).eval()
    transform = T.Compose([T.ToTensor(), T.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])])
    t0 = time.time()
    for i, pid in enumerate(patient_list, 1):
        cache_file = EMBED_CACHE_DIR / f"{pid}.npy"
        if cache_file.exists() and not force:
            continue
        patient_dir = TRAIN_DIR / pid
        if not patient_dir.exists():
            np.save(cache_file, np.zeros(512, dtype=np.float32))
            continue
        img = build_3slice_image(patient_dir, target_size=target_size)
        x = transform(img).unsqueeze(0).to(device)
        with torch.no_grad():
            feat = backbone(x)  # [1,512,1,1]
            feat = feat.reshape(-1).cpu().numpy().astype(np.float32)
        np.save(cache_file, feat)
        if i % 20 == 0 or i==len(patient_list):
            print(f"Cached {i}/{len(patient_list)} embeddings — elapsed {time.time()-t0:.0f}s")

# ---------- Load CSV and create features (explicit) ----------
df = pd.read_csv(TRAIN_CSV)
print("Loaded rows:", len(df))

# Safe factorize
for c in ["Sex", "SmokingStatus"]:
    if c in df.columns and df[c].dtype == "object":
        df[c], _ = pd.factorize(df[c])

# baseline week per patient
patient_baseline_week = df.groupby("Patient")["Weeks"].first().to_dict()

# Explicit engineered features (only these will be used)
def make_features(row):
    pid = row["Patient"]
    base = patient_baseline_week[pid]
    rel_week = float(row["Weeks"] - base)
    percent = float(row["Percent"]) if not pd.isna(row["Percent"]) else 50.0
    age = float(row["Age"])
    return {
        "Age": age,
        "Percent": percent,
        "Sex": float(row["Sex"]),
        "SmokingStatus": float(row["SmokingStatus"]),
        "rel_week": rel_week,
        "rel_week_sq": rel_week**2,
        "rel_week_x_percent": rel_week * percent,
        "age_x_percent": age * percent
    }

eng = [make_features(r) for _, r in df.iterrows()]
eng_df = pd.DataFrame(eng)
# Replace any previously existing feature columns to avoid duplication/conflict:
for c in eng_df.columns:
    if c in df.columns:
        df.drop(columns=[c], inplace=True)
df = pd.concat([df.reset_index(drop=True), eng_df.reset_index(drop=True)], axis=1)

feature_cols = ["Age","Percent","Sex","SmokingStatus","rel_week","rel_week_sq","rel_week_x_percent","age_x_percent"]
print("Using explicit feature_cols:", feature_cols, "count:", len(feature_cols))

# ---------- Patient-wise split ----------
patients = df["Patient"].unique().tolist()
gkf = GroupKFold(n_splits=5)
train_idx, val_idx = next(gkf.split(patients, groups=patients))
train_pids = [patients[i] for i in train_idx]
val_pids = [patients[i] for i in val_idx]

train_df = df[df["Patient"].isin(train_pids)].reset_index(drop=True)
val_df = df[df["Patient"].isin(val_pids)].reset_index(drop=True)
print(f"Train rows: {len(train_df)}, Val rows: {len(val_df)}")

# ---------- IMPORTANT: Fit scaler only on feature_cols from train_df ----------
# This is the crucial fix. We explicitly select columns in the same order as feature_cols.
scaler_X = StandardScaler().fit(train_df[feature_cols].values.astype(np.float32))
print("Scaler fit: mean shape", scaler_X.mean_.shape, " expected features:", len(feature_cols))

# Defensive sanity: if shapes still mismatch, print columns and abort with clear message.
if scaler_X.mean_.shape[0] != len(feature_cols):
    print("Scaler mean shape mismatch — scaler was fit on different columns. Debug info:")
    print("scaler.mean_.shape[0] =", scaler_X.mean_.shape[0])
    print("len(feature_cols) =", len(feature_cols))
    print("feature_cols:", feature_cols)
    raise AssertionError("Scaler feature count mismatch. Recreate scaler explicitly on train_df[feature_cols].")

y_train = train_df["FVC"].values.astype(np.float32)
y_mean, y_std = float(y_train.mean()), float(y_train.std() if y_train.std()>0 else 1.0)
print("y_mean, y_std:", y_mean, y_std)

# ---------- Cache embeddings (train+val patients) ----------
needed_pids = list(set(train_df["Patient"].unique().tolist() + val_df["Patient"].unique().tolist()))
print("Caching embeddings for", len(needed_pids), "patients (one-time).")
extract_and_cache_embeddings(needed_pids, target_size=(224,224), force=False)

# ---------- Fusion Dataset (guaranteed to use feature_cols and scaler_X) ----------
class FusionDataset(Dataset):
    def __init__(self, df, feature_cols, embed_dir):
        self.df = df.reset_index(drop=True)
        self.feature_cols = feature_cols
        self.embed_dir = Path(embed_dir)
    def __len__(self):
        return len(self.df)
    def __getitem__(self, idx):
        r = self.df.iloc[idx]
        pid = r["Patient"]
        emb_path = self.embed_dir / f"{pid}.npy"
        emb = np.load(emb_path).astype(np.float32) if emb_path.exists() else np.zeros(512, dtype=np.float32)
        x_raw = r[self.feature_cols].values.astype(np.float32)
        x = scaler_X.transform(x_raw.reshape(1,-1)).astype(np.float32).flatten()
        y = np.float32(r["FVC"])
        y_s = (y - y_mean) / y_std
        return torch.from_numpy(emb), torch.from_numpy(x), torch.tensor(y_s, dtype=torch.float32)

train_ds = FusionDataset(train_df, feature_cols, EMBED_CACHE_DIR)
val_ds = FusionDataset(val_df, feature_cols, EMBED_CACHE_DIR)
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)

# ---------- Debug: inspect one batch and derive dims ----------
emb_sample, tab_sample, y_sample = next(iter(train_loader))
print("DEBUG sample shapes -> emb:", emb_sample.shape, "tab:", tab_sample.shape, "y:", y_sample.shape)
inferred_emb_dim = int(emb_sample.shape[1])
inferred_tab_dim = int(tab_sample.shape[1])
print("Inferred emb_dim:", inferred_emb_dim, "inferred_tab_dim:", inferred_tab_dim)
# defensive check: tab dim must equal scaler feature count
assert inferred_tab_dim == len(feature_cols), f"Data tab dim ({inferred_tab_dim}) != len(feature_cols) ({len(feature_cols)})"

# ---------- Model (built using inferred_tab_dim) ----------
class FusionLaplaceModel(nn.Module):
    def __init__(self, emb_dim, tab_dim, proj_dim=128, hidden_dim=128):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(emb_dim, proj_dim), nn.ReLU(), nn.Dropout(0.2))
        self.tab_in = nn.Linear(tab_dim, proj_dim)
        self.tab_blocks = nn.ModuleList([nn.Sequential(nn.Linear(proj_dim, proj_dim), nn.ReLU(), nn.Dropout(0.1)) for _ in range(2)])
        fusion_in = proj_dim + proj_dim
        self.fusion = nn.Sequential(nn.Linear(fusion_in, hidden_dim), nn.ReLU(), nn.Dropout(0.3), nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.mean_head = nn.Linear(hidden_dim, 1)
        self.log_s_head = nn.Linear(hidden_dim, 1)
        nn.init.constant_(self.log_s_head.bias, math.log(1.0))
    def forward(self, emb, tab):
        i = self.img_proj(emb)
        t = F.relu(self.tab_in(tab))
        for b in self.tab_blocks:
            t = b(t)
        fuse = torch.cat([i, t], dim=1)
        h = self.fusion(fuse)
        mu = self.mean_head(h).squeeze(-1)
        log_s = self.log_s_head(h).squeeze(-1)
        return mu, log_s

model = FusionLaplaceModel(emb_dim=inferred_emb_dim, tab_dim=inferred_tab_dim, proj_dim=128, hidden_dim=128).to(DEVICE)
print("Model params:", sum(p.numel() for p in model.parameters() if p.requires_grad))

# ---------- Loss (Laplace NLL) & evaluation ----------
softplus = nn.Softplus()
def laplace_nll_scaled(mu_s, log_s, y_s, sigma_reg=SIGMA_REG):
    b = softplus(log_s) + 1e-6
    loss = torch.abs(y_s - mu_s) / b + torch.log(2.0 * b)
    return loss.mean() + sigma_reg * (log_s ** 2).mean()

def evaluate(model, loader, device=DEVICE, sigma_floor=SIGMA_FLOOR):
    model.eval()
    preds, sigs, trues = [], [], []
    with torch.no_grad():
        for emb, tab, y_s in loader:
            emb = emb.to(device).float()
            tab = tab.to(device).float()
            y_s = y_s.to(device).float()
            mu_s, log_s = model(emb, tab)
            b_s = softplus(log_s) + 1e-6
            mu = (mu_s.cpu().numpy() * y_std) + y_mean
            b = (b_s.cpu().numpy() * y_std)
            y_true = (y_s.cpu().numpy() * y_std) + y_mean
            preds.append(mu); sigs.append(b); trues.append(y_true)
    preds = np.concatenate(preds); sigs = np.concatenate(sigs); trues = np.concatenate(trues)
    sigs_floor = np.maximum(sigs, sigma_floor)
    rmse = float(np.sqrt(np.mean((trues - preds)**2)))
    denom = np.sum((trues - trues.mean())**2)
    r2 = float(1.0 - np.sum((trues - preds)**2)/(denom + 1e-12)) if denom > 0 else 0.0
    lll = float(np.mean(-math.sqrt(2.0)*np.abs(trues-preds)/sigs_floor - np.log(math.sqrt(2.0)*sigs_floor)))
    return {"rmse": rmse, "r2": r2, "lll": lll, "preds": preds, "sigs": sigs, "trues": trues}

# ---------- Optimizer & scheduler ----------
opt = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="max", factor=0.5, patience=3, verbose=True)

# ---------- Training loop with defensive assert ----------
best_lll = -1e18
for epoch in range(1, EPOCHS+1):
    model.train()
    losses = []
    t0 = time.time()
    for step, (emb, tab, y_s) in enumerate(train_loader):
        emb = emb.to(DEVICE).float()
        tab = tab.to(DEVICE).float()
        y_s = y_s.to(DEVICE).float()

        # Defensive assert to catch any future mismatch
        assert tab.shape[1] == model.tab_in.in_features, f"Tab dim mismatch: data {tab.shape[1]} vs model {model.tab_in.in_features}"

        mu_s, log_s = model(emb, tab)
        loss = laplace_nll_scaled(mu_s, log_s, y_s)
        opt.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        losses.append(loss.item())

        if epoch == 1 and step == 0:
            print("DEBUG batch shapes: emb", emb.shape, "tab", tab.shape, "mu_s", mu_s.shape, "log_s", log_s.shape, "y_s", y_s.shape)

    train_loss = float(np.mean(losses)) if len(losses)>0 else 0.0
    val_stats = evaluate(model, val_loader, device=DEVICE)
    sched.step(val_stats["lll"])
    print(f"Epoch {epoch:02d} | TrainLoss: {train_loss:.6f} | Val RMSE: {val_stats['rmse']:.2f} | Val R2: {val_stats['r2']:.4f} | Val LLL: {val_stats['lll']:.4f} | time: {time.time()-t0:.1f}s")

    if val_stats["lll"] > best_lll:
        best_lll = val_stats["lll"]
        torch.save(model.state_dict(), "best_fusion_laplace_fixedscaler.pt")
        print(" -> saved best fusion model (LLL)")

# ---------- Sigma calibration on val ----------
res = evaluate(model, val_loader, device=DEVICE, sigma_floor=0.0)
preds, sigs, trues = res["preds"], res["sigs"], res["trues"]
sigs = np.maximum(sigs, 1e-6)
best_alpha, best_lll = 1.0, -1e18
for a in np.linspace(0.5, 3.0, 40):
    s_cal = np.maximum(sigs * a, SIGMA_FLOOR)
    lll = np.mean(-math.sqrt(2.0)*np.abs(trues-preds)/s_cal - np.log(math.sqrt(2.0)*s_cal))
    if lll > best_lll:
        best_lll = lll; best_alpha = a
print(f"Calibrated alpha: {best_alpha:.3f}, calibrated LLL: {best_lll:.6f}")
final = evaluate(model, val_loader, device=DEVICE, sigma_floor=SIGMA_FLOOR)
print("FINAL VAL (with floor):", {k: final[k] for k in ('rmse','r2','lll')})


Device: cuda
Loaded rows: 1549
Using explicit feature_cols: ['Age', 'Percent', 'Sex', 'SmokingStatus', 'rel_week', 'rel_week_sq', 'rel_week_x_percent', 'age_x_percent'] count: 8
Train rows: 1236, Val rows: 313
Scaler fit: mean shape (8,)  expected features: 8
y_mean, y_std: 2762.177978515625 847.3816528320312
Caching embeddings for 176 patients (one-time).
DEBUG sample shapes -> emb: torch.Size([48, 512]) tab: torch.Size([48, 8]) y: torch.Size([48])
Inferred emb_dim: 512 inferred_tab_dim: 8
Model params: 149506
DEBUG batch shapes: emb torch.Size([48, 512]) tab torch.Size([48, 8]) mu_s torch.Size([48]) log_s torch.Size([48]) y_s torch.Size([48])
Epoch 01 | TrainLoss: 1.302679 | Val RMSE: 416.34 | Val R2: 0.6490 | Val LLL: -7.5568 | time: 0.9s
 -> saved best fusion model (LLL)
Epoch 02 | TrainLoss: 0.745968 | Val RMSE: 313.53 | Val R2: 0.8010 | Val LLL: -7.1697 | time: 0.9s
 -> saved best fusion model (LLL)
Epoch 03 | TrainLoss: 0.510242 | Val RMSE: 257.10 | Val R2: 0.8662 | Val LLL: -7.