# 🧠 Multimodal CTR Prediction (MM-CTR)
1. Overview
   
- Task 1 – Multimodal Item Embedding: generate 128-d multimodal item representations from visual content and user behavior.

- Task 1&2 – CTR Prediction: train a DIN (Deep Interest Network) model on frozen item embeddings, as required by the challenge rules.

The goal is to improve CTR performance (AUC) over the provided baseline while strictly preserving data integrity.

# 📘 Task 1: Multimodal Item Embedding

## 1.1 Motivation & Approach
The classic baseline (Text+Image concatenation followed by PCA) fails to capture the implicit behavioral structure. Our pipeline addresses this with knowledge distillation: on top of a frozen visual encoder, we train a projector to predict the context in which each item is used.

## 1.2 Visual Extraction (Teacher)
We use CLIP (Contrastive Language-Image Pre-training) to extract raw visual semantics.
- Model: CLIP ViT-B/32 (OpenAI).
- Process:
  1. Frozen vision backbone.
  2. Visual projection + L2 normalization.
  3. Cold-start handling (a default image is used when the item image is missing).
- Output: semantic vector of dimension $d=512$.
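
As a rough illustration of steps 2 and 3, the sketch below L2-normalizes a feature vector and falls back to a default image when an item has none. It uses plain Python lists instead of CLIP tensors, and `l2_normalize` and `image_or_default` are hypothetical helper names introduced only for this example.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit Euclidean length; near-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm < eps:
        return list(vec)
    return [x / norm for x in vec]

def image_or_default(images, item_id, default_image):
    """Cold-start fallback: use the default image when the item has none."""
    return images.get(item_id, default_image)

features = l2_normalize([3.0, 4.0])
print(features)  # [0.6, 0.8]
```

After normalization every vector has unit length, so the downstream projector sees features on a common scale regardless of the raw CLIP magnitudes.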

## 1.3 Behavioral Learning (Target)
We build the target latent space by analyzing the interaction sequences in item_seq.parquet.
- Method: Word2Vec (gensim). The training call leaves `sg` at its default of 0, which selects CBOW; Skip-Gram would require `sg=1`.
- Analogy: user sequence $\approx$ sentence; item $\approx$ word.
- Hyperparameters: dimension $d=128$, window $= 5$.
- Result: these embeddings capture item co-occurrence and substitutability.
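
To make the sentence/word analogy concrete, the sketch below lists the (center, context) item pairs that fall within a fixed window of a user sequence. It is a simplification: gensim samples an effective window size per position and applies further tricks, and `context_pairs` is a hypothetical helper, not part of the pipeline.

```python
def context_pairs(sequence, window=5):
    """Return (center, context) pairs for items at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

user_sequence = ["101", "205", "333"]  # made-up item ids
print(context_pairs(user_sequence, window=1))
# [('101', '205'), ('205', '101'), ('205', '333'), ('333', '205')]
```

Items that repeatedly appear in each other's windows end up with nearby vectors, which is where the co-occurrence signal comes from.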

## 1.4 Multimodal Distillation (The Bridge)
This is the critical step, where visual information is aligned with the behavioral space.
- Alignment: intersection of items (CLIP $\cap$ W2V) via Numba-accelerated binary search.
- Projector architecture (MLP): $$512 \xrightarrow{\text{Dense, BN, ReLU, Drop}} 512 \xrightarrow{\text{Dense, BN, ReLU}} 256 \xrightarrow{\text{Dense}} 128$$
- Training objective: minimize the distance (MSE) between the projected image feature and the learned behavioral embedding.
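
The alignment and the loss can be sketched in a few lines of plain Python. This is a minimal illustration using the standard-library `bisect` module in place of the Numba kernel; `align` and `mse` are hypothetical names, and the ids and vectors are toy values.

```python
from bisect import bisect_left

def align(clip_ids, w2v_ids_sorted, w2v_vecs_sorted):
    """For each CLIP item id, look up its behavioral vector by binary search.

    Returns (mask, targets): mask[i] is True when clip_ids[i] exists in the
    Word2Vec vocabulary, and targets[i] is the matching vector (None otherwise).
    """
    mask, targets = [], []
    for item_id in clip_ids:
        pos = bisect_left(w2v_ids_sorted, item_id)
        found = pos < len(w2v_ids_sorted) and w2v_ids_sorted[pos] == item_id
        mask.append(found)
        targets.append(w2v_vecs_sorted[pos] if found else None)
    return mask, targets

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

mask, targets = align([7, 3, 9], [3, 7], [[0.0, 1.0], [1.0, 0.0]])
print(mask)                         # [True, True, False]
print(mse([1.0, 0.0], targets[0]))  # 0.0
```

Only items where the mask is true contribute training pairs; the remaining items still receive an embedding at inference time because the projector needs only the image feature.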

## 1.5 Final Generation
- Inference: Image $\to$ CLIP $\to$ Projector $\to$ Embedding (128-d).
- Output: item_info.parquet gains an item_emb column (written to item_info_updated.parquet); all other data is left unchanged.
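
The output step behaves like a left join on `item_id`. The sketch below shows that idea on plain dictionaries rather than Parquet files; `attach_embeddings` is a hypothetical helper and the rows are toy data.

```python
def attach_embeddings(item_rows, emb_by_id):
    """Return copies of item_rows with an 'item_emb' field added.

    Items without an embedding get None, mirroring a left join; every other
    field is copied through untouched.
    """
    updated = []
    for row in item_rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["item_emb"] = emb_by_id.get(row["item_id"])
        updated.append(new_row)
    return updated

rows = [{"item_id": 1, "item_tags": [4, 2]}, {"item_id": 2, "item_tags": []}]
print(attach_embeddings(rows, {1: [0.6, 0.8]}))
```

Keeping every input row, even those without an embedding, is what guarantees the item table keeps its original length and columns.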

In [1]:
import os
import gc
import time
import numpy as np
import pandas as pd
import polars as pl
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset
from torchvision import transforms
from PIL import Image
from tqdm.auto import tqdm
from transformers import CLIPProcessor, CLIPModel
from gensim.models import Word2Vec
import datasets
from numba import njit, prange

# ==========================================
# 0. CONFIGURATION
# ==========================================
class Config:
    # Paths
    DATA_DIR = "/kaggle/input/data-ctr"
        
    ITEM_INFO_PATH = "/kaggle/input/data-ctr/item_info.parquet"
    SEQ_PATH = "/kaggle/input/data-ctr/item_seq.parquet"
    IMAGE_DIR = "/kaggle/input/data-ctr/item_images/item_images"
    
    # Model
    MODEL_ID = "openai/clip-vit-base-patch32"
    INPUT_DIM = 512
    TARGET_DIM = 128
    
    # Projector training
    BATCH_SIZE = 128
    NUM_WORKERS = 2
    PROJ_LR = 1e-3
    PROJ_EPOCHS = 10
    
    # Word2Vec
    LIMIT_W2V_DATA = False 
    
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
    OUTPUT_PATH = "item_emb_projector.parquet"
    OUTPUT_INFO_UPDATED = "item_info_updated.parquet"

device = Config.DEVICE
print(f"Running on {device}")

# ==========================================
# 1. HELPER CLASSES (Optimized)
# ==========================================

# --- NUMBA ALIGNMENT KERNEL ---
@njit(parallel=True)
def numba_fast_alignment(clip_ids, w2v_ids_sorted, w2v_vecs_sorted, dim):
    n = len(clip_ids)
    
    # Output arrays
    # mask: 1 if found, 0 if not
    mask = np.zeros(n, dtype=np.bool_)
    # targets: vectors aligned with clip_ids
    targets = np.zeros((n, dim), dtype=np.float32)
    
    # Parallel Loop
    for i in prange(n):
        target_id = clip_ids[i]
        
        # Binary search manually or use np.searchsorted logic
        # Since we are inside Numba, np.searchsorted is supported and fast
        idx = np.searchsorted(w2v_ids_sorted, target_id)
        
        # Check if found
        if idx < len(w2v_ids_sorted) and w2v_ids_sorted[idx] == target_id:
            mask[i] = True
            targets[i] = w2v_vecs_sorted[idx]
            
    return mask, targets

# --- GENERATOR WITH TQDM ---
class SequenceGenerator:
    def __init__(self, sequences, total_len=None):
        self.sequences = sequences
        self.total_len = total_len
        
    def __iter__(self):
        # Wrap the iterator with tqdm to monitor read throughput
        iterator = self.sequences
        if self.total_len:
            iterator = tqdm(self.sequences, total=self.total_len, desc="Stream W2V Seq")
            
        for seq in iterator:
            if seq is not None:
                yield [str(x) for x in seq]

# --- CLIP & PROJECTOR ---
class CLIPWrapper(nn.Module):
    def __init__(self, model_id):
        super().__init__()
        self.model = CLIPModel.from_pretrained(model_id)
        self.model.vision_model.requires_grad_(False)
        
    def forward(self, pixel_values):
        vision_outputs = self.model.vision_model(pixel_values=pixel_values)
        image_embeds = self.model.visual_projection(vision_outputs[1])
        return image_embeds / image_embeds.norm(dim=-1, keepdim=True)

class VisualProjector(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(0.2), 
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Linear(256, output_dim)
        )
    def forward(self, x):
        return self.net(x)

class ImageDataset(Dataset):
    def __init__(self, hf_dataset, image_dir, processor):
        self.data = hf_dataset
        self.image_dir = image_dir
        self.processor = processor
        self.default_img = Image.new("RGB", (224, 224), (0, 0, 0))
        
    def __len__(self): return len(self.data)
    
    def __getitem__(self, idx):
        item = self.data[idx]
        item_id = str(item['item_id'])
        img_path = os.path.join(self.image_dir, f"{item_id}.jpg")
        try:
            image = Image.open(img_path).convert("RGB")
        except OSError:
            # Missing or unreadable image: fall back to the black default (cold start)
            image = self.default_img
        inputs = self.processor(images=image, return_tensors="pt")
        return {
            "pixel_values": inputs["pixel_values"].squeeze(0),
            "item_id": int(item_id)
        }

# ==========================================
# 2. MAIN PIPELINE
# ==========================================

def main():
    # --- PHASE 1: IMAGES (TEACHER 1) ---
    print("\n" + "="*50)
    print("PHASE 1: CLIP Feature Extraction")
    print("="*50)
    
    hf_ds = datasets.load_dataset("parquet", data_files=Config.ITEM_INFO_PATH, split="train")
    processor = CLIPProcessor.from_pretrained(Config.MODEL_ID)
    clip_model = CLIPWrapper(Config.MODEL_ID).to(device)
    if torch.cuda.device_count() > 1: clip_model = nn.DataParallel(clip_model)

    loader = DataLoader(
        ImageDataset(hf_ds, Config.IMAGE_DIR, processor),
        batch_size=Config.BATCH_SIZE, shuffle=False, num_workers=Config.NUM_WORKERS, pin_memory=True
    )

    image_features_list = []
    item_ids_list = []

    # tqdm progress bar over the extraction loop
    with torch.no_grad():
        for batch in tqdm(loader, desc="[1/4] Extracting CLIP"):
            pixel_values = batch["pixel_values"].to(device)
            ids = batch["item_id"].numpy()
            img_emb = clip_model(pixel_values)
            image_features_list.append(img_emb.cpu().numpy())
            item_ids_list.append(ids)

    raw_image_feats = np.vstack(image_features_list)
    all_item_ids = np.concatenate(item_ids_list)
    
    del clip_model, loader
    torch.cuda.empty_cache()
    gc.collect()

    # --- PHASE 2a: WORD2VEC (TEACHER 2) ---
    print("\n" + "="*50)
    print("PHASE 2a: Behavioral Learning (Word2Vec)")
    print("="*50)

    df_seq = pd.read_parquet(Config.SEQ_PATH, columns=['item_seq'])
    if Config.LIMIT_W2V_DATA:
        seq_data = df_seq['item_seq'].iloc[:1000000]
    else:
        seq_data = df_seq['item_seq']

    # Generator with TQDM
    sentences_stream = SequenceGenerator(seq_data, total_len=len(seq_data))
    
    print("[2/4] Training Word2Vec...")
    w2v_model = Word2Vec(
        sentences=sentences_stream, 
        vector_size=Config.TARGET_DIM, 
        window=5, 
        min_count=1, 
        workers=4,
        epochs=3
    )
    
    # --- NUMBA ALIGNMENT ---
    print("\n[INFO] Preparing Numba Alignment...")
    # 1. Extract Vocab to Arrays
    vocab_keys = list(w2v_model.wv.index_to_key)
    vocab_ids = np.array([int(k) for k in vocab_keys], dtype=np.int64)
    vocab_vecs = np.array([w2v_model.wv[k] for k in vocab_keys], dtype=np.float32)
    
    # 2. Sort for Binary Search (Required for Numba speed)
    print("[INFO] Sorting W2V Vocab...")
    sort_idx = np.argsort(vocab_ids)
    vocab_ids_sorted = vocab_ids[sort_idx]
    vocab_vecs_sorted = vocab_vecs[sort_idx]
    
    # 3. Run Numba Kernel
    print("[INFO] Running Numba Fast Alignment...")
    mask, targets = numba_fast_alignment(
        all_item_ids.astype(np.int64), 
        vocab_ids_sorted, 
        vocab_vecs_sorted, 
        Config.TARGET_DIM
    )
    
    valid_count = np.sum(mask)
    print(f"✅ Items Aligned: {valid_count} / {len(all_item_ids)} ({valid_count/len(all_item_ids)*100:.1f}%)")
    
    # Filter Data for Training
    X_train = raw_image_feats[mask]
    y_train = targets[mask]
    
    del df_seq, sentences_stream, w2v_model, vocab_ids, vocab_vecs
    gc.collect()

    # --- PHASE 2b: PROJECTOR TRAINING ---
    print("\n" + "="*50)
    print("PHASE 2b: Projector Training (Distillation)")
    print("="*50)
    
    train_ds = TensorDataset(torch.FloatTensor(X_train), torch.FloatTensor(y_train))
    train_loader = DataLoader(train_ds, batch_size=256, shuffle=True)

    projector = VisualProjector(Config.INPUT_DIM, Config.TARGET_DIM).to(device)
    opt = optim.Adam(projector.parameters(), lr=Config.PROJ_LR)
    crit = nn.MSELoss()
    
    projector.train()
    
    # Loop with TQDM
    for ep in range(Config.PROJ_EPOCHS):
        total_loss = 0
        pbar = tqdm(train_loader, desc=f"Ep {ep+1}/{Config.PROJ_EPOCHS}", leave=True)
        
        for bx, by in pbar:
            bx, by = bx.to(device), by.to(device)
            
            opt.zero_grad()
            pred = projector(bx)
            loss = crit(pred, by)
            loss.backward()
            opt.step()
            
            total_loss += loss.item()
            pbar.set_postfix({'loss': f"{loss.item():.4f}"})

    # --- PHASE 3: INFERENCE ---
    print("\n" + "="*50)
    print("PHASE 3: Final Inference & Save")
    print("="*50)
    
    projector.eval()
    final_embs_list = []
    
    # Infer on ALL items (Cold Start solution)
    inf_loader = DataLoader(TensorDataset(torch.FloatTensor(raw_image_feats)), batch_size=1024, shuffle=False)
    
    with torch.no_grad():
        for (bx,) in tqdm(inf_loader, desc="[4/4] Projecting"):
            bx = bx.to(device)
            emb = projector(bx)
            # L2 Normalize
            emb = emb / (emb.norm(dim=-1, keepdim=True) + 1e-6)
            final_embs_list.append(emb.cpu().numpy())

    final_embs = np.vstack(final_embs_list)

    # Save
    print(f"Saving to {Config.OUTPUT_PATH}...")
    df_out = pl.DataFrame({
        "item_id": all_item_ids,
        "item_emb": list(final_embs)
    })
    df_out.write_parquet(Config.OUTPUT_PATH)
    
    # Update item_info
    print("Updating item_info.parquet...")
    df_info = pl.read_parquet(Config.ITEM_INFO_PATH)
    if "item_emb" in df_info.columns:
        df_info = df_info.drop("item_emb")
    
    df_info = df_info.join(df_out, on="item_id", how="left")
    df_info.write_parquet(Config.OUTPUT_INFO_UPDATED)
    
    print("\n✅ [DONE] Projector Pipeline Completed.")

if __name__ == "__main__":
    main()

2025-12-15 20:10:49.287689: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1765829449.523482      47 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1765829449.595579      47 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'


Running on cuda

PHASE 1: CLIP Feature Extraction


Generating train split: 0 examples [00:00, ? examples/s]

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



[1/4] Extracting CLIP:   0%|          | 0/717 [00:00<?, ?it/s]


PHASE 2a: Behavioral Learning (Word2Vec)
[2/4] Training Word2Vec...


Stream W2V Seq:   0%|          | 0/6000000 [00:00<?, ?it/s]



[INFO] Preparing Numba Alignment...
[INFO] Sorting W2V Vocab...
[INFO] Running Numba Fast Alignment...
✅ Items Aligned: 91298 / 91718 (99.5%)

PHASE 2b: Projector Training (Distillation)


Ep 1/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 2/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 3/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 4/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 5/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 6/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 7/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 8/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 9/10:   0%|          | 0/357 [00:00<?, ?it/s]

Ep 10/10:   0%|          | 0/357 [00:00<?, ?it/s]


PHASE 3: Final Inference & Save


[4/4] Projecting:   0%|          | 0/90 [00:00<?, ?it/s]

Saving to item_emb_projector.parquet...
Updating item_info.parquet...

✅ [DONE] Projector Pipeline Completed.


# 📙 Task 1&2: CTR Prediction

## 2.1 Preparation
The embeddings generated in Task 1 are integrated as dense features.
- Renaming: item_emb $\to$ item_emb_d128.
- Constraint: the weights of this embedding are frozen (non-trainable) to isolate the performance of the CTR model.
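
The renaming step, together with the zero-fill applied to items that have no embedding, can be sketched on a single record. This is a hedged, dictionary-based illustration; `prepare_row` is a hypothetical helper and the pipeline itself operates on a pandas DataFrame.

```python
def prepare_row(row, dim=128):
    """Rename 'item_emb' to 'item_emb_d128' and zero-fill a missing embedding."""
    # Drop both the new column and any stale 'item_emb_d128' before re-adding it.
    out = {k: v for k, v in row.items() if k not in ("item_emb", "item_emb_d128")}
    emb = row.get("item_emb")
    out["item_emb_d128"] = list(emb) if emb is not None else [0.0] * dim
    return out

print(prepare_row({"item_id": 1, "item_emb": [0.6, 0.8], "item_emb_d128": [9.0]}, dim=2))
# {'item_id': 1, 'item_emb_d128': [0.6, 0.8]}
```

Because the vector is treated as a fixed input rather than a learned parameter, any gain in AUC can be attributed to the embedding content and the CTR model, not to fine-tuning of the embedding itself.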

## 2.2 Model: Deep Interest Network (DIN)
For prediction, we use DIN as implemented in FuxiCTR (PyTorch).
- Why DIN? Unlike static models, DIN uses a local attention mechanism: it dynamically weights each item in the user's history according to its relevance to the current candidate item.
- Inputs:
  - User Profile & Context Features.
  - User Behavior Sequence (history).
  - Candidate Item (Target).
  - Multimodal Embedding (128-d).
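
The attention idea can be illustrated with a toy pooling function. One loud assumption: relevance is computed here as a plain dot product, whereas DIN learns the score with a small MLP over the history and candidate embeddings. `attention_pool` is a hypothetical name, and this is not the FuxiCTR implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attention_pool(history, candidate):
    """Weight each history embedding by its relevance to the candidate, then sum.

    Assumption: relevance is a plain dot product. DIN itself learns this score
    with a small MLP, so this only illustrates the data flow.
    """
    weights = [dot(h, candidate) for h in history]
    dim = len(candidate)
    pooled = [sum(w * h[k] for w, h in zip(weights, history)) for k in range(dim)]
    return weights, pooled

history = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = attention_pool(history, candidate=[1.0, 0.0])
print(weights)  # [1.0, 0.0]
print(pooled)   # [1.0, 0.0]
```

The same history therefore yields a different user representation for each candidate item, which is what distinguishes this design from a fixed average over the history.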

## 2.3 Training & Inference
- Training: on train.parquet and valid.parquet (Adam optimizer, early stopping).
- Prediction: on test.parquet, after loading the best checkpoint.
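
Early stopping on validation AUC can be sketched as below. This is a generic illustration, not FuxiCTR's own logic; `EarlyStopper` is a hypothetical class and the AUC values are made-up numbers, not results from this run.

```python
class EarlyStopper:
    """Track the best validation AUC and signal when it stops improving."""

    def __init__(self, patience):
        self.patience = patience
        self.best_auc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, auc):
        """Record one epoch; return True when training should stop."""
        if auc > self.best_auc:
            self.best_auc, self.best_epoch = auc, epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2)
for epoch, auc in enumerate([0.70, 0.72, 0.71, 0.715, 0.73]):  # illustrative values
    if stopper.update(epoch, auc):
        break
print(stopper.best_epoch, stopper.best_auc)  # 1 0.72
```

The checkpoint saved at `best_epoch` is the one to reload for prediction, since later epochs did not improve on it before training stopped.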

## 2.4 Deliverable
The submission file prediction_task1&2.csv is generated in the required format:
ID,Task1&2
0,0.8123
1,0.1345
...
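
A file with this layout can be produced with the standard-library `csv` module, as in the sketch below. `write_submission` is a hypothetical helper; the score column name is a parameter (the inference cell writes `Task1&2`), and rounding to four decimals is an assumption made to match the sample rows.

```python
import csv
import io

def write_submission(predictions, out, score_col):
    """Write one row per prediction: a running ID and the predicted score."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", score_col])
    for idx, score in enumerate(predictions):
        writer.writerow([idx, f"{score:.4f}"])

buffer = io.StringIO()  # stands in for an open file
write_submission([0.8123, 0.1345], buffer, score_col="Task1&2")
print(buffer.getvalue())
```

The IDs are simply row positions, so the predictions must stay in the same order as the rows of test.parquet.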

In [2]:
import os
import shutil
import glob
import sys
import numpy as np
import pandas as pd
import importlib.util
from tqdm.auto import tqdm
import torch

# Numba Check
try:
    from numba import njit, prange
    NUMBA_AVAILABLE = True
except ImportError:
    NUMBA_AVAILABLE = False
    print("⚠️ Numba not detected. JIT mode disabled.")

# ====================================================
# 0. CONFIGURATION
# ====================================================
class Config:
    # PATH TO YOUR FILE
    CUSTOM_ITEM_INFO_PATH = "/kaggle/working/item_info_updated.parquet"
    
    # Normalisation L2 (Recommended)
    NORMALIZE_EMBEDDINGS = True  
    
    # Paths
    SOURCE_DATA_DIR = "/kaggle/input/data-ctr"
    WORKING_DIR = "/kaggle/working"
    
    REPO_URL = "https://github.com/Othmane999/WWW2025_MMCTR_Challenge"
    REPO_DIR = os.path.join(WORKING_DIR, "WWW2025_MMCTR_Challenge")
    DATASET_ID = "MicroLens_1M_x1"
    DEST_DATA_DIR = os.path.join(REPO_DIR, "data", DATASET_ID)
    
    FINAL_SUBMISSION = os.path.join(WORKING_DIR, "prediction_task1&2.csv")
    BEST_MODEL_DEST = os.path.join(WORKING_DIR, "best_model_task1_and_2_tuned.pth")
    
    # Hyperparameters
    EPOCHS = 8
    EARLY_STOP_PATIENCE = 4
    LEARNING_RATE = 0.0005
    BATCH_SIZE_TRAIN = 4096
    BATCH_SIZE_INFERENCE = 4096
    
    # Architecture
    DNN_HIDDEN_UNITS = [2048, 1024, 512, 256]
    ATTENTION_HIDDEN_UNITS = [1024, 512, 256]
    EMBEDDING_DIM = 128
    ATTENTION_DROPOUT = 0.2
    NET_DROPOUT = 0.2
    EMBEDDING_REGULARIZER = 5e-7 

# ====================================================
# 1. NUMBA JIT FUNCTIONS
# ====================================================
if NUMBA_AVAILABLE:
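    # No fastmath here: its no-NaN/no-Inf assumptions can optimize the checks away.
    @njit(parallel=True)
    def check_embedding_integrity_jit(emb_matrix):
        rows, cols = emb_matrix.shape
        bad = 0  # reduction variable: count of non-finite entries
        # No early `break`: a second loop exit prevents prange from parallelizing.
        for i in prange(rows):
            for j in range(cols):
                val = emb_matrix[i, j]
                if np.isnan(val) or np.isinf(val):
                    bad += 1
        return bad == 0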
    
    @njit(fastmath=True)
    def normalize_embeddings_jit(emb_matrix):
        rows, cols = emb_matrix.shape
        normalized = np.zeros_like(emb_matrix)
        for i in range(rows):
            norm = 0.0
            for j in range(cols):
                norm += emb_matrix[i, j] ** 2
            norm = np.sqrt(norm)
            if norm > 1e-8:
                for j in range(cols):
                    normalized[i, j] = emb_matrix[i, j] / norm
            else:
                for j in range(cols):
                    normalized[i, j] = emb_matrix[i, j]
        return normalized
else:
    def check_embedding_integrity_jit(emb_matrix):
        return not (np.isnan(emb_matrix).any() or np.isinf(emb_matrix).any())
    
    def normalize_embeddings_jit(emb_matrix):
        norms = np.linalg.norm(emb_matrix, axis=1, keepdims=True)
        norms = np.where(norms > 1e-8, norms, 1.0)
        return emb_matrix / norms

# ====================================================
# 2. SETUP
# ====================================================
def setup_environment_task2():
    print("\n" + "="*70)
    print(f"=== Phase 1: Setup ===")
    print("="*70)
    
    if not os.path.exists(Config.REPO_DIR):
        print(f"[INFO] Cloning {Config.REPO_URL}...")
        os.system(f"git clone {Config.REPO_URL} {Config.REPO_DIR}")
    
    print("[INFO] Installing requirements...")
    os.system(f"pip install -q -r {Config.REPO_DIR}/requirements.txt")
    print("✅ Environment ready.")

# ====================================================
# 3. DATA PREPARATION (MODIFIED)
# ====================================================
def prepare_data_optimized():
    print("\n" + "="*70)
    print("=== Phase 2: Data Preparation (Column Swap) ===")
    print("="*70)

    # A. Migration Datasets
    os.makedirs(Config.DEST_DATA_DIR, exist_ok=True)
    files = ["train.parquet", "valid.parquet", "test.parquet", "item_seq.parquet"]
    
    for f in tqdm(files, desc="📂 Migration Datasets"):
        src = os.path.join(Config.SOURCE_DATA_DIR, f)
        dst = os.path.join(Config.DEST_DATA_DIR, f)
        if os.path.exists(src): 
            shutil.copy(src, dst)

    # B. Load Custom Item Info
    print(f"[INFO] Loading Custom File: {Config.CUSTOM_ITEM_INFO_PATH}...")
    if not os.path.exists(Config.CUSTOM_ITEM_INFO_PATH):
        print(f"🔴 ERROR: File not found at {Config.CUSTOM_ITEM_INFO_PATH}")
        sys.exit(1)
        
    df_final = pd.read_parquet(Config.CUSTOM_ITEM_INFO_PATH)
    
    # --- SPECIFIC MODIFICATION START ---
    print(f"[INFO] Columns before processing: {df_final.columns.tolist()}")

    # 1. Drop the previous item_emb_d128 column if it exists
    if 'item_emb_d128' in df_final.columns:
        print("🗑️ Dropping ancient 'item_emb_d128' column...")
        df_final.drop(columns=['item_emb_d128'], inplace=True)
    
    # 2. Rename item_emb to item_emb_d128
    if 'item_emb' in df_final.columns:
        print("🔄 Renaming 'item_emb' to 'item_emb_d128'...")
        df_final.rename(columns={'item_emb': 'item_emb_d128'}, inplace=True)
    else:
        print("🔴 ERROR: Column 'item_emb' not found! Cannot proceed.")
        sys.exit(1)
        
    target_emb_col = "item_emb_d128"
    # --- SPECIFIC MODIFICATION END ---

    # C. Fill NaNs if any
    null_mask = df_final[target_emb_col].isnull()
    if null_mask.any():
        print(f"⚠️ {null_mask.sum()} items without embeddings. Filling with zeros.")
        zero_vec = [0.0] * 128
        df_final.loc[null_mask, target_emb_col] = pd.Series(
            [zero_vec] * null_mask.sum(), 
            index=df_final[null_mask].index
        )

    # D. Normalisation L2
    emb_matrix = np.stack(df_final[target_emb_col].values).astype(np.float32)
    
    if Config.NORMALIZE_EMBEDDINGS:
        print("[INFO] ⚡ Ensuring L2 Normalization...")
        emb_matrix = normalize_embeddings_jit(emb_matrix)
        df_final[target_emb_col] = list(emb_matrix)
        print("✅ Embeddings normalized.")
    
    # E. Check Integrity
    print("[INFO] Checking matrix integrity...")
    if check_embedding_integrity_jit(emb_matrix):
        print(f"✅ Matrix shape {emb_matrix.shape} valid.")
    else:
        print("🔴 Error: NaNs/Infs detected in custom embeddings.")
        sys.exit(1)

    # F. Save
    dst_info = os.path.join(Config.DEST_DATA_DIR, "item_info.parquet")
    df_final.to_parquet(dst_info, index=False)
    print("✅ item_info.parquet generated successfully.")

# ====================================================
# 4. TRAINING CONFIG
# ====================================================
def modify_default_config():
    import yaml
    
    default_config = os.path.join(Config.REPO_DIR, "config", "DIN_microlens_mmctr_tuner_config_01.yaml")
    
    if not os.path.exists(default_config):
        print(f"⚠️ Config not found: {default_config}")
        return None
    
    try:
        with open(default_config, 'r') as f:
            config = yaml.safe_load(f)
        
        # Hyperparams
        if 'tuner_space' in config:
            config['tuner_space']['learning_rate'] = [Config.LEARNING_RATE]
            config['tuner_space']['batch_size'] = [Config.BATCH_SIZE_TRAIN]
            config['tuner_space']['embedding_dim'] = [Config.EMBEDDING_DIM]
            config['tuner_space']['dnn_hidden_units'] = [Config.DNN_HIDDEN_UNITS]
            config['tuner_space']['attention_hidden_units'] = [Config.ATTENTION_HIDDEN_UNITS]
            config['tuner_space']['attention_dropout'] = [Config.ATTENTION_DROPOUT]
            config['tuner_space']['net_dropout'] = [Config.NET_DROPOUT]
            config['tuner_space']['embedding_regularizer'] = [float(Config.EMBEDDING_REGULARIZER)]
        
        if 'model_config' in config:
            config['model_config']['epochs'] = Config.EPOCHS
            config['model_config']['early_stop_patience'] = Config.EARLY_STOP_PATIENCE
        
        optimized_path = default_config.replace('.yaml', '_optimized.yaml')
        with open(optimized_path, 'w') as f:
            yaml.dump(config, f, default_flow_style=False, sort_keys=False, allow_unicode=True)
        
        return optimized_path
    
    except Exception as e:
        print(f"⚠️ Error modifying config: {e}")
        return None

# ====================================================
# 5. TRAINING LOOP
# ====================================================
def run_training_optimized():
    print("\n" + "="*70)
    print("=== Phase 3: Training ===")
    print("="*70)
    
    cwd = os.getcwd()
    os.chdir(Config.REPO_DIR)
    
    optimized_config = modify_default_config()
    
    if optimized_config:
        config_file = os.path.basename(optimized_config)
        cmd = f"python run_param_tuner.py --config config/{config_file} --gpu 0"
    else:
        cmd = "python run_param_tuner.py --config config/DIN_microlens_mmctr_tuner_config_01.yaml --gpu 0"
    
    print(f"[EXEC] {cmd}")
    
    result = os.system(cmd)
    os.chdir(cwd)
    
    ckpt_dir = os.path.join(Config.REPO_DIR, "checkpoints", Config.DATASET_ID)
    models = glob.glob(os.path.join(ckpt_dir, "*.model")) if os.path.exists(ckpt_dir) else []
    
    if models:
        print("\n✅ Training finished! Model saved.")
        return True
    elif result == 0:
        print("\n✅ Training finished (no model found, check logs).")
        return True
    else:
        print("\n🔴 Training failed.")
        return False

# ====================================================
# 6. INFERENCE
# ====================================================
def load_din_class(source_path):
    if not os.path.exists(source_path): 
        return None
    try:
        spec = importlib.util.spec_from_file_location("DIN_Custom", source_path)
        module = importlib.util.module_from_spec(spec)
        sys.modules["DIN_Custom"] = module
        spec.loader.exec_module(module)
        return getattr(module, "DIN")
    except Exception as e:
        print(f"⚠️ Error loading DIN class: {e}")
        return None

def run_inference_optimized():
    from fuxictr.features import FeatureMap
    
    print("\n" + "="*70)
    print("=== Phase 4: Inference ===")
    print("="*70)
    
    # A. Select Best Model
    ckpt_dir = os.path.join(Config.REPO_DIR, "checkpoints", Config.DATASET_ID)
    
    if not os.path.exists(ckpt_dir):
        print(f"🔴 Checkpoints not found: {ckpt_dir}")
        return
    
    models = glob.glob(os.path.join(ckpt_dir, "*.model"))
    if not models:
        print("🔴 No models found.")
        return
    
    best_model_path = max(models, key=os.path.getmtime)
    print(f"[INFO] Using model: {os.path.basename(best_model_path)}")
    shutil.copy(best_model_path, Config.BEST_MODEL_DEST)

    # B. Config
    params = {
        "model_id": "DIN_Inference_Optimized",
        "dataset_id": Config.DATASET_ID,
        "data_root": Config.DEST_DATA_DIR,
        "model_root": Config.WORKING_DIR,
        "feature_cols": [],
        "dnn_hidden_units": Config.DNN_HIDDEN_UNITS,
        "attention_hidden_units": Config.ATTENTION_HIDDEN_UNITS,
        "dnn_activations": "ReLU",
        "attention_hidden_activations": "ReLU",
        "embedding_dim": Config.EMBEDDING_DIM,
        "batch_norm": True,
        "din_use_softmax": False,
        "gpu": 0,
        "task": "binary_classification",
        "loss": "binary_crossentropy",
        "metrics": ["AUC", "logloss"],
        "optimizer": "adam",
        "learning_rate": Config.LEARNING_RATE,
        "verbose": 0,
        "attention_dropout": Config.ATTENTION_DROPOUT,
        "net_dropout": Config.NET_DROPOUT
    }

    # C. Load Map & Model
    fmap_path = os.path.join(Config.DEST_DATA_DIR, "feature_map.json")
    feature_map = FeatureMap(Config.DATASET_ID, Config.DEST_DATA_DIR)
    feature_map.load(fmap_path, params)
    
    DIN = load_din_class(os.path.join(Config.REPO_DIR, "src", "DIN.py"))
    model = DIN(feature_map, **params)
    model.load_state_dict(torch.load(best_model_path, map_location="cuda:0"))
    model.to("cuda")
    model.eval()

    # D. Inference Data
    print("[INFO] Loading Test Data...")
    df_test = pd.read_parquet(os.path.join(Config.SOURCE_DATA_DIR, "test.parquet"))
    df_info = pd.read_parquet(os.path.join(Config.DEST_DATA_DIR, "item_info.parquet"))
    df = pd.merge(df_test, df_info, on="item_id", how="left")
    
    def pad_tensor(seq_list, max_len):
        padded = []
        for s in seq_list:
            if not isinstance(s, (list, np.ndarray)): s = []
            s = list(s)
            if len(s) > max_len: s = s[:max_len]
            else: s = s + [0]*(max_len - len(s))
            padded.append(s)
        return torch.tensor(padded, dtype=torch.long).to("cuda")

    # E. Batch Prediction
    all_preds = []
    num_samples = len(df)
    
    print(f"[INFO] Predicting on {num_samples} samples...")
    
    with torch.no_grad():
        for start_idx in tqdm(range(0, num_samples, Config.BATCH_SIZE_INFERENCE), desc="🚀 Computing"):
            end_idx = min(start_idx + Config.BATCH_SIZE_INFERENCE, num_samples)
            batch_df = df.iloc[start_idx:end_idx]
            bs = len(batch_df)
            
            # Input Dict Construction
            batch_input = {}
            if 'user_id' in batch_df.columns:
                batch_input['user_id'] = torch.tensor(batch_df['user_id'].values, dtype=torch.long).to("cuda")
            
            for c in ['likes_level', 'views_level']:
                if c in batch_df.columns:
                    batch_input[c] = torch.tensor(batch_df[c].fillna(0).values, dtype=torch.long).to("cuda")
            
            # Item Inputs
            target_ids = torch.tensor(batch_df['item_id'].values, dtype=torch.long).to("cuda")
            dummy_ids = torch.zeros_like(target_ids)
            
            item_input = {}
            item_input['item_id'] = torch.stack([dummy_ids, target_ids], dim=1).view(-1)
            
            if 'item_tags' in batch_df.columns:
                target_tags = pad_tensor(batch_df['item_tags'].values, 5)
                dummy_tags = torch.zeros_like(target_tags)
                item_input['item_tags'] = torch.stack([dummy_tags, target_tags], dim=1).view(-1, 5)
            
            # Use the RENAMED column
            if 'item_emb_d128' in batch_df.columns:
                embs_np = np.stack(batch_df['item_emb_d128'].values)
                target_emb = torch.tensor(embs_np, dtype=torch.float32).to("cuda")
                dummy_emb = torch.zeros_like(target_emb)
                item_input['item_emb_d128'] = torch.stack([dummy_emb, target_emb], dim=1).view(-1, 128)

            mask = torch.ones((bs, 1)).to("cuda")
            out = model.forward((batch_input, item_input, mask))
            pred = out['y_pred'] if isinstance(out, dict) else out
            all_preds.extend(pred.cpu().numpy().flatten())

    # F. Submission
    sub = pd.DataFrame({"ID": range(len(all_preds)), "Task1&2": all_preds})
    sub.to_csv(Config.FINAL_SUBMISSION, index=False)
    print(f"✅ Prediction saved to {Config.FINAL_SUBMISSION}")

# ====================================================
# 7. EXECUTION
# ====================================================
if __name__ == "__main__":
    setup_environment_task2()
    prepare_data_optimized()
    if run_training_optimized():
        run_inference_optimized()
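
The item inputs in the inference loop are built by pairing each target with an all-zero placeholder and flattening the pair axis, so the model receives rows in the order `[dummy_0, target_0, dummy_1, target_1, ...]`. This layout presumably corresponds to a one-step history followed by the target item, which would match the `(bs, 1)` mask; that reading is an assumption, not something the code states. A minimal sketch of the interleaving on toy data, using NumPy's `stack`/`reshape` in place of `torch.stack`/`view` (the helper name `interleave_with_dummy` and the toy values are hypothetical):

```python
import numpy as np


def interleave_with_dummy(target: np.ndarray) -> np.ndarray:
    """Pair each target row with an all-zero placeholder, then flatten the pair axis."""
    dummy = np.zeros_like(target)
    stacked = np.stack([dummy, target], axis=1)    # shape (bs, 2, ...)
    return stacked.reshape(-1, *target.shape[1:])  # shape (2 * bs, ...)


# Hypothetical toy batch: three items with 4-d embeddings (the pipeline uses 128-d).
target_ids = np.array([11, 22, 33])
target_emb = np.arange(12, dtype=np.float32).reshape(3, 4)

flat_ids = interleave_with_dummy(target_ids)
flat_emb = interleave_with_dummy(target_emb)

print(flat_ids.tolist())  # [0, 11, 0, 22, 0, 33]
print(flat_emb.shape)     # (6, 4)
```

Even rows hold the placeholders and odd rows hold the targets, which is why the flattened inputs have `2 * bs` rows.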


=== Phase 1: Setup ===
[INFO] Cloning https://github.com/Othmane999/WWW2025_MMCTR_Challenge...


Cloning into '/kaggle/working/WWW2025_MMCTR_Challenge'...


[INFO] Installing requirements...
✅ Environment ready.

=== Phase 2: Data Preparation (Column Swap) ===


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-polars-cu12 25.6.0 requires polars<1.29,>=1.25, but you have polars 1.0.0 which is incompatible.
cudf-polars-cu12 25.6.0 requires pylibcudf-cu12==25.6.*, but you have pylibcudf-cu12 25.2.2 which is incompatible.

[INFO] Loading Custom File: /kaggle/working/item_info_updated.parquet...
[INFO] Columns before processing: ['item_id', 'item_tags', 'item_emb_d128', 'item_emb']
🗑️ Dropping ancient 'item_emb_d128' column...
🔄 Renaming 'item_emb' to 'item_emb_d128'...
[INFO] ⚡ Ensuring L2 Normalization...
✅ Embeddings normalized.
[INFO] Checking matrix integrity...
✅ Matrix shape (91718, 128) valid.


prange or pndindex loop will not be executed in parallel due to there being more than one entry to or exit from the loop (e.g., an assertion).

File "../../tmp/ipykernel_47/4018851867.py", line 64:
<source missing, REPL/exec in use?>

The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.readthedocs.io/en/stable/user/parallel.html#diagnostics for help.

File "../../tmp/ipykernel_47/4018851867.py", line 60:
<source missing, REPL/exec in use?>

✅ item_info.parquet generated successfully.

=== Phase 3: Training ===
[EXEC] python run_param_tuner.py --config config/DIN_microlens_mmctr_tuner_config_01_optimized.yaml --gpu 0


2025-12-15 20:38:51,266 P1707 INFO FuxiCTR version: 2.3.7
2025-12-15 20:38:51,266 P1707 INFO Params: {
    "accumulation_steps": "1",
    "attention_dropout": "0.2",
    "attention_hidden_activations": "ReLU",
    "attention_hidden_units": "[1024, 512, 256]",
    "attention_output_activation": "None",
    "batch_norm": "True",
    "batch_size": "4096",
    "data_format": "parquet",
    "data_root": "./data/",
    "dataset_id": "MicroLens_1M_x1",
    "debug_mode": "False",
    "din_use_softmax": "False",
    "dnn_activations": "ReLU",
    "dnn_hidden_units": "[2048, 1024, 512, 256]",
    "early_stop_patience": "3",
    "embedding_dim": "128",
    "embedding_regularizer": "5e-07",
    "epochs": "5",
    "eval_steps": "None",
    "feature_cols": "[{'active': True, 'dtype': 'int', 'name': 'user_id', 'type': 'meta'}, {'active': True, 'dtype': 'int', 'name': 'item_seq', 'type': 'meta'}, {'active': True, 'dtype': 'int', 'name': 'likes_level', 'type': 'categorical', 'vocab_size': 11}, {'active

2025-12-15 20:55:52,346 P1707 INFO Train loss: 0.136696
2025-12-15 20:55:52,346 P1707 INFO Evaluation @epoch 1 - batch 879:
2025-12-15 20:55:54,096 P1707 INFO [Metrics] AUC: 0.808423
2025-12-15 20:55:54,097 P1707 INFO Save best model: monitor(max)=0.808423
2025-12-15 20:55:54,273 P1707 INFO ************ Epoch=1 end ************


2025-12-15 21:12:40,572 P1707 INFO Train loss: 0.049287
2025-12-15 21:12:40,573 P1707 INFO Evaluation @epoch 2 - batch 879:
2025-12-15 21:12:42,296 P1707 INFO [Metrics] AUC: 0.833927
2025-12-15 21:12:42,297 P1707 INFO Save best model: monitor(max)=0.833927
2025-12-15 21:12:42,545 P1707 INFO ************ Epoch=2 end ************


2025-12-15 21:29:28,022 P1707 INFO Train loss: 0.030970
2025-12-15 21:29:28,022 P1707 INFO Evaluation @epoch 3 - batch 879:
2025-12-15 21:29:29,794 P1707 INFO [Metrics] AUC: 0.834422
2025-12-15 21:29:29,795 P1707 INFO Save best model: monitor(max)=0.834422
2025-12-15 21:29:30,051 P1707 INFO ************ Epoch=3 end ************


2025-12-15 21:46:14,952 P1707 INFO Train loss: 0.020916
2025-12-15 21:46:14,953 P1707 INFO Evaluation @epoch 4 - batch 879:
2025-12-15 21:46:16,728 P1707 INFO [Metrics] AUC: 0.856528
2025-12-15 21:46:16,728 P1707 INFO Save best model: monitor(max)=0.856528
2025-12-15 21:46:16,979 P1707 INFO ************ Epoch=4 end ************


2025-12-15 22:03:01,564 P1707 INFO Train loss: 0.014530
2025-12-15 22:03:01,564 P1707 INFO Evaluation @epoch 5 - batch 879:
2025-12-15 22:03:03,313 P1707 INFO [Metrics] AUC: 0.869683
2025-12-15 22:03:03,313 P1707 INFO Save best model: monitor(max)=0.869683
2025-12-15 22:03:03,549 P1707 INFO ************ Epoch=5 end ************
2025-12-15 22:03:03,549 P1707 INFO Training finished.
2025-12-15 22:03:03,549 P1707 INFO Load best model: /kaggle/working/WWW2025_MMCTR_Challenge/checkpoints/MicroLens_1M_x1/DIN_MicroLens_1M_x1_001_2cd806c2.model
2025-12-15 22:03:03,622 P1707 INFO ****** Validation evaluation ******


2025-12-15 22:03:05,400 P1707 INFO [Metrics] AUC: 0.869683 - logloss: 1.807775


Enumerate all tuner configurations done.

✅ Training finished! Model saved.

=== Phase 4: Inference ===
[INFO] Using model: DIN_MicroLens_1M_x1_001_2cd806c2.model
[INFO] Loading Test Data...
[INFO] Predicting on 379142 samples...

✅ Prediction saved to /kaggle/working/prediction_task1&2.csv
