# 04: Two-Stage Ranker Notebook

This notebook trains and evaluates an MLP ranker on two-stage candidate lists for Jarir:

- **Config & Imports**  
- **Load Sequences & Maps**  
- **Load Embeddings & Feature Tensors**  
- **Candidate Generation**  
- **Feature Engineering (sharded)**  
- **Ranker Model Definition**  
- **Training & Evaluation**  
- **Comparison with Baselines**  
- **Save Results**  
- **Final Summary**  
- **Feature Importance Analysis**  

## 0. Config & Imports

- Import standard libraries and PyTorch  
- Set up data paths and I/O settings  
- Define configuration dictionary `CFG`  
- Initialize device and random seeds 

In [1]:
# --- 0) Config & Imports --------------------------------------------------------
import os, json, math, time, gc, glob, bisect
from pathlib import Path
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

import matplotlib.pyplot as plt

torch.set_float32_matmul_precision('high')

# Data paths
OUT_DIR = Path('../data/processed/jarir/')
OUT_DIR.mkdir(parents=True, exist_ok=True)
print("OUT_DIR:", OUT_DIR)

READ_KW  = dict(engine="fastparquet")
WRITE_KW = dict(engine="fastparquet", index=False)

CFG = {
    # Retrieval
    "cand_topk": 100,
    "cand_batch": 4096,          # Candidate generation batch size
    # Histories & features
    "hist_max": 15,               # Reduced for Jarir (smaller sequences)
    # Feature building
    "feat_batch_q": 1024,        # queries per GPU batch when building features
    "shard_rows": 1_000_000,     # approx rows per Parquet shard (features)
    "neg_per_query": 20,         # keep 1 pos + N hard negatives per query
    "hard_negatives": True,      # choose hardest by dot_uv; False=random
    # Ranker
    "batch_size": 2048,          # larger thanks to AMP
    "epochs": 20,
    "patience": 5,
    "lr": 1e-3,
    "weight_decay": 1e-4,
    "dropout": 0.2,
    "hidden": 256,               # Smaller for Jarir
    "eval_topk": 10,
    "seed": 42,
    "use_text": False,           # No text embeddings for Jarir
    # Two-Tower embeddings
    "embedding_dim": 256,
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)
torch.manual_seed(CFG["seed"])
if device.type == "cuda":
    torch.cuda.manual_seed_all(CFG["seed"])

OUT_DIR: ../data/processed/jarir
Device: cuda


## 1. Load Sequences & Maps
- Load precomputed train/val/test sequence tables
- Load item and customer ID maps
- Compute popularity counts for baseline features

In [2]:
# Load sequences
seq_train = pd.read_parquet(OUT_DIR/'sequences_train.parquet', **READ_KW)
seq_val   = pd.read_parquet(OUT_DIR/'sequences_val.parquet', **READ_KW)
seq_test  = pd.read_parquet(OUT_DIR/'sequences_test.parquet', **READ_KW)
print("Train/Val/Test:", seq_train.shape, seq_val.shape, seq_test.shape)

# Load item and customer maps
item_map     = pd.read_parquet(OUT_DIR/'item_id_map.parquet', **READ_KW)
customer_map = pd.read_parquet(OUT_DIR/'customer_id_map.parquet', **READ_KW)
print("Items:", len(item_map), "Customers:", len(customer_map))

# Popularity for features/fallback
pop_counts = seq_train['pos_item_idx'].value_counts()
pop_norm   = (pop_counts - pop_counts.min()) / (pop_counts.max() - pop_counts.min() + 1e-9)


Train/Val/Test: (1108, 6) (169, 6) (160, 6)
Items: 1735 Customers: 929


## 2. Load Two-Tower Embeddings & Feature Tensors
- Load or generate user/item embeddings from two‐tower model
- Move embeddings to GPU tensors
- Build popularity and price z‐score feature tensors

In [3]:
# Load Two-Tower embeddings
USER_EMB_PATH = OUT_DIR/'user_embeddings.npy'
ITEM_EMB_PATH = OUT_DIR/'item_embeddings.npy'

if USER_EMB_PATH.exists() and ITEM_EMB_PATH.exists():
    USER_EMB = np.load(USER_EMB_PATH).astype('float32')
    ITEM_EMB = np.load(ITEM_EMB_PATH).astype('float32')
    print("Two-Tower embeddings loaded:", USER_EMB.shape, ITEM_EMB.shape)
else:
    print("❌ Missing Two-Tower embeddings. Please run notebook 03 first.")
    n_users = len(customer_map)
    n_items = len(item_map)
    USER_EMB = np.random.randn(n_users, CFG["embedding_dim"]).astype('float32')
    ITEM_EMB = np.random.randn(n_items, CFG["embedding_dim"]).astype('float32')
    USER_EMB /= np.linalg.norm(USER_EMB, axis=1, keepdims=True)
    ITEM_EMB /= np.linalg.norm(ITEM_EMB, axis=1, keepdims=True)
    print("Using dummy embeddings for testing")

# GPU tensors for embeddings
USER_EMB_T = torch.from_numpy(USER_EMB).to(device, non_blocking=True)
ITEM_EMB_T = torch.from_numpy(ITEM_EMB).to(device, non_blocking=True)

# Popularity tensor
pop_vec = torch.zeros(len(item_map), dtype=torch.float32, device=device)
pop_idx = torch.tensor(pop_counts.index.values, dtype=torch.long, device=device)
pop_val = torch.tensor(pop_norm.loc[pop_counts.index].values, dtype=torch.float32, device=device)
pop_vec[pop_idx] = pop_val

# Load item metadata for additional features
items_clean_path = OUT_DIR/'items_clean.parquet'
price_z = None
if items_clean_path.exists():
    items_clean = pd.read_parquet(items_clean_path, **READ_KW)
    if 'price_median' in items_clean.columns:
        m = items_clean[['stock_code','price_median']].dropna()
        m = m.merge(item_map, on='stock_code', how='inner')
        if len(m) > 0:
            mu, sigma = m['price_median'].mean(), m['price_median'].std() + 1e-6
            z = ((m['price_median'] - mu) / sigma).astype(float)
            price_z = torch.zeros(len(item_map), dtype=torch.float32, device=device)
            ii = torch.tensor(m['item_idx'].astype(int).values, dtype=torch.long, device=device)
            price_z[ii] = torch.tensor(z.values, dtype=torch.float32, device=device)
            print("Price features loaded")


Two-Tower embeddings loaded: (929, 256) (1735, 256)
Price features loaded


## 3. Candidate Generation
 
- Define helpers to parse history strings and compute user vectors
- Build tensor of history indices
- Generate top‐K candidates per user from two‐tower embeddings
- Save and reload candidate tables

In [4]:
def parse_hist(s):
    if not isinstance(s, str) or not s.strip():
        return []
    return [int(x) for x in s.strip().split()]

@torch.no_grad()
def user_vecs_from_hist_batch(hist_tensor):
    """Compute user vectors from history using Two-Tower item embeddings"""
    B, L = hist_tensor.shape
    safe_idx = hist_tensor.clamp(min=0)
    H = ITEM_EMB_T.index_select(0, safe_idx.view(-1)).view(B, L, -1)
    mask = (hist_tensor >= 0).float().unsqueeze(-1)
    U = (H * mask).sum(1) / mask.sum(1).clamp_min(1e-6)
    return F.normalize(U, dim=-1)

def build_hist_tensor(series, L):
    """Build history tensor from string series"""
    B = len(series)
    H = torch.full((B, L), -1, dtype=torch.long)
    for i, s in enumerate(series):
        h = parse_hist(s)
        if len(h) > L: h = h[-L:]
        if h:
            H[i, -len(h):] = torch.tensor(h, dtype=torch.long)
    return H

def gen_candidates_gpu(df, topk=100, batch_q=4096):
    """Generate candidates using Two-Tower embeddings"""
    hist_series = df['history_idx'].astype(str).tolist()
    H = build_hist_tensor(hist_series, CFG["hist_max"])
    U_chunks = []
    for i in range(0, H.size(0), batch_q):
        Ub = user_vecs_from_hist_batch(H[i:i+batch_q].to(device))
        U_chunks.append(Ub.cpu())
    U = torch.cat(U_chunks, 0).numpy().astype('float32')

    with torch.no_grad():
        U_t = torch.from_numpy(U).to(device)
        sims = U_t @ ITEM_EMB_T.t()
        _, top_k_indices = torch.topk(sims, k=topk, dim=1)
        I_full = top_k_indices.cpu().numpy().astype('int32')

    pos_list = df['pos_item_idx'].astype(int).tolist()
    ts_list  = df['ts'].astype(str).tolist() if 'ts' in df.columns else ['']*len(pos_list)
    rows = []
    for pos, cand_idx, ts, h_s in zip(pos_list, I_full, ts_list, hist_series):
        cand = cand_idx.tolist()
        if pos not in cand:
            cand[-1] = int(pos)
        rows.append((h_s, int(pos), " ".join(map(str,cand)), ts))
    return pd.DataFrame(rows, columns=['history_idx','pos_item_idx','cands','ts'])

# Build & save candidates if missing
CAND_TRAIN_PATH = OUT_DIR/'candidates_train.parquet'
CAND_VAL_PATH   = OUT_DIR/'candidates_val.parquet'
CAND_TEST_PATH  = OUT_DIR/'candidates_test.parquet'

if not (CAND_TRAIN_PATH.exists() and CAND_VAL_PATH.exists() and CAND_TEST_PATH.exists()):
    print("Generating training candidates (GPU)...")
    gen_candidates_gpu(seq_train, topk=CFG["cand_topk"], batch_q=CFG["cand_batch"]).to_parquet(CAND_TRAIN_PATH, **WRITE_KW)
    print("Generating validation candidates (GPU)...")
    gen_candidates_gpu(seq_val,   topk=CFG["cand_topk"], batch_q=CFG["cand_batch"]).to_parquet(CAND_VAL_PATH,   **WRITE_KW)
    print("Generating test candidates (GPU)...")
    gen_candidates_gpu(seq_test,  topk=CFG["cand_topk"], batch_q=CFG["cand_batch"]).to_parquet(CAND_TEST_PATH,  **WRITE_KW)

cand_train = pd.read_parquet(CAND_TRAIN_PATH, **READ_KW)
cand_val   = pd.read_parquet(CAND_VAL_PATH,   **READ_KW)
cand_test  = pd.read_parquet(CAND_TEST_PATH,  **READ_KW)
print("Candidates:", cand_train.shape, cand_val.shape, cand_test.shape)


Generating training candidates (GPU)...
Generating validation candidates (GPU)...
Generating test candidates (GPU)...
Candidates: (1108, 4) (169, 4) (160, 4)


## 4. Feature Engineering on GPU (Sharded)
- Pack features (dot product, max-sim, popularity, hist-length, price z)
- Subsample hard negatives per query
- Shard features into Parquet files

In [5]:
def _pack_batch_features(Ub, Hb, Cb, Pb):
    """Pack features for a batch of candidates"""
    B, d = Ub.shape
    K = Cb.size(1)
    L = Hb.size(1)

    Vc = ITEM_EMB_T.index_select(0, Cb.view(-1)).view(B, K, d)
    dot_uv = (Ub.unsqueeze(1) * Vc).sum(-1)

    safe_hist = Hb.clamp(min=0)
    Hvec = ITEM_EMB_T.index_select(0, safe_hist.view(-1)).view(B, L, d)
    Hvec = F.normalize(Hvec, dim=-1); Vc_n = F.normalize(Vc, dim=-1)
    sims = torch.matmul(Hvec, Vc_n.transpose(1,2))
    maskL = (Hb >= 0).unsqueeze(-1).float()
    sims = sims + (maskL - 1.0) * 1e9
    max_sim_recent = sims.max(dim=1).values

    pop    = pop_vec.index_select(0, Cb.view(-1)).view(B, K)
    hlen   = (Hb >= 0).float().sum(1)/float(CFG["hist_max"])
    hlen   = hlen.unsqueeze(1).expand(B, K)
    price  = price_z.index_select(0, Cb.view(-1)).view(B, K) if isinstance(price_z, torch.Tensor) else torch.zeros((B,K), device=Ub.device)
    labels = (Cb == Pb.view(-1,1)).float()

    return {"dot_uv": dot_uv, "max_sim_recent": max_sim_recent, "pop": pop,
            "hist_len": hlen, "price_z": price, "label": labels, "item_idx": Cb.float()}

def _select_negatives(feats):
    """Keep 1 positive + N negatives per query if configured"""
    if CFG["neg_per_query"] is None:
        return feats
    B, K = feats["label"].shape
    pos_col = torch.argmax(feats["label"], dim=1, keepdim=True)
    if CFG["hard_negatives"]:
        neg_scores = feats["dot_uv"].clone()
        neg_scores.scatter_(1, pos_col, -1e9)
        _, neg_idx = torch.topk(neg_scores, k=min(CFG["neg_per_query"], K-1), dim=1)
    else:
        rnd = torch.rand_like(feats["dot_uv"])
        rnd.scatter_(1, pos_col, 1e9)
        _, neg_idx = torch.topk(-rnd, k=min(CFG["neg_per_query"], K-1), dim=1)
    keep_cols = torch.cat([pos_col, neg_idx], dim=1)
    for k in ["dot_uv","max_sim_recent","pop","hist_len","price_z","label","item_idx"]:
        feats[k] = torch.gather(feats[k], 1, keep_cols)
    return feats

def build_feats_gpu_sharded(cand_df, split_name):
    """Build features on GPU and save to sharded Parquet files"""
    N = len(cand_df)
    L = CFG["hist_max"]
    bq = CFG["feat_batch_q"]
    out_dir = OUT_DIR / f"ranker_feats_{split_name}_shards"
    out_dir.mkdir(parents=True, exist_ok=True)

    def write_shard(idx, feats_dict):
        cpu = {k: v.detach().float().view(-1).cpu().numpy() for k,v in feats_dict.items()}
        df = pd.DataFrame(cpu); df['item_idx'] = df['item_idx'].astype(np.int32)
        df.to_parquet(out_dir / f"part_{idx:03d}.parquet", **WRITE_KW)

    hist_series  = cand_df['history_idx'].astype(str).tolist()
    pos_list     = cand_df['pos_item_idx'].astype(int).tolist()
    cands_series = cand_df['cands'].astype(str).tolist()

    rows_written = 0; shard_idx = 0; buf = None
    for i in range(0, N, bq):
        H = build_hist_tensor(hist_series[i:i+bq], L).to(device, non_blocking=True)
        P = torch.tensor(pos_list[i:i+bq], dtype=torch.long, device=device)
        C = torch.tensor([[int(x) for x in s.split()] for s in cands_series[i:i+bq]],
                         dtype=torch.long, device=device)
        U = user_vecs_from_hist_batch(H)
        feats = _pack_batch_features(U, H, C, P)
        feats = _select_negatives(feats)

        if buf is None:
            buf = {k: v.detach().clone() for k,v in feats.items()}
        else:
            for k in buf:
                buf[k] = torch.cat([buf[k], feats[k]], dim=0)

        rows_in_buf = int(buf["label"].numel())
        if rows_in_buf >= CFG["shard_rows"]:
            write_shard(shard_idx, buf); shard_idx += 1
            buf = None; torch.cuda.empty_cache()

        rows_written += int(feats["label"].numel())
        if (i//bq) % 10 == 0:
            print(f"[{split_name}] Built ~{rows_written/1e6:.2f}M rows...")

    if buf is not None:
        write_shard(shard_idx, buf)

    print(f"[{split_name}] Done. Rows ~{rows_written:,}. Shards -> {out_dir}")
    return out_dir

# Build shards if missing
train_shards_dir = OUT_DIR/"ranker_feats_train_shards"
val_shards_dir   = OUT_DIR/"ranker_feats_val_shards"
test_shards_dir  = OUT_DIR/"ranker_feats_test_shards"

if not train_shards_dir.exists():
    train_shards_dir = build_feats_gpu_sharded(cand_train, "train")
if not val_shards_dir.exists():
    val_shards_dir   = build_feats_gpu_sharded(cand_val,   "val")
if not test_shards_dir.exists():
    test_shards_dir  = build_feats_gpu_sharded(cand_test,  "test")

print("Shard dirs:", train_shards_dir, val_shards_dir, test_shards_dir)


[train] Built ~0.02M rows...
[train] Done. Rows ~23,268. Shards -> ../data/processed/jarir/ranker_feats_train_shards
[val] Built ~0.00M rows...
[val] Done. Rows ~3,549. Shards -> ../data/processed/jarir/ranker_feats_val_shards
[test] Built ~0.00M rows...
[test] Done. Rows ~3,360. Shards -> ../data/processed/jarir/ranker_feats_test_shards
Shard dirs: ../data/processed/jarir/ranker_feats_train_shards ../data/processed/jarir/ranker_feats_val_shards ../data/processed/jarir/ranker_feats_test_shards


## 5. Ranker Model Definition
- Define feature columns and data loader for shard batches
- Implement RankerMLP neural network class
- Initialize model, optimizer, loss, and mixed‐precision scaler

In [6]:
RANKER_COLS = ["dot_uv","max_sim_recent","pop","hist_len","price_z"]

def shard_batches(files, batch_size):
    """Generate batches from sharded Parquet files"""
    for f in sorted(glob.glob(str(Path(files)/"part_*.parquet"))):
        df = pd.read_parquet(f, engine="fastparquet", columns=RANKER_COLS+["label"])
        X_np = df[RANKER_COLS].to_numpy(dtype='float32', copy=False)
        y_np = df["label"].to_numpy(dtype='float32', copy=False)
        X = torch.from_numpy(X_np).to(device, non_blocking=True)
        y = torch.from_numpy(y_np).to(device, non_blocking=True)
        perm = torch.randperm(X.size(0), device=device)
        for i in range(0, X.size(0), batch_size):
            idx = perm[i:i+batch_size]
            yield X[idx], y[idx]
    torch.cuda.empty_cache()

class RankerMLP(nn.Module):
    """MLP ranker for candidate re-ranking"""
    def __init__(self, d_in, hidden=256, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden//2), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden//2, 1)
        )
    def forward(self, x): return self.net(x).squeeze(-1)

model   = RankerMLP(len(RANKER_COLS), hidden=CFG["hidden"], dropout=CFG["dropout"]).to(device)
opt     = torch.optim.AdamW(model.parameters(), lr=CFG["lr"], weight_decay=CFG["weight_decay"])
loss_fn = nn.BCEWithLogitsLoss()
use_amp = (device.type == "cuda")
amp_device = "cuda" if use_amp else "cpu"
scaler  = torch.amp.GradScaler(amp_device, enabled=use_amp)
torch.backends.cudnn.benchmark = True

## 6. Training & Evaluation
- Define train_ranker() to run epoch loops with mixed‐precision
- Evaluate on validation shards each epoch
- Implement rerank_one_batch and eval_reranked for final metrics

In [7]:
def rerank_one_batch(cand_df_slice):
    """Re-rank candidates for one batch"""
    H = build_hist_tensor(cand_df_slice['history_idx'].astype(str).tolist(), CFG["hist_max"]).to(device)
    P = torch.tensor(cand_df_slice['pos_item_idx'].astype(int).tolist(), dtype=torch.long, device=device)
    C = torch.tensor([[int(x) for x in s.split()] for s in cand_df_slice['cands'].astype(str).tolist()],
                     dtype=torch.long, device=device)
    U = user_vecs_from_hist_batch(H)
    feats = _pack_batch_features(U, H, C, P)
    X = torch.stack([feats[c] for c in RANKER_COLS], dim=-1).view(-1, len(RANKER_COLS))
    scores = model(X).view(len(feats["label"]), -1)
    topk = min(CFG["eval_topk"], C.size(1))
    _, idx = torch.topk(scores, k=topk, dim=1)
    return [C[i][idx[i]].tolist() for i in range(C.size(0))]

def eval_reranked(cand_df, split="val"):
    """Evaluate re-ranked candidates"""
    model.eval()
    hits = 0; ndcgs = 0.0; tot = 0
    B = 1024
    for i in range(0, len(cand_df), B):
        batch_df = cand_df.iloc[i:i+B]
        reranked = rerank_one_batch(batch_df)
        for pos, rr in zip(batch_df['pos_item_idx'].tolist(), reranked):
            hits += float(int(pos) in rr)
            if int(pos) in rr:
                rank = rr.index(int(pos)) + 1
                ndcgs += 1.0 / math.log2(rank + 1.0)
            tot += 1
    return hits/max(1,tot), ndcgs/max(1,tot)

def train_ranker():
    """Train the ranker model"""
    best_recall = -1.0; bad = 0
    for ep in range(1, CFG["epochs"]+1):
        model.train(); total_loss=0.0; nobs=0
        for Xb, yb in shard_batches(train_shards_dir, CFG["batch_size"]):
            opt.zero_grad(set_to_none=True)
            with torch.amp.autocast(amp_device, enabled=use_amp, dtype=torch.float16):
                logits = model(Xb)
                loss   = loss_fn(logits, yb)
            scaler.scale(loss).backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            scaler.step(opt); scaler.update()
            total_loss += float(loss.item()) * yb.numel()
            nobs       += yb.numel()

        val_recall, val_ndcg = eval_reranked(cand_val, split="val")
        print(f"Epoch {ep:02d} | train BCE {total_loss/max(1,nobs):.4f} | "
              f"val Recall@{CFG['eval_topk']} {val_recall:.4f} | val NDCG@{CFG['eval_topk']} {val_ndcg:.4f}")
        if val_recall > best_recall + 1e-4:
            best_recall, bad = val_recall, 0
            torch.save(model.state_dict(), OUT_DIR/'ranker_best.pt')
        else:
            bad += 1
            if bad >= CFG["patience"]:
                print("Early stopping on Recall@10."); break
    print("Best val Recall@{} = {:.4f}".format(CFG["eval_topk"], best_recall))

print("\n" + "="*50)
print("TRAINING RANKER")
print("="*50)
train_ranker()

print("\n" + "="*50)
print("FINAL EVALUATION")
print("="*50)
model.load_state_dict(torch.load(OUT_DIR/'ranker_best.pt', map_location=device, weights_only=True))
test_recall, test_ndcg = eval_reranked(cand_test, split="test")
print(f"TEST — Recall@{CFG['eval_topk']}: {test_recall:.4f}, NDCG@{CFG['eval_topk']}: {test_ndcg:.4f}")



TRAINING RANKER
Epoch 01 | train BCE 0.4653 | val Recall@10 0.4911 | val NDCG@10 0.3792
Epoch 02 | train BCE 0.2149 | val Recall@10 0.3787 | val NDCG@10 0.2760
Epoch 03 | train BCE 0.1918 | val Recall@10 0.4852 | val NDCG@10 0.3701
Epoch 04 | train BCE 0.1868 | val Recall@10 0.7337 | val NDCG@10 0.6177
Epoch 05 | train BCE 0.1819 | val Recall@10 0.7870 | val NDCG@10 0.7043
Epoch 06 | train BCE 0.1772 | val Recall@10 0.7337 | val NDCG@10 0.6327
Epoch 07 | train BCE 0.1730 | val Recall@10 0.6982 | val NDCG@10 0.5967
Epoch 08 | train BCE 0.1679 | val Recall@10 0.6627 | val NDCG@10 0.5524
Epoch 09 | train BCE 0.1634 | val Recall@10 0.6272 | val NDCG@10 0.5363
Epoch 10 | train BCE 0.1590 | val Recall@10 0.6568 | val NDCG@10 0.5492
Early stopping on Recall@10.
Best val Recall@10 = 0.7870

FINAL EVALUATION
TEST — Recall@10: 0.8063, NDCG@10: 0.7193


## 7. Comparison with Baselines
- Load precomputed baseline results
- Print Recall@K for each baseline and compare

In [8]:
print("\n" + "="*50)
print("COMPARISON WITH BASELINES")
print("="*50)

baseline_results_path = OUT_DIR/'baseline_results.json'
if baseline_results_path.exists():
    with open(baseline_results_path, 'r') as f:
        baseline_results = json.load(f)
    print("Baseline Results (Held-out Interactions):")
    for model_name, results in baseline_results['held_out_interactions'].items():
        print(f"  {model_name}: {results}")
    best_baseline = max(
        [(m, r[f"Recall@{CFG['eval_topk']}"]) for m,r in baseline_results['held_out_interactions'].items()],
        key=lambda x: x[1]
    )
    print(f"\nBest baseline ({best_baseline[0]}): Recall@{CFG['eval_topk']} = {best_baseline[1]:.4f}")
    print(f"Two-Stage Ranker: Recall@{CFG['eval_topk']} = {test_recall:.4f}")
    improvement = ((test_recall - best_baseline[1]) / best_baseline[1]) * 100 if best_baseline[1]>0 else 0
    print(f"Improvement: {improvement:+.2f}%")
else:
    print("No baseline results found for comparison")



COMPARISON WITH BASELINES
Baseline Results (Held-out Interactions):
  Popularity: {'Recall@5': 0.0650887573964497, 'Recall@10': 0.09467455621301775, 'Recall@20': 0.1242603550295858}
  ItemKNN: {'Recall@5': 0.0, 'Recall@10': 0.0, 'Recall@20': 0.011834319526627219}
  UserKNN: {'Recall@5': 0.029585798816568046, 'Recall@10': 0.029585798816568046, 'Recall@20': 0.047337278106508875}
  MatrixFactorization: {'Recall@5': 0.0, 'Recall@10': 0.005917159763313609, 'Recall@20': 0.005917159763313609}

Best baseline (Popularity): Recall@10 = 0.0947
Two-Stage Ranker: Recall@10 = 0.8063
Improvement: +751.60%


## 9. Results Saving

- Persist final test metrics and model configuration to JSON  


In [9]:
results = {
    'model_config': CFG,
    'evaluation_results': {
        'test_recall': test_recall,
        'test_ndcg': test_ndcg,
        'eval_topk': CFG['eval_topk']
    },
    'data_stats': {
        'n_users': len(customer_map),
        'n_items': len(item_map),
        'train_sequences': len(seq_train),
        'val_sequences': len(seq_val),
        'test_sequences': len(seq_test)
    }
}
with open(OUT_DIR/'ranker_results.json', 'w') as f:
    json.dump(results, f, indent=2)
print(f"\nSaved results to {OUT_DIR/'ranker_results.json'}")



Saved results to ../data/processed/jarir/ranker_results.json
