# TransformerConv (Neighbor Sampling) for Ranking on Graph3 (Goodbooks-10k)

In this notebook we train a Transformer-like GNN for recommendations using PyG TransformerConv with mini-batch neighbor sampling on the Goodbooks-10k interaction graph.

## Motivation

R-GCN (BCE) failed due to an objective–metric mismatch.

Sampling-based models with BPR restored ranking performance.

GAT + BPR outperformed GraphSAGE, suggesting attention helps.

Here we test TransformerConv, which is an attention-style message passing layer inspired by Transformers.

## Setup

Graph: start with the user–book bipartite graph (train interactions only) to keep comparisons fair.

Training: NeighborLoader + BPR loss.

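Evaluation: leave-one-out (LOO) candidate-based ranking, where each user's held-out item is ranked against C-1 sampled items the user has not interacted with in train: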

Hit@10/20/50, NDCG@10/20/50

C=1000 (main) and optionally C=2000 (stricter)
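
To make the metrics concrete, here is a minimal sketch of Hit@K and NDCG@K for a single user in the leave-one-out setting, where exactly one candidate is relevant. The helper name `loo_metrics` is ours for illustration and does not appear in the notebook code.

```python
import math

def loo_metrics(gt_rank: int, ks=(10, 20, 50)) -> dict:
    """Hit@K and NDCG@K for one user with a single held-out item.

    gt_rank is the 0-based position of the held-out item in the ranked
    candidate list (0 means it was ranked first).
    """
    out = {}
    for k in ks:
        hit = gt_rank < k
        out[f"Hit@{k}"] = 1.0 if hit else 0.0
        # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 2)
        out[f"NDCG@{k}"] = 1.0 / math.log2(gt_rank + 2) if hit else 0.0
    return out

print(loo_metrics(0))   # held-out item ranked first
print(loo_metrics(14))  # ranked 15th: misses the top 10, counted at K=20 and K=50
```

The reported numbers are these per-user values averaged over the evaluated users.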

## Artifacts

We load splits from the frozen Graph3 bundle:
D:/ML/GNN/graph_recsys/artifacts/v2_proper/graph3_bundle/

In [1]:
# ============================
# Cell 1: Imports + device + paths
# ============================

import os
import json
import math
import numpy as np
import pandas as pd
import torch

from pathlib import Path
from tqdm.auto import tqdm

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("DEVICE:", DEVICE)

PROJECT_ROOT = Path(r"D:/ML/GNN/graph_recsys")
ARTIFACTS = PROJECT_ROOT / "artifacts" / "v2_proper"
BUNDLE_DIR = ARTIFACTS / "graph3_bundle"

print("BUNDLE_DIR:", BUNDLE_DIR)
assert BUNDLE_DIR.exists(), f"Missing bundle: {BUNDLE_DIR}"

DEVICE: cuda
BUNDLE_DIR: D:\ML\GNN\graph_recsys\artifacts\v2_proper\graph3_bundle


In [2]:
# ============================
# Cell 2: Load LOO splits
# Purpose:
# - Load train interactions and LOO validation/test ground truth
# ============================

z = np.load(BUNDLE_DIR / "splits_ui.npz", allow_pickle=True)

train_ui = z["train_ui"].astype(np.int64)
val_ui   = z["val_ui"].astype(np.int64)
test_ui  = z["test_ui"].astype(np.int64)

U = int(z["U"])
B = int(z["B"])

print("train/val/test:", train_ui.shape, val_ui.shape, test_ui.shape)
print("U, B:", U, B)

train/val/test: (4926384, 2) (53398, 2) (53398, 2)
U, B: 53398 9999


In [3]:
# ============================
# Cell 3: Build train positives + LOO ground truth
# Purpose:
# - train_pos: what items each user interacted with in train
# - val_gt/test_gt: leave-one-out item per user
# - leak check: gt must NOT be in train_pos
# ============================

from collections import defaultdict

train_pos = defaultdict(set)
for u, i in train_ui:
    train_pos[int(u)].add(int(i))

val_gt  = {int(u): int(i) for u, i in val_ui}
test_gt = {int(u): int(i) for u, i in test_ui}

# Use .get() so the check itself does not insert empty sets into the defaultdict
leaks_val = sum(1 for u, i in val_gt.items() if i in train_pos.get(u, ()))
leaks_test = sum(1 for u, i in test_gt.items() if i in train_pos.get(u, ()))

print("train_pos users:", len(train_pos))
print("val_gt:", len(val_gt), "test_gt:", len(test_gt))
print("[val] leaks:", leaks_val, "/", len(val_gt))
print("[test] leaks:", leaks_test, "/", len(test_gt))

train_pos users: 53398
val_gt: 53398 test_gt: 53398
[val] leaks: 0 / 53398
[test] leaks: 0 / 53398


In [4]:
# ============================
# Cell 4: Build user-book edge_index
# Notes:
# - user nodes: [0..U-1]
# - item nodes: [U..U+B-1]
# - undirected edges (u<->i) as required by message passing
# ============================

u = torch.from_numpy(train_ui[:, 0]).long()
i = torch.from_numpy(train_ui[:, 1]).long() + U

row = torch.cat([u, i], dim=0)
col = torch.cat([i, u], dim=0)

edge_index_ui = torch.stack([row, col], dim=0)
num_nodes_ui = U + B

print("edge_index_ui:", tuple(edge_index_ui.shape), "num_nodes_ui:", num_nodes_ui)

edge_index_ui: (2, 9852768) num_nodes_ui: 63397
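
The shapes above follow from the construction: each train interaction contributes two directed edges (4,926,384 x 2 = 9,852,768), and items are offset by U so that users and items share one id space (53,398 + 9,999 = 63,397 nodes). A toy version of the same construction in plain Python, with a hypothetical helper name, looks like this:

```python
def build_bipartite_edges(pairs, num_users):
    """Return (row, col) lists for an undirected user-item graph.

    Item ids are shifted by num_users so users and items share one id space.
    """
    users = [u for u, _ in pairs]
    items = [i + num_users for _, i in pairs]
    row = users + items   # user -> item edges, followed by item -> user edges
    col = items + users
    return row, col

# Two users, three items, three interactions
row, col = build_bipartite_edges([(0, 0), (0, 2), (1, 1)], num_users=2)
print(row)  # [0, 0, 1, 2, 4, 3]
print(col)  # [2, 4, 3, 0, 0, 1]
```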


In [5]:
# ============================
# Cell 5: PyG Data + NeighborLoader config
# Notes:
# - TransformerConv can be heavier than GAT (separate query/key/value projections per head)
# - Start with moderate batch size and small number of heads
# ============================

from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

CFG = {
    "embedding_dim": 64,
    "num_layers": 2,
    "heads": 2,                 # multi-head attention
    "dropout": 0.1,
    "batch_size_users": 512,    # safer default (TransformerConv can be heavy)
    "neighbors": [15, 10],
    "lr": 1e-3,
    "weight_decay": 1e-6,
    "epochs": 30,
    "bpr_reg": 1e-6,
    "seed": 42,
    "patience": 5,
    "min_delta": 1e-4,
}
CFG

{'embedding_dim': 64,
 'num_layers': 2,
 'heads': 2,
 'dropout': 0.1,
 'batch_size_users': 512,
 'neighbors': [15, 10],
 'lr': 0.001,
 'weight_decay': 1e-06,
 'epochs': 30,
 'bpr_reg': 1e-06,
 'seed': 42,
 'patience': 5,
 'min_delta': 0.0001}

In [6]:
# ============================
# Cell 5b: Instantiate NeighborLoader + smoke batch
# ============================

SEED = CFG["seed"]
torch.manual_seed(SEED)
np.random.seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

data_ui = Data(edge_index=edge_index_ui, num_nodes=num_nodes_ui)

train_user_nodes = torch.arange(U, dtype=torch.long)

train_loader = NeighborLoader(
    data_ui,
    input_nodes=train_user_nodes,
    num_neighbors=CFG["neighbors"],
    batch_size=CFG["batch_size_users"],
    shuffle=True,
    num_workers=0,
    persistent_workers=False
)

batch = next(iter(train_loader))
print(batch)
print("batch.num_nodes:", batch.num_nodes, "batch.edge_index:", batch.edge_index.shape)

Data(edge_index=[2, 45090], num_nodes=30042, n_id=[30042], e_id=[45090], num_sampled_nodes=[3], num_sampled_edges=[2], input_id=[512], batch_size=512)
batch.num_nodes: 30042 batch.edge_index: torch.Size([2, 45090])


In [7]:
# ============================
# Cell 6: Random positive sampling + negative sampling + BPR loss
# ============================

train_pos_arr = {}
train_pos_size = np.zeros(U, dtype=np.int32)

for uu in range(U):
    arr = np.fromiter(train_pos[uu], dtype=np.int64)
    train_pos_arr[uu] = arr
    train_pos_size[uu] = arr.size

print("min/mean/max train_pos size:", train_pos_size.min(), train_pos_size.mean(), train_pos_size.max())

rng = np.random.default_rng(SEED)

def sample_positives(users_np: np.ndarray):
    pos = np.empty_like(users_np, dtype=np.int64)
    for idx, uu in enumerate(users_np):
        arr = train_pos_arr[int(uu)]
        pos[idx] = int(arr[rng.integers(0, arr.size)])
    return pos

def sample_negatives(users_np: np.ndarray):
    neg = np.empty_like(users_np, dtype=np.int64)
    for idx, uu in enumerate(users_np):
        seen = train_pos[int(uu)]
        while True:
            j = int(rng.integers(0, B))
            if j not in seen:
                neg[idx] = j
                break
    return neg

def bpr_loss(u_emb, p_emb, n_emb, reg=0.0):
    pos_scores = (u_emb * p_emb).sum(dim=-1)
    neg_scores = (u_emb * n_emb).sum(dim=-1)
    loss = -torch.log(torch.sigmoid(pos_scores - neg_scores) + 1e-8).mean()
    if reg > 0:
        loss = loss + reg * (u_emb.pow(2).mean() + p_emb.pow(2).mean() + n_emb.pow(2).mean())
    return loss

min/mean/max train_pos size: 3 92.25783737218623 197
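
For intuition about the loss values logged later: the per-triple BPR loss is -log(sigmoid(s_pos - s_neg)), which equals log(2) (about 0.693) when the model cannot separate the positive from the negative. The sketch below is a scalar, pure-Python illustration (function names are ours) that writes the same quantity as softplus(s_neg - s_pos). This form needs no epsilon inside the log; in PyTorch the analogous call would be `-F.logsigmoid(pos_scores - neg_scores)`.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def bpr_pair_loss(pos_score: float, neg_score: float) -> float:
    """-log(sigmoid(pos - neg)), computed as softplus(neg - pos)."""
    return softplus(neg_score - pos_score)

print(bpr_pair_loss(0.0, 0.0))  # equal scores: log(2)
print(bpr_pair_loss(3.0, 0.0))  # positive scored well above the negative: small loss
print(bpr_pair_loss(0.0, 3.0))  # negative scored above the positive: large loss
```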


In [8]:
# ============================
# Cell 7: TransformerConv recommender model
# Notes:
# - concat=False keeps output dim == embedding_dim (no heads*dim explosion)
# - We use ELU + dropout similar to GAT experiments
# ============================

import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import TransformerConv

class TransformerRec(nn.Module):
    def __init__(self, num_nodes: int, dim: int, num_layers: int, heads: int, dropout: float):
        super().__init__()
        self.num_nodes = num_nodes
        self.dim = dim
        self.dropout = dropout

        self.emb = nn.Embedding(num_nodes, dim)
        nn.init.normal_(self.emb.weight, std=0.1)

        self.convs = nn.ModuleList()
        for _ in range(num_layers):
            self.convs.append(
                TransformerConv(
                    in_channels=dim,
                    out_channels=dim,
                    heads=heads,
                    concat=False,
                    dropout=dropout,
                    beta=True  # learned gate mixing the root/skip term with the aggregated message
                )
            )

    def forward(self, n_id, edge_index):
        h = self.emb(n_id)
        for conv in self.convs:
            h = conv(h, edge_index)
            h = F.elu(h)
            h = F.dropout(h, p=self.dropout, training=self.training)
        return h

model = TransformerRec(
    num_nodes=num_nodes_ui,
    dim=CFG["embedding_dim"],
    num_layers=CFG["num_layers"],
    heads=CFG["heads"],
    dropout=CFG["dropout"],
).to(DEVICE)

optimizer = torch.optim.Adam(model.parameters(), lr=CFG["lr"], weight_decay=CFG["weight_decay"])

print(model)

TransformerRec(
  (emb): Embedding(63397, 64)
  (convs): ModuleList(
    (0-1): 2 x TransformerConv(64, 64, heads=2)
  )
)
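
For reference, this is the update rule of the layer as we read it from the PyG `TransformerConv` documentation with `beta=True`. Treat it as a summary to check against the installed version, not as a specification. Here d is the per-head output size, N(i) is the sampled neighborhood of node i, and sigma is the logistic sigmoid.

```latex
\begin{aligned}
\alpha_{ij} &= \operatorname{softmax}_{j \in \mathcal{N}(i)}
  \left( \frac{(W_3 x_i)^\top (W_4 x_j)}{\sqrt{d}} \right),
\qquad
m_i = \sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W_2 x_j, \\
\beta_i &= \sigma\!\left( w_5^\top \left[\, W_1 x_i,\; m_i,\; W_1 x_i - m_i \,\right] \right), \\
x_i' &= \beta_i\, W_1 x_i + (1 - \beta_i)\, m_i .
\end{aligned}
```

With `heads=2` and `concat=False`, the per-head messages are averaged rather than concatenated, which is why the printed layer maps 64 to 64 dimensions.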


In [9]:
# ============================
# Cell 8: Train one epoch (NeighborLoader + BPR)
# Notes:
# - Same approach as GraphSAGE/GAT notebooks for fair comparison.
# - Fallback to raw embeddings if pos/neg nodes are not in the sampled subgraph.
# ============================

def train_one_epoch():
    model.train()
    total_loss = 0.0
    steps = 0

    for batch in tqdm(train_loader, desc="train"):
        batch = batch.to(DEVICE)
        n_id = batch.n_id
        # input_id indexes into input_nodes; because input_nodes = arange(U), it equals the global user id
        seed_users = batch.input_id
        users_np = seed_users.detach().cpu().numpy()

        pos_items = sample_positives(users_np)
        neg_items = sample_negatives(users_np)

        pos_nodes = torch.from_numpy(pos_items).long().to(DEVICE) + U
        neg_nodes = torch.from_numpy(neg_items).long().to(DEVICE) + U

        h = model(n_id, batch.edge_index)

        idx_map = torch.full((num_nodes_ui,), -1, device=DEVICE, dtype=torch.long)
        idx_map[n_id] = torch.arange(n_id.size(0), device=DEVICE)

        u_loc = idx_map[seed_users]
        p_loc = idx_map[pos_nodes]
        n_loc = idx_map[neg_nodes]

        u_emb = h[u_loc]

        def get_item_emb(loc_idx, global_nodes):
            mask = loc_idx >= 0
            out = torch.empty((global_nodes.size(0), CFG["embedding_dim"]), device=DEVICE)
            out[mask] = h[loc_idx[mask]]
            out[~mask] = model.emb(global_nodes[~mask])  # fallback
            return out

        p_emb = get_item_emb(p_loc, pos_nodes)
        n_emb = get_item_emb(n_loc, neg_nodes)

        loss = bpr_loss(u_emb, p_emb, n_emb, reg=CFG["bpr_reg"])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += float(loss.detach().cpu())
        steps += 1

    return total_loss / max(1, steps)
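
The global-to-local lookup with fallback is the least obvious part of this cell, so here is a minimal sketch of the idea on toy data. The notebook uses a dense `idx_map` tensor filled with -1; the dictionary and the helper name `lookup_with_fallback` below are illustrative substitutes, not the notebook's code.

```python
def lookup_with_fallback(n_id, wanted, conv_out, raw_table):
    """Return one vector per wanted global id, plus a fallback flag per id.

    n_id[k] is the global id of local row k in conv_out. Ids that were not
    sampled into the subgraph fall back to the raw embedding table.
    """
    local_of = {g: k for k, g in enumerate(n_id)}  # global id -> local row
    out, used_fallback = [], []
    for g in wanted:
        k = local_of.get(g, -1)
        if k >= 0:
            out.append(conv_out[k])      # post-convolution representation
            used_fallback.append(False)
        else:
            out.append(raw_table[g])     # raw embedding (no message passing)
            used_fallback.append(True)
    return out, used_fallback

n_id = [7, 2, 5]                              # global ids in the sampled subgraph
conv_out = [[0.7], [0.2], [0.5]]              # toy rows aligned with n_id
raw_table = [[float(g)] for g in range(10)]   # toy raw embeddings for 10 nodes

vecs, fb = lookup_with_fallback(n_id, [5, 9, 7], conv_out, raw_table)
print(vecs)  # [[0.5], [9.0], [0.7]]
print(fb)    # [False, True, False]
```

Counting the `True` flags per batch would show how often training mixes post-convolution and raw vectors in the same loss, which is worth knowing when interpreting the results.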

In [10]:
# ============================
# Cell 9: Candidate-based LOO evaluation (fast)
# Notes:
# - Uses raw embedding table for speed (as in previous notebooks)
# - Main comparison signal: NDCG@10 with C=1000
# ============================

@torch.no_grad()
def eval_loo_sampled(model, gt_dict, users_subset, C=1000, Ks=(10, 20, 50), seed=42):
    model.eval()
    rng_local = np.random.default_rng(seed)

    hits = {k: 0 for k in Ks}
    ndcgs = {k: 0.0 for k in Ks}

    emb = model.emb.weight.detach()

    for u in tqdm(users_subset, desc=f"eval(C={C})"):
        gt = int(gt_dict[int(u)])

        negs = []
        seen = train_pos[int(u)]
        while len(negs) < C - 1:
            j = int(rng_local.integers(0, B))
            if (j not in seen) and (j != gt):
                negs.append(j)

        cand_items = np.array([gt] + negs, dtype=np.int64)
        cand_nodes = torch.from_numpy(cand_items).long().to(DEVICE) + U

        u_node = torch.tensor([int(u)], device=DEVICE, dtype=torch.long)
        u_vec = emb[u_node]
        i_vec = emb[cand_nodes]
        scores = (u_vec * i_vec).sum(dim=-1)

        rank = torch.argsort(scores, descending=True)
        gt_pos = (rank == 0).nonzero(as_tuple=False).item()

        for k in Ks:
            if gt_pos < k:
                hits[k] += 1
                ndcgs[k] += 1.0 / math.log2(gt_pos + 2)

    n = len(users_subset)
    out = {f"Hit@{k}": hits[k] / n for k in Ks}
    out.update({f"NDCG@{k}": ndcgs[k] / n for k in Ks})
    return out

subset_2k = np.random.default_rng(SEED).choice(np.arange(U), size=2000, replace=False)
subset_10k = np.random.default_rng(SEED).choice(np.arange(U), size=10000, replace=False)

print("Smoke eval (C=200):", eval_loo_sampled(model, val_gt, subset_2k, C=200, Ks=(10,20,50), seed=SEED))

eval(C=200):   0%|          | 0/2000 [00:00<?, ?it/s]

Smoke eval (C=200): {'Hit@10': 0.055, 'Hit@20': 0.1035, 'Hit@50': 0.2595, 'NDCG@10': 0.025147360479841042, 'NDCG@20': 0.03740782827916556, 'NDCG@50': 0.06784492906237218}


In [11]:
# ============================
# Cell 10: Train up to 30 epochs + early stopping on val NDCG@10 (C=1000)
# ============================

from dataclasses import dataclass

@dataclass
class EarlyStopper:
    patience: int = 5
    min_delta: float = 1e-4
    best: float = -1e9
    best_epoch: int = -1
    bad_count: int = 0
    best_state: dict = None

    def step(self, metric_value: float, model: torch.nn.Module, epoch: int) -> bool:
        improved = metric_value > (self.best + self.min_delta)
        if improved:
            self.best = metric_value
            self.best_epoch = epoch
            self.bad_count = 0
            self.best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
        else:
            self.bad_count += 1
        return self.bad_count >= self.patience

    def load_best(self, model: torch.nn.Module, device=DEVICE):
        model.load_state_dict({k: v.to(device) for k, v in self.best_state.items()})

EARLY = EarlyStopper(patience=CFG["patience"], min_delta=CFG["min_delta"])

history = []
for ep in range(1, CFG["epochs"] + 1):
    loss = train_one_epoch()

    m200 = eval_loo_sampled(model, val_gt, subset_2k, C=200, Ks=(10,20,50), seed=SEED)
    m1000 = eval_loo_sampled(model, val_gt, subset_10k, C=1000, Ks=(10,20,50), seed=SEED)
    val_ndcg10 = float(m1000["NDCG@10"])

    history.append({"epoch": ep, "loss": loss,
                    **{f"val200_{k}": v for k, v in m200.items()},
                    **{f"val1000_{k}": v for k, v in m1000.items()}})

    print(
        f"epoch={ep:02d} loss={loss:.4f} | "
        f"val(C=200) NDCG@10={m200['NDCG@10']:.5f} | "
        f"val(C=1000) NDCG@10={val_ndcg10:.6f}"
    )

    if EARLY.step(val_ndcg10, model, ep):
        print(f"Early stopping at epoch {ep}. Best epoch={EARLY.best_epoch} best NDCG@10={EARLY.best:.6f}")
        break

EARLY.load_best(model, device=DEVICE)
print("Loaded best checkpoint:", EARLY.best_epoch, "best val NDCG@10:", EARLY.best)

train:   0%|          | 0/105 [00:00<?, ?it/s]

eval(C=200):   0%|          | 0/2000 [00:00<?, ?it/s]

eval(C=1000):   0%|          | 0/10000 [00:00<?, ?it/s]

epoch=01 loss=0.5326 | val(C=200) NDCG@10=0.04031 | val(C=1000) NDCG@10=0.011215


epoch=02 loss=0.4658 | val(C=200) NDCG@10=0.05711 | val(C=1000) NDCG@10=0.017288


epoch=03 loss=0.4554 | val(C=200) NDCG@10=0.06539 | val(C=1000) NDCG@10=0.020264


epoch=04 loss=0.4426 | val(C=200) NDCG@10=0.07072 | val(C=1000) NDCG@10=0.022463


epoch=05 loss=0.4434 | val(C=200) NDCG@10=0.07174 | val(C=1000) NDCG@10=0.023806


epoch=06 loss=0.4283 | val(C=200) NDCG@10=0.07460 | val(C=1000) NDCG@10=0.026128


epoch=07 loss=0.4287 | val(C=200) NDCG@10=0.07900 | val(C=1000) NDCG@10=0.027205


epoch=08 loss=0.4210 | val(C=200) NDCG@10=0.08175 | val(C=1000) NDCG@10=0.027982


epoch=09 loss=0.4129 | val(C=200) NDCG@10=0.08701 | val(C=1000) NDCG@10=0.030402


epoch=10 loss=0.3971 | val(C=200) NDCG@10=0.08883 | val(C=1000) NDCG@10=0.030716


epoch=11 loss=0.3862 | val(C=200) NDCG@10=0.09338 | val(C=1000) NDCG@10=0.033329


epoch=12 loss=0.3751 | val(C=200) NDCG@10=0.09626 | val(C=1000) NDCG@10=0.033771


epoch=13 loss=0.3708 | val(C=200) NDCG@10=0.09726 | val(C=1000) NDCG@10=0.033391


epoch=14 loss=0.3615 | val(C=200) NDCG@10=0.09587 | val(C=1000) NDCG@10=0.033835


epoch=15 loss=0.3544 | val(C=200) NDCG@10=0.09532 | val(C=1000) NDCG@10=0.032505


epoch=16 loss=0.3449 | val(C=200) NDCG@10=0.09625 | val(C=1000) NDCG@10=0.032885


epoch=17 loss=0.3492 | val(C=200) NDCG@10=0.09470 | val(C=1000) NDCG@10=0.033470
Early stopping at epoch 17. Best epoch=12 best NDCG@10=0.033771
Loaded best checkpoint: 12 best val NDCG@10: 0.03377127617123508
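
Epoch 14 logged a slightly higher NDCG@10 than epoch 12, yet epoch 12 was kept. The sketch below replays the logged validation values through the same improvement rule (`metric > best + min_delta`) to show why. It is a standalone re-implementation for illustration, not the `EarlyStopper` class itself.

```python
# Validation NDCG@10 (C=1000) for epochs 1..17, copied from the log above
val_ndcg10 = [
    0.011215, 0.017288, 0.020264, 0.022463, 0.023806, 0.026128,
    0.027205, 0.027982, 0.030402, 0.030716, 0.033329, 0.033771,
    0.033391, 0.033835, 0.032505, 0.032885, 0.033470,
]

def replay_early_stopping(values, patience=5, min_delta=1e-4):
    """Return (best_epoch, best_value, stop_epoch) using 1-based epochs."""
    best, best_epoch, bad = float("-inf"), -1, 0
    for epoch, v in enumerate(values, start=1):
        if v > best + min_delta:      # counts as an improvement
            best, best_epoch, bad = v, epoch, 0
        else:                         # within min_delta or worse
            bad += 1
            if bad >= patience:
                return best_epoch, best, epoch
    return best_epoch, best, None

print(replay_early_stopping(val_ndcg10))  # (12, 0.033771, 17)
```

Epoch 14 beats epoch 12 by about 0.00006, which is below `min_delta=1e-4`, so it increments the bad-epoch counter instead of resetting it.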


In [12]:
# ============================
# Cell 11: Final TEST evaluation
# ============================

test_subset_10k = np.random.default_rng(SEED + 123).choice(np.arange(U), size=10000, replace=False)

test_m1000 = eval_loo_sampled(model, test_gt, test_subset_10k, C=1000, Ks=(10,20,50), seed=SEED + 123)
print("TEST (C=1000, 10k users):", test_m1000)

test_m2000 = eval_loo_sampled(model, test_gt, test_subset_10k, C=2000, Ks=(10,20,50), seed=SEED + 123)
print("TEST (C=2000, 10k users):", test_m2000)

eval(C=1000):   0%|          | 0/10000 [00:00<?, ?it/s]

TEST (C=1000, 10k users): {'Hit@10': 0.0672, 'Hit@20': 0.1064, 'Hit@50': 0.1794, 'NDCG@10': 0.033345572142068285, 'NDCG@20': 0.04318613558125753, 'NDCG@50': 0.057575686738235994}


eval(C=2000):   0%|          | 0/10000 [00:00<?, ?it/s]

TEST (C=2000, 10k users): {'Hit@10': 0.0412, 'Hit@20': 0.0659, 'Hit@50': 0.1213, 'NDCG@10': 0.020895186405055798, 'NDCG@20': 0.027123912058835215, 'NDCG@50': 0.03805636473930741}


In [13]:
# ============================
# Cell 12: Save run artifacts
# ============================

hist_df = pd.DataFrame(history)
out_dir = ARTIFACTS / "ablation_runs" / "transformerconv_sampling"
out_dir.mkdir(parents=True, exist_ok=True)

hist_path = out_dir / "history_transformerconv_bpr_sampled_eval.csv"
hist_df.to_csv(hist_path, index=False)

meta = {
    "best_epoch": EARLY.best_epoch,
    "best_val_ndcg10_C1000_10k": float(EARLY.best),
    "config": CFG,
    "bundle_dir": str(BUNDLE_DIR),
}
with open(out_dir / "run_meta.json", "w", encoding="utf-8") as f:
    json.dump(meta, f, ensure_ascii=False, indent=2)

ckpt_path = out_dir / "transformerconv_bpr_best.pt"
torch.save({"state_dict": EARLY.best_state, "meta": meta}, ckpt_path)

print("Saved:", hist_path)
print("Saved:", ckpt_path)
print("Saved:", out_dir / "run_meta.json")

Saved: D:\ML\GNN\graph_recsys\artifacts\v2_proper\ablation_runs\transformerconv_sampling\history_transformerconv_bpr_sampled_eval.csv
Saved: D:\ML\GNN\graph_recsys\artifacts\v2_proper\ablation_runs\transformerconv_sampling\transformerconv_bpr_best.pt
Saved: D:\ML\GNN\graph_recsys\artifacts\v2_proper\ablation_runs\transformerconv_sampling\run_meta.json


## Results & Conclusions
### What we did

Trained a TransformerConv-based recommender on the user–book bipartite graph using:

Neighbor sampling (mini-batch training with NeighborLoader)

BPR loss (pairwise ranking objective aligned with recommendation metrics)

Evaluated with leave-one-out (LOO) candidate-based ranking:

Hit@10/20/50 and NDCG@10/20/50

candidates: C=1000 (main) and C=2000 (stricter)

### Key results

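Validation NDCG@10 (C=1000) rose from 0.0112 at epoch 1 to about 0.034 by epoch 12 and then plateaued; training stopped at epoch 17 after 5 epochs without improvement.

Early stopping selected the checkpoint from epoch 12. Epoch 14 scored marginally higher (0.03384), but the gain was below min_delta=1e-4, so it did not count as an improvement:

Val NDCG@10 (C=1000, 10k users): 0.03377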

Final test performance (10k users):

TEST (C=1000): Hit@10 = 0.0672, NDCG@10 = 0.03335

TEST (C=2000): Hit@10 = 0.0412, NDCG@10 = 0.02090
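
To put these numbers in context, it helps to know what a scorer with no signal would achieve. If the held-out item's rank is uniform over the C candidates, the expected Hit@K is K/C and the expected NDCG@K is the sum of the top-K discount terms divided by C. The sketch below computes both; the function names are ours and the uniform-rank model is an assumption used only as a reference point.

```python
import math

def random_hit_at_k(k: int, c: int) -> float:
    """Expected Hit@K when the held-out item's rank is uniform over c candidates."""
    return min(k, c) / c

def random_ndcg_at_k(k: int, c: int) -> float:
    """Expected NDCG@K under the same uniform-rank assumption."""
    return sum(1.0 / math.log2(r + 2) for r in range(min(k, c))) / c

for c in (1000, 2000):
    print(c, random_hit_at_k(10, c), round(random_ndcg_at_k(10, c), 5))
```

Under this assumption the chance level for Hit@10 is 0.01 at C=1000 and 0.005 at C=2000, so the test Hit@10 values of 0.0672 and 0.0412 are several times above chance in both settings.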

### Interpretation

TransformerConv provides a clear ranking signal and outperforms simpler sampling baselines (e.g., GraphSAGE) in this setup.

However, it underperforms the GAT + BPR model trained under the same protocol, suggesting that attention in GAT is more effective here (or that TransformerConv needs different tuning / training fidelity).

### Limitations

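Candidate-based evaluation (not full-ranking over all items), computed on a 10k-user subset.

Evaluation scores users and items with the raw embedding table (model.emb) for speed, so the TransformerConv layers affect the reported metrics only through the gradients they pass to the embeddings during training.

Training uses a scalability workaround: some sampled positives/negatives may fall outside sampled subgraphs and fall back to raw embeddings.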

### Conclusion

TransformerConv + BPR + neighbor sampling is a viable ranking model, but GAT remains the strongest architecture among the tested sampling-based GNNs under this evaluation protocol.