# Relation-aware GNN for Graph-based Recommendation (R-GCN)

In the previous notebook, we performed a detailed ablation study on the enriched **Graph3** using **LightGCN**.
The results showed that:

- user–book interactions carry the dominant signal;
- content-based relations (author, language, year, tags) are **jointly important**;
- however, **LightGCN is not able to explicitly model relation types** and aggregates all neighbors uniformly.

In this notebook, we move to a **relation-aware architecture** — **Relational Graph Convolutional Network (R-GCN)** —
which explicitly conditions message passing on **edge types**.

## Goal of this notebook

- Use the **same Graph3 bundle** (nodes, edges, splits) as before;
- Train an **R-GCN-style encoder** with relation-specific parameters;
- Evaluate the model using the **same LOO ranking protocol** (Hit@K / NDCG@K);
- Compare performance and behavior against LightGCN.

## Key Research Question

> Can relation-aware message passing better exploit heterogeneous signals
> (tags, authors, language, year) than LightGCN?

This notebook focuses on **methodological clarity**, not UI or deployment.

## What is fixed (for fair comparison)

- Dataset: Goodbooks-10k
- Graph structure: Graph3 (unchanged)
- Train/val/test splits: identical
- Metrics: Hit@K, NDCG@K (Leave-One-Out)

In [1]:
# ============================
# Cell 2: Imports & setup
# ============================
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F

from pathlib import Path
from tqdm.auto import tqdm
import json
import random

# Reproducibility
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything(42)

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("DEVICE:", DEVICE)

DEVICE: cuda


In [2]:
# ============================
# Cell 3: Project paths
# ============================
PROJECT_ROOT = Path(r"D:/ML/GNN/graph_recsys")
ARTIFACTS = PROJECT_ROOT / "artifacts" / "v2_proper"

BUNDLE_DIR = ARTIFACTS / "graph3_bundle"
assert BUNDLE_DIR.exists(), f"Bundle not found: {BUNDLE_DIR}"

print("BUNDLE_DIR:", BUNDLE_DIR)

BUNDLE_DIR: D:\ML\GNN\graph_recsys\artifacts\v2_proper\graph3_bundle


In [3]:
# ============================
# Cell 4: Load graph bundle
# ============================
g = torch.load(BUNDLE_DIR / "graph3_state.pt", map_location="cpu")

A_norm_full = g["A_norm"]          # sparse COO
edge_index_full = g["edge_index"]  # [2, E]
edge_w_full = g["edge_w"]          # [E]
edge_type_full = g["edge_type"]    # [E]
rel2id = g["rel2id"]

num_nodes = g["num_nodes"]
offsets = g["offsets"]

U = g["U"]
B = g["B"]

print(
    f"Loaded graph: num_nodes={num_nodes}, "
    f"E={edge_index_full.shape[1]}, "
    f"relations={len(rel2id)}"
)
print("Relations:", rel2id)

  g = torch.load(BUNDLE_DIR / "graph3_state.pt", map_location="cpu")


Loaded graph: num_nodes=74285, E=11450076, relations=11
Relations: {'user_book': 0, 'book_user': 1, 'book_tag': 2, 'tag_book': 3, 'book_book_sim': 4, 'book_author': 5, 'author_book': 6, 'book_lang': 7, 'lang_book': 8, 'book_year': 9, 'year_book': 10}


In [4]:
# ============================
# Cell 5: Load splits
# ============================
z = np.load(BUNDLE_DIR / "splits_ui.npz")

train_ui = z["train_ui"].astype(np.int64)
val_ui   = z["val_ui"].astype(np.int64)
test_ui  = z["test_ui"].astype(np.int64)

print("train_ui:", train_ui.shape)
print("val_ui:", val_ui.shape)
print("test_ui:", test_ui.shape)

# Build helpers for evaluation
from collections import defaultdict

train_pos = defaultdict(set)
for u, i in train_ui:
    train_pos[int(u)].add(int(i))

val_gt  = {int(u): int(i) for u, i in val_ui}
test_gt = {int(u): int(i) for u, i in test_ui}

print("Users in train:", len(train_pos))

train_ui: (4926384, 2)
val_ui: (53398, 2)
test_ui: (53398, 2)
Users in train: 53398


### Important modeling note

LightGCN operates on a single normalized adjacency matrix and **cannot use edge types**.

For R-GCN, we will:
- use `edge_index` + `edge_type`;
- apply **relation-specific transformations**;
- still evaluate with the same recommendation protocol.

This allows us to isolate the effect of **relation-aware message passing**.

In [5]:
# ============================
# Cell 7: R-GCN encoder
# ============================
class RGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.Parameter(
            torch.randn(num_relations, in_dim, out_dim) * 0.01
        )

    def forward(self, x, edge_index, edge_type):
        src, dst = edge_index
        out = torch.zeros_like(x)

        for r in range(self.rel_weights.size(0)):
            mask = edge_type == r
            if mask.sum() == 0:
                continue
            src_r = src[mask]
            dst_r = dst[mask]
            msg = x[src_r] @ self.rel_weights[r]
            out.index_add_(0, dst_r, msg)

        return out

In [6]:
# ============================
# Cell 8: R-GCN model
# ============================
class RGCN(nn.Module):
    def __init__(self, num_nodes, emb_dim, num_relations, n_layers=2):
        super().__init__()
        self.emb = nn.Embedding(num_nodes, emb_dim)
        self.layers = nn.ModuleList([
            RGCNLayer(emb_dim, emb_dim, num_relations)
            for _ in range(n_layers)
        ])

        nn.init.xavier_uniform_(self.emb.weight)

    def forward(self, edge_index, edge_type):
        x = self.emb.weight
        for layer in self.layers:
            x = layer(x, edge_index, edge_type)
            x = F.relu(x)
        return x

In [7]:
# ============================
# Cell 9: BPR loss & negative sampling
# ============================

def bpr_loss(u_emb, pos_emb, neg_emb):
    """
    Bayesian Personalized Ranking loss
    """
    pos_scores = (u_emb * pos_emb).sum(dim=1)
    neg_scores = (u_emb * neg_emb).sum(dim=1)
    return -torch.mean(F.logsigmoid(pos_scores - neg_scores))


def sample_negatives(u_batch, train_pos, num_items, n_neg=1):
    """
    Sample negatives for each user in batch
    """
    neg_items = []
    for u in u_batch.tolist():
        seen = train_pos[u]
        for _ in range(n_neg):
            j = np.random.randint(0, num_items)
            while j in seen:
                j = np.random.randint(0, num_items)
            neg_items.append(j)
    return torch.tensor(neg_items, dtype=torch.long)

In [8]:
# ============================
# Cell 10: Leave-One-Out evaluation
# ============================

def evaluate_loo(emb_all, U, B, offsets, gt_dict, train_pos, Ks=(10,20,50)):
    """
    emb_all: [num_nodes, dim]
    """
    user_offset = offsets["user_offset"]
    book_offset = offsets["book_offset"]

    results = {f"Hit@{k}": 0.0 for k in Ks}
    results.update({f"NDCG@{k}": 0.0 for k in Ks})

    for u, gt_item in gt_dict.items():
        u_emb = emb_all[user_offset + u]

        # candidate books
        scores = torch.matmul(
            emb_all[book_offset:book_offset+B], u_emb
        )

        # filter train positives
        seen = train_pos[u]
        scores[list(seen)] = -1e9

        _, ranked = torch.topk(scores, max(Ks))
        ranked = ranked.tolist()

        for k in Ks:
            topk = ranked[:k]
            if gt_item in topk:
                results[f"Hit@{k}"] += 1
                rank = topk.index(gt_item)
                results[f"NDCG@{k}"] += 1.0 / np.log2(rank + 2)

    n = len(gt_dict)
    for k in results:
        results[k] /= n

    return results

In [9]:
# ============================
# Cell 11: Training loop
# ============================

def train_rgcn(
    edge_index,
    edge_type,
    emb_dim=64,
    n_layers=2,
    lr=1e-3,
    epochs=30,
    batch_size=200_000,
    n_neg=1,
    patience=5,
    run_name="RGCN"
):
    model = RGCN(
        num_nodes=num_nodes,
        emb_dim=emb_dim,
        num_relations=len(rel2id),
        n_layers=n_layers
    ).to(DEVICE)

    opt = torch.optim.Adam(model.parameters(), lr=lr)

    best_state = None
    best_ndcg = -np.inf
    bad_epochs = 0
    history = []

    edge_index_dev = edge_index.to(DEVICE)
    edge_type_dev = edge_type.to(DEVICE)

    for ep in range(1, epochs + 1):
        model.train()
        total_loss = 0.0

        perm = np.random.permutation(len(train_ui))
        for i in range(0, len(perm), batch_size):
            idx = perm[i:i+batch_size]
            batch = train_ui[idx]

            u = torch.tensor(batch[:, 0], dtype=torch.long, device=DEVICE)
            pos = torch.tensor(batch[:, 1], dtype=torch.long, device=DEVICE)
            neg = sample_negatives(u.cpu(), train_pos, B, n_neg).to(DEVICE)

            emb = model(edge_index_dev, edge_type_dev)

            u_emb = emb[offsets["user_offset"] + u]
            pos_emb = emb[offsets["book_offset"] + pos]
            neg_emb = emb[offsets["book_offset"] + neg]

            loss = bpr_loss(u_emb, pos_emb, neg_emb)

            opt.zero_grad(set_to_none=True)
            loss.backward()
            opt.step()

            total_loss += loss.item()

        # ---- validation
        model.eval()
        with torch.no_grad():
            emb_all = model(edge_index_dev, edge_type_dev)
        val_metrics = evaluate_loo(
            emb_all, U, B, offsets, val_gt, train_pos, Ks=(10,)
        )

        history.append({
            "epoch": ep,
            "loss": total_loss,
            **val_metrics
        })

        print(
            f"[{run_name}] ep={ep:03d} "
            f"loss={total_loss:.4f} | "
            f"Hit@10={val_metrics['Hit@10']:.5f} "
            f"NDCG@10={val_metrics['NDCG@10']:.5f}"
        )

        if val_metrics["NDCG@10"] > best_ndcg:
            best_ndcg = val_metrics["NDCG@10"]
            best_state = {
                "epoch": ep,
                "model": model.state_dict(),
                "val_metrics": val_metrics
            }
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                print(f"[Early stop] at epoch {ep}")
                break

    hist_df = pd.DataFrame(history)
    return best_state, hist_df

In [10]:
# ============================
# Cell 12: Smoke run (subset) — SAFE version
# ============================

SUBSET_UI = 300_000
SUBSET_EDGES = 1_000_000

# 1) sub-sample train_ui (edges for BPR)
train_ui_backup = train_ui.copy()
train_ui = train_ui[:SUBSET_UI].copy()

# 2) build a safer edge subset:
#    we keep a mix of relations, but sample edges globally (not "first N")
perm_e = torch.randperm(edge_index_full.shape[1])
sel = perm_e[:SUBSET_EDGES]

edge_index_dbg = edge_index_full[:, sel]
edge_type_dbg  = edge_type_full[sel]

print("DEBUG edges:", edge_index_dbg.shape, "unique rels:", int(edge_type_dbg.unique().numel()))
print("DEBUG train_ui:", train_ui.shape)

# 3) short run
best_state_dbg, hist_dbg = train_rgcn(
    edge_index=edge_index_dbg,
    edge_type=edge_type_dbg,
    epochs=3,
    batch_size=100_000,
    run_name="RGCN_SMOKE"
)

# 4) restore
train_ui = train_ui_backup
print("[OK] smoke run finished ✅")

DEBUG edges: torch.Size([2, 1000000]) unique rels: 11
DEBUG train_ui: (300000, 2)
[RGCN_SMOKE] ep=001 loss=2.0409 | Hit@10=0.04577 NDCG@10=0.02536
[RGCN_SMOKE] ep=002 loss=1.5763 | Hit@10=0.04704 NDCG@10=0.02535
[RGCN_SMOKE] ep=003 loss=1.4598 | Hit@10=0.04796 NDCG@10=0.02396
[OK] smoke run finished ✅


In [11]:
# ============================
# Cell 13: Full R-GCN training
# ============================

best_state, hist = train_rgcn(
    edge_index=edge_index_full,
    edge_type=edge_type_full,
    epochs=40,
    batch_size=200_000,
    patience=7,
    run_name="RGCN_FULL"
)

[RGCN_FULL] ep=001 loss=32.3863 | Hit@10=0.03785 NDCG@10=0.01640
[RGCN_FULL] ep=002 loss=17.3287 | Hit@10=0.03785 NDCG@10=0.01640
[RGCN_FULL] ep=003 loss=17.3287 | Hit@10=0.03785 NDCG@10=0.01640
[RGCN_FULL] ep=004 loss=17.3287 | Hit@10=0.03785 NDCG@10=0.01640
[RGCN_FULL] ep=005 loss=17.3287 | Hit@10=0.03785 NDCG@10=0.01640
[RGCN_FULL] ep=006 loss=17.3287 | Hit@10=0.03785 NDCG@10=0.01640
[RGCN_FULL] ep=007 loss=17.3287 | Hit@10=0.03785 NDCG@10=0.01640
[RGCN_FULL] ep=008 loss=17.3287 | Hit@10=0.03785 NDCG@10=0.01640
[Early stop] at epoch 8


In [12]:
# ============================
# Cell 14: Final TEST evaluation
# ============================

model = RGCN(
    num_nodes=num_nodes,
    emb_dim=64,
    num_relations=len(rel2id),
    n_layers=2
).to(DEVICE)

model.load_state_dict(best_state["model"])
model.eval()

with torch.no_grad():
    emb_all = model(edge_index_full.to(DEVICE), edge_type_full.to(DEVICE))

test_metrics = evaluate_loo(
    emb_all, U, B, offsets, test_gt, train_pos, Ks=(10,20,50)
)

print("TEST metrics:")
for k, v in test_metrics.items():
    print(f"{k}: {v:.5f}")

TEST metrics:
Hit@10: 0.03953
Hit@20: 0.07238
Hit@50: 0.12379
NDCG@10: 0.02021
NDCG@20: 0.02849
NDCG@50: 0.03861


In [13]:
# ============================
# Cell A: Fix edge_type dtype (RGCNConv expects torch.long)
# ============================

edge_index_full = edge_index_full.long()
edge_type_full  = edge_type_full.long()

print("edge_index dtype:", edge_index_full.dtype)
print("edge_type dtype:", edge_type_full.dtype)
print("edge_type min/max:", int(edge_type_full.min()), int(edge_type_full.max()))

edge_index dtype: torch.int64
edge_type dtype: torch.int64
edge_type min/max: 0 10


In [14]:
# ============================
# Cell B: Build PyG Data + edge_label_index for training (user-book only)
# ============================

from torch_geometric.data import Data

# 1) Graph (homogeneous) with relation types stored separately
data = Data(
    edge_index=edge_index_full,
    num_nodes=int(num_nodes),
)
data.edge_type = edge_type_full  # keep relation types

# 2) Edge labels for link prediction: только user->book (train positives)
u = torch.from_numpy(train_ui[:, 0].astype(np.int64)) + int(offsets["user_offset"])
b = torch.from_numpy(train_ui[:, 1].astype(np.int64)) + int(offsets["book_offset"])
edge_label_index = torch.stack([u, b], dim=0).long()

print("data:", data)
print("edge_label_index:", edge_label_index.shape, "pos edges:", edge_label_index.shape[1])

data: Data(edge_index=[2, 11450076], num_nodes=74285, edge_type=[11450076])
edge_label_index: torch.Size([2, 4926384]) pos edges: 4926384


In [15]:
# ============================
# Cell C: LinkNeighborLoader (neighbor sampling + negatives)
# ============================

from torch_geometric.loader import LinkNeighborLoader

# Нормальная стартовая конфигурация:
# чем больше neighbors — тем качественнее, но тем тяжелее
num_neighbors = [15, 10]   # 2-hop sampling

train_loader = LinkNeighborLoader(
    data,
    num_neighbors=num_neighbors,
    batch_size=4096,
    edge_label_index=edge_label_index,
    edge_label=torch.ones(edge_label_index.size(1), dtype=torch.float),  # positives
    neg_sampling_ratio=1.0,  # столько же negatives
    shuffle=True,
)

# Для валидации можно сделать отдельный loader на val_ui (по желанию позже)
print(train_loader)

LinkNeighborLoader()


In [16]:
# ============================
# Cell D: RGCN encoder + dot-product decoder for link prediction
# ============================

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv

class RGCNLinkPred(nn.Module):
    def __init__(self, num_nodes, emb_dim, num_relations, num_layers=2):
        super().__init__()
        self.emb = nn.Embedding(num_nodes, emb_dim)

        self.convs = nn.ModuleList()
        for _ in range(num_layers):
            self.convs.append(RGCNConv(
                in_channels=emb_dim,
                out_channels=emb_dim,
                num_relations=num_relations,
                num_bases=min(16, num_relations),  # важная оптимизация!
            ))

        nn.init.normal_(self.emb.weight, std=0.02)

    def forward(self, edge_index, edge_type):
        x = self.emb.weight
        for conv in self.convs:
            x = conv(x, edge_index, edge_type)
            x = F.relu(x)
        return x

    def decode(self, z, edge_label_index):
        src = edge_label_index[0]
        dst = edge_label_index[1]
        return (z[src] * z[dst]).sum(dim=1)  # dot product logits

In [17]:
# ============================
# Cell E: Train loop (sampled) + tqdm
# ============================

from tqdm.auto import tqdm

model = RGCNLinkPred(
    num_nodes=int(num_nodes),
    emb_dim=64,
    num_relations=len(rel2id),
    num_layers=2
).to(DEVICE)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.BCEWithLogitsLoss()

def train_one_epoch():
    model.train()
    total_loss = 0.0
    n = 0

    pbar = tqdm(train_loader, desc="train", leave=False)
    for batch in pbar:
        batch = batch.to(DEVICE)

        # batch.edge_label includes positives (1) and negatives (0) generated by loader
        z = model(batch.edge_index, batch.edge_type)

        logits = model.decode(z, batch.edge_label_index)
        loss = criterion(logits, batch.edge_label)

        opt.zero_grad(set_to_none=True)
        loss.backward()
        opt.step()

        total_loss += float(loss.item())
        n += 1
        pbar.set_postfix(loss=f"{loss.item():.4f}")

    return total_loss / max(n, 1)

# smoke training epochs
for ep in range(1, 4):
    loss = train_one_epoch()
    print(f"ep={ep:02d} loss={loss:.4f}")

train:   0%|          | 0/1203 [00:00<?, ?it/s]

ep=01 loss=0.5526


train:   0%|          | 0/1203 [00:00<?, ?it/s]

ep=02 loss=0.5410


train:   0%|          | 0/1203 [00:00<?, ?it/s]

ep=03 loss=0.5378


In [18]:
# ============================
# Cell F: Compute full-node embeddings z (for ranking eval)
# ============================

model.eval()
with torch.no_grad():
    # Full graph forward (может быть тяжелее, но делаем 1 раз для eval)
    z = model(data.edge_index.to(DEVICE), data.edge_type.to(DEVICE)).detach()

print("z:", tuple(z.shape), "device:", z.device)


z: (74285, 64) device: cuda:0


In [19]:
# ============================
# Cell G: Evaluate LOO ranking on VAL/TEST (Hit@K / NDCG@K)
# ============================

z_cpu = z.to("cpu")

val_metrics = evaluate_loo(z_cpu, U, B, offsets, val_gt, train_pos, Ks=(10,20,50))
test_metrics = evaluate_loo(z_cpu, U, B, offsets, test_gt, train_pos, Ks=(10,20,50))

print("[VAL]", val_metrics)
print("[TEST]", test_metrics)


[VAL] {'Hit@10': 0.0011236375894228248, 'Hit@20': 0.002715457507771827, 'Hit@50': 0.008539645679613468, 'NDCG@10': np.float64(0.0004565560090049411), 'NDCG@20': np.float64(0.0008470716484153776), 'NDCG@50': np.float64(0.0019980202386613052)}
[TEST] {'Hit@10': 0.0008427281920671187, 'Hit@20': 0.0029401850256563916, 'Hit@50': 0.008895464249597362, 'NDCG@10': np.float64(0.0003515975636990506), 'NDCG@20': np.float64(0.0008647481063575906), 'NDCG@50': np.float64(0.0020432482835441006)}


# R-GCN on Graph3: Diagnostic Experiment

In this notebook we evaluated a **Relational Graph Convolutional Network (R-GCN)**
on the full **Graph3** augmented recommender graph built from the Goodbooks-10k dataset.

## Setup
- Graph: **Graph3** (users, books, tags, authors, language, year bins, book–book similarity)
- Nodes: ~74k
- Edges: ~11.4M
- Relations: 11 typed relations
- Train/validation/test split: **Leave-One-Out (LOO)**

The graph and splits were loaded from a frozen bundle:

artifacts/v2_proper/graph3_bundle/


## Training
- Model: R-GCN (full-batch and sampled variants)
- Objective: binary link prediction with negative sampling (BCE)
- Evaluation: ranking metrics (Hit@K / NDCG@K)

A smoke run on a graph subset confirmed that:
- the data pipeline is correct
- the model trains without numerical issues

## Results
Although the training loss decreased steadily, the ranking quality was extremely low:

- **Test NDCG@10 ≈ 0.0003**
- **Test Hit@10 ≈ 0.001**

For comparison, the LightGCN baseline on the same graph achieved:
- **Test NDCG@10 ≈ 0.03**

This gap spans **two orders of magnitude**.

## Analysis
The main reason for poor performance is a **mismatch between training objective and evaluation metric**:
- R-GCN was optimized with **binary classification (BCE) on sampled negatives**
- Evaluation requires **global ranking quality** over a large item space

As a result, the model quickly learns to separate sampled positives and negatives,
but does not produce meaningful item rankings.

## Conclusion
- R-GCN in this configuration is **not suitable** for the target ranking task
- Further training or hyperparameter tuning is unlikely to close the gap
- The experiment serves as a **negative baseline** and diagnostic reference

## Next Steps
We move to **sampling-based GNNs with ranking objectives**, starting with:
- **GraphSAGE + neighbor sampling + BPR loss**

This allows a fair architectural comparison under an objective aligned
with Hit@K / NDCG@K evaluation.