# Film-Recommender: LO3 — GNN-Reranking Pipeline
*Generated: 2025-09-12T12:28:03 UTC*

This notebook builds directly on your file **`rerank_by_logical_rules.csv`** and shows how to construct a graph for LO3 (GNNs), train a (heterogeneous) GNN, and ensemble the resulting scores with your existing `final` score.

**What you'll get:**
1. Load the CSV (from your ZIP) and explore it  
2. Graph built from `seed → candidate` and `comp_*` relations  
3. GNN setup with PyTorch Geometric (R-GCN) *or* LightGCN (optional)  
4. Training for link prediction `user → movie`, i.e. reranking of the candidates  
5. Evaluation (Hit@K, NDCG@K) and **ensemble with `final`**  
6. Export of the new rankings

> **Note:** The notebook is written so that it can first compute a *baseline* (purely from your CSV scores) **without internet access**.  
> GNN training requires **PyTorch** and **PyTorch Geometric**. Installation cells are included.


In [1]:
import os

# === Adjust paths (local project setup) ===
CSV_PATH = "rerank_by_logical_rules.csv"  # relative path from gnn/ to the CSV
OUTPUT_DIR = "outputs"  # e.g. save results here
OUTPUT_CSV_BASELINE = f"{OUTPUT_DIR}/baseline_rerank.csv"
OUTPUT_CSV_GNN = f"{OUTPUT_DIR}/gnn_rerank.csv"
OUTPUT_CSV_ENSEMBLE = f"{OUTPUT_DIR}/ensemble_rerank.csv"

# make sure the export directory exists before any to_csv call
os.makedirs(OUTPUT_DIR, exist_ok=True)

SEED = 42
TOPK = 10



## (Optional) Installation for local training
Run this cell **locally** (with internet access) if PyTorch/PyG are not installed yet.


In [2]:
!pip install --upgrade pip
import sys, platform
# Pick the Torch wheel that matches your CUDA/CPU environment, e.g.:
# CPU-only (simplest route):
!pip install torch --index-url https://download.pytorch.org/whl/cpu
# PyTorch Geometric core packages:
!pip install torch-geometric torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-$(python -c "import torch;print(torch.__version__.split('+')[0])").html
# Optional: LightGCN reference implementations (only if desired)
# !pip install recbole  # includes LightGCN, may require additional packages


Looking in indexes: https://download.pytorch.org/whl/cpu
Looking in links: https://data.pyg.org/whl/torch-2.8.0.html


## 1) Load the data
We load `rerank_by_logical_rules.csv` (extracted from the ZIP) and inspect its columns.


In [3]:
import pandas as pd

df = pd.read_csv(CSV_PATH)
print(df.shape)
df.head(3)



(200, 41)


Unnamed: 0,candidate_id,candidate_title,year,cos,meta,final,seed,comp_genres,comp_keywords,comp_cast,...,name_norm,year_str,genre_list,director_list,watchlist_priority,genre_boost,director_boost,genre_penalty,director_penalty,score
0,2756,The Abyss,1989.0,0.0,0.4592,0.1837,Aliens,0.5,0.0,0.0256,...,the abyss,1989.0,"['Adventure', 'Thriller', 'Science Fiction']",['James Cameron'],True,True,True,False,False,5
1,1991,Death Proof,2007.0,0.0,0.501,0.2004,Kill Bill: Vol. 2,0.6667,0.0179,0.0526,...,death proof,2007.0,"['Action', 'Thriller']",['Quentin Tarantino'],False,True,True,False,False,3
2,28387,Kicking and Screaming,1995.0,0.0,0.5109,0.2044,The Meyerowitz Stories (New and Selected),0.6667,0.0,0.0256,...,kicking and screaming,1995.0,"['Comedy', 'Drama', 'Romance']",['Noah Baumbach'],False,True,True,False,False,3


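### Basic cleaning & types
We make sure the score columns are numeric and that missing values are replaced sensibly: missing `comp_*` values become 0, and missing `cos`/`final`/`score` values are filled with the column median.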


In [4]:
# Candidate-like standard columns that exist according to your description
# (kept for reference; not used further below).
candidate_like_cols = [
    'seed', 'candidate_id', 'candidate_title', 'cos', 'final', 'score',
    'comp_genres','comp_keywords','comp_cast','comp_director','comp_runtime',
    'comp_language','comp_popularity','comp_vote'
]

# coerce all score-like columns to numeric (unparseable values become NaN)
for c in df.columns:
    if c.startswith('comp_') or c in ['cos','final','score']:
        df[c] = pd.to_numeric(df[c], errors='coerce')

# interpret missing compatibilities as 0
for c in [c for c in df.columns if c.startswith('comp_')]:
    df[c] = df[c].fillna(0.0)

# fill missing overall scores with the column median
for c in ['cos','final','score']:
    if c in df.columns:
        df[c] = df[c].fillna(df[c].median())

# a few helper views
print("Columns:", list(df.columns))
print(df.describe(include='all').transpose().head(12))


Columns: ['candidate_id', 'candidate_title', 'year', 'cos', 'meta', 'final', 'seed', 'comp_genres', 'comp_keywords', 'comp_cast', 'comp_director', 'comp_runtime', 'comp_language', 'comp_popularity', 'comp_vote', 'tmdb_url', 'overview', 'genres', 'runtime', 'vote_average', 'poster_url', 'media_type', 'director', 'actors', 'characters', 'origin_country', 'original_language', 'popularity', 'production_companies', 'production_countries', 'spoken_languages', 'name_norm', 'year_str', 'genre_list', 'director_list', 'watchlist_priority', 'genre_boost', 'director_boost', 'genre_penalty', 'director_penalty', 'score']
                 count unique                                        top freq  \
candidate_id     200.0    NaN                                        NaN  NaN   
candidate_title    200    200                                  The Abyss    1   
year             199.0    NaN                                        NaN  NaN   
cos              200.0    NaN                                

## 2) Build the graph from the CSV (relations & weights)
We create edges from the seed movie to the candidate, plus additional relations from the `comp_*` columns.  
For the pure-Python preview we build edge lists and normalize the weights to [0,1] per relation.
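
To make the per-relation normalization concrete, here is a minimal pure-Python sketch on a toy edge list. The helper name `minmax_edges` and the edge values are illustrative assumptions, not taken from the CSV.

```python
def minmax_edges(edges):
    """Rescale the weights of (src, dst, weight) triples to [0, 1]."""
    if not edges:
        return []
    weights = [w for _, _, w in edges]
    lo, hi = min(weights), max(weights)
    if hi <= lo:
        # Constant weights carry no ranking signal; map them all to 0.0.
        return [(u, v, 0.0) for u, v, _ in edges]
    return [(u, v, (w - lo) / (hi - lo)) for u, v, w in edges]


# Toy relation with three edges (hypothetical node ids and weights).
toy_edges = [(0, 1, 0.5), (0, 2, 1.5), (3, 4, 1.0)]
print(minmax_edges(toy_edges))
```

Normalizing each relation separately keeps a relation with a naturally small value range from being drowned out by one with a large range.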


In [5]:
import numpy as np

# Helper functions
def minmax(x):
    """Min-max scale x to [0, 1]; constant input maps to all zeros."""
    x = np.asarray(x, dtype=float)
    mn, mx = np.nanmin(x), np.nanmax(x)
    if mx <= mn:
        return np.zeros_like(x)
    return (x - mn) / (mx - mn)

# Identify all movies (seed + candidate)
movies = pd.unique(pd.concat([df['seed'], df['candidate_title']], ignore_index=True))
movie2id = {m:i for i,m in enumerate(movies)}
id2movie = {i:m for m,i in movie2id.items()}

# Define edge types
rel_cols = ['cos','final','comp_genres','comp_keywords','comp_cast','comp_director',
            'comp_runtime','comp_language','comp_popularity','comp_vote']
rel_cols = [c for c in rel_cols if c in df.columns]

edges = {c: [] for c in rel_cols}

# One (seed, candidate, weight) edge per row and relation
for _, row in df.iterrows():
    s = row['seed']; c = row['candidate_title']
    if pd.isna(s) or pd.isna(c): 
        continue
    u, v = movie2id[s], movie2id[c]
    for rc in rel_cols:
        w = row[rc]
        if pd.notna(w):
            edges[rc].append((u, v, float(w)))

# Normalization per relation
norm_edges = {}
for rc, lst in edges.items():
    if not lst:
        continue
    w = np.array([w for (_,_,w) in lst], dtype=float)
    wn = minmax(w)
    norm_edges[rc] = [(u,v,float(wn[i])) for i,(u,v,_) in enumerate(lst)]

# Statistics
for rc, lst in norm_edges.items():
    print(rc, "Edges:", len(lst), "Example:", lst[:3])

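## 3) Baseline reranking (without GNN) for comparison
We use the existing `final` score as the baseline and evaluate Hit@K/NDCG@K against a proxy "relevance". Without a ground-truth label column, the proxy positive is the top-ranked row of the score being evaluated, so both metrics are trivially 1.0 and only become informative once real labels are available.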
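
As a worked illustration of the two metrics, the following pure-Python sketch evaluates one ranked list with a single relevant item. The relevance list is made up, and the ideal DCG here is taken over the full list (the common textbook definition), which can differ from an implementation that sorts only the top-K slice when several items are relevant.

```python
import math


def hit_at_k(rel, k):
    """1.0 if any of the first k ranked items is relevant, else 0.0."""
    return 1.0 if any(r > 0 for r in rel[:k]) else 0.0


def dcg_at_k(rel, k):
    """Discounted cumulative gain with gain 2^rel - 1 and log2 position discount."""
    return sum((2 ** r - 1) / math.log2(pos + 2) for pos, r in enumerate(rel[:k]))


def ndcg_at_k(rel, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    idcg = dcg_at_k(sorted(rel, reverse=True), k)
    return dcg_at_k(rel, k) / idcg if idcg > 0 else 0.0


# Relevance in ranked order: the only relevant item sits at rank 2.
ranked_relevance = [0, 1, 0, 0]
print(hit_at_k(ranked_relevance, k=3))
print(ndcg_at_k(ranked_relevance, k=3))
```

With the relevant item at rank 2, the hit is still counted, but NDCG drops to `1 / log2(3)` because of the position discount.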


In [11]:
# Proxy relevance: if there is a ground-truth column (e.g. 'label', 'liked'), use it.
# Fallback: relevance = 1 for the top-1 row per seed (demo only).
label_col_candidates = [c for c in ['label','liked','relevant','gt','y'] if c in df.columns]
if label_col_candidates:
    LABEL_COL = label_col_candidates[0]
else:
    LABEL_COL = None

def ndcg_at_k(rel, k=10):
    rel = np.array(rel)[:k]
    dcg = np.sum((2**rel - 1) / np.log2(np.arange(2, len(rel)+2)))
    ideal = np.sort(rel)[::-1]
    idcg = np.sum((2**ideal[:k] - 1) / np.log2(np.arange(2, min(k, len(ideal))+2)))
    return (dcg / idcg) if idcg > 0 else 0.0

def hit_at_k(rel, k=10):
    return 1.0 if np.any(np.array(rel)[:k] > 0) else 0.0

def evaluate_grouped(df, score_col, k=10):
    """Return (mean Hit@k, mean NDCG@k, number of seeds) for rankings by score_col."""
    hits, ndcgs, n = [], [], 0
    for seed, g in df.groupby('seed', sort=False):
        g = g.sort_values(score_col, ascending=False)
        if LABEL_COL is None:
            # Pseudo-label: top-1 of score_col (if a 'score' column exists), otherwise a random positive.
            # NOTE: the positive is picked from the very column being evaluated, so it is always
            # ranked first and Hit@K/NDCG@K come out as 1.0. Rank by g['score'] for a non-circular proxy.
            if 'score' in g.columns:
                pos_idx = g[score_col].rank(ascending=False, method='first').idxmin()
                rel = (g.index == pos_idx).astype(int)
            else:
                rel = np.zeros(len(g), dtype=int)
                if len(g)>0:
                    rel[np.random.randint(len(g))] = 1
        else:
            rel = (g[LABEL_COL] > 0).astype(int).values

        hits.append(hit_at_k(rel, k))
        ndcgs.append(ndcg_at_k(rel, k))
        n += 1
    return float(np.mean(hits) if hits else 0.0), float(np.mean(ndcgs) if ndcgs else 0.0), n

if 'final' in df.columns:
    h, n, cnt = evaluate_grouped(df, 'final', k=TOPK)
    print(f"Baseline (`final`): Hit@{TOPK}: {h:.3f}, NDCG@{TOPK}: {n:.3f} (Seeds: {cnt})")
else:
    print("No 'final' column found; baseline is skipped.")

# Export the baseline ranking
if 'final' in df.columns:
    df.sort_values(['seed','final'], ascending=[True, False]).to_csv(OUTPUT_CSV_BASELINE, index=False)
    print(f"Baseline ranking exported: {OUTPUT_CSV_BASELINE}")


Baseline (`final`): Hit@10: 1.000, NDCG@10: 1.000 (Seeds: 130)
Baseline ranking exported: ../data/kg/outputs/baseline_rerank.csv


## 4) GNN training (R-GCN-style heterogeneous GNN over movie–movie relations)
We use **PyTorch Geometric** (if installed) to train a heterogeneous GNN on the relation types (`cos`, `comp_*`, …). The cell below implements this with one `GATv2Conv` per relation inside a `HeteroConv` rather than with `RGCNConv`.  
Goal: derive an **item score** per seed candidate (link-prediction proxy), which we then use for reranking.
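
The link-prediction proxy comes down to scoring a (user, movie) pair by the dot product of two embeddings and penalizing that score with a binary cross-entropy on logits. Here is a minimal pure-Python sketch of that objective for one positive and one sampled negative pair. The 2-dimensional vectors are made-up values, and `bce_with_logits` is a hand-written stand-in for the per-pair formula, not the PyTorch class itself.

```python
import math


def dot(u, v):
    """Dot-product score between a user vector and an item vector."""
    return sum(a * b for a, b in zip(u, v))


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit.

    Uses max(x, 0) - x * y + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


user = [1.0, 0.5]            # hypothetical user embedding
liked_movie = [2.0, 1.0]     # positive pair, target 1
random_movie = [-1.0, 0.0]   # sampled negative, target 0

pos_logit = dot(user, liked_movie)
neg_logit = dot(user, random_movie)
loss = bce_with_logits(pos_logit, 1.0) + bce_with_logits(neg_logit, 0.0)
print(pos_logit, neg_logit, round(loss, 4))
```

Training pushes positive logits up and negative logits down, so a candidate's dot product with the user vector can later be used directly as its reranking score.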

> If `torch`/`torch_geometric` are not available, this section is skipped automatically.


In [12]:
import importlib.util, math, warnings  # importlib.util must be imported explicitly for find_spec
warnings.filterwarnings('ignore')

has_torch = importlib.util.find_spec('torch') is not None
has_pyg = importlib.util.find_spec('torch_geometric') is not None

print("Torch installed:", has_torch, "| PyG installed:", has_pyg)

if not (has_torch and has_pyg):
    print("GNN part is skipped (packages missing). You can run the installation cell above locally and then re-run this cell.")


Torch installed: True | PyG installed: True


In [16]:
if has_torch and has_pyg:
    import torch
    from torch import nn
    import torch.nn.functional as F
    from torch_geometric.data import HeteroData
    from torch_geometric.nn import HeteroConv, GATv2Conv  # alternatively: SAGEConv

    torch.manual_seed(SEED)

    # === 1) HeteroData for the edge structure only ===
    data = HeteroData()
    num_movies = len(movie2id)
    d = 64

    # movie->movie edges from the normalized relations (the weights themselves are not passed to the convs)
    for rc, lst in norm_edges.items():
        if not lst:
            continue
        src = torch.tensor([u for (u,_,_) in lst], dtype=torch.long)
        dst = torch.tensor([v for (_,v,_) in lst], dtype=torch.long)
        data['movie', rc, 'movie'].edge_index = torch.stack([src, dst], dim=0)

    # user->movie (positive examples: top-1 by `final` per seed)
    pos_pairs = []
    for seed, g in df.groupby('seed', sort=False):
        if len(g) == 0:
            continue
        if 'final' in g.columns:
            g = g.sort_values('final', ascending=False)
        cand = g.iloc[0]['candidate_title']
        if pd.isna(cand):
            continue
        mid = movie2id.get(cand)
        if mid is not None:
            pos_pairs.append((0, mid))

    if not pos_pairs:
        raise RuntimeError("No positive pairs found. Check the 'final' column and the CSV content.")

    u_src = torch.tensor([p[0] for p in pos_pairs], dtype=torch.long)
    m_dst = torch.tensor([p[1] for p in pos_pairs], dtype=torch.long)
    data['user','likes','movie'].edge_index = torch.stack([u_src, m_dst], dim=0)

    # Negative samples (random movies that are not among the positives)
    all_movie_ids = torch.arange(num_movies, dtype=torch.long)
    pos_set = set(m_dst.tolist())
    neg_pairs = []
    for _ in range(len(pos_pairs) * 3):
        neg_m = int(all_movie_ids[torch.randint(0, num_movies, (1,))])
        if neg_m not in pos_set:
            neg_pairs.append((0, neg_m))
    if not neg_pairs:
        neg_pairs.append((0, int(all_movie_ids[0])))
    un_src = torch.tensor([p[0] for p in neg_pairs], dtype=torch.long)
    mn_dst = torch.tensor([p[1] for p in neg_pairs], dtype=torch.long)

    # Only edges that point to 'movie' are used by the convs
    conv_edge_types = [et for et in data.edge_types if et[2] == 'movie']
    edge_index_dict = {et: data[et].edge_index for et in conv_edge_types}

    # === 2) Model with internal embeddings (avoids None/shape problems) ===
    class HeteroRecommender(nn.Module):
        def __init__(self, num_movies, dim=64, layers=2):
            super().__init__()
            self.movie_emb = nn.Embedding(num_movies, dim)
            self.user_emb  = nn.Embedding(1, dim)

            self.layers = nn.ModuleList()
            for _ in range(layers):
                self.layers.append(
                    HeteroConv(
                        { et: GATv2Conv((-1, -1), dim, add_self_loops=False)
                          for et in conv_edge_types },
                        aggr='sum'
                    )
                )

        def forward(self, edge_index_dict):
            # build x_dict on the fly from the embeddings
            x_dict = {
                'movie': self.movie_emb.weight,   # [num_movies, d]
                'user' : self.user_emb.weight     # [1, d]
            }
            # propagate only along relations with dst='movie'
            out = x_dict
            for conv in self.layers:
                out_partial = conv(out, edge_index_dict)
                # conv only returns outputs for the affected target types (here 'movie'):
                out = {**out, **{k: F.relu(v) for k, v in out_partial.items()}}
            return out

        @staticmethod
        def score(user_vec, item_vec):
            return (user_vec * item_vec).sum(dim=-1)

    model = HeteroRecommender(num_movies=num_movies, dim=d, layers=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    bce = nn.BCEWithLogitsLoss()

    # === 3) Training ===
    for epoch in range(1, 201):
        model.train()
        opt.zero_grad()

        x_out = model(edge_index_dict)
        user_pos = x_out['user'][u_src]       # (P, d)
        item_pos = x_out['movie'][m_dst]      # (P, d)
        user_neg = x_out['user'][un_src]      # (N, d)
        item_neg = x_out['movie'][mn_dst]     # (N, d)

        pos_logit = HeteroRecommender.score(user_pos, item_pos)
        neg_logit = HeteroRecommender.score(user_neg, item_neg)

        loss = bce(pos_logit, torch.ones_like(pos_logit)) + \
               bce(neg_logit, torch.zeros_like(neg_logit))
        loss.backward()
        opt.step()

        if epoch % 50 == 0:
            print(f"Epoch {epoch:3d} | Loss {loss.item():.4f}")

    # === 4) Score all candidates per seed ===
    model.eval()
    with torch.no_grad():
        x_out = model(edge_index_dict)
        user_vec = x_out['user'][0:1]  # the only user

    gnn_scores = []
    for _, row in df.iterrows():
        cand = row['candidate_title']
        if pd.isna(cand):
            gnn_scores.append(np.nan); continue
        mid = movie2id.get(cand, None)
        if mid is None:
            gnn_scores.append(np.nan); continue
        item_vec = x_out['movie'][mid:mid+1]
        s = float((user_vec * item_vec).sum(dim=-1))
        gnn_scores.append(s)

    df['s_gnn'] = gnn_scores
    df['s_gnn_norm'] = df.groupby('seed')['s_gnn'].transform(
        lambda x: (x - x.min()) / (x.max() - x.min() + 1e-9)
    )
    print("GNN scoring added: columns 's_gnn'/'s_gnn_norm'")

    df.sort_values(['seed','s_gnn_norm'], ascending=[True, False]).to_csv(OUTPUT_CSV_GNN, index=False)
    print(f"GNN rerank exported: {OUTPUT_CSV_GNN}")


Epoch  50 | Loss 0.0146
Epoch 100 | Loss 0.0059
Epoch 150 | Loss 0.0035
Epoch 200 | Loss 0.0023
GNN scoring added: columns 's_gnn'/'s_gnn_norm'
GNN rerank exported: ../data/kg/outputs/gnn_rerank.csv


## 5) Ensemble: GNN + `final`
We combine your existing `final` score with the GNN score, both min-max normalized to [0,1] per seed:  
\( \text{score\_ensemble} = \lambda\, s_{\text{gnn,norm}} + (1-\lambda)\, \text{final}_{\text{norm}} \)
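
A small pure-Python sketch of this blend for the candidates of a single seed may help when tuning λ. The score values are invented for illustration, and `eps` mirrors the small constant that guards against division by zero when all scores of a seed are equal.

```python
def minmax(values, eps=1e-9):
    """Min-max scale a list of scores to (approximately) [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


def blend(gnn_scores, final_scores, lam=0.6):
    """Convex combination of the normalized GNN and `final` scores."""
    g = minmax(gnn_scores)
    f = minmax(final_scores)
    return [lam * a + (1.0 - lam) * b for a, b in zip(g, f)]


# Three candidates of one seed (toy values).
gnn = [0.2, 1.4, 0.8]
final = [0.50, 0.30, 0.40]

scores = blend(gnn, final)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

With `lam=0.0` the ranking falls back to the order given by `final`, and with `lam=1.0` it follows the GNN alone, so sweeping λ between the two shows how strongly each signal drives the result.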


In [17]:
LAMBDA = 0.6  # tunable blend weight

if 'final' not in df.columns:
    print("No 'final' column found; ensemble skipped.")
else:
    if 's_gnn_norm' not in df.columns:
        # Fallback: without a GNN score the ensemble ranks exactly like the baseline
        df['s_gnn_norm'] = 0.0
        print("Warning: no GNN score found; ensemble equals the baseline.")
    # Normalize final to [0,1] per seed
    df['final_norm'] = df.groupby('seed')['final'].transform(lambda x: (x - x.min()) / (x.max()-x.min() + 1e-9))
    df['score_ensemble'] = LAMBDA * df['s_gnn_norm'] + (1.0 - LAMBDA) * df['final_norm']

    # Evaluation ('final' is guaranteed to exist in this branch)
    h_b, n_b, cnt = evaluate_grouped(df, 'final', k=TOPK)
    h_e, n_e, _   = evaluate_grouped(df, 'score_ensemble', k=TOPK)
    print(f"Baseline Hit@{TOPK}: {h_b:.3f}, NDCG@{TOPK}: {n_b:.3f}")
    print(f"Ensemble  Hit@{TOPK}: {h_e:.3f}, NDCG@{TOPK}: {n_e:.3f}")

    # Export
    df.sort_values(['seed','score_ensemble'], ascending=[True, False]).to_csv(OUTPUT_CSV_ENSEMBLE, index=False)
    print(f"Ensemble ranking exported: {OUTPUT_CSV_ENSEMBLE}")


Baseline Hit@10: 1.000, NDCG@10: 1.000
Ensemble  Hit@10: 1.000, NDCG@10: 1.000
Ensemble ranking exported: ../data/kg/outputs/ensemble_rerank.csv


## 6) (Optional) LightGCN variant
If you add real **user–movie** interactions (ratings/watchlist), **LightGCN** is very efficient.  
Procedure:
1. Build a bipartite `user–movie` graph from your interactions (positive/negative edges via sampling).  
2. Train with the **BPR loss**.  
3. Additionally use your **item–item** graph from `comp_*` as a regularizer (e.g. SGC/LGConv on the item embeddings between the LightGCN rounds).
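
The BPR loss from step 2 can be sketched in a few lines of pure Python. This shows only the loss on already computed scores (no graph propagation), and the score values are hypothetical.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired positive/negative scores."""
    losses = []
    for p, n in zip(pos_scores, neg_scores):
        diff = p - n
        # -log(sigmoid(d)) equals softplus(-d); computed in a numerically stable form.
        losses.append(max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
    return sum(losses) / len(losses)


# Each positive item is paired with one sampled negative for the same user.
well_ranked = bpr_loss([5.0, 4.0], [0.0, -1.0])   # positives clearly above negatives
badly_ranked = bpr_loss([0.0, -1.0], [5.0, 4.0])  # order reversed
print(well_ranked < badly_ranked)
```

Unlike a pointwise cross-entropy, BPR only looks at the score difference within each pair, which matches the ranking goal of a recommender more directly.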

> Implementation aids: RecBole/LightGCN or PyG implementations. This notebook focuses on the relational (R-GCN-style) GNN for LO3.


In [18]:
'''
The code in the previous cells was in large part AI-generated with the free and paid versions of ChatGPT and was afterwards heavily adapted by me. Since it is not possible to say accurately which parts were originally AI-generated by which prompt, I have included all prompts that were used on this file here.
The following prompts were used:

    "The uploaded archive is my project. All learning objectives I have to fulfil can be viewed via this link: https://kg.dbai.tuwien.ac.at/kg-course/details/. The PDF contains my one-pager, in which I briefly explain my project. I already cover LO1 and LO2; next I would like to tackle LO3. Please explain to me how I can sensibly use GNNs for my movie recommendation project. Ideally I would like to keep working with the file "rerank_by_logical_rules.csv", which already contains my recommendations processed with embeddings and logical rules."

    "Yes please. Ideally turn it into an .ipynb file for me"

    "This is my folder structure. "rerank_by_logical_rules.csv" is located at the following path: "letterboxd-KG/data/kg/rerank_by_logical_rules.csv". The generated file is located at the following path: "letterboxd-KG/gnn/gnn_rerank_pipeline.ipynb". Can you rewrite the lines for me?"

    "--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[10], line 41 38 return float(np.mean(hits) if hits else 0.0), float(np.mean(ndcgs) if ndcgs else 0.0), n 40 if 'final' in df.columns: ---> 41 h, n, cnt = evaluate_grouped(df, 'final', k=TOPK) 42 print(f"Baseline (final) — Hit@{TOPK}: {h:.3f}, NDCG@{TOPK}: {n:.3f} (Seeds: {cnt})") 43 else: Cell In[10], line 27, in evaluate_grouped(df, score_col, k) 25 if 'score' in g.columns: 26 pos_idx = g[score_col].rank(ascending=False, method='first').idxmin() ---> 27 rel = (g.index == pos_idx).astype(int).values 28 else: 29 rel = np.zeros(len(g), dtype=int) AttributeError: 'numpy.ndarray' object has no attribute 'values'"

    "I removed .values. Now I get this error: --------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[13], line 85 80 rel_id_map = {et:i for i,et in enumerate(rel_types)} 82 # RGCNConv erwartet 'edge_type' für homogene Darstellung; wir lassen PyG das intern managen, 83 # indem wir das Hetero-Modell wie gezeigt mit dicts füttern. ---> 85 model = RGCNRecommender(in_dim=d, hid=d, out_dim=d, num_rels=len(rel_types)) 86 opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5) 88 def score_user_item(user_emb, item_emb): 89 # simples Skalarprodukt Cell In[13], line 69, in RGCNRecommender.__init__(self, in_dim, hid, out_dim, num_rels) 67 def __init__(self, in_dim=64, hid=64, out_dim=64, num_rels=0): 68 super().__init__() ---> 69 self.conv1 = RGCNConv(in_dim, hid, num_rels=num_rels) 70 self.conv2 = RGCNConv(hid, out_dim, num_rels=num_rels) TypeError: RGCNConv.__init__() missing 1 required positional argument: 'num_relations'"

    "Can you show me the complete cell again in one piece?"

    "With this code in the cell I get the following errors: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[15], line 94 92 model.train() 93 opt.zero_grad() ---> 94 x_out = model({'movie': data['movie'].x, 'user': data['user'].x}, edge_index_dict) 96 user_emb = x_out['user'][u_src] 97 pos_item_emb = x_out['movie'][m_dst] File /opt/anaconda3/lib/python3.13/site-packages/torch/nn/modules/module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs) 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1772 else: -> 1773 return self._call_impl(*args, **kwargs) File /opt/anaconda3/lib/python3.13/site-packages/torch/nn/modules/module.py:1784, in Module._call_impl(self, *args, **kwargs) 1779 # If we don't have any hooks, we want to skip the rest of the logic in 1780 # this function, and just call forward. 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1782 or _global_backward_pre_hooks or _global_backward_hooks 1783 or _global_forward_hooks or _global_forward_pre_hooks): -> 1784 return forward_call(*args, **kwargs) 1786 result = None 1787 called_always_called_hooks = set() Cell In[15], line 76, in HeteroRecommender.forward(self, x_dict, edge_index_dict) 74 def forward(self, x_dict, edge_index_dict): 75 for conv in self.layers: ---> 76 x_dict = conv(x_dict, edge_index_dict) 77 x_dict = {k: F.relu(v) for k,v in x_dict.items()} 78 return x_dict File /opt/anaconda3/lib/python3.13/site-packages/torch/nn/modules/module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs) 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1772 else: -> 1773 return self._call_impl(*args, **kwargs) File /opt/anaconda3/lib/python3.13/site-packages/torch/nn/modules/module.py:1784, in Module._call_impl(self, *args, **kwargs) 1779 # If we don't have any hooks, we want to skip the rest of the logic in 1780 # this function, and just call forward. 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1782 or _global_backward_pre_hooks or _global_backward_hooks 1783 or _global_forward_hooks or _global_forward_pre_hooks): -> 1784 return forward_call(*args, **kwargs) 1786 result = None 1787 called_always_called_hooks = set() File /opt/anaconda3/lib/python3.13/site-packages/torch_geometric/nn/conv/hetero_conv.py:158, in HeteroConv.forward(self, *args_dict, **kwargs_dict) 155 if not has_edge_level_arg: 156 continue --> 158 out = conv(*args, **kwargs) 160 if dst not in out_dict: 161 out_dict[dst] = [out] File /opt/anaconda3/lib/python3.13/site-packages/torch/nn/modules/module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs) 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] 1772 else: -> 1773 return self._call_impl(*args, **kwargs) File /opt/anaconda3/lib/python3.13/site-packages/torch/nn/modules/module.py:1784, in Module._call_impl(self, *args, **kwargs) 1779 # If we don't have any hooks, we want to skip the rest of the logic in 1780 # this function, and just call forward. 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1782 or _global_backward_pre_hooks or _global_backward_hooks 1783 or _global_forward_hooks or _global_forward_pre_hooks): -> 1784 return forward_call(*args, **kwargs) 1786 result = None 1787 called_always_called_hooks = set() File /opt/anaconda3/lib/python3.13/site-packages/torch_geometric/nn/conv/gatv2_conv.py:293, in GATv2Conv.forward(self, x, edge_index, edge_attr, return_attention_weights) 291 else: 292 x_l, x_r = x[0], x[1] --> 293 assert x[0].dim() == 2 295 if x_r is not None and self.res is not None: 296 res = self.res(x_r) AttributeError: 'NoneType' object has no attribute 'dim'"

    ""

'''
