# PyTorch Link Prediction Example

This notebook is a short toy example for understanding how link prediction inputs and outputs work. It follows the PyTorch Geometric tutorial at:
https://colab.research.google.com/drive/1xpzn1Nvai1ygd_P5Yambc_oe4VBPK_ZT?usp=sharing

## Imports and Setup

In [3]:
import torch
from torch import tensor
print(torch.__version__)


2.3.0+cu121


In [4]:
# Install required packages.
import os
os.environ['TORCH'] = torch.__version__

!pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install pyg-lib -f https://data.pyg.org/whl/nightly/torch-${TORCH}.html
!pip install git+https://github.com/pyg-team/pytorch_geometric.git

Looking in links: https://data.pyg.org/whl/torch-2.3.0+cu121.html
Looking in links: https://data.pyg.org/whl/torch-2.3.0+cu121.html
Looking in links: https://data.pyg.org/whl/nightly/torch-2.3.0+cu121.html
Collecting git+https://github.com/pyg-team/pytorch_geometric.git
  Cloning https://github.com/pyg-team/pytorch_geometric.git to /tmp/pip-req-build-kd5w9207
  Running command git clone --filter=blob:none --quiet https://github.com/pyg-team/pytorch_geometric.git /tmp/pip-req-build-kd5w9207
  Resolved https://github.com/pyg-team/pytorch_geometric.git to commit 61c47ee404f8e26b3a1cd0db56448b6254920d0e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [5]:
from torch_geometric.data import download_url, extract_zip

url = 'https://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
extract_zip(download_url(url, '.'), '.')

movies_path = './ml-latest-small/movies.csv'
ratings_path = './ml-latest-small/ratings.csv'

Using existing file ml-latest-small.zip
Extracting ./ml-latest-small.zip


In [7]:
import pandas as pd

print("Movies.csv:\n====================")
print(pd.read_csv(movies_path)[["movieId", "genres"]].head())
print()
print('Ratings.csv:\n====================')
print(pd.read_csv(ratings_path)[["userId", "movieId", "rating"]].head())

Movies.csv:
====================
   movieId                                       genres
0        1  Adventure|Animation|Children|Comedy|Fantasy
1        2                   Adventure|Children|Fantasy
2        3                               Comedy|Romance
3        4                         Comedy|Drama|Romance
4        5                                       Comedy

Ratings.csv:
====================
   userId  movieId  rating
0       1        1     4.0
1       1        3     4.0
2       1        6     4.0
3       1       47     5.0
4       1       50     5.0
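
The `movieId` values printed above (1, 3, 6, 47, 50) are not contiguous, while graph edge lists generally refer to nodes by consecutive indices starting at 0. As a rough illustration of that mapping step, here is a minimal plain-Python sketch using only the five rating rows shown above. The helper `build_index` and the variable names are hypothetical and are not taken from the tutorial. On the full dataset the movie mapping would more plausibly be built from every movie in `movies.csv`, not only the rated ones.

```python
# The five (userId, movieId, rating) rows copied from the output above
ratings = [(1, 1, 4.0), (1, 3, 4.0), (1, 6, 4.0), (1, 47, 5.0), (1, 50, 5.0)]


def build_index(raw_ids):
    """Map each distinct raw ID to a consecutive index, in sorted order."""
    return {raw: idx for idx, raw in enumerate(sorted(set(raw_ids)))}


user_index = build_index(u for u, _, _ in ratings)
movie_index = build_index(m for _, m, _ in ratings)

# Edge list in a 2 x num_edges layout:
# row 0 holds user indices, row 1 holds movie indices
edge_index = [
    [user_index[u] for u, _, _ in ratings],
    [movie_index[m] for _, m, _ in ratings],
]
print(edge_index)
```

Each column of `edge_index` is one user-to-movie rating edge, which is the kind of input a link prediction model scores.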


In [8]:
# Load the entire movie data frame into memory
movies_df = pd.read_csv(movies_path, index_col='movieId')

# Split the genres and convert them into indicator variables
genres = movies_df['genres'].str.get_dummies('|')
print(genres[['Action', 'Adventure', 'Comedy', 'Drama']].head())

# Use genres as movie input features
movie_feat = torch.from_numpy(genres.values).to(torch.float)
assert movie_feat.size() == (9742, 20)

         Action  Adventure  Comedy  Drama
movieId                                  
1             0          1       1      0
2             0          1       0      0
3             0          0       1      0
4             0          0       1      1
5             0          0       1      0
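
To make the indicator encoding concrete, here is a minimal plain-Python sketch of the kind of result `str.get_dummies('|')` produces: one 0/1 column per distinct genre. The function `one_hot_genres` is a hypothetical helper written for this illustration, and the three sample strings are copied from the first rows of `movies.csv` shown earlier. Sorting the columns alphabetically is a choice made in this sketch.

```python
def one_hot_genres(genre_strings, sep="|"):
    """Return (columns, rows): sorted genre names and one 0/1 row per input string."""
    split = [s.split(sep) for s in genre_strings]
    # Collect every distinct genre, then fix a column order
    columns = sorted({g for genres in split for g in genres})
    # One indicator row per movie: 1 if the movie has that genre, else 0
    rows = [[1 if c in genres else 0 for c in columns] for genres in split]
    return columns, rows


samples = [
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Comedy|Romance",
]
columns, rows = one_hot_genres(samples)
print(columns)
for row in rows:
    print(row)
```

Applied to the whole `genres` column, the same idea yields the 9742 x 20 feature matrix that the assertion in the cell above checks.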
