# Practice HW 2

In [None]:
# import torch

# !pip uninstall torch-scatter torch-sparse torch-geometric torch-cluster  --y
# !pip install torch-scatter -f https://data.pyg.org/whl/torch-{torch.__version__}.html
# !pip install torch-sparse -f https://data.pyg.org/whl/torch-{torch.__version__}.html
# !pip install torch-cluster -f https://data.pyg.org/whl/torch-{torch.__version__}.html
# !pip install git+https://github.com/pyg-team/pytorch_geometric.git

## 1. Implementation of TransE (10 points)

In this assignment, you will implement the [TransE](https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf) model and a training pipeline that learns **knowledge graph embeddings** for **predicting missing edges** on the [Freebase](https://paperswithcode.com/dataset/fb15k) (FB15k-237) dataset.

In [None]:
import torch
import torch_geometric
from torch_geometric.datasets.rel_link_pred_dataset import RelLinkPredDataset


dataset = RelLinkPredDataset('data', 'FB15k-237')
data = dataset[0]

### TransE

Edges in the knowledge graph are represented as triples $(h, l, t)$: a head entity $h$, a relation $l$, and a tail entity $t$. **TransE** embeds both entities and relations in the same vector space and learns the embeddings such that  

$$\mathbf{h} + \mathbf{l} \approx \mathbf{t}$$

Formally, the loss is:

$$\sum_{((h, l, t), (h', l, t')) \in T_{batch}} [\gamma + d(\mathbf{h} + \mathbf{l}, \mathbf{t}) - d(\mathbf{h'} + \mathbf{l}, \mathbf{t'})]_+$$

where $[x]_+ = \max(0, x)$ and $(h', l, t')$ is a corrupted triple, obtained by replacing either the head or the tail of $(h, l, t)$ (but not both) with a random entity.  
$d(\mathbf{h} + \mathbf{l}, \mathbf{t})$ is the **dissimilarity** measure of the positive triple.  
Likewise, $d(\mathbf{h'} + \mathbf{l}, \mathbf{t'})$ is the **dissimilarity** measure of the negative (corrupted) triple.  

Thus, **TransE prefers lower scores for positive edges and higher scores for negative edges**.

The margin $\gamma$ enforces that the negative edge score exceeds the positive edge score by at least $\gamma$.
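As a rough illustration of the objective above, the following sketch evaluates the margin loss on a toy batch of two (positive, negative) pairs. The function name `margin_loss`, the default `gamma=1.0`, and the toy tensors are assumptions made for this example and are not part of the assignment's required interface.

```python
import torch


def margin_loss(pos_dist, neg_dist, gamma=1.0):
    # Hinge on gamma + d(pos) - d(neg): a pair contributes zero once the
    # negative triple is at least gamma farther away than the positive one.
    return torch.clamp(gamma + pos_dist - neg_dist, min=0).sum()


# Toy embeddings for two (positive, negative) pairs, dimension 2.
h = torch.tensor([[0.0, 0.0], [1.0, 0.0]])
l = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
t = torch.tensor([[1.0, 0.0], [1.0, 1.0]])       # here h + l == t exactly
t_neg = torch.tensor([[4.0, 4.0], [1.0, 1.5]])   # corrupted tails

pos_dist = (h + l - t).norm(p=2, dim=1)          # distances 0 and 0
neg_dist = (h + l - t_neg).norm(p=2, dim=1)      # distances 5 and 0.5
loss = margin_loss(pos_dist, neg_dist)           # 0 + 0.5 = 0.5
```

The first pair is already separated by more than the margin and adds nothing; the second is separated by only 0.5, so it contributes the remaining 0.5.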

#### TransE Algorithm

The TransE algorithm is as follows:

![](https://production-media.paperswithcode.com/methods/Screen_Shot_2020-05-27_at_12.01.23_AM.png)

#### Implementation of the Model

Initialize the relation embeddings $\mathbf{l}$ and the entity embeddings $\mathbf{e}$ as in the pseudocode above.  
To compute $d(\mathbf{h} + \mathbf{l}, \mathbf{t})$, take the **L2 norm** of $\mathbf{h} + \mathbf{l} - \mathbf{t}$.

*Note: To improve performance, normalize $\mathbf{e}$ every epoch instead of every mini-batch.*
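One possible skeleton for the model is sketched below. The class name `TransESketch`, its method names, and the default `dim=50` are hypothetical choices for illustration; the uniform initialization bound $6/\sqrt{k}$ and the one-time normalization of relation embeddings follow the initialization described in the TransE paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransESketch(nn.Module):
    """Hypothetical minimal TransE module (all names are illustrative)."""

    def __init__(self, num_entities, num_relations, dim=50):
        super().__init__()
        bound = 6.0 / dim ** 0.5
        self.entity_emb = nn.Embedding(num_entities, dim)
        self.relation_emb = nn.Embedding(num_relations, dim)
        nn.init.uniform_(self.entity_emb.weight, -bound, bound)
        nn.init.uniform_(self.relation_emb.weight, -bound, bound)
        # Relation embeddings are normalized once, at initialization.
        with torch.no_grad():
            self.relation_emb.weight.copy_(
                F.normalize(self.relation_emb.weight, p=2, dim=1))

    @torch.no_grad()
    def normalize_entities(self):
        # Rescale every entity embedding to unit L2 norm
        # (intended to be called once per epoch, as the note suggests).
        self.entity_emb.weight.copy_(
            F.normalize(self.entity_emb.weight, p=2, dim=1))

    def distance(self, edge_index, edge_type):
        # d(h + l, t) = ||h + l - t||_2 for each (head, tail) column.
        h = self.entity_emb(edge_index[0])
        t = self.entity_emb(edge_index[1])
        l = self.relation_emb(edge_type)
        return (h + l - t).norm(p=2, dim=1)


model = TransESketch(num_entities=5, num_relations=2, dim=4)
model.normalize_entities()
edge_index = torch.tensor([[0, 1], [2, 3]])   # columns are (head, tail) pairs
edge_type = torch.tensor([0, 1])
dist = model.distance(edge_index, edge_type)  # one distance per edge
```

Keeping normalization in a separate method makes it easy to call it per epoch or per mini-batch and compare the two.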

**Auxiliary Functions:**  

A key step in training the model is generating **corrupted triples** by replacing the **head** or the **tail** of each positive triple with a random entity.

In [None]:
def create_neg_edge_index(edge_index, edge_type, num_entities):
    # Corrupt every positive edge by replacing either its head or its tail.
    # Sample on the same device as the input edges.
    device = edge_index.device
    head_or_tail = torch.randint(high=2, size=edge_type.size(),
                                 device=device)
    rand_entities = torch.randint(high=num_entities,
                                  size=edge_type.size(), device=device)
    # replace the head when 1, otherwise keep the original head
    heads = torch.where(head_or_tail == 1, rand_entities,
                        edge_index[0, :])
    # replace the tail when 0, otherwise keep the original tail
    tails = torch.where(head_or_tail == 0, rand_entities,
                        edge_index[1, :])
    return torch.stack([heads, tails], dim=0)

We will evaluate the model's performance using **Hits@10, Mean Rank, and MRR (Mean Reciprocal Rank)**.

- **Hits@10**:  
  $$\frac{|\{r \in P | r \leq 10\}|}{|P|}$$
  where $P$ is the set of ranks of the correct entities, $|P|$ is its size, and $r$ is a single rank. This metric measures the fraction of correct entities ranked within the top 10.

- **Mean Rank**:  
  $$\frac{1}{|P|}\sum_{r \in P}r$$
  This metric calculates the average rank of the correct entities.

- **MRR (Mean Reciprocal Rank)**:  
  $$\frac{1}{|P|}\sum_{r \in P}\frac{1}{r}$$
  This metric evaluates ranking quality by considering the inverse of the rank, giving higher importance to top-ranked correct entities.

For more details on these metrics, refer to [this paper](https://arxiv.org/pdf/2002.06914.pdf).
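As a small worked example of the three definitions, the sketch below computes them from a list of 1-based ranks. The helper name `ranking_metrics` and the sample ranks are made up for illustration.

```python
def ranking_metrics(ranks, k=10):
    # ranks: 1-based rank of the correct entity for each query.
    n = len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / n
    mean_rank = sum(ranks) / n
    mean_reciprocal_rank = sum(1.0 / r for r in ranks) / n
    return hits_at_k, mean_rank, mean_reciprocal_rank


hits, mean_rank, mrr_value = ranking_metrics([1, 3, 12, 2])
# hits = 3/4 = 0.75, mean_rank = 18/4 = 4.5, mrr_value = 23/48 (about 0.479)
```

The single rank of 12 pulls Mean Rank up noticeably, while MRR is dominated by the queries ranked near the top.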

In [None]:
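def mrr(predictions, gt):
    # predictions: (batch, num_entities) distances, lower is better.
    # gt: (batch, 1) index of the correct entity for each query.
    # Returns the SUM of reciprocal ranks; divide by the number of queries.
    indices = predictions.argsort()
    return (1.0 / (indices == gt).nonzero()[:, 1].float().add(1.0)).sum().item()


def mr(predictions, gt):
    # Returns the SUM of 1-based ranks over the batch.
    indices = predictions.argsort()
    return ((indices == gt).nonzero()[:, 1].float().add(1.0)).sum().item()


def hit_at_k(predictions, gt, device, k=10):
    # Returns the NUMBER of queries whose correct entity is among the k closest.
    zero_tensor = torch.tensor([0], device=device)
    one_tensor = torch.tensor([1], device=device)
    _, indices = predictions.topk(k=k, largest=False)
    return torch.where(indices == gt, one_tensor, zero_tensor).sum().item()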
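The helpers above expect a `predictions` matrix holding one score per candidate entity. A minimal sketch of how such a matrix and the resulting ranks could be produced for tail prediction is shown below. The function name `tail_ranks` and the toy embeddings are assumptions for this example, and it ranks against all entities without filtering out other known true triples.

```python
import torch


def tail_ranks(entity_emb, relation_emb, heads, rels, tails):
    # Distance from h + l to every candidate tail: (batch, num_entities).
    query = entity_emb[heads] + relation_emb[rels]
    predictions = (query.unsqueeze(1) - entity_emb.unsqueeze(0)).norm(p=2, dim=2)
    # Sort candidates by increasing distance and locate the true tail.
    indices = predictions.argsort(dim=1)
    gt = tails.view(-1, 1)
    return (indices == gt).nonzero()[:, 1] + 1   # 1-based ranks


entity_emb = torch.tensor([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [0.0, 5.0]])
relation_emb = torch.tensor([[1.0, 0.0]])
heads = torch.tensor([0, 1])
rels = torch.tensor([0, 0])
tails = torch.tensor([1, 0])

ranks = tail_ranks(entity_emb, relation_emb, heads, rels, tails)
# Query 0: h + l = (1, 0) coincides with entity 1, so the true tail has rank 1.
# Query 1: h + l = (2, 0); entities 2 and 1 are closer than entity 0, so rank 3.
mean_reciprocal_rank = (1.0 / ranks.float()).mean().item()
```

Head prediction can be handled the same way by ranking every candidate $\mathbf{h}$ by its distance $d(\mathbf{h} + \mathbf{l}, \mathbf{t})$.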

**Requirement:**  

Achieve at least **0.17 MRR** and **0.30 Hits@10** to meet the performance threshold.

## 1.1 Question on Normalization (2 points) 

Try training **TransE** **without the fifth line of the algorithm** (that is, without normalizing the entity embeddings). **What happens during training?** **Why is this line needed?**

## 2. Neural Network on Heterogeneous Data (3 points)

Choose one of two datasets: **Freebase** or the synthetic **hetero_graph** defined below.

In [None]:
import dgl
import numpy as np
import torch

n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10

follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)

hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})

hetero_graph.nodes['user'].data['feature'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feature'] = torch.randn(n_items, n_hetero_features)
hetero_graph.nodes['user'].data['label'] = torch.randint(0, n_user_classes, (n_users,))
hetero_graph.edges['click'].data['label'] = torch.randint(1, n_max_clicks, (n_clicks,)).float()
# randomly generate training masks on user nodes and click edges
hetero_graph.nodes['user'].data['train_mask'] = torch.zeros(n_users, dtype=torch.bool).bernoulli(0.6)
hetero_graph.edges['click'].data['train_mask'] = torch.zeros(n_clicks, dtype=torch.bool).bernoulli(0.6)

Using any library (**torch_geometric**, **DGL**, **StellarGraph**), build a neural network and train it to solve the **Node Classification** task on one of the two datasets mentioned above.  

---

### Bonuses:  
- **(+3 points)** Train the neural network to solve the **Link Prediction** task.  
- **(+2 points)** Use **another heterogeneous dataset** (not small and not synthetic) and train the model on it.  
- **(+5 points)** Implement **Relational GCN (R-GCN)** yourself and demonstrate the functionality of your layer.
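For orientation on the last bonus, here is a rough sketch of one R-GCN-style propagation step in plain PyTorch, assuming a single node type with typed edges (as in a knowledge graph). It follows the update $\mathbf{h}_i' = \sigma\big(W_0 \mathbf{h}_i + \sum_r \sum_{j \in N_r(i)} \frac{1}{c_{i,r}} W_r \mathbf{h}_j\big)$ with $c_{i,r} = |N_r(i)|$. The class name `RGCNLayerSketch` and its arguments are hypothetical, and basis decomposition and multiple node types are deliberately left out.

```python
import torch
import torch.nn as nn


class RGCNLayerSketch(nn.Module):
    """Illustrative relational graph convolution (names are assumptions)."""

    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        # One weight matrix per relation, plus W_0 for the self-connection.
        self.rel_weight = nn.Parameter(
            torch.empty(num_relations, in_dim, out_dim))
        self.self_weight = nn.Linear(in_dim, out_dim, bias=False)
        nn.init.xavier_uniform_(self.rel_weight)

    def forward(self, x, edge_index, edge_type):
        src, dst = edge_index
        num_rel = self.rel_weight.size(0)
        # Transform each source feature with its relation-specific matrix.
        messages = torch.bmm(
            x[src].unsqueeze(1), self.rel_weight[edge_type]).squeeze(1)
        # c_{i,r}: number of incoming edges of relation r at node i.
        key = dst * num_rel + edge_type
        counts = torch.zeros(x.size(0) * num_rel, dtype=x.dtype,
                             device=x.device)
        counts = counts.index_add(0, key, torch.ones_like(key, dtype=x.dtype))
        messages = messages / counts[key].unsqueeze(1)
        # Sum normalized messages into their destination nodes.
        out = self.self_weight(x).index_add(0, dst, messages)
        return torch.relu(out)


x = torch.randn(3, 4)
edge_index = torch.tensor([[0, 1], [2, 2]])   # edges 0 -> 2 and 1 -> 2
edge_type = torch.tensor([0, 1])
layer = RGCNLayerSketch(in_dim=4, out_dim=5, num_relations=2)
out = layer(x, edge_index, edge_type)         # shape (3, 5)
```

Node 0 has no incoming edges here, so its output reduces to the self-connection term, which gives a quick sanity check for a custom layer.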