# Graph Neural Networks

## Why GNNs?

Answer briefly:

1. Consider the following limitations of shallow encoders such as random-walk-based node embedding methods. How do GNNs address each of them?
    - Poor scalability ($|V|d$ parameters are needed, where $d$ is the dimension of the embedding space)
    - Transductive nature (embeddings cannot be obtained for nodes not seen during training)
    - Cannot capture structural similarity
    - Cannot utilize node, edge, and graph features

1. Given an undirected graph $G=(V,E)$, let's flatten the adjacency matrix $A$ (i.e., concatenate the rows into a single vector) and feed it to a Multi-Layer Perceptron (MLP). What's wrong with this approach?

1. In a CNN, a convolutional kernel aggregates information from local pixel neighborhoods. What would "locality" mean on a graph? How could we define a convolution operation that respects this locality?
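
A minimal sketch for exploring question 2, assuming PyTorch. It uses a hypothetical 3-node star (node 0 linked to nodes 1 and 2, the same graph as in the permutation exercise below), relabels the nodes to the order (2, 0, 1) with a permutation matrix, and compares the flattened adjacency vectors an MLP would receive. The small MLP is an arbitrary, untrained example, not a recommended architecture.

```python
import torch

# Hypothetical 3-node star: node 0 is connected to nodes 1 and 2.
A_star = torch.tensor([[0., 1., 1.],
                       [1., 0., 0.],
                       [1., 0., 0.]])

# Relabel the nodes: new index i holds old node perm[i], i.e. order (2, 0, 1).
perm = torch.tensor([2, 0, 1])
P = torch.eye(3)[perm]              # permutation matrix with (P @ M)[i] = M[perm[i]]
A_star_perm = P @ A_star @ P.T      # the same graph under a different node ordering

# The flattened vectors an MLP would receive as input.
x = A_star.flatten()
x_perm = A_star_perm.flatten()
print(x)
print(x_perm)
print(torch.equal(x, x_perm))       # False: same graph, different input vector

# An arbitrary untrained MLP: its first layer is tied to exactly n * n = 9 inputs,
# so it cannot even be applied to a graph with a different number of nodes.
mlp = torch.nn.Sequential(torch.nn.Linear(9, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
print(mlp(x), mlp(x_perm))          # generally two different outputs for one graph
```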

## Permutation Invariance and Equivariance

1. Consider a graph with three nodes labeled A, B, C (indexed in this order) and the adjacency matrix:
    $$
        A = \begin{bmatrix}
        0 & 1 & 1\\
        1 & 0 & 0\\
        1 & 0 & 0
        \end{bmatrix}
    $$

    Suppose we feed this graph into a GNN layer defined as:
    $$
        h_v^{(1)} = \sigma (W \cdot \text{AGG}(\{h_u^{(0)}: u \in N(v)\}))
    $$

    where AGG is the **sum** function and $h_u^{(0)}$ denotes the initial features of node $u$.

    If we permute the node order to (C, A, B), will the node embeddings change after the layer? Why or why not?

2. Let $A \in \{0,1\}^{n \times n}$ be the adjacency matrix and $X \in \mathbb{R}^{n \times d}$ be the node feature matrix. Let $P$ be an $n \times n$ permutation matrix (it reorders the node indices). Permuting the graph means:
    $$
        A^\prime = PAP^\intercal, \quad X^\prime=PX
    $$

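    For each function $f(A,X)$ below ($1 \in \mathbb{R}^n$ is the all-ones vector and $W \in \mathbb{R}^{d \times d^\prime}$ is a fixed weight matrix), determine whether it is: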
    - Permutation invariant: $f(A^\prime ,X^\prime) = f(A,X)$
    - Permutation equivariant: $f(A^\prime ,X^\prime) = Pf(A,X)$
    - Neither

    | Function                             | Inv./Equiv./Neither |
    |--------------------------------------|---------------------|
    | $f(A,X) = 1^\intercal X$             |                     |
    | $f(A,X) = X$                         |                     |
    | $f(A,X) = AX$                        |                     |
    | $f(A,X) = A^\intercal X$             |                     |
    | $f(A,X) = \text{ReLU}(AXW)$          |                     |
    | $f(A,X) = \frac{1}{n} 1^\intercal X$ |                     |
    | $f(A,X) = X^\intercal X$             |                     |
    | $f(A,X) = XW$                        |                     |
    | $f(A,X) = A_{1,:}X$                  |                     |
    | $f(A,X) = \text{sort}(X)$            |                     |
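
The two definitions can also be checked numerically. The sketch below assumes PyTorch; `check_permutation_symmetry` is a hypothetical helper, not part of any library. It applies a cyclic shift and a swap of nodes 0 and 1 to $(A, X)$ and compares the outputs against both definitions. A passing check is only evidence for the particular $(A, X)$ tried, while a failing check is a genuine counterexample. It is demonstrated on functions that are not in the table; each table entry can be written as a `lambda A, X: ...` and passed in the same way (for $W$, any fixed random matrix will do).

```python
import torch

def permutation_matrix(perm: torch.Tensor) -> torch.Tensor:
    """Permutation matrix P such that (P @ M)[i] == M[perm[i]]."""
    return torch.eye(len(perm))[perm]

def check_permutation_symmetry(f, A, X, atol=1e-5):
    """Return (looks_invariant, looks_equivariant) for f(A, X).

    Tests a cyclic shift and a swap of nodes 0 and 1 (requires n >= 2). These
    two generate all permutations, so a property that holds for both on every
    input holds for every P; here only the single input (A, X) is tried.
    """
    n = A.shape[0]
    base = f(A, X)
    shift = torch.roll(torch.arange(n), 1)
    swap = torch.arange(n)
    swap[[0, 1]] = swap[[1, 0]]
    invariant = equivariant = True
    for perm in (shift, swap):
        P = permutation_matrix(perm)
        out = f(P @ A @ P.T, P @ X)          # f evaluated on the permuted graph
        invariant &= out.shape == base.shape and torch.allclose(out, base, atol=atol)
        equivariant &= (base.dim() > 0 and base.shape[0] == n
                        and torch.allclose(out, P @ base, atol=atol))
    return invariant, equivariant

# Hypothetical test input: a 6-node path graph with random 3-dim features.
torch.manual_seed(0)
n = 6
A_path = torch.zeros(n, n)
i = torch.arange(n - 1)
A_path[i, i + 1] = 1.0
A_path[i + 1, i] = 1.0
X_path = torch.randn(n, 3)

print(check_permutation_symmetry(lambda A, X: A @ A @ X, A_path, X_path))  # (False, True)
print(check_permutation_symmetry(lambda A, X: X.sum(), A_path, X_path))    # (True, False)
print(check_permutation_symmetry(lambda A, X: X[0], A_path, X_path))       # (False, False)
```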

## One GNN Layer

Consider the following simple undirected graph $G=(V,E)$:
$$
    V=\{ 1,2,3 \}, \quad E=\{ \{1,2\},\{2,3\} \}
$$

The initial node feature matrix and the (unweighted) adjacency matrix are given as follows:
$$
    X = 
    \begin{bmatrix}
        1 & 0 \\
        0 & 1 \\
        1 & 1 
    \end{bmatrix}, \quad
    A = 
    \begin{bmatrix}
        0 & 1 & 0 \\
        1 & 0 & 1 \\
        0 & 1 & 0 
    \end{bmatrix}
$$

We apply one Graph Convolutional Network (GCN) layer as defined by Kipf & Welling (2017):
$$
    H = \sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}XW)
$$
where 

- $D$ is the degree matrix ($D=\text{diag}(1,2,1)$)
- $\tilde{A}=A+I$
- $\tilde{D}_{ii}=\sum_j \tilde{A}_{ij}$
- $\sigma(\cdot)$ is ReLU

and for simplicity, the weight matrix is 
$$
    W = 
    \begin{bmatrix}
        1 & 0 \\
        0 & 1
    \end{bmatrix}
$$
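
Entry by entry, the symmetric normalization scales each edge, self-loops included, by the inverse geometric mean of its endpoints' degrees in $\tilde{A}$. Writing $\tilde{d}_i = \tilde{D}_{ii}$, and $H_i$, $X_j$ for the $i$-th row of $H$ and the $j$-th row of $X$ (shorthand introduced here), the layer reads node by node:

$$
    \left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\right)_{ij} = \frac{\tilde{A}_{ij}}{\sqrt{\tilde{d}_i \tilde{d}_j}},
    \qquad
    H_i = \sigma\Big(\sum_{j \in N(i) \cup \{i\}} \frac{1}{\sqrt{\tilde{d}_i \tilde{d}_j}}\, X_j W\Big)
$$

so each node's new embedding is a degree-weighted sum over itself and its neighbors.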

Tasks:

1. Compute $\tilde{A}$ and $\tilde{D}$
1. Compute the normalized adjacency $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$
1. Multiply with $XW$
1. Apply the ReLU
1. Write down the resulting node embeddings $H_1,H_2,H_3$
1. Now compare the initial node features $X$ and the embeddings $H$. How did each node's location in the embedding space change and why?

## Programming: One GNN Layer with Torch

In this exercise, you'll implement the computation from the previous exercise in `torch`.

1. Complete the following code snippet and reproduce your result.

```python
import torch

# initial features
X = torch.tensor([
    [1., 0.],  # Node 1
    [0., 1.],  # Node 2
    [1., 1.]   # Node 3
])

# toy graph: 1–2–3
A = torch.tensor([
    [0., 1., 0.],
    [1., 0., 1.],
    [0., 1., 0.]
])

# weight matrix (identity for simplicity)
W = torch.eye(2)

# Task 1: Compute A_hat (the tilde-A above) and its degree matrix D_hat
A_hat = 0  # TODO
D_hat = 0  # TODO

# Task 2: Compute normalized adjacency
A_norm = 0  # TODO

# Task 3-4: Multiply normalized adjacency with XW and apply ReLU
H = 0  # TODO

print("Initial features X:\n", X)
print("\nNormalized adjacency A_norm:\n", A_norm)
print("\nEmbeddings H:\n", H)
```



2. Now, re-apply the GCN layer 10 more times. What happens to the embeddings? Interpret your results.
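
To make "what happens to the embeddings" measurable, the sketch below (assuming PyTorch; `repeat_gcn` and `mean_pairwise_cosine` are hypothetical helpers) re-applies $H \leftarrow \text{ReLU}(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H W)$ and reports the average cosine similarity between the embeddings of distinct nodes. It is demonstrated on a different, hypothetical graph, a triangle with self-loops, whose normalized adjacency has every entry equal to $1/3$; the `A_norm`, `X`, and `W` from the completed snippet can be passed in instead.

```python
import torch
import torch.nn.functional as F

def repeat_gcn(A_norm, H, W, k):
    """Apply H <- ReLU(A_norm @ H @ W) k times; return [H_0, H_1, ..., H_k]."""
    history = [H]
    for _ in range(k):
        H = torch.relu(A_norm @ H @ W)
        history.append(H)
    return history

def mean_pairwise_cosine(H):
    """Average cosine similarity between the embeddings of distinct nodes (rows)."""
    Hn = F.normalize(H, dim=1)          # scale every row to unit length
    S = Hn @ Hn.T                       # S[i, j] = cos(H_i, H_j)
    n = H.shape[0]
    return ((S.sum() - S.diagonal().sum()) / (n * (n - 1))).item()

# Hypothetical example graph: a triangle, whose normalized adjacency
# (with self-loops) has every entry equal to 1/3.
A_demo = torch.full((3, 3), 1.0 / 3.0)
X_demo = torch.tensor([[1., 0.], [0., 1.], [1., 1.]])
W_demo = torch.eye(2)

history = repeat_gcn(A_demo, X_demo, W_demo, k=10)
for step, H_k in enumerate(history):
    print(step, round(mean_pairwise_cosine(H_k), 4))
```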
