<a href="https://colab.research.google.com/github/AchrafAsh/gnn-linear-receptive-fields/blob/main/bottlenecks_of_gnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## TODO
- [x] implement over-smoothing metrics (MAD, MADGap)
- [x] load benchmark datasets (Cora, CiteSeer, QM9, Amazon, Reddit, PPI, ENZYMES, etc.)
- [x] implement benchmark model:
    - [x] Vanilla GCN (built-in: GCNConv)
    - [x] GAT (built-in: GATConv)
    - [x] GIN (built-in: GINConv)
- [ ] Implement the different approaches
    - [x] JK-Net (built-in: JumpingKnowledge)
    - [ ] AdaGCN (no ready-to-use code yet, so I may have to implement it myself, starting from [this repository](https://github.com/datake/AdaGCN))
    - [ ] N-GCN [(official TensorFlow implementation)](https://github.com/samihaija/mixhop), [(unofficial implementation)](https://github.com/benedekrozemberczki/MixHop-and-N-GCN)
    - [ ] AdaEdge: [algorithm implementation](https://github.com/zhao-tong/GAug/blob/21af8b6bd054a484f17b1c431cc70efdfdbcefcb/models/adaedge.py#L14)

Insights to verify:
- [ ] Correlation between MADGap and accuracy, and the decrease of MAD (and MADGap) as features pass through successive layers
- [ ] Increase in accuracy from adding one Fully-Adjacent layer


# Initialization

## Import
Import needed libraries

In [1]:
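import os, sys
import os.path as osp
from google.colab import drive

# Mount Google Drive and expose the notebooks folder on sys.path
drive.mount('/content/mnt')
nb_path = '/content/notebooks'
# Guard the symlink so re-running this cell does not raise FileExistsError
if not osp.islink(nb_path):
    os.symlink('/content/mnt/My Drive/Colab Notebooks', nb_path)
sys.path.insert(0, nb_path)  # or append(nb_path)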

Mounted at /content/mnt


In [2]:
import networkx as nx
import torch
import torch_geometric as tg
from torch_geometric.datasets import Planetoid, TUDataset

In [3]:
!wget https://raw.githubusercontent.com/AchrafAsh/gnn-linear-receptive-fields/main/utils.py
!wget https://raw.githubusercontent.com/AchrafAsh/gnn-linear-receptive-fields/main/data.py

Continuing in background, pid 294.
Output will be written to ‘wget-log’.


In [None]:
from utils import mean_average_distance, mean_average_distance_gap
from data import load_dataset

## Load Data
- load several datasets to experiment with

In [None]:
G_karate = nx.karate_club_graph()

In [None]:
path = osp.join(os.getcwd(), 'data')
cora_dataset = load_dataset(path, 'Cora')
enzymes_dataset = load_dataset(path, 'ENZYMES')
qm9_dataset = load_dataset(path, 'QM9')

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!


In [None]:
G = cora_dataset[0]
print(G.x)
print(G.edge_index)
print(G.y)
print(G['train_mask'])
print(G)

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
tensor([[   0,    0,    0,  ..., 2707, 2707, 2707],
        [ 633, 1862, 2582,  ...,  598, 1473, 2706]])
tensor([3, 4, 4,  ..., 3, 3, 3])
tensor([ True,  True,  True,  ..., False, False, False])
Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])


# Observing Over-smoothing

Metrics for over-smoothing:
- Mean Average Distance
- Mean Average Distance Gap
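
For reference, here is a minimal sketch of how these two metrics could be computed with plain PyTorch. It assumes the cosine-distance definition commonly used for MAD, and it is not the code in `utils.py`, which may use a different distance or signature. The names `cosine_distance_matrix`, `masked_mad` and `mad_gap` are hypothetical, and the choice of which node pairs count as "neighbor" or "remote" is left to the caller through boolean masks.

```python
import torch


def cosine_distance_matrix(x):
    """Pairwise cosine distance (1 - cosine similarity) between rows of x."""
    x = x.float()
    x_hat = x / x.norm(dim=1, keepdim=True).clamp(min=1e-12)
    # Clamp to remove tiny negative values caused by rounding.
    return (1.0 - x_hat @ x_hat.t()).clamp(min=0.0)


def masked_mad(x, mask):
    """MAD restricted to the node pairs selected by `mask` (n x n, bool or 0/1)."""
    d = cosine_distance_matrix(x) * mask.float()
    # Average each row over its non-zero entries only.
    counts = (d > 0).sum(dim=1).clamp(min=1)
    row_avg = d.sum(dim=1) / counts
    # Average over the rows that have a non-zero average.
    n_rows = (row_avg > 0).sum().clamp(min=1)
    return row_avg.sum() / n_rows


def mad_gap(x, neighbor_mask, remote_mask):
    """MADGap = MAD over remote pairs minus MAD over neighboring pairs."""
    return masked_mad(x, remote_mask) - masked_mad(x, neighbor_mask)


# Toy example: nodes 0 and 1 share a representation, node 2 is orthogonal.
x = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
all_pairs = torch.ones(3, 3, dtype=torch.bool)
neighbor_mask = torch.tensor([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=torch.bool)
remote_mask = torch.tensor([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=torch.bool)

mad_value = masked_mad(x, all_pairs)
gap_value = mad_gap(x, neighbor_mask, remote_mask)
print(mad_value, gap_value)
```

Everything here is a handful of dense matrix operations, so it avoids Python loops over node pairs; the trade-off is an n x n distance matrix held in memory.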

## TODO: improve the efficiency of these two functions, as they take a long time to execute

## Initial MAD and MADGap

In [None]:
MAD_cora = mean_average_distance(x=G.x)
MADGap_cora = mean_average_distance_gap(x=G.x, adj_matrix=tg.utils.to_dense_adj(G.edge_index)[0])

print(f'Initial MAD for Cora: {MAD_cora}')
print(f'Initial MADGap for Cora: {MADGap_cora}')

tensor(2.1652)

In [None]:
# Node features and adjacency matrix of the Karate Club graph
x_karate = torch.tensor(nx.attr_matrix(G_karate)[0])
adj_karate = torch.tensor(nx.adjacency_matrix(G_karate).todense())

MAD_karate = mean_average_distance(x=x_karate)
# Store the result under the karate name (it was previously assigned to MADGap_cora)
MADGap_karate = mean_average_distance_gap(x=x_karate, adj_matrix=adj_karate)

print(f'Initial MAD for the Karate Club Graph: {MAD_karate}')
print(f'Initial MADGap for the Karate Club Graph: {MADGap_karate}')

tensor(2.3117, dtype=torch.float64)

## Evolution of MAD and MADGap

In [5]:
from torch_geometric.nn import GCNConv, JumpingKnowledge

In [8]:
print(JumpingKnowledge)

<class 'torch_geometric.nn.models.jumping_knowledge.JumpingKnowledge'>
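
To illustrate what tracking MAD across layers could look like, the sketch below repeatedly applies a weight-free, GCN-style propagation step (symmetric normalized adjacency with self-loops) to random features on a toy ring graph and records MAD after each step. This is an assumption-laden illustration rather than a trained `GCNConv` stack: there are no learned weights or nonlinearities, MAD uses the cosine-distance definition, and the helper names `mad`, `normalized_adjacency` and `mad_per_layer` are hypothetical.

```python
import torch


def mad(x):
    """Cosine-distance MAD over all node pairs, returned as a Python float."""
    x_hat = x / x.norm(dim=1, keepdim=True).clamp(min=1e-12)
    d = (1.0 - x_hat @ x_hat.t()).clamp(min=0.0)
    d.fill_diagonal_(0.0)  # ignore self-distances
    counts = (d > 0).sum(dim=1).clamp(min=1)
    row_avg = d.sum(dim=1) / counts
    return (row_avg.sum() / (row_avg > 0).sum().clamp(min=1)).item()


def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}, the propagation matrix used by GCN."""
    a = adj + torch.eye(adj.size(0))
    deg_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return deg_inv_sqrt[:, None] * a * deg_inv_sqrt[None, :]


def mad_per_layer(x, adj, num_layers):
    """Record MAD of the input and after each propagation step."""
    a_hat = normalized_adjacency(adj)
    history = [mad(x)]
    h = x
    for _ in range(num_layers):
        h = a_hat @ h  # one weight-free propagation step
        history.append(mad(h))
    return history


# Toy graph: an undirected ring of 8 nodes with random non-negative features.
torch.manual_seed(0)
n = 8
idx = torch.arange(n)
adj = torch.zeros(n, n)
adj[idx, (idx + 1) % n] = 1.0
adj[(idx + 1) % n, idx] = 1.0
x = torch.rand(n, 4)

history = mad_per_layer(x, adj, num_layers=20)
print([round(v, 4) for v in history])
```

On a connected graph, the node representations drift toward a common direction as steps accumulate, so the recorded MAD should shrink toward zero. The same loop could be run on Cora by passing `G.x` and `tg.utils.to_dense_adj(G.edge_index)[0]`, at the cost of dense n x n matrices.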


# Bottleneck of GNNs