
# Graph Embedding methods 3
Downsides of DeepWalk and Node2Vec:
* Computationally expensive
* Node features are not considered
* Cannot predict embeddings for unseen data

Graph convolution methods address these problems.

## Simple Graph Convolution (SGC)
A simple method is to average the features of each node with those of its neighbour nodes. Assume node $a$ has a single neighbour, node $b$. After one iteration the feature vector of $a$ is

$$v_a^{(1)} = \frac{v_a^{(0)} + v_b^{(0)}}{2},$$

where the superscript denotes the iteration. In general,

$$v_i^{(1)} = \frac{1}{d_i + 1}\sum_{j \in \mathcal{N}(i) \cup \{i\}} v_j^{(0)},$$

and in matrix form

$$V^{(1)} = (D + I)^{-1}(A + I)V^{(0)},$$

where $d_i$ is the number of neighbours of node $i$, $\mathcal{N}(i)$ is its set of neighbours, $D$ is the degree matrix, $A$ is the adjacency matrix, $I$ is the identity matrix and $V = [v_1, \dots, v_N]^\top$ stacks the node feature vectors as rows. Repeating the step $K$ times gives $V^{(K)} = \left((D + I)^{-1}(A + I)\right)^K V^{(0)}$.

To predict classes, a function with a learnable parameter is applied:

$$Y = \mathrm{softmax}\left(V^{(K)}\Theta\right),$$

where $\Theta$ is learned from the training set. In summary, $V^{(K)}$ holds the final feature vector of each node in the embedding space, and $Y$ is the node class prediction.

Downsides of this method:
* All neighbour nodes are weighted equally.
* It is not obvious that averaging is the right aggregation operation.
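The averaging rule can be checked on a toy graph. The sketch below is a minimal pure-Python version of $V^{(K)} = ((D + I)^{-1}(A + I))^K V^{(0)}$; the function names `propagate` and `sgc_features` and the two-node example are illustrative assumptions, not part of any library.

```python
def propagate(adjacency, features):
    """One step of V <- (D + I)^{-1} (A + I) V on a dense 0/1 adjacency matrix.

    Assumes A has no self-loops, so each node is counted once via the +I term.
    """
    n = len(adjacency)
    new_features = []
    for i in range(n):
        # neighbours of node i plus node i itself (the "+ I" in A + I)
        members = [j for j in range(n) if adjacency[i][j]] + [i]
        # len(members) == d_i + 1
        dim = len(features[i])
        new_features.append(
            [sum(features[j][k] for j in members) / len(members) for k in range(dim)]
        )
    return new_features


def sgc_features(adjacency, features, k):
    """Apply the averaging step k times to obtain V^(K)."""
    for _ in range(k):
        features = propagate(adjacency, features)
    return features


# Node a (index 0) and node b (index 1) share a single edge
A = [[0, 1],
     [1, 0]]
V0 = [[1.0, 0.0],  # v_a^(0)
      [3.0, 2.0]]  # v_b^(0)
V1 = sgc_features(A, V0, k=1)
print(V1)  # [[2.0, 1.0], [2.0, 1.0]]
```

With only two connected nodes both rows become $(v_a^{(0)} + v_b^{(0)})/2$, matching the worked example; on larger graphs, repeated steps smooth features over $K$-hop neighbourhoods.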



# SGC Example
Based on the PyTorch Geometric example: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/sgc.py

In [20]:
import torch
# Uncomment the lines below to install PyTorch Geometric and its dependencies (e.g. on Colab)
# def format_pytorch_version(version):
#   return version.split('+')[0]

# TORCH_version = torch.__version__
# TORCH = format_pytorch_version(TORCH_version)

# def format_cuda_version(version):
#   return 'cu' + version.replace('.', '')

# CUDA_version = torch.version.cuda
# CUDA = format_cuda_version(CUDA_version)

# !pip install torch-scatter     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
# !pip install torch-sparse      -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
# !pip install torch-cluster     -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
# !pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-{TORCH}+{CUDA}.html
# !pip install torch-geometric 

from torch_geometric.datasets import Planetoid  # provides the Cora citation dataset
from torch_geometric.nn import SGConv
from sklearn.manifold import TSNE
import torch.nn.functional as F
import matplotlib.pyplot as plt


In [2]:
# get data (downloaded on the first run)
path = "data/Planetoid"   # local directory where the dataset is stored
dataset = Planetoid(root=path, name="Cora")

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!


In [3]:
data = dataset[0]   # Cora contains a single graph; this is its tensor representation
# x = node features, y = labels, edge_index = edge list, *_mask = train/val/test split
data

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])
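`edge_index=[2, 10556]` stores the graph in coordinate (COO) format: row 0 holds source nodes and row 1 target nodes. Cora is undirected, and PyG stores each undirected edge in both directions, so the 10556 columns correspond to 5278 undirected edges. The sketch below uses a hypothetical three-node toy `edge_index` (plain Python lists instead of a tensor) to show how neighbour lists and the degrees $d_i$ used above follow from this layout.

```python
def to_neighbour_lists(edge_index, num_nodes):
    """Group targets by source node: neighbours[i] lists every j with an edge i -> j."""
    neighbours = [[] for _ in range(num_nodes)]
    for src, dst in zip(edge_index[0], edge_index[1]):
        neighbours[src].append(dst)
    return neighbours


# Hypothetical path graph 0 - 1 - 2, each undirected edge stored in both directions
edge_index = [[0, 1, 1, 2],   # source nodes
              [1, 0, 2, 1]]   # target nodes
neighbours = to_neighbour_lists(edge_index, num_nodes=3)
degrees = [len(n) for n in neighbours]  # d_i = number of neighbours of node i
print(neighbours)  # [[1], [0, 2], [1]]
print(degrees)     # [1, 2, 1]
```

On the real tensor, `data.edge_index.size(1) // 2` gives the number of undirected edges, and `data.is_undirected()` checks that every edge appears in both directions.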

In [7]:
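# construct the model
SGC_model = SGConv(in_channels=data.num_features,     # number of input features per node
                   out_channels=dataset.num_classes,  # embedding dimension = number of classes
                   K=1,                               # number of propagation steps (hops)
                   cached=True)                       # cache the propagated features; valid because the graph is fixed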

In [9]:
# get the embedding (the model is untrained, so the linear weights are still random)
print(f'Shape of the original data: {data.x.shape}')
print(f'Shape of the embedding data: {SGC_model(data.x,data.edge_index).shape}')

Shape of the original data: torch.Size([2708, 1433])
Shape of the embedding data: torch.Size([2708, 7])
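The shapes show that `SGConv` maps the 1433 input features straight to 7 outputs. According to the PyG documentation, the layer computes $\hat{X} = (\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2})^K X \Theta$ with $\hat{A} = A + I$: it uses a symmetric normalisation rather than the plain row average $(D+I)^{-1}(A+I)$ described above, followed by a single linear layer. The pure-Python sketch below mirrors that computation on a toy graph; `sym_norm_propagate` and `sgc_forward` are hypothetical helper names, and the weights are made-up values rather than trained ones.

```python
import math


def sym_norm_propagate(edge_index, features, num_nodes):
    """One step of X <- D^-1/2 (A + I) D^-1/2 X, with degrees counted on A + I."""
    # A + I as a set of (source, target) pairs; a set ignores duplicate self-loops
    edges = set(zip(edge_index[0], edge_index[1])) | {(i, i) for i in range(num_nodes)}
    deg = [0] * num_nodes
    for _, dst in edges:
        deg[dst] += 1
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        norm = 1.0 / math.sqrt(deg[src] * deg[dst])  # symmetric normalisation
        for k in range(dim):
            out[dst][k] += norm * features[src][k]
    return out


def sgc_forward(edge_index, x, num_nodes, weight, bias, k=1):
    """K propagation steps followed by a linear layer x @ weight^T + bias.

    weight has shape (out_channels, in_channels), like conv1.lin.weight.
    """
    for _ in range(k):
        x = sym_norm_propagate(edge_index, x, num_nodes)
    return [[sum(w * v for w, v in zip(w_row, row)) + b
             for w_row, b in zip(weight, bias)]
            for row in x]


edge_index = [[0, 1], [1, 0]]  # a single undirected edge between nodes 0 and 1
x = [[1.0], [3.0]]             # one input feature per node
weight = [[1.0], [-1.0]]       # 2 output channels x 1 input feature (made-up values)
bias = [0.0, 0.5]
print(sgc_forward(edge_index, x, 2, weight, bias))  # [[2.0, -1.5], [2.0, -1.5]]
```

On a graph with only two connected nodes, the symmetric and row normalisations coincide (both give weight 1/2); they differ as soon as neighbouring nodes have different degrees.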


In [14]:
# construct the model for classification
class SGCNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = SGConv(in_channels=data.num_features,     # number of input features
                            out_channels=dataset.num_classes,  # one output per class
                            K=1, cached=True)

    def forward(self):
        # full-batch forward pass over the whole graph (uses the global `data`)
        x = self.conv1(data.x, data.edge_index)
        # log-probabilities per class, to be used with F.nll_loss
        return F.log_softmax(x, dim=1)
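`F.log_softmax(x, dim=1)` turns each row of class scores into log-probabilities, i.e. the logarithm of $\mathrm{softmax}(V^{(K)}\Theta)$. Below is a minimal sketch of the usual numerically stable formulation (subtracting the row maximum before exponentiating), written for a single row in plain Python; the function name `log_softmax` here is illustrative and is not the PyTorch implementation.

```python
import math


def log_softmax(scores):
    """log softmax(z)_c = z_c - log(sum_j exp(z_j)), computed stably via the max trick."""
    m = max(scores)  # shifting by the maximum keeps exp() from overflowing
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scores))
    return [z - log_sum_exp for z in scores]


scores = [2.0, 1.0, 0.1]  # made-up class scores for one node
log_probs = log_softmax(scores)
probs = [math.exp(lp) for lp in log_probs]
print(round(sum(probs), 6))  # 1.0
print(max(range(len(scores)), key=lambda c: log_probs[c]))  # 0: the class ranking is unchanged
```

Returning log-probabilities pairs naturally with `F.nll_loss`, which expects log-probabilities as its input.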

In [15]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
SGC_model, data = SGCNet().to(device), data.to(device)
optimizer = torch.optim.Adam(SGC_model.parameters(), lr=0.2, weight_decay=0.005)  # full-batch training: the whole graph is used at every step, so no mini-batches are needed

In [16]:
# what are the learnable parameters?
for name, parameter in SGC_model.named_parameters():
    print(f'Parameter: {name}')
    print(f'Shape: {parameter.shape}')


Parameter: conv1.lin.weight
Shape: torch.Size([7, 1433])
Parameter: conv1.lin.bias
Shape: torch.Size([7])
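The propagation step of SGC has no learnable parameters; only the linear layer inside `SGConv` is trained. Its size follows directly from the shapes printed above, as this short calculation shows.

```python
in_channels, out_channels = 1433, 7  # Cora features and classes, from the shapes above

weight_params = out_channels * in_channels  # conv1.lin.weight: [7, 1433]
bias_params = out_channels                  # conv1.lin.bias: [7]
total_params = weight_params + bias_params
print(total_params)  # 10038
```

In PyTorch the same number can be obtained with `sum(p.numel() for p in SGC_model.parameters())`.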


In [17]:
# train function
def train():
    SGC_model.train()            # put the model in training mode
    optimizer.zero_grad()        # reset the gradients from the previous step
    predicted_y = SGC_model()    # predicted log-softmax probabilities for all nodes
    true_y = data.y              # true labels
    # the loss is computed only on the training nodes
    losses = F.nll_loss(predicted_y[data.train_mask], true_y[data.train_mask])
    losses.backward()            # backpropagate
    optimizer.step()             # update the parameters to minimise the loss
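`F.nll_loss` with its default `reduction='mean'` averages the negative log-probability of the true class, and indexing with `data.train_mask` restricts this to the training nodes. The sketch below reproduces that calculation in plain Python on made-up log-probabilities for three nodes; `masked_nll_loss` is a hypothetical helper, not a PyTorch function.

```python
import math


def masked_nll_loss(log_probs, labels, mask):
    """Mean of -log p(true class) over the nodes where mask is True."""
    picked = [-lp[y] for lp, y, m in zip(log_probs, labels, mask) if m]
    return sum(picked) / len(picked)


log_probs = [[math.log(0.7), math.log(0.2), math.log(0.1)],
             [math.log(0.1), math.log(0.8), math.log(0.1)],
             [math.log(0.3), math.log(0.3), math.log(0.4)]]
labels = [0, 1, 2]
train_mask = [True, True, False]  # the third node is not used for training
loss = masked_nll_loss(log_probs, labels, train_mask)
print(round(loss, 4))  # 0.2899
```

The third node does not affect the loss even though the model assigns its true class the lowest probability, which is exactly why the masks keep training, validation and test nodes separate.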
  

In [18]:
# test function
@torch.no_grad()  # gradients are not needed for evaluation
def test():
    SGC_model.eval()        # set model.training to False
    logits = SGC_model()    # log-probabilities for all nodes
    accs = []
    for mask in (data.train_mask, data.val_mask, data.test_mask):
        pred = logits[mask].max(1)[1]  # predicted class = index of the highest log-probability
        acc = pred.eq(data.y[mask]).sum().item() / mask.sum().item()
        accs.append(acc)
    return accs  # [train_acc, val_acc, test_acc]
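The accuracy computed in `test()` takes, for each node selected by a mask, the class with the highest log-probability and compares it with the true label. A plain-Python sketch of the same calculation on made-up values (`masked_accuracy` is a hypothetical helper):

```python
def masked_accuracy(log_probs, labels, mask):
    """Fraction of masked nodes whose highest-scoring class equals the true label."""
    correct = total = 0
    for scores, label, selected in zip(log_probs, labels, mask):
        if not selected:
            continue
        pred = max(range(len(scores)), key=lambda c: scores[c])  # argmax over classes
        correct += int(pred == label)
        total += 1
    return correct / total


log_probs = [[-0.1, -2.0], [-1.5, -0.3], [-0.2, -1.7], [-0.9, -0.5]]
labels = [0, 1, 1, 1]
train_mask = [True, True, False, False]
test_mask = [False, False, True, True]
accs = [masked_accuracy(log_probs, labels, m) for m in (train_mask, test_mask)]
print(accs)  # [1.0, 0.5]
```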

In [19]:
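# put everything together: train for 100 epochs and keep the test accuracy from the epoch with the best validation accuracy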
best_val_acc = test_acc = 0
for epoch in range(1, 101):
    train()
    train_acc, val_acc, tmp_test_acc = test()
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        test_acc = tmp_test_acc
    print(f'Epoch: {epoch:03d}, Train: {train_acc:.4f}, '
          f'Val: {best_val_acc:.4f}, Test: {test_acc:.4f}')

Epoch: 001, Train: 0.9571, Val: 0.5020, Test: 0.5540
Epoch: 002, Train: 0.9929, Val: 0.7340, Test: 0.7610
Epoch: 003, Train: 0.9929, Val: 0.7340, Test: 0.7610
Epoch: 004, Train: 1.0000, Val: 0.7340, Test: 0.7610
Epoch: 005, Train: 0.9857, Val: 0.7340, Test: 0.7610
Epoch: 006, Train: 0.9857, Val: 0.7340, Test: 0.7610
Epoch: 007, Train: 1.0000, Val: 0.7340, Test: 0.7610
Epoch: 008, Train: 1.0000, Val: 0.7340, Test: 0.7610
Epoch: 009, Train: 1.0000, Val: 0.7340, Test: 0.7610
Epoch: 010, Train: 1.0000, Val: 0.7380, Test: 0.7770
Epoch: 011, Train: 0.9929, Val: 0.7380, Test: 0.7770
Epoch: 012, Train: 0.9929, Val: 0.7440, Test: 0.7740
Epoch: 013, Train: 1.0000, Val: 0.7600, Test: 0.7740
Epoch: 014, Train: 1.0000, Val: 0.7600, Test: 0.7740
Epoch: 015, Train: 1.0000, Val: 0.7600, Test: 0.7740
Epoch: 016, Train: 1.0000, Val: 0.7600, Test: 0.7740
Epoch: 017, Train: 1.0000, Val: 0.7600, Test: 0.7740
Epoch: 018, Train: 1.0000, Val: 0.7600, Test: 0.7740
Epoch: 019, Train: 1.0000, Val: 0.7600, Test: 
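In the printed log, the `Val` column is the best validation accuracy seen so far and `Test` is the test accuracy from that same epoch, which is why both values stay constant over many epochs. This selection rule can be sketched in isolation; the per-epoch accuracies below are hypothetical, not taken from the run above.

```python
def select_by_validation(history):
    """Return (best validation accuracy, test accuracy at that epoch).

    history holds one (val_acc, test_acc) pair per epoch. The strict '>' matches
    the training loop, so the earliest epoch wins when validation accuracies tie.
    """
    best_val_acc = test_acc = 0.0
    for val_acc, tmp_test_acc in history:
        if val_acc > best_val_acc:
            best_val_acc, test_acc = val_acc, tmp_test_acc
    return best_val_acc, test_acc


history = [(0.50, 0.55), (0.73, 0.76), (0.71, 0.79), (0.76, 0.77)]  # hypothetical values
print(select_by_validation(history))  # (0.76, 0.77)
```

Epoch 3 has the highest test accuracy in this toy history but is ignored: choosing an epoch by test accuracy would leak test information into model selection.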

The SGC model reached a test accuracy of 0.79 after 42 epochs, about 7% better than node2vec and considerably faster.