<a href="https://colab.research.google.com/github/g4aidl-upc-winter-2020/3D-Shape-classification/blob/main/Train_GNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install all needed packages from PyG:
!pip install -q torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
!pip install -q torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
!pip install -q torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
!pip install -q torch-geometric



In [2]:
import os
import sys
import torch
from torch_geometric.datasets import ModelNet
from torch_geometric.data import DataLoader
from torch_geometric.utils import to_dense_batch
import torch_geometric.transforms as T
from torch_geometric.transforms import SamplePoints, KNNGraph, NormalizeScale, Compose, RandomRotate, RandomFlip, RandomScale
import torch.nn as nn
import torch.nn.functional as F
import datetime
from time import time
from tensorboard import notebook
from torch.utils.tensorboard import SummaryWriter
%load_ext tensorboard

import matplotlib.pyplot as plt
import numpy as np

## Import drive folder

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Set a fixed seed

In [4]:
seed = 42

#Controlling sources of randomness
torch.manual_seed(seed)  #Sets the seed for generating random numbers for all devices (both CPU and CUDA)

#Random number generators in other libraries
np.random.seed(seed)

#CUDA convolution benchmarking
torch.backends.cudnn.benchmark = False  #disables the cuDNN auto-tuner, so that the same convolution algorithm is selected on every run

## Hyper-parameters

In [5]:
learning_rate = 0.001  #0.01
train_batch_size = 32
val_batch_size = 32
num_epochs = 20
graph_type = 'GAT'        #'GAT', 'GCN'

## Instantiate Tensorboard Writer

### Create log folders

In [6]:
root='/content/drive/MyDrive/Proyecto/Colabs/logs_GNN'
#Compute the timestamp once so that the train and validation logs share the same run folder
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
arch_name = 'GCN' if graph_type == 'GCN' else 'GAT'
train_logdir = os.path.join(root, arch_name, timestamp, 'train')
val_logdir = os.path.join(root, arch_name, timestamp, 'validation')

### Create summary writer

In [7]:
train_writer = SummaryWriter(log_dir=train_logdir)
val_writer = SummaryWriter(log_dir=val_logdir)

# Graph Classification with Graph Neural Networks

Graph classification is the problem of classifying entire graphs, given a **dataset of graphs**, based on their structural properties.
Here, we want to embed each graph as a whole, in such a way that the embeddings are linearly separable for the task at hand.
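To make the idea of a graph-level embedding concrete, here is a minimal, dependency-free sketch of a readout step that collapses per-node embeddings into one vector per graph. It uses max pooling only because the architecture files imported later carry `MAXpool` in their names; the function name `global_max_pool` and the toy values are assumptions for illustration, not the code of those architectures.

```python
def global_max_pool(node_embeddings, batch):
    """Max-pool node embeddings per graph.

    node_embeddings: list of per-node feature lists (all the same length).
    batch: list mapping each node to the index of the graph it belongs to.
    Returns a dict {graph_index: pooled feature list}.
    """
    pooled = {}
    for features, graph_id in zip(node_embeddings, batch):
        if graph_id not in pooled:
            pooled[graph_id] = list(features)
        else:
            # Element-wise maximum over the nodes seen so far in this graph.
            pooled[graph_id] = [max(a, b) for a, b in zip(pooled[graph_id], features)]
    return pooled

# Toy example: nodes 0 and 1 belong to graph 0, node 2 belongs to graph 1.
embeddings = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.3]]
batch = [0, 0, 1]
graph_embeddings = global_max_pool(embeddings, batch)
```

Each pooled vector has a fixed size regardless of how many nodes the graph has, which is what allows a final linear classifier to operate on whole graphs.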

## Dataset

In [8]:
# Import ModelNet10 dataset from PyTorch Geometric
dataset = ModelNet(root='/content/drive/MyDrive/Proyecto/Colabs/ModelNet', name= "10", pre_transform=T.SamplePoints(num=1024)) #train dataset

In [9]:
#Load the training and validation indices from two txt files (one index per line)

with open("/content/drive/MyDrive/Proyecto/Colabs/train_split.txt", 'r') as train:
  train_idx = [int(idx) for idx in train]

with open("/content/drive/MyDrive/Proyecto/Colabs/val_split.txt", 'r') as val:
  val_idx = [int(idx) for idx in val]

#print("Train indices: ", train_idx)
#print("Val indices: ", val_idx)

In [10]:
train_dataset = dataset[train_idx]   #train data
val_dataset = dataset[val_idx]       #val data

print('Datasets info:')
print('--------------')
print('Train dataset size: ', len(train_dataset))
print('Validation dataset size: ', len(val_dataset))
print('Number of classes: ', dataset.num_classes) 

Datasets info:
--------------
Train dataset size:  3193
Validation dataset size:  798
Number of classes:  10


### Graph Generation and Data Transformation

In [11]:
#Optional augmentations (not applied here): RandomFlip(1, p=0.5), RandomRotate(45.0, axis=0)

train_dataset.transform = Compose([NormalizeScale(), KNNGraph(k=9, loop=True, force_undirected=True)]) #Creates a k-NN graph based on node positions. (undirected graph)
val_dataset.transform = Compose([NormalizeScale(), KNNGraph(k=9, loop=True, force_undirected=True)])

print(f'Dataset: {dataset}:')
print('====================')
print(f'Number of training graphs: {len(train_dataset)}')
print(f'Number of validation graphs: {len(val_dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

Dataset: ModelNet10(3991):
Number of training graphs: 3193
Number of validation graphs: 798
Number of features: 0
Number of classes: 10


This dataset provides **3991 different graphs**, and the task is to classify each graph into **one out of ten classes**.

In [12]:
data = train_dataset[0] # Get the first training graph object.
print(data)
print('=============================================================')

# Gather some statistics about the previous graph.
print(f'Number of training nodes: {data.num_nodes}')
print(f'Number of training edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Contains isolated nodes: {data.contains_isolated_nodes()}')
print(f'Contains self-loops: {data.contains_self_loops()}')
print(f'Is directed: {data.is_directed()}')


Data(edge_index=[2, 10652], pos=[1024, 3], y=[1])
Number of training nodes: 1024
Number of training edges: 10652
Average node degree: 10.40
Contains isolated nodes: False
Contains self-loops: True
Is directed: False


By inspecting the first graph object of the train dataset, we can see that it comes with **1024 nodes (with 3-dimensional spatial vectors)** and **10652 edges** (leading to an average node degree of 10.40). It also comes with exactly **one graph label** (`y=[1]`).
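The average degree (10.40) is higher than `k=9` because the graph is forced to be undirected: whenever node `i` selects `j` as a neighbour, the reverse edge is added even if `i` is not among the nearest neighbours of `j`. The following dependency-free sketch illustrates this effect on four toy points. It assumes that, with self-loops enabled, a point counts as one of its own `k` neighbours; the helper `knn_graph` is hypothetical and is not the PyTorch Geometric implementation.

```python
import math

def knn_graph(points, k, loop=True):
    """Return a symmetric edge set: every edge is stored as (i, j) and (j, i)."""
    edges = set()
    for i, p in enumerate(points):
        # Candidate neighbours of point i (itself included only if loops are allowed).
        candidates = [j for j in range(len(points)) if loop or j != i]
        candidates.sort(key=lambda j: math.dist(p, points[j]))
        for j in candidates[:k]:
            edges.add((i, j))
            edges.add((j, i))  # force the graph to be undirected
    return edges

# Four hypothetical points on a line, with growing gaps between them.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
edges = knn_graph(points, k=2)
avg_degree = len(edges) / len(points)
print(len(edges), avg_degree)  # 10 2.5
```

Here the average degree (2.5) exceeds `k=2` for the same reason that 10.40 exceeds 9 in the graph above.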

Let's visualize the graph:

In [13]:
sys.path.append('/content/drive/MyDrive/Proyecto/Colabs')
from Visualization_Functions import plotPC3D, plotPC, plotGraph3D

In [14]:
data = train_dataset[0]
plotGraph3D(data)


Nodes: 1024 	Edges: 10652 	Edges/Node avg: 10.40
Is Directed:  False  (if False, edges defined both ways)


## Training

### Make sure your runtime has a GPU

In [15]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
assert device.type != 'cpu', "Change Runtime Type -> GPU"

### Loading the model architecture

In [16]:
if graph_type == 'GCN':
  #We include a new file path that will point to modules we want to import
  sys.path.append('/content/drive/MyDrive/Proyecto/Colabs/Architectures/GCN')  

  ## Available architectures
  #from GCN_Architecture_BatchNorm_AVGpool import GCN
  #from GCN_Architecture_BatchNorm_MAXpool import GCN
  #from GCN_Architecture_BatchNorm_DoubleCapacity_MAXpool import GCN
  from GCN_Architecture_BatchNorm_DoubleCapacity_Dropout_MAXpool import GCN
  ##
  model = GCN()                     # instantiate the model
  
else:
  #We include a new file path that will point to modules we want to import
  sys.path.append('/content/drive/MyDrive/Proyecto/Colabs/Architectures/GAT')

  ## Available architectures
  #from GAT_Architecture_BatchNorm_4heads_MAXpool import GAT
  #from GAT_Architecture_BatchNorm_2heads_MAXpool import GAT
  from GAT_Architecture_BatchNorm_8heads_MAXpool import GAT
  #from GAT_Architecture_BatchNorm_8heads_MAXpool_AVG import GAT
  #from GAT_Architecture_BatchNorm_8heads_MAXpool_Dropout import GAT
  ##
  model = GAT()

model.to(device)    

GAT(
  (conv1): GATConv(3, 16, heads=8)
  (conv2): GATConv(128, 32, heads=8)
  (conv3): GATConv(256, 64, heads=8)
  (fc): Linear(in_features=512, out_features=10, bias=True)
  (bn1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn3): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

#### Parameters of the model

In [17]:
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('Number of parameters: %d' % num_params)

Number of parameters: 173834
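The reported total can be checked by hand from the layer shapes printed above. The sketch below assumes that each `GATConv` layer with concatenated heads holds one bias-free linear projection, two attention vectors (source and target), and one bias vector, and that each `BatchNorm1d` contributes only its affine weight and bias. Under these assumptions the arithmetic reproduces the printed count; the helper names are hypothetical.

```python
def gat_conv_params(in_channels, out_channels, heads):
    """Parameter count of one attention layer under the stated assumptions."""
    width = heads * out_channels       # heads are concatenated
    linear = in_channels * width       # bias-free projection
    attention = 2 * width              # source and target attention vectors
    bias = width
    return linear + attention + bias

def batch_norm_params(num_features):
    return 2 * num_features            # affine weight and bias

def linear_params(in_features, out_features):
    return in_features * out_features + out_features

total = (
    gat_conv_params(3, 16, 8)
    + gat_conv_params(128, 32, 8)
    + gat_conv_params(256, 64, 8)
    + batch_norm_params(128) + batch_norm_params(256) + batch_norm_params(512)
    + linear_params(512, 10)
)
print(total)  # 173834
```

Most of the parameters (132608) sit in the third attention layer, whose projection maps 256 input channels to 512 output channels.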


### Accuracy function

In [18]:
def accuracy(output, target):
  pred = output.argmax(dim=1)  # get the index of the highest-scoring class for each sample
  return (pred == target).sum().item() / target.numel()  #return the mean accuracy in the batch
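Note that the epoch-level figures logged later are means of these per-batch accuracies. With a batch size of 32, the last training batch holds 25 samples (3193 = 99 × 32 + 25) and the last validation batch holds 30 (798 = 24 × 32 + 30), so the mean of batch accuracies can differ slightly from the accuracy over the whole dataset. The sketch below shows the difference on hypothetical counts chosen to exaggerate it; both helper names are assumptions.

```python
def mean_of_batch_accuracies(batches):
    """batches: list of (num_correct, batch_size) pairs."""
    return sum(correct / size for correct, size in batches) / len(batches)

def dataset_accuracy(batches):
    """Accuracy over all samples, i.e. weighted by batch size."""
    return sum(correct for correct, _ in batches) / sum(size for _, size in batches)

# Hypothetical counts: two full batches of 32 and a small final batch of 6.
batches = [(32, 32), (32, 32), (3, 6)]
unweighted = mean_of_batch_accuracies(batches)   # (1 + 1 + 0.5) / 3
weighted = dataset_accuracy(batches)             # 67 / 70
```

In this notebook the final batches are almost full, so the gap should be small, but summing correct predictions and sample counts would give the exact dataset accuracy.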

### Train function

In [19]:
def train_epoch(model, train_loader, optimizer, criterion, epoch, scheduler):  #Training function for one epoch
  
  model.train() #Activate the train=True flag inside the model
  losses = []
  accs = []

  lr = scheduler.get_last_lr()

  for batch_idx, data in enumerate(train_loader, 1):
      
      optimizer.zero_grad()  #setting all the gradient to zero
      
      output = model(data.pos.to(device), data.edge_index.to(device), data.batch.to(device))  #Pass the node feature representation (x), the COO graph connectivity representation (edge_index) and the batch vector
      
      loss = criterion(output.to(device), data.y.to(device))  #Calculate the loss in the batch
      
      loss.backward() #Backprop over the loss function through the network
      
      acc = 100 * accuracy(output.to(device), data.y.to(device))  #Calculate the mean accuracy in the batch

      losses.append(loss.item()) #save the loss value in a list of losses
      
      accs.append(acc) #save the accuracy value in a list of accuracies

      optimizer.step() #update parameters based on gradients.

      if batch_idx >= len(train_loader):
          print('Train Epoch: {} \tLR: {} \tAverage Loss: {:.4f}\tAverage Acc: {:.2f} %'.format(
              epoch, lr, np.mean(losses), np.mean(accs)))
          
          train_writer.add_scalar('Loss', np.mean(losses), epoch) #log training loss for one epoch to Tensorboard
          train_writer.add_scalar('Acc', np.mean(accs), epoch)    #log training accuracy for one epoch to Tensorboard

  return np.mean(losses), np.mean(accs)

### Validation function

In [20]:
def eval_epoch(model, val_loader, criterion, epoch):  #evaluation function after one epoch of training
  
  model.eval() #Activate the train=False flag inside the model
  eval_losses = []
  eval_accs = []
  with torch.no_grad():
    for data in val_loader:
      
      output = model(data.pos.to(device), data.edge_index.to(device), data.batch.to(device)) 
      
      eval_loss = criterion(output.to(device), data.y.to(device))
      eval_losses.append(eval_loss.item()) #save the loss value in a list of losses
      
      eval_acc = 100 * accuracy(output.to(device), data.y.to(device)) #Calculate the accuracy in the batch
      eval_accs.append(eval_acc) #save the accuracy value in a list of accuracies
    
    print('Val Epoch: {} \tAverage loss: {:.4f}\tAverage Acc: {:.2f} %'.format(
        epoch, np.mean(eval_losses), np.mean(eval_accs)))
    
    val_writer.add_scalar('Loss', np.mean(eval_losses), epoch)  #log validation loss for one epoch to Tensorboard
    val_writer.add_scalar('Acc',  np.mean(eval_accs), epoch)    #log validation accuracy for one epoch to Tensorboard

  return np.mean(eval_losses), np.mean(eval_accs)

## Dataloader

Let's pass the datasets through the DataLoader in order to obtain batches of samples:

In [21]:
train_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True)  #train set
val_loader = DataLoader(val_dataset, batch_size=val_batch_size)                      #val set
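The PyTorch Geometric `DataLoader` builds a batch by merging several graphs into one large disconnected graph: node arrays are concatenated, edge indices are shifted by the number of nodes that precede each graph, and a `batch` vector records which graph every node belongs to. This is the `data.batch` tensor passed to the model in the train and validation functions. A minimal, dependency-free sketch of that collation, with a hypothetical helper `collate_graphs` and toy graphs:

```python
import math

def collate_graphs(graphs):
    """graphs: list of (num_nodes, edges), with edges given as (src, dst) pairs."""
    edge_index, batch, offset = [], [], 0
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        # Shift node indices so that they stay unique inside the merged graph.
        edge_index.extend((src + offset, dst + offset) for src, dst in edges)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return edge_index, batch

graphs = [(2, [(0, 1), (1, 0)]), (3, [(0, 2), (2, 0)])]
edge_index, batch = collate_graphs(graphs)
print(edge_index)  # [(0, 1), (1, 0), (2, 4), (4, 2)]
print(batch)       # [0, 0, 1, 1, 1]

# Number of batches per epoch for the sizes used in this notebook.
num_train_batches = math.ceil(3193 / 32)  # 100
num_val_batches = math.ceil(798 / 32)     # 25
```

Because no edge connects nodes of different graphs, message passing on the merged graph is equivalent to processing each graph separately.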

## Optimizer

In [23]:
optimizer = torch.optim.Adam(model.parameters(), lr= learning_rate)

## Scheduler

In [24]:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5) #Learning rate is divided by 2 every 20 epochs
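With `step_size=20` and `num_epochs = 20`, the first decay only takes effect after the twentieth call to `scheduler.step()`, that is, once training has already finished. The step schedule follows `lr = initial_lr * gamma ** (steps // step_size)`; the sketch below evaluates that formula in plain Python (the helper `step_lr` is hypothetical) to show that every training epoch runs at 0.001, which matches the `LR: [0.001]` values in the training log.

```python
def step_lr(initial_lr, step_size, gamma, steps):
    """Learning rate after `steps` calls to the scheduler."""
    return initial_lr * gamma ** (steps // step_size)

# Epoch e (1-based) is trained after e - 1 scheduler steps.
schedule = [step_lr(0.001, step_size=20, gamma=0.5, steps=s) for s in range(21)]
lrs_used_in_training = schedule[:20]   # epochs 1..20
lr_after_training = schedule[20]       # first halved value
```

To see the decay during a 20-epoch run, one option would be a smaller `step_size`; to keep the current behaviour, the scheduler could simply be left as it is.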

## Loss

In [25]:
criterion = nn.CrossEntropyLoss()  #useful to train a classification problem with N classes (combines LogSoftmax and NLLLoss, so it expects raw scores)
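For a single sample, cross-entropy on raw scores is the negative log-softmax value of the target class: $\ell = -\log\left(e^{z_t} / \sum_j e^{z_j}\right)$. The dependency-free sketch below computes it for one sample; the function names are illustrative. It also gives a useful reference point: a model that scores all 10 classes equally has a loss of ln 10 ≈ 2.30.

```python
import math

def log_softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -log_softmax(logits)[target]

# Uniform scores over 10 classes: the loss equals ln(10).
loss_uniform = cross_entropy([0.0] * 10, target=3)

# A confident, correct prediction gives a loss close to zero.
loss_confident = cross_entropy([10.0] + [0.0] * 9, target=0)
```

Averaging this value over the samples of a batch gives the quantity that `criterion` returns with its default settings.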

### Train

In [26]:
def train_net(model, train_loader, val_loader, optimizer, criterion, num_epochs, scheduler): 
  """ Function that trains and evals a network for n epochs,
      saving the parameters with the best validation accuracy
      and returning that accuracy.
  """
  #with autograd.detect_anomaly():
  best_accuracy = 0.0
  for epoch in range(1, num_epochs + 1):
    tr_loss, tr_acc = train_epoch(model, train_loader, optimizer, criterion, epoch, scheduler)  #train the model
    val_loss, val_acc = eval_epoch(model, val_loader, criterion, epoch)              #eval the model 
    
    scheduler.step() #Step for LR decay

    if best_accuracy < val_acc:
      best_accuracy = val_acc
      torch.save(model.state_dict(), train_logdir + '/best_params.pt')  #save the model state for best val accuracy
  
  return best_accuracy  

In [27]:
best_accuracy = train_net(model, train_loader, val_loader, optimizer, criterion, num_epochs, scheduler)
print('Best validation accuracy = ', best_accuracy)

Train Epoch: 1 	LR: [0.001] 	Average Loss: 0.6961	Average Acc: 79.75 %
Val Epoch: 1 	Average loss: 0.4180	Average Acc: 86.85 %
Train Epoch: 2 	LR: [0.001] 	Average Loss: 0.3376	Average Acc: 89.13 %
Val Epoch: 2 	Average loss: 0.3564	Average Acc: 88.34 %
Train Epoch: 3 	LR: [0.001] 	Average Loss: 0.2878	Average Acc: 90.30 %
Val Epoch: 3 	Average loss: 0.2842	Average Acc: 90.72 %
Train Epoch: 4 	LR: [0.001] 	Average Loss: 0.2582	Average Acc: 91.42 %
Val Epoch: 4 	Average loss: 0.2882	Average Acc: 90.08 %
Train Epoch: 5 	LR: [0.001] 	Average Loss: 0.2466	Average Acc: 92.37 %
Val Epoch: 5 	Average loss: 0.2979	Average Acc: 90.09 %
Train Epoch: 6 	LR: [0.001] 	Average Loss: 0.2345	Average Acc: 91.89 %
Val Epoch: 6 	Average loss: 0.2814	Average Acc: 90.98 %
Train Epoch: 7 	LR: [0.001] 	Average Loss: 0.2033	Average Acc: 93.34 %
Val Epoch: 7 	Average loss: 0.2699	Average Acc: 90.97 %
Train Epoch: 8 	LR: [0.001] 	Average Loss: 0.2272	Average Acc: 92.16 %
Val Epoch: 8 	Average loss: 0.2747	Avera