<a href="https://colab.research.google.com/github/giuseppefutia/ml/blob/master/gcn_numpy_vs_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Understanding Graph Convolutional Networks using NumPy and PyTorch

This tutorial introduces some basic neural network concepts and illustrates the graph convolutional network architecture, with examples written in the numerical library NumPy and the deep learning framework PyTorch.

The NumPy code exposes every computation, including backpropagation, so you can follow the algorithm step by step. The PyTorch code, by contrast, uses the built-in training algorithms and optimizers as a black box.

The examples are self-contained, so you can run each code block on its own.

## Two-layered Network
In this first example, we see how to train a shallow neural network with NumPy and PyTorch.

This example is adapted from the code published at: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

### NumPy Implementation


In [7]:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

# Define the learning rate 
learning_rate = 1e-6

# Loop over the training epochs
for t in range(500):
    
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute the loss as the sum of squared errors (not averaged, so this
    # is not strictly the MSE) and print it every 100 epochs.
    loss = np.square(y_pred - y).sum()

    if (t % 100) == 0:
        print('epoch:' + str(t), 'loss:' + str(loss))

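    # Backprop to compute the gradients of the loss with respect to w1 and w2.
    # Backprop applies the chain rule of derivatives, moving backward
    # through the forward pass one step at a time.

    # Derivative of the squared-error loss with respect to y_pred
    grad_y_pred = 2.0 * (y_pred - y)
    # y_pred = h_relu.dot(w2), so the gradient w.r.t. w2 is h_relu^T . grad_y_pred
    grad_w2 = h_relu.T.dot(grad_y_pred)
    # ...and the gradient w.r.t. h_relu is grad_y_pred . w2^T
    grad_h_relu = grad_y_pred.dot(w2.T)
    # ReLU passes the gradient through where h > 0 and blocks it where h < 0
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    # h = x.dot(w1), so the gradient w.r.t. w1 is x^T . grad_h
    grad_w1 = x.T.dot(grad_h)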

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

epoch:0 loss:28426790.467448108
epoch:100 loss:573.230085711877
epoch:200 loss:6.783543314221504
epoch:300 loss:0.135135595572394
epoch:400 loss:0.002993381668289992
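
One way to gain confidence in a hand-written backward pass is to compare it with a numerical approximation. The block below is a minimal sketch of such a gradient check using central finite differences. The small layer sizes, the fixed seed, and the helper names `loss_fn` and `numeric_grad` are assumptions made for illustration and are not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small sizes keep the finite-difference check fast (assumed values)
N, D_in, H, D_out = 4, 5, 3, 2
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    # Same forward pass and sum-of-squares loss as in the training loop
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradients, computed with the same steps as the training loop
h = x.dot(w1)
h_relu = np.maximum(h, 0)
y_pred = h_relu.dot(w2)
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

def numeric_grad(f, w, eps=1e-6):
    # Central differences: perturb one entry of w at a time, then restore it
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

num_w1 = numeric_grad(lambda: loss_fn(w1, w2), w1)
num_w2 = numeric_grad(lambda: loss_fn(w1, w2), w2)

max_err_w1 = np.abs(num_w1 - grad_w1).max()
max_err_w2 = np.abs(num_w2 - grad_w2).max()
print('max abs error w1:', max_err_w1)
print('max abs error w2:', max_err_w2)
```

If the analytic gradients are correct, both errors should be very small compared with the gradient values themselves. The check can be misleading when an entry of `h` sits extremely close to zero, because ReLU is not differentiable there.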


### PyTorch Implementation

In this example we use PyTorch tensors for the input data, the output data, and the weights.

Here, we still perform the forward and backward passes manually.

In [8]:
import torch


dtype = torch.float
device = torch.device("cpu")

if torch.cuda.is_available(): # Try to use GPU if available
  device = torch.device('cuda')

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

# Define the learning rate
learning_rate = 1e-6

for t in range(500):
  
    # Forward pass: compute predicted y
    # mm() performs a matrix multiplication of x and w1
    h = x.mm(w1)
    # clamp(min=0) implements the ReLU: each output is the max between 0 and the input value
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    # item() returns the loss as a plain Python number (otherwise we would get a tensor)
    loss = (y_pred - y).pow(2).sum().item()
    
    if (t % 100) == 0:
      print('epoch:' + str(t), 'loss:' + str(loss))

    # Backprop to compute the gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

epoch:0 loss:29190852.0
epoch:100 loss:306.8869934082031
epoch:200 loss:1.499150037765503
epoch:300 loss:0.015107796527445316
epoch:400 loss:0.00039342930540442467


In this example we no longer write the backward pass by hand: we use PyTorch's autograd feature to automate it.

With autograd, a computational graph is built during the forward pass. Each operation in the graph records how to compute its derivative, and backprop uses these records to obtain the gradients.
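
Before the full training loop, a tiny scalar case may help make the computational graph concrete. The block below is a minimal sketch, with values chosen only for illustration (`w = 3`, `x = 2`), that runs a forward pass, inspects the graph node attached to the result, and lets autograd compute the derivative.

```python
import torch

# A single scalar parameter tracked by autograd
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)

# Forward pass: out = relu(w * x) ** 2 = (3 * 2) ** 2 = 36
out = (w * x).clamp(min=0).pow(2)

# The result remembers the operation that produced it
# (the exact text printed here may vary between PyTorch versions)
print(out.grad_fn)

# Backward pass: d(out)/dw = 2 * (w * x) * x = 2 * 6 * 2 = 24
out.backward()
print(w.grad)  # tensor(24.)
```

The value in `w.grad` matches the derivative obtained by hand with the chain rule, which is the same computation the manual backward pass performs for the two-layer network.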

In [10]:
import torch

dtype = torch.float
device = torch.device("cpu")
if torch.cuda.is_available(): # Try to use GPU if available
  device = torch.device('cuda')

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# requires_grad defaults to False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Define the learning rate
learning_rate = 1e-6

for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # Now loss is a zero-dimensional (scalar) Tensor;
    # loss.item() returns the scalar value held in loss as a Python number.
    loss = (y_pred - y).pow(2).sum()
    if (t % 100) == 0:
      print('epoch:' + str(t), 'loss:' + str(loss.item()))

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

epoch:0 loss:31386812.0
epoch:100 loss:309.37091064453125
epoch:200 loss:0.6816291809082031
epoch:300 loss:0.0024171480908989906
epoch:400 loss:9.141815098701045e-05
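
The comments in the loop above mention that `torch.optim.SGD` can replace the manual weight update. The block below is a minimal sketch of that variant, assuming the same network sizes and learning rate. The fixed seed and the `losses` list are additions for illustration only.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make runs repeatable
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# The optimizer keeps references to the weights and updates them in step()
optimizer = torch.optim.SGD([w1, w2], lr=1e-6)

losses = []
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    optimizer.zero_grad()  # clear the gradients left over from the previous step
    loss.backward()        # fill w1.grad and w2.grad
    optimizer.step()       # plain gradient descent: w -= lr * w.grad

print('first loss:', losses[0], 'last loss:', losses[-1])
```

With no momentum or weight decay, `optimizer.step()` performs the same update as the `torch.no_grad()` block, and `optimizer.zero_grad()` takes the place of the manual calls to `grad.zero_()`.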
