
In [2]:
%matplotlib inline



Optimizing Model Parameters
===========================

Now that we have a model and data, it's time to train, validate, and test our model by optimizing its parameters on 
our data. Training a model is an iterative process: in each iteration the model makes a guess about the output, calculates 
the error in its guess (the *loss*), collects the derivatives of the error with respect to its parameters (as we saw in 
the `previous section <autograd_tutorial.html>`_), and **optimizes** these parameters using gradient descent. One full pass 
over the training data is called an *epoch*. For a more detailed walkthrough of this process, check out this video on 
`backpropagation from 3Blue1Brown <https://www.youtube.com/watch?v=tIeHLnjs5U8>`__.
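
To make the four steps concrete, here is a minimal sketch in plain Python (no PyTorch). The toy data, the one-parameter model ``pred = w * x``, and the learning rate are illustrative assumptions and are not part of the model trained in this tutorial.

```python
# Toy data for the hypothetical target function y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0               # the single trainable parameter, started far from 3
learning_rate = 0.01  # assumed value, small enough for stable updates

for epoch in range(200):               # one epoch = one pass over the data
    for x, y in zip(xs, ys):
        pred = w * x                   # 1. the model makes a guess
        loss = (pred - y) ** 2         # 2. squared error of that guess
        grad = 2 * (pred - y) * x      # 3. derivative of the loss w.r.t. w
        w -= learning_rate * grad      # 4. gradient descent update

print(f"learned w = {w:.4f}")
```

With these settings ``w`` moves toward 3, the slope of the toy data. PyTorch automates steps 3 and 4 through autograd and the optimizer, as the rest of this section shows.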

Prerequisite Code 
-----------------
We load the code from the previous sections on `Datasets & DataLoaders <data_tutorial.html>`_ 
and `Build Model  <buildmodel_tutorial.html>`_.



In [3]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz




Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz




Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz




Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz




Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw





In [124]:
### Simple number data

class FooDataset(Dataset):
  def __init__(self):
    self.elements = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]])
    self.labels = torch.tensor([[4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
  def __len__(self):
    return self.elements.shape[0]
  def __getitem__(self, idx):
    return self.elements[idx], self.labels[idx]

### Foo network

class FooNetwork(nn.Module):
  def __init__(self):
    super().__init__()
    # A single 1 -> 5 -> 1 block; every call below reuses the same weights
    self.stack = nn.Sequential(
        nn.Linear(1, 5),
        nn.ReLU(),
        nn.Linear(5, 1),
        nn.ReLU()
    )
  def forward(self, x):
    # Apply the shared block one, two, and three times and sum the results
    first = self.stack(x)
    second = self.stack(first)
    third = self.stack(second)
    return first + second + third
  def single(self, x):
    return self.stack(x)
  def double(self, x):
    return self.stack(self.stack(x))
  def triple(self, x):
    return self.stack(self.stack(self.stack(x)))
  
foo = FooNetwork()

learning_rate = 1e-3
batch_size = 1
epochs = 50

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(foo.parameters(), lr=learning_rate)

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # The dataset is tiny, so log every batch
        loss_value, current = loss.item(), batch * len(X)
        print(f"loss: {loss_value:>7f}  [{current:>5d}/{size:>5d}]")

# Use a dedicated name so the FashionMNIST train_dataloader is not overwritten
foo_dataloader = DataLoader(FooDataset(), batch_size=batch_size)
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(foo_dataloader, foo, loss_fn, optimizer)

print(foo.single(torch.tensor([20.0])))

print(foo.double(torch.tensor([20.0])))
print(foo.triple(torch.tensor([20.0])))
print(foo(torch.tensor([20.0])))
print("Done!")

Epoch 1
-------------------------------
None
loss: 15.591476  [    0/    7]
None
loss: 23.912933  [    1/    7]
None
loss: 33.702282  [    2/    7]
None
loss: 42.265957  [    3/    7]
None
loss: 47.036568  [    4/    7]
None
loss: 46.152336  [    5/    7]
None
loss: 34.856888  [    6/    7]
Epoch 2
-------------------------------
None
loss: 1.773775  [    0/    7]
None
loss: 1.901127  [    1/    7]
None
loss: 1.554969  [    2/    7]
None
loss: 0.821950  [    3/    7]
None
loss: 0.158647  [    4/    7]
None
loss: 0.000213  [    5/    7]
None
loss: 0.001372  [    6/    7]
Epoch 3
-------------------------------
None
loss: 0.065980  [    0/    7]
None
loss: 0.015697  [    1/    7]
None
loss: 0.000049  [    2/    7]
None
loss: 0.002387  [    3/    7]
None
loss: 0.002658  [    4/    7]
None
loss: 0.000811  [    5/    7]
None
loss: 0.000389  [    6/    7]
Epoch 4
-------------------------------
None
loss: 0.056449  [    0/    7]
None
loss: 0.012128  [    1/    7]
None
loss: 0.000242  [    2/



None
loss: 0.000031  [    6/    7]
Epoch 14
-------------------------------
None
loss: 0.018379  [    0/    7]
None
loss: 0.001425  [    1/    7]
None
loss: 0.001769  [    2/    7]
None
loss: 0.000733  [    3/    7]
None
loss: 0.000105  [    4/    7]
None
loss: 0.000011  [    5/    7]
None
loss: 0.000022  [    6/    7]
Epoch 15
-------------------------------
None
loss: 0.016726  [    0/    7]
None
loss: 0.001116  [    1/    7]
None
loss: 0.001868  [    2/    7]
None
loss: 0.000683  [    3/    7]
None
loss: 0.000066  [    4/    7]
None
loss: 0.000004  [    5/    7]
None
loss: 0.000015  [    6/    7]
Epoch 16
-------------------------------
None
loss: 0.015276  [    0/    7]
None
loss: 0.000867  [    1/    7]
None
loss: 0.001958  [    2/    7]
None
loss: 0.000638  [    3/    7]
None
loss: 0.000038  [    4/    7]
None
loss: 0.000001  [    5/    7]
None
loss: 0.000010  [    6/    7]
Epoch 17
-------------------------------
None
loss: 0.014001  [    0/    7]
None
loss: 0.000667  [    1/   

In [252]:
# The model runs on the CPU; forward() allocates its temporary tensors there too
device = torch.device("cpu")
class GraphData(Dataset):
  def __init__(self):
    self.elements = torch.tensor([[[[0, 0], [1, -1], [1, -1], [1, -1]], [[1, 2], [0, 2], [0, 1], [1, 2]]]], dtype = torch.float32)
    self.labels = torch.tensor([[[0, 0], [1, 1], [1, 1], [1, 2]]], dtype = torch.float32)
  def __len__(self):
    return 1
  def __getitem__(self, idx):
    return self.elements[0], self.labels[0]

class GraphNetwork(nn.Module):
  def __init__(self):
    super(GraphNetwork, self).__init__()
    self.iteration = 10
    self.neighbour = nn.Sequential(
        nn.Linear(2, 5),
        nn.ReLU(),
        nn.Linear(5, 3),
        nn.ReLU()
    )
    self.aggregate = nn.Sequential(
        nn.Linear(5, 3),
        nn.ReLU(),
        nn.Linear(3, 1),
        nn.ReLU() 
    )
  def forward(self, data):
    X = data[0]
    nodes = X[0].to(device)   # node states, shape [num_nodes, 2]
    edges = X[1].to(device)   # neighbour indices of each node, shape [num_nodes, 2]
    nodes_size = nodes.shape[0]
    result = torch.clone(nodes)

    # Two rounds of message passing
    for _ in range(2):
      for i in range(nodes_size):
        # Sum the messages computed from the states of node i's neighbours
        messages = torch.zeros(3, device=device)
        for neigh in edges[i]:
          messages = messages + self.neighbour(nodes[int(neigh.item())])
        # Combine a detached copy of the node's own state with the summed messages
        combined = torch.cat((nodes[i].clone().detach(), messages), dim=0)
        new_state = self.aggregate(combined)
        # Keep the first feature fixed and replace the second with the new value
        result[i] = torch.cat((nodes[i][:1], new_state))
      nodes = torch.clone(result)
    """
    copy = torch.clone(result)

    for indx in indexes:
      i = int(indx.item())
      neighborhood = edges[i]
      zeros = torch.zeros(3)
      for neigh in neighborhood:
        eval = copy[int(neigh.item())]
        zeros = zeros + self.neighbour(eval)
      input = torch.cat((torch.tensor(copy[i]), zeros), dim = 0)
      newState = self.aggregate(input)
      newState = torch.cat((copy[i][:1], newState))
      copy[i] = newState"""
    return result
    
data = GraphData()
net = GraphNetwork().to(device)

learning_rate = 1e-3
epochs = 20000
loss_fn = nn.MSELoss()

optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate)

def train_loop(dataloader, model, loss_fn, optimizer):
    last_loss = None
    for X, y in dataloader:
        # Compute prediction and loss; y[0] drops the batch dimension so that
        # the target has the same [num_nodes, 2] shape as the prediction
        pred = model(X)
        loss = loss_fn(pred, y[0])
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        last_loss = loss.item()
    return last_loss

# Use a dedicated name so the FashionMNIST train_dataloader is not overwritten
graph_dataloader = DataLoader(GraphData(), batch_size=1)
for t in range(epochs):
    loss = train_loop(graph_dataloader, net, loss_fn, optimizer)
    if t % 100 == 0:
        print(f"loss: {loss:>7f} ")
print(net(data[0]))



loss: 0.406145 
loss: 0.335210 
loss: 0.283175 
loss: 0.247709 
loss: 0.225069 
loss: 0.211112 
loss: 0.202327 
loss: 0.196277 
loss: 0.191566 
loss: 0.187445 
loss: 0.183551 
loss: 0.179729 
loss: 0.175910 
loss: 0.172069 
loss: 0.168197 
loss: 0.164289 
loss: 0.160502 
loss: 0.158134 
loss: 0.155788 
loss: 0.153453 
loss: 0.151128 
loss: 0.148812 
loss: 0.146517 
loss: 0.144230 
loss: 0.141970 
loss: 0.139732 
loss: 0.137514 
loss: 0.135336 
loss: 0.133179 
loss: 0.131068 
loss: 0.128986 
loss: 0.126945 
loss: 0.124952 
loss: 0.123004 
loss: 0.121090 
loss: 0.119242 
loss: 0.117441 
loss: 0.115686 
loss: 0.113979 
loss: 0.112345 
loss: 0.110752 
loss: 0.109217 
loss: 0.107738 
loss: 0.106306 
loss: 0.104920 
loss: 0.103589 
loss: 0.102298 
loss: 0.101067 
loss: 0.099888 
loss: 0.098751 
loss: 0.097664 
loss: 0.096649 
loss: 0.095669 
loss: 0.094728 
loss: 0.093852 
loss: 0.093002 
loss: 0.092180 
loss: 0.091397 
loss: 0.090630 
loss: 0.089911 
loss: 0.089199 
loss: 0.088522 
loss: 0.

Hyperparameters 
-----------------

Hyperparameters are adjustable parameters that let you control the model optimization process. 
Different hyperparameter values can impact model training and convergence rates 
(`read more <https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html>`__ about hyperparameter tuning).

We define the following hyperparameters for training:
 - **Number of Epochs** - the number of times to iterate over the dataset
 - **Batch Size** - the number of data samples propagated through the network before the parameters are updated
 - **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.
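
As a rough illustration of how these values interact, the sketch below counts the parameter updates that result from the settings used in this tutorial. It assumes the 60,000-sample FashionMNIST training split and the ``DataLoader`` default of keeping the final, smaller batch (``drop_last=False``); the variable names are illustrative.

```python
import math

dataset_size = 60000   # size of the FashionMNIST training split
batch_size = 64
epochs = 5

# Each batch triggers one parameter update
batches_per_epoch = math.ceil(dataset_size / batch_size)
total_updates = batches_per_epoch * epochs

# The last batch of an epoch holds whatever samples are left over
last_batch_size = dataset_size - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch, total_updates, last_batch_size)
```

Under these assumptions there are 938 updates per epoch and 4,690 updates in total, with a final batch of 32 samples in each epoch.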




In [None]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

Optimization Loop
-----------------

Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each 
iteration of the optimization loop is called an **epoch**. 

Each epoch consists of two main parts:
 - **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
 - **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.

Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to 
the Full Implementation section to see the complete optimization loop.

Loss Function
~~~~~~~~~~~~~~~~~

When presented with some training data, our untrained network is unlikely to give the correct 
answer. A **loss function** measures the degree of dissimilarity between the obtained result and the target value, 
and it is the loss function that we want to minimize during training. To calculate the loss, we make a 
prediction using the inputs of our given data sample and compare it against the true data label value.

Common loss functions include `nn.MSELoss <https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss>`_ (Mean Square Error) for regression tasks, and 
`nn.NLLLoss <https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss>`_ (Negative Log Likelihood) for classification. 
`nn.CrossEntropyLoss <https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss>`_ combines ``nn.LogSoftmax`` and ``nn.NLLLoss``.

We pass our model's output logits to ``nn.CrossEntropyLoss``, which will normalize the logits and compute the prediction error.
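
The relationship between the three loss classes can be checked numerically. The sketch below uses made-up random logits and targets (four samples, ten classes) rather than the tutorial's model, and compares ``nn.CrossEntropyLoss`` against ``nn.LogSoftmax`` followed by ``nn.NLLLoss``.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)             # illustrative raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])    # illustrative class indices

# One step: CrossEntropyLoss normalizes the logits internally
combined = nn.CrossEntropyLoss()(logits, targets)

# Two steps: explicit log-softmax over the class dimension, then NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
two_step = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(combined, two_step))
```

Because the normalization happens inside ``nn.CrossEntropyLoss``, the model should output raw logits and not apply a softmax of its own.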



In [None]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()

Optimizer
~~~~~~~~~~~~~~~~~

Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
All optimization logic is encapsulated in the ``optimizer`` object. Here, we use the SGD optimizer; PyTorch also provides many `different optimizers <https://pytorch.org/docs/stable/optim.html>`_ 
such as Adam and RMSProp, which work better for different kinds of models and data.

We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.



In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
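
To see what a single SGD update does, the following sketch applies one step to a hand-made two-element parameter instead of the tutorial's model. With plain SGD (no momentum or weight decay), each parameter moves by ``lr`` times its gradient; the tensor values and the learning rate of 0.1 are illustrative assumptions.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)   # illustrative parameter
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
optimizer.zero_grad()
loss.backward()                          # w.grad = 2 * w = [2.0, -4.0]

# Manual update rule, computed before the optimizer changes w in place
expected = w.detach() - 0.1 * w.grad

optimizer.step()
print(w.detach(), expected)
```

Both tensors come out as ``[0.8, -1.6]``, matching the rule ``w = w - lr * grad``.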

Inside the training loop, optimization happens in three steps:
 * Call ``optimizer.zero_grad()`` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
 * Backpropagate the prediction loss with a call to ``loss.backward()``. PyTorch deposits the gradients of the loss w.r.t. each parameter. 
 * Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.  
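
The accumulation behavior described in the first step can be observed directly. This sketch uses a single scalar parameter and resets its ``.grad`` field by hand, which is a simplification of what ``optimizer.zero_grad()`` does for all registered parameters.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w ** 2).backward()
first = w.grad.item()          # d(w^2)/dw = 2 * w = 4.0

(w ** 2).backward()            # no reset in between
accumulated = w.grad.item()    # 4.0 + 4.0 = 8.0

w.grad.zero_()                 # reset, as the training loop does every iteration
(w ** 2).backward()
after_reset = w.grad.item()    # back to 4.0

print(first, accumulated, after_reset)
```

Without the reset, the second backward pass adds to the stored gradient, so the optimizer would step with twice the intended value.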




Full Implementation
-----------------------
We define ``train_loop`` that loops over our optimization code, and ``test_loop`` that 
evaluates the model's performance against our test data.



In [None]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):        
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
            
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
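
The accuracy bookkeeping in ``test_loop`` can be traced on a small, made-up batch. In this sketch the scores and labels are illustrative: ``pred.argmax(1)`` picks the highest-scoring class per row, and the comparison with ``y`` counts the matches.

```python
import torch

# Illustrative model outputs: 4 samples, 3 classes
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 3.0],
                     [0.9, 0.8, 0.1]])
y = torch.tensor([1, 0, 1, 0])            # illustrative true labels

predicted_class = pred.argmax(1)          # index of the largest score in each row
correct = (predicted_class == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_class.tolist(), correct, accuracy)
```

Here three of the four predictions match their labels, so the accuracy is 0.75. ``test_loop`` sums ``correct`` over all batches and divides by the dataset size in the same way.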

We initialize the loss function and optimizer, and pass them to ``train_loop`` and ``test_loop``.
Feel free to increase the number of epochs to track the model's improving performance.



In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.307807  [    0/60000]
loss: 2.301522  [ 6400/60000]
loss: 2.286419  [12800/60000]
loss: 2.276046  [19200/60000]
loss: 2.287521  [25600/60000]
loss: 2.252595  [32000/60000]
loss: 2.269362  [38400/60000]
loss: 2.253034  [44800/60000]
loss: 2.235528  [51200/60000]
loss: 2.217758  [57600/60000]
Test Error: 
 Accuracy: 31.5%, Avg loss: 2.227662 

Epoch 2
-------------------------------
loss: 2.227663  [    0/60000]
loss: 2.238604  [ 6400/60000]
loss: 2.191649  [12800/60000]
loss: 2.191515  [19200/60000]
loss: 2.213330  [25600/60000]
loss: 2.138905  [32000/60000]
loss: 2.176790  [38400/60000]
loss: 2.141481  [44800/60000]
loss: 2.122778  [51200/60000]
loss: 2.074745  [57600/60000]
Test Error: 
 Accuracy: 41.9%, Avg loss: 2.099309 

Epoch 3
-------------------------------
loss: 2.099908  [    0/60000]
loss: 2.108566  [ 6400/60000]
loss: 2.018093  [12800/60000]
loss: 2.045220  [19200/60000]
loss: 2.067481  [25600/60000]
loss: 1.947030  [32000/600

Further Reading
-----------------------
- `Loss Functions <https://pytorch.org/docs/stable/nn.html#loss-functions>`_
- `torch.optim <https://pytorch.org/docs/stable/optim.html>`_
- `Warmstart Training a Model <https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html>`_


