## Dynamic Networks

To showcase the power of PyTorch's dynamic graphs, we will implement a fully-connected ReLU network <br>
that on each forward pass randomly chooses a number between 0 and 3 and has that many hidden layers, <br>
reusing the same weights multiple times to compute the innermost hidden layers.

Orginal content by Justin Johnson
https://github.com/jcjohnson/pytorch-examples/blob/master/nn/dynamic_net.py

In [1]:
import random
import torch
from torch.autograd import Variable

In [2]:
class DynamicNet(torch.nn.Module):
  def __init__(self, D_in, H, D_out):
    """
    In the constructor we construct three nn.Linear instances that we will use
    in the forward pass.
    """
    super(DynamicNet, self).__init__()
    self.input_linear = torch.nn.Linear(D_in, H)
    self.middle_linear = torch.nn.Linear(H, H)
    self.output_linear = torch.nn.Linear(H, D_out)

  def forward(self, x):
    """
    For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
    and reuse the middle_linear Module that many times to compute hidden layer
    representations.
    Since each forward pass builds a dynamic computation graph, we can use normal
    Python control-flow operators like loops or conditional statements when
    defining the forward pass of the model.
    Here we also see that it is perfectly safe to reuse the same Module many
    times when defining a computational graph. This is a big improvement from Lua
    Torch, where each Module could be used only once.
    """
    
    h_relu = self.input_linear(x).clamp(min=0)
    num_hidden_layers = random.randint(0, 3)
    for _ in range(num_hidden_layers):
      h_relu = self.middle_linear(h_relu).clamp(min=0)
    y_pred = self.output_linear(h_relu)
    
    return y_pred, num_hidden_layers

In [None]:
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(5000):
  # Forward pass: Compute predicted y by passing x to the model
  y_pred, num_hidden_layers = model(x)

  # Compute and print loss
  loss = criterion(y_pred, y)
  if t % 100 == 0:
      print("Iteration %d, Loss %.5f, Number Of Hidden Layers %d\n" %(t, loss.data[0], num_hidden_layers))

  # Zero gradients, perform a backward pass, and update the weights.
  optimizer.zero_grad()
  loss.backward()
optimizer.step()

Iteration 0, Loss 600.79181, Number Of Hidden Layers 1

Iteration 100, Loss 595.12225, Number Of Hidden Layers 3

Iteration 200, Loss 595.73450, Number Of Hidden Layers 2

Iteration 300, Loss 595.73450, Number Of Hidden Layers 2

Iteration 400, Loss 595.73450, Number Of Hidden Layers 2

Iteration 500, Loss 600.79181, Number Of Hidden Layers 1

Iteration 600, Loss 600.79181, Number Of Hidden Layers 1

Iteration 700, Loss 626.55396, Number Of Hidden Layers 0

Iteration 800, Loss 595.12225, Number Of Hidden Layers 3

Iteration 900, Loss 595.73450, Number Of Hidden Layers 2

Iteration 1000, Loss 600.79181, Number Of Hidden Layers 1

Iteration 1100, Loss 626.55396, Number Of Hidden Layers 0

Iteration 1200, Loss 595.12225, Number Of Hidden Layers 3

Iteration 1300, Loss 626.55396, Number Of Hidden Layers 0

Iteration 1400, Loss 600.79181, Number Of Hidden Layers 1

Iteration 1500, Loss 595.12225, Number Of Hidden Layers 3

Iteration 1600, Loss 595.12225, Number Of Hidden Layers 3

Iteration