# Introduction to automatic differentiation using PyTorch

Training a neural network amounts to minimizing a cost function (also called a loss function) with respect to some parameters, namely the weights of the network. The loss function measures how well the model (i.e., the neural network) fits the observed data.

In deep learning architectures, loss functions are complicated: they are compositions of many layers and activation functions (e.g., linear, ReLU, sigmoid). Minimizing them is therefore tricky. The minimization is performed with gradient descent algorithms, but computing the gradients of such complicated loss functions by hand is not straightforward.
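To make the difficulty concrete, here is a minimal sketch of a tiny composed loss, $L(w_1, w_2) = (w_2 \tanh(w_1 x) - y)^2$. The data point and parameter values are hypothetical, chosen only for illustration. The chain rule gives the gradient by hand, and PyTorch's automatic differentiation recovers the same values without any manual derivation.

```python
import torch

# Hypothetical scalar data point and parameters (illustration only)
x = torch.tensor(1.5)
y = torch.tensor(0.3)
w1 = torch.tensor(0.8, requires_grad=True)
w2 = torch.tensor(-1.2, requires_grad=True)

# Forward pass: L(w1, w2) = (w2 * tanh(w1 * x) - y)^2
h = torch.tanh(w1 * x)
loss = (w2 * h - y) ** 2

# Gradients by hand with the chain rule (computed outside the autograd graph)
with torch.no_grad():
    r = w2 * h - y                            # residual
    dL_dw2 = 2 * r * h                        # dL/dw2
    dL_dw1 = 2 * r * w2 * (1 - h ** 2) * x    # dL/dw1, using tanh' = 1 - tanh^2

# Gradients by automatic differentiation (reverse mode)
loss.backward()

print(dL_dw1.item(), w1.grad.item())
print(dL_dw2.item(), w2.grad.item())
```

With one more layer, the hand derivation grows by another chain-rule factor, while the `loss.backward()` call stays the same.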

Deep learning libraries such as PyTorch use automatic differentiation to compute the gradient of the loss function efficiently at each iteration of the optimization. The first goal here is to solve a minimization problem using automatic differentiation and to compare it with a classic method based on the exact (analytic) derivative. The second goal is to implement, by hand, the iterations of the optimization of a neural network.
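A minimal sketch of the first goal, assuming a hypothetical one-dimensional objective $g(w) = (\tanh(w) - 0.5)^2$, whose minimum is at $w^\star = \operatorname{atanh}(0.5)$. The same gradient descent loop is run twice: once with the derivative written by hand, once with the gradient returned by `torch.autograd.grad`. The step size and iteration count are assumptions, not tuned values.

```python
import math
import torch

# Hypothetical objective: g(w) = (tanh(w) - 0.5)^2, minimized at w* = atanh(0.5)
def g(w):
    return (torch.tanh(w) - 0.5) ** 2

# Exact derivative obtained by hand: g'(w) = 2 (tanh(w) - 0.5) (1 - tanh(w)^2)
def g_prime(w):
    t = torch.tanh(w)
    return 2 * (t - 0.5) * (1 - t ** 2)

lr, n_iter = 0.5, 200  # assumed step size and number of iterations

# (1) Gradient descent with the exact derivative
w_exact = torch.tensor(0.0)
for _ in range(n_iter):
    w_exact = w_exact - lr * g_prime(w_exact)

# (2) Gradient descent with automatic differentiation
w_auto = torch.tensor(0.0, requires_grad=True)
for _ in range(n_iter):
    (grad,) = torch.autograd.grad(g(w_auto), w_auto)
    with torch.no_grad():       # the update itself must not be tracked by autograd
        w_auto -= lr * grad

print(w_exact.item(), w_auto.item(), math.atanh(0.5))
```

Both runs reach the same minimizer; only the way the gradient is obtained differs.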

First, let's define tanh, a sigmoid (S-shaped) function well known in neural networks, especially for classification problems. Its exact derivative is also defined.
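For reference, the formula used for the derivative follows from the quotient rule and the identity $\cosh^2 x - \sinh^2 x = 1$:

$$
\tanh x = \frac{\sinh x}{\cosh x}, \qquad
\frac{d}{dx}\tanh x = \frac{\cosh^2 x - \sinh^2 x}{\cosh^2 x} = \frac{1}{\cosh^2 x} = 1 - \tanh^2 x .
$$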


In [None]:
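# Import classic libraries (Matplotlib and NumPy through the pylab namespace;
# the %pylab magic is deprecated in recent IPython versions)
%matplotlib inline
import pylab
from pylab import *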

# Import PyTorch
import torch
import torch.nn as nn

# Parameters (figure size and random seed)
pylab.rcParams['figure.figsize'] = (15,15)
torch.manual_seed(1)

In [None]:
# Define tanh function
def f(x):
    return (torch.tanh(x))

# Define the first derivative of tanh
def f_prime(x):
    return (1 - torch.tanh(x)**2)
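As an independent sanity check on `f_prime`, a central finite difference $(f(x+h) - f(x-h)) / (2h)$ approximates the derivative with an $O(h^2)$ truncation error. Unlike automatic differentiation, this is a genuine approximation whose accuracy depends on the step $h$; the value of $h$ and the use of double precision below are assumptions.

```python
import torch

def f(x):
    return torch.tanh(x)

def f_prime(x):
    return 1 - torch.tanh(x) ** 2

# Central finite difference in double precision; h = 1e-5 is an assumed step size
x = torch.linspace(-5.0, 5.0, steps=101, dtype=torch.float64)
h = 1e-5
fd = (f(x + h) - f(x - h)) / (2 * h)

max_err = (fd - f_prime(x)).abs().max().item()
print(f"max |finite difference - analytic| = {max_err:.2e}")
```

Shrinking $h$ too much makes rounding errors dominate, which is one reason deep learning libraries rely on automatic differentiation instead.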

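Then, we compare the analytic derivative with the derivative computed by automatic differentiation in PyTorch. Note that automatic differentiation is not a numerical approximation like finite differences: it applies the chain rule exactly, so both results agree up to floating-point precision.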

In [None]:
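# Generate data (100 points; the steps argument is required in recent PyTorch versions)
x = torch.linspace(-5.0, 5.0, steps=100, requires_grad=True)
f_x = f(x)
f_prime_x = f_prime(x)

# Apply the automatic differentiation: since each f(x_i) depends only on x_i,
# the gradient of the sum gives the element-wise derivatives f'(x_i)
y = torch.sum(f(x))

grads = torch.autograd.grad(y, x, create_graph=True)[0]  # grad returns a tuple
# Alternative to torch.autograd.grad:
# y.backward(retain_graph=True)
# x.grad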


# Plot results
subplot(2,1,1)
plot(x.detach().numpy(), f_x.detach().numpy(), 'b')
plot(x.detach().numpy(), f_prime_x.detach().numpy(), 'g')
#plot(x.detach().numpy(), x.grad.detach().numpy(), 'r--', linewidth=5)
plot(x.detach().numpy(), grads.detach().numpy(), 'r--', linewidth=5)
plot(array([-5, 5]), array([0, 0]), 'k--')
legend(['f(x) = tanh(x)', 'f prime (analytic)', 'f prime (automatic)'], prop={'size': 20})
title('Tanh and its first derivative', size=20)
xlabel('x', size=20)
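The plot compares the curves visually. A minimal numerical check of the same tanh setup confirms that the two derivatives agree to floating-point precision. Summing before differentiating is valid because each output $\tanh(x_i)$ depends only on $x_i$: the Jacobian is diagonal, so the gradient of the sum is exactly the vector of element-wise derivatives. Equivalently, the sum can be skipped by passing a vector of ones as `grad_outputs` (a vector-Jacobian product).

```python
import torch

x = torch.linspace(-5.0, 5.0, steps=100, requires_grad=True)

# Gradient of the sum: returns the element-wise derivatives because the Jacobian is diagonal
(grads,) = torch.autograd.grad(torch.tanh(x).sum(), x)

# Equivalent without the sum: vector-Jacobian product with a vector of ones
(grads_vjp,) = torch.autograd.grad(torch.tanh(x), x, grad_outputs=torch.ones_like(x))

analytic = 1 - torch.tanh(x.detach()) ** 2
max_err = (grads - analytic).abs().max().item()
print(f"max |autograd - analytic| = {max_err:.2e}")
```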

Note: you can use either ```torch.autograd.grad``` or ```y.backward()``` to compute the gradient of $y$ with respect to $x$; the latter stores the result in ```x.grad```.
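A short sketch of the practical difference between the two options: `torch.autograd.grad` returns the gradient and leaves `x.grad` untouched, whereas `backward()` writes into `x.grad` and accumulates across calls. This accumulation is why training loops reset gradients at every iteration.

```python
import torch

x = torch.linspace(-5.0, 5.0, steps=100, requires_grad=True)

# Option 1: torch.autograd.grad returns the gradient (as a tuple), x.grad is not touched
y = torch.tanh(x).sum()
(g1,) = torch.autograd.grad(y, x)
print(x.grad is None)  # True

# Option 2: y.backward() stores the gradient in x.grad
# (the graph is rebuilt because the previous call freed it)
y = torch.tanh(x).sum()
y.backward()
g2 = x.grad.clone()

# A second backward() on a fresh graph ADDS to x.grad instead of overwriting it
y = torch.tanh(x).sum()
y.backward()
doubled = x.grad.clone()   # equals 2 * g2

# Reset the accumulated gradient before any further computation
x.grad.zero_()
```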

Let's now use automatic differentiation to estimate the network parameters.

We consider the same nonlinear regression problem as previously, $y = 2 + 0.5x - 0.05x^2$, and generate $y$ by adding Gaussian random noise with standard deviation 2.

In [None]:
# Generate data
x = torch.randn(1000, 1)*10 # input variable
y_true = 2 + 0.5*x - 0.05*x**2 # true model
y = y_true + torch.randn(1000, 1)*2 # add noise to the truth

# Plot noisy data and true model
plot(x, y_true, 'r.')
plot(x, y, 'b.')
legend(['Model', 'Data'], prop={'size': 20})
title('Nonlinear regression', size=20)
xlabel('x', size=20)
ylabel('y', size=20)

Let's consider the same neural network and the MSE loss function.

In [None]:
# Declare a class for nonlinear regression
class nonlinear_regression_nn(nn.Module):
    
    # class initialization
    def __init__(self, input_size, hidden_size, output_size):
        super(nonlinear_regression_nn, self).__init__()
        # fully connected layer with linear activation
        self.fc0 = nn.Linear(input_size, hidden_size)
        # ReLU activation
        self.relu = nn.ReLU()
        # fully connected layer with linear activation
        self.fc1 = nn.Linear(hidden_size, output_size)
        
    # function to apply the neural network
    def forward(self, x):
        out = self.fc0(x)
        out = self.relu(out)
        y_pred = self.fc1(out)
        return y_pred
    
# Create the neural network (1 input size for x, 10 neurons in the hidden layer, and 1 output size for y)
nonlinear_regression_model = nonlinear_regression_nn(1, 10, 1)

# Loss function: MSE = mean [(y - y_pred)^2] (nn.MSELoss averages by default)
criterion = nn.MSELoss()
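`nn.MSELoss` uses `reduction='mean'` by default, so it averages the squared residuals rather than summing them; `reduction='sum'` gives the sum. A minimal check with hypothetical random tensors:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
y = torch.randn(1000, 1)        # hypothetical targets
y_pred = torch.randn(1000, 1)   # hypothetical predictions

mse_mean = nn.MSELoss()(y_pred, y)                 # default: reduction='mean'
mse_sum = nn.MSELoss(reduction='sum')(y_pred, y)   # sum of squared residuals

manual = ((y - y_pred) ** 2).mean()
print(mse_mean.item(), manual.item(), mse_sum.item() / y.numel())
```

The choice matters for gradient descent: with `'sum'` the gradients scale with the number of samples, so the learning rate would have to shrink accordingly.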

Write the code to perform the optimization using automatic differentiation. Look at the convergence (evolution of the loss) and compare the estimated model with the true model.

In [None]:
#############
### TO DO ###
#############
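One possible sketch for the exercise above, assuming plain full-batch gradient descent with a manually written parameter update (no `torch.optim` optimizer), in line with the goal of implementing the iterations by hand. The network is the same architecture as `nonlinear_regression_nn(1, 10, 1)`, rewritten with `nn.Sequential` only so the block is self-contained. The learning rate and number of epochs are assumptions, not tuned values; because $x$ is not standardized (its standard deviation is 10), larger steps may make the iterations diverge.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

# Same data-generating process as above
x = torch.randn(1000, 1) * 10
y_true = 2 + 0.5 * x - 0.05 * x ** 2
y = y_true + torch.randn(1000, 1) * 2

# Same architecture as nonlinear_regression_nn(1, 10, 1): Linear -> ReLU -> Linear
model = nn.Sequential(nn.Linear(1, 10), nn.ReLU(), nn.Linear(10, 1))
criterion = nn.MSELoss()

# Assumed hyperparameters (not tuned)
lr, n_epochs = 1e-4, 5000
losses = []

for epoch in range(n_epochs):
    loss = criterion(model(x), y)   # forward pass on the whole dataset
    model.zero_grad()               # clear gradients accumulated at the previous step
    loss.backward()                 # autograd: d(loss)/d(parameter) for every parameter
    with torch.no_grad():           # manual gradient-descent update, not tracked by autograd
        for p in model.parameters():
            p -= lr * p.grad
    losses.append(loss.item())

# Compare the fitted network with the true (noise-free) model
with torch.no_grad():
    fit_err = criterion(model(x), y_true).item()
print(f"initial loss {losses[0]:.2f}, final loss {losses[-1]:.2f}, MSE vs true model {fit_err:.2f}")
```

Plotting `losses` against the epoch index shows the convergence, and plotting `model(x)` next to `y_true` gives the visual comparison with the true model.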