# PyLops-GPU - extending PyTorch with linear operators

### Author: M.Ravasi

In this notebook we experiment with extending PyTorch with PyLops linear operators.

In [2]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%pylab inline

import warnings
warnings.filterwarnings('ignore')

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import scipy as sp
import matplotlib.pyplot as plt
import pylops
import pylops_gpu

from torch.autograd import gradcheck
from scipy.signal import triang
from pylops import Diagonal, MatrixMult, FirstDerivative
from pylops.utils import dottest
from pylops import Restriction

from scipy.sparse.linalg import cg
from pylops_gpu import TorchOperator
from pylops_gpu.utils.backend import device
from pylops_gpu.utils import dottest as gdottest
from pylops_gpu import Restriction as gRestriction

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Populating the interactive namespace from numpy and matplotlib


## Gradient of scalar-scalar

Let's consider the following **scalar** input and output function:

$$f(x) = (3*x)^2$$

that is expressed as:

$$y = 3*x, \quad z = y^2$$

We can thus compute the following derivatives

$$df/dx = 18 * x, \quad dy/dx = 3$$

In [3]:
x = torch.ones(1, requires_grad=True)
y = 3 * x
z = y ** 2

In [4]:
z.backward(retain_graph=True)
print(x.grad)

tensor([18.])


In [5]:
x.grad.zero_() # reset the gradient, otherwise new gradients are accumulated (summed)
y.backward(retain_graph=True)
print(x.grad)

x.grad.zero_() # reset the gradient, otherwise new gradients are accumulated (summed)
z.backward(retain_graph=True)
print(x.grad)

tensor([3.])
tensor([18.])
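
As a cross-check of the two derivatives above, the sketch below uses `torch.autograd.grad`, which returns the gradient directly instead of accumulating it in `x.grad`, so no manual reset is needed. The variable names `dz_dx` and `dy_dx` are introduced here only for illustration.

```python
import torch

x = torch.ones(1, requires_grad=True)
y = 3 * x
z = y ** 2

# torch.autograd.grad returns the gradients as a tuple and leaves x.grad untouched
(dz_dx,) = torch.autograd.grad(z, x, retain_graph=True)
(dy_dx,) = torch.autograd.grad(y, x)

print(dz_dx)  # to be compared with the analytic value 18 * x
print(dy_dx)  # to be compared with the analytic value 3
```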


## Gradient of vector-scalar

Let's do the same with a **scalar** function and a **vector** input:

$$f(\textbf{x}) = \sum (3*\textbf{x})^2$$

We can thus compute the following derivatives

$$df/dx_i = 18 * x_i$$

In [13]:
x = torch.arange(5, dtype=torch.float32, requires_grad=True)
y = 3 * x
z = torch.sum(y ** 2)

In [14]:
z.backward(retain_graph=True)
#z.backward(torch.tensor(1.), retain_graph=True)
print(x.grad)

tensor([ 0., 18., 36., 54., 72.])
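
Calling `backward()` on a scalar output implicitly seeds the backward pass with the value 1. A minimal sketch of this, assuming the same function as above: passing the seed explicitly through `grad_outputs` reproduces the analytic gradient $18 * x_i$, and scaling the seed scales the gradient. The names `g_unit`, `g_two` and `analytic` are illustrative.

```python
import torch

x = torch.arange(5, dtype=torch.float32, requires_grad=True)
z = torch.sum((3 * x) ** 2)

# explicit unit seed: same result as calling z.backward() with no argument
(g_unit,) = torch.autograd.grad(z, x, grad_outputs=torch.tensor(1.0),
                                retain_graph=True)
# a seed of 2 simply scales the gradient by 2
(g_two,) = torch.autograd.grad(z, x, grad_outputs=torch.tensor(2.0))

analytic = 18 * x.detach()
print(torch.allclose(g_unit, analytic))
print(torch.allclose(g_two, 2 * analytic))
```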


## Gradient of vector-vector

Next we consider a **vector** function and a **vector** input

$$\textbf{y} = 3*\textbf{x}^2$$

In this case `backward` does not build the Jacobian $\textbf{J}$ explicitly. Instead, given a vector $\textbf{v}$, it computes the vector-Jacobian product $\textbf{J}^T \textbf{v}$.

In our case, with $N$ the number of elements of $\textbf{x}$ and $\textbf{y}$:

$$\textbf{J} = \begin{bmatrix}
dy_1/dx_1&...&dy_1/dx_N \\
...&...&...\\
dy_N/dx_1&...&dy_N/dx_N
\end{bmatrix} = 
\begin{bmatrix}
6*x_1&...&0 \\
...&...&...\\
0&...&6*x_N
\end{bmatrix}
$$

If we choose $\textbf{v}$ to be a vector of ones:

$$
\textbf{g} = \textbf{J}^T * \textbf{v} = \begin{bmatrix} 6*x_1 \\ ...\\ 6*x_N \end{bmatrix}
$$

In [119]:
x = torch.arange(5, dtype=torch.float32, requires_grad=True)
y = 3 * (x ** 2)

In [120]:
v = torch.ones(5)
y.backward(v, retain_graph=True)
print(x)
print(x.grad)

tensor([0., 1., 2., 3., 4.], requires_grad=True)
tensor([ 0.,  6., 12., 18., 24.])
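
For a problem this small the Jacobian can also be formed explicitly and compared with the result of `backward`. The sketch below assumes a PyTorch version that provides `torch.autograd.functional.jacobian`; the helper `f` and the names `J`, `vjp` and `x_ad` are introduced only for this illustration.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # same vector function as above: y = 3 * x^2 (elementwise)
    return 3 * x ** 2


x = torch.arange(5, dtype=torch.float32)
J = jacobian(f, x)            # dense Jacobian, J[i, j] = dy_i / dx_j
v = torch.ones(5)
vjp = torch.matmul(J.T, v)    # explicit product J^T v

# same product obtained from backward, without forming J
x_ad = x.clone().requires_grad_(True)
f(x_ad).backward(v)

print(J)
print(torch.allclose(vjp, x_ad.grad))
```

Forming the dense Jacobian costs one backward pass per output element, which is why `backward` only exposes the product with a vector.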


## Gradient of matrix-vector multiplication

Let's consider now a **matrix-vector multiplication**

$$\textbf{y} = \textbf{A}\textbf{x}$$

For any matrix $\textbf{A}$ the Jacobian is the matrix itself ($\textbf{J} = \textbf{A}$), so the vector-Jacobian product returned by `backward` is:

$$\textbf{g} =\textbf{A}^T\textbf{v}$$

In [9]:
n, m = 10, 5 
A = torch.from_numpy(np.arange(n*m, dtype=np.float32).reshape(n, m))

x = torch.arange(m, dtype=torch.float32, requires_grad=True)
y = torch.matmul(A, x)

In [10]:
v = torch.ones(n)
y.backward(v, retain_graph=True)
print(x.grad)
print(torch.matmul(A.T, v))

tensor([225., 235., 245., 255., 265.])
tensor([225., 235., 245., 255., 265.])
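
The statement $\textbf{J} = \textbf{A}$ can be checked numerically. The sketch below, which assumes `torch.autograd.functional.jacobian` is available, forms the Jacobian of the matrix-vector product and also verifies the adjoint identity $\textbf{v}^T (\textbf{A}\textbf{x}) = (\textbf{A}^T\textbf{v})^T \textbf{x}$ that underlies the dot test. The names `lhs` and `rhs` are illustrative.

```python
import torch
from torch.autograd.functional import jacobian

n, m = 10, 5
A = torch.arange(n * m, dtype=torch.float32).reshape(n, m)
x = torch.arange(m, dtype=torch.float32)
v = torch.ones(n)

# Jacobian of the linear map x -> A x
J = jacobian(lambda u: torch.matmul(A, u), x)
print(torch.allclose(J, A))

# adjoint identity: <v, A x> equals <A^T v, x>
lhs = torch.dot(v, torch.matmul(A, x))
rhs = torch.dot(torch.matmul(A.T, v), x)
print(torch.allclose(lhs, rhs))
```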


If we thus have the following relation:

$$\textbf{y} = \textbf{A} (3*\textbf{x}^2)$$

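the chain rule gives $\textbf{g} = \text{diag}(6*\textbf{x}) \textbf{A}^T \textbf{v}$: we first apply $\textbf{A}^T$ to $\textbf{v}$ and then backpropagate the result through the elementwise term $3*\textbf{x}^2$.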

In [11]:
n, m = 10, 5 
A = torch.from_numpy(np.arange(n*m, dtype=np.float32).reshape(n, m))

x = torch.ones(m, dtype=torch.float32, requires_grad=True)
y = 3 * x**2
z = torch.matmul(A, y)
z

tensor([ 30., 105., 180., 255., 330., 405., 480., 555., 630., 705.],
       grad_fn=<MvBackward>)

In [12]:
v = torch.ones(n)
v1 = torch.matmul(A.T, v)
y.backward(v1, retain_graph=True)
print(x.grad)

tensor([1350., 1410., 1470., 1530., 1590.])


Compare with the full gradient from automatic differentiation (AD):

In [13]:
x.grad.zero_() # reset the gradient, otherwise new gradients are accumulated (summed)
z.backward(v, retain_graph=True)
print(x.grad)

tensor([1350., 1410., 1470., 1530., 1590.])
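
The same number can be obtained without any automatic differentiation by writing out the chain rule by hand. In this sketch the elementwise factor $6*\textbf{x}$ is the derivative of $3*\textbf{x}^2$; the names `g_manual` and `x_ad` are illustrative.

```python
import torch

n, m = 10, 5
A = torch.arange(n * m, dtype=torch.float32).reshape(n, m)
x = torch.ones(m)
v = torch.ones(n)

# hand-written chain rule: g = diag(6 x) A^T v
g_manual = 6 * x * torch.matmul(A.T, v)

# reference gradient from automatic differentiation
x_ad = x.clone().requires_grad_(True)
z = torch.matmul(A, 3 * x_ad ** 2)
z.backward(v)

print(g_manual)
print(torch.allclose(g_manual, x_ad.grad))
```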


## Gradient of linear operator

We now consider a linear operator that mimics a matrix $\textbf{A}$. Its backward pass is defined as the adjoint of the operator, and the results are compared with those of the equivalent dense matrix.

In [14]:
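class MatMult(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, R):
        # forward pass: y = R x
        y = torch.matmul(R, x)
        ctx.save_for_backward(R)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        # backward pass: adjoint applied to the incoming gradient, R^T v;
        # None is returned for R because no gradient is computed for it
        R, = ctx.saved_tensors
        return torch.matmul(R.T, grad_output), None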

In [15]:
x = torch.ones(m, dtype=torch.float32, requires_grad=True)
y = 3 * x**2
z = MatMult.apply(y, A)

In [16]:
v = torch.ones(n)
z.backward(v, retain_graph=True)
print(x.grad)

tensor([1350., 1410., 1470., 1530., 1590.])
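
A custom `backward` like the one above can be validated against finite differences with `torch.autograd.gradcheck`, which expects double-precision inputs. The sketch below repeats the `MatMult` definition so that it runs on its own; the random matrix `R` and its sizes are arbitrary choices made for this illustration.

```python
import torch
from torch.autograd import gradcheck


class MatMult(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, R):
        ctx.save_for_backward(R)
        return torch.matmul(R, x)

    @staticmethod
    def backward(ctx, grad_output):
        (R,) = ctx.saved_tensors
        # adjoint of the forward operation; no gradient for R
        return torch.matmul(R.T, grad_output), None


torch.manual_seed(0)
R = torch.randn(4, 6, dtype=torch.float64)                    # fixed operator
x = torch.randn(6, dtype=torch.float64, requires_grad=True)   # input to check

# compares the analytic backward with numerical finite differences
ok = gradcheck(MatMult.apply, (x, R), eps=1e-6, atol=1e-4)
print(ok)
```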


The same approach works for a different operator, the **Restriction** (subsampling) operator, here first built as a dense matrix:

In [17]:
# subsampling 
perc_subsampling=0.4

nsub=int(np.round(n*perc_subsampling))
iava = np.sort(np.random.permutation(np.arange(n))[:nsub])
R = np.zeros((nsub, n))
R[np.arange(nsub), iava] = 1
R = torch.from_numpy(R)

In [18]:
x = torch.arange(n, dtype=torch.float64, requires_grad=True)
y = MatMult.apply(x, R)
print(y)

# gradient
v = torch.randn(nsub, dtype=torch.float64)
y.backward(v, retain_graph=True)
print(x.grad)

tensor([0., 6., 7., 8.], dtype=torch.float64, grad_fn=<MatMultBackward>)
tensor([-0.0758,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000, -0.8599,  1.8456,
        -1.3833,  0.0000], dtype=torch.float64)


Finally we wrap the linear operator from pylops-gpu.

This could become a very general way to include all linear operators of pylops in PyTorch and to combine NN layers with physical operators.

In [19]:
Rop = gRestriction(n, iava, dtype=torch.float64)
x = torch.arange(n, dtype=torch.float64, requires_grad=True)
y = TorchOperator(Rop, pylops=False).apply(x)
print(y)

# gradient
y.backward(v, retain_graph=True)
print(x.grad)

inputs = (torch.randn(n, dtype=torch.double,requires_grad=True))
test = gradcheck(TorchOperator(Rop).apply, inputs, eps=1e-6, atol=1e-4)
print(test)

tensor([0., 6., 7., 8.], dtype=torch.float64, grad_fn=<_TorchOperatorBackward>)
tensor([-0.0758,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000, -0.8599,  1.8456,
        -1.3833,  0.0000], dtype=torch.float64)
True


The same can be done directly with a pylops operator (conversion to and from torch tensors is handled internally):

In [20]:
Rop = Restriction(n, iava, dtype=np.float64)
x = torch.arange(n, dtype=torch.float64, requires_grad=True)
y = TorchOperator(Rop, pylops=True).apply(x)
print(y)

# gradient
y.backward(v, retain_graph=True)
print(x.grad)

inputs = (torch.randn(n, dtype=torch.double,requires_grad=True))
test = gradcheck(TorchOperator(Rop).apply, inputs, eps=1e-6, atol=1e-4)
print(test)

tensor([0., 6., 7., 8.], dtype=torch.float64, grad_fn=<_TorchOperatorBackward>)
tensor([-0.0758,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000, -0.8599,  1.8456,
        -1.3833,  0.0000], dtype=torch.float64)
True
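
To make the idea behind such a wrapper concrete, here is a minimal sketch of an autograd function that takes a forward and an adjoint callable. It is not the pylops-gpu implementation of `TorchOperator`: the class `LinearOpFunction` and the helpers `restrict` and `restrict_adjoint` are hypothetical names written in plain PyTorch, and the indices in `iava` are fixed by hand.

```python
import torch
from torch.autograd import gradcheck


class LinearOpFunction(torch.autograd.Function):
    """Hypothetical wrapper: forward applies the operator, backward its adjoint."""

    @staticmethod
    def forward(ctx, x, matvec, rmatvec):
        ctx.rmatvec = rmatvec          # keep the adjoint for the backward pass
        return matvec(x)

    @staticmethod
    def backward(ctx, grad_output):
        # one return value per forward argument; the callables get no gradient
        return ctx.rmatvec(grad_output), None, None


n = 10
iava = torch.tensor([0, 6, 7, 8])      # illustrative sampling indices


def restrict(x):
    # forward of a restriction operator: pick the samples at iava
    return x[iava]


def restrict_adjoint(y):
    # adjoint: place the samples back at iava, zeros elsewhere
    out = torch.zeros(n, dtype=y.dtype)
    out[iava] = y
    return out


x = torch.arange(n, dtype=torch.float64, requires_grad=True)
y = LinearOpFunction.apply(x, restrict, restrict_adjoint)
v = torch.tensor([1.0, 2.0, 3.0, 4.0], dtype=torch.float64)
y.backward(v)
print(y)
print(x.grad)

inp = torch.randn(n, dtype=torch.float64, requires_grad=True)
ok = gradcheck(lambda u: LinearOpFunction.apply(u, restrict, restrict_adjoint),
               (inp,), eps=1e-6, atol=1e-4)
print(ok)
```

Only the action of the operator and of its adjoint is needed, so the dense matrix never has to be stored.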


## Gradient on input of NN

In [21]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [22]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 3, 3])


In [23]:
x = torch.randn(1, 1, 32, 32, requires_grad=True)
out = net(x)
print(out)

tensor([[ 0.0367,  0.0303,  0.1158, -0.0952,  0.0046, -0.0868, -0.1189, -0.0491,
          0.0411, -0.1482]], grad_fn=<AddmmBackward>)


Gradients on weights from vector output

In [24]:
net.zero_grad()
out.backward(torch.randn(1, 10))
net.conv2.weight.grad

tensor([[[[ 6.5344e-02,  7.4599e-02,  6.1744e-02],
          [ 6.8918e-02,  7.0227e-02,  8.2948e-02],
          [ 2.2136e-02,  4.4246e-02,  7.6388e-02]],

         [[ 6.4809e-02,  1.1347e-01,  8.1592e-02],
          [ 1.0299e-01,  8.3533e-02,  8.0949e-02],
          [ 5.1546e-02,  4.8394e-02,  1.0211e-01]],

         [[ 1.4843e-01,  9.3501e-02,  1.0322e-01],
          [ 5.3313e-02,  1.5026e-01,  9.6460e-02],
          [ 1.1382e-01,  1.0095e-01,  1.5473e-01]],

         [[ 2.6541e-02,  3.9705e-02,  6.7609e-02],
          [ 6.7996e-02,  4.0456e-02,  5.0336e-03],
          [ 1.3968e-02,  6.8134e-02,  5.8479e-02]],

         [[ 7.7581e-02,  1.2238e-01,  8.2076e-02],
          [ 8.9508e-02,  7.0173e-02,  8.8250e-02],
          [ 1.8027e-02,  7.7447e-02,  1.0314e-01]],

         [[ 6.4214e-02,  6.7708e-02,  5.2168e-02],
          [ 8.2369e-02,  1.1010e-01,  1.2574e-01],
          [ 3.7718e-02,  3.0779e-02,  1.1155e-01]]],


        [[[ 0.0000e+00,  0.0000e+00,  0.0000e+00],
          [ 0.000

Gradients on weights from scalar output (loss)

In [27]:
output = net(x)
target = torch.randn(10).view(1, -1)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.6220, grad_fn=<MseLossBackward>)


In [28]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear (fc3)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad of a leaf parameter of fc3 (not ReLU)

<MseLossBackward object at 0x1c2e1a0208>
<AddmmBackward object at 0x1c2e1a01d0>
<AccumulateGrad object at 0x1c2e1a0208>


In [36]:
net.zero_grad()     # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward(retain_graph=True)
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

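# second backward pass: gradients are accumulated (summed) into .grad,
# both for the network parameters and for the input x
loss.backward(retain_graph=True)
print('x.grad after backward')
print(x.grad)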

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0058,  0.0100, -0.0100,  0.0096,  0.0008, -0.0016])
x.grad after backward
tensor([[[[ 2.3789e-04,  2.8962e-04,  6.5590e-04,  ..., -8.3606e-05,
            0.0000e+00,  0.0000e+00],
          [-1.0139e-03,  8.9873e-04, -1.0501e-03,  ...,  7.2838e-04,
            0.0000e+00,  0.0000e+00],
          [-3.1228e-04,  1.6695e-03,  8.7161e-04,  ..., -2.6210e-03,
            0.0000e+00,  0.0000e+00],
          ...,
          [ 5.8365e-04,  3.1709e-04,  2.2372e-04,  ...,  1.7347e-03,
            0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
            0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
            0.0000e+00,  0.0000e+00]]]])


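Gradient on the input (model) $\textbf{x}$ from the vector output. Note that `x.grad` is not reset in the code shown here, so this result is added to the gradients accumulated by the earlier `backward` calls.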

In [37]:
v = torch.from_numpy(np.random.normal(0, 1, [1, 10]).astype(np.float32))  # plain tensor; the Variable wrapper is deprecated
output.backward(v, retain_graph=True)
print(x.grad)

tensor([[[[ 3.7889e-04, -2.0635e-04, -2.3091e-04,  ..., -1.0745e-04,
            0.0000e+00,  0.0000e+00],
          [-4.1266e-04,  3.6506e-04, -7.1320e-04,  ...,  8.8593e-04,
            0.0000e+00,  0.0000e+00],
          [-6.0151e-04,  2.6901e-03,  3.7990e-05,  ..., -5.2463e-03,
            0.0000e+00,  0.0000e+00],
          ...,
          [ 1.4937e-03,  4.9296e-04,  1.5585e-04,  ...,  3.5435e-03,
            0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
            0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
            0.0000e+00,  0.0000e+00]]]])
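
Because repeated `backward` calls sum into `x.grad`, it can be convenient to obtain the gradient on the input with `torch.autograd.grad`, which returns it without touching any `.grad` buffer. The sketch below uses a small stand-in network built with `nn.Sequential` rather than the `Net` class above, purely to keep the example short; all names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))  # stand-in network
x = torch.randn(1, 8, requires_grad=True)
out = net(x)
v = torch.randn(1, 3)

# vector-Jacobian product with respect to the input, returned directly
(g1,) = torch.autograd.grad(out, x, grad_outputs=v, retain_graph=True)
(g2,) = torch.autograd.grad(out, x, grad_outputs=v, retain_graph=True)
print(torch.allclose(g1, g2))          # repeated calls give the same result

# backward, in contrast, accumulates into x.grad
out.backward(v, retain_graph=True)
out.backward(v)
print(torch.allclose(x.grad, 2 * g1))  # two calls: twice the gradient
```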
