# **PyTorch**

PyTorch can be especially effective if you have a GPU available and CUDA installed. The GPU makes it possible to do large-scale matrix calculations (e.g. matrix multiplications) more quickly than on the CPU.



In [1]:
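import torch
import numpy as np

# Pick the device once; later cells move tensors to it with .to(dev).
if torch.cuda.is_available():
    dev="cuda:0"
    print("cuda available")
    print(torch.cuda.get_device_name())
    t = torch.cuda.get_device_properties(0).total_memory  # total GPU memory, in bytes
    c = torch.cuda.memory_reserved(0)   # bytes held by PyTorch's caching allocator
    a = torch.cuda.memory_allocated(0)  # bytes currently occupied by tensors
    print(t)
    print(c)
    print(a)
else:
    dev="cpu"
    print("cuda not available")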


cuda not available
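
Once dev is set, a tensor can be created directly on that device or moved to it with .to(dev). The following is a minimal sketch that falls back to the CPU when CUDA is not available; the tensor names a, b and c are only for illustration.

```python
import torch

# Same device selection as in the cell above.
dev = "cuda:0" if torch.cuda.is_available() else "cpu"

a = torch.ones(2, 3, device=dev)   # created directly on the device
b = torch.ones(2, 3).to(dev)       # created on the CPU, then copied to the device
c = a + b                          # both operands must live on the same device

print(c.device)
print(c.cpu().numpy())             # copy back to the CPU before converting to NumPy
```

Operations between tensors on different devices raise an error, so it is usually simplest to move all inputs to dev before computing with them.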


## **Tensors**

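In PyTorch, tensors are the basic objects to work with. A tensor is a multidimensional array: the first example below is a 2 x 3 matrix, and the second is a 2 x 2 x 3 array, indexed with one index per dimension.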

In [2]:
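# A 2 x 3 tensor (a matrix)
T=torch.tensor([[1.,2.,3.],[4.,5.,6.]])
print(T)
print(T.size())

# A 2 x 2 x 3 tensor: two 2 x 3 matrices stacked along the first dimension
T=torch.tensor([[[1.,2.,3.],[4.,5.,6.]],[[7.,8.,9.],[10.,11.,12.]]])
print(T)
print(T.size())

# One index per dimension: matrix 0, row 1, column 0
print(T[0,1,0])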

tensor([[1., 2., 3.],
        [4., 5., 6.]])
torch.Size([2, 3])
tensor([[[ 1.,  2.,  3.],
         [ 4.,  5.,  6.]],

        [[ 7.,  8.,  9.],
         [10., 11., 12.]]])
torch.Size([2, 2, 3])
tensor(4.)


## **Basic tensor vector operations**

We can add tensors of the same shape; the addition is carried out entry by entry.

In [3]:
import numpy as np
L1=np.random.choice([float(i) for i in range(10)],size=(3,2,2))
T1=torch.tensor(L1)

L2=np.random.choice([float(i) for i in range(10)],size=(3,2,2))
T2=torch.tensor(L2)

print(T1)
print(T1.size())

print(T2)
print(T2.size())

T3=T1+T2
print(T3)

tensor([[[2., 7.],
         [2., 1.]],

        [[6., 6.],
         [8., 2.]],

        [[8., 1.],
         [1., 6.]]], dtype=torch.float64)
torch.Size([3, 2, 2])
tensor([[[4., 2.],
         [1., 5.]],

        [[6., 6.],
         [2., 0.]],

        [[6., 8.],
         [9., 2.]]], dtype=torch.float64)
torch.Size([3, 2, 2])
tensor([[[ 6.,  9.],
         [ 3.,  6.]],

        [[12., 12.],
         [10.,  2.]],

        [[14.,  9.],
         [10.,  8.]]], dtype=torch.float64)
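
The dtype=torch.float64 in the output above comes from NumPy: np.random.choice over Python floats returns a 64-bit array, and torch.tensor keeps that precision. A tensor built from a Python list of floats uses PyTorch's default floating-point type instead, which is 32-bit unless the default has been changed. A short sketch of the difference (the variable names are illustrative):

```python
import numpy as np
import torch

from_list = torch.tensor([1., 2., 3.])              # Python floats -> default dtype
from_numpy = torch.tensor(np.array([1., 2., 3.]))   # NumPy float64 is preserved

print(from_list.dtype)
print(from_numpy.dtype)

# Convert explicitly when two tensors need to share a dtype.
as_float32 = from_numpy.to(torch.float32)
print(as_float32.dtype)
```

This matters in practice because some operations, torch.mm among them, expect both arguments to have the same dtype.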


We can perform scalar multiplication.

In [4]:
L1=np.random.choice([float(i) for i in range(10)],size=(3,2,2))
T1=torch.tensor(L1)
print(T1)
T2=5.*T1
print(T2)

tensor([[[3., 5.],
         [4., 2.]],

        [[1., 4.],
         [9., 0.]],

        [[4., 7.],
         [4., 4.]]], dtype=torch.float64)
tensor([[[15., 25.],
         [20., 10.]],

        [[ 5., 20.],
         [45.,  0.]],

        [[20., 35.],
         [20., 20.]]], dtype=torch.float64)


**Coordinatewise multiplication**

In [5]:
L1=np.random.choice([float(i) for i in range(10)],size=(3,2,2))
T1=torch.tensor(L1)
print(T1)
L2=np.random.choice([float(i) for i in range(10)],size=(3,2,2))
T2=torch.tensor(L2)
print(T2)

T3=T1*T2
print(T3)

tensor([[[4., 5.],
         [9., 7.]],

        [[5., 4.],
         [7., 3.]],

        [[3., 7.],
         [8., 4.]]], dtype=torch.float64)
tensor([[[9., 6.],
         [5., 1.]],

        [[6., 8.],
         [9., 5.]],

        [[7., 3.],
         [0., 9.]]], dtype=torch.float64)
tensor([[[36., 30.],
         [45.,  7.]],

        [[30., 32.],
         [63., 15.]],

        [[21., 21.],
         [ 0., 36.]]], dtype=torch.float64)


**Matrix multiplication**

A matrix is a 2-d tensor, and torch.mm is used for matrix multiplication: an M x N matrix times an N x P matrix gives an M x P matrix.

In [6]:
L1=np.random.choice([float(i) for i in range(10)],size=(3,2))
T1=torch.tensor(L1)
print(T1)
L2=np.random.choice([float(i) for i in range(10)],size=(2,4))
T2=torch.tensor(L2)
print(T2)

T3=torch.mm(T1,T2)
print(T3)

tensor([[3., 1.],
        [2., 3.],
        [9., 2.]], dtype=torch.float64)
tensor([[5., 7., 0., 9.],
        [1., 1., 9., 1.]], dtype=torch.float64)
tensor([[16., 22.,  9., 28.],
        [13., 17., 27., 21.],
        [47., 65., 18., 83.]], dtype=torch.float64)
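
To make the definition concrete, entry (i, j) of the product is the sum over k of T1[i,k]*T2[k,j]. The sketch below recomputes the product printed above with explicit loops and compares it with torch.mm and with the @ operator. The helper name mm_by_hand is hypothetical, and the loops are for illustration only since they are far slower than torch.mm.

```python
import torch

def mm_by_hand(A, B):
    """Multiply an M x N matrix by an N x P matrix using the definition."""
    M, N = A.size()
    N2, P = B.size()
    assert N == N2, "inner dimensions must agree"
    C = torch.zeros(M, P, dtype=A.dtype)
    for i in range(M):
        for j in range(P):
            for k in range(N):
                C[i, j] += A[i, k] * B[k, j]
    return C

# The two matrices printed in the output above.
T1 = torch.tensor([[3., 1.], [2., 3.], [9., 2.]])
T2 = torch.tensor([[5., 7., 0., 9.], [1., 1., 9., 1.]])

print(mm_by_hand(T1, T2))
print(torch.equal(mm_by_hand(T1, T2), torch.mm(T1, T2)))
print(torch.equal(T1 @ T2, torch.mm(T1, T2)))
```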


## **Batch matrix multiplication**

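Batch matrix multiplication refers to matrix multiplication of two "batches" of matrices.

A batch of K matrices, each M x N, is a tensor that is K x M x N.

We can batch multiply it by another batch, which must be a K x N x P tensor. The result, computed with torch.bmm, is a K x M x P tensor whose k-th matrix is the product of the k-th matrices of the two batches.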

In [7]:
L1=np.random.choice([float(i) for i in range(10)],size=(5,3,2))
T1=torch.tensor(L1)
print(T1)
L2=np.random.choice([float(i) for i in range(10)],size=(5,2,4))
T2=torch.tensor(L2)
print(T2)

T3=torch.bmm(T1,T2)
print(T3)

tensor([[[8., 6.],
         [3., 2.],
         [7., 3.]],

        [[4., 2.],
         [0., 3.],
         [1., 4.]],

        [[3., 4.],
         [1., 0.],
         [7., 7.]],

        [[6., 1.],
         [8., 9.],
         [5., 5.]],

        [[4., 1.],
         [0., 5.],
         [5., 6.]]], dtype=torch.float64)
tensor([[[0., 6., 7., 3.],
         [7., 2., 3., 1.]],

        [[4., 9., 8., 0.],
         [5., 8., 0., 1.]],

        [[3., 8., 1., 9.],
         [2., 1., 1., 1.]],

        [[5., 7., 7., 6.],
         [9., 0., 9., 5.]],

        [[0., 4., 0., 3.],
         [8., 9., 6., 6.]]], dtype=torch.float64)
tensor([[[ 42.,  60.,  74.,  30.],
         [ 14.,  22.,  27.,  11.],
         [ 21.,  48.,  58.,  24.]],

        [[ 26.,  52.,  32.,   2.],
         [ 15.,  24.,   0.,   3.],
         [ 24.,  41.,   8.,   4.]],

        [[ 17.,  28.,   7.,  31.],
         [  3.,   8.,   1.,   9.],
         [ 35.,  63.,  14.,  70.]],

        [[ 39.,  42.,  51.,  41.],
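         [121.,  56., 137.,  93.],
         [ 70.,  35.,  80.,  55.]],

        [[  8.,  25.,   6.,  18.],
         [ 40.,  45.,  30.,  30.],
         [ 48.,  74.,  36.,  51.]]], dtype=torch.float64)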
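
A batch product is the same as multiplying the matrices pair by pair. The following sketch checks this on random integer-valued batches of the same shapes as above. It uses torch.randint and a fixed seed rather than np.random.choice, which is a choice made here for reproducibility and not something taken from the cell above.

```python
import torch

torch.manual_seed(0)                     # reproducible random integers
T1 = torch.randint(0, 10, (5, 3, 2)).double()
T2 = torch.randint(0, 10, (5, 2, 4)).double()

batched = torch.bmm(T1, T2)

# Multiply the k-th matrix of T1 by the k-th matrix of T2, then restack.
one_by_one = torch.stack([torch.mm(A, B) for A, B in zip(T1, T2)])

print(batched.size())
print(torch.equal(batched, one_by_one))
```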

**matmul**

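matmul allows for more general matrix products. For two 3-d tensors it behaves like bmm. When the arguments have different numbers of dimensions, the matrix product is taken over the last two dimensions and the remaining (batch) dimensions are broadcast. In the second example below, the first tensor is treated as a 10 x 5 x 3 array of 2 x 4 matrices, each of which is multiplied by the same 4 x 10 matrix.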

In [8]:
L1=np.random.choice([float(i) for i in range(10)],size=(5,3,2))
T1=torch.tensor(L1)
L2=np.random.choice([float(i) for i in range(10)],size=(5,2,4))
T2=torch.tensor(L2)
T3=torch.matmul(T1,T2)
print(T3.size())

L1=np.random.choice([float(i) for i in range(10)],size=(10,5,3,2,4))
T1=torch.tensor(L1)
L2=np.random.choice([float(i) for i in range(10)],size=(4,10))
T2=torch.tensor(L2)

T3=torch.matmul(T1,T2)
print(T3.size())

torch.Size([5, 3, 4])
torch.Size([10, 5, 3, 2, 10])
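
The behaviour of matmul depends on the number of dimensions of its arguments. A small sketch of three common cases, using tensors of ones so that only the shapes matter (the names v, A, B and C are illustrative):

```python
import torch

v = torch.ones(3)
A = torch.ones(2, 3)
B = torch.ones(5, 2, 3)
C = torch.ones(3, 4)

print(torch.matmul(v, v).size())   # vector . vector -> scalar (dot product)
print(torch.matmul(A, v).size())   # matrix x vector -> vector of length 2
print(torch.matmul(B, C).size())   # batch of 5 matrices x one matrix -> 5 x 2 x 4
```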


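We can do some timings to see the advantages of doing calculations on the GPU. In the run shown here CUDA was not available (see the output of the first cell), so dev is "cpu" and the second timing measures torch.matmul on the CPU rather than on a GPU.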

In [9]:
import time
import torch
import numpy as np

K=25000
L=10000
M=1000

X=np.random.normal(0,1,(K,L))
Y=np.random.normal(0,1,(L,M))

start_time=time.perf_counter()
np.matmul(X,Y)
end_time=time.perf_counter()
print(end_time-start_time)

XT=torch.tensor(X)
YT=torch.tensor(Y)

XG=XT.to(dev)
YG=YT.to(dev)

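# GPU kernels are launched asynchronously, so wait for pending work
# before starting and before stopping the clock.
if dev!="cpu":
    torch.cuda.synchronize()
start_time=time.perf_counter()
ZG=torch.matmul(XG,YG)
if dev!="cpu":
    torch.cuda.synchronize()
end_time=time.perf_counter()
print(end_time-start_time)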

3.085249333991669
38.63678193092346
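
Timing GPU code takes some care. The sketch below wraps the measurement in a hypothetical helper, time_matmul, which synchronizes around the timed region when the device is a GPU and leaves the host-to-device transfer out of the measurement. It uses smaller matrices than above and 32-bit floats, both of which are choices made here so that the example runs quickly. No timings are shown because they depend entirely on the hardware.

```python
import time
import torch

def time_matmul(X, Y, device):
    """Return the seconds taken by torch.matmul(X, Y) on the given device."""
    X, Y = X.to(device), Y.to(device)   # transfer cost is excluded from the timing
    is_cuda = torch.device(device).type == "cuda"
    if is_cuda:
        torch.cuda.synchronize()        # finish pending work before starting the clock
    start = time.perf_counter()
    torch.matmul(X, Y)
    if is_cuda:
        torch.cuda.synchronize()        # wait for the asynchronous kernel to finish
    return time.perf_counter() - start

# Sizes chosen to run quickly; they are much smaller than K, L, M above.
X = torch.randn(2000, 1000)
Y = torch.randn(1000, 500)

print("cpu:", time_matmul(X, Y, "cpu"))
if torch.cuda.is_available():
    time_matmul(X, Y, "cuda:0")         # warm-up: the first CUDA call pays start-up costs
    print("gpu:", time_matmul(X, Y, "cuda:0"))
```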


**Outer product**

In [10]:
v1 = torch.arange(1, 4)    # Size 3
v2 = torch.arange(1, 3)    # Size 2
r = torch.outer(v1, v2)    # torch.ger is a deprecated alias of torch.outer
print(v1)
print(v2)
print(r)

tensor([1, 2, 3])
tensor([1, 2])
tensor([[1, 2],
        [2, 4],
        [3, 6]])
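
Entry (i, j) of the outer product is v1[i]*v2[j]. As a sketch, the same result can be obtained from coordinatewise multiplication by viewing v1 as a column and v2 as a row:

```python
import torch

v1 = torch.arange(1, 4)    # Size 3
v2 = torch.arange(1, 3)    # Size 2

# r[i, j] = v1[i] * v2[j]: a 3 x 1 column times a 1 x 2 row broadcasts to 3 x 2.
r = v1[:, None] * v2[None, :]
print(r)
print(torch.equal(r, torch.outer(v1, v2)))
```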


## **Differentiability**

Before we talk about torch autograd it is important to remind ourselves what is meant by differentiability of a function of several variables.

A function $f:\mathbb{R}^d \rightarrow \mathbb{R}$ is said to be differentiable at a point $x \in \mathbb{R}^d$ if there exists a linear function $D: \mathbb{R}^d \rightarrow \mathbb{R}$ with the property that

$$
\lim_{h\rightarrow 0} {f(x+h) - f(x) - D(h) \over \parallel h \parallel}= 0
$$

We call the function $D$ the derivative of $f$ at $x.$

When $f$ is differentiable at $x$ it is directionally differentiable, i.e. the limit

$$
\lim_{t\rightarrow 0}  {f(x+tv) - f(x) \over t}
$$

exists for any nonzero vector $v,$ and it equals $D(v).$ When we take $v = e_i,$ the unit vector $(0,0,\ldots,0,1,0,\ldots,0)$ with a 1 in position $i,$ we obtain the partial derivative

$$
\lim_{t\rightarrow 0}  {f(x+te_i) - f(x) \over t} = {\partial f \over \partial x_i}, 
$$

and the linear function $D$ is given by

$$
D(h) = \sum_{i=1}^d {\partial f \over \partial x_i} h_i
$$

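Importantly, directional differentiability of a function does not guarantee its differentiability. In order to be differentiable at a point, a function that has directional derivatives must have them depend linearly on the direction: the directional derivative along $v$ must equal $D(v).$ A convenient sufficient condition for differentiability is that the partial derivatives exist near the point and are continuous there.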

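An example of a function that has all directional derivatives at $(0,0)$ but is not differentiable there is given by

$$
f(x,y) = {yx^2 \over x^2+y^2} \quad \text{for } (x,y) \neq (0,0), \qquad f(0,0) = 0.
$$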

Away from the origin, this function has partial derivative with respect to $x$ given by

$$
{\partial f \over \partial x} = {2xy^3 \over (x^2+y^2)^2}
$$

and this partial derivative is not continuous at $(0,0).$ Indeed, its limit along a linear path approaching $(0,0)$ depends on which path you take.

Consider the path along a ray through the origin in the $\theta$ direction. Taking $x = r \cos(\theta)$ and $y = r\sin(\theta)$ for fixed $\theta$ and letting $r \rightarrow 0,$ we obtain

$$
{2xy^3 \over (x^2+y^2)^2}\biggr\rvert_{x = r\cos(\theta),y= r\sin(\theta)}
= 2 \cos(\theta) \sin^3(\theta)
$$

which depends on $\theta.$
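
The failure of differentiability at the origin can also be checked numerically. Along the direction $v = (\cos\theta, \sin\theta)$ the difference quotient at the origin equals $\sin(\theta)\cos^2(\theta),$ which is not a linear function of $v.$ The sketch below assumes $f(0,0) = 0$ and uses hypothetical helper names. It compares the directional derivative along the diagonal with the value that a linear $D$ built from the two partial derivatives would predict.

```python
import math

def f(x, y):
    """The example function, with f(0, 0) = 0 assumed at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return y * x**2 / (x**2 + y**2)

def directional_derivative_at_origin(theta, t=1e-6):
    """Difference quotient (f(t v) - f(0)) / t for v = (cos(theta), sin(theta))."""
    vx, vy = math.cos(theta), math.sin(theta)
    return (f(t * vx, t * vy) - f(0.0, 0.0)) / t

d_x = directional_derivative_at_origin(0.0)              # partial derivative in x
d_y = directional_derivative_at_origin(math.pi / 2)      # partial derivative in y
d_diag = directional_derivative_at_origin(math.pi / 4)   # direction (1, 1) / sqrt(2)

# If f were differentiable at the origin, d_diag would equal
# D(v) = d_x * cos(pi/4) + d_y * sin(pi/4).
linear_prediction = d_x * math.cos(math.pi / 4) + d_y * math.sin(math.pi / 4)
print(d_x, d_y, d_diag, linear_prediction)
```

Both partial derivatives at the origin vanish (up to rounding), so the linear prediction is zero, while the derivative along the diagonal is about 0.354.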




**Torch autograd**

A key feature of PyTorch is autograd: the ability to store information about calculations on a tensor and generate gradients automatically in code.

Gradients will be useful whenever we want to optimize some function, like a loss function when we fit a statistical model.

Here we create a tensor x and, by setting requires_grad=True, tell PyTorch to store gradient information when we create tensors that are functions of x. Ultimately, we compute the gradient of a scalar function of x.

Let's start with a simple case of a dot product with a tensor.

In [11]:
import torch

x=torch.tensor([0.,1.,2.,3.],requires_grad=True)
y=torch.tensor([2.,3.,5.,7.])
z=torch.dot(x,y)
z.backward()
print(x.grad)

tensor([2., 3., 5., 7.])
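
It can help to look at what is stored before and after the call to backward. In the sketch below, which repeats the dot product example, x.grad is empty until backward is called, and only tensors that depend on x are tracked:

```python
import torch

x = torch.tensor([0., 1., 2., 3.], requires_grad=True)
y = torch.tensor([2., 3., 5., 7.])
z = torch.dot(x, y)

print(x.grad)            # None: no gradient has been computed yet
print(z.requires_grad)   # True: z depends on x, so it is tracked
print(y.requires_grad)   # False: y was created without requires_grad

z.backward()
print(torch.equal(x.grad, y))   # the gradient of dot(x, y) with respect to x is y
```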


More complicated operations work as well. We just need to make sure that each operation is something that torch knows how to differentiate.

Here we create w as a function of x and compute the gradient of w with respect to x, which is a tensor of the partial derivatives with respect to the components of x,

$$
{\partial \over \partial x_j} w(x), \quad j = 1, 2, 3.
$$

In [12]:
import torch

x=torch.tensor([1.,2.,3.],requires_grad=True)
z=torch.sum(torch.sin(x))
u=torch.log(1+z)
w=torch.exp(-u)

w.backward()
print(x.grad)

tensor([-0.0646,  0.0498,  0.1184])
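
This result can be checked by hand. Since $w = \exp(-\log(1+z)) = 1/(1+z)$ with $z = \sum_i \sin(x_i),$ the chain rule gives ${\partial w \over \partial x_j} = -\cos(x_j)/(1+z)^2.$ A minimal sketch comparing this closed form with autograd (the names xd and by_hand are illustrative):

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
z = torch.sum(torch.sin(x))
w = torch.exp(-torch.log(1 + z))   # equals 1 / (1 + z)
w.backward()

# Closed form from the chain rule; detach() so this is plain arithmetic.
xd = x.detach()
by_hand = -torch.cos(xd) / (1 + torch.sum(torch.sin(xd)))**2

print(x.grad)
print(by_hand)
print(torch.allclose(x.grad, by_hand))
```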


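What if we also call backward on u? Note that gradients accumulate in x.grad: each call to backward adds to whatever is already stored there. The second tensor printed below is therefore the sum of the gradients of w and u, not the gradient of u alone. Passing retain_graph=True to the first call keeps the computation graph so that a second backward pass through it is possible.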

In [14]:
import torch

x=torch.tensor([1.,2.,3.],requires_grad=True)
y=x[0]**1+x[1]**2+x[2]**3
z=torch.sin(y)
u=torch.log(1+z)
w=torch.exp(-u)
w.backward(retain_graph=True)
print(x.grad)

u.backward()
print(x.grad)

tensor([-0.3466, -1.3864, -9.3580])
tensor([0.1911, 0.7645, 5.1603])
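
Because backward adds to x.grad rather than overwriting it, the stored gradient has to be cleared between the two calls if we want the gradient of u on its own. A minimal sketch using x.grad.zero_() (the names grad_w and grad_u are illustrative):

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x[0]**1 + x[1]**2 + x[2]**3
z = torch.sin(y)
u = torch.log(1 + z)
w = torch.exp(-u)

w.backward(retain_graph=True)
grad_w = x.grad.clone()     # gradient of w alone

x.grad.zero_()              # reset the accumulator before the next backward pass
u.backward()
grad_u = x.grad.clone()     # gradient of u alone

print(grad_w)
print(grad_u)
print(grad_w + grad_u)      # the accumulated value when x.grad is not reset
```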


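Gradients can be computed with respect to several tensors at once. Here both x and y are created with requires_grad=True, and a single backward pass fills in x.grad and y.grad.

In [15]: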
import torch

x=torch.tensor([1.,2.,3.],requires_grad=True)
y=torch.tensor([5.,7.,6.],requires_grad=True)
z=torch.cos(x)*torch.sin(y)
u=torch.sum(z)
u.backward(retain_graph=True)
print(x.grad)
print(y.grad)


tensor([ 0.8069, -0.5974,  0.0394])
tensor([ 0.1533, -0.3137, -0.9506])
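
As mentioned above, gradients are mainly useful for optimization. The following is a minimal sketch of gradient descent driven by autograd. The quadratic loss, the target vector, the step size and the number of iterations are all illustrative assumptions and do not come from the examples above.

```python
import torch

target = torch.tensor([1., -2., 3.])
x = torch.zeros(3, requires_grad=True)
step = 0.1

for _ in range(100):
    loss = torch.sum((x - target)**2)   # scalar loss, minimized at x = target
    loss.backward()                     # fills x.grad with d(loss)/dx
    with torch.no_grad():               # the update itself must not be tracked
        x -= step * x.grad
    x.grad.zero_()                      # gradients accumulate, so reset each step

print(x)
```

For this loss each step moves x a fixed fraction of the way towards target, so after 100 steps x agrees with target to within rounding.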
