# LELA 60342 Research Methods in Computational and Corpus Linguistics
## Week 1: Tensors and Operations on Tensors in PyTorch

### Pytorch

PyTorch (https://pytorch.org/) is a powerful library for building neural network (deep learning) models. You will be using it in CL2 this semester, and you are likely to want to use it in your own research. In these RM in CL sessions I will introduce it and supplement the work you do in CL2. We start by importing the library:

In [None]:
import torch

### Tensors in Pytorch

At its heart PyTorch (like all deep learning libraries) is a tool for manipulating tensors. You encountered tensors, in the form of NumPy arrays, in CL1 last semester. Much of what you do with them in PyTorch will seem familiar, but PyTorch adds two things that make tensors much more useful for deep learning. Firstly, PyTorch supports very efficient computation with tensors on the GPU. Secondly, PyTorch provides a package (autograd) for the automatic calculation of gradients, which are stored in the `.grad` attribute of the tensors they belong to. We will get to that shortly. First we need to create tensors.

### Creating tensors

Create from (nested) Python lists

In [None]:
torch.tensor([1,2,4,5])

In [None]:
torch.tensor([[1,2,4,5],[1,2,4,5]])
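Every tensor records its shape and data type, and PyTorch infers the type from the values you pass in. A short sketch (the variable names are just for illustration; the float result assumes the default floating-point type has not been changed):

```python
import torch

# A 1D tensor built from a list of Python integers
v = torch.tensor([1, 2, 4, 5])
# A 2D tensor built from a nested list: 2 rows, 4 columns
m = torch.tensor([[1, 2, 4, 5], [1, 2, 4, 5]])

print(v.shape)   # torch.Size([4])
print(m.shape)   # torch.Size([2, 4])
print(m.dtype)   # torch.int64, inferred because every value is an integer

# Including a float value changes the inferred type
f = torch.tensor([1.0, 2, 4, 5])
print(f.dtype)   # torch.float32, the default floating-point type
```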

Initialise by size

In [None]:
torch.zeros(3,5)

In [None]:
torch.ones(10,2)

#### Create randomly populated tensors
See here for the range of options:
https://pytorch.org/docs/stable/torch.html#random-sampling

In [None]:
# Values drawn uniformly from the interval [0, 1)
torch.rand(10)

In [None]:
# Values from normal distribution with mean of 0 and standard deviation of 1
torch.randn(10)
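Random tensors differ on every run. If you need reproducible results (e.g. to compare two versions of a model), you can fix the seed of PyTorch's random number generator with `torch.manual_seed`. A minimal sketch; the seed value 42 is arbitrary:

```python
import torch

# Fix the seed of PyTorch's random number generator
torch.manual_seed(42)
first = torch.rand(5)

# Setting the same seed again reproduces exactly the same values
torch.manual_seed(42)
second = torch.rand(5)

print(torch.equal(first, second))  # True

# torch.rand samples from the interval [0, 1)
print((first >= 0).all(), (first < 1).all())
```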

#### Create based on an existing tensor

In [None]:
a=torch.zeros(10)
torch.full_like(a,11)

In [None]:
torch.full_like(a,0.2)
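The `_like` functions copy the shape (and, unless you override it, the data type and device) of the tensor you pass in, so you do not have to spell these out again. A small sketch with an arbitrary 2×3 tensor, also showing the `zeros_like` and `ones_like` shortcuts:

```python
import torch

a = torch.zeros(2, 3)           # float32 tensor of shape (2, 3)
b = torch.full_like(a, 11)      # same shape and dtype as a, every entry 11
print(b.shape, b.dtype)         # torch.Size([2, 3]) torch.float32

# zeros_like and ones_like are shortcuts for the most common fill values
z = torch.zeros_like(b)
o = torch.ones_like(b)
print(z.sum().item(), o.sum().item())  # 0.0 6.0
```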

#### Specifying data type
You can specify the data type of your tensor. See here for the full set of PyTorch data types:

https://pytorch.org/docs/stable/tensors.html#data-types

In [None]:
torch.rand(10,dtype=torch.float32)

In [None]:
torch.rand(10,dtype=torch.float64)

In [None]:
a=torch.zeros(10)
torch.full_like(a,11,dtype=torch.int32)
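You can also convert an existing tensor to another data type with `.to()`. A short sketch (the values are arbitrary); note that the conversion returns a new tensor and that converting floats to integers truncates towards zero rather than rounding:

```python
import torch

x = torch.rand(4)                 # float32 by default
print(x.dtype)                    # torch.float32

# .to() returns a converted copy; the original tensor is unchanged
x64 = x.to(torch.float64)
print(x64.dtype, x.dtype)         # torch.float64 torch.float32

# Converting to an integer type truncates towards zero: 1, -1 and 2
ints = torch.tensor([1.7, -1.7, 2.2]).to(torch.int32)
print(ints)
```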

### Running on GPU
One of the most valuable things about PyTorch is that it can run computations on the GPU. Tensors used in such computations must be stored in GPU memory. By default, new tensors are created on the CPU; however, you can specify the device on which a tensor should be created.

In [None]:
# Create the tensor in GPU memory
a=torch.zeros(10,device="cuda")

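You can also move an existing tensor between devices with `.to()`. This returns a copy on the target device rather than changing the tensor in place, which is why the result is assigned back to the variable: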

In [None]:
a=torch.zeros(10)
a=a.to("cuda")

In [None]:
b=torch.zeros(10,device="cuda")
b=b.to("cpu")

For an operation to be applied to tensors, all the tensors involved must be on the same device. Since `a` is now on the GPU and `b` on the CPU, the following cell raises a `RuntimeError`:

In [None]:
a*b

Bear in mind that CUDA only works on NVIDIA GPUs, so it is not available on Apple computers. You can still use the GPU on a Mac, but you need to use the "mps" device instead of "cuda". See here: https://developer.apple.com/metal/pytorch/

To make your code portable between machines, you would do well to set the device based on what is available locally, as follows:

In [None]:
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():  # Apple GPUs; omit if you never use a Mac
    device = "mps"                       # omit if you never use a Mac
else:
    device = "cpu"

In [None]:
b=torch.zeros(10,device=device)
b.device
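Putting this together, a minimal sketch of the usual pattern: create every tensor involved in a computation on the chosen device, then move results back to the CPU when you need them there (for example for printing or plotting with matplotlib). The device selection is repeated so that the cell runs on its own; the tensor values are arbitrary.

```python
import torch

# Choose the best available device, falling back to the CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# Create both operands on the same device so they can be combined
a = torch.ones(10, device=device)
b = torch.full((10,), 3.0, device=device)
c = a * b                          # computed on `device`
print(c.device)

# Move the result back to the CPU
c_cpu = c.to("cpu")
print(c_cpu.device)                # cpu
```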

### Mathematical Operations on Tensors

Like NumPy, PyTorch has functions for a wide range of mathematical operations on tensors. If the tensors are on the GPU, the operations are performed on the GPU and the result is also stored on the GPU. Most operations you are likely to need are available; the full list is here:

https://pytorch.org/docs/stable/torch.html#math-operations
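A few common operations, as a quick sketch (the tensors are arbitrary examples): arithmetic operators act element-wise, functions such as `torch.exp` are applied to every element, and reductions such as `sum` and `mean` act on the whole tensor or, with the `dim` argument, along one dimension.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])

print(x + y)              # element-wise: 5, 7, 9
print(x * y)              # element-wise: 4, 10, 18
print(torch.exp(x))       # exp applied to every element
print(x.sum(), x.mean())  # reductions over the whole tensor: 6 and 2

# Reductions along one dimension of a 2D tensor
m = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(m.sum(dim=0))       # column sums: 4, 6
print(m.sum(dim=1))       # row sums: 3, 7
```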

Check that each function does what you expect, as there are some false friends. For example, `torch.dot` (unlike its NumPy equivalent) only works for 1D tensors (vectors), so the following cell raises an error:

In [None]:
a=torch.rand(3,4)
b=torch.rand(4)

torch.dot(a,b)
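For comparison, `torch.dot` succeeds when both arguments are 1D tensors with the same number of elements. A small sketch with arbitrary values:

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])

# Sum of element-wise products: 1*4 + 2*5 + 3*6 = 32
d = torch.dot(u, v)
print(d)   # tensor(32.)
```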

For the matrix-vector product there is a separate function, `torch.mv` (https://pytorch.org/docs/stable/generated/torch.mv.html):

In [None]:
torch.mv(a,b)

For the matrix-matrix product there is a separate function, `torch.mm` (https://pytorch.org/docs/stable/generated/torch.mm.html).

To transpose a 2D tensor (a matrix) x in PyTorch, use `x.T` (e.g. `x = x.T`).
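A short sketch comparing `torch.mm` and `torch.mv` with the general-purpose `torch.matmul` (also available as the `@` operator), which chooses the kind of product from the shapes of its arguments. The shapes are arbitrary examples:

```python
import torch

A = torch.rand(3, 4)
B = torch.rand(4, 2)
v = torch.rand(4)

print(torch.mm(A, B).shape)   # matrix-matrix product: torch.Size([3, 2])
print(torch.mv(A, v).shape)   # matrix-vector product: torch.Size([3])

# The @ operator (torch.matmul) handles both cases
print((A @ B).shape)          # torch.Size([3, 2])
print((A @ v).shape)          # torch.Size([3])

# .T swaps the two dimensions of a 2D tensor
print(A.T.shape)              # torch.Size([4, 3])
print((A.T @ A).shape)        # torch.Size([4, 4])
```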





### Autograd

One of the most powerful aspects of PyTorch is its automatic calculation of derivatives and partial derivatives via the chain rule. This makes backpropagation of error in neural network models much more straightforward: we don't need to calculate the derivatives ourselves, which can become a burden in complex models.

Remember that the derivative of a function measures the sensitivity of the function's output to a change in its input. We use it to change our model weights in a direction that decreases the loss.

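If we have some function f (e.g. a loss function) computed from some input tensor, then calling f.backward() makes PyTorch compute the partial derivatives (gradients) of f with respect to the tensors it was computed from and store them in their `.grad` attribute. For this to happen, we have to set the flag requires_grad=True on those input tensors. Note that backward() can be called without arguments only when f holds a single value (e.g. a scalar loss).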

Here is an example with a very simple function.



In [None]:
x=torch.tensor([1],dtype=torch.float32,requires_grad=True)
f=x*2
f.backward()
x.grad

Or, for a tensor with several elements (e.g. a vector of weights), sum the results so that f is a single value:

In [None]:
x=torch.tensor([1,1],dtype=torch.float32,requires_grad=True)
f=(x*2).sum()  # sum the elements so that f is a single value
f.backward()
x.grad

This will work for much more complex functions including composite (chains of) functions. And critically it will work for all the functions we use in neural networks.
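As a check, a small sketch comparing autograd with a derivative worked out by hand for a composite function. The function (a sigmoid of w·x + b, as in a single logistic unit) and the values are chosen purely for illustration; with these values z = 0, so both results are 0.5 · 0.5 · 2 = 0.5.

```python
import torch

# A composite function f(w) = sigmoid(w * x + b), with fixed inputs x and b
x = torch.tensor(2.0)
b = torch.tensor(-1.0)
w = torch.tensor(0.5, requires_grad=True)

f = torch.sigmoid(w * x + b)
f.backward()

# Chain rule by hand: df/dw = sigmoid(z) * (1 - sigmoid(z)) * x, where z = w * x + b
with torch.no_grad():
    s = torch.sigmoid(w * x + b)
    manual = s * (1 - s) * x

print(w.grad, manual)   # both equal 0.5
```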

We will see in future weeks that Pytorch also provides tools to perform updates for us but we can also just use these gradients to update our model in the same way that we used our manually calculated derivatives in CL1.

Bear in mind that tensors hold on to their gradients and accumulate them: each call to backward() adds the new gradients to those already stored in `.grad`. This is useful in some model types where we want to accumulate gradients over multiple forward passes (e.g. recurrent neural networks, which you will encounter soon), but for feedforward networks of the kind we have been building so far we want to clear the gradients after each update. We can do this with the following command:

    weights.grad = None
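A minimal sketch of this accumulation, using a hypothetical weight tensor `w` and the simple function sum(3·w), whose gradient is 3 for every element:

```python
import torch

w = torch.tensor([1.0, 1.0], requires_grad=True)

# First backward pass: d/dw sum(3 * w) = 3 for each element
(3 * w).sum().backward()
print(w.grad)      # tensor([3., 3.])

# A second backward pass adds to the stored gradient instead of replacing it
(3 * w).sum().backward()
print(w.grad)      # tensor([6., 6.])

# Clear the stored gradient before the next pass
w.grad = None
(3 * w).sum().backward()
print(w.grad)      # tensor([3., 3.]) again
```

An alternative is to zero the stored gradient in place with `w.grad.zero_()`.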

And if we want to update tensors for which requires_grad=True, we need to turn off gradient tracking during the update by wrapping it in a torch.no_grad() block, e.g.:

    with torch.no_grad():
        w[0] = w[0] - learning_rate * w.grad[0]
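Combining these pieces, a sketch of a plain gradient-descent loop on a toy one-parameter problem, minimising (w − 3)². The problem, learning rate and number of steps are chosen for illustration only; this is not the linear regression of Problem 1. The update uses the same indexed form as the snippet above.

```python
import torch

# Toy problem (illustrative only): find w minimising loss(w) = (w - 3)^2
w = torch.tensor([0.0], requires_grad=True)
learning_rate = 0.1

for step in range(50):
    loss = ((w - 3) ** 2).sum()   # forward pass: a single-valued loss
    loss.backward()               # fills w.grad with d loss / d w

    with torch.no_grad():         # the update itself must not be tracked
        w[0] = w[0] - learning_rate * w.grad[0]

    w.grad = None                 # clear so gradients do not accumulate

print(w)   # close to 3
```

Each step shrinks the distance to 3 by a factor of 0.8 here, so after 50 steps w is within about 0.0001 of the minimum.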

Problem 1: Write code for a linear regression model that predicts y from both features in x (the two rows of the tensor defined below), using PyTorch autograd to obtain the gradients. Print out the learning curve.

In [None]:
x=torch.tensor([[-0.6832,  0.2324, -1.2326, -0.3170,  0.3240, -1.2326, -1.5989,  0.7818,
                 -0.3170,  0.2324,  1.0565,  1.4228,  1.3312],
                [-1.5407, -1.2839, -1.0271, -0.7703, -0.5136, -0.2568,  0.0000,  0.2568,
                  0.5136,  0.7703,  1.0271,  1.2839,  1.5407]])
y=torch.tensor([33,49,41,54,52,45,36,58,45,69,55,56,68])

In [None]:
import matplotlib.pyplot as plt
plt.scatter(x[0], x[1], s=torch.exp(y/10), alpha=0.5)
plt.show()

Problem 2: Rewrite your code from Problem 1 so that it uses the GPU for all calculations.