# Linear Algebra in PyTorch

[PyTorch](https://pytorch.org/) is a popular package for developing models for deep learning.  In this section, we'll look at its linear algebra capabilities.

Even if you are not doing deep learning, you can use PyTorch for linear algebra.  One of the nice things about PyTorch is that it makes it easy to take advantage of GPU hardware, which is very efficient at certain operations.

You can find [PyTorch tutorials](https://pytorch.org/tutorials/) on the official website as well as [documentation](https://pytorch.org/docs/stable/index.html).  Note these tutorials are focused on deep learning.  This section will focus on the linear algebra capabilities.

When you install PyTorch, use the `pytorch` channel in conda.

```bash
conda install pytorch -c pytorch
```

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch
torch.__version__

'1.7.0'

## PyTorch Tensors

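PyTorch `Tensor`s are multi-dimensional arrays, much like NumPy `ndarray`s, and you can convert back and forth between the two.  Note that the `torch.Tensor` constructor casts the data to PyTorch's default dtype (`float32`), whereas NumPy's default is `float64`.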

In [2]:
A = np.random.rand(2,2)
A

array([[0.11806299, 0.74761462],
       [0.25475287, 0.65000852]])

In [3]:
B = torch.Tensor(A)
B

tensor([[0.1181, 0.7476],
        [0.2548, 0.6500]])
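
The conversion also works in the other direction, and there is more than one way to do it.  The following is a minimal sketch (the variable names `B32`, `B64`, and `A_back` are only illustrative) comparing `torch.Tensor`, which copies and casts, with `torch.from_numpy`, which keeps the dtype and shares memory with the NumPy array.

```python
import numpy as np
import torch

A = np.random.rand(2, 2)       # NumPy defaults to float64

B32 = torch.Tensor(A)          # copies the data and casts to float32
B64 = torch.from_numpy(A)      # keeps float64 and shares memory with A
A_back = B64.numpy()           # back to an ndarray, again without copying

print(B32.dtype, B64.dtype)

# Because B64 and A share memory, a change to A is visible in B64
A[0, 0] = -1.0
print(B64[0, 0].item())
```

Sharing memory avoids a copy, but it also means that in-place changes on one side show up on the other, so make an explicit copy if you need independent data.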

To put a tensor on the GPU, use `cuda()`.  Note that you must have an NVIDIA GPU in your computer (and a CUDA-enabled build of PyTorch) to be able to do this successfully.

In [4]:
torch.cuda.is_available()

True

In [5]:
Bcuda = B.cuda()
print(Bcuda)
Bcpu = Bcuda.cpu()
print(Bcpu)

tensor([[0.1181, 0.7476],
        [0.2548, 0.6500]], device='cuda:0')
tensor([[0.1181, 0.7476],
        [0.2548, 0.6500]])


In [6]:
B.to(device='cuda')

tensor([[0.1181, 0.7476],
        [0.2548, 0.6500]], device='cuda:0')

To move a tensor back to the CPU, you can use `device='cpu'`, e.g. `Bcuda.to(device='cpu')`.
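
If the same code needs to run on machines with and without a GPU, a common pattern is to pick the device once and pass it to `.to`.  This is a small sketch under that assumption; the names `Bdev` and `Bcpu` are just for illustration.

```python
import torch

# Fall back to the CPU when no CUDA device is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

B = torch.rand(2, 2)
Bdev = B.to(device=device)       # on the GPU if there is one, otherwise a no-op
Bcpu = Bdev.to(device='cpu')     # always ends up on the CPU

print(Bdev.device, Bcpu.device)
```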

## Linear Algebra Functions

PyTorch provides access to a variety of BLAS- and LAPACK-type routines; see the [documentation here](https://pytorch.org/docs/stable/torch.html#blas-and-lapack-operations).  These do not follow the BLAS/LAPACK naming conventions.

[`torch.addmv`](https://pytorch.org/docs/stable/generated/torch.addmv.html#torch-addmv) is roughly equivalent to the BLAS routine `gemv`, and computes $\beta y + \alpha A x$, which with the default $\alpha = \beta = 1$ is $Ax + y$.

In [7]:
m = 100
n = 100
device = torch.device('cuda') # 'cuda' or 'cpu'

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)

A = torch.Tensor(Anp).to(device=device)
x = torch.Tensor(xnp).to(device=device)
y = torch.Tensor(ynp).to(device=device)

z = torch.addmv(y, A, x)
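
As a quick check, `torch.addmv` can be compared against the explicit expression `y + A @ x`.  The sketch below uses small random inputs on the CPU (sizes chosen arbitrarily) and also shows the optional `beta` and `alpha` scaling arguments.

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 3)
x = torch.randn(3)
y = torch.randn(4)

# Default scaling: z = y + A x
z = torch.addmv(y, A, x)
print(torch.allclose(z, y + A @ x, atol=1e-6))

# With scaling: z2 = beta * y + alpha * (A x)
z2 = torch.addmv(y, A, x, beta=0.5, alpha=2.0)
print(torch.allclose(z2, 0.5 * y + 2.0 * (A @ x), atol=1e-6))
```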

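Let's look at the timing difference between CPU and GPU.  Keep in mind that CUDA kernels are launched asynchronously, so without a call to `torch.cuda.synchronize()` the GPU times below mostly measure launch overhead.  Also note that `torch.Tensor` uses `float32`, while the NumPy arrays are `float64`.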

In [9]:
m = 1000
n = 1000

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)
print("numpy")
%time z = ynp + Anp @ xnp


for device in ('cpu', 'cuda'):
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.Tensor(ynp).to(device=device)
    z = torch.addmv(y, A, x) # run once to warm up

    %time z = torch.addmv(y, A, x)

numpy
CPU times: user 3.62 ms, sys: 0 ns, total: 3.62 ms
Wall time: 853 µs

device = cpu
CPU times: user 1.2 ms, sys: 0 ns, total: 1.2 ms
Wall time: 352 µs

device = cuda
CPU times: user 193 µs, sys: 43 µs, total: 236 µs
Wall time: 63.9 µs
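
Because GPU work is queued asynchronously, a fairer measurement waits for the device to finish before stopping the clock.  The following is one possible sketch of such a timer; `time_op` is a hypothetical helper written for this example, not part of PyTorch, and it falls back to the CPU when no GPU is present.

```python
import time
import torch

def time_op(fn, device, repeats=10):
    """Return the average wall time of fn() in seconds."""
    fn()  # run once to warm up
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for the warm-up to finish
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for all queued kernels
    return (time.perf_counter() - start) / repeats

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
A = torch.randn(1000, 1000, device=device)
x = torch.randn(1000, device=device)
y = torch.randn(1000, device=device)

t = time_op(lambda: torch.addmv(y, A, x), device)
print(f"{device}: {t * 1e6:.1f} µs per call")
```

Averaging over several repeats also smooths out the noise that a single `%time` call is subject to.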


[`torch.mv`](https://pytorch.org/docs/stable/generated/torch.mv.html#torch.mv) performs matrix-vector products.

In [10]:
Anp = np.random.randn(m,n)
xnp = np.random.randn(n)

print("numpy")
%time z = Anp @ xnp


for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.mv(A, x) # run once to warm up

    %time y = torch.mv(A, x)

numpy
CPU times: user 3.53 ms, sys: 0 ns, total: 3.53 ms
Wall time: 812 µs

device = cpu
CPU times: user 1.66 ms, sys: 0 ns, total: 1.66 ms
Wall time: 538 µs

device = cuda
CPU times: user 265 µs, sys: 51 µs, total: 316 µs
Wall time: 90.4 µs


[`torch.mm`](https://pytorch.org/docs/stable/generated/torch.mm.html#torch.mm) performs matrix-matrix multiplications.

In [11]:
m = 1000
n = 1000

Anp = np.random.randn(m,n)
Bnp = np.random.randn(n, n)

print("numpy")
%time C = Anp @ Bnp

for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    B = torch.Tensor(Bnp).to(device=device)
    C = torch.mm(A, B) # run once to warm up

    %time C = torch.mm(A, B)

numpy
CPU times: user 112 ms, sys: 4.44 ms, total: 116 ms
Wall time: 32.3 ms

device = cpu
CPU times: user 53.2 ms, sys: 0 ns, total: 53.2 ms
Wall time: 13.5 ms

device = cuda
CPU times: user 124 µs, sys: 20 µs, total: 144 µs
Wall time: 39.6 µs
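
For 2-dimensional tensors, `torch.mm` and `torch.mv` give the same results as the `@` operator and `torch.matmul`, which may be more familiar from NumPy.  A small sketch, using `float64` tensors of arbitrary size so the comparison is not sensitive to rounding:

```python
import torch

A = torch.randn(3, 4, dtype=torch.float64)
B = torch.randn(4, 5, dtype=torch.float64)
x = torch.randn(4, dtype=torch.float64)

C = torch.mm(A, B)

# Matrix-matrix product: three equivalent spellings
print(torch.allclose(C, A @ B))
print(torch.allclose(C, torch.matmul(A, B)))

# Matrix-vector product
print(torch.allclose(torch.mv(A, x), A @ x))
```

The explicit functions are stricter about their inputs: `torch.mm` only accepts matrices and `torch.mv` only a matrix and a vector, while `@` and `torch.matmul` also handle batched and broadcast cases.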


### Batch operations

Where PyTorch (and GPUs in general) really shine is in **batch** operations.  We get extra efficiency if we do a bunch of multiplications with matrices of the same size.

For matrix-matrix multiplication, the function is [`torch.bmm`](https://pytorch.org/docs/stable/generated/torch.bmm.html#torch.bmm).

Because tensors are row-major, we want the batch index to be the first index.  In the code below, the batch multiplication is equivalent to
```python
for i in range(k):
    C[i] = A[i] @ B[i]
```

In [12]:
n = 512 # matrix size
k = 32 # batch size

Anp = np.random.randn(k, n, n)
Bnp = np.random.randn(k, n, n)
# see numpy matmul documentation for how this performs batch multiplication
print("numpy")
%time C = np.matmul(Anp, Bnp)

for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.randn(k, n, n).to(device=device)
    B = torch.randn(k, n, n).to(device=device)
    C = torch.bmm(A, B) # run once to warm up

    %time C = torch.bmm(A, B)

numpy
CPU times: user 378 ms, sys: 66.7 ms, total: 444 ms
Wall time: 113 ms

device = cpu
CPU times: user 206 ms, sys: 14.5 ms, total: 221 ms
Wall time: 56.4 ms

device = cuda
CPU times: user 350 µs, sys: 43 µs, total: 393 µs
Wall time: 398 µs
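
To confirm that `torch.bmm` matches the explicit loop over the batch index, the two can be compared directly.  This is a minimal check on small `float64` tensors (the sizes are arbitrary and much smaller than in the timing experiment above):

```python
import torch

k, n = 4, 8  # batch size and matrix size
A = torch.randn(k, n, n, dtype=torch.float64)
B = torch.randn(k, n, n, dtype=torch.float64)

# Batched product in a single call
C = torch.bmm(A, B)

# The same products, one matrix pair at a time
C_loop = torch.stack([A[i] @ B[i] for i in range(k)])

print(torch.allclose(C, C_loop))
```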


## Sparse Linear Algebra

PyTorch also supports sparse tensors in [`torch.sparse`](https://pytorch.org/docs/stable/sparse.html).  Sparse tensors are stored in [COOrdinate format](sparse.html#coordinate-format).

In [13]:
i = torch.LongTensor([[0, 1, 1],
                      [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
torch.sparse.FloatTensor(i, v, torch.Size([2,3])).to_dense()

tensor([[0., 0., 3.],
        [4., 0., 5.]])

Indices are stored in a `2 x nnz` tensor of `Long` (a datatype that stores integers), where `nnz` is the number of nonzero entries.  Values are stored as floats.
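
The same matrix can also be built with the factory function `torch.sparse_coo_tensor` and then used in a product with a dense matrix via `torch.sparse.mm`.  The sketch below reuses the indices and values from above; the dense operand `x` is just an illustrative column of ones.

```python
import torch

# Row indices in the first row, column indices in the second
i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
S = torch.sparse_coo_tensor(i, v, (2, 3))

# Sparse-dense product: each entry of y is a row sum of S
x = torch.ones(3, 1)
y = torch.sparse.mm(S, x)

print(S.to_dense())
print(y)
```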

### Exercise

Write a function that returns a sparse identity matrix of size `n` in PyTorch.