# Linear Algebra in PyTorch

[PyTorch](https://pytorch.org/) is a popular package for developing models for deep learning.  In this section, we'll look at its linear algebra capabilities.

Even if you are not doing deep learning, you can use PyTorch for linear algebra.  One of the nice things about PyTorch is that it makes it easy to take advantage of GPU hardware, which is very efficient at highly parallel operations such as large matrix multiplications.

You can find [PyTorch tutorials](https://pytorch.org/tutorials/) on the official website as well as [documentation](https://pytorch.org/docs/stable/index.html).  Note that these tutorials are focused on deep learning.  This section will focus on the linear algebra capabilities.

When you install PyTorch, use the `pytorch` channel in conda.

```bash
conda install pytorch -c pytorch
```

```python
import numpy as np
import torch
```


## PyTorch Tensors

PyTorch tensors are just multi-dimensional arrays.  You can go back and forth between these and NumPy arrays.

```python
A = np.random.rand(2,2)
A
```

```
array([[0.78570125, 0.24589094],
       [0.33821079, 0.36772354]])
```

```python
B = torch.Tensor(A)  # copies A and converts it to the default dtype (float32)
B
```

```
tensor([[0.7857, 0.2459],
        [0.3382, 0.3677]])
```
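
The conversion functions differ in whether they copy data and which dtype they produce.  Here is a small sketch comparing `torch.from_numpy` (shares memory, keeps the dtype), `torch.tensor` (always copies, infers the dtype) and the legacy `torch.Tensor` constructor used above (copies and casts to float32):

```python
import numpy as np
import torch

A = np.random.rand(2, 2)          # float64 NumPy array with values in [0, 1)

T = torch.from_numpy(A)           # shares memory with A, dtype stays float64
T_copy = torch.tensor(A)          # independent copy, dtype inferred (float64)
T_f32 = torch.Tensor(A)           # legacy constructor: copy cast to float32

A[0, 0] = 42.0                    # modify the NumPy array in place
print(T[0, 0].item())             # 42.0: T sees the change (shared memory)
print(T_copy[0, 0].item())        # unchanged: T_copy was copied before the edit
print(T.dtype, T_copy.dtype, T_f32.dtype)

# .numpy() goes the other way (CPU tensors only) and also shares memory
back = T.numpy()
```

Sharing memory avoids a copy for large arrays, but it also means in-place changes on either side are visible on the other.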

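To move a tensor to the GPU, use `cuda()`; `cpu()` moves it back to main memory.  More generally, `.to(device=...)` moves a tensor to a given device.  The GPU calls require a CUDA-capable GPU and a CUDA-enabled PyTorch build.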

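```python
# requires a CUDA-capable GPU and a CUDA-enabled PyTorch build
Bcuda = B.cuda()     # copy of B in GPU memory
print(Bcuda)
Bcpu = Bcuda.cpu()   # copy back to main memory
print(Bcpu)
```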


```python
B.to(device='cuda')
```

```
tensor([[0.8763, 0.2339],
        [0.3822, 0.8602]], device='cuda:0')
```
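
To keep a notebook runnable on machines without a GPU, a common pattern is to choose the device once and pass it around.  A minimal sketch using `torch.cuda.is_available()`:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

B = torch.rand(2, 2)                     # created on the CPU
B_dev = B.to(device)                     # returns B itself if already on `device`
print(B_dev.device)                      # cuda:0 or cpu

# Tensors can also be created directly on the target device
C = torch.rand(2, 2, device=device)

# Operands must live on the same device; move the result back to the
# CPU before handing it to NumPy
D = (B_dev @ C).cpu()
```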

## Linear Algebra Functions

PyTorch provides access to a variety of BLAS and LAPACK-type routines (see the [documentation here](https://pytorch.org/docs/stable/torch.html#blas-and-lapack-operations)).  These do not follow the BLAS/LAPACK naming conventions.
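
On the LAPACK side, a typical task is solving a dense linear system $Ax = b$.  A short sketch, assuming PyTorch 1.8 or newer (where the `torch.linalg` module is available); `torch.linalg.solve` uses an LU factorization, the same job LAPACK's `gesv` does:

```python
import torch

torch.manual_seed(0)
n = 100
A = torch.randn(n, n, dtype=torch.float64)
b = torch.randn(n, dtype=torch.float64)

# Solve A x = b via an LU factorization
x = torch.linalg.solve(A, b)

# The residual ||A x - b|| should be near machine precision
residual = torch.linalg.norm(A @ x - b)
print(residual.item())
```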

[`torch.addmv`](https://pytorch.org/docs/stable/generated/torch.addmv.html#torch-addmv) is roughly equivalent to the BLAS routine `gemv`: it computes $\beta y + \alpha A x$, with $\alpha = \beta = 1$ by default, i.e. $Ax + y$.

```python
m = 100
n = 100
device = torch.device('cuda') # 'cuda' or 'cpu'

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)

A = torch.Tensor(Anp).to(device=device)
x = torch.Tensor(xnp).to(device=device)
y = torch.Tensor(ynp).to(device=device)

z = torch.addmv(y, A, x)  # z = y + A @ x
z
```

```
tensor([ -5.9996, -13.2232,  -7.8537,  -1.2397,   7.0935,   6.2512, -21.6037,
          3.0582,   0.4784,  -0.8764,  10.3109, -11.9672,  -5.3819, -14.5128,
         10.0314, -11.4222,   5.6935,  -4.2006,   0.8691, -15.6624,  -9.7964,
        -10.4048,  -5.1958,   4.7203, -13.0604,  -2.9627,  -3.9022, -11.1822,
         -0.7486,  -1.0978,   2.1650,  16.4536,  13.0700,   5.6372, -11.6202,
          3.9144,  -9.4889,  18.1967,  -1.6189,  13.6554,   1.9952,  -5.2764,
         15.7601,  -7.3246,  -6.6940,  -0.4277,   0.4601, -20.7658,   0.2272,
         -8.2151,  22.4861, -11.9605,  -5.5331, -15.1731,  -5.3083,  -2.2013,
         21.9076,   1.1307,  -5.8735,   0.2406,  25.1729,  16.2230,   5.1621,
          2.1377,   0.6817,   1.8963,   6.3730,  -7.3287,  12.3657, -12.5891,
          5.1580,   7.1713,   9.3990,   5.3373,  -4.4808, -18.1150,  -6.8875,
          2.9506,  -5.0131,   3.2138,  -4.8178,   8.6255,   4.5671,  16.8259,
         -7.8197,  -0.9449,  13.7888,   6.7392, -15.4357,  20.34
```
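
The scaling factors are exposed as the keyword arguments `beta` and `alpha`.  A small CPU check that `torch.addmv` matches the formula written out by hand:

```python
import torch

torch.manual_seed(0)
m, n = 4, 3
A = torch.randn(m, n)
x = torch.randn(n)
y = torch.randn(m)

# torch.addmv computes beta * y + alpha * (A @ x); both scalars default to 1
z_default = torch.addmv(y, A, x)
z_scaled = torch.addmv(y, A, x, beta=0.5, alpha=2.0)

# Compare with the explicit formulas (small tolerance for float32 rounding)
ok_default = torch.allclose(z_default, A @ x + y, atol=1e-5)
ok_scaled = torch.allclose(z_scaled, 0.5 * y + 2.0 * (A @ x), atol=1e-5)
print(ok_default, ok_scaled)
```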

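Let's look at the timing difference between CPU and GPU.  Keep in mind that CUDA operations are asynchronous: the call returns as soon as the kernel has been queued, so `%time` without a `torch.cuda.synchronize()` mostly measures launch overhead, and the GPU timings below understate the actual compute time.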

```python
m = 1000
n = 1000

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)

for device in ['cpu', 'cuda']:
    print(f"device = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.Tensor(ynp).to(device=device)
    z = torch.addmv(y, A, x) # run once to warm up

    %time z = torch.addmv(y, A, x)
```

```
device = cpu
CPU times: user 239 µs, sys: 59 µs, total: 298 µs
Wall time: 302 µs
device = cuda
CPU times: user 51 µs, sys: 0 ns, total: 51 µs
Wall time: 54.4 µs
```
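
To measure the GPU computation itself, the timer has to wait for the device to finish.  A minimal sketch of such a helper; `time_op` is a hypothetical name, and the loop simply skips CUDA when no GPU is present:

```python
import time
import torch

def time_op(fn, device, repeats=100):
    """Average wall time of fn() in seconds (hypothetical helper).

    CUDA kernels run asynchronously, so torch.cuda.synchronize() is called
    before each clock reading; otherwise only launch overhead is measured.
    """
    fn()  # warm-up run (kernel selection, memory allocation, ...)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device.type == 'cuda':
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

m = n = 1000
devices = [torch.device('cpu')]
if torch.cuda.is_available():
    devices.append(torch.device('cuda'))

timings = {}
for device in devices:
    A = torch.randn(m, n, device=device)
    x = torch.randn(n, device=device)
    y = torch.randn(m, device=device)
    # the lambda is called inside this iteration, so it sees this A, x, y
    timings[device.type] = time_op(lambda: torch.addmv(y, A, x), device)
    print(f"device = {device}: {timings[device.type] * 1e6:.1f} µs per call")
```

PyTorch also provides `torch.utils.benchmark.Timer`, which takes care of CUDA synchronization for you.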


[`torch.mv`](https://pytorch.org/docs/stable/generated/torch.mv.html#torch.mv) performs matrix-vector products.

```python
# m = n = 1000 from the previous cell
Anp = np.random.randn(m,n)
xnp = np.random.randn(n)

for device in ['cpu', 'cuda']:
    print(f"device = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.mv(A, x) # run once to warm up

    %time y = torch.mv(A, x)
```

```
device = cpu
CPU times: user 207 µs, sys: 52 µs, total: 259 µs
Wall time: 262 µs
device = cuda
CPU times: user 35 µs, sys: 0 ns, total: 35 µs
Wall time: 37.9 µs
```
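
The `@` operator computes the same matrix-vector product when a 2-D tensor multiplies a 1-D tensor.  A quick check against NumPy, using `torch.from_numpy` so that everything stays in float64:

```python
import numpy as np
import torch

Anp = np.random.randn(5, 4)
xnp = np.random.randn(4)

A = torch.from_numpy(Anp)   # float64, shares memory with Anp
x = torch.from_numpy(xnp)

y_mv = torch.mv(A, x)       # explicit matrix-vector product
y_at = A @ x                # same product through torch.matmul

same_as_at = torch.allclose(y_mv, y_at)
same_as_numpy = np.allclose(y_mv.numpy(), Anp @ xnp)
print(same_as_at, same_as_numpy)
```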


[`torch.mm`](https://pytorch.org/docs/stable/generated/torch.mm.html#torch.mm) performs matrix-matrix multiplication.

```python
m = 1000
n = 1000

Anp = np.random.randn(m,n)
Bnp = np.random.randn(n, n)

for device in ['cpu', 'cuda']:
    print(f"device = {device}")
    A = torch.Tensor(Anp).to(device=device)
    B = torch.Tensor(Bnp).to(device=device)
    C = torch.mm(A, B) # run once to warm up

    %time C = torch.mm(A, B)
```

```
device = cpu
CPU times: user 31 ms, sys: 690 µs, total: 31.7 ms
Wall time: 8.24 ms
device = cuda
CPU times: user 27 µs, sys: 7 µs, total: 34 µs
Wall time: 38.6 µs
```
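
The cells above build tensors with `torch.Tensor`, which turns NumPy's float64 data into float32.  This sketch compares the resulting accuracy of `torch.mm` against a float64 NumPy reference (matrix size 200 is an arbitrary choice):

```python
import numpy as np
import torch

Anp = np.random.randn(200, 200)
Bnp = np.random.randn(200, 200)
Cnp = Anp @ Bnp                                               # float64 reference

C32 = torch.mm(torch.Tensor(Anp), torch.Tensor(Bnp))          # float32, as above
C64 = torch.mm(torch.from_numpy(Anp), torch.from_numpy(Bnp))  # stays float64

err32 = np.abs(C32.numpy() - Cnp).max()
err64 = np.abs(C64.numpy() - Cnp).max()
print(f"max abs error, float32: {err32:.2e}")
print(f"max abs error, float64: {err64:.2e}")
```

Float32 is usually what you want for speed on GPUs; switch to float64 when you need the extra digits.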


### Batch operations

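Where PyTorch (and GPUs in general) really shine is in **batch** operations.  We get extra efficiency when we perform many multiplications of matrices of the same size in a single call.

For matrix-matrix multiplication, the function is [`torch.bmm`](https://pytorch.org/docs/stable/generated/torch.bmm.html#torch.bmm).

`torch.bmm` expects the batch index to be the first index: it multiplies tensors of shape `(k, n, m)` and `(k, m, p)` to produce a `(k, n, p)` result.  Because tensors are row-major, putting the batch index first also means each matrix in the batch occupies a contiguous block of memory.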

```python
n = 512 # matrix size
k = 32 # batch size

for device in ['cpu', 'cuda']:
    print(f"device = {device}")
    A = torch.randn(k, n, n).to(device=device)
    B = torch.randn(k, n, n).to(device=device)
    C = torch.bmm(A, B) # run once to warm up

    %time C = torch.bmm(A, B)
```

```
device = cpu
CPU times: user 192 ms, sys: 13.8 ms, total: 206 ms
Wall time: 51.8 ms
device = cuda
CPU times: user 239 µs, sys: 53 µs, total: 292 µs
Wall time: 298 µs
```
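
To see what `torch.bmm` computes, here is a small CPU sketch comparing it with an explicit loop of `torch.mm` calls, plus `torch.matmul` (the `@` operator), which also treats leading dimensions as batch dimensions and can broadcast a single matrix across the batch:

```python
import torch

torch.manual_seed(0)
k, n = 4, 8                       # small batch of small matrices
A = torch.randn(k, n, n)          # batch index first: shape (k, n, n)
B = torch.randn(k, n, n)

C_bmm = torch.bmm(A, B)

# Same result with a Python loop over the batch
C_loop = torch.stack([torch.mm(A[i], B[i]) for i in range(k)])

# matmul broadcasts: one matrix M multiplies every matrix in A
M = torch.randn(n, n)
C_bcast = A @ M                   # shape (k, n, n)

print(torch.allclose(C_bmm, C_loop, atol=1e-5))
print(torch.allclose(C_bcast[1], A[1] @ M, atol=1e-5))
# With the batch index first, each A[i] is a contiguous n x n block
print(A[0].is_contiguous())
```

Note that `torch.bmm` itself does not broadcast; both arguments must carry the same batch size.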


## Sparse Linear Algebra

PyTorch also supports sparse tensors in [`torch.sparse`](https://pytorch.org/docs/stable/sparse.html).  The example below stores a tensor in COOrdinate (COO) format: a `2 x nnz` tensor of (row, column) indices plus a tensor of the `nnz` nonzero values.

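```python
# torch.sparse_coo_tensor is the recommended constructor;
# torch.sparse.FloatTensor is a legacy API
i = torch.tensor([[0, 1, 1],    # row indices
                  [2, 0, 2]])   # column indices
v = torch.tensor([3., 4., 5.])  # values of the nonzero entries
torch.sparse_coo_tensor(i, v, (2, 3)).to_dense()
```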

```
tensor([[0., 0., 3.],
        [4., 0., 5.]])
```
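
Sparse tensors are mostly useful for products with dense data without ever forming the dense matrix.  A small sketch using `torch.sparse.mm` on the same 2 x 3 matrix (the dense right-hand side `x` is an arbitrary example):

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3., 4., 5.])
# coalesce() sums duplicate indices and sorts them; .indices() requires it
S = torch.sparse_coo_tensor(i, v, (2, 3)).coalesce()

print(S.indices())   # first row: row indices, second row: column indices
print(S.values())

# Sparse-dense product: S (2 x 3) times a dense 3 x 1 matrix
x = torch.tensor([[1.], [2.], [3.]])
y = torch.sparse.mm(S, x)
print(y)             # values: [[9.], [19.]]
```

Row 0 has a single nonzero, $3 \cdot 3 = 9$; row 1 gives $4 \cdot 1 + 5 \cdot 3 = 19$.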