# Linear algebra in PyTorch

* objects
* operations

## Scalars

A scalar is a **tensor** with zero dimensions: it holds a single element and its shape (size) is empty.

In [1]:
import torch
x, y = torch.tensor(2.58), torch.tensor(1276)
x, y

(tensor(2.5800), tensor(1276))

In [5]:
x.shape, x.numel(), x.dtype, y.dtype
# a scalar has no dimensions, so its shape is empty
# numel() returns the number of elements

(torch.Size([]), 1, torch.float32, torch.int64)
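
A one-element tensor is not automatically a scalar: what matters is the number of dimensions. The following minimal sketch (the names `s`, `v`, and `value` are illustrative and not from the notes above) contrasts a 0-dim tensor with a 1-dim tensor that holds one element:

```python
import torch

s = torch.tensor(2.58)      # 0-dim tensor: a scalar
v = torch.tensor([2.58])    # 1-dim tensor that happens to hold one element

print(s.ndim, s.shape)      # 0 torch.Size([])
print(v.ndim, v.shape)      # 1 torch.Size([1])

# .item() converts a one-element tensor into a plain Python number
value = s.item()
print(type(value))          # <class 'float'>
```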

## Vectors

A vector is a 1-dimensional `torch.Tensor`.

In [6]:
x = torch.arange(5)
x

tensor([0, 1, 2, 3, 4])

In [7]:
x.shape, x.numel(), len(x), x.dtype

(torch.Size([5]), 5, 5, torch.int64)


## Matrices

A matrix is a 2-dimensional `torch.Tensor`.

In [10]:
A = torch.arange(12).view(3, 4)
# create 12 integers and view them as a 3x4 matrix
# (3 rows, 4 columns)
A

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [11]:
A.shape, A.numel(), len(A), A.dtype
# len() has no special meaning for a matrix: it returns the size of the first dimension (the number of rows)

(torch.Size([3, 4]), 12, 3, torch.int64)

## Tensors

A general tensor is a `torch.Tensor` with more than two dimensions.

In [12]:
Z = torch.arange(24).view(-1, 3, 4)
Z

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

In [13]:
Z.shape, Z.numel(), len(Z), Z.dtype
# length 2 along the first axis, then 3 rows and 4 columns: 2 * 3 * 4 = 24 elements

(torch.Size([2, 3, 4]), 24, 2, torch.int64)
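
The `-1` passed to `view` asks PyTorch to infer that dimension from the total number of elements. A short sketch of how the inference works (the tensor `W` is an illustrative addition):

```python
import torch

# -1 is inferred as 24 / (3 * 4) = 2
Z = torch.arange(24).view(-1, 3, 4)
print(Z.shape)                      # torch.Size([2, 3, 4])

# Any target shape works as long as the element count matches
W = torch.arange(24).view(4, -1)    # -1 is inferred as 24 / 4 = 6
print(W.shape)                      # torch.Size([4, 6])
```

At most one dimension can be given as `-1`, since otherwise the shape would be ambiguous.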

## Basic operations

### Transpose - flip axes

In [14]:
A, A.T

(tensor([[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]]),
 tensor([[ 0,  4,  8],
         [ 1,  5,  9],
         [ 2,  6, 10],
         [ 3,  7, 11]]))

In [15]:
A.shape, A.T.shape

(torch.Size([3, 4]), torch.Size([4, 3]))
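
To make the index swap concrete, the sketch below checks that element `(i, j)` of the transpose equals element `(j, i)` of the original, and that the transpose is a view sharing memory with `A` rather than a copy. The name `At` is illustrative.

```python
import torch

A = torch.arange(12).view(3, 4)
At = A.T

# Element (i, j) of the transpose is element (j, i) of the original
print(At[1, 2].item(), A[2, 1].item())   # 9 9

# Transposing twice gives back the original matrix
print(torch.equal(At.T, A))              # True

# The transpose is a view: writing to A is visible through At
A[0, 0] = 100
print(At[0, 0].item())                   # 100
```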

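## Reduction operations
A reduction collapses a tensor along one or more axes into fewer values. `sum` is used below; there are many others (see the PyTorch documentation).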


In [16]:
x = torch.arange(4, dtype=torch.float32)
x, x.sum()
# 0+1+2+3 = 6

(tensor([0., 1., 2., 3.]), tensor(6.))
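
`sum` is only one of several reductions. As a small sketch, here are a few others applied to the same vector; each collapses the whole tensor to a 0-dim result:

```python
import torch

x = torch.arange(4, dtype=torch.float32)   # tensor([0., 1., 2., 3.])

mean_x = x.mean()      # (0 + 1 + 2 + 3) / 4 = 1.5
max_x = x.max()        # largest element: 3.0
min_x = x.min()        # smallest element: 0.0
prod_x = x.prod()      # 0 * 1 * 2 * 3 = 0.0
argmax_x = x.argmax()  # index of the largest element: 3

print(mean_x, max_x, min_x, prod_x, argmax_x)
```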

### Reduction axis specification

In [17]:
# reduce all elements
A.shape, A.sum()

(torch.Size([3, 4]), tensor(66))

In [19]:
# reduce along axis=0: sum over the rows, leaving one value per column
A_sum_axis0 = A.sum(axis=0)
A_sum_axis0, A_sum_axis0.shape

(tensor([12, 15, 18, 21]), torch.Size([4]))

In [20]:
# reduce along axis=1: sum over the columns, leaving one value per row
A_sum_axis1 = A.sum(axis=1)
A_sum_axis1, A_sum_axis1.shape

(tensor([ 6, 22, 38]), torch.Size([3]))

In [21]:
# reduce along all axes
A.sum(axis=[0, 1]), A.sum()

(tensor(66), tensor(66))
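
The same rule extends to higher-dimensional tensors: the axis being reduced disappears from the shape. A sketch on a 3-dim tensor follows. It uses the `axis` keyword to match the cells above; `dim` is the PyTorch-native name for the same argument.

```python
import torch

Z = torch.arange(24).view(2, 3, 4)

# Reducing one axis removes it from the shape
print(Z.sum(axis=0).shape)        # torch.Size([3, 4])
print(Z.sum(axis=1).shape)        # torch.Size([2, 4])
print(Z.sum(axis=2).shape)        # torch.Size([2, 3])

# Reducing several axes at once removes all of them
print(Z.sum(axis=[0, 2]).shape)   # torch.Size([3])
```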

### Non-Reduction Sum

Reduce the elements but keep the number of axes unchanged by passing `keepdims=True` (useful for broadcasting).


In [22]:
A.sum(axis=1, keepdims=True), A.sum(axis=1, keepdims=True).shape

(tensor([[ 6],
         [22],
         [38]]),
 torch.Size([3, 1]))

In [23]:
A.sum(axis=1), A.sum(axis=1).shape

(tensor([ 6, 22, 38]), torch.Size([3]))
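
Why this helps with broadcasting: the `(3, 1)` result lines up with the `(3, 4)` matrix, so each row can be divided by its own sum. A minimal sketch, using a float copy of the matrix; the names `row_sums` and `A_normalized` are illustrative:

```python
import torch

A = torch.arange(12, dtype=torch.float32).view(3, 4)

# Shape (3, 1): one sum per row, kept as a column
row_sums = A.sum(axis=1, keepdims=True)
print(row_sums.shape)                 # torch.Size([3, 1])

# (3, 4) / (3, 1) broadcasts the column across all 4 columns of A
A_normalized = A / row_sums

# Every row of the result now sums to 1
print(A_normalized.sum(axis=1))
```

Without `keepdims=True` the sums have shape `(3,)`, which does not broadcast against `(3, 4)`, so the same division raises an error.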

## Dot Products

Given two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$:

**dot product**: $\quad \mathbf{x}^\top \mathbf{y} = \langle \mathbf{x}, \mathbf{y}  \rangle = \sum_{i=1}^{d} x_i y_i$.

In [24]:
x = torch.arange(4.)
y = torch.ones(4, dtype=torch.float32)
x, y

(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]))

In [25]:
x, y, torch.dot(x, y), (x*y).sum(), x.dot(y)
# the last three expressions compute the same dot product

(tensor([0., 1., 2., 3.]),
 tensor([1., 1., 1., 1.]),
 tensor(6.),
 tensor(6.),
 tensor(6.))
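
For comparison with the formula, the dot product can also be written as an explicit sum over indices. This is only an illustrative sketch (the name `manual` is an assumption); in practice `torch.dot` is the better choice.

```python
import torch

x = torch.arange(4.)
y = torch.ones(4, dtype=torch.float32)

# Direct translation of sum_i x_i * y_i
manual = sum(x[i] * y[i] for i in range(len(x)))

print(manual)            # tensor(6.)
print(torch.dot(x, y))   # tensor(6.)
```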

## Matrix-Vector Products

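Given a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ and a vector $\mathbf{x} \in \mathbb{R}^n$, the product $\mathbf{A}\mathbf{x} \in \mathbb{R}^m$ has as its $i$-th entry the dot product of row $i$ of $\mathbf{A}$ with $\mathbf{x}$.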

In [27]:
A = torch.randn(3,4)
A

tensor([[ 0.5195, -0.0674, -1.0511, -1.1626],
        [-0.6995, -2.1672,  0.5128,  0.7514],
        [ 0.2593, -0.8123,  0.5161,  2.2874]])

In [28]:
A.shape, x.shape, torch.mv(A, x), A.mv(x), A[2, :].dot(x)
# the last expression is the row at index 2 dotted with x, which equals the last entry of the product

(torch.Size([3, 4]),
 torch.Size([4]),
 tensor([-5.6576,  1.1126,  7.0822]),
 tensor([-5.6576,  1.1126,  7.0822]),
 tensor(7.0822))
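
The row-by-row view can be checked for every row at once. The sketch below rebuilds the matrix-vector product by stacking one dot product per row; the name `stacked` and the fixed seed are illustrative choices.

```python
import torch

torch.manual_seed(0)      # fixed seed so the run is repeatable
A = torch.randn(3, 4)
x = torch.arange(4.)

# One dot product per row of A, stacked into a vector of length 3
stacked = torch.stack([A[i].dot(x) for i in range(A.shape[0])])

print(stacked.shape)                           # torch.Size([3])
print(torch.allclose(stacked, torch.mv(A, x)))  # True
```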

## Matrix-Matrix Multiplication

Given $\mathbf{A} \in \mathbb{R}^{n \times k}$ and $\mathbf{B} \in \mathbb{R}^{k \times m}$, the product $\mathbf{C} = \mathbf{A}\mathbf{B}$ is in $\mathbb{R}^{n \times m}$:

In [26]:
n, k, m = 3, 4, 2
A, B = torch.randn(n,k), torch.randn(k, m)
C = torch.mm(A, B)
C, C.shape

(tensor([[-1.1690,  0.4869],
         [ 2.0671,  0.5512],
         [ 0.1609, -2.9475]]),
 torch.Size([3, 2]))
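
Each entry of the product is itself a dot product: entry `(i, j)` of `C` is row `i` of `A` dotted with column `j` of `B`. A small sketch that verifies this for one entry; the indices `i`, `j` and the name `entry` are illustrative.

```python
import torch

torch.manual_seed(0)      # fixed seed so the run is repeatable
n, k, m = 3, 4, 2
A, B = torch.randn(n, k), torch.randn(k, m)
C = torch.mm(A, B)

# The inner dimension k must match; the result has shape (n, m)
print(C.shape)                            # torch.Size([3, 2])

# Entry (i, j) is the dot product of row i of A and column j of B
i, j = 1, 0
entry = torch.dot(A[i, :], B[:, j])
print(torch.allclose(entry, C[i, j]))     # True
```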

`torch.matmul` is a generic function that covers all of the products above: vector-vector, matrix-vector, and matrix-matrix.

In [29]:
torch.matmul(x, y), torch.matmul(A, x), torch.matmul(A, B)

(tensor(6.),
 tensor([-5.6576,  1.1126,  7.0822]),
 tensor([[-3.8647, -1.4330],
         [ 3.1556,  0.4487],
         [ 4.4584,  0.0476]]))
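
The `@` operator is shorthand for `torch.matmul`, and `matmul` also handles batches of matrices by multiplying along the last two dimensions. A brief sketch with illustrative shapes (the batch size of 5 is an arbitrary choice):

```python
import torch

x = torch.arange(4.)
y = torch.ones(4)
A = torch.randn(3, 4)
B = torch.randn(4, 2)

print(x @ y)              # tensor(6.)  (vector-vector: dot product)
print((A @ x).shape)      # torch.Size([3])     (matrix-vector)
print((A @ B).shape)      # torch.Size([3, 2])  (matrix-matrix)

# Batched: 5 independent (3, 4) @ (4, 2) products
batch_A = torch.randn(5, 3, 4)
batch_B = torch.randn(5, 4, 2)
print((batch_A @ batch_B).shape)   # torch.Size([5, 3, 2])
```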

## Norms
$L_2$ norm $\qquad \|\mathbf{x}\| = \|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$.

In [30]:
x, torch.norm(x), torch.norm(x[:2])

(tensor([0., 1., 2., 3.]), tensor(3.7417), tensor(1.))
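
The $L_2$ norm ties back to the dot product: $\|\mathbf{x}\|_2^2 = \mathbf{x}^\top \mathbf{x}$. The sketch below checks this, and also shows `torch.linalg.vector_norm`, which recent PyTorch documentation recommends over `torch.norm`. The name `via_dot` is illustrative.

```python
import torch

x = torch.arange(4.)

# sqrt(x . x) = sqrt(0 + 1 + 4 + 9) = sqrt(14)
via_dot = torch.sqrt(torch.dot(x, x))
print(via_dot)                           # tensor(3.7417)

# Same value from the norm functions
print(torch.norm(x))                     # tensor(3.7417)
print(torch.linalg.vector_norm(x))       # tensor(3.7417)
```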

### Other norms

$L_1$ norm $\qquad \|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|$


$L_p$ norm $\qquad \|\mathbf{x}\|_p = \left(\sum_{i=1}^n \left|x_i \right|^p \right)^{1/p}$

Frobenius norm $\qquad \|\mathbf{X}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{ij}^2}$

In [31]:
x, torch.abs(x).sum(), torch.norm(x, 1), torch.norm(x, 3.)

(tensor([0., 1., 2., 3.]), tensor(6.), tensor(6.), tensor(3.3019))
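
The $L_p$ formula can be translated directly into code. `lp_norm` below is a hypothetical helper written for illustration, not a PyTorch function:

```python
import torch

def lp_norm(x: torch.Tensor, p: float) -> torch.Tensor:
    """L_p norm from the definition: (sum_i |x_i|^p)^(1/p)."""
    return (x.abs() ** p).sum() ** (1.0 / p)

x = torch.arange(4.)

print(lp_norm(x, 1))   # 0 + 1 + 2 + 3 = 6
print(lp_norm(x, 2))   # sqrt(14), about 3.7417
print(lp_norm(x, 3))   # (0 + 1 + 8 + 27)^(1/3) = 36^(1/3), about 3.3019
```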

In [32]:
A, torch.norm(A), torch.norm(A, 'fro'), torch.norm(A.flatten(), 2)

(tensor([[ 0.5195, -0.0674, -1.0511, -1.1626],
         [-0.6995, -2.1672,  0.5128,  0.7514],
         [ 0.2593, -0.8123,  0.5161,  2.2874]]),
 tensor(3.8692),
 tensor(3.8692),
 tensor(3.8692))
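
All three results agree because the Frobenius norm is just the $L_2$ norm of the matrix entries treated as one long vector. A sketch that checks the definition directly, alongside `torch.linalg.matrix_norm`, whose default order is the Frobenius norm. The name `manual` and the fixed seed are illustrative.

```python
import torch

torch.manual_seed(0)      # fixed seed so the run is repeatable
A = torch.randn(3, 4)

# Definition: square every entry, sum them all, take the square root
manual = torch.sqrt((A ** 2).sum())

print(torch.allclose(manual, torch.norm(A, 'fro')))          # True
print(torch.allclose(manual, torch.linalg.matrix_norm(A)))   # True
print(torch.allclose(manual, torch.norm(A.flatten(), 2)))    # True
```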