In [1]:
import torch

## Scalars

Scalars are implemented as tensors that contain only one element. 

In [2]:
x = torch.tensor(3.0)
y = torch.tensor(2.0)
x + y, x - y, x * y, x ** y

(tensor(5.), tensor(1.), tensor(6.), tensor(9.))

## Vectors

A vector is a fixed-length array of scalars; the scalars are the elements of the vector. 
Vectors are implemented as first-order tensors. 
<br>
In Python, as in most programming languages, vector indices start at 0 (zero-based indexing), whereas in linear algebra subscripts begin at 1 (one-based indexing).

In [3]:
x = torch.arange(3)
x

tensor([0, 1, 2])

In mathematical notation, vectors are by default written as columns, with their elements stacked vertically; PyTorch prints a first-order tensor on a single line.

In [4]:
x[2], len(x), x.shape

(tensor(2), 3, torch.Size([3]))
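
As a small illustration of the indexing convention, the sketch below maps linear-algebra subscripts onto Python indices. The variable names `first` and `last` are chosen here for illustration only.

```python
import torch

x = torch.arange(3)            # tensor([0, 1, 2])

first = x[0]                   # x_1 in linear-algebra notation
last = x[len(x) - 1]           # x_3 in linear-algebra notation, same as x[-1]

print(first.item(), last.item())   # 0 2
```

In general, the element written x<sub>i</sub> in one-based notation is `x[i - 1]` in code.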

## Matrices
Just as scalars are 0th order tensors and vectors are 1st order tensors, matrices are 2nd order tensors. 

In [5]:
A = torch.arange(6).reshape(3,2)
A

tensor([[0, 1],
        [2, 3],
        [4, 5]])

In [6]:
A.T

tensor([[0, 2, 4],
        [1, 3, 5]])

In [7]:
A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == A.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
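
The elementwise comparison above returns one boolean per entry. When a single yes/no answer is more convenient, a helper along the following lines could be used. The name `is_symmetric` is hypothetical, not a PyTorch function; the sketch relies only on `torch.equal`.

```python
import torch

def is_symmetric(M: torch.Tensor) -> bool:
    """Return True if M is a square matrix equal to its own transpose."""
    return M.ndim == 2 and M.shape[0] == M.shape[1] and torch.equal(M, M.T)

S = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])   # symmetric
R = torch.arange(6).reshape(3, 2)                     # not even square

print(is_symmetric(S), is_symmetric(R))   # True False
```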

## Tensors
Tensors give a generic way of describing nth-order arrays, extending vectors and matrices to an arbitrary number of axes. 

In [8]:
torch.arange(24).reshape(2, 3, 4)

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

## Basic Properties of Tensor Arithmetic

In [9]:
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()  # assign a copy of A to B by allocating new memory

A, B, A + B

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([[ 0.,  2.,  4.],
         [ 6.,  8., 10.]]))

In [10]:
A * B  # elementwise product, also called the Hadamard product

tensor([[ 0.,  1.,  4.],
        [ 9., 16., 25.]])

In [11]:
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape

(tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],
 
         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 torch.Size([2, 3, 4]))

## Reduction

In [12]:
x = torch.arange(3, dtype=torch.float32)
x, x.sum()

(tensor([0., 1., 2.]), tensor(3.))

In [13]:
A.shape, A.sum()

(torch.Size([2, 3]), tensor(15.))

In [14]:
A.shape, A.sum(axis=0).shape, A.sum(axis=1).shape

(torch.Size([2, 3]), torch.Size([3]), torch.Size([2]))

In [15]:
A, A.sum(axis=0), A.sum(axis=1)
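# axis=0 sums over the rows (collapses axis 0), giving one total per column
# axis=1 sums over the columns (collapses axis 1), giving one total per row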

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor([3., 5., 7.]),
 tensor([ 3., 12.]))
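
Which axis disappears is easy to mix up, so here is a minimal sketch that rebuilds both reductions by hand for the same 2×3 matrix. The names `col_sums` and `row_sums` are chosen for illustration.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Reducing over axis 0 adds the rows together: one value per column remains.
col_sums = A[0] + A[1]
# Reducing over axis 1 adds the columns together: one value per row remains.
row_sums = A[:, 0] + A[:, 1] + A[:, 2]

print(torch.equal(col_sums, A.sum(axis=0)))   # True
print(torch.equal(row_sums, A.sum(axis=1)))   # True
```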

In [16]:
A.sum(axis=[0, 1]) == A.sum()  # Same as A.sum()

tensor(True)

In [17]:
A.mean(), A.sum(), A.numel()
# numel returns total number of elements in the input tensor

(tensor(2.5000), tensor(15.), 6)

In [18]:
A.mean(), A.sum() / A.numel()

(tensor(2.5000), tensor(2.5000))

## Non-reduction Sum

In [19]:
print(A)
sum_A = A.sum(axis=1, keepdims=True)  # keepdims=True retains the reduced axis with size 1, so the result still has two axes
sum_A, sum_A.shape

tensor([[0., 1., 2.],
        [3., 4., 5.]])


(tensor([[ 3.],
         [12.]]),
 torch.Size([2, 1]))

In [20]:
A.sum(axis=1)  # without keepdims, the reduced axis is dropped

tensor([ 3., 12.])

In [21]:
# we can divide A by sum_A with broadcasting to create a matrix where each row sums to 1
A / sum_A

tensor([[0.0000, 0.3333, 0.6667],
        [0.2500, 0.3333, 0.4167]])
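
The sketch below checks that the rows of the normalized matrix really sum to 1, and shows why the kept axis matters: a (2, 3) tensor broadcasts against shape (2, 1) but not against shape (2,). The flag `broadcast_failed` is introduced here only for the demonstration.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Shape (2, 3) divided by shape (2, 1): each row is divided by its own total.
normalized = A / A.sum(axis=1, keepdims=True)
print(torch.allclose(normalized.sum(axis=1), torch.ones(2)))   # True

# Without keepdims the row totals have shape (2,), which cannot be
# broadcast against the trailing axis of size 3.
try:
    A / A.sum(axis=1)
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)   # True
```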

In [22]:
A.cumsum(axis=0)  # running total down axis 0: row i holds the sum of rows 0..i

tensor([[0., 1., 2.],
        [3., 5., 7.]])
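
To see where these values come from, the cumulative sum along axis 0 can be rebuilt with an explicit loop that keeps a running total of the rows. This is only an explanatory sketch; `running` and `manual` are illustrative names.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)

running = torch.zeros(A.shape[1])   # running total, one entry per column
rows = []
for row in A:                       # iterate over axis 0
    running = running + row
    rows.append(running)
manual = torch.stack(rows)

print(torch.equal(manual, A.cumsum(axis=0)))   # True
```

The result has the same shape as `A`, which is why the cumulative sum is not a reduction.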

## Dot Products

In [23]:
y = torch.ones(3, dtype=torch.float32)
x, y, torch.dot(x, y)

(tensor([0., 1., 2.]), tensor([1., 1., 1.]), tensor(3.))

In [24]:
# another way to calculate dot product
x * y, torch.sum(x * y)

(tensor([0., 1., 2.]), tensor(3.))

## Matrix-Vector Products

In [25]:
A.shape, x.shape, torch.mv(A,x), A@x

(torch.Size([2, 3]), torch.Size([3]), tensor([ 5., 14.]), tensor([ 5., 14.]))
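
One way to read the matrix-vector product is as one dot product per row of the matrix. The sketch below rebuilds it that way; `by_rows` is an illustrative name.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = torch.arange(3, dtype=torch.float32)

# Entry i of A @ x is the dot product of row i of A with x.
by_rows = torch.stack([torch.dot(row, x) for row in A])

print(by_rows)                       # tensor([ 5., 14.])
print(torch.equal(by_rows, A @ x))   # True
```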

## Matrix-Matrix Multiplication

In [26]:
B = torch.ones(3, 4)
torch.mm(A, B), A@B

(tensor([[ 3.,  3.,  3.,  3.],
         [12., 12., 12., 12.]]),
 tensor([[ 3.,  3.,  3.,  3.],
         [12., 12., 12., 12.]]))
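
In the same spirit, a matrix-matrix product can be viewed as a set of matrix-vector products, one per column of the right-hand matrix. A minimal sketch, with `by_cols` as an illustrative name:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = torch.ones(3, 4)

# Column j of A @ B is the matrix-vector product of A with column j of B.
by_cols = torch.stack(
    [torch.mv(A, B[:, j]) for j in range(B.shape[1])], dim=1
)

print(by_cols.shape)                      # torch.Size([2, 4])
print(torch.allclose(by_cols, A @ B))     # True
```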

## Norms
The norm of a vector tells us how big it is. The l<sub>2</sub> norm measures the (Euclidean) length of a vector. Here we are using a notion of size that concerns the magnitude of a vector's components, not its dimensionality.
<br>
A norm is a function that maps a vector to a scalar.

In [27]:
u = torch.tensor([3.0, -4.0]) # sqrt(3^2 + 4^2)
torch.norm(u)

tensor(5.)
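
To make the formula explicit, the sketch below computes the l<sub>2</sub> norm by hand (square, sum, square root) and compares it with `torch.linalg.norm`, which recent PyTorch documentation recommends over the older `torch.norm`. The name `manual_l2` is illustrative.

```python
import torch

u = torch.tensor([3.0, -4.0])

# Square each component, sum the squares, take the square root.
manual_l2 = torch.sqrt(torch.sum(u ** 2))

print(manual_l2)                                        # tensor(5.)
print(torch.allclose(manual_l2, torch.linalg.norm(u)))  # True
```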

The l<sub>1</sub> norm is the Manhattan distance: it sums the absolute values of a vector's elements.

In [28]:
torch.abs(u).sum()

tensor(7.)
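
Both norms above are special cases of the l<sub>p</sub> norm, the p-th root of the sum of the p-th powers of the absolute values. The helper `p_norm` below is a hypothetical name written for this sketch, not a PyTorch function.

```python
import torch

def p_norm(v: torch.Tensor, p: float) -> torch.Tensor:
    """l_p norm of a vector: (sum_i |v_i|^p)^(1/p), for p >= 1."""
    return torch.sum(torch.abs(v) ** p) ** (1.0 / p)

u = torch.tensor([3.0, -4.0])

print(p_norm(u, 1))   # l1 norm: |3| + |-4| = 7
print(p_norm(u, 2))   # l2 norm: sqrt(9 + 16) = 5
```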

In deep learning, we are often trying to solve optimization problems: maximize the probability assigned to observed data; maximize the revenue associated with a recommender model; minimize the distance between predictions and the ground-truth observations; minimize the distance between representations of photos of the same person while maximizing the distance between representations of photos of different people. These distances, which constitute the objectives of deep learning algorithms, are often expressed as norms.
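
As a rough illustration of a distance expressed as a norm, the sketch below measures how far a prediction vector is from its target by taking the norm of their difference. The values in `pred` and `target` are made up for the example and do not come from any model.

```python
import torch

pred = torch.tensor([2.5, 0.0, 2.0])      # made-up predictions
target = torch.tensor([3.0, -0.5, 2.0])   # made-up ground truth

diff = pred - target                      # tensor([-0.5, 0.5, 0.0])

l2_distance = torch.linalg.norm(diff)          # sqrt(0.25 + 0.25)
l1_distance = torch.linalg.norm(diff, ord=1)   # 0.5 + 0.5

print(l2_distance, l1_distance)
```

Minimizing either quantity pulls the predictions toward the targets; which norm to use is a modeling choice.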

# Questions

In [30]:
A = torch.ones(6).reshape(2, 3)
A.T.T == A  # the transpose of the transpose of a matrix is the matrix itself

tensor([[True, True, True],
        [True, True, True]])

In [31]:
B = torch.tensor([[1, 2, 3], [4, 5, 6]])

A.T + B.T == (A + B).T

tensor([[True, True],
        [True, True],
        [True, True]])

In [34]:
X = torch.ones(9).reshape(3, -1)
X + X.T  # for any square matrix X, X + X.T is symmetric (equal to its own transpose)

tensor([[2., 2., 2.],
        [2., 2., 2.],
        [2., 2., 2.]])
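
A matrix of ones is already symmetric, so it does not test the claim very hard. A minimal sketch with a random, non-symmetric square matrix (the seed is arbitrary and `M` and `S` are illustrative names):

```python
import torch

torch.manual_seed(0)       # arbitrary seed, only for reproducibility
M = torch.randn(3, 3)      # a random square matrix, generally not symmetric

S = M + M.T                # entry (i, j) is M[i, j] + M[j, i]

print(torch.equal(S, S.T))   # True
```

The check is exact because entry (i, j) and entry (j, i) of `S` are the same two numbers added in the opposite order.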

In [37]:
X = torch.ones(24).reshape(2, 3, 4)
X, len(X)

(tensor([[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]],
 
         [[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]]]),
 2)

In [40]:
A = torch.randn(160).reshape(10, 16)
B = torch.randn(16*5).reshape(16, 5)
C = torch.randn(5*14).reshape(5, 14)

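import time

# torch.dot only accepts 1-D tensors; on these 2-D tensors it raises
# "RuntimeError: 1D tensors expected, but got 2D and 2D tensors".
# Use @ (matrix multiplication) to compare the two groupings instead.
t0 = time.perf_counter()
D = (A @ B) @ C
t1 = time.perf_counter()
t2 = time.perf_counter()
E = A @ (B @ C)
t3 = time.perf_counter()

t1 - t0, t3 - t2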
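
The two groupings (AB)C and A(BC) give the same product up to floating-point rounding, but they need different numbers of scalar multiplications. Below is a rough sketch of that count, assuming the textbook cost of m·n·p multiplications for an (m×n)(n×p) product. The helper `matmul_cost` is a made-up name, and real timings also depend on the backend and hardware.

```python
import torch

def matmul_cost(m: int, n: int, p: int) -> int:
    """Scalar multiplications in a naive (m x n) @ (n x p) product."""
    return m * n * p

torch.manual_seed(0)   # arbitrary seed
A = torch.randn(10, 16)
B = torch.randn(16, 5)
C = torch.randn(5, 14)

# (A @ B) @ C: a (10x16)(16x5) product, then a (10x5)(5x14) product
cost_left = matmul_cost(10, 16, 5) + matmul_cost(10, 5, 14)
# A @ (B @ C): a (16x5)(5x14) product, then a (10x16)(16x14) product
cost_right = matmul_cost(16, 5, 14) + matmul_cost(10, 16, 14)

same = torch.allclose((A @ B) @ C, A @ (B @ C), atol=1e-3)
print(cost_left, cost_right)   # 1500 3360
print(same)
```

Under this cost model, multiplying A by B first is the cheaper grouping, because it shrinks the inner dimension to 5 early.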