http://d2l.ai/chapter_preliminaries/linear-algebra.html

We adopt the mathematical notation in which scalar variables are denoted by ordinary lowercase letters (e.g., $x$, $y$, and $z$). We denote the space of all (continuous) real-valued scalars by $\mathbb{R}$.

The expression $x \in \mathbb{R}$ is a formal way to say that $x$ is a real-valued scalar. The symbol $\in$ can be pronounced "in" and simply denotes membership in a set. Analogously, we could write $x, y \in \{0, 1\}$ to state that $x$ and $y$ are numbers whose value can only be $0$ or $1$.

In [1]:
import torch
# Scalars are 0-dimensional tensors (torch.tensor([3.0]) would be a 1-element vector)
x = torch.tensor(3.0)
y = torch.tensor(2.0)

x + y, x * y, x / y, x**y

(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))
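A scalar in PyTorch is most naturally represented as a 0-dimensional tensor, whose shape is empty, whereas wrapping the value in a list creates a vector holding a single element. The short sketch below contrasts the two and uses `.item()`, which works on any one-element tensor, to pull out a plain Python number.

```python
import torch

s = torch.tensor(3.0)    # 0-d tensor: a true scalar
v = torch.tensor([3.0])  # 1-d tensor holding a single element

print(s.shape, s.ndim)   # torch.Size([]) 0
print(v.shape, v.ndim)   # torch.Size([1]) 1
print(s.item(), v.item())  # 3.0 3.0 (item() accepts any one-element tensor)
```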

In [2]:
# Vectors
x = torch.arange(4)
x


tensor([0, 1, 2, 3])

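Extensive literature considers column vectors to be the default orientation of vectors, and so does this book. We can refer to any element of a vector by indexing into it; indexing starts at 0, so x[3] is the fourth element.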

In [3]:
x[3]

tensor(3)

In math notation, if we want to say that a vector $\mathbf{x}$ consists of $n$ real-valued scalars, we can express this as $\mathbf{x} \in \mathbb{R}^n$. The length of a vector is commonly called the dimension of the vector.

In [4]:
len(x)

4

In [5]:
x.shape

torch.Size([4])
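The word "dimension" is overloaded: a vector's dimension is its length, while a tensor's dimensionality (PyTorch's `ndim`) is its number of axes. A small check, reusing the same `arange` vector plus an illustrative 5×4 matrix, shows the difference; note that `len()` on a matrix only reports the length of axis 0.

```python
import torch

x = torch.arange(4)
print(len(x))    # 4: the vector's dimension (its length)
print(x.shape)   # torch.Size([4])
print(x.ndim)    # 1: the tensor has a single axis

A = torch.arange(20).reshape(5, 4)
print(len(A))    # 5: len() reports the size of axis 0 only
print(A.ndim)    # 2: a matrix has two axes
```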

In [6]:
# Matrices
A = torch.arange(20).reshape(5, 4)
A

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])

In [7]:
# Transpose via the .T attribute: rows become columns
A.T

tensor([[ 0,  4,  8, 12, 16],
        [ 1,  5,  9, 13, 17],
        [ 2,  6, 10, 14, 18],
        [ 3,  7, 11, 15, 19]])
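Transposing swaps the two axes, so entry $(i, j)$ of $\mathbf{A}^\top$ is entry $(j, i)$ of $\mathbf{A}$, and transposing twice gives back the original matrix. The sketch below checks both facts on the same 5×4 matrix; `torch.equal` returns a single Python bool that is `True` only when shapes and all values match.

```python
import torch

A = torch.arange(20).reshape(5, 4)
print(A.T.shape)               # torch.Size([4, 5])
print(torch.equal(A.T.T, A))   # True: transposing twice restores A
# Entry (i, j) of A.T is entry (j, i) of A
print(A.T[1, 3] == A[3, 1])    # tensor(True)
```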

As a special type of square matrix, a symmetric matrix $\mathbf{A}$ is equal to its transpose: $\mathbf{A} = \mathbf{A}^\top$.

In [9]:
B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B

tensor([[1, 2, 3],
        [2, 0, 4],
        [3, 4, 5]])

In [10]:
B == B.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
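The elementwise `==` above yields a whole tensor of booleans; to ask "is this matrix symmetric?" with a single answer, one option is `torch.equal`. The sketch also illustrates a standard fact: for any square matrix $\mathbf{C}$, the sum $\mathbf{C} + \mathbf{C}^\top$ is symmetric, because $(\mathbf{C} + \mathbf{C}^\top)^\top = \mathbf{C}^\top + \mathbf{C}$. The matrix `C` here is an arbitrary illustrative example.

```python
import torch

B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
print(torch.equal(B, B.T))   # True: one Python bool instead of a tensor of bools

C = torch.arange(9).reshape(3, 3)   # an illustrative non-symmetric matrix
print(torch.equal(C, C.T))   # False
S = C + C.T                  # (C + C^T)^T = C^T + C, so S is symmetric
print(torch.equal(S, S.T))   # True
```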

Tensors are denoted with capital letters of a special font face (e.g., $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{Z}$), and their indexing mechanism (e.g., $x_{ijk}$ and $[\mathsf{X}]_{1,2i-1,3}$) is similar to that of matrices.

In [11]:
X = torch.arange(24).reshape(2, 3, 4)
X

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
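Indexing a 3-axis tensor works like indexing a matrix, with one index per axis; keep in mind that the math notation $x_{ijk}$ usually counts from 1 while PyTorch counts from 0. Fixing some indices and slicing the rest with `:` yields lower-order tensors, as this sketch shows.

```python
import torch

X = torch.arange(24).reshape(2, 3, 4)
# One index per axis (zero-based in code)
print(X[1, 2, 3])    # tensor(23)
# Fix axes 0 and 2, slice along axis 1: a vector
print(X[0, :, 1])    # tensor([1, 5, 9])
# Fix only axis 0: the second 3x4 matrix
print(X[1].shape)    # torch.Size([3, 4])
```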

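Given any two tensors with the same shape, the result of any binary elementwise operation is a tensor of that same shape. For example, adding two matrices of the same shape performs elementwise addition over these two matrices.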

In [12]:
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()  # Assign a copy of `A` to `B` by allocating new memory
A, A + B

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]),
 tensor([[ 0.,  2.,  4.,  6.],
         [ 8., 10., 12., 14.],
         [16., 18., 20., 22.],
         [24., 26., 28., 30.],
         [32., 34., 36., 38.]]))

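Elementwise multiplication of two matrices is called their Hadamard product (math notation $\odot$). Consider a matrix $\mathbf{B} \in \mathbb{R}^{m \times n}$ whose element in row $i$ and column $j$ is $b_{ij}$. The Hadamard product of two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same shape is the $m \times n$ matrix $\mathbf{A} \odot \mathbf{B}$ whose element in row $i$ and column $j$ is $a_{ij} b_{ij}$.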
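In PyTorch the Hadamard product is written with the ordinary `*` operator, which multiplies elementwise; it should not be confused with matrix multiplication (`@` or `torch.mm`), which would reject two 5×4 operands. The sketch below reuses the float 5×4 matrix `A` and its copy `B` from the addition example, so the product is simply each element squared.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()
H = A * B            # Hadamard product: H[i, j] = A[i, j] * B[i, j]
print(H)
print(torch.equal(H, A ** 2))  # True here, only because B is a copy of A
```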
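
Likewise, adding a scalar to a tensor or multiplying a tensor by a scalar leaves the tensor's shape unchanged: each element of the tensor is added to or multiplied by the scalar.
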
In [14]:
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X, a * X, (a * X).shape

(tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],
 
         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 tensor([[[ 0,  2,  4,  6],
          [ 8, 10, 12, 14],
          [16, 18, 20, 22]],
 
         [[24, 26, 28, 30],
          [32, 34, 36, 38],
          [40, 42, 44, 46]]]),
 torch.Size([2, 3, 4]))

In [15]:
x = torch.arange(4, dtype=torch.float32)
x, x.sum()

(tensor([0., 1., 2., 3.]), tensor(6.))

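Just as the sum of the elements of a length-$d$ vector $\mathbf{x}$ is written $\sum_{i=1}^{d} x_i$, we can express sums over the elements of tensors of arbitrary shape. For example, the sum of the elements of an $m \times n$ matrix $\mathbf{A}$ could be written $\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$.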

In [16]:
A.shape, A.sum()

(torch.Size([5, 4]), tensor(190.))

By default, invoking the sum function reduces a tensor along all its axes to a scalar. We can also specify the axes along which the tensor is reduced via summation.
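Take matrices as an example. To reduce the row dimension (axis 0) by summing up the elements of all the rows, we specify axis=0 when invoking sum. Since the input matrix is reduced along axis 0 to generate the output vector, the dimension of axis 0 of the input is lost in the output shape: the 5×4 matrix A yields a vector of length 4. Specifying axis=1 instead reduces the column dimension (axis 1), so the output has length 5.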


In [17]:
A_sum_axis0 = A.sum(axis=0)
A_sum_axis0, A_sum_axis0.shape

(tensor([40., 45., 50., 55.]), torch.Size([4]))

In [18]:
A_sum_axis1 = A.sum(axis=1)
A_sum_axis1, A_sum_axis1.shape

(tensor([ 6., 22., 38., 54., 70.]), torch.Size([5]))

In [19]:
A.sum(axis=[0, 1])  # Same as A.sum()

tensor(190.)
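The same rule extends to tensors with more axes: every axis listed in `axis` disappears from the output shape, and listing all of them reproduces the full sum. A sketch on a float version of the 2×3×4 tensor `X` used earlier:

```python
import torch

X = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
print(X.sum(axis=0).shape)        # torch.Size([3, 4]): axis 0 is removed
print(X.sum(axis=1).shape)        # torch.Size([2, 4]): axis 1 is removed
print(X.sum(axis=[0, 2]).shape)   # torch.Size([3]): axes 0 and 2 are removed
print(X.sum(axis=[0, 1, 2]) == X.sum())  # tensor(True)
```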

In [20]:
A.mean(), A.sum() / A.numel()

(tensor(9.5000), tensor(9.5000))

In [21]:
A.mean(axis=0), A.sum(axis=0) / A.shape[0]

(tensor([ 8.,  9., 10., 11.]), tensor([ 8.,  9., 10., 11.]))

In [22]:
sum_A = A.sum(axis=1, keepdims=True)
sum_A

tensor([[ 6.],
        [22.],
        [38.],
        [54.],
        [70.]])

In [23]:
A / sum_A

tensor([[0.0000, 0.1667, 0.3333, 0.5000],
        [0.1818, 0.2273, 0.2727, 0.3182],
        [0.2105, 0.2368, 0.2632, 0.2895],
        [0.2222, 0.2407, 0.2593, 0.2778],
        [0.2286, 0.2429, 0.2571, 0.2714]])
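Because `keepdims=True` keeps the reduced axis with size 1, `sum_A` has shape (5, 1), and broadcasting stretches it across the four columns of `A`; each row of `A / sum_A` is therefore that row divided by its own total, so every row sums to 1. Without `keepdims`, the row sums have shape (5,), which PyTorch aligns against the trailing axis of length 4 and rejects. A sketch of both cases:

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
sum_A = A.sum(axis=1, keepdims=True)      # shape (5, 1)
print(sum_A.shape, A.sum(axis=1).shape)   # torch.Size([5, 1]) torch.Size([5])

P = A / sum_A                             # (5, 4) / (5, 1) broadcasts over columns
print(torch.allclose(P.sum(axis=1), torch.ones(5)))  # True: every row sums to 1

try:
    A / A.sum(axis=1)                     # (5, 4) / (5,): trailing sizes 4 and 5 clash
except RuntimeError as err:
    print("broadcast error:", err)
```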

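To calculate the cumulative sum of the elements of A along some axis, say axis=0 (row by row), we can call the cumsum function. Unlike sum, cumsum does not reduce the input tensor along any axis, so the output has the same shape as A.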

In [24]:
A.cumsum(axis=0)

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  6.,  8., 10.],
        [12., 15., 18., 21.],
        [24., 28., 32., 36.],
        [40., 45., 50., 55.]])
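With `axis=0`, row $i$ of the result holds the sum of rows $0$ through $i$ of `A`, so the last row equals the column totals `A.sum(axis=0)` computed earlier. The sketch checks this and confirms that the shape is preserved.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
C = A.cumsum(axis=0)
print(C.shape == A.shape)                     # True: cumsum keeps the input shape
print(torch.equal(C[-1], A.sum(axis=0)))      # True: last row holds the column totals
print(torch.equal(C[2], A[:3].sum(axis=0)))   # True: row i sums rows 0..i
```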

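Given two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, their dot product $\mathbf{x}^\top \mathbf{y}$ (or $\langle \mathbf{x}, \mathbf{y} \rangle$) is a sum over the products of the elements at the same position: $\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{d} x_i y_i$.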

In [25]:
y = torch.ones(4, dtype=torch.float32)
x, y, torch.dot(x, y)

(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))

In [26]:
# Equivalently: take the elementwise product, then sum it
torch.sum(x * y)

tensor(6.)

Dot products are useful in a wide range of contexts. For example, given some set of values, denoted by a vector $\mathbf{x} \in \mathbb{R}^d$, and a set of weights, denoted by $\mathbf{w} \in \mathbb{R}^d$, the weighted sum of the values in $\mathbf{x}$ according to the weights $\mathbf{w}$ could be expressed as the dot product $\mathbf{x}^\top \mathbf{w}$. When the weights are non-negative and sum to one (i.e., $\sum_{i=1}^{d} w_i = 1$), the dot product expresses a weighted average.
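As a concrete illustration, the sketch below computes a weighted average with `torch.dot`; the values and weights are made-up numbers chosen so the weights are non-negative and sum to one. It also shows that uniform weights $1/d$ recover the ordinary mean.

```python
import torch

x = torch.tensor([2.0, 4.0, 6.0])    # hypothetical values
w = torch.tensor([0.5, 0.25, 0.25])  # hypothetical weights: non-negative, sum to 1
weighted_avg = torch.dot(x, w)       # 2*0.5 + 4*0.25 + 6*0.25
print(weighted_avg)                  # tensor(3.5000)

# Uniform weights 1/d turn the weighted average into the ordinary mean
u = torch.full((3,), 1 / 3)
print(torch.allclose(torch.dot(x, u), x.mean()))  # True
```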

After normalizing two vectors to have unit length, their dot product expresses the cosine of the angle between them. We will formally introduce this notion of length later in this section.
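A sketch of this idea, borrowing `torch.norm` (the $\ell_2$ length defined later in this section) to scale each vector to unit length; the two vectors are illustrative choices that meet at 45 degrees.

```python
import math
import torch

x = torch.tensor([1.0, 0.0])
y = torch.tensor([1.0, 1.0])
# Scale each vector to unit length, then take the dot product
cos_theta = torch.dot(x / torch.norm(x), y / torch.norm(y))
print(cos_theta)                                   # tensor(0.7071), i.e. cos(45°)
print(math.degrees(math.acos(cos_theta.item())))   # approximately 45.0
```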

To express matrix-vector products in code with tensors, we use the mv function (torch.dot, used above for dot products, only accepts 1-D tensors). When we call torch.mv(A, x) with a matrix A and a vector x, the matrix-vector product $\mathbf{A}\mathbf{x}$ is performed. Note that the column dimension of A (its length along axis 1) must be the same as the dimension of x (its length).

In [27]:
A.shape, x.shape, torch.mv(A, x)

(torch.Size([5, 4]), torch.Size([4]), tensor([ 14.,  38.,  62.,  86., 110.]))
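Each entry of $\mathbf{A}\mathbf{x}$ is the dot product of one row of $\mathbf{A}$ with $\mathbf{x}$. The sketch below rebuilds the result row by row (iterating over a 2-D tensor yields its rows) and also checks that the `@` operator gives the same answer as `torch.mv` for a matrix and a vector.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.arange(4, dtype=torch.float32)
# Entry i of A x is the dot product of row i of A with x
rowwise = torch.stack([torch.dot(a_i, x) for a_i in A])
print(rowwise)                                # tensor([ 14.,  38.,  62.,  86., 110.])
print(torch.equal(rowwise, torch.mv(A, x)))   # True
print(torch.equal(A @ x, torch.mv(A, x)))     # True
```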

In [30]:
B = torch.ones(4, 3)
B


tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [31]:
# (5, 4) x (4, 3) -> (5, 3): the inner sizes must match
torch.mm(A, B)

tensor([[ 6.,  6.,  6.],
        [22., 22., 22.],
        [38., 38., 38.],
        [54., 54., 54.],
        [70., 70., 70.]])
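Matrix-matrix multiplication can be viewed as a batch of matrix-vector products: column $j$ of $\mathbf{A}\mathbf{B}$ is $\mathbf{A}$ times column $j$ of $\mathbf{B}$. The sketch assembles the product column by column with `torch.mv` and compares it with `torch.mm` and the `@` operator.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3)
C = torch.mm(A, B)                      # (5, 4) x (4, 3) -> (5, 3)
# Column j of A B equals A times column j of B
cols = torch.stack([torch.mv(A, B[:, j]) for j in range(B.shape[1])], dim=1)
print(torch.equal(C, cols))             # True
print(torch.equal(C, A @ B))            # True
```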

The $\ell_2$ norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is the square root of the sum of the squares of the vector elements:

$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$

In [32]:
u = torch.tensor([3.0, -4.0])
torch.norm(u)

tensor(5.)
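Following the formula step by step (square, sum, square root) reproduces what `torch.norm` returns for a vector; for $\mathbf{u} = (3, -4)$ that is $\sqrt{9 + 16} = 5$.

```python
import torch

u = torch.tensor([3.0, -4.0])
l2 = torch.sqrt((u ** 2).sum())            # square, sum, then square root
print(l2)                                  # tensor(5.)
print(torch.allclose(l2, torch.norm(u)))   # True
```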

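The $\ell_1$ norm, $\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|$, is the sum of the absolute values of the vector elements. To calculate it, we compose the absolute value function with a sum over the elements.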

In [34]:
torch.abs(u).sum()

tensor(7.)
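Both norms are special cases of the $\ell_p$ norm $\|\mathbf{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, with $p = \infty$ giving the largest absolute element. The sketch below writes this formula as a small helper (`lp_norm` is a hypothetical name, not a PyTorch function) and compares it with PyTorch's `torch.linalg.vector_norm`.

```python
import torch

def lp_norm(x: torch.Tensor, p: float) -> torch.Tensor:
    """Hypothetical helper: (sum_i |x_i|^p)^(1/p); p = inf gives max_i |x_i|."""
    if p == float("inf"):
        return x.abs().max()
    return (x.abs() ** p).sum() ** (1.0 / p)

u = torch.tensor([3.0, -4.0])
for p in (1.0, 2.0, float("inf")):
    # Compare the hand-written formula with PyTorch's built-in vector norm
    print(p, lp_norm(u, p).item(), torch.linalg.vector_norm(u, ord=p).item())
```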

Analogous to $\ell_2$ norms of vectors, the Frobenius norm of a matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ is the square root of the sum of the squares of the matrix elements:

$\|\mathbf{X}\|_{F}=\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} x_{i j}^{2}}$


In [35]:
torch.norm(torch.ones((4, 9)))

tensor(6.)
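For the 4×9 all-ones matrix the formula gives $\sqrt{36} = 6$. The sketch computes it by hand, with `torch.linalg.matrix_norm(..., ord="fro")`, and as the $\ell_2$ norm of the matrix flattened into a vector, which is exactly what the Frobenius norm is.

```python
import torch

M = torch.ones((4, 9))
fro_manual = torch.sqrt((M ** 2).sum())                # sqrt(36) = 6
fro_builtin = torch.linalg.matrix_norm(M, ord="fro")   # Frobenius norm
fro_flat = torch.norm(M.flatten())                     # l2 norm of all 36 entries
print(fro_manual.item(), fro_builtin.item(), fro_flat.item())
```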

Objectives, perhaps the most important components of deep learning algorithms (besides the data), are often expressed as norms.
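For instance, a common objective measures how far predictions are from targets by the squared $\ell_2$ norm of their difference. The numbers below are purely hypothetical; the sketch only shows how such an objective reduces to a norm computation.

```python
import torch

y_true = torch.tensor([1.0, 2.0, 3.0])   # hypothetical targets
y_pred = torch.tensor([1.5, 2.0, 2.0])   # hypothetical predictions
residual = y_pred - y_true               # [0.5, 0.0, -1.0]
loss = (residual ** 2).sum()             # ||y_pred - y_true||_2^2
print(loss)                              # tensor(1.2500)
# Same quantity via the l2 norm, up to floating-point rounding
print(torch.allclose(loss, torch.norm(residual) ** 2))  # True
```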