## Linear Algebra

Here we review why vectors, matrices, and tensors matter and how to perform algebraic operations on them, along with some related mathematical operations that come in handy during model preparation and training.

In [1]:
import torch
x, y = torch.tensor(2), torch.tensor(3.)
print(x, y) # Scalar

x = torch.arange(3)
print(x, x.shape) # Vectors

X = torch.arange(6).view((2,3))
print(X) # Matrix

X_transpose = X.T
print(X_transpose) # X_transpose_ij = X_ji

X = torch.arange(24).view((2,3,4))
print(X) # Tensor



tensor(2) tensor(3.)
tensor([0, 1, 2]) torch.Size([3])
tensor([[0, 1, 2],
        [3, 4, 5]])
tensor([[0, 3],
        [1, 4],
        [2, 5]])
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
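
The scalar, vector, matrix, and tensor above differ in their number of axes. A short check, recreating the same objects as in the cell above, prints the number of axes (`ndim`), the shape, and the element count of each, and confirms the transpose rule from the comment on explicit indices.

```python
import torch

# Objects of increasing order, built the same way as in the cell above
scalar = torch.tensor(3.)
vector = torch.arange(3)
matrix = torch.arange(6).view((2, 3))
tensor3 = torch.arange(24).view((2, 3, 4))

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor", tensor3)]:
    # ndim counts the axes; numel counts the elements
    print(name, t.ndim, tuple(t.shape), t.numel())

# Transpose swaps the two indices: matrix.T[i, j] == matrix[j, i]
mt = matrix.T
assert all(mt[i, j] == matrix[j, i] for i in range(3) for j in range(2))
```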


### Reduction and Non-Reduction of Tensors

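Calling `sum()` with no arguments reduces a tensor to a scalar by summing over all of its axes. Passing `axis` sums along that axis only, so the tensor is reduced along one dimension instead of completely.

Passing `keepdims=True` (PyTorch's own spelling is `keepdim`) keeps the summed axis as a dimension of size 1. The result then still broadcasts against the original tensor, which helps when we want to normalize along a particular direction.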

In [2]:
x = torch.arange(3, dtype=torch.float32)
x, x.sum()

(tensor([0., 1., 2.]), tensor(3.))

In [3]:
A = torch.arange(6, dtype=torch.float32).view((2,3))
A, A.sum(), A.sum(axis=0), A.sum(axis=1)

(tensor([[0., 1., 2.],
         [3., 4., 5.]]),
 tensor(15.),
 tensor([3., 5., 7.]),
 tensor([ 3., 12.]))
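
`sum` also accepts several axes at once, and `mean` is a reduction of the same kind. A minimal sketch, recreating `A` from the cell above, shows that summing over both axes of a matrix matches the full sum and that the mean equals the sum divided by the number of elements.

```python
import torch

A = torch.arange(6, dtype=torch.float32).view((2, 3))

# Summing over a tuple of axes reduces all of them at once
total_both_axes = A.sum(dim=(0, 1))
print(total_both_axes, A.sum())          # both are tensor(15.)

# mean() is another reduction: the sum divided by the element count
print(A.mean(), A.sum() / A.numel())

# Reducing along a single axis works the same way for mean
print(A.mean(dim=0))                     # column means
```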

In [4]:
sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape, A/sum_A # keepdims keeps sum_A at shape (2, 1), so each row of A is divided by its own sum and every row now sums to 1

(tensor([[ 3.],
         [12.]]),
 torch.Size([2, 1]),
 tensor([[0.0000, 0.3333, 0.6667],
         [0.2500, 0.3333, 0.4167]]))
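
To see why keeping the summed axis matters, the sketch below (recreating `A` from above, and using PyTorch's native `keepdim` spelling) divides by the row sums with and without it. Without it, the row sums have shape `(2,)`, which broadcasting aligns with the last axis of size 3, so PyTorch raises an error. The sketch also shows `cumsum`, a non-reducing sum that keeps the input's shape.

```python
import torch

A = torch.arange(6, dtype=torch.float32).view((2, 3))

row_sums_kept = A.sum(dim=1, keepdim=True)   # shape (2, 1)
row_sums_flat = A.sum(dim=1)                 # shape (2,)

normalized = A / row_sums_kept               # (2, 3) / (2, 1): broadcast across columns
print(normalized.sum(dim=1))                 # each row sums to 1

try:
    A / row_sums_flat                        # (2, 3) / (2,): trailing sizes 3 and 2 clash
except RuntimeError as err:
    print("broadcast error:", err)

# cumsum accumulates along an axis without removing it
running = A.cumsum(dim=0)
print(running, running.shape)                # same shape as A: (2, 3)
```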

### Dot Product and Matrix Multiplication

In [5]:
y = torch.ones(3, dtype = torch.float32)
x, y, torch.dot(x, y)

(tensor([0., 1., 2.]), tensor([1., 1., 1.]), tensor(3.))
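
The dot product is the sum of elementwise products. A minimal check, recreating `x` and `y` from the cells above:

```python
import torch

x = torch.arange(3, dtype=torch.float32)
y = torch.ones(3, dtype=torch.float32)

# <x, y> = sum_i x_i * y_i
manual = torch.sum(x * y)
print(torch.dot(x, y), manual)   # both tensor(3.)
```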

In [6]:
B = torch.ones(3, 4)
A@B

tensor([[ 3.,  3.,  3.,  3.],
        [12., 12., 12., 12.]])
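
`A@B` is matrix multiplication: entry `(i, j)` of the result is the dot product of row `i` of `A` with column `j` of `B`, so the inner dimensions must agree and `(2, 3) @ (3, 4)` gives `(2, 4)`. The sketch below, recreating `A` and `B`, checks that rule entry by entry and shows `torch.mv` for the matrix-vector case (the vector `v` is an arbitrary example).

```python
import torch

A = torch.arange(6, dtype=torch.float32).view((2, 3))
B = torch.ones(3, 4)

C = A @ B                      # (2, 3) @ (3, 4) -> (2, 4)
print(C.shape)

# Each entry is a dot product of a row of A with a column of B
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        assert C[i, j] == torch.dot(A[i], B[:, j])

# Matrix-vector product: each output entry is a row of A dotted with v
v = torch.tensor([1., 0., 2.])
print(torch.mv(A, v), A @ v)   # both give the same vector
```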

### Exercises

1. Prove that the transpose of the transpose of a matrix is the matrix itself:

In [8]:
A = torch.randn((3,3))
A_T = A.T
A_T_T = A_T.T

A == A_T_T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
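
The code checks the identity on one random matrix. A proof follows elementwise from the definition of the transpose, $(A^\top)_{ij} = A_{ji}$:

```latex
\left((A^\top)^\top\right)_{ij} = (A^\top)_{ji} = A_{ij} \quad \text{for all } i, j,
\qquad \text{so} \qquad (A^\top)^\top = A.
```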

2. Given two matrices A and B, show that sum and transposition commute:

In [10]:
A = torch.randn((3,3))
B = torch.randn((3,3))

A, B, (A.T + B.T) == (A + B).T

(tensor([[-0.8139, -0.6017, -1.0057],
         [ 1.2051,  1.2881,  1.1881],
         [ 0.1341, -0.5966,  1.8152]]),
 tensor([[ 0.5449,  0.4095,  1.0028],
         [-0.0169,  0.8721,  0.5611],
         [ 1.5233, -0.6360, -1.9314]]),
 tensor([[True, True, True],
         [True, True, True],
         [True, True, True]]))
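
The cell above checks only one random pair of matrices; applying the definition of the transpose entry by entry proves the identity in general:

```latex
\left((A + B)^\top\right)_{ij} = (A + B)_{ji} = A_{ji} + B_{ji}
= (A^\top)_{ij} + (B^\top)_{ij} = \left(A^\top + B^\top\right)_{ij},
\qquad \text{so} \qquad (A + B)^\top = A^\top + B^\top.
```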

4. We defined the tensor X of shape (2, 3, 4) in this section. What is the output of len(X)? Write your answer without implementing any code, then check your answer using code.
Ans: 2

In [11]:
X = torch.randn((2,3,4))
len(X)

2
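
`len` returns the size of the first axis, `X.shape[0]`, regardless of how many axes the tensor has. A quick check over a few shapes (the shapes are arbitrary choices for illustration):

```python
import torch

shapes = [(5,), (2, 3), (2, 3, 4), (7, 1, 1, 1)]
for shape in shapes:
    X = torch.zeros(shape)
    # len() looks only at the first axis
    print(shape, len(X), X.shape[0])
    assert len(X) == X.shape[0]
```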

6. Run A / A.sum(axis=1) and see what happens. Can you analyze the results?

In [16]:
A = torch.arange(9).view((3,3))
print(A)             # shape (3, 3)
print(A.sum(axis=1)) # shape (3,), not (3, 1)
# The (3,) sums broadcast as a (1, 3) row, so column j is divided by the sum of row j
A / A.sum(axis=1)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor([ 3, 12, 21])


tensor([[0.0000, 0.0833, 0.0952],
        [1.0000, 0.3333, 0.2381],
        [2.0000, 0.5833, 0.3810]])
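
The result is not a row normalization. `A.sum(axis=1)` has shape `(3,)`, and broadcasting aligns it with the last axis of `A`, so it behaves like a `(1, 3)` row: column `j` is divided by the sum of row `j` (for example, the 1 in row 0, column 1 is divided by 12, the sum of row 1, giving 0.0833). The division only runs at all because `A` is square; for a `(2, 3)` matrix the shapes `(2, 3)` and `(2,)` would not broadcast. The sketch below reproduces the behavior and contrasts it with row normalization.

```python
import torch

A = torch.arange(9, dtype=torch.float32).view((3, 3))
row_sums = A.sum(dim=1)                  # shape (3,): [3., 12., 21.]

# What A / A.sum(axis=1) computes: column j divided by row_sums[j]
broadcast_result = A / row_sums
by_columns = A / row_sums.view(1, 3)     # explicit (1, 3) row, same result
print(torch.equal(broadcast_result, by_columns))

# Row normalization needs the sums as a (3, 1) column
row_normalized = A / A.sum(dim=1, keepdim=True)
print(row_normalized.sum(dim=1))         # each row sums to 1
```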

8. Consider a tensor of shape (2, 3, 4). What are the shapes of the summation outputs along axes 0, 1, and 2?

In [21]:
A = torch.arange(24).reshape((2,3,4))
print(A)
print(A.sum(axis=0), A.sum(axis=0).shape)
print(A.sum(axis=1), A.sum(axis=1).shape)

print(A.sum(axis=2), A.sum(axis=2).shape)

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
tensor([[12, 14, 16, 18],
        [20, 22, 24, 26],
        [28, 30, 32, 34]]) torch.Size([3, 4])
tensor([[12, 15, 18, 21],
        [48, 51, 54, 57]]) torch.Size([2, 4])
tensor([[ 6, 22, 38],
        [54, 70, 86]]) torch.Size([2, 3])
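
The pattern generalizes: summing along axis `k` removes that axis from the shape, or leaves it with size 1 when `keepdim=True`. A short loop confirms this for the `(2, 3, 4)` tensor:

```python
import torch

X = torch.arange(24).reshape((2, 3, 4))
for axis in range(X.ndim):
    reduced = X.sum(dim=axis)
    kept = X.sum(dim=axis, keepdim=True)
    # Expected shape: the original shape with axis `axis` removed
    expected = tuple(X.shape[:axis]) + tuple(X.shape[axis + 1:])
    print(axis, tuple(reduced.shape), tuple(kept.shape))
    assert tuple(reduced.shape) == expected
```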


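Consider three large matrices, say $A \in \mathbb{R}^{2^{10} \times 2^{16}}$, $B \in \mathbb{R}^{2^{16} \times 2^{5}}$, and $C \in \mathbb{R}^{2^{5} \times 2^{14}}$, initialized with Gaussian random variables. You want to compute the product $ABC$. Is there any difference in memory footprint and speed depending on whether you compute $(AB)C$ or $A(BC)$? Why?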

Ans: Response by BARD<br>
Yes. Both orders give the same result (up to floating-point rounding), but they differ greatly in memory footprint and speed.

Here's a breakdown of the factors to consider:

1. Memory footprint:
    - (AB)C: the intermediate product AB has dimensions 2^10 x 2^5, i.e. 2^15 elements (128 KiB in float32).
    - A(BC): the intermediate product BC has dimensions 2^16 x 2^14, i.e. 2^30 elements (4 GiB in float32).
    - (AB)C therefore needs far less memory during computation, because its intermediate product is 2^15 times smaller. The final result is 2^10 x 2^14 either way.
2. Speed:
    - Number of operations: multiplying an m x n matrix by an n x p matrix takes m·n·p multiply-adds. (AB)C costs 2^10·2^16·2^5 + 2^10·2^5·2^14 = 2^31 + 2^29 ≈ 2.7 × 10^9 multiply-adds, while A(BC) costs 2^16·2^5·2^14 + 2^10·2^16·2^14 = 2^35 + 2^40 ≈ 1.1 × 10^12, about 422 times more work.
    - Cache utilization: the order also affects how efficiently data moves through the cache. (AB)C benefits here as well, since its small intermediate AB fits comfortably in cache during the multiplication with C, whereas the 2^30-element BC cannot.
3. Hardware and algorithm optimizations:
    - Matrix libraries: optimized libraries such as BLAS often use specific algorithms for different matrix dimensions and orders, which can affect speed.
    - Hardware parallelism: some hardware architectures may be better suited to certain matrix multiplication orders.
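
The speed difference can be made concrete by counting multiply-adds: an (m x n) @ (n x p) product takes m·n·p of them. A small sketch tallying both orders for these shapes (it counts arithmetic only and ignores memory traffic; `matmul_cost` is a helper defined here for illustration):

```python
def matmul_cost(m, n, p):
    """Multiply-adds for an (m x n) @ (n x p) product."""
    return m * n * p

a_rows, a_cols = 2**10, 2**16   # A is (2**10 x 2**16)
b_cols = 2**5                   # B is (2**16 x 2**5)
c_cols = 2**14                  # C is (2**5 x 2**14)

# (AB)C: first AB (2**10 x 2**5), then multiply it by C
cost_ab_c = matmul_cost(a_rows, a_cols, b_cols) + matmul_cost(a_rows, b_cols, c_cols)
# A(BC): first BC (2**16 x 2**14), then multiply A by it
cost_a_bc = matmul_cost(a_cols, b_cols, c_cols) + matmul_cost(a_rows, a_cols, c_cols)

print(f"(AB)C: {cost_ab_c:.3e} multiply-adds")
print(f"A(BC): {cost_a_bc:.3e} multiply-adds")
print(f"ratio: {cost_a_bc / cost_ab_c:.1f}x")
```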
In conclusion:

- (AB)C is preferred on both counts: its intermediate product is 2^15 times smaller, and it needs about 422 times fewer multiply-adds.
- Cache behavior, library kernels, and hardware can shift the constant factors, but they cannot make up for a gap of this size, so A(BC) should not be expected to be faster here.
- In general, the cheapest order depends on the matrix dimensions, and it is worth benchmarking both options when the costs are close.
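
A rough benchmarking sketch in PyTorch. At full size, A(BC) would materialize a 2^30-element intermediate (4 GiB in float32), so this sketch assumes smaller shapes chosen only for illustration, keeping a large shared dimension between A and B and a small one between B and C. Timings depend entirely on the machine, so none are quoted here; `timed` is a helper defined for this sketch.

```python
import time
import torch

torch.manual_seed(0)
# Scaled-down shapes (illustrative assumption); float64 for an accurate comparison
A = torch.randn(2**7, 2**11, dtype=torch.float64)
B = torch.randn(2**11, 2**3, dtype=torch.float64)
C = torch.randn(2**3, 2**10, dtype=torch.float64)

def timed(fn, repeats=3):
    """Return the result of fn() and the best wall-clock time over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        out = fn()
        best = min(best, time.perf_counter() - start)
    return out, best

left, t_left = timed(lambda: (A @ B) @ C)    # small intermediate: (2**7 x 2**3)
right, t_right = timed(lambda: A @ (B @ C))  # large intermediate: (2**11 x 2**10)

print(f"(AB)C: {t_left * 1e3:.2f} ms, A(BC): {t_right * 1e3:.2f} ms")
# Same product up to floating-point rounding
print(torch.allclose(left, right))
```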