# <font color = 'pickle'>**Lecture 1.2. Linear Algebra with PyTorch**

We can perform basic algebraic operations such as addition, subtraction, multiplication, and division using PyTorch.

But before moving on to that, we need a basic understanding of some common terms such as scalars and vectors.


## <font color = 'pickle'>**Scalars**

Scalars are values consisting of just one numerical quantity. Let us consider an example of scalars. If we go to a restaurant, our bill is calculated by adding the price of all the items (p) that we had along with a service charge. Let us assume the service charge is 5%. We can then represent our total bill (y) as:

    y = p + 0.05*p

Now, in this equation, 0.05 is a `scalar value`. The placeholders y and p are known as variables, as they represent unknown scalar values.
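As a quick check of the equation above, here is a minimal sketch in plain Python. The item prices are made-up values chosen only for illustration; they do not come from the lecture.

```python
# Hypothetical item prices (illustrative values only)
item_prices = [12.0, 8.0, 20.0]

service_rate = 0.05            # the scalar 0.05 from the equation
p = sum(item_prices)           # p: total price of all items
y = p + service_rate * p       # y: total bill including the service charge

print(p)             # 40.0
print(round(y, 2))   # 42.0
```

Changing any entry of `item_prices` changes `p` and `y`, while the scalar `service_rate` stays fixed.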



In [None]:
# Import PyTorch Library
import torch

Here we store each scalar in a 1-d tensor of size 1, as shown below.

Let's create some tensors and perform basic algebraic functions on them. 

In [None]:
# Creating 2 scalars using tensors
t1 = torch.Tensor([10.0])
t2 = torch.Tensor([2.0])

# Addition : ans = 12
print(t1 + t2)

#Subtraction : ans = 8
print(t1 - t2)

# Multiplication : ans = 20
print(t1 * t2)

# Division : ans = 5
print(t1 / t2)

# Power function : ans = 10^2 = 100
print(t1 ** t2)

# Exponentiation
print(torch.exp(t1))

tensor([12.])
tensor([8.])
tensor([20.])
tensor([5.])
tensor([100.])
tensor([22026.4648])


## <font color = 'pickle'>**Vectors**

Vectors are essentially lists of scalar elements. They are used to represent real-world data. For example, if we have to represent the marks of a student in 5 subjects, we can write them in a sequence as [98, 95, 96, 94, 92].

Precisely, a vector $\mathbf{y}$ can be written as

$$\mathbf{y} =\begin{bmatrix}y_{1}  \\y_{2}  \\ \vdots  \\y_{n}\end{bmatrix}$$

Vectors are 1-d tensors of size n.

Let us create some vectors using tensors and perform some basic operations on them.



In [None]:
# Creating 2 vectors using tensors
v1 = torch.Tensor([2, 5, 7])
v2 = torch.Tensor([3, 6, 8])

In [None]:
# Elements can be accessed using index
print(v1[2])

# Length of the vector
print(len(v1))

# Shape of the vector
print(v1.shape)

tensor(7.)
3
torch.Size([3])


In [None]:
# Addition of 2 vectors
print(v1 + v2)

# Subtraction of 2 vectors
print(v2 - v1)

# Exponentiation (Element-wise)
print(torch.exp(v1))

tensor([ 5., 11., 15.])
tensor([1., 1., 1.])
tensor([   7.3891,  148.4132, 1096.6332])


## <font color = 'pickle'>**Matrices/Tensors**
 
Matrices are 2-d arrays with size `m x n`. Here, m: number of rows and n: number of columns.

If `m = n`, then the matrix is known as a `square matrix`.

Precisely, matrices can be represented as:
$$\mathbf{X}=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \\ \end{bmatrix}$$
<br>

Tensors are n-dimensional (n-d) arrays with an arbitrary number of axes.

Scalars, vectors and matrices are `0-d, 1-d and 2-d` tensors respectively. To represent color images we need 3 axes or dimensions (color channels (red, green and blue), height, and width). Hence color images can be represented using 3-d tensors.
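The number of axes of each kind of tensor can be inspected with the `ndim` attribute. The sketch below is only illustrative: the values and the 32 x 32 image size are arbitrary assumptions.

```python
import torch

scalar = torch.tensor(3.5)                # 0-d tensor
vector = torch.tensor([1.0, 2.0, 3.0])    # 1-d tensor
matrix = torch.zeros(2, 3)                # 2-d tensor
# 3-d tensor laid out as (channels, height, width); 32 x 32 is an arbitrary size
image = torch.zeros(3, 32, 32)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    # Print the number of axes and the shape of each tensor
    print(name, t.ndim, tuple(t.shape))
```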

Let us look at some of the operations on matrices/tensors.


### <font color = 'pickle'>**Creating Matrices**

In [None]:
# Creating a matrix using tensor

# Initializing with explicit values
A = torch.Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Creating an identity matrix
B = torch.eye(3)

# Creating matrix having elements in a range
C = torch.arange(0,10).reshape(2, 5)

# Printing all the matrices (view() returns the tensor with the given shape; here each shape is unchanged)
print(A.view(3, 3))
print(B.view(3, 3))
print(C.view(2, 5))

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])


### <font color = 'pickle'>**Accessing Elements**

In [None]:
# Accessing elements using index values

# To get the element of matrix A in row 2 and column 3 (i.e., 6), we write A[1][2], since indexing starts from 0.
A[1][2]

tensor(6.)

### <font color = 'pickle'>**Copying matrices (tensors)**
We can use the clone function to copy tensors: B = A.clone().detach()
- The clone operation is recorded in the computation graph. (Computation graphs are explained in the next notebook.)
- Gradients propagating to the cloned tensor will propagate to the original tensor.
- Use detach() to disconnect the cloned tensor from the computation graph.
- See other ways to copy a tensor here: <br>
 https://stackoverflow.com/questions/55266154/pytorch-preferred-way-to-copy-a-tensor
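The points above can be sketched with a small tensor that tracks gradients. The tensor `a` and its values are assumptions made for illustration, and `backward()` is used here only to show where gradients end up.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

b = a.clone()             # copy that stays in the computation graph
c = a.clone().detach()    # copy that is disconnected from the graph

print(b.requires_grad)    # True
print(c.requires_grad)    # False

# Gradients flowing into the clone b reach the original tensor a
b.sum().backward()
print(a.grad)             # tensor([1., 1., 1.])

# The detached copy has its own memory: editing c leaves a unchanged
c[0] = 100.0
print(a)
```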


In [None]:
# Creating 2 matrices

# First matrix
A = torch.arange(0, 25).reshape(5, 5)

# Second matrix : copy of A
B = A.clone()

print(A)
print(B)

tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])


### <font color = 'pickle'>**Operations on Matrices**

In [None]:
# Addition of 2 matrices
A + B

tensor([[ 0,  2,  4,  6,  8],
        [10, 12, 14, 16, 18],
        [20, 22, 24, 26, 28],
        [30, 32, 34, 36, 38],
        [40, 42, 44, 46, 48]])

In [None]:
# Subtraction of 2 matrices
A - B

tensor([[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]])

In [None]:
# Elementwise multiplication of two matrices is called the Hadamard product
A * B

tensor([[  0,   1,   4,   9,  16],
        [ 25,  36,  49,  64,  81],
        [100, 121, 144, 169, 196],
        [225, 256, 289, 324, 361],
        [400, 441, 484, 529, 576]])

In [None]:
# Each element of a matrix can be added to or multiplied by a scalar (broadcasting)
# This operation does not change the shape of the matrix or tensor
a = 2
print(a + A)
print()
print(a * A)

tensor([[ 2,  3,  4,  5,  6],
        [ 7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16],
        [17, 18, 19, 20, 21],
        [22, 23, 24, 25, 26]])

tensor([[ 0,  2,  4,  6,  8],
        [10, 12, 14, 16, 18],
        [20, 22, 24, 26, 28],
        [30, 32, 34, 36, 38],
        [40, 42, 44, 46, 48]])


In [None]:
# Transpose of a matrix : Elements of the rows and columns get interchanged a[i][j] becomes a[j][i]
# Transpose is a special case of permute 
A.T

tensor([[ 0,  5, 10, 15, 20],
        [ 1,  6, 11, 16, 21],
        [ 2,  7, 12, 17, 22],
        [ 3,  8, 13, 18, 23],
        [ 4,  9, 14, 19, 24]])
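To illustrate the remark that transpose is a special case of permute, the sketch below compares the two on the same 5 x 5 matrix and then permutes a 3-d tensor. The 3-d tensor `T` is a hypothetical example added for illustration.

```python
import torch

A = torch.arange(0, 25).reshape(5, 5)

# For a 2-d tensor, swapping axes 0 and 1 with permute gives the transpose
print(torch.equal(A.T, A.permute(1, 0)))   # True

# permute generalizes to any reordering of axes
T = torch.arange(24).reshape(2, 3, 4)      # hypothetical 3-d tensor
print(T.permute(2, 0, 1).shape)            # torch.Size([4, 2, 3])
```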

<font color = 'pickle'>**Symmetric Matrix : A special type of square matrix which is equal to its transpose.** 

In [None]:
# Check for symmetric matrix
X = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])

X == X.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

Since X is equal to its transpose at every position, X is a symmetric matrix.
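The elementwise comparison returns one boolean per position. If a single True/False answer is more convenient, one option is `torch.equal`, or reducing the comparison with `.all()`. The non-symmetric matrix `Y` below is a made-up counterexample.

```python
import torch

X = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
Y = torch.tensor([[1, 2], [3, 4]])   # hypothetical non-symmetric matrix

# torch.equal returns a Python bool: same shape and same elements
print(torch.equal(X, X.T))    # True
print(torch.equal(Y, Y.T))    # False

# Reducing the elementwise comparison gives a 0-d boolean tensor
print((X == X.T).all())       # tensor(True)
```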

We can also construct boolean tensors using logical statements. Take two matrices A and B, for example. Wherever A and B are equal at some position, A == B is True at that position; wherever they differ, it is False.

In [None]:
A = torch.zeros(3,3)
B = torch.eye(3,3)
A == B

tensor([[False,  True,  True],
        [ True, False,  True],
        [ True,  True, False]])

## <font color = 'pickle'>**Reduction**

We can calculate the sum of all elements of a vector or a matrix of any shape. This can be done using the ***sum*** function.



In [None]:
# Creating a vector
x = torch.arange(5)
print(x)

# This will do summation of all the elements of the vector : 0 + 1 + 2 + 3 + 4 = 10
print(x.sum())

tensor([0, 1, 2, 3, 4])
tensor(10)


In [None]:
# Creating a matrix
X = torch.arange(0, 10).reshape(2, 5)
print(X)

# This will do summation of all the elements of the matrix
print(X.sum())

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
tensor(45)


We can also calculate the mean or average of all elements in a vector or a matrix by dividing the sum of the elements by the number of elements.

In [None]:
# Calculating mean / average

# While creating a tensor we can also specify the data type of the elements using parameter "dtype"

v = torch.arange(5, dtype = float) 

print(v)

# Sum of all elements
print(v.sum())

# Number of elements
print(v.numel())

# Mean
print(v.sum() / v.numel())
print(v.mean())             # Same as above

tensor([0., 1., 2., 3., 4.], dtype=torch.float64)
tensor(10., dtype=torch.float64)
5
tensor(2., dtype=torch.float64)
tensor(2., dtype=torch.float64)


By default, invoking the sum/mean function on a tensor gives us a scalar (it reduces the tensor along all its axes).

We can also calculate the sum along a single axis by specifying the parameter "axis".

axis = 0 sums over the rows, collapsing axis 0 and leaving one sum per column, while axis = 1 sums over the columns, collapsing axis 1 and leaving one sum per row.




In [None]:
# Creating a matrix A
A = torch.arange(0, 15, dtype = float).reshape(5, 3)
A

tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 6.,  7.,  8.],
        [ 9., 10., 11.],
        [12., 13., 14.]], dtype=torch.float64)

In [None]:
# Sum over the rows (axis = 0): one sum per column
# Since we are taking the sum along axis = 0, the input tensor is reduced along axis 0
# The shape of A was ([5,3])
# After invoking sum along axis = 0, the shape reduces to ([3])
print(f'Shape before reduction: {A.shape}')
A.sum(axis = 0), A.sum(axis=0).shape

Shape before reduction: torch.Size([5, 3])


(tensor([30., 35., 40.], dtype=torch.float64), torch.Size([3]))

In [None]:
# Sum over the columns (axis = 1): one sum per row
# Since we are taking the sum along axis = 1, the input tensor is reduced along axis 1
# The shape of A was ([5,3])
# After invoking sum along axis = 1, the shape reduces to ([5])
A.sum(axis = 1), A.sum(axis = 1).shape

(tensor([ 3., 12., 21., 30., 39.], dtype=torch.float64), torch.Size([5]))

## <font color = 'pickle'>**Non-Reduction Sum**

As seen in the above examples, invoking sum() or mean() reduces the number of dimensions. We can keep the number of axes unchanged by passing the argument keepdim = True.

In [None]:
# If we do not pass keepdim=True, the sum has shape ([5]), which cannot be broadcast
# against A of shape ([5,3]), so we get an error
print(A/A.sum(axis=1, keepdim=False))

RuntimeError: ignored

In [None]:
# When we pass the argument keepdim=True, the shape will be ([5,1]). The output has 2 dimensions
# If we do not pass keepdim=True, the shape will be ([5]). The output has one dimension
sum_A_1 = A.sum(axis=1, keepdim=True)
print(sum_A_1.shape)
print(sum_A_1)

torch.Size([5, 1])
tensor([[ 3.],
        [12.],
        [21.],
        [30.],
        [39.]], dtype=torch.float64)


In [None]:
# Let us now try the operation A / A.sum(axis=1, keepdim=True): each row is divided by its own sum
print(A/A.sum(axis=1, keepdim=True))

tensor([[0.0000, 0.3333, 0.6667],
        [0.2500, 0.3333, 0.4167],
        [0.2857, 0.3333, 0.3810],
        [0.3000, 0.3333, 0.3667],
        [0.3077, 0.3333, 0.3590]], dtype=torch.float64)


In [None]:
A

tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 6.,  7.,  8.],
        [ 9., 10., 11.],
        [12., 13., 14.]], dtype=torch.float64)

In [None]:
# Cumulative sum along axis 0 (running total down the rows of each column)
A.cumsum(axis = 0)

tensor([[ 0.,  1.,  2.],
        [ 3.,  5.,  7.],
        [ 9., 12., 15.],
        [18., 22., 26.],
        [30., 35., 40.]], dtype=torch.float64)

In [None]:
# Cumulative sum along axis 1 (running total across the columns of each row)
A.cumsum(axis = 1)

tensor([[ 0.,  1.,  3.],
        [ 3.,  7., 12.],
        [ 6., 13., 21.],
        [ 9., 19., 30.],
        [12., 25., 39.]], dtype=torch.float64)

## <font color = 'pickle'>**Dot Products**

The dot product of 2 vectors $\mathbf{x}$ and $\mathbf{y}$, written $\mathbf{x}^\top \mathbf{y}$, is the sum of the products of the elements at the same position: $\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{n} x_i y_i$.

If we have 2 vectors x: [1, 2, 3, 4] and y: [1, 1, 2, 1], then

x · y = 1×1 + 2×1 + 3×2 + 4×1 = 13

In [None]:
# Initializing 2 tensors
x = torch.Tensor([1, 2, 3, 4])
y = torch.Tensor([1, 1, 2, 1])

# Performing Dot product
torch.dot(x, y)

tensor(13.)

In [None]:
# The dot product equals the sum of the elementwise products, so the expression below gives the same result
torch.sum(x * y)

tensor(13.)

## <font color = 'pickle'>**Matrix Multiplication**

Matrix multiplication is a binary operation that takes 2 matrices and produces a third matrix, their product.

If we are given 2 matrices $A$ of shape $(m \times n)$ and $B$ of shape $(q \times p)$, we can perform matrix multiplication only when $n = q$, and the resulting product matrix will have shape $(m \times p)$.

Suppose we are given 2 matrices $A$ of shape $(m \times n)$ and $B$ of shape $(n \times p)$:

$$\mathbf{A}=\begin{bmatrix}
 a_{11} & a_{12} & \cdots & a_{1n} \\
 a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{m1} & a_{m2} & \cdots & a_{mn} \\
\end{bmatrix},\quad
\mathbf{B}=\begin{bmatrix}
 b_{11} & b_{12} & \cdots & b_{1p} \\
 b_{21} & b_{22} & \cdots & b_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
 b_{n1} & b_{n2} & \cdots & b_{np} \\
\end{bmatrix}$$

Then after performing matrix multiplication, the resultant matrix C = AB will be:

$$\mathbf{C}=\begin{bmatrix}
 c_{11} & c_{12} & \cdots & c_{1p} \\
 c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
 c_{m1} & c_{m2} & \cdots & c_{mp} \\
\end{bmatrix}$$

Here, $c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k = 1}^n a_{ik}b_{kj}$

for $i = 1, \ldots, m$ and $j = 1, \ldots, p$.


Thus, each element $c_{ij}$ of $C$ is the dot product of the $i^{th}$ row of $A$ and the $j^{th}$ column of $B$.
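The formula for $c_{ij}$ can be written out directly with nested loops. This is a plain-Python sketch meant to mirror the definition, not an efficient implementation; the function name `matmul` and the example matrices are chosen here for illustration.

```python
def matmul(A, B):
    """Multiply an (m x n) matrix A by an (n x p) matrix B, both given as lists of rows."""
    m, n = len(A), len(A[0])
    q, p = len(B), len(B[0])
    if n != q:
        raise ValueError(f"cannot multiply shapes ({m} x {n}) and ({q} x {p})")

    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            # c_ij is the dot product of row i of A and column j of B
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C


A = [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]]              # shape (2 x 5)
B = [[1, 1] for _ in range(5)]     # shape (5 x 2), all ones

print(matmul(A, B))                # [[10, 10], [35, 35]]
```

Because `B` contains only ones, each entry of the product is simply the sum of the corresponding row of `A`.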

We can perform matrix multiplication in the following way using PyTorch:

In [None]:
# Initializing 2 matrices
A = torch.arange(0, 10, dtype=float).reshape(2, 5)
B = torch.ones(5, 2, dtype=float)

# Matrix-Matrix Multiplication using mm function of PyTorch 
torch.mm(A, B)

tensor([[10., 10.],
        [35., 35.]], dtype=torch.float64)

### <font color = 'pickle'>**Matrix - Vector Multiplication**

This is a special case of ordinary matrix multiplication in which one of the two input tensors is one-dimensional.

Suppose we have a matrix $A$ of shape $(m \times n)$ and a vector $v$ of shape $(n)$:

$$\mathbf{A}=\begin{bmatrix}
 a_{11} & a_{12} & \cdots & a_{1n} \\
 a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{m1} & a_{m2} & \cdots & a_{mn} \\
\end{bmatrix},\quad
\mathbf{v}=\begin{bmatrix}
 v_{1} \\
 v_{2} \\
\vdots \\
 v_{n} \\
\end{bmatrix}$$

In this case, we treat the vector as having shape $(n \times 1)$ and perform ordinary matrix multiplication.

The resulting matrix $R$ will be of shape $(m \times 1)$, given as:

$$\mathbf{R}=\begin{bmatrix}
 a_{11}v_1 + a_{12}v_2 + \cdots + a_{1n}v_n \\
 a_{21}v_1 + a_{22}v_2 + \cdots + a_{2n}v_n\\
\vdots \\
 a_{m1}v_1 + a_{m2}v_2 + \cdots + a_{mn}v_n\\
\end{bmatrix}$$


In [None]:
# Initializing a matrix and a vector
A = torch.arange(0, 9).reshape(3, 3)
v = torch.arange(3)

# Matrix-Vector Multiplication using mv function of PyTorch
torch.mv(A, v)

tensor([ 5, 14, 23])
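To connect this with ordinary matrix multiplication, the sketch below reshapes the vector to a column of shape (3, 1) and checks that `torch.mm` gives the same numbers as `torch.mv`. The variable names `r_mv` and `r_mm` are introduced here for illustration.

```python
import torch

A = torch.arange(0, 9).reshape(3, 3)
v = torch.arange(3)

r_mv = torch.mv(A, v)                  # result is 1-d, shape (3,)
r_mm = torch.mm(A, v.reshape(3, 1))    # result is 2-d, shape (3, 1)

print(r_mv.shape, r_mm.shape)

# Same values once the extra axis is removed
print(torch.equal(r_mv, r_mm.squeeze(1)))   # True

# The @ operator dispatches to the same matrix-vector product
print(torch.equal(A @ v, r_mv))             # True
```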

## <font color = 'pickle'>**Norms**

A norm is a function of a vector that tells us about the size of the vector.



### <font color = 'pickle'>**p - Norm**

The most commonly used norms belong to the family of p-norms (or $\ell_p$-norms), where p is any real number greater than or equal to 1.

The p-norm of a vector $\mathbf{x}$ is given by:

$$\|\mathbf{x}\|_p = (|x_1|^p + |x_2|^p + |x_3|^p + \cdots + |x_n|^p)^{1/p}$$

It can be written more compactly as:

$$\|\mathbf{x}\|_p = \left(\sum_{i=1}^n \left|x_i \right|^p \right)^{1/p}.$$
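The compact formula translates almost line for line into code. The following is a plain-Python sketch; the helper name `p_norm` and the example vector are assumptions made for illustration.

```python
def p_norm(x, p):
    """Return the p-norm of a sequence of numbers x, for p >= 1."""
    if p < 1:
        raise ValueError("p must be greater than or equal to 1")
    # Sum of |x_i|^p, then take the p-th root
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


v = [-3.0, 4.0]
print(p_norm(v, 1))   # 7.0
print(p_norm(v, 2))   # 5.0
```

The absolute value matters: without it, the negative entry would cancel part of the sum for p = 1.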

### <font color = 'pickle'>**$L_1$ norm**

$L_1$ norm is also known as the Manhattan distance, as it measures the distance between 2 points when movement is restricted to paths parallel to the axes (like walking along city blocks).

$L_1$ norm is given by putting $p = 1$ in p - norm as:

$$\|\mathbf{x}\|_1 = {\sum_{i=1}^n \left|x_i \right|}$$

If we have vector $v = [2, 3]$, then $L_1$ norm will be:

$$\|\mathbf{v}\|_1 = |2| + |3|$$

$$\|\mathbf{v}\|_1 = 5$$



In [None]:
# Initializing a tensor (vector)
t = torch.tensor([-3.0, 4.0])

# Calculating l1 norm
n = torch.norm(t, p=1)
print(n)

tensor(7.)


### <font color = 'pickle'>**$L_2$ norm**

$L_2$ norm is also known as the Euclidean distance, as it measures the straight-line (shortest) distance between 2 points.

$L_2$ norm is given by:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$$


If we have vector $v = [2, 3]$, then $L_2$ norm will be:

$$\|\mathbf{v}\|_2 = (2^2 + 3^2)^{1/2}$$

$$\|\mathbf{v}\|_2 = \sqrt {13}$$

In deep learning, we will usually work with squared $L_2$ norm. 

We can calculate $L_2$ norm using the norm() function of PyTorch as given below:


In [None]:
# Initializing a tensor (vector)
t = torch.tensor([3.0, 4])

# Calculating l2 norm
n = torch.norm(t)
print(n)

tensor(5.)


In deep learning, we are usually solving optimization problems:

- Maximize the probability assigned to observed data.
- Minimize the distance between predicted and actual values.

We often aim to minimize the distance between similar items and maximize the distance between dissimilar items (Euclidean distance using the $L_2$ norm). Thus, norms are frequently used in deep learning algorithms.
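To make "distance between predicted and actual values" concrete, the sketch below computes the $L_2$ norm of the difference between two vectors, along with its square. The numbers in `predicted` and `actual` are invented for illustration.

```python
import math

# Hypothetical predicted and actual values (illustrative only)
predicted = [2.5, 0.0, 2.0]
actual = [3.0, -0.5, 2.0]

# Elementwise difference between the two vectors
diff = [p - a for p, a in zip(predicted, actual)]

# Squared L2 norm of the difference: sum of squared differences
squared_l2 = sum(d * d for d in diff)
# L2 norm (Euclidean distance) is its square root
l2 = math.sqrt(squared_l2)

print(squared_l2)   # 0.5
print(l2)
```

The squared form avoids the square root, which is one reason it is a common choice as a quantity to minimize.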