<a href="https://colab.research.google.com/github/devesssi/dl-playground/blob/main/Copy_of_d2_playground.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**DAY 02**


Matrix multiplication (is all you need)
One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

PyTorch implements matrix multiplication functionality in the torch.matmul() method.

The main two rules for matrix multiplication to remember are:

1. The inner dimensions must match:
   - (3, 2) @ (3, 2) won't work
   - (2, 3) @ (3, 2) will work
   - (3, 2) @ (2, 3) will work
2. The resulting matrix has the shape of the outer dimensions:
   - (2, 3) @ (3, 2) -> (2, 2)
   - (3, 2) @ (2, 3) -> (3, 3)

Note: "@" in Python is the symbol for matrix multiplication.

Resource: You can see all of the rules for matrix multiplication using torch.matmul() in the [PyTorch documentation](https://docs.pytorch.org/docs/stable/generated/torch.matmul.html).
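
The two shape rules above can be checked directly. This is a minimal sketch, assuming random tensors are good enough for illustration; the names a, b and shape_error are my own and only the shapes matter.

```python
import torch

# Hypothetical example tensors; only their shapes matter here
a = torch.rand(2, 3)
b = torch.rand(3, 2)

# Inner dimensions match (3 and 3), result takes the outer dimensions (2, 2)
print((a @ b).shape)  # torch.Size([2, 2])

# Swapping the order still works: inner dimensions are 2 and 2, result is (3, 3)
print((b @ a).shape)  # torch.Size([3, 3])

# Inner dimensions differ (2 and 3), so PyTorch raises a RuntimeError
try:
    b @ b
    shape_error = None
except RuntimeError as error:
    shape_error = error
print(f"Mismatch error: {shape_error}")
```

Printing the shapes before multiplying is usually the quickest way to track down this kind of error.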

# Let's create a tensor and perform **element-wise multiplication** and **matrix multiplication** on it.



In [None]:
import torch

In [None]:
tensor = torch.tensor([1, 2, 3])
tensor

tensor([1, 2, 3])

In [None]:
tensor.shape

torch.Size([3])

The difference between element-wise multiplication and matrix multiplication is the addition of values.

For our tensor variable:

| Operation | Calculation | Code |
| --- | --- | --- |
| Element-wise multiplication | [1x1, 2x2, 3x3] = [1, 4, 9] | tensor * tensor |
| Matrix multiplication | [1x1 + 2x2 + 3x3] = [14] | tensor.matmul(tensor) |




In [None]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [None]:
torch.matmul(tensor , tensor)

tensor(14)

In [None]:
# Can also use the "@" symbol for matrix multiplication, though not recommended
tensor @ tensor

tensor(14)


You can do matrix multiplication by hand but it's not recommended.

The in-built torch.matmul() method is faster.

In [None]:
%%time
# "%%time" is a Jupyter cell magic that reports the execution time of a single cell
# Matrix multiplication by hand
# (avoid for loops for tensor operations where you can, they are computationally expensive)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

CPU times: user 1.41 ms, sys: 0 ns, total: 1.41 ms
Wall time: 1.34 ms


tensor(14)

In [None]:
%%time
torch.matmul(tensor ,tensor)

CPU times: user 146 µs, sys: 0 ns, total: 146 µs
Wall time: 152 µs


tensor(14)
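
A single %%time run on a 3-element tensor is quite noisy. Here is a rough sketch of a steadier comparison, assuming Python's standard timeit module and a hypothetical 1000-element vector (both are my choices, not part of the notebook). The exact timings will differ from machine to machine.

```python
import timeit

import torch

# Hypothetical larger vector so the difference is easier to see
vector = torch.arange(1., 1001.)

def dot_by_hand(t):
    """Sum of element-wise products using a Python for loop."""
    total = 0.0
    for i in range(len(t)):
        total += t[i] * t[i]
    return total

# Run each version 10 times and report the total time in seconds
loop_seconds = timeit.timeit(lambda: dot_by_hand(vector), number=10)
matmul_seconds = timeit.timeit(lambda: torch.matmul(vector, vector), number=10)

print(f"for loop:     {loop_seconds:.6f} s")
print(f"torch.matmul: {matmul_seconds:.6f} s")
```

Both versions compute the same value (up to float32 rounding), only the time taken differs.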

**One of the most common errors in deep learning (shape errors)**

Because much of deep learning is multiplying and performing operations on matrices, and matrices have strict rules about which shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is a shape mismatch.



In [None]:
# Shapes need to be compatible: the inner dimensions must match
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [None]:
%%time

# Shapes need to be compatible: tensor_A is now (2, 3) and tensor_B is (3, 2)
tensor_A = torch.tensor([[1, 2, 6],
                         [3, 4, 7]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# The inner dimensions match (3 and 3), so this works and gives a (2, 2) result
tensor_A @ tensor_B



CPU times: user 1.31 ms, sys: 0 ns, total: 1.31 ms
Wall time: 1.18 ms


tensor([[ 77., 104.],
        [116., 158.]])

In [None]:
%%time
tensor_A@tensor_B


CPU times: user 723 µs, sys: 0 ns, total: 723 µs
Wall time: 637 µs


tensor([[ 77., 104.],
        [116., 158.]])


We can make matrix multiplication work between two tensors like the original (3, 2) tensor_A and tensor_B by making their inner dimensions match.

One of the ways to do this is with a transpose (switch the dimensions of a given tensor).

You can perform transposes in PyTorch using either:

- torch.transpose(input, dim0, dim1) - where input is the desired tensor to transpose and dim0 and dim1 are the dimensions to be swapped.
- tensor.T - where tensor is the desired tensor to transpose.

In [None]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2., 6.],
        [3., 4., 7.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [None]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2., 6.],
        [3., 4., 7.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


When both tensors have shape (3, 2), as in the failing example earlier, you can transpose the second one so that (3, 2) becomes (2, 3). The multiplication (3, 2) @ (2, 3) then follows the rule: the inner dimensions match (2 and 2) and the result is a tensor of shape (3, 3).
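
To make the transpose fix concrete, here is a small sketch that reuses the values from the failing (3, 2) @ (3, 2) example. The names A_3x2 and B_3x2 are hypothetical, chosen so they don't overwrite tensor_A and tensor_B.

```python
import torch

# Same values as the failing (3, 2) @ (3, 2) example, under hypothetical names
A_3x2 = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float32)
B_3x2 = torch.tensor([[7, 10], [8, 11], [9, 12]], dtype=torch.float32)

# Both spellings give the same (2, 3) transpose of B
B_T = B_3x2.T
B_T_alt = torch.transpose(B_3x2, 0, 1)
print(torch.equal(B_T, B_T_alt))  # True

# (3, 2) @ (2, 3) -> (3, 3)
result = torch.matmul(A_3x2, B_T)
print(result)
print(result.shape)  # torch.Size([3, 3])
```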



# Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like above.


**Note: A matrix multiplication like this is also referred to as the dot product of two matrices.**



**Note: Matrix multiplication is one of the most common operations in neural networks, as we will see as we explore deep learning.**



**Finding the min, max, mean, sum, etc (aggregation)**

In [None]:
tor1 = torch.arange(1, 200 , 20)
tor1

tensor([  1,  21,  41,  61,  81, 101, 121, 141, 161, 181])

In [None]:
# Find the minimum of the tensor above with min()
tor1.min()

tensor(1)

In [None]:
tor1.max()

tensor(181)

In [None]:
# mean() needs a floating point (or complex) dtype, so this errors on an integer (Long) tensor
tor1.mean()

RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long

In [None]:
tor1.type(torch.float32).mean()

tensor(91.)

In [None]:
tor1.sum()

tensor(910)

In [None]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450
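
The same aggregations are also available as functions on the torch module. A short sketch, assuming you prefer the function form; .float() is used here as shorthand for .type(torch.float32).

```python
import torch

x = torch.arange(0, 100, 10)

# Function form of each aggregation (equivalent to the method form)
print(torch.min(x))  # tensor(0)
print(torch.max(x))  # tensor(90)
print(torch.sum(x))  # tensor(450)

# mean still needs a float dtype; .float() converts to torch.float32
print(torch.mean(x.float()))  # tensor(45.)
```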


# Positional min/max
You can also find the index of a tensor where the max or minimum occurs with **torch.argmax()** and **torch.argmin()** respectively.

This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the softmax activation function).


In [None]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
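
A quick sketch of how the position and the value relate: indexing the tensor with the result of argmax() or argmin() gives back the same value as max() or min(). The name values is hypothetical.

```python
import torch

values = torch.arange(10, 100, 10)  # hypothetical name, same values as above

max_index = values.argmax()
min_index = values.argmin()

# Indexing with the position returns the same value as max() / min()
print(values[max_index], values.max())  # tensor(90) tensor(90)
print(values[min_index], values.min())  # tensor(10) tensor(10)
```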


# Change tensor datatype
As mentioned, a common issue with deep learning operations is having your tensors in different datatypes.

If one tensor is in torch.float64 and another is in torch.float32, you might run into some errors.

But there's a fix.

You can change the datatypes of tensors using torch.Tensor.type(dtype=None) where the dtype parameter is the datatype you'd like to use.

First we'll create a tensor and check its datatype (the default is torch.float32).



In [None]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

In [None]:
tensor

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.])

Now we'll create another tensor the same as before but change its datatype to torch.float16.

In [None]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

And we can do something similar to make a torch.int8 tensor.


In [None]:
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)


Note: Different datatypes can be confusing to begin with. But think of it like this: the lower the number (e.g. 32, 16, 8), the less precisely a computer stores the value. Less storage per value generally results in faster computation and a smaller overall model. Mobile-based neural networks often operate with 8-bit integers, which are smaller and faster to run but less accurate than their float32 counterparts. For more on this, I'd read up about [precision in computing](https://en.wikipedia.org/wiki/Precision_(computer_science)).
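
Here is a small sketch of what the note means in practice. It assumes Tensor.element_size() for reading the bytes per element, and the dtype mismatch error is what I'd expect from current PyTorch versions for matrix multiplication (element-wise operations tend to promote the types instead). All variable names are hypothetical.

```python
import torch

base = torch.arange(10., 100., 10.)  # float32 by default

# element_size() reports the bytes used per element
for dtype in (torch.float32, torch.float16, torch.int8):
    converted = base.type(dtype)
    print(dtype, converted.element_size(), "byte(s) per element")

# Lower precision keeps fewer digits of the same value
precise = torch.tensor(3.14159265, dtype=torch.float32)
print(precise.item())                      # close to 3.14159
print(precise.type(torch.float16).item())  # roughly 3.14, fewer digits survive

# Mixing dtypes in a matrix multiplication is a common source of errors
a = torch.ones(2, 2, dtype=torch.float32)
b = torch.ones(2, 2, dtype=torch.float64)
try:
    a @ b
except RuntimeError as error:
    print(f"dtype mismatch: {error}")

# Converting one side so both dtypes agree fixes it
fixed = a @ b.type(torch.float32)
print(fixed.dtype)  # torch.float32
```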

In [None]:
tensor.cuda

<function Tensor.cuda>

In [None]:
tensor.dense_dim


<function Tensor.dense_dim>

# Reshaping, stacking, squeezing and unsqueezing

In [None]:
# Create a tensor
import torch
x = torch.arange(1., 8.)
x, x.shape
# Note: the trailing dot (1., 8.) makes the values floating point

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))


Now let's add an extra dimension with torch.reshape().

In [None]:
# Try to reshape the 7 elements into shape (2, 7)
x_reshaped = x.reshape(2, 7)
x_reshaped, x_reshaped.shape

# Note: this raises "RuntimeError: shape '[2, 7]' is invalid for input of size 7".
# x has shape [7] (7 elements), but a (2, 7) tensor needs 2 * 7 = 14 elements,
# so the element counts do not match and the reshape fails.

RuntimeError: shape '[2, 7]' is invalid for input of size 7

In [None]:
import torch
x = torch.arange(1.,9.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8.]), torch.Size([8]))

Note: here the tensor has 8 elements and is reshaped into (2, 4) in the next cell. This time it works because the number of elements required by (2, 4) is 2 * 4 = 8, which matches the 8 elements in the original tensor x.

In [None]:
x2 = x.reshape(2,4)
x2, x2.shape

(tensor([[1., 2., 3., 4.],
         [5., 6., 7., 8.]]),
 torch.Size([2, 4]))
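
The element-count rule can be sketched in a few lines. This assumes a hypothetical tensor x8 with 8 elements; reshape with -1 and numel() are standard PyTorch calls.

```python
import torch

x8 = torch.arange(1., 9.)  # hypothetical name; 8 elements

# Adding an extra dimension keeps the element count at 8
print(x8.reshape(1, 8).shape)  # torch.Size([1, 8])
print(x8.reshape(8, 1).shape)  # torch.Size([8, 1])

# -1 asks PyTorch to infer that dimension from the element count
inferred = x8.reshape(2, -1)
print(inferred.shape)  # torch.Size([2, 4])

# A quick check before reshaping: the target sizes must multiply to numel()
target = (2, 7)
print(x8.numel() == target[0] * target[1])  # False, so reshape(2, 7) would fail
```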

In [None]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x2.view(1, 8)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8.]]), torch.Size([1, 8]))

Remember, changing the view of a tensor with tensor.view() creates a new view of the same tensor (the new view shares memory with the original tensor).

z[ : , 0] selects all rows (:) and the first column (0) of the tensor.


z[ : , 0] = 5 assigns the value 5 to every element in that selected column.

In [None]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7., 8.]]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8.]))
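
If you want a reshaped tensor that does not follow changes to the original, one option is to copy it. A minimal sketch, assuming clone() for the copy and data_ptr() to compare memory addresses; the names base, shared and independent are hypothetical.

```python
import torch

base = torch.arange(1., 9.)

shared = base.view(2, 4)               # shares memory with base
independent = base.view(2, 4).clone()  # clone() makes a separate copy

shared[0, 0] = 100.

print(base[0])            # tensor(100.), the change is visible through base
print(independent[0, 0])  # tensor(1.), the copy is unaffected

# data_ptr() gives the address of the first element
print(shared.data_ptr() == base.data_ptr())       # True
print(independent.data_ptr() == base.data_ptr())  # False
```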

In [None]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0)
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.]])

If we change dim to dim=1, the new dimension is inserted at position 1, so each tensor becomes a column instead of a row. The output then has shape (8, 4) instead of (4, 8), which is the transpose of the dim=0 result.

Because the stacked result here is 2-dimensional, dim can range from -2 to 1, where dim=-2 is the same as dim=0 and dim=-1 is the same as dim=1.

In [None]:
# dim=-2 is equivalent to dim=0 here; try changing dim to dim=1 and see what happens
x_stacked = torch.stack([x, x, x, x], dim= -2)
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.]])
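
For comparison, here is a sketch of what stacking along dim=1 looks like. It assumes a hypothetical 8-element tensor named row, so it does not depend on the modified x above.

```python
import torch

row = torch.arange(1., 9.)  # hypothetical name; 8 elements

stacked_rows = torch.stack([row, row, row, row], dim=0)
stacked_cols = torch.stack([row, row, row, row], dim=1)

print(stacked_rows.shape)  # torch.Size([4, 8])
print(stacked_cols.shape)  # torch.Size([8, 4])

# Stacking along dim=1 gives the transpose of stacking along dim=0
print(torch.equal(stacked_cols, stacked_rows.T))  # True

# Negative dims count from the end: -1 is the same as dim=1 here
stacked_last = torch.stack([row, row, row, row], dim=-1)
print(torch.equal(stacked_last, stacked_cols))  # True
```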


# How about removing all single dimensions from a tensor?

To do so you can use torch.squeeze() (I remember this as squeezing the tensor to only have dimensions over 1).

In [None]:
print(f"Previous tensor: {x_stacked}")
print(f"Previous shape: {x_stacked.shape}")

# Remove single dimensions from x_stacked (there are none here, so the shape stays the same)
x_squeezed = x_stacked.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.]])
Previous shape: torch.Size([4, 8])

New tensor: tensor([[5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.],
        [5., 2., 3., 4., 5., 6., 7., 8.]])
New shape: torch.Size([4, 8])
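
The (4, 8) tensor above has no dimension of size 1, so squeeze() leaves it unchanged. A small sketch with a tensor that does have one (the name with_single_dim is hypothetical):

```python
import torch

with_single_dim = torch.arange(1., 9.).reshape(1, 8)  # hypothetical example
print(with_single_dim.shape)  # torch.Size([1, 8])

# squeeze() drops every dimension of size 1
squeezed = with_single_dim.squeeze()
print(squeezed.shape)  # torch.Size([8])
```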



And to do the reverse of torch.squeeze() you can use torch.unsqueeze() to add a dimension value of 1 at a specific index.
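
A minimal sketch of unsqueeze, assuming a hypothetical 8-element tensor named flat. The dim argument is the index where the new size-1 dimension is inserted.

```python
import torch

flat = torch.arange(1., 9.)        # shape [8]

as_row = flat.unsqueeze(dim=0)     # new dimension at index 0
as_column = flat.unsqueeze(dim=1)  # new dimension at index 1

print(as_row.shape)     # torch.Size([1, 8])
print(as_column.shape)  # torch.Size([8, 1])

# squeeze() undoes the unsqueeze
print(torch.equal(as_row.squeeze(), flat))  # True
```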


In [None]:
y = torch.rand(1, 2, 3)
# permute reorders the dimensions: new dim 0 is old dim 1, new dim 1 is old dim 2, new dim 2 is old dim 0
y2 = torch.permute(y, (1, 2, 0))
print(f"this is original tensor: {y}")
print(f"this is permuted tensor: {y2}")
print(f"this is original shape: {y.shape}")
print(f"this is permuted shape: {y2.shape}")

this is original tensor: tensor([[[0.6255, 0.4127, 0.6934],
         [0.2493, 0.8843, 0.8894]]])
this is permuted tensor: tensor([[[0.6255],
         [0.4127],
         [0.6934]],

        [[0.2493],
         [0.8843],
         [0.8894]]])
this is original shape: torch.Size([1, 2, 3])
this is permuted shape: torch.Size([2, 3, 1])
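
To see how permute moves individual elements, here is a short sketch on a hypothetical (2, 3, 4) tensor. It also shows that the permuted tensor is a view, so it shares memory with the original.

```python
import torch

t = torch.rand(2, 3, 4)

# New dimension order: (old dim 2, old dim 0, old dim 1)
p = t.permute(2, 0, 1)
print(p.shape)  # torch.Size([4, 2, 3])

# Element [k, i, j] of the permuted tensor is element [i, j, k] of the original
print(p[3, 1, 2] == t[1, 2, 3])  # tensor(True)

# permute returns a view, so a change to t is visible through p
t[0, 0, 0] = 42.
print(p[0, 0, 0])  # tensor(42.)
```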


# INDEXING

In [None]:
import torch

In [None]:
x = torch.arange(1, 10).reshape(1, 3, 3) # arange(1, 10) gives 9 elements and reshape(1, 3, 3) needs 1*3*3 = 9
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [None]:
# Let's index our new tensor
x[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [None]:
# Let's index on the middle bracket (dim=1)
x[0][0]

tensor([1, 2, 3])

In [None]:
# Let's index on the inner bracket (last dimension)
x[0][0][0]

tensor(1)

In [None]:
# we can also write like this instead of [][]
x[0, 0]

tensor([1, 2, 3])

In [None]:
# Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]

tensor([[1, 2, 3]])

In [None]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[2, 5, 8]])

In [None]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]

tensor([5])

In [None]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])

In [None]:
#index on x to return 9
x[0][2][2]

tensor(9)

In [None]:
# index on x to return 3,6,9
x[ :, :,2]

tensor([[3, 6, 9]])
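
One detail worth sketching from the cells above: a ":" keeps a dimension in the result, while an integer index removes it. The name grid is hypothetical and holds the same values as x.

```python
import torch

grid = torch.arange(1, 10).reshape(1, 3, 3)  # hypothetical name, same values as x

# Chained and comma-separated indexing select the same elements
print(torch.equal(grid[0][0], grid[0, 0]))  # True

# ":" keeps a dimension, an integer index removes it
print(grid[:, 1, 1], grid[:, 1, 1].shape)  # tensor([5]) torch.Size([1])
print(grid[0, 1, 1], grid[0, 1, 1].shape)  # tensor(5) torch.Size([])

# The middle row and the middle column
print(grid[0, 1, :])  # tensor([4, 5, 6])
print(grid[0, :, 1])  # tensor([2, 5, 8])
```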

# PyTorch tensors & NumPy

Since NumPy is a popular Python numerical computing library, PyTorch has functionality to interact with it nicely.

The two main methods you'll want to use for NumPy to PyTorch (and back again) are:

- torch.from_numpy(ndarray) - NumPy array -> PyTorch tensor.
- torch.Tensor.numpy() - PyTorch tensor -> NumPy array.

Let's try them out.


In [None]:
# NumPy array to tensor
import torch
import numpy as np

In [None]:
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [None]:
tensor.dtype

torch.float64

In [None]:
# Note: NumPy defaults to float64 and torch.from_numpy() keeps that dtype, so use .type() to convert to float32
tensor = torch.from_numpy(array).type(torch.float32)
tensor.dtype

torch.float32

In [None]:
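# This fails: the bare name float32 is not defined (it would need to be torch.float32),
# and torch.from_numpy() only takes the array, so it has no dtype argument either
tensor = torch.from_numpy(array,
                          dtype = float32)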


NameError: name 'float32' is not defined
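
Two ways that do work for getting a float32 tensor from a NumPy array, as a small sketch. The names tensor_a and tensor_b are hypothetical; note that torch.tensor() copies the data rather than sharing it.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)  # NumPy defaults to float64

# Option 1: convert after torch.from_numpy()
tensor_a = torch.from_numpy(array).type(torch.float32)

# Option 2: torch.tensor() accepts a dtype argument (and copies the data)
tensor_b = torch.tensor(array, dtype=torch.float32)

print(tensor_a.dtype, tensor_b.dtype)   # torch.float32 torch.float32
print(torch.equal(tensor_a, tensor_b))  # True
```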

And if you want to go from PyTorch tensor to NumPy array, you can call tensor.numpy()

In [None]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [None]:
numpy_tensor.dtype

dtype('float32')


Note that tensor = tensor + 1 creates a new tensor rather than modifying the original in place, so after this reassignment numpy_tensor stays the same.

In [None]:
# Change the tensor, keep the array the same
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
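
The array only stays the same because of the reassignment. As a sketch of the other case: on the CPU, tensor.numpy() and torch.from_numpy() share memory with their source, so in-place changes show up on both sides. The variable names here are hypothetical.

```python
import numpy as np
import torch

# tensor -> NumPy: in-place changes are visible through both objects
ones = torch.ones(7)
ones_array = ones.numpy()
ones.add_(1)       # in-place add, modifies the shared memory
print(ones_array)  # [2. 2. 2. 2. 2. 2. 2.]

# Reassignment creates a new tensor, so the array no longer follows
ones = ones + 1
print(ones_array)  # still [2. 2. 2. 2. 2. 2. 2.]

# NumPy -> tensor: torch.from_numpy() shares memory as well
source = np.arange(1.0, 4.0)
linked = torch.from_numpy(source)
source[0] = 100.0
print(linked)      # tensor([100.,   2.,   3.], dtype=torch.float64)
```

If you need an independent copy, torch.tensor(array) or tensor.clone() are the usual options.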

# Reproducibility (trying to take the random out of random)


As you learn more about neural networks and machine learning, you'll start to discover how much randomness plays a part.

Well, pseudorandomness that is. Because after all, as they're designed, a computer is fundamentally deterministic (each step is predictable), so the randomness it creates is simulated randomness (though there is debate on this too, but since I'm not a computer scientist, I'll let you find out more yourself).

How does this relate to neural networks and deep learning then?

We've discussed neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short:

start with random numbers -> tensor operations -> try to make better (again and again and again)

Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.

Why?

So you can perform repeatable experiments.

For example, you create an algorithm capable of achieving X performance.

And then your friend tries it out to verify you're not crazy.

How could they do such a thing?

That's where reproducibility comes in.

In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?

Let's see a brief example of reproducibility in PyTorch.

We'll start by creating two random tensors. Since they're random, you'd expect them to be different, right?

In [None]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.8539, 0.9268, 0.4546, 0.1180],
        [0.5832, 0.6901, 0.3991, 0.6317],
        [0.6297, 0.2111, 0.9767, 0.2704]])

Tensor B:
tensor([[0.6556, 0.0944, 0.8068, 0.6754],
        [0.1752, 0.2117, 0.8580, 0.6502],
        [0.2474, 0.1956, 0.8956, 0.1637]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

Just as you might've expected, the tensors come out with different values.

But what if you wanted to create two random tensors with the same values?

As in, the tensors would still contain random values but they would be of the same flavour.

That's where torch.manual_seed(seed) comes in, where seed is an integer (like 42 but it could be anything) that flavours the randomness.

Let's try it out by creating some more flavoured random tensors.

In [None]:
import torch

# Set the random seed
RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# To reproduce the same values, reset the seed before each new rand() call
# Without this, tensor_D would be different to tensor_C
torch.random.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])


Nice!

It looks like setting the seed worked.
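
To see why the seed has to be reset, here is a short sketch. It also shows torch.Generator as one possible alternative to the global seed; that part is my addition and not something the notebook uses.

```python
import torch

# Seeding once does not make consecutive calls identical
torch.manual_seed(42)
first = torch.rand(3, 4)
second = torch.rand(3, 4)
print(torch.equal(first, second))  # False

# Re-seeding restarts the sequence, so the first draw repeats
torch.manual_seed(42)
repeat = torch.rand(3, 4)
print(torch.equal(first, repeat))  # True

# A dedicated torch.Generator keeps its seeded state separate from the global one
gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)
draw_a = torch.rand(3, 4, generator=gen_a)
draw_b = torch.rand(3, 4, generator=gen_b)
print(torch.equal(draw_a, draw_b))  # True
```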

Resource: What we've just covered only scratches the surface of reproducibility in PyTorch. For more on reproducibility in general and random seeds, I'd check out:

- [The PyTorch reproducibility documentation](https://docs.pytorch.org/docs/stable/notes/randomness.html) (a good exercise would be to read through this for 10 minutes; even if you don't understand it now, being aware of it is important).
- The [Wikipedia random seed page](https://en.wikipedia.org/wiki/Random_seed) (this'll give a good overview of random seeds and pseudorandomness in general).