In [1]:
import torch
torch.__version__

'2.1.2+cu121'

A scalar is a single number.

In [2]:
scalar = torch.tensor(7)
scalar

tensor(7)

As seen below, it is a rank-0 tensor because it is just one number.

In [3]:
scalar.ndim

0

If we want to view the scalar as a regular Python integer, we use .item().

In [4]:
scalar.item()

7

Let's now take a look at vectors. A vector is a rank-1 tensor: it has one dimension, and that dimension can contain many numbers. For example, you can use [3,2] to describe the number of [bedrooms, bathrooms] in your house, or [3,2,2] to describe [bedrooms, bathrooms, car parks].

In [5]:
vector = torch.tensor([7,7])
vector

tensor([7, 7])

In [6]:
vector.ndim

1

Now, you might wonder why the vector has only one dimension when there are clearly two numbers inside it. One of the easiest ways to find the rank is to count the opening square brackets ( [ ) at the start of the tensor. Count one side only.

Another key property of a tensor is its shape, which tells you how many elements there are along each dimension.

In [7]:
vector.shape

torch.Size([2])

As we can see, it returned 2 because there are two elements in the first (and only) dimension, which as we specified earlier is [7,7].

Now, let's take a look at what a matrix is.

In [8]:
MATRIX = torch.tensor([[7,8],
                       [9,10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [9]:
MATRIX.ndim

2

It returned 2 because this is a rank-2 tensor, or a matrix. Did you count the two opening brackets at the start?

In [10]:
MATRIX.shape

torch.Size([2, 2])

Here, it returned [2,2] because our matrix is 2 elements deep (rows) and 2 elements wide (columns). Let's remove one element from the width and see the change.

In [11]:
MATRIX = torch.tensor([[4],
                       [5]])

In [12]:
MATRIX

tensor([[4],
        [5]])

In [13]:
MATRIX.ndim, MATRIX.shape

(2, torch.Size([2, 1]))

As you can see, it still returns 2 for the number of dimensions because the number of brackets hasn't changed. However, we did change the number of elements along its width by removing one. Thus, our matrix is now [2,1] because it is 2 elements deep and 1 element wide.

Now, let's try making a tensor.

In [14]:
TENSOR = torch.tensor([[[1,2,3],
                        [4,5,6],
                        [7,8,9]]])

In [15]:
TENSOR

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [16]:
TENSOR.ndim, TENSOR.shape

(3, torch.Size([1, 3, 3]))

It is important to note that tensors can REPRESENT almost anything.

Since there are 3 brackets on the outside, we can say that this is a rank-3 tensor. The torch.Size can be read one bracket at a time, from the outside in:

1. The outer bracket has one element, which is the entire table.
2. The middle bracket has three elements, the three rows.
3. Each inner bracket has three elements, the numbers themselves.

**NOTE:** Notice how the scalar and vector variables are lowercase while MATRIX and TENSOR are uppercase? This is a common convention for telling them apart at a glance. The same practice applies to single-letter variable names: scalars and vectors can be 'a' / 'b' while matrices and tensors can be 'X' / 'Y'.

At the same time, the names matrix and tensor are often used interchangeably, since a matrix is simply a rank-2 tensor. What's important is recognizing the shape and rank of the tensor.

In other words, a scalar is one number, a vector is a 1D array, a matrix is a 2D array, and a tensor is a multidimensional array that includes all of these as special cases.
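
To make that summary concrete, here is a small sketch that builds one tensor of each rank and checks that ndim always equals the number of entries in shape. The dictionary and its names are illustrative only and are not part of the notebook.

```python
import torch

# One example per rank; the names are illustrative only.
examples = {
    "scalar": torch.tensor(7),
    "vector": torch.tensor([7, 7]),
    "matrix": torch.tensor([[7, 8], [9, 10]]),
    "tensor": torch.tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]),
}

for name, t in examples.items():
    # ndim is the rank (number of dimensions); shape lists the size of each one.
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
    assert t.ndim == len(t.shape)
```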

**RANDOM TENSORS**

Tensors are representations of some form of data. Machine learning models try to seek and recognize patterns within these tensors. However, in the real world, you'll often find that you don't actually make tensors manually. 

Machine learning models actually start out with randomized weights, that is, randomly generated values. They then adjust these values as they work through the data so that they can eventually recognize the patterns in it.

In essence: 

start with random numbers -> analyze data -> update random numbers -> .... stop

This process can be split into initialization (starting with random data), representation (analyzes data), and optimization (updating data).
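
As a rough illustration of that loop, the sketch below fits a single random weight to toy data that follows y = 2x. The data, learning rate, seed, and step count are all arbitrary choices made for this example, and real models have many more weights, but the three stages are the same.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so the sketch is repeatable

# Toy data following y = 2x; the "pattern" to discover is the factor 2.
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

# Initialization: start from a random weight.
w = torch.rand(1, requires_grad=True)

for step in range(200):
    # Representation: use the current weight to predict and measure the error.
    loss = ((w * x - y) ** 2).mean()
    loss.backward()
    # Optimization: nudge the weight against the gradient.
    with torch.no_grad():
        w -= 0.05 * w.grad
    w.grad.zero_()

print(w.item())  # close to 2.0
```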

Let's create a tensor with random values using torch.rand(). This is a PyTorch function readily available to use. We pass a size parameter.

In [17]:
random_TENSOR = torch.rand(size=(3,4))
random_TENSOR, random_TENSOR.dtype

(tensor([[0.6073, 0.7951, 0.7115, 0.7323],
         [0.9058, 0.4945, 0.7131, 0.0901],
         [0.1513, 0.3903, 0.9654, 0.6571]]),
 torch.float32)

Notice how it created a rank-2 tensor that is 3 elements tall and 4 elements wide. We added .dtype to show the type of the values, and we can see that these are floats. Let's try making a rank-3 tensor and see what happens.

In [18]:
big_random_TENSOR = torch.rand(size=(2,3,4))
big_random_TENSOR, big_random_TENSOR.dtype

(tensor([[[0.1578, 0.0352, 0.6950, 0.3020],
          [0.0110, 0.4866, 0.4896, 0.2532],
          [0.5094, 0.7880, 0.7868, 0.8477]],
 
         [[0.4499, 0.9702, 0.2452, 0.6330],
          [0.5993, 0.7212, 0.3038, 0.9270],
          [0.5676, 0.0656, 0.6930, 0.8717]]]),
 torch.float32)

In [19]:
big_random_TENSOR.shape

torch.Size([2, 3, 4])

That's big! We've created a rank-3 tensor with shape (2, 3, 4). You can read the shape [2,3,4] like this:
* It has two rank-2 tensors along the first dimension (axis 0).
* Each of these rank-2 tensors has three rank-1 tensors (vectors) along the second dimension (axis 1).
* Each rank-1 tensor (vector) contains four scalars (rank-0 tensors) along the third dimension (axis 2).

Now, let's imagine that we want to create a 224x224 image that is also RGB! How do we go about making this?

In [20]:
random_image_size_tensor = torch.rand(size=(224,224,3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)

Here, we've got 224 for our height, 224 for our width, and 3 for the color channels: Red, Green, Blue! We can think of tensors as arrays in a sense. We didn't print this one because it's too large, so there's really no need to see it all.

**Zeros & Ones**

What if we don't want to fill our tensors with random values? Luckily, PyTorch also provides torch.zeros() and torch.ones(), which fill a tensor with zeros or ones. This is helpful for masking. What is masking? It signals to the model which values it should not learn from, typically by marking those positions with zeros.

In [21]:
zeros = torch.zeros(size=(3,4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [22]:
ones = torch.ones(size=(3,4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)
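
One simple way to picture masking is multiplying by a tensor of zeros and ones. The sketch below is only an illustration with made-up values; real models apply masks in different ways depending on the layer.

```python
import torch

values = torch.tensor([0.5, 1.5, 2.5, 3.5])

# Start from all zeros, then switch on the positions we want to keep.
mask = torch.zeros(4)
mask[:2] = 1.0

masked = values * mask  # element-wise: masked-out positions become 0
print(masked.tolist())  # [0.5, 1.5, 0.0, 0.0]
```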

**Creating a Range & Tensors Like**

How about if we want a range of numbers instead? Like from 0 - 10 or 0 - 100. We can use *torch.arange(start, end, step)* to do that. Pretty simple: start is where the range begins, end is where it stops (the end value itself is not included), and step is the gap between consecutive values.

In [23]:
zero_to_ten = torch.arange(0,10,1)
zero_to_ten, zero_to_ten.dtype

(tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), torch.int64)

Notice that our type is now integers because we specified only integers in our range. If we add decimal values, it'll switch back to floats.
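
A quick sketch of both points: the end value is left out, and a decimal step switches the result to floats. The variable names are just for this example.

```python
import torch

ints = torch.arange(0, 10, 2)       # end (10) is excluded
floats = torch.arange(0, 1, 0.25)   # a float step gives a float tensor

print(ints.tolist(), ints.dtype)      # [0, 2, 4, 6, 8] torch.int64
print(floats.tolist(), floats.dtype)  # [0.0, 0.25, 0.5, 0.75] torch.float32
```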

Sometimes, you want to create another tensor based off another but instead with all zeros or ones. You can do this by utilizing *torch.zeros_like(input)* or *torch.ones_like(input)*

In [24]:
ten_zeros = torch.zeros_like(zero_to_ten)
ten_zeros, ten_zeros.dtype

(tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), torch.int64)
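
The ones version works the same way. A minimal sketch, where the name ten_ones is chosen just for this example:

```python
import torch

zero_to_ten = torch.arange(0, 10, 1)
ten_ones = torch.ones_like(zero_to_ten)  # same shape and dtype as the input

print(ten_ones.tolist(), ten_ones.dtype)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1] torch.int64
```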

**Tensor Datatypes**

There are many different tensor datatypes. Some are good to use for the CPU and some are better with the GPU.

Datatypes with *torch.cuda* in their name (for example *torch.cuda.FloatTensor*) are GPU tensor types. However, the most common and generally used datatype is *torch.float32*, also written *torch.float*. This is called a "32-bit floating point" type.

But there are also 16-bit floating points such as *torch.float16* / *torch.half* and even 64-bit floating points *torch.float64* / *torch.double*. But that's not all, there's even 8-bit, 16-bit, 32-bit, 64-bit integers! And many more. 

The reason that we have so many datatypes is that tensors are made to accommodate a wide range of data, and the datatype determines the precision used in computing. Precision is the amount of detail used to describe a number. The higher the precision, the more detail and the more data is used to express a number.

This is important for deep learning because you are dealing with a lot of operations, so more detail means more compute power is needed. Lower-precision datatypes are typically quicker and cost less in terms of compute and memory, but this can come at the expense of evaluation performance on metrics such as accuracy.

More Detail -> Costlier Compute Power -> Better Metrics | Less Detail -> Affordable Compute Power -> Worse Metrics
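
To see the trade-off in numbers, the sketch below stores 1/3 in three float datatypes and reports how many bytes each element takes and how far the stored value is from the 64-bit Python float. The errors dictionary is just a helper for this example.

```python
import torch

third = 1 / 3  # a Python float, which is 64-bit

errors = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(third, dtype=dtype)
    # element_size() is the number of bytes used per element.
    errors[dtype] = abs(t.item() - third)
    print(f"{dtype}: {t.element_size()} bytes per element, error = {errors[dtype]:.2e}")
```

Fewer bytes per element means less memory and usually faster math, but a larger rounding error.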

Let's practice with utilizing datatypes. We can specify this using the dtype parameter when creating tensors. 
For floating-point data, the default datatype for tensors is float32.

In [26]:
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # None -> inferred from the data (float32 for Python floats)
                               device=None, # None -> current default device (e.g. CPU, CUDA)
                               requires_grad=False # If True, operations performed on the tensor are recorded for gradients
                               )
float_32_tensor.ndim, float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device, float_32_tensor.requires_grad

(1, torch.Size([3]), torch.float32, device(type='cpu'), False)

Here we can see a lot of information regarding our tensor. It tells us that it is rank-1, is comprised of three elements, has the float32 datatype, is stored on the CPU, and does not have its operations recorded for gradients.

Aside from shape issues (two tensors whose shapes are not compatible), two of the most common problems you will encounter are datatype and device issues. For example, one tensor is *torch.float32* while the other is *torch.float16*. The same goes for devices: if one tensor is on the CPU while the other is on the GPU, operations between them can fail.

**PyTorch loves it when two tensors have the same format and calculations are used on the same device.**
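
A small sketch of what a datatype mismatch looks like in practice. Element-wise arithmetic promotes to the wider type, while matrix multiplication is stricter. In the PyTorch versions we are aware of, the mismatched matmul raises a RuntimeError, but treat that as an assumption; the try/except handles either outcome.

```python
import torch

a = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float32)
b = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)

# Element-wise arithmetic promotes to the wider type.
print((a + b).dtype)  # torch.float32

# Matrix multiplication is stricter; convert explicitly before combining.
try:
    result = a @ b
except RuntimeError as err:
    print(f"matmul failed: {err}")
    result = a @ b.to(a.dtype)

print(result.item())  # 126.0
```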

Let's make a tensor with *dtype=torch.float16*

In [28]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

torch.float16

In [29]:
float_32_tensor.ndim, float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device, float_32_tensor.requires_grad

(1, torch.Size([3]), torch.float32, device(type='cpu'), False)

**Getting Information From Tensors**

Once you've created tensors and are working with them, it is very helpful to know how to get at the information they can provide. We've already used the following:

1. Shape - what is the shape of a tensor?
2. dtype - what datatype are the elements within the tensor?
3. device - what device is it stored on - cpu or gpu?

Let's make a random tensor and try to get information from it.

In [30]:
random_tensor = torch.rand(3,4)

print(random_tensor)
print(f"Shape of Tensor: {random_tensor.shape}")
print(f"Datatype of Tensor: {random_tensor.dtype}")
print(f"Device Tensor is Stored: {random_tensor.device}")

tensor([[0.3347, 0.1443, 0.4272, 0.3633],
        [0.6514, 0.4040, 0.2347, 0.7513],
        [0.6260, 0.1857, 0.2867, 0.1479]])
Shape of Tensor: torch.Size([3, 4])
Datatype of Tensor: torch.float32
Device Tensor is Stored: cpu


Whenever you run into issues when operating on tensors, it's best to review the tensors themselves and check the very basics of information regarding them. You might discover that something is wrong with how they are structured.

The three big questions - WHAT? WHAT? WHERE? will help you in troubleshooting errors:

1. What shape are my tensors?
2. What datatype are they?
3. Where are they stored?
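
When debugging, it can help to wrap the three questions in a tiny helper. The function name describe is hypothetical and not a PyTorch API; it only formats attributes that every tensor already has.

```python
import torch

def describe(t: torch.Tensor, name: str = "tensor") -> str:
    """Summarize the three things to check first: shape, dtype, device."""
    return f"{name}: shape={tuple(t.shape)}, dtype={t.dtype}, device={t.device}"

print(describe(torch.rand(3, 4), "random_tensor"))
# random_tensor: shape=(3, 4), dtype=torch.float32, device=cpu
```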

**Manipulating Tensors**

In deep learning, almost all the data that you work with will be represented as tensors. This can range from images, videos, text, audio, protein structures, etc. Models learn by performing a series of operations on these tensors to create a representation of the patterns in the data that you are working on. The number of operations performed on the tensors can run into the millions!

Now, don't get all worked up because the operations are not that complex. It's all just the following:

1. Addition
2. Subtraction
3. Multiplication (element-wise)
4. Division
5. Matrix Multiplication (The GOAT of Deep Learning)

*GOAT - Greatest of All Time*

That's basically it. There are others, but these make up the majority and are the basic building blocks of neural networks. Combining these building blocks properly will give you very sophisticated neural networks that can handle complex tasks.

**BASIC OPERATIONS**

Let's work with the fundamentals first. Addition, Subtraction, and Multiplication.

In [31]:
tensor = torch.tensor([1,2,3])
tensor + 10

tensor([11, 12, 13])

In [32]:
tensor * 10

tensor([10, 20, 30])

The tensor itself didn't change because we didn't reassign it. If we had written *tensor = tensor + 10* or *tensor += 10*, the variable would now hold the new values. Let's do this with subtraction.

In [33]:
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [34]:
tensor = tensor + 10
tensor

tensor([1, 2, 3])
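
There is one subtle difference between the two ways of updating a tensor. Reassigning with tensor = tensor + 10 builds a new tensor, while tensor += 10 changes the existing data in place. The sketch below uses a second name, alias (chosen just for this example), to make the difference visible.

```python
import torch

tensor = torch.tensor([1, 2, 3])
alias = tensor            # second name for the same tensor object

tensor = tensor + 10      # builds a new tensor and rebinds the name
print(alias.tolist())     # [1, 2, 3] (the original data is untouched)

tensor = alias
tensor += 10              # in-place: modifies the shared data
print(alias.tolist())     # [11, 12, 13]
```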

PyTorch also has a function called torch.mul() which basically does multiplication. Same goes for addition with torch.add()

In [35]:
torch.mul(tensor, 10)

tensor([10, 20, 30])

In [36]:
torch.add(tensor, 10)

tensor([11, 12, 13])

In [37]:
# Still unchanged since we didn't reassign
tensor

tensor([1, 2, 3])

But it's just more convenient to use the operational symbols - * , + , - rather than *torch.mul()*

In [38]:
# Element-wise multiplication multiplies each element by the element at the same index in the other tensor: index 0 * index 0, index 1 * index 1, ...
print(tensor, "*", tensor)
print("Equals: ", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals:  tensor([1, 4, 9])


**Matrix Multiplication**

One of the most common operations in machine learning and deep learning algorithms such as neural networks is matrix multiplication. PyTorch implements matrix multiplication through the function *torch.matmul()*. There is also the *@* operator, which does the same thing and is commonly used as shorthand.

There are two main rules to remember when dealing with matrix multiplication:

1. The INNER DIMENSIONS must match:

* (3, 2) @ (3, 2) will not work
* (2, 3) @ (3, 2) will work
* (3, 2) @ (2, 3) will work

2. The resulting matrix will take the shape of the OUTER DIMENSIONS:

* (2, 3) @ (3, 2) -> (2, 2)
* (3, 2) @ (2, 3) -> (3, 3)

For better visualization of how matrix multiplication works: http://matrixmultiplication.xyz/

**NOTE:** Shapes are read as ROWS x COLUMNS.
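
The two rules can be written down as a small helper and checked against PyTorch. The function matmul_shape is hypothetical, written only for this sketch, and it covers the 2-D case only.

```python
import torch

def matmul_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matmul, or None if the inner dimensions differ."""
    (a_rows, a_cols), (b_rows, b_cols) = a_shape, b_shape
    if a_cols != b_rows:      # rule 1: inner dimensions must match
        return None
    return (a_rows, b_cols)   # rule 2: result takes the outer dimensions

for a_shape, b_shape in [((3, 2), (3, 2)), ((2, 3), (3, 2)), ((3, 2), (2, 3))]:
    expected = matmul_shape(a_shape, b_shape)
    if expected is None:
        print(f"{a_shape} @ {b_shape} -> will not work")
    else:
        actual = tuple((torch.rand(a_shape) @ torch.rand(b_shape)).shape)
        print(f"{a_shape} @ {b_shape} -> {actual}")
        assert actual == expected
```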

Let's create a tensor and perform element-wise multiplication first then conduct matrix multiplication on it.

In [41]:
tensor = torch.tensor([1,2,3])
tensor.shape

torch.Size([3])

In [42]:
# [1*1, 2*2, 3*3] = [1,4,9]
tensor * tensor

tensor([1, 4, 9])

In [43]:
# [1*1 + 2*2 + 3*3]
tensor @ tensor

tensor(14)

Matrix multiplication multiplies matching elements and then sums the results, while element-wise multiplication retains the structure of the tensor's elements.

You can do matrix multiplication manually, but it isn't recommended because you would have to use for loops, which are computationally expensive compared with PyTorch's built-in implementation. Plus, there's really no need to reinvent the wheel.

In [44]:
%%time
value = 0
for i in range(len(tensor)):
    value += tensor[i] * tensor[i]
value

CPU times: total: 0 ns
Wall time: 1.44 ms


tensor(14)

In [45]:
%%time
tensor @ tensor

CPU times: total: 0 ns
Wall time: 0 ns


tensor(14)
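
With only three elements, both timings are too small to compare meaningfully. The sketch below repeats the comparison on a larger vector using time.perf_counter from the standard library. The vector size is arbitrary and the timings depend on the machine, so no expected numbers are shown.

```python
import time
import torch

big = torch.rand(100_000)

start = time.perf_counter()
loop_total = 0.0
for value in big.tolist():   # plain Python loop over 100,000 numbers
    loop_total += value * value
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
matmul_total = big @ big     # one vectorized call
matmul_seconds = time.perf_counter() - start

print(f"loop:   {loop_seconds:.6f} s")
print(f"matmul: {matmul_seconds:.6f} s")
```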

**The Common Errors in Deep Learning**

Deep learning relies heavily on multiplying and performing operations with matrices. These operations have strict rules about the shapes and sizes of the tensors being combined. One of the most common errors that you will encounter in deep learning is the shape mismatch error.

In [46]:
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

tensor_A @ tensor_B

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [None]:
tensor_A.shape, tensor_B.shape

Notice the error! RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2). Remember our first rule: the INNER DIMENSIONS must match. So 3x2 @ 3x2 raises an error because the inner dimensions (2 and 3) are different.

We can make this work by making the inner dimensions match. One way to do this is by using transpose, which switches the dimensions of a given tensor, so a 3x2 becomes a 2x3.

We can perform a transpose in PyTorch with either of these:
* *torch.transpose(input, dim0, dim1)* - where input is the tensor to be transposed and dim0 / dim1 are the dimensions to be swapped.
* *tensor.T* - where *tensor* is the tensor to be transposed.

In [None]:
print(tensor_A)
print(tensor_B)

In [None]:
print(tensor_A)
print(tensor_B.T)

In [None]:
tensor_A.shape, tensor_B.T.shape

After transposing, tensor_B.T has shape [2, 3] while retaining the data inside, so the inner dimensions of tensor_A (3x2) and tensor_B.T (2x3) now match.

In [47]:
output = tensor_A @ tensor_B.T
output, output.shape

(tensor([[ 27.,  30.,  33.],
         [ 61.,  68.,  75.],
         [ 95., 106., 117.]]),
 torch.Size([3, 3]))

Here is where our 2nd rule comes into play: the resulting tensor takes the shape of the OUTER DIMENSIONS, so our output has shape [3,3]. What if the outer dimensions of the two tensors are different? Let's take a look.

In [48]:
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6],
                         [14, 15]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

In [49]:
tensor_A.shape, tensor_B.T.shape

(torch.Size([4, 2]), torch.Size([2, 3]))

In [50]:
output = tensor_A @ tensor_B.T
output, output.shape

(tensor([[ 27.,  30.,  33.],
         [ 61.,  68.,  75.],
         [ 95., 106., 117.],
         [248., 277., 306.]]),
 torch.Size([4, 3]))

We can see that the first dimension of the output is now 4 because that is the outer dimension of the first tensor. The second dimension is 3 because that is the outer dimension of the second tensor. Basically, the outer dimensions of the two tensors being multiplied give the shape of the resulting tensor.

Transposing is vital to ensure operations are able to go smoothly between tensors with different shapes. 

**NOTE:** The individual elements of the resulting matrix from matrix multiplication are calculated as the dot products of rows from the first matrix with columns from the second matrix.
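
To check that note on one entry, the sketch below recomputes a single element of the 3x3 output from earlier as a dot product. The names A and B stand in for tensor_A and tensor_B and are used only here.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
B = torch.tensor([[7., 10.], [8., 11.], [9., 12.]])

output = A @ B.T

# Entry (i, j) is the dot product of row i of A with column j of B.T.
i, j = 1, 2
entry = torch.dot(A[i], B.T[:, j])
print(entry.item(), output[i, j].item())  # both 75.0: 3*9 + 4*12
```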

Neural networks are filled with matrix multiplications and dot products. The *torch.nn.Linear* module (to be seen later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input (x) and a weights matrix (A).

y = x @ A^T + b

That sounds rough but basically:

* x - is the input to the layer (deep learning is a stack of layers like *torch.nn.Linear* and others on top of each other).
* A - is the weights matrix created by the layer. It starts out as random values that get adjusted as the neural network learns to recognize patterns in the data. The "T" means that the weights matrix is transposed. NOTE: "A" is sometimes referred to as "W".
* b - is the bias term used to slightly offset the weights and inputs.
* y - is the output.

That is the formula for a linear function or a straight line! Let's play around with linear layers.

In [51]:
# Since linear layers start with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(45)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, #in_features = matches inner dimensions of input
                        out_features=3) #out_features = describes outer value

x = tensor_B
output = linear(x)
print(f"Input Shape: {x.shape}\n")
print(f"Output:\n{output}\nOutput Shape: {output.shape}")

Input Shape: torch.Size([3, 2])

Output:
tensor([[ 3.8298,  6.8288, -3.2636],
        [ 4.0394,  7.6523, -3.8208],
        [ 4.2491,  8.4757, -4.3781]], grad_fn=<AddmmBackward0>)
Output Shape: torch.Size([3, 3])


Basically, in_features should match the last dimension (number of columns) of the input tensor, and out_features determines the last dimension (number of columns) of the output tensor after the transformation.
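
We can confirm that the layer really computes the formula by doing the matrix multiplication by hand with the layer's own weight and bias. This is a sketch that rebuilds the same input values inline so it can run on its own.

```python
import torch

torch.manual_seed(45)
linear = torch.nn.Linear(in_features=2, out_features=3)
x = torch.tensor([[7., 10.], [8., 11.], [9., 12.]])

# linear.weight has shape (out_features, in_features), hence the transpose.
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape)                # torch.Size([3, 2])
print(torch.allclose(linear(x), manual))  # True
```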

**REMEMBER MATRIX MULTIPLICATION IS ALL YOU NEED**

**Finding the Min, Max, Mean, Sum, etc. - Aggregation**

Now that we're finished with manipulating tensors and conducting operations, let's talk about aggregating them, going from many values to fewer values. First, let's make a tensor and find the max, min, mean, and sum.

In [52]:
x = torch.arange(0, 100, 10)
x, x.shape

(tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90]), torch.Size([10]))

In [53]:
print(f"min: {x.min()}")
print(f"max: {x.max()}")
print(f"mean: {x.type(torch.float32).mean()}") # mean() needs a floating-point dtype, so convert from int to float first.
print(f"sum: {x.sum()}")

min: 0
max: 90
mean: 45.0
sum: 450


The same can be done with torch methods such as:

In [54]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(90), tensor(0), tensor(45.), tensor(450))
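
These aggregations also accept a dim argument, which collapses only one dimension instead of the whole tensor. A short sketch on a small 2x3 tensor (the name grid is chosen just for this example):

```python
import torch

grid = torch.arange(0, 60, 10).reshape(2, 3)   # [[0, 10, 20], [30, 40, 50]]

print(grid.sum().item())           # 150, over every element
print(grid.sum(dim=0).tolist())    # [30, 50, 70], collapsing the rows
print(grid.sum(dim=1).tolist())    # [30, 120], collapsing the columns
print(grid.float().mean().item())  # 25.0, after converting to float
```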

**Positional Min/Max**

You can also find the index of the maximum or minimum value in a tensor using *torch.argmax()* and *torch.argmin()*. This is a great way to get the position of the max/min value of a tensor when you don't need the actual value itself.

In [55]:
tensor = torch.arange(10, 100, 10)
print(tensor)
print(f"Tensor - Max Value Index: {torch.argmax(tensor)}")
print(f"Tensor - Min Value Index: {torch.argmin(tensor)}")

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Tensor - Max Value Index: 8
Tensor - Min Value Index: 0


**Change Tensor Datatype**

As mentioned before, one of the most common issues that people face when working with tensors is tensors that differ in datatype. If one tensor is *torch.float32* and another is *torch.float16* and you try to do some operations with them together, you might run into errors.

Luckily, PyTorch gives you the means to check the datatype easily and change it using *torch.Tensor.type(dtype=None)*, where the dtype parameter is the datatype that you want to use.

Let's create a tensor and check its datatype first. Because we pass floats, the default is *torch.float32*.

In [56]:
tensor = torch.arange(10., 100., 10.)
tensor, tensor.dtype

(tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.]), torch.float32)

Let's change this into a *torch.float16* or a *torch.half*.

In [57]:
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

We can also do the same for an integer type such as *torch.int8*.

In [58]:
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)

The easiest way to think about datatypes: the higher the number of bits, the more precise the values can be but the more processing power and memory they cost. The lower the number of bits, the less precise the values but the lighter the demand on processing power.
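
One thing to watch for when converting floats to integers is that the fractional part is dropped rather than rounded. The sketch below also shows Tensor.to(), which is another way to cast. The values are made up for illustration.

```python
import torch

values = torch.tensor([1.9, -1.9, 42.5])

as_int = values.type(torch.int8)     # the fractional part is dropped
as_half = values.to(torch.float16)   # Tensor.to() is another way to cast

print(as_int.tolist())   # [1, -1, 42]
print(as_half.dtype)     # torch.float16
print(values.dtype)      # torch.float32, the original is unchanged
```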

**Reshaping, Stacking, Squeezing, Unsqueezing**

One of the more important means of manipulating tensors is through their shapes. We often need to change or reshape the dimensions without touching the values inside them. The following methods are available in PyTorch:

1. *torch.reshape(input, shape)* -> Reshapes *input* to *shape* (if compatible). Also available as *torch.Tensor.reshape()*.
2. *tensor.view(shape)* -> Returns a view of the original tensor in a different *shape* that shares the same data as the original tensor.
3. *torch.stack(tensors, dim=0)* -> Concatenates a sequence of tensors along a new dimension (*dim*). All tensors must be the same shape.
4. *torch.squeeze(input)* -> Squeezes the *input* tensor by removing all the dimensions of size 1.
5. *torch.unsqueeze(input, dim)* -> Returns *input* with a dimension of size 1 added at index *dim*.
6. *torch.permute(input, dims)* -> Returns a view of the original *input* with its dimensions permuted (rearranged) to *dims*.

Why bother with all these different methods? Because deep learning models are all about manipulating tensors in some way. Due to the strict rules of matrix multiplication, we need to be able to shape our data to fit these rules so that we can operate on it. For example, if we have shape mismatches, we'll run into errors. These methods let us modify our tensors so that they can be used with each other.
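
Before going through them one by one, here is a quick sketch of the shape each of the six methods produces on a small 6-element tensor. The tensor and shapes are chosen only for illustration. The last line also shows that reshape accepts -1 to mean "work out this size for me".

```python
import torch

x = torch.arange(1., 7.)                 # shape (6,)

print(torch.reshape(x, (2, 3)).shape)    # torch.Size([2, 3])
print(x.view(3, 2).shape)                # torch.Size([3, 2])
print(torch.stack([x, x], dim=0).shape)  # torch.Size([2, 6])

row = x.reshape(1, 6)
print(torch.squeeze(row).shape)          # torch.Size([6])
print(torch.unsqueeze(x, dim=1).shape)   # torch.Size([6, 1])
print(torch.permute(x.reshape(1, 2, 3), (2, 0, 1)).shape)  # torch.Size([3, 1, 2])

print(x.reshape(2, -1).shape)            # torch.Size([2, 3])
```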

Let's try these methods out.

In [101]:
x = torch.arange(1., 8.)
x, x.shape, x.ndim

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]), 1)

We'll work with adding another dimension using *torch.reshape()*

In [104]:
x_reshaped = x.reshape(1,7)
x_reshaped, x_reshaped.shape, x_reshaped.ndim

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]), 2)

Notice that we now have an extra outer bracket, because the tensor has gained one more dimension. When reshaping, it's important to remember that the new shape must hold exactly the same number of elements as the input.

Originally we had [7] and now we have [1,7]. Multiplying 1 x 7 gives 7, so it fits.

In [61]:
x_reshaped_try = x.reshape(1,5)
x_reshaped_try, x_reshaped_try.shape, x_reshaped_try.ndim

RuntimeError: shape '[1, 5]' is invalid for input of size 7

Now let's proceed with *torch.Tensor.view()*. This changes the shape but keeps the original data.

In [106]:
z = x.view(1,7)
z, z.shape, z.ndim

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]), 2)

Remember that changing the view of a tensor with view() ONLY creates a new view of the SAME tensor. If we change values through the view, we also change the original tensor, because the two share the same data. Imagine it as wearing a pair of shades: the colors you see might change, but you're still looking at the same thing.

In [63]:
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))

See, they're the exact same even if you did the operation on z. It still changed x.
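
If you want a reshaped copy that does not affect the original, one option is to call clone() first. The sketch below shows both behaviours side by side. The name independent is chosen just for this example.

```python
import torch

x = torch.arange(1., 8.)
z = x.view(1, 7)                     # same underlying storage as x
independent = x.clone().view(1, 7)   # clone() copies the data first

z[:, 0] = 5
print(x[0].item())                   # 5.0, the change shows through
print(independent[0, 0].item())      # 1.0, the copy is unaffected
print(x.data_ptr() == z.data_ptr())  # True
```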

If we want to stack a tensor on top of itself repeatedly, we can do that with *torch.stack()*. This is great because you are able to combine tensors that have the same shape into one. For example, images: we can stack images of the same size on top of each other, with each entry along the new dimension holding one image's pixel values.

In [64]:
x_stacked = torch.stack([x,x,x,x], dim=0) # Play around with dim. Switch it to 1 and check what happens.
x_stacked, x_stacked.shape, x_stacked.ndim

(tensor([[5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.]]),
 torch.Size([4, 7]),
 2)

In [65]:
x_stacked = torch.stack([x,x,x,x], dim=1)
x_stacked, x_stacked.shape, x_stacked.ndim

(tensor([[5., 5., 5., 5.],
         [2., 2., 2., 2.],
         [3., 3., 3., 3.],
         [4., 4., 4., 4.],
         [5., 5., 5., 5.],
         [6., 6., 6., 6.],
         [7., 7., 7., 7.]]),
 torch.Size([7, 4]),
 2)

Notice where the '4' is placed whenever we move dimensions. Also take note that the '4' comes from the number of tensors that we've stacked. If we go back to *torch.stack()* we placed four 'x' tensors into the input.

Let's try removing dimensions from a tensor. *torch.squeeze()* is a function that we can use to do exactly that. It removes every dimension of size 1 from a tensor.


In [69]:
print(f"Previous Tensor: {x_reshaped}")
print(f"Previous Shape: {x_reshaped.shape}")

# Removes extra dimensions from x_reshaped

x_squeezed = x_reshaped.squeeze()
print(f"\nNew Tensor: {x_squeezed}")
print(f"New Shape: {x_squeezed.shape}")

Previous Tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
Previous Shape: torch.Size([1, 7])

New Tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
New Shape: torch.Size([7])


Notice how the dimension of size 1 got removed? That happens to every size-1 dimension when squeeze is called. There is also a counterpart for this.

*torch.unsqueeze(input, dim)* does the reverse by adding a dimension of size 1 at a specified index. It's not an exact inverse of squeeze, since it adds only a single dimension of size 1 and you have to specify the index for it to work.

In [71]:
print(f"Previous Tensor: {x_squeezed}")
print(f"Previous Shape: {x_squeezed.shape}")

# Adds a dimension of size 1 at index 0 of x_squeezed

x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nNew Tensor: {x_unsqueezed}")
print(f"New Shape: {x_unsqueezed.shape}")

Previous Tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
Previous Shape: torch.Size([7])

New Tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
New Shape: torch.Size([1, 7])
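
The dim argument controls where the new dimension goes, and squeeze can also be limited to one dimension. A short sketch using a fresh 7-element tensor (the name padded is chosen just for this example):

```python
import torch

x = torch.arange(1., 8.)             # shape (7,)

print(x.unsqueeze(dim=0).shape)      # torch.Size([1, 7])
print(x.unsqueeze(dim=1).shape)      # torch.Size([7, 1])

padded = x.reshape(1, 7, 1)
print(padded.squeeze().shape)        # torch.Size([7]), every size-1 dimension removed
print(padded.squeeze(dim=0).shape)   # torch.Size([7, 1]), only dimension 0 removed
```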


There's also a function for rearranging the order of the axes: *torch.permute(input, dims)*, which returns a *view* of the input with its dimensions in the new order.

**NOTE:** Let's remind ourselves of the terminology before getting started. An axis is the index of a dimension, so using our previous example of *torch.Size([1,7])*, the 1 is on the 0-axis and the 7 is on the 1-axis. Also remember that a *view* is just a different perspective on the data, so it is still connected to the original variable where the data is stored.

So while *view()* changes the shape of the dimensions, *permute()* changes the order of the dimensions, and both retain a connection to the original tensor.

In [73]:
x_original = torch.rand(size=(224,224,3))
x_permuted = x_original.permute(2, 0, 1) 

print(f"Previous Shape: {x_original.shape}")
print(f"New Shape: {x_permuted.shape}")

Previous Shape: torch.Size([224, 224, 3])
New Shape: torch.Size([3, 224, 224])


In our permute function we specified (2,0,1). This means the following:

* We moved the 2-axis of the original to the 0-axis of the new
* We moved the 0-axis of the original to the 1-axis of the new
* We moved the 1-axis of the original to the 2-axis of the new
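
Because permute returns a view, the permuted tensor still shares its data with the original. The sketch below checks both the index mapping and the shared data. The value 123.0 and the indices are arbitrary.

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Element [h, w, c] of the original is element [c, h, w] of the permuted view.
x_original[0, 0, 0] = 123.0
print(x_permuted[0, 0, 0].item())                          # 123.0
print((x_original[5, 7, 2] == x_permuted[2, 5, 7]).item()) # True
```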

**Indexing - Selecting Data From Tensors**

Sometimes, we need to select specific data from our tensors. For example, we might need the data from this specific row of this specific column. Luckily, we can approach this as we approach regular lists and arrays - which is indexing!

In [110]:
x = torch.arange(1, 10)
x, x.shape

(tensor([1, 2, 3, 4, 5, 6, 7, 8, 9]), torch.Size([9]))

In [111]:
x = x.reshape(1,3,3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

Indexing starts at the outer dimension -> inner dimension 

In [114]:
print(f"First Square Bracket: \n{x[0]}")
print(f"Second Square Bracket: {x[0][0]}")
print(f"Third Square Bracket: {x[0][0][0]}")

First Square Bracket: 
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second Square Bracket: tensor([1, 2, 3])
Third Square Bracket: 1


That's not all, we can also use : to *specify all the values in this dimension*. We can then also use a comma , to add another dimension. Examples:

In [115]:
x[:, 0] # We grab all the values of the 0th dimension but only index 0 of the 1st dimension

tensor([[1, 2, 3]])

In [116]:
x[:, :, 1] # We get all the values of the 0th & 1st dimensions but only index 1 of the 2nd dimension

tensor([[2, 5, 8]])

In [117]:
x[:, 1, 1] # We get all the values of the 0th dimension but only index 1 of the 1st & 2nd dimension

tensor([5])

In [118]:
x[0, 0, :] # We grab index 0 of the 0th & 1st dimensions, then all the values of the 2nd dimension

tensor([1, 2, 3])

Don't worry if you get a bit confused when dealing with indexing, especially with large tensors, where it can really get confusing. Luckily, there's no cost in experimenting and working out the location of the data that you're trying to find. Keep practicing and follow the pattern of VISUALIZE! VISUALIZE! VISUALIZE!
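
As a bit of extra practice, here is a small sketch that rebuilds the same 1x3x3 tensor and picks out the last value and the last column. The variable names *last_value* and *last_column* are just illustrative.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

last_value = x[0, 2, 2]     # index 0 of dim 0, index 2 of dim 1, index 2 of dim 2
last_column = x[0, :, 2]    # every row of the single 3x3 block, but only column 2

print(last_value)
print(last_column)
```

Try predicting each result by pointing at the numbers in the printed tensor before running the cell.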

**PyTorch Tensors & NumPy**

PyTorch tensors and NumPy arrays are pretty similar in a lot of ways, but PyTorch tensors have the added strength of being able to run on CUDA GPUs, which can make large computations a lot faster.

NumPy is widely adopted across scientific Python, and a lot of tools for visualizing and graphing data work with NumPy arrays. So it's no wonder that PyTorch has functionality that allows it to interact with NumPy and vice versa.

There are two main ways that we use to interact with these two different libraries.

1. *torch.from_numpy(ndarray)* - NumPy array converts to PyTorch tensor.
2. *torch.Tensor.numpy()* - PyTorch tensor converts to NumPy array.

Let's try using these in the following:

In [119]:
#NumPy Array to PyTorch Tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

Notice that np also has an arange function, just like PyTorch. NumPy's default datatype for floating-point arrays is float64, which is different from PyTorch's default float32, and *torch.from_numpy()* keeps the array's datatype. There are always means of converting these, though.

Here's an example of converting from NumPy array to PyTorch tensor while also changing the datatype to float32.

In [122]:
tensor = torch.from_numpy(array).type(torch.float32)
tensor, tensor.dtype

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.float32)

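On its own, *torch.from_numpy()* gives a tensor that shares memory with the array, but *.type(torch.float32)* returns a new copy. So after this conversion, changing the array won't change the tensor and vice-versa. There's no connection. (Also note that *array = array + 1* below creates a brand new array rather than modifying the old one in place.)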

In [123]:
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]), tensor([1., 2., 3., 4., 5., 6., 7.]))
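
To make the difference between sharing and copying visible, the sketch below updates the array in place with *+=* instead of reassigning it. The names *shared* and *copied* are made up for this example.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)

shared = torch.from_numpy(array)                       # same memory as the array
copied = torch.from_numpy(array).type(torch.float32)   # dtype conversion makes a new copy

array += 1   # in-place update, unlike `array = array + 1`

print(shared)   # follows the array
print(copied)   # keeps the old values
```

So whether the tensor "follows" the array depends on both how the tensor was made and how the array is changed.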

If we want to convert our PyTorch tensors to NumPy array - we use tensor.numpy()

In [128]:
tensor = torch.ones(7) # This creates a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # Will retain the datatype unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
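
The same memory sharing applies in this direction for CPU tensors. A minimal sketch, with *numpy_view* as an illustrative name:

```python
import torch

tensor = torch.ones(7)
numpy_view = tensor.numpy()   # shares memory with the CPU tensor

tensor.add_(1)                # in-place: the NumPy array sees the change
print(numpy_view)

tensor = tensor + 1           # creates a new tensor: the NumPy array is left alone
print(tensor, numpy_view)
```

If you need an independent array, copying it first (for example with *numpy_view.copy()*) is one option.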

**Reproducibility** - Removing The Random From Random

Deep learning revolves a lot around randomness, especially when initializing parameters/weights. However, there are a lot of different types of 'randomness'.

As mentioned before, neural networks start off with random weights which are very poor descriptions of the data. But we calculate these and adjust them to better represent the data. So recall the steps:

start with random weights -> perform tensor operations on the weights -> update the weights -> repeat

So randomness is important since we start from there, but sometimes it's better to have the option of a certain type of random. Not too random, but 'just right' enough to do experiments with. It's hard to determine whether something is going well if the results you get are always wildly different.

For example, you've created an algorithm that is capable of executing a certain operation at X performance. You need someone else to verify that this isn't just a coincidence. That is where *reproducibility* starts to come into the picture.

You both get similar results, not exactly the same but not too far apart, even though they were produced on two different machines.

Let's try out reproducibility in PyTorch.

In [145]:
random_tensor_A = torch.rand(3,4)
random_tensor_B = torch.rand(3,4)

print(f"Tensor A: \n{random_tensor_A}\n")
print(f"Tensor B: \n{random_tensor_B}\n")
print(f"Is Tensor A == Tensor B? - in any value")
random_tensor_A == random_tensor_B

Tensor A: 
tensor([[0.8694, 0.5677, 0.7411, 0.4294],
        [0.8854, 0.5739, 0.2666, 0.6274],
        [0.2696, 0.4414, 0.2969, 0.8317]])

Tensor B: 
tensor([[0.1053, 0.2695, 0.3588, 0.1994],
        [0.5472, 0.0062, 0.9516, 0.0753],
        [0.8860, 0.5832, 0.3376, 0.8090]])

Is Tensor A == Tensor B? - in any value


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

It's way too random! The values are all over the place and the chances of two tensors matching are tiny. But what if you wanted to create two random tensors that have the same values? They'd still be randomly generated, but they would share the same values.

Thankfully, we have *torch.manual_seed(seed)*. This is a function that we can use to solve that exact problem. *seed* is an integer that fixes the starting point of the random number generator, so the same seed gives the same sequence of 'random' values.

In [158]:
random_seed = 42
torch.manual_seed(seed=random_seed)
random_tensor_C = torch.rand(3,4)

torch.manual_seed(seed=random_seed)
random_tensor_D = torch.rand(3,4)

print(f"Tensor C: \n{random_tensor_C}\n")
print(f"Tensor D: \n{random_tensor_D}\n")
print(f"Is Tensor C == Tensor D? - in any value")
random_tensor_C == random_tensor_D

Tensor C: 
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D: 
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Is Tensor C == Tensor D? - in any value


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])

**NOTE:** When utilizing the GPU, it is important to seed CUDA's random number generator as well, using *torch.cuda.manual_seed()*, to ensure reproducibility for operations involving randomness on both the CPU and the GPU.

So basically, aside from just using *torch.manual_seed()*, you should also add in *torch.cuda.manual_seed()* so that the CPU and GPU generators are both seeded. (In recent PyTorch releases *torch.manual_seed()* is documented as seeding all devices, so the extra call is mostly a way of being explicit.)
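
One way to bundle the two calls is a tiny helper like the sketch below. The function name *set_seed* is hypothetical (it is not a PyTorch function), and the CUDA call is skipped when no GPU is available.

```python
import torch

def set_seed(seed: int) -> None:
    """Hypothetical helper: seed the CPU generator and, if present, the current GPU."""
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)

device = "cuda" if torch.cuda.is_available() else "cpu"

set_seed(42)
a = torch.rand(3, 4, device=device)

set_seed(42)
b = torch.rand(3, 4, device=device)

print(torch.equal(a, b))
```

Note that the seed has to be reset before each call to *torch.rand()* if you want the two tensors to match, just like in the cell above.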

**Getting PyTorch to run on GPU**

First, let's check whether or not PyTorch has access to our GPU.

In [148]:
torch.cuda.is_available()

True

We create a device variable that holds "cuda" if a GPU is available and "cpu" otherwise, so that the rest of our code can run on either one.

In [149]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

Let's count the number of GPUs that PyTorch has access to.

In [150]:
torch.cuda.device_count()

1

This is important because you have the option of running one process on one GPU while running another on a different GPU. You can also have all these GPUs work together on one process.
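
If you're curious which GPUs those are, something like the following sketch lists them by index. Picking "cuda:0" at the end is just an illustrative choice of the first GPU; on a machine without a GPU the loop simply prints nothing.

```python
import torch

gpu_count = torch.cuda.device_count()

# Print the index and name of every GPU that PyTorch can see
for index in range(gpu_count):
    print(index, torch.cuda.get_device_name(index))

# A specific GPU can be targeted with "cuda:<index>"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)
```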

**Putting Tensors & Models on GPU**

We can specify where a tensor is stored by calling *to(device)*, where device is the target device that we specified earlier. Why is this important?

Because GPUs are usually much faster than CPUs at the kind of numerical computing deep learning needs. And if a GPU isn't available, we can still fall back to the CPU, because our device variable already covers that case by checking whether or not we have CUDA.

Putting a tensor on a GPU using *to(device)* returns a copy of that tensor, so the tensor will exist on both the CPU and the GPU. In order to overwrite a tensor, we need to reassign it like the following:

*some_tensor = some_tensor.to(device)*

Let's try creating a tensor and put it on the gpu.

In [155]:
tensor = torch.tensor([1,2,3])

# Tensor is not yet on the GPU
print(tensor, tensor.device)

# Copy of the tensor on the target device, assigned to a new variable
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='cuda:0')
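
Here is a small device-agnostic sketch of the same idea. It shows the reassignment pattern, plus creating a tensor directly on the target device with the *device* argument of *torch.tensor()*. The name *created_on_device* is made up for this example.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create on the CPU first, then overwrite the variable with the version on the target device
tensor = torch.tensor([1, 2, 3])
tensor = tensor.to(device)

# Or create the tensor on the target device right away
created_on_device = torch.tensor([1, 2, 3], device=device)

print(tensor.device, created_on_device.device)
```

On a machine without a GPU, *device* is "cpu" and *to(device)* just hands back the same tensor, so the code runs unchanged either way.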

**Moving Tensors Back To CPU**

What if we wanted to put our tensor back on the CPU? Firstly, you might wonder what the advantage of that is. Remember that NumPy arrays are great for visualizing data?

One of the things that makes PyTorch stand out is its capacity to utilize CUDA for operations like matrix multiplication. NumPy can't do that: its arrays live in CPU memory and it has no access to CUDA. So if you try to convert a tensor on the GPU straight to NumPy, it won't work.

Let's try doing that now and see what happens.

In [156]:
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

We first have to copy it back to the CPU with *.cpu()* before we can turn it into a NumPy array.

In [157]:
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3], dtype=int64)

Remember that this only creates a copy of the tensor on the CPU, so we still have our original tensor on the GPU. You can think of it as having two versions on two different devices. These are two separate objects and have no connection with each other whatsoever.
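
The sketch below wraps this up in a device-agnostic way, with illustrative variable names. One caveat worth knowing: when the tensor is already on the CPU, *.cpu()* returns the same tensor, so in that case the NumPy array shares memory with it instead of being a separate copy.

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_device = torch.tensor([1, 2, 3], device=device)

# .cpu() brings the data to host memory (a copy if it was on the GPU), .numpy() converts it
array_on_cpu = tensor_on_device.cpu().numpy()

print(type(array_on_cpu), array_on_cpu)
print(tensor_on_device.device)   # the original tensor stays where it was
```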

**EXERCISES**

Refer to -> https://www.learnpytorch.io/00_pytorch_fundamentals/#exercises