In [2]:
import torch
torch.__version__

'2.7.1+cpu'

# PyTorch

- PyTorch is an open source machine learning and deep learning framework.
- PyTorch allows you to manipulate and process data and write machine learning algorithms using Python code.

# Introduction to tensors

Tensors are the fundamental building block of machine learning.
Their job is to represent data numerically.

Scalar: a zero-dimensional tensor

In [14]:
scalar = torch.tensor(5)
print('scalar', scalar)
print(scalar.ndim) # number of dimensions; ndim is an attribute, not a method

# Retrieve the Python number from a single-element tensor with item()
print('Retrieved no. from tensor :', scalar.item())

# A vector is a one-dimensional tensor, but it can contain many numbers.
vector = torch.tensor([2,5])
print('vector:', vector)
print(vector.ndim)

# Shape of Vector
print('shape:', vector.shape)

scalar tensor(5)
0
Retrieved no. from tensor : 5
vector: tensor([2, 5])
1
shape: torch.Size([2])


In [None]:
# Tensor
Tensor = torch.tensor([[[7, 6, 3],
                        [3, 5, 1],
                        [2, 8, 5]]]) # 3D tensor

# For comparison, the same values as a 2D tensor:
# Tensor = torch.tensor([[7, 6, 3],
#                        [3, 5, 1],
#                        [2, 8, 5]])

print(Tensor)
# Trick: count the opening square brackets on one side to get the number of dimensions
print('dimension:', Tensor.ndim)
print(Tensor.shape) # sizes are listed from the outermost dimension to the innermost

tensor([[[7, 6, 3],
         [3, 5, 1],
         [2, 8, 5]]])
dimension: 3
torch.Size([1, 3, 3])
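
To make the bracket-counting trick concrete, here is a small sketch that wraps the same nine numbers in two and then three levels of brackets. The names `matrix_2d` and `tensor_3d` are made up for this illustration; `unsqueeze()` is used only as a convenient way to add the extra outer bracket.

```python
import torch

# The same nine numbers with two levels of square brackets (2D)
matrix_2d = torch.tensor([[7, 6, 3],
                          [3, 5, 1],
                          [2, 8, 5]])

# Add one outer dimension of size 1, like wrapping the data in one more bracket (3D)
tensor_3d = matrix_2d.unsqueeze(dim=0)

print(matrix_2d.ndim, matrix_2d.shape)  # 2 dimensions, shape [3, 3]
print(tensor_3d.ndim, tensor_3d.shape)  # 3 dimensions, shape [1, 3, 3]
```

The extra outer bracket shows up as the leading `1` in the shape, which is why shapes read from the outside in.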


Some Notes:

- ML models such as neural networks manipulate and seek patterns within tensors.
- When building ML models with PyTorch, we rarely create tensors by hand.
- Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

**Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...**

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.
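
The loop described above can be sketched in a few lines. This is a toy illustration only, not how a real model is trained: the data (`y = 2x`), the single `weight`, the learning rate, the step count and the seed are all assumptions chosen so the example stays tiny.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# Hypothetical toy data that follows y = 2x
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

# Initialization: start with a random number
weight = torch.rand(1, requires_grad=True)

# Optimization: plain stochastic gradient descent on that one number
optimizer = torch.optim.SGD([weight], lr=0.01)

for step in range(200):
    prediction = weight * x                # look at the data
    loss = ((prediction - y) ** 2).mean()  # measure how wrong the current number is
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss.backward()                        # compute the gradient of the loss
    optimizer.step()                       # update the random number

print(weight.item())  # ends up very close to 2.0
```

The random starting value does not matter much here; repeated small updates pull it towards the value that best fits the data.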



In [22]:
# Create a tensor of random numbers

random_tensor = torch.rand(size=(3,4))
random_tensor, random_tensor.dtype

(tensor([[0.2179, 0.4869, 0.0500, 0.7345],
         [0.6402, 0.1215, 0.1211, 0.8675],
         [0.8624, 0.4604, 0.9976, 0.1926]]),
 torch.float32)

In [27]:
# Zeros and ones

zeros = torch.zeros(size=(3,4))
print(zeros)
print(zeros.dtype)

ones = torch.ones(size=(3,4))
print(ones)
print(ones.dtype)

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
torch.float32
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
torch.float32


In [29]:
# Creating a range, and tensors shaped like another tensor
# torch.arange(start, end, step)

# Create a range of values from 0 up to (but not including) 10
# (named zero_to_nine so it does not shadow Python's built-in range)
zero_to_nine = torch.arange(start=0, end=10, step=1)
zero_to_nine


tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [31]:
# Sometimes you might want a tensor of a certain type with the same shape as another tensor,
# for example a tensor of all zeros with the same shape as a previous tensor.
# torch.zeros_like(input) and torch.ones_like(input) return a tensor filled with
# zeros or ones, respectively, in the same shape as the input.

# Create a tensor of zeros shaped like another tensor
similar = torch.zeros_like(input=zero_to_nine) # will have the same shape
similar

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
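
As a short companion sketch, `torch.ones_like()` works the same way, and both functions also copy the input's dtype unless a different one is requested. The names `template`, `ones_copy` and `int_zeros` are hypothetical.

```python
import torch

template = torch.rand(size=(2, 3))  # float32 by default

# Same shape and dtype as the template, filled with ones
ones_copy = torch.ones_like(input=template)
print(ones_copy.shape, ones_copy.dtype)

# Same shape, but with an explicitly requested dtype
int_zeros = torch.zeros_like(input=template, dtype=torch.int64)
print(int_zeros.shape, int_zeros.dtype)
```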

# Tensor datatypes

- There are many different tensor datatypes available in PyTorch. Some are specific to the CPU and some are better suited to the GPU.

- Generally, if you see torch.cuda anywhere, the tensor is being used on a GPU (since Nvidia GPUs use a computing toolkit called CUDA).

- The most common type (and generally the default) is torch.float32 or torch.float. This is referred to as "32-bit floating point".

- But there's also 16-bit floating point (torch.float16 or torch.half) and 64-bit floating point (torch.float64 or torch.double).

- Lower-precision datatypes are generally faster to compute with but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).
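
One way to see the precision trade-off is to store the same number at each precision and measure how far the stored value drifts from it. This is a rough sketch: the value `0.1` and the `errors` dictionary are arbitrary choices for illustration.

```python
import torch

value = 0.1  # cannot be represented exactly in binary floating point

errors = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(value, dtype=dtype)
    # Convert back to 64-bit before comparing so the stored rounding error is visible
    errors[dtype] = abs(t.double().item() - value)
    # element_size() is the number of bytes used per element
    print(dtype, t.element_size(), "bytes, rounding error:", errors[dtype])
```

Fewer bytes per element means less memory to move around, but the rounding error grows as the precision drops.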



In [32]:
# Default datatype for floating-point tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # None infers the dtype from the data; Python floats become torch.float32 by default
                               device=None, # None places the tensor on the default device (the CPU unless changed)
                               requires_grad=False) # if True, autograd records operations on the tensor so gradients can be computed

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

In [None]:
""" Aside from shape issues (tensor shapes don't match up), two of the other most common issues we'll come across in PyTorch are datatype and device issues.
For example, one of the tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be in the same format).
Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).
"""
# Set the datatype explicitly with dtype=torch.float16
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

torch.float16
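
A common way to deal with datatype and device mismatches is to convert explicitly with `.to()`. The sketch below assumes nothing about your hardware: it falls back to the CPU when no CUDA GPU is available. The names `product`, `device` and `on_device` are made up for this example.

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0])

# .to() returns a converted copy; the original tensor keeps its dtype
float_16_tensor = float_32_tensor.to(torch.float16)
print(float_32_tensor.dtype, float_16_tensor.dtype)

# Element-wise operations promote mixed dtypes to the wider one
product = float_32_tensor * float_16_tensor
print(product.dtype)

# Pick a device that exists on this machine, then move the tensor there explicitly
device = "cuda" if torch.cuda.is_available() else "cpu"
on_device = float_32_tensor.to(device)
print(on_device.device)
```

Not every operation promotes dtypes automatically, so converting explicitly before combining tensors is the safer habit.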

In [None]:
# Getting information from tensors
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

# Note: when you hit an error, ask three questions: what shape are my tensors,
# what datatype are they, and where (on which device) are they stored?

tensor([[0.9866, 0.5019, 0.7114, 0.1557],
        [0.1988, 0.4908, 0.8214, 0.8343],
        [0.4541, 0.6520, 0.4039, 0.2909]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


## Manipulating Tensors

In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (potentially millions of them) on the tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:

- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

And that's it. Sure, there are a few more here and there, but these are the basic building blocks of neural networks.

By stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like Lego!).

In [None]:
tensor = torch.tensor([1, 2, 3])
print(tensor + 100)
print(tensor * 100) # it's more common to use the operator symbols like * instead of torch.mul()
print(tensor / 100)
print(tensor - 100)
# original tensor will be unchanged

tensor([101, 102, 103])
tensor([100, 200, 300])
tensor([0.0100, 0.0200, 0.0300])
tensor([-99, -98, -97])
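
The point about the original tensor staying unchanged is easy to check. In this short sketch the result of an operation is first thrown away and then kept by reassignment; `torch.mul()` is shown as the function form of `*`.

```python
import torch

tensor = torch.tensor([1, 2, 3])

tensor + 10           # the result is discarded, so `tensor` is unchanged
print(tensor)         # tensor([1, 2, 3])

tensor = tensor + 10  # reassign to keep the result
print(tensor)         # tensor([11, 12, 13])

print(torch.mul(tensor, 10))  # function form of tensor * 10
```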


### Matrix multiplication (is all you need)
One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

- PyTorch implements matrix multiplication in the **torch.matmul()** function.

- The two main rules of matrix multiplication to remember are:

  - The inner dimensions must match:

        (3, 2) @ (3, 2) won't work

        (2, 3) @ (3, 2) will work

        (3, 2) @ (2, 3) will work

  - The resulting matrix has the shape of the outer dimensions:

        (2, 3) @ (3, 2) -> (2, 2)

        (3, 2) @ (2, 3) -> (3, 3)

- Note: "@" is Python's **matrix multiplication** operator.
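
The two rules can be written down as a tiny function. `matmul_shape` is a hypothetical helper written for this note, not part of PyTorch, and it only covers the 2D case; the last lines cross-check one result against an actual `@` call.

```python
import torch

def matmul_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of a 2D matmul, or None if the inner dimensions differ."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    if cols_a != rows_b:       # rule 1: the inner dimensions must match
        return None
    return (rows_a, cols_b)    # rule 2: the result takes the outer dimensions

print(matmul_shape((3, 2), (3, 2)))  # None: won't work
print(matmul_shape((2, 3), (3, 2)))  # (2, 2)
print(matmul_shape((3, 2), (2, 3)))  # (3, 3)

# Cross-check one case against PyTorch itself
result = torch.rand(2, 3) @ torch.rand(3, 2)
print(tuple(result.shape))           # (2, 2)
```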

In [48]:
import torch
tensor = torch.tensor([1, 2, 3])
print(tensor.shape)
print('Element-wise multiplication (tensor * tensor):', tensor*tensor)
print('Matrix multiplication (tensor.matmul(tensor)):', tensor.matmul(tensor)) # same as torch.matmul(tensor, tensor)
# tensor @ tensor also works, though torch.matmul() is more explicit
# Note: matrix multiplication multiplies element-wise and then sums the products
# (1*1 + 2*2 + 3*3 = 14); element-wise multiplication leaves them unsummed.
# The built-in torch.matmul() is faster than computing this by hand in a Python loop.

torch.Size([3])
Element-wise multiplication (tensor * tensor): tensor([1, 4, 9])
Matrix multiplication (tensor.matmul(tensor)): tensor(14)
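
For comparison, this is roughly what "doing it by hand" looks like for two vectors: multiply matching elements, then add everything up. It is a sketch for illustration only, and `total` is a name chosen here.

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Matrix multiplication of two vectors by hand: multiply element-wise, then add
total = 0
for a, b in zip(tensor, tensor):
    total += a * b

print(total)                         # tensor(14)
print(torch.matmul(tensor, tensor))  # tensor(14)
```

Both give the same value, but the built-in call runs in optimized native code instead of a Python loop, which matters a great deal for large tensors.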


In [50]:
# One of the most common errors in deep learning (shape errors)
# The tensor shapes must be compatible
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32) # 3 x 2 Matrix

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)  # 3 x 2 Matrix

torch.matmul(tensor_A, tensor_B) # raises an error because the inner dimensions (2 and 3) don't match

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [None]:
# The error above can be fixed by transposing one of the tensors

print(tensor_A) # 3x2
print(tensor_B.T) # 2x3 (.T is the transpose of a matrix)
torch.matmul(tensor_A, tensor_B.T) # torch.mm() also works for 2D tensors

# Note: A matrix multiplication like this is also referred to as the dot product of two matrices.

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Neural networks are full of matrix multiplications and dot products.

- The torch.nn.Linear() module, also known as a **feed-forward layer or fully connected layer**, implements a matrix multiplication between an input $x$ and a weights matrix $A$:

$y = x \cdot A^T + b$

Where:

- $x$ is the input to the layer (deep learning is a stack of layers like torch.nn.Linear() and others on top of each other).

- $A$ is the weights matrix created by the layer. It starts out as random numbers that get adjusted as the neural network learns to better represent patterns in the data (notice the "T": the weights matrix gets transposed).

  Note: You might also often see W or another letter like X used to denote the weights matrix.

- $b$ is the bias term used to slightly offset the weights and inputs.

- $y$ is the output (a manipulation of the input in the hope of discovering patterns in it).

This is a linear function (like the $y = mx+b$ you may have seen in high school or elsewhere), and it can be used to draw a straight line!
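
To connect the formula to code, the sketch below passes a small input through `torch.nn.Linear()` and recomputes the same result by hand from the layer's `weight` and `bias`. The seed, the layer sizes (2 inputs, 6 outputs) and the names `linear` and `manual` are assumptions made for this illustration.

```python
import torch

torch.manual_seed(42)  # assumption: arbitrary seed so the random weights are repeatable

# in_features must match the inner dimension of the input (2 here)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])  # 3 x 2 input

y = linear(x)                               # what the layer computes
manual = x @ linear.weight.T + linear.bias  # y = x A^T + b written out by hand

print(x.shape, "->", y.shape)      # (3, 2) -> (3, 6)
print(torch.allclose(y, manual))   # True
```

Note that `linear.weight` has shape (6, 2), which is why it has to be transposed before it can be multiplied with the (3, 2) input.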

