In [9]:
import os

import torch
from IPython.display import Image, display

<img src="images/tensors_as_sheet.png" alt="Image" width="400">

# Why use Machine Learning?

To understand why we might want to use machine learning to solve a given problem, it's important to first understand how that problem would be solved with traditional programming. Take, for example, a program designed to look at an image and output whether or not it contains a bird.

Finding feature descriptors for even a single species of bird would prove difficult, if not impossible. For a blue jay, the following might be a start:

- Blue 
- Feathered
- Beak

So first you'd have to identify a discrete shape that is blue. Then you'd have to craft something to detect blue jay feathers, and finally something to detect beaks. Let's imagine you've somehow crafted a perfectly reliable blue jay detection function that, when passed a picture of a blue jay, says it is indeed a blue jay 100% of the time. Consider these problems:

- Some other birds are also blue
- Blue jay feathers don't look like goose feathers
- If the image is distorted or the lighting is wrong, the parameters for the blue and feather detectors break
- You have to do this all over again for every single bird species that exists

You can see how the amount of code required to decide whether a photo contains a bird could quickly get out of hand. In this case, it is almost certainly impossible to hand-code rules that work reliably on most images.

<img src="images/bird_classifier.png" alt="bird" width="200" height="500">

The fundamental difference between machine learning and traditional programming is the following:

_In traditional programming, you take the input data and craft a set of rules or operations that will create the desired output._

_In machine learning, you take the input data and the desired output, and the machine learning model creates the rules for you._
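
To make the contrast concrete, here is a toy sketch rather than a real bird classifier. It assumes every image has already been reduced to one made-up number, `blueness`, and the names `rule_based_is_blue_jay` and `learn_threshold` are hypothetical. The first function hard-codes a rule, while the second derives the rule from inputs paired with their desired outputs.

```python
# Traditional programming: a human writes the rule by hand.
def rule_based_is_blue_jay(blueness):
    return blueness > 0.5  # threshold picked by the programmer


# Machine learning (toy version): the rule comes from inputs + desired outputs.
def learn_threshold(examples):
    """Pick the midpoint between the least-blue jay and the bluest non-jay."""
    jays = [value for value, is_jay in examples if is_jay]
    others = [value for value, is_jay in examples if not is_jay]
    return (min(jays) + max(others)) / 2


# Made-up training data: (blueness, is_blue_jay)
training_data = [(0.875, True), (0.75, True), (0.25, False), (0.125, False)]
learned_threshold = learn_threshold(training_data)


def learned_is_blue_jay(blueness):
    return blueness > learned_threshold


print(learned_threshold)
print(learned_is_blue_jay(0.8), learned_is_blue_jay(0.2))
```

Real models learn millions of such numbers at once instead of a single threshold, but the division of labor is the same: you supply examples, and the rule is fitted to them.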

# What is machine learning good at? What is it currently used for?

Machine learning excels at problems that have lots of difficult-to-define rules. Think:

- Self-driving vehicles (how many thousands of rules would one have to program to solve this task?)
- Environments that are very different in each scenario (like the bird classifier above)
- Problems with huge datasets

Currently, machine learning is used in nearly every industry you interact with, just maybe not in ways you've ever considered to be "AI" or deep learning. Here are a few examples:

- Recommendation algorithms (Netflix, YouTube, etc.)
- Language translation (Google Translate)
- Sports broadcasting (player tracking)
- Natural language
  - ChatGPT
  - Spam filters
  - Voice activation ("Hey Siri", "Hey Alexa")
- Stock trading
- Sports betting

# Some quick definitions

The following are examples of machine learning model families you may have heard of.

### Traditional (Shallow)
- Examples:
  - Random Forest
  - Gradient Boosting Machine (e.g., XGBoost)
  - Naive Bayes
  - Nearest Neighbor
  - Support Vector Machine
- Good for structured data
  - Spreadsheets

### Deep Learning
- Examples:
  - Neural Networks (NN)
  - Fully Connected Neural Networks (FCNN)
  - Convolutional Neural Network (CNN)
  - Recurrent Neural Network (RNN)
  - Feature Pyramid Network (FPN)
  - Transformers
- Good for unstructured data
  - Language, speech, images, audio


# What is a neural network? 

A neural network is a type of program made up of layers of simple units called neurons that are connected together. Each neuron takes in information, processes it, and passes it on to the next layer. By adjusting the connections between neurons, a neural network can learn to recognize patterns in data, like identifying objects in pictures or understanding human speech.

### How do neural networks function?

- **Inputs and Tensors**:
  - A neural network takes inputs, which are numerical data, and organizes them into a tensor. Tensors generalize matrices to any number of dimensions.

- **Mathematical Operations and Features**:
  - These input tensors undergo a series of mathematical operations, such as matrix multiplications, additions, and the application of activation functions.
  - Each layer is usually a combination of a linear function (a weighted sum of its inputs) and a non-linear function (the activation).
  - Through these operations, the network extracts patterns from the data. These patterns are often referred to as "features."
    - You will also see the related terms "embeddings" and "feature vectors" (they are not exactly the same thing, but they are often used interchangeably)

- **Weights and Representations**:
  - The neural network learns these patterns by adjusting "weights," which are the parameters within the model. Each connection between neurons in different layers of the network has an associated weight.
  - As the input data passes through the network, these weights are applied to transform the data at each layer, gradually building a more complex representation of the input data.

- **Loss Function and Optimization**:
  - The network's performance is measured using a loss function, which quantifies the difference between the network's predictions and the actual target values.
  - During training, an optimization algorithm (such as gradient descent) adjusts the weights to minimize the loss function. The goal is to find the set of weights that results in the lowest possible loss, indicating that the model is making accurate predictions.
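
The four steps above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not a recommended way to build a model: the sizes (4 samples, 3 features, 2 hidden units), the random inputs and targets, and the learning rate are all made up for the example.

```python
import torch

torch.manual_seed(0)  # make the random numbers repeatable

# Inputs and tensors: 4 samples with 3 numeric features each.
inputs = torch.rand(4, 3)
targets = torch.rand(4, 1)  # made-up target values

# Weights: the parameters the network adjusts while learning.
weights = torch.rand(3, 2, requires_grad=True)
bias = torch.zeros(2, requires_grad=True)
out_weights = torch.rand(2, 1, requires_grad=True)

# Mathematical operations: a linear step (matrix multiply + add)
# followed by a non-linear activation function.
hidden = torch.relu(inputs @ weights + bias)
predictions = hidden @ out_weights

# Loss function: mean squared error between predictions and targets.
loss = torch.mean((predictions - targets) ** 2)

# Optimization: one gradient descent step on every parameter.
loss.backward()
learning_rate = 0.1
with torch.no_grad():
    for param in (weights, bias, out_weights):
        param -= learning_rate * param.grad

print(hidden.shape, predictions.shape, loss.item())
```

In practice you would repeat the last two stages many times and let `torch.nn` and `torch.optim` handle the bookkeeping; the manual version is only meant to show where each concept lives.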

<img src="images/nn_diagram.png" alt="nn" width="550">

# What is PyTorch and why should we use it?

- PyTorch is the most popular deep learning framework for research

- Leverages GPUs to run deep learning code quickly from Python

- Provides access to many prebuilt deep learning models

- Manages the entire stack, from preprocessing and model creation to training and deployment

- 68% of repositories on paperswithcode.com were written in PyTorch

## Why do we do calculations on GPUs? 

Originally designed for rendering graphics in video games, GPUs have thousands of processing cores that can run many mathematical calculations in parallel. This ability to do lots of math quickly is exactly what we need to train deep learning models and run inference with them.
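
As a small sketch of how this looks in PyTorch, the snippet below picks a GPU when one is available and otherwise falls back to the CPU, so it should run on either kind of machine. The 256 x 256 size is an arbitrary choice for illustration.

```python
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create both tensors directly on the chosen device.
a = torch.rand(256, 256, device=device)
b = torch.rand(256, 256, device=device)

# The matrix multiplication runs on whichever device the tensors live on.
product = a @ b

print(device, product.shape)
```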

# A very brief introduction to vectors and tensors

### What is a vector?  

A vector is a mathematical object that has both magnitude (length) and direction. 

Vectors are built from basis vectors. Basis vectors have a length of 1 in whatever unit of measurement your vector is in (e.g., millimeters, miles per hour) and point in the standard directions of the vector space (e.g., along the x, y, and z axes in 3D space).

A vector component is measured by projecting the vector onto an axis. For example, to find the X component, imagine shining a flashlight straight down onto the X axis (perpendicular to it). The shadow the vector casts on this axis is the X component. The Y component is found the same way: it is the number of basis vectors needed to reach the tip of the vector along the Y axis.

In other words, the components are the amounts of each basis vector required to span from the vector's start to its end along each axis, representing its "width" and "height."

Once you have the components of a vector, you can represent the vector solely using these components. For instance, a vector $\mathbf{v}$ in 2D space with components $v_x$ and $v_y$ can be written as:

$$
\mathbf{v} = v_x \mathbf{i} + v_y \mathbf{j}
$$

Here, $\mathbf{i}$ and $\mathbf{j}$ are the unit vectors along the x and y axes, respectively.
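
The formula can be checked numerically with plain Python. The component values 3 and 4 below are made up for the example; the snippet rebuilds the vector from its components and unit vectors, then recovers its magnitude and direction.

```python
import math

# Components of v along the x and y axes (v = v_x * i + v_y * j).
v_x, v_y = 3.0, 4.0

# Unit (basis) vectors along each axis.
i_hat = (1.0, 0.0)
j_hat = (0.0, 1.0)

# Rebuild the vector from its components and the basis vectors.
v = (v_x * i_hat[0] + v_y * j_hat[0],
     v_x * i_hat[1] + v_y * j_hat[1])

magnitude = math.hypot(v[0], v[1])                # length of the vector
direction = math.degrees(math.atan2(v[1], v[0]))  # angle from the x axis, in degrees

print(v, magnitude, round(direction, 2))
```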


 

### What is a tensor?  

Tensors are generalized mathematical objects that can represent vectors, scalars, and more complex structures. 

| Name   | What is it?                                                                 | Number of dimensions                                                     | Typical notation (example) |
|--------|-----------------------------------------------------------------------------|--------------------------------------------------------------------------|----------------------------|
| scalar | a single number                                                             | 0                                                                        | Lowercase (a)              |
| vector | an ordered list of numbers, often read as a magnitude with a direction (e.g., wind speed with direction) | 1                                           | Lowercase (y)              |
| matrix | a 2-dimensional array of numbers                                            | 2                                                                        | Uppercase (Q)              |
| tensor | an n-dimensional array of numbers                                           | can be any number; a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector, a 2-dimension tensor is a matrix | Uppercase (X) |


### What makes tensors so powerful? 

Different observers might use different ways (basis vectors and components) to describe the same event, but when these ways are combined, everyone agrees on the overall description. This agreement happens because the basis vectors and components change in opposite ways between reference frames, which means the final combined representation remains the same for all observers, regardless of their point of view. 
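
A small numeric sketch of that idea, under simplifying assumptions: two observers describe the same 2D vector, and the second observer's basis vectors are simply twice as long as the first's. The numbers are made up. When the basis vectors double, the components halve, and the combined result is unchanged.

```python
# Observer A uses basis vectors of length 1.
basis_a = [(1.0, 0.0), (0.0, 1.0)]
components_a = [4.0, 6.0]

# Observer B uses basis vectors twice as long, so the components halve.
scale = 2.0
basis_b = [(scale * x, scale * y) for x, y in basis_a]
components_b = [c / scale for c in components_a]


def combine(components, basis):
    """Sum component * basis vector to get the underlying vector."""
    x = sum(c * b[0] for c, b in zip(components, basis))
    y = sum(c * b[1] for c, b in zip(components, basis))
    return (x, y)


vector_a = combine(components_a, basis_a)
vector_b = combine(components_b, basis_b)

print(components_a, components_b)  # the descriptions differ
print(vector_a, vector_b)          # the combined vector is the same
```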

<table>
    <tr>
        <td><img src="images/nn_diagram.png" alt="nn" width="550"></td>
        <td><img src="images/tensor_examples.png" alt="nn" width="550"></td>
        <td><img src="images/vector_components.png" alt="nn" width="550"></td>
    </tr>
</table>



## Most importantly, tensors can represent _anything_ 

For example, the tensor below could be the sales numbers for a steak and almond butter store. We'll get into the weeds of creating tensors further along, but I wanted to provide these examples to help solidify the use cases for tensors in machine learning.

In [19]:
data = torch.tensor([[[1, 2, 3],
                      [3, 6, 9],
                      [2, 4, 5]]])
print(data)
print(data.shape)
print(data.ndim)

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])
torch.Size([1, 3, 3])
3


<img src="images/tensor_sheet.png" alt="nn" width="550">

This tensor could represent an RGB image with 3 color channels, each 224 pixels high and 224 pixels wide:

In [20]:
image = torch.rand(3, 224, 224)
print(image)
print(image.shape)
print(image.ndim)

tensor([[[0.2432, 0.7010, 0.7721,  ..., 0.2164, 0.6718, 0.1104],
         [0.3590, 0.1136, 0.6191,  ..., 0.3251, 0.9227, 0.7909],
         [0.9908, 0.5999, 0.8407,  ..., 0.0043, 0.3359, 0.9595],
         ...,
         [0.2602, 0.8757, 0.0726,  ..., 0.6321, 0.2288, 0.7053],
         [0.0192, 0.9383, 0.8483,  ..., 0.6931, 0.0705, 0.4954],
         [0.7864, 0.0267, 0.7485,  ..., 0.6296, 0.4062, 0.9518]],

        [[0.2406, 0.5715, 0.8695,  ..., 0.5303, 0.6114, 0.5407],
         [0.9878, 0.0021, 0.0584,  ..., 0.3154, 0.8661, 0.7833],
         [0.3563, 0.2200, 0.3609,  ..., 0.9002, 0.4999, 0.9957],
         ...,
         [0.0317, 0.1195, 0.1462,  ..., 0.3885, 0.0491, 0.9648],
         [0.1645, 0.2449, 0.9733,  ..., 0.5273, 0.4287, 0.5294],
         [0.4534, 0.3744, 0.7875,  ..., 0.6244, 0.4987, 0.9975]],

        [[0.0131, 0.8527, 0.5348,  ..., 0.3034, 0.4661, 0.0858],
         [0.4131, 0.1749, 0.3724,  ..., 0.4518, 0.4753, 0.3724],
         [0.7175, 0.4745, 0.8599,  ..., 0.8641, 0.8318, 0.
[output truncated]

<img src="images/image_as_tensor.png" alt="nn" width="550">

# OK, let's create some tensors

### Creating a scalar (or a 0D tensor)

In [25]:
# a scalar is a tensor that holds a single number

scalar = torch.tensor(5)
print(scalar)
print(f"scalar shape: {scalar.shape} scalar ndim {scalar.ndim}")

tensor(5)
scalar shape: torch.Size([]) scalar ndim 0


> A scalar is the simplest form of a tensor. It is a single number and does not have any dimensions. In other words, a scalar has zero axes.

### Creating a vector (or a 1D tensor)

In [23]:
# a vector is a one-dimensional tensor holding an arbitrary number of values
vector_one = torch.tensor([1, 2])
vector_two = torch.tensor([1, 5, 8, 15, 25])

print(vector_one)
print(f"vector_one shape: {vector_one.shape} vector_one ndim {vector_one.ndim}")
print(vector_two)
print(f"vector_two shape: {vector_two.shape} vector_two ndim {vector_two.ndim}")

tensor([1, 2])
vector_one shape: torch.Size([2]) vector_one ndim 1
tensor([ 1,  5,  8, 15, 25])
vector_two shape: torch.Size([5]) vector_two ndim 1


**Dimensionality**

> Notice how the number of dimensions for both vectors remains the same, even though their lengths and values differ. The "dimensionality" of a tensor refers to the number of axes it has. For example, a 2D tensor (or matrix) has two axes: rows and columns. Each of our tensors here has a single axis, which makes it a 1D tensor. Vectors are always 1D tensors, no matter how many elements they hold.
>
> An easy trick to tell the number of dimensions a tensor has in PyTorch is to count the opening square brackets ([) at the start of its printed output. Our vector examples only have one (just to the left of the 1).

**Shape**

> We also print out the 'shape' of our tensor with
>
```python
print(vector_one.shape)
```
>
> The 'shape' of a tensor lists the size of each of its dimensions. For a 1D tensor (vector), the shape holds a single value: the number of elements in the vector.
>
> For example, a vector with 5 elements has the shape (5,), which PyTorch prints as torch.Size([5]), indicating 5 elements along one dimension.
> To work out the shape of a vector by hand, count its elements.
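
A quick way to see how the two ideas relate: the shape has one entry per dimension, so its length always equals `ndim`. The short check below uses made-up values and a hypothetical `examples` dictionary to compare a scalar, a vector, and a matrix side by side.

```python
import torch

examples = {
    "scalar": torch.tensor(5),
    "vector": torch.tensor([1, 5, 8]),
    "matrix": torch.tensor([[7, 8], [9, 10]]),
}

for name, t in examples.items():
    # ndim is the number of axes; shape has one entry per axis.
    print(name, t.ndim, tuple(t.shape), len(t.shape) == t.ndim)
```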

### Creating a matrix (or a 2D tensor)

In [27]:
# a matrix is a two-dimensional tensor consisting of rows and columns

matrix = torch.tensor([[7, 8], 
                       [9, 10]])
print(matrix)
print(f"matrix shape: {matrix.shape} matrix ndim {matrix.ndim}")

tensor([[ 7,  8],
        [ 9, 10]])
matrix shape: torch.Size([2, 2]) matrix ndim 2


> **Note:** In this example, our matrix has 2 rows and 2 columns. This is reflected in its shape, which is (2, 2). The first number in the shape tuple represents the number of rows, and the second number represents the number of columns. Since we have both rows and columns, our matrix is two-dimensional. If we added more rows or columns, the shape would change accordingly, but it would still be a two-dimensional tensor because it would still have exactly two axes: rows and columns.

### Creating a tensor 

>**Note**: Tensors are considered "n dimensional" (sometimes written as nD), meaning they can have any number of dimensions. In this instance we'll just create a simple 3-dimensional tensor.

In [30]:
# 3D Tensor
tensor_3d = torch.tensor([[[1, 2, 5], 
                           [3, 4, 6], 
                           [7, 8, 9]]])
print(tensor_3d)
print(f"tensor_3d shape: {tensor_3d.shape}, tensor_3d ndim: {tensor_3d.ndim}")


tensor([[[1, 2, 5],
         [3, 4, 6],
         [7, 8, 9]]])
tensor_3d shape: torch.Size([1, 3, 3]), tensor_3d ndim: 3


> **Note:** In this example, our 3D tensor has 1 layer containing 3 rows and 3 columns. This is reflected in its shape, which is (1, 3, 3). The first number represents the number of layers (if this tensor had two layers, its shape would be (2, 3, 3); imagine another matrix "stacked on top of it"), the second number represents the number of rows, and the third number represents the number of columns. Since the tensor has three axes (layers, rows, and columns), it is three-dimensional, even though it holds only a single layer.
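
To see the "stacked on top" picture in code, one option is `torch.stack`, which joins tensors of the same shape along a new leading axis. This is a sketch with a made-up second layer of zeros; the names `layer_one`, `layer_two`, and `stacked` are chosen only for this example.

```python
import torch

layer_one = torch.tensor([[1, 2, 5],
                          [3, 4, 6],
                          [7, 8, 9]])
layer_two = torch.zeros_like(layer_one)  # a second, made-up 3x3 layer

# torch.stack adds a new leading axis, placing the matrices "on top of" each other.
stacked = torch.stack([layer_one, layer_two])

print(stacked.shape)  # layers, rows, columns
print(stacked[0])     # the first layer is the original matrix
```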

## Random tensors
We've now created some basic tensors, established that tensors represent some form of data, and learned how machine learning models such as neural networks manipulate and seek patterns within tensors.

But when building machine learning models with PyTorch, it's rare that you'll create tensors by hand (as we've been doing).

Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

In essence:

`Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...`
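
As a rough preview of that loop, here is a plain-Python sketch with a single random number instead of a large tensor. The data (points on the line y = 2x), the learning rate, and the number of passes are all made up for illustration.

```python
import random

random.seed(0)

# Made-up data that follows y = 2 * x; the "pattern" to discover is the 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = random.random()  # start with a random number
learning_rate = 0.05

for step in range(200):
    for x, y in data:                       # look at data
        error = weight * x - y              # how far off is the prediction?
        gradient = 2 * error * x            # slope of the squared error w.r.t. weight
        weight -= learning_rate * gradient  # update the number

print(round(weight, 3))
```

After enough passes the weight settles very close to 2, the value that best represents the data.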

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.

We'll get hands on with these steps later on.

For now, let's see how to create a tensor of random numbers.

We can do so using `torch.rand()`, which fills a tensor with values drawn uniformly from the interval [0, 1), and passing in the `size` parameter.

In [31]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
print(random_tensor)
print(f"random_tensor shape: {random_tensor.shape} random_tensor ndim {random_tensor.ndim}")

tensor([[0.4609, 0.7994, 0.5346, 0.8761],
        [0.4840, 0.5521, 0.6118, 0.0815],
        [0.9449, 0.1316, 0.6658, 0.0100]])
random_tensor shape: torch.Size([3, 4]) random_tensor ndim 2


## Tensors filled with zeros or ones

Often you'll use a tensor filled with either zeros or ones as a mask that marks where a model should or should not look for features. PyTorch has two simple functions for creating them: `torch.zeros()` and `torch.ones()`.

In [33]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
print(zeros)
print(f"zeros shape: {zeros.shape} zeros ndim {zeros.ndim}")

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
zeros shape: torch.Size([3, 4]) zeros ndim 2


In [34]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
print(ones)
print(f"ones shape: {ones.shape} ones ndim {ones.ndim}")

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
ones shape: torch.Size([3, 4]) ones ndim 2
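
One way the masking idea can look in practice is sketched below. It assumes a made-up 3 x 4 `features` tensor and hides its last column; element-wise multiplication by the mask keeps values where the mask is 1 and zeroes them where it is 0.

```python
import torch

features = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                         [5.0, 6.0, 7.0, 8.0],
                         [9.0, 10.0, 11.0, 12.0]])

# Start from all ones (keep everything), then zero out the last column.
mask = torch.ones(size=(3, 4))
mask[:, -1] = 0

# Element-wise multiply: masked positions become 0, the rest are unchanged.
masked = features * mask
print(masked)
```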


## Creating tensors with the same shape and type as another tensor

Sometimes you need a new tensor with the same shape and type as an existing one, for example a tensor of all zeros that matches a previous tensor.

To do so, you can use `torch.zeros_like(input)` or `torch.ones_like(input)`, which return a tensor filled with zeros or ones, respectively, with the same shape and data type as the input.

In [38]:
# Create a 2-dimensional tensor of size (3, 4) filled with random numbers, then create
# a tensor of the same shape and dimensionality filled with zeros.

s_tensor = torch.rand(3, 4)
# zeros_like returns a new tensor and leaves s_tensor unchanged, so no copy is needed
s_zeros = torch.zeros_like(input=s_tensor)  # will have the same shape and dtype

print(f"s_tensor shape: {s_tensor.shape} s_tensor ndim {s_tensor.ndim}")
print(s_tensor)

print(f"s_zeros shape: {s_zeros.shape} s_zeros ndim {s_zeros.ndim}")
print(s_zeros)

s_tensor shape: torch.Size([3, 4]) s_tensor ndim 2
tensor([[0.1160, 0.2033, 0.4449, 0.9072],
        [0.9191, 0.9119, 0.5404, 0.5092],
        [0.5843, 0.7298, 0.2457, 0.6130]])
s_zeros shape: torch.Size([3, 4]) s_zeros ndim 2
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
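
One detail worth checking when using `torch.zeros_like`: it returns a new tensor and leaves its input untouched, so the input does not need to be copied first. Relatedly, plain assignment only creates a second name for the same tensor, while `.clone()` makes an independent copy. The sketch below illustrates both points with made-up variable names.

```python
import torch

original = torch.rand(3, 4)

alias = original         # same tensor under a second name: no data is copied
copy = original.clone()  # independent copy of the values

zeros = torch.zeros_like(original)  # new tensor; `original` is left untouched

# Changing `original` also changes `alias`, but not `copy`.
original[0, 0] = 42.0

print(alias[0, 0].item(), copy[0, 0].item())
print(zeros.shape, zeros.dtype == original.dtype)
```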
