## The MNIST Dataset

When we start to talk about ML, one of the two datasets most frequently used to illustrate the power of machine learning is MNIST (the other is CIFAR).

The MNIST dataset is composed of grayscale images of handwritten digits (0-9), each with a square resolution of 28x28 pixels.

- The low resolution of the images makes the image processing task much simpler.
- The `torchvision` flavor of the dataset comes with **70000** images (60000 for training and 10000 for testing).

When we choose to use any dataset we need to understand the concept of **normalizing the data.**

---

### What is normalization?

Normalization is a fundamental preprocessing step in machine learning, especially when dealing with neural networks. Rescaling the inputs to a small, consistent range tends to make training more stable and helps gradient-based optimizers converge faster, because no input dominates the weight updates simply because of its scale. It can also make a model more robust to variations in the input data.

- Consider the data for colored pixels. Each pixel typically has 4 channels (red, green, blue, alpha).

- - The three color channels typically have values from 0-255 (for 8-bit color).
- - The alpha channel indicates the transparency of the pixel. It is often expressed on a range from 0 (fully transparent) to 1 (fully opaque), although 8-bit image formats also store it as 0-255.

**For the MNIST dataset each pixel only has one channel (since it is monochrome) which goes from black (0) to white (255).**

When we divide this single channel's values by 255 we get a normalized scale from 0 to 1 which helps a lot when dealing with training for reasons which will be discussed later.
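
The division described above can be sketched with NumPy. This is a minimal illustration on a tiny made-up 2x2 "image" rather than real MNIST data, and the variable names are placeholders.

```python
import numpy as np

# A tiny stand-in "image": 8-bit grayscale values, 0 = black, 255 = white.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to float32 and rescale from the 0..255 range to the 0..1 range.
normalized = pixels.astype(np.float32) / 255.0

print(float(normalized.min()), float(normalized.max()))  # 0.0 1.0
```

The relative brightness of the pixels is unchanged; only the scale of the numbers differs.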

In Python, and in PyTorch / TensorFlow more specifically, the notion of **tensors** is also crucial.

---

### What are tensors?

A tensor is the core data type in both PyTorch and TensorFlow.

In TensorFlow 2.x, tensors are `tf.Tensor` objects managed by TensorFlow's own runtime; they are not NumPy arrays under the hood. With eager execution (the default in 2.x), a tensor can be converted to a NumPy array with `.numpy()`, and NumPy arrays can be passed to most TensorFlow operations. PyTorch tensors cannot be used directly in TensorFlow code; they have to be converted first, for example through NumPy.

In PyTorch, tensors are `torch.Tensor` objects managed by the PyTorch library itself. They are not NumPy arrays either, although the two can be converted into each other for interoperability.
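
As a rough sketch of that interoperability, assuming NumPy and PyTorch are installed: `torch.from_numpy` wraps the existing NumPy buffer (so changes are visible on both sides), while `torch.tensor` makes an independent copy. The variable names are illustrative only.

```python
import numpy as np
import torch

array = np.array([1, 2, 3], dtype=np.int32)

shared = torch.from_numpy(array)  # shares memory with the NumPy array
copied = torch.tensor(array)      # copies the data into a new tensor

# Modify the NumPy array in place and see which tensor notices.
array[0] = 99
print(shared[0].item())  # 99
print(copied[0].item())  # 1

# Going back to NumPy keeps the dtype.
back = shared.numpy()
print(back.dtype)  # int32
```

Sharing memory avoids a copy but means an in-place edit on one side silently changes the other, so copying is the safer default when in doubt.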

A tensor is simply a multi-dimensional array.

From Wikipedia:

- In mathematics, a tensor is an algebraic object that describes a **multilinear relationship between sets of algebraic objects related to a vector space**. Tensors may map between different objects such as vectors, scalars, and even other tensors.

- In its simplest form a tensor can be a multidimensional array of scalars.

However, keep in mind that a tensor can represent more than just a single set of values.

- Consider the example of a colored image:

- - **Each pixel can be represented as an array of four channels (r, g, b, a)**

- - **We can also represent the location of each pixel in the image as an array of two values (x, y)**

Ultimately **we could create a tensor as follows: `[x, y, [r, g, b, a]]`, and each such tensor would capture all the relevant information about any given pixel**!

- For the MNIST data each pixel has only one channel. Let's call it hue (strictly speaking, a grayscale intensity); the tensor that represents each pixel would then be: `[x, y, hue]`
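
The per-pixel idea above can be sketched with NumPy arrays. One assumption to flag: in common practice the position is not stored as explicit `x, y` values but is implied by the array indices, so a whole RGBA image is usually an array of shape `(height, width, 4)` and an MNIST-style image an array of shape `(28, 28)`. The names below are illustrative.

```python
import numpy as np

height, width = 28, 28

# RGBA image: one 4-value vector (r, g, b, a) per pixel.
rgba_image = np.zeros((height, width, 4), dtype=np.uint8)
# Grayscale image (MNIST-like): one intensity value per pixel.
gray_image = np.zeros((height, width), dtype=np.uint8)

# The pixel location is the index: row y, column x.
x, y = 10, 5
rgba_image[y, x] = [255, 0, 0, 255]  # opaque red
gray_image[y, x] = 255               # white

print(rgba_image.shape)            # (28, 28, 4)
print(gray_image.shape)            # (28, 28)
print(rgba_image[y, x].tolist())   # [255, 0, 0, 255]
```

Indexing with `[y, x]` recovers exactly the `[x, y, [r, g, b, a]]` or `[x, y, hue]` information described above, without storing the coordinates themselves.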

#### Creating Tensors

```python

# TensorFlow 2.x
import tensorflow as tf

tf_tensor = tf.constant([1, 2, 3], dtype=tf.int32)
print(tf_tensor.dtype)  # Output: <dtype: 'int32'>

# PyTorch
import torch

pytorch_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)
print(pytorch_tensor.dtype)  # Output: torch.int32

```

#### Converting to tensors

```python

import tensorflow as tf
import torch

# Convert a Python list to a TensorFlow tensor
tf_tensor = tf.convert_to_tensor([1, 2, 3], dtype=tf.int32)
print(tf_tensor.dtype)  # Output: <dtype: 'int32'>

# Convert a Python list to a PyTorch tensor
pytorch_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)
print(pytorch_tensor.dtype)  # Output: torch.int32


```

### Converting between PyTorch and TensorFlow tensors

In [3]:
import tensorflow as tf
import torch
import numpy as np

# Create a PyTorch tensor
pytorch_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)

# Convert the PyTorch tensor to a NumPy array, then to a TensorFlow tensor
tf_tensor = tf.convert_to_tensor(pytorch_tensor.numpy())

print(pytorch_tensor.dtype) # Output: torch.int32
print(tf_tensor.dtype) # Output: <dtype: 'int32'>

# Create a TensorFlow tensor
tf_tensor = tf.constant([1, 2, 3], dtype=tf.int32)

# Convert the TensorFlow tensor to a NumPy array
numpy_array = tf_tensor.numpy()

# Convert the NumPy array back to a PyTorch tensor
pytorch_tensor = torch.tensor(numpy_array, dtype=torch.int32)

print("TensorFlow Tensor:", tf_tensor.dtype)      # Output: <dtype: 'int32'>
print("PyTorch Tensor:", pytorch_tensor.dtype)  # Output: torch.int32

torch.int32
<dtype: 'int32'>
TensorFlow Tensor: <dtype: 'int32'>
PyTorch Tensor: torch.int32


### Understanding the MNIST dataset (PyTorch)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(), # Turn each image into a float tensor with values scaled to [0, 1]
    transforms.Normalize((0.5, ), (0.5, )) # Normalize with mean 0.5 and std 0.5, which maps [0, 1] to [-1, 1]
])

# Create the train and test portions of the data under ./data and apply the transformations to both!
train_dataset = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root="./data", train=False, download=True, transform=transform)

# Define the batch size for each training loop and use it to instantiate DataLoader objects that work hand in hand with the training loop.
batch_size = 64
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
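
To make the effect of the pipeline above concrete without downloading anything, here is a minimal sketch on synthetic data. It assumes only PyTorch is installed; the random tensor `raw` is a made-up stand-in for MNIST images, and the two arithmetic lines mimic what `ToTensor` and `Normalize((0.5, ), (0.5, ))` do to pixel values rather than calling `torchvision` itself.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 1000 grayscale 28x28 images with 8-bit pixel values.
raw = torch.randint(0, 256, (1000, 1, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (1000,))

# Mimic ToTensor's value scaling: 0..255 becomes 0.0..1.0.
scaled = raw.float() / 255.0
# Mimic Normalize((0.5,), (0.5,)): (x - mean) / std, so 0.0..1.0 becomes -1.0..1.0.
normalized = (scaled - 0.5) / 0.5

# Batch the data the same way the loaders above do.
loader = DataLoader(TensorDataset(normalized, labels), batch_size=64, shuffle=True)
images, targets = next(iter(loader))

print(images.shape)   # torch.Size([64, 1, 28, 28])
print(targets.shape)  # torch.Size([64])
print(len(loader))    # 16 batches, since ceil(1000 / 64) = 16
```

Each batch is a tensor of shape `(batch_size, channels, height, width)`; the last batch is smaller when the dataset size is not a multiple of the batch size.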

### Define the basic linear model

In [6]:
import torch.nn as nn

# If anything here confuses you, please refer back or re-read the math_for_ml notebook!

class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        # The first fully connected layer will take in 28x28 = 784 pixels as a flattened layer and output 64 values
        # This means that each of the 784 pixels will be connected to each of the 64 "middle layer" nodes (hence fc for fully connected)
        self.fc1 = nn.Linear(784, 64)
        # Then each of the 64 "middle layer" or hidden nodes will connect to each of the 10 output nodes. Why 10? Well, the hope is that
        # each of the ten outputs will activate most strongly for its respective digit (0 - 9)
        self.fc2 = nn.Linear(64, 10)
    
    def forward(self, x):
        # Applying relu to the outputs of the first fully connected layer pass
        x = torch.relu(self.fc1(x))
        # Perform a simple straight pass through the second layer
        x = self.fc2(x)