<a href="https://colab.research.google.com/github/JohnVitz/Clothing-Classifier/blob/main/Clothing_Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Quick Start Tutorial**

Reference: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

Basic model for predicting clothes

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [2]:
## here we create our training data
training_data = datasets.FashionMNIST(
    root="data",           # where to store the data (file location)
    train=True,            # True = training split
    download=True,         # download the dataset to root if it is not already there
    transform=ToTensor(),  # convert each image to a tensor
)

## here we create our test data
test_data = datasets.FashionMNIST(
    root="data",
    train=False,  # False = test split
    download=True,
    transform=ToTensor(),
)

100%|██████████| 26.4M/26.4M [00:01<00:00, 17.8MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 305kB/s]
100%|██████████| 4.42M/4.42M [00:01<00:00, 3.16MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 12.6MB/s]


In [3]:
## now we need to pass the dataset to the data loader
## this supports automatic batching, sampling and shuffling

batch_size = 64  # each element the dataloader returns is a batch of 64 features and labels

# create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
  # wraps the dataset in an iterable that yields batches
  # (no shuffling here; pass shuffle=True to shuffle each epoch)
  # X.shape = [64, 1, 28, 28] --> 64 images, 1 channel (grayscale), each 28x28
  # y.shape = [64] --> 64 labels

for X, y in test_dataloader:  ## iterate over the batches
  print(f"Shape of X [N, C, H, W]: {X.shape}")
    # X = batch of image tensors in the shape [N, C, H, W]
  print(f"Shape of y: {y.shape}{y.dtype}")
    # y = batch of integer labels (e.g. 0 = T-shirt/top)
  break  ## stop after the first batch; we only want a peek at the data



Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64])torch.int64


In [4]:
## creating our model

device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")


## define our model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()

        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

# move the model to the same device the training batches are sent to
model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


# Creating Our Model

This is the full code block:
```
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

```

## define our model
```
class NeuralNetwork(nn.Module):
```
`nn.Module` is the base class (the blueprint) for all neural networks in PyTorch. By defining our class this way, we are extending that base class rather than building everything from scratch.

```
  def __init__(self):
```  
This is the constructor for your model. It defines the layers/structure.
In this code, we are saying that we are going to start defining the layers for this model.

```
super().__init__()
```
This calls the parent class constructor (`nn.Module.__init__`) and makes sure your model gets all of the standard machinery, such as keeping track of its layers so their parameters can be found later.

Together, this is akin to saying:
* `def __init__(self)` --> I am opening a new franchise restaurant and want it to contain x, y, z features
* `super().__init__()` --> I also want to make sure all the standard stuff is included too, like a cash register and logo

```
self.flatten = nn.Flatten()
```
This adds a flatten layer to the model and stores it on `self` so that `forward` can use it later. The layer flattens each image into a single vector.

Why do we do this separately? `nn.Flatten` could also have been the first entry in the `nn.Sequential` stack below; keeping it separate is a style choice:
* It makes it clear that flattening is a pre-processing step, separate from the layers that learn
* It makes it easier to adjust in the future because you can just change how you flatten the image

```
self.linear_relu_stack = nn.Sequential(
```
A layer is ultimately just a processing step in a neural network. We pass something through and the layer transforms it in some way.

--> self.linear_relu_stack = I am creating a bundle of layers (a mini pipeline) and naming it linear_relu_stack so I can refer to it later
*   self means store it in the model for future reference
*   linear_relu_stack is just a name; you could call it anything else

--> Sequential means stack the layers and apply them one after another like a pipeline

Common layer types:

*   nn.Linear = fully connected (dense) layer
*   nn.ReLU = activation function
*   nn.Conv2d = convolutional layer
*   nn.Flatten = flattens a multi-dimensional tensor into a flat vector
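
To make the "pipeline" idea concrete, here is a minimal plain-Python sketch of what a sequential container does conceptually: it stores a list of steps and feeds the output of each one into the next. `TinySequential`, `double`, and `add_one` are hypothetical names made up for this illustration; they are not part of PyTorch, and the real `nn.Sequential` does more (for example, it registers each layer's parameters).

```python
class TinySequential:
    """Hypothetical stand-in for nn.Sequential: applies steps in order."""

    def __init__(self, *steps):
        self.steps = list(steps)

    def __call__(self, x):
        for step in self.steps:
            x = step(x)  # the output of one step is the input of the next
        return x


def double(x):
    return x * 2


def add_one(x):
    return x + 1


pipeline = TinySequential(double, add_one, double)
print(pipeline(3))  # ((3 * 2) + 1) * 2 = 14
```

The order of the steps matters: swapping `double` and `add_one` would give a different result, just as reordering layers changes a network.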

## now lets break down the layers

```
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)

```
```nn.Linear(28*28, 512)```

This is a fully connected layer (aka Dense Layer).
It takes the 784-dimensional input (28*28 pixels) and outputs a 512-dimensional vector. This is a smaller representation that the model learns to make more useful than the raw pixels.
*   512 can be more meaningful here because training pushes the layer to drop noise and summarize the important parts
*   512 is an arbitrary but common choice. It's a ```hyperparameter``` chosen through trial, convention, or habit
*   512 is a natural default when you want a big-ish layer. It gives the model enough capacity to learn without slowing it down too much.
*   This is a place where you could tune it yourself and see how the results change. This is called hyperparameter tuning
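
As a rough illustration of what a fully connected layer computes, the sketch below implements y = Wx + b for a single input vector using plain Python lists. The toy sizes (3 inputs, 2 outputs) and the weight values are made up for readability; the real layer maps 784 inputs to 512 outputs and its weights are learned during training.

```python
def linear(x, weights, bias):
    """Compute y = W x + b for one input vector x (plain lists, no PyTorch)."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


# Toy layer: 3 inputs -> 2 outputs (the real layer is 784 -> 512).
weights = [[1.0, 0.0, -1.0],   # one row of weights per output
           [0.5, 0.5, 0.5]]
bias = [0.0, 1.0]              # one bias per output

print(linear([2.0, 4.0, 6.0], weights, bias))  # [-4.0, 7.0]
```

Every output looks at every input, which is why this kind of layer is called "fully connected".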

```nn.ReLU()```

This is the Rectified Linear Unit activation function. It applies f(x) = max(0, x) to every number in the previous layer's output. This turns every negative value into 0 and leaves positive values untouched.

This introduces non-linearity into the model and allows it to learn complex relationships rather than only linear ones. It works because max(0, x) bends at 0: the output is no longer a straight-line function of the input.
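
The max(0, x) rule is small enough to write out by hand. This is a plain-Python sketch of the same element-wise operation, not PyTorch's implementation; the input values are made up.

```python
def relu(values):
    """Apply f(x) = max(0, x) to every element."""
    return [max(0.0, v) for v in values]


print(relu([2.4, -1.2, 0.0, 0.3]))  # [2.4, 0.0, 0.0, 0.3]
```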

```nn.Linear(512,512)```

This is another fully connected layer. Takes the 512 numbers from previous layer and outputs 512 numbers.

This is considered a hidden layer. More layers = more capacity to learn. You're deepening the network's ability to learn complex patterns

```nn.ReLU()```

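Same as before, adds non-linearity again. We need it between every pair of linear layers: two linear layers with nothing in between are mathematically equivalent to a single linear layer, so the extra layer would add no new capacity.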
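
One way to see why an activation is needed between linear layers is a one-dimensional sketch. Below, `layer_a` and `layer_b` are toy "linear layers" with made-up weights and biases. Applying one after the other gives exactly the same result as the single linear function `collapsed`, but putting a ReLU in between breaks that equivalence.

```python
def layer_a(x):
    return 2.0 * x + 1.0   # toy "linear layer": weight 2, bias 1


def layer_b(x):
    return 3.0 * x - 4.0   # toy "linear layer": weight 3, bias -4


def relu(x):
    return max(0.0, x)


def collapsed(x):
    # layer_b(layer_a(x)) = 3 * (2x + 1) - 4 = 6x - 1: still one linear layer
    return 6.0 * x - 1.0


for x in [-2.0, 0.0, 1.5]:
    stacked = layer_b(layer_a(x))
    with_relu = layer_b(relu(layer_a(x)))
    print(x, stacked, collapsed(x), with_relu)
```

For x = -2.0 the stacked and collapsed versions both give -13.0, while the version with ReLU gives -4.0, so it can no longer be replaced by a single linear function.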

```nn.Linear(512,10)```

This is the output layer of the network. It takes the 512 values from the previous layer and outputs 10 values, one for each class of the FashionMNIST dataset. For a single image this is a tensor with shape [10]; for a batch it is [batch_size, 10]

This output is called ```logits``` and is the raw score before turning it into a probability

```result = tensor([2.4, -1.2, 0.3..., 1.1])```

This is a 10-element tensor where each number is a score for each class. The highest one is the predicted label.
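
The step from logits to a predicted label can be sketched in plain Python. The `softmax` helper turns raw scores into probabilities, and `argmax` picks the index of the largest value. The ten logit values are made up for illustration, and both helpers are hand-written stand-ins rather than the PyTorch functions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    biggest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - biggest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits for one image (10 classes); the values are made up.
logits = [2.4, -1.2, 0.3, 0.0, 1.0, -0.5, 0.2, -2.0, 0.7, 1.1]
probs = softmax(logits)

print(argmax(logits))                   # 0: the largest logit is at index 0
print(argmax(probs) == argmax(logits))  # True: softmax keeps the ordering
```

Because softmax never changes which score is largest, you can pick the predicted class straight from the logits without converting to probabilities first.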

## now lets work through the forward component of this

```
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
```

```def forward(self, x):```

Every model needs a forward method that says: when data is sent through this model, this is what to do with it.
```(self, x)``` means: use this specific model (self) and the input tensor x. In practice you call ```model(x)``` rather than ```model.forward(x)```; PyTorch then calls forward for you.
* x is the input tensor, in this case a batch of images
* x will be shaped like [batch_size, 1, 28, 28]. This is the actual image data that you are going to pass through

```x = self.flatten(x)```

This turns each image in the batch from shape (1, 28, 28) into a flat vector of 784 values, so the batch goes from (batch_size, 1, 28, 28) to (batch_size, 784)
* nn.Linear layers expect flat vectors, not 2D images, so you reshape each image into a flat line
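
The shape change can be checked with nested Python lists standing in for tensors. This is only a sketch of the idea: `flatten_image` is a hypothetical helper, and the all-zero image is fake data. Note that each image is flattened on its own, so the batch dimension is left alone.

```python
def flatten_image(image):
    """Flatten one image shaped [channels][height][width] into a flat list."""
    return [pixel for channel in image for row in channel for pixel in row]


# One fake grayscale image: 1 channel, 28 rows, 28 columns of zeros.
image = [[[0.0] * 28 for _ in range(28)]]
batch = [image for _ in range(64)]  # a batch of 64 such images

flat_batch = [flatten_image(img) for img in batch]
print(len(flat_batch), len(flat_batch[0]))  # 64 784
```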

```logits = self.linear_relu_stack(x)```

You're passing the flattened input ```x``` through the layer stack that we created earlier

It runs the input through each layer that we outlined above, in order. Each layer transforms the data into higher-level features, and the last one outputs 10 numbers (logits) per image

```return logits```

This returns the final result of the model: a tensor in the shape [batch_size, 10]. This output is used for training or making predictions. We'll typically pass this into a loss function and identify the predicted class label.

## now lets examine the output

```
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
```

```NeuralNetwork``` is the name of the model's class (i.e. this is a model of type NeuralNetwork)

```(flatten): Flatten(start_dim=1, end_dim=-1)``` is the first layer of our model. It flattens the input starting at dimension 1.
* This is because dim=0 is the batch dimension
* Dimensions 1-3 are the image itself (channel, height, width)
* So it flattens each image into a 1D vector and leaves the batch dimension alone

```(linear_relu_stack): Sequential(...)```
This lists the layers in the stack one by one, in the order they are applied. bias=True means each linear layer adds a learnable bias term to each of its outputs


# **Optimizing Model Parameters**

In [5]:
## this is defining our loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

## this is the actual optimization function / training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
      X, y = X.to(device), y.to(device)

      ## compute the prediction error
      pred = model(X)
      loss = loss_fn(pred,y)

      ##backpropagation
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      if batch % 100 == 0:
        loss_val, current = loss.item(), (batch + 1) *len(X)
        print(f"loss: {loss_val:>7f} [{current:>5d}/{size:>5d}]")


## Code Block Explanation ##

This is the full code block that we are working with. Lets go line by line through this.
```
## this is defining our loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

## this is the actual optimization function / training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
      X, y = X.to(device), y.to(device)

      ## compute the prediction error
      pred = model(X)
      loss = loss_fn(pred,y)

      ##backpropagation
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      if batch % 100 == 0:
        loss_val, current = loss.item(), (batch + 1) * len(X)
        print(f"loss: {loss_val:>7f} [{current:>5d}/{size:>5d}]")
```

**Defining our loss function**
```
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```
```loss_fn = nn.CrossEntropyLoss()``` defines the loss function, also known as the objective function
* Cross-entropy loss is the standard for multi-class classification problems.
* It compares the model's predicted scores (logits) to the true labels and calculates how wrong the predictions are

Here are some different types of loss functions for classification and their common use cases:
* nn.CrossEntropyLoss() --> most common for multi-class classification problems. Combines LogSoftmax and Negative Log Likelihood under the hood, so the model should output raw logits
* nn.BCELoss() - Binary Cross-Entropy Loss --> most commonly used for binary classification. Works with sigmoid() at the output layer
* nn.BCEWithLogitsLoss() - same as above, but includes the sigmoid step inside, which is more numerically stable
* nn.NLLLoss() -- Negative Log Likelihood --> used when your model ends with LogSoftmax; less common because CrossEntropyLoss handles this
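
To show what "LogSoftmax plus Negative Log Likelihood" amounts to for a single example, here is a plain-Python sketch. `cross_entropy` is a hand-written helper for illustration, not the PyTorch function, and it handles one example at a time (by default `nn.CrossEntropyLoss` averages this quantity over the batch).

```python
import math


def cross_entropy(logits, target):
    """Loss for one example: -log(softmax(logits)[target])."""
    biggest = max(logits)  # subtract the max for numerical stability
    log_sum_exp = biggest + math.log(sum(math.exp(v - biggest) for v in logits))
    return log_sum_exp - logits[target]


# With 10 equal logits the model is guessing uniformly: loss = ln(10)
print(round(cross_entropy([0.0] * 10, target=3), 4))  # 2.3026

# A confident correct prediction costs less than a confident wrong one
good = cross_entropy([5.0, 0.0, 0.0], target=0)
bad = cross_entropy([0.0, 0.0, 5.0], target=0)
print(good < bad)  # True
```

The ln(10) ≈ 2.3026 value is a useful sanity check: the first loss in the training log later in this notebook is 2.302102, which is what you would expect from an untrained model spreading its guesses roughly evenly over 10 classes.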

Here are some different loss functions for regression:
* nn.MSELoss() -- Mean Squared Error --> common for tasks like predicting price or temperature; measures the average of the squared differences between predictions and targets
* nn.L1Loss() -- Mean Absolute Error --> measures the average of the absolute differences between predictions and targets. Less sensitive to outliers than MSE.
* nn.SmoothL1Loss() -- closely related to Huber loss --> a hybrid between MSE and L1: squared for small errors, absolute for large ones
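
The difference in outlier sensitivity is easy to see with a small hand calculation. The sketch below uses plain-Python versions of the two averages (`mse` and `mae` are illustrative helpers, and the numbers are made up): three predictions are exact and one misses by 10.

```python
def mse(preds, targets):
    """Mean of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mae(preds, targets):
    """Mean of the absolute differences."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


preds = [1.0, 2.0, 3.0, 14.0]   # the last prediction is a large miss
targets = [1.0, 2.0, 3.0, 4.0]

print(mse(preds, targets))  # 25.0
print(mae(preds, targets))  # 2.5
```

The single miss of 10 contributes 100 to the squared sum but only 10 to the absolute sum, which is why MSE reacts much more strongly to outliers.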

**Defining our optimizer**

```optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)```

This is the "coach" for your model: it helps the model learn by adjusting the weights (the model's parameters) based on how wrong its predictions are

```torch.optim.SGD``` is Stochastic Gradient Descent --> you're saying "use SGD to update my model's weights"
* This method updates the model's weights based on gradients that were calculated through backpropagation
* Stochastic = each update uses one mini-batch rather than the full dataset, which makes each step much faster

```model.parameters()``` grabs all the weights and biases in your model that need to be trained
* These are the numbers inside your layers
* You're saying "here are the knobs you can turn"

Quick refresher on what parameters are. In this model they all live in the fully connected (linear) layers; ReLU and Flatten have no parameters.
* weights = how much importance to give to each input
* biases = starting point or adjustment for each neuron
* ```nn.Linear(784,512)``` --> weight matrix shaped [512, 784] and bias vector shaped [512]
* In our model, the optimizer adjusts 3 weight matrices and 3 bias vectors, one pair per linear layer
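
The number of individual values behind those weights and biases can be worked out by hand from the layer sizes. This is a plain-Python tally using the three layer shapes from the model above; `layer_shapes` is just a list written out for this example.

```python
# (in_features, out_features) for the three nn.Linear layers in the model
layer_shapes = [(28 * 28, 512), (512, 512), (512, 10)]

total = 0
for n_in, n_out in layer_shapes:
    n_weights = n_in * n_out   # weight matrix shaped [n_out, n_in]
    n_biases = n_out           # one bias per output
    total += n_weights + n_biases
    print(f"Linear({n_in}, {n_out}): {n_weights} weights + {n_biases} biases")

print(total)  # 669706
```

With the model in memory, `sum(p.numel() for p in model.parameters())` should report the same total.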

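```lr=1e-3``` - this is the learning rate: how big a step the optimizer takes each time it updates the weights. 1e-3 is scientific notation for 0.001
* It's a small number on purpose, to prevent the model from taking too big a step in the wrong direction
* Other values are worth trying; the learning rate is a hyperparameter. Too large and training can become unstable, too small and it learns very slowly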
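
The effect of the learning rate can be seen on a toy problem. The sketch below applies the basic SGD update rule, w = w - lr * gradient, to a made-up one-parameter loss (w - 3)^2 whose minimum is at w = 3. The function names and the toy loss are assumptions for illustration; a real model has many parameters and a far more complicated loss surface.

```python
def sgd_step(w, grad, lr):
    """One SGD update: move against the gradient, scaled by the learning rate."""
    return w - lr * grad


def grad_of_loss(w):
    # Toy loss (w - 3)^2 has gradient 2 * (w - 3); its minimum is at w = 3.
    return 2.0 * (w - 3.0)


def train_toy(lr, steps=100):
    """Run a fixed number of updates starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w = sgd_step(w, grad_of_loss(w), lr)
    return w


for lr in (1e-3, 1e-1):
    print(lr, round(train_toy(lr), 4))
```

After 100 steps, lr=1e-1 has essentially reached 3 while lr=1e-3 has only moved part of the way from 0. On this toy loss a much larger value such as lr=1.5 overshoots further on every step and diverges.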

**Define our training loop**

```
## this is the actual optimization function / training loop

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
      X, y = X.to(device), y.to(device)
```

```def train(dataloader, model, loss_fn, optimizer):``` defines a function called "train"
* dataloader = batch-by-batch provider of training data
* model = neural network you want to train
* loss_fn = how we measure how wrong the model is
* optimizer = how the model should improve/adjust its weights

```size = len(dataloader.dataset)``` tells us how many total examples are in the dataset so we can print progress

```model.train()``` puts the model in training mode, which matters for layers like dropout and batchnorm that behave differently in training and evaluation. Our model has neither, but calling it is a good habit

```for batch, (X, y) in enumerate(dataloader):``` goes through the dataset one batch at a time. Each batch gives you X, a group of images, and y, the true labels for those images
* enumerate adds an index to any iterable so you can keep track of which iteration you are on
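
Here is the same loop pattern on a tiny stand-in for a dataloader. `fake_dataloader` is a made-up list of (images, labels) pairs, with strings in place of tensors, so the unpacking is easy to follow.

```python
# Stand-in for a dataloader: each item is an (images, labels) pair.
fake_dataloader = [("images_0", "labels_0"), ("images_1", "labels_1")]

for batch, (X, y) in enumerate(fake_dataloader):
    print(batch, X, y)
# 0 images_0 labels_0
# 1 images_1 labels_1
```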

```X, y = X.to(device), y.to(device)``` moves the batch of images and labels to the right device: the accelerator (e.g. GPU) if available, otherwise the CPU. The model must be on the same device.

**Compute our Prediction Error**

```
   ## compute the prediction error
      pred = model(X)
      loss = loss_fn(pred,y)
```

```pred = model(X)``` sends the images through the model to get predictions. The output is the logits for each class

```loss = loss_fn(pred,y)``` calculates the loss, or how wrong the model is, with the goal of reducing this number over time



**Backpropagation and Optimization**

```
##backpropagation
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()
```

```loss.backward()``` -- looks at the loss and works out how each of the model's weights contributed to it --> this produces the gradients, which say in which direction and how strongly to adjust each parameter

```optimizer.step()``` -- This is the actual learning step. It uses the gradients to update the weights.

```optimizer.zero_grad()``` -- Clears out the gradients from this round so they don't carry over. If you don't do this, the gradients from successive batches will stack up (PyTorch adds new gradients to the old ones)
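
The "stacking up" behaviour can be imitated with a toy object. `ToyParam` below is a hypothetical class written for this illustration: it only mimics the fact that gradients are added to whatever is already stored until they are reset. It is a simulation of the behaviour, not how PyTorch is implemented.

```python
class ToyParam:
    """Hypothetical stand-in for a parameter whose .grad accumulates."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def backward(self, new_grad):
        self.grad += new_grad  # gradients are added, not replaced

    def zero_grad(self):
        self.grad = 0.0


p = ToyParam(1.0)
p.backward(0.5)
p.backward(0.5)
print(p.grad)  # 1.0: two backward passes stacked up

p.zero_grad()
p.backward(0.5)
print(p.grad)  # 0.5: only the latest gradient
```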

**Progress**

```
if batch % 100 == 0:
    loss_val, current = loss.item(), (batch + 1) * len(X)
    print(f"loss: {loss_val:>7f} [{current:>5d}/{size:>5d}]")
```
This only runs every 100 batches and prints the current loss and how many examples we've trained on
* % gives you the remainder when dividing two numbers. batch % 100 == 0 means: only print when batch is divisible by 100
* loss.item() converts the one-element loss tensor into a plain Python number
* (batch + 1) * len(X) is the number of batches processed so far times the number of examples in a batch, which tells you how many examples you've trained on so far
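
The progress counter can be reproduced with plain arithmetic, which is a handy way to check your understanding against the training log further down. The sketch assumes the 60,000-example training set and batch size of 64 used in this notebook.

```python
import math

size = 60000       # training examples in FashionMNIST
batch_size = 64
num_batches = math.ceil(size / batch_size)

# Counter values at the batches where the print statement fires
printed = [
    (batch + 1) * batch_size
    for batch in range(num_batches)
    if batch % 100 == 0
]

print(num_batches)               # 938
print(len(printed))              # 10
print(printed[:3], printed[-1])  # [64, 6464, 12864] 57664
```

These are the same counter values that appear in brackets in each epoch of the training output: 64, 6464, 12864, and so on up to 57664.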

**Check Performance Against the Test Dataset to Ensure That It's Learning**

In [6]:
def test(dataloader, model, loss_fn):
  size = len(dataloader.dataset)
  num_batches = len(dataloader)
  model.eval()
  test_loss, correct = 0, 0
  with torch.no_grad():
      for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        pred = model(X)
        test_loss += loss_fn(pred, y).item()
        correct += (pred.argmax(1)==y).type(torch.float).sum().item()
  test_loss /= num_batches
  correct /= size
  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")


At a high level, this step runs the model on the test data and measures (1) how wrong the predictions are and (2) how many predictions were correct

**Setup for Test Model**

```def test(dataloader, model, loss_fn):``` defines a function called test() that takes a dataloader for your test data, your model, and the same loss function used in training

```  size = len(dataloader.dataset)``` gets the total number of test examples (in this case it's 10,000)

```  num_batches = len(dataloader)``` gets the number of batches in the dataloader, which in this case is 157: 156 full batches of 64 plus a final batch of 16. This is used to compute the average test loss
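
The batch count follows from dividing the dataset size by the batch size and rounding up, since the leftover examples still form a (smaller) final batch. A quick check in plain Python, assuming the 10,000 test examples and batch size of 64 from this notebook:

```python
import math

size = 10000      # test examples
batch_size = 64

num_batches = math.ceil(size / batch_size)
last_batch = size - (num_batches - 1) * batch_size

print(num_batches)  # 157
print(last_batch)   # 16
```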

```model.eval()``` switches the model to evaluation mode, which changes how certain layers behave (for example, dropout is turned off and batchnorm uses its stored statistics)

```test_loss, correct = 0, 0``` initializes two running totals: the summed loss across all batches and the number of correct predictions across all batches

```with torch.no_grad():``` tells PyTorch not to track gradients inside this block; we're just evaluating, so there is no backpropagation

**Loop through the Test Data**


```for X, y in dataloader:``` --> loop over the test data one batch at a time; X is a batch of images and y is the labels for those images

```pred = model(X)``` does the forward pass through your model to get the logits, or raw predictions

```test_loss += loss_fn(pred,y).item()```
This calculates the loss for the current batch.
* loss_fn gives you the loss tensor
* item() converts it to a number
* += adds it to the running total for test_loss

```correct += (pred.argmax(1) == y).type(torch.float).sum().item()```
* pred.argmax(1) --> for each row of logits (one row per image), finds the index with the highest score. This is the predicted class for each image
* (pred.argmax(1) == y) --> compares the predicted classes to the true classes, which returns a tensor of True/False values, one per image
* .type(torch.float) --> converts the booleans to 1.0 or 0.0
* .sum().item() --> adds up the correct predictions in the batch and converts the result to a number that you add to correct
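
The same chain of steps can be followed on a tiny made-up batch using plain lists. The logits and labels below are invented for illustration (4 images, 3 classes), and `argmax` is a hand-written helper standing in for the tensor method.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up logits for a batch of 4 images and 3 classes.
pred = [
    [2.0, 0.1, -1.0],   # predicts class 0
    [0.2, 0.3, 1.5],    # predicts class 2
    [0.0, 3.0, 0.5],    # predicts class 1
    [1.0, 0.9, 0.8],    # predicts class 0
]
y = [0, 2, 2, 0]        # true labels

predicted = [argmax(row) for row in pred]          # like pred.argmax(1)
matches = [p == t for p, t in zip(predicted, y)]   # like (... == y)
correct = sum(float(m) for m in matches)           # like .type(torch.float).sum().item()

print(predicted)         # [0, 2, 1, 0]
print(correct / len(y))  # 0.75
```

Three of the four predictions match their labels, so the accuracy for this batch is 0.75.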

```test_loss /= num_batches```
We divide the summed test_loss by the number of batches to get the average loss per batch

```correct /= size``` divides the number of correct predictions by the total number of test examples, which gives the accuracy as a fraction


In [7]:
epochs = 50
for t in range(epochs):
  print(f"Epoch {t+1}\n-------------------------------")
  train(train_dataloader, model, loss_fn, optimizer)
  test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.302102 [   64/60000]
loss: 2.288951 [ 6464/60000]
loss: 2.271698 [12864/60000]
loss: 2.268208 [19264/60000]
loss: 2.249953 [25664/60000]
loss: 2.230263 [32064/60000]
loss: 2.228659 [38464/60000]
loss: 2.202903 [44864/60000]
loss: 2.195704 [51264/60000]
loss: 2.164429 [57664/60000]
Test Error: 
 Accuracy: 46.7%, Avg loss: 2.157645 

Epoch 2
-------------------------------
loss: 2.167465 [   64/60000]
loss: 2.149531 [ 6464/60000]
loss: 2.101513 [12864/60000]
loss: 2.121039 [19264/60000]
loss: 2.058961 [25664/60000]
loss: 2.011565 [32064/60000]
loss: 2.034827 [38464/60000]
loss: 1.960528 [44864/60000]
loss: 1.967425 [51264/60000]
loss: 1.889371 [57664/60000]
Test Error: 
 Accuracy: 50.1%, Avg loss: 1.888002 

Epoch 3
-------------------------------
loss: 1.923011 [   64/60000]
loss: 1.880771 [ 6464/60000]
loss: 1.779556 [12864/60000]
loss: 1.824528 [19264/60000]
loss: 1.701186 [25664/60000]
loss: 1.666509 [32064/60000]
loss: 1.687850 [38464/

In [8]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


In [9]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))

<All keys matched successfully>

In [10]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"
