**Computer Vision**

Computer vision is the art of teaching a computer to see. For example, you can build a model that decides whether an image shows a cat or a dog - *binary classification*.

Or maybe whether the image is of a dog, cat, or chicken - *multiclass classification*. 

Perhaps checking where an object is in an image? - *object detection*.

You can even work out where the different objects in an image can be separated from each other - *panoptic segmentation*.

![Display](images/03-computer-vision-problems.png "Computer Vision Examples")

**Applications of Computer Vision**

Have a phone? You've already used it. Camera and photo apps use computer vision to enhance and sort images. Modern cars also use computer vision, such as Tesla vehicles, to avoid other cars, stay in the appropriate lane, etc.
Manufacturers also use computer vision to identify faulty products. 

Basically, anything we use our eyes for is a potential application for computer vision.

**Coverage of Chapter**

We're still going to apply everything we've learned before, adding a bit more 'complexity' each time. But the basic principle always stays the same.

![Display](images/03-pytorch-computer-vision-workflow.png "How Computer Vision Works")

Specifically, we're looking at the following:

1. **Computer Vision Libraries in PyTorch** : We'll be going over built-in PyTorch computer vision libraries that can help us in working with images.

2. **Load Data** : Before we even get started, we'll always need some form of data. In this chapter we'll be using *FashionMNIST*.

3. **Prepare Data** : Once we have the images, we're going to need to prepare them. Just like in cooking. We'll load them into a PyTorch *DataLoader* so that we can use them in a training loop.

4. **Model 0 - Building Baseline** : We're working with a multiclass classification model to learn patterns in the data (images). We'll choose a loss function, optimizer, and create a training loop.

5. **Making Predictions & Evaluating Model 0** : Use the baseline model to make predictions, then evaluate those predictions.

6. **Setup Device Agnostic Code For Future Models** : We're going to iterate on the baseline model, so we'll set up device-agnostic code once to cut down on how much device handling we repeat.

7. **Model 1 - Non-Linearity** : We saw the effects of having no non-linearity before, so this time we'll be adding it to see its effects again. We'll see if non-linearity helps improve our baseline model.

8. **Model 2 - Convolutional Neural Network (CNN)** : Here we get into a spicy new area with the introduction of the much more powerful convolutional neural network architecture.

9. **Comparing Models** : By this point, we've built three different models and we're going to compare them with each other to see what works.

10. **Evaluating Best Model** : We'll make predictions on random images and use them to evaluate the best model.

11. **Making Confusion Matrix** : In addition, we'll also be creating a confusion matrix to evaluate a classification model. This will serve as practice and a practical application of how a confusion matrix works and how it helps.

12. **Saving & Loading The Chosen Model** : Once we've done everything, we'll save our model and try loading it somewhere else so that we can actually work with it.

**Computer Vision Libraries in PyTorch**

We'll be taking a look at some PyTorch computer vision libraries that we should be aware of before actually writing code. This will serve as a good jumping-off point and make the process a lot less confusing down the line.

**torchvision** : this module contains the datasets, model architectures, and image transformations that are vital for computer vision problems. 

**torchvision.datasets** : here you can find many different datasets for a range of problems, from image classification and object detection to image captioning, video classification, and more. It also contains base classes for making your own custom datasets.

**torchvision.models** : if you want to work with pre-trained models then you can grab one from here. This module contains well-performing and commonly used vision model architectures that are all implemented in PyTorch.

**torchvision.transforms** : when working with your dataset, you're going to need to transform it into workable data. Images get turned into numbers, processed, and even augmented. Before working with models, you'll need to work with the data first.

**torch.utils.data.Dataset** : simply the base dataset class for PyTorch.

**torch.utils.data.DataLoader** : this creates an iterable over a dataset. It is created by passing it a *torch.utils.data.Dataset*.

**NOTE:** the *torch.utils.data.Dataset* and *torch.utils.data.DataLoader* classes can also work with other types of data, not only computer vision data.
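To make the relationship between the two classes concrete, here is a minimal sketch using non-image data. The `SquaresDataset` class and its contents are hypothetical and only serve as an illustration; the point is that a `DataLoader` wraps any object that follows the `Dataset` interface (`__len__` and `__getitem__`).

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i squared)."""

    def __init__(self, size: int):
        self.x = torch.arange(size, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self) -> int:
        # Number of samples in the dataset
        return len(self.x)

    def __getitem__(self, idx: int):
        # Return one (input, target) pair
        return self.x[idx], self.y[idx]


dataset = SquaresDataset(size=10)

# The DataLoader is built by passing it the Dataset
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# 10 samples with batch_size=4 gives batches of 4, 4 and 2 samples
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)
```

Nothing here is specific to images, which is why the same two classes show up across many kinds of PyTorch projects.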

Now that those PyTorch libraries are explained, it's time to get started by importing the dependencies we need.

In [1]:
import torch
from torch import nn

import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor

import matplotlib.pyplot as plt

**Getting Dataset**

Before working with a computer vision problem, we'll need a computer vision dataset! Can't start working without data. So, starting off with FashionMNIST. 

MNIST stands for Modified National Institute of Standards and Technology.

The original MNIST dataset consists of thousands of examples of handwritten digits from 0 to 9 - which was in one of the PyTorch tutorials mentioned before. It was used to build computer vision models for identifying handwritten numbers for postal services.

FashionMNIST has a similar setup, except that it contains grayscale images of 10 different kinds of clothing.

![Display](images/03-fashion-mnist-slide.png "What's inside FashionMNIST")

*torchvision.datasets* contains many more datasets that can be used for practicing computer vision code. FashionMNIST is one of them. It has 10 different image classes, which makes it a multiclass classification problem.

We'll be creating a computer vision neural network that can identify the different styles of clothing in these images a bit later on. 

PyTorch stores many common computer vision datasets in *torchvision.datasets*.

This includes FashionMNIST, available as *torchvision.datasets.FashionMNIST()*.

First, we need to download it, providing the following parameters:

*root: str* - the folder you want to download the data to.

*train: bool* - whether you want the training (True) or test (False) dataset.

*download: bool* - whether the data should be downloaded.

*transform* - the transformations (for example from *torchvision.transforms*) to apply to the images.

*target_transform* - lets you transform the targets (labels) of the data.

These parameters are also available with other *torchvision* datasets. 

In [2]:
# Set up the training data
train_data = datasets.FashionMNIST(
    root="data", # folder to download the data into
    train=True, # True = training split, False = test split
    download=True, # download the data if it isn't already in root
    transform=ToTensor(), # images come in PIL format - convert them to PyTorch tensors
    target_transform=None # optional transform for the labels
)

# Set up the test data
test_data = datasets.FashionMNIST(
    root="data",
    train=False, # get the test split
    download=True,
    transform=ToTensor(),
)

In [3]:
# Check the first training sample
image, label = train_data[0]
image, label

(tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0039, 0.0000, 0.0000, 0.0510,
           0.2863, 0.0000, 0.0000, 0.0039, 

**Input & Output Shapes of a Computer Vision Model**

So we have a large tensor of values representing our image, and it maps to a single value for the target (label). Let's take a look at the shape of the image.


In [4]:
image.shape

torch.Size([1, 28, 28])

Let's break this down.

The shape is : [color_channels=1, height=28, width=28]

*color_channels=1* signifies that the image is grayscale.

![Display](images/03-computer-vision-input-and-output-shapes.png "Image Shapes")

Different problems will have different input and output shapes. But the main idea will always stay the same. Turn the data into numbers, create a model to find a pattern in those numbers, and convert those patterns into something meaningful.
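A minimal sketch of that idea in terms of shapes, assuming FashionMNIST-sized inputs and 10 classes. The model below is a throwaway, untrained linear layer used only to show what goes in and what comes out; it is not the model built later in the chapter, and the input is random noise rather than real images.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Hypothetical input: 32 random "images" shaped [color_channels=1, height=28, width=28]
dummy_images = torch.rand(32, 1, 28, 28)

# Throwaway model: flatten each image to 784 numbers, then map to 10 class scores
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(in_features=28 * 28, out_features=10),
)

logits = model(dummy_images)               # one score per class for every image
predicted_classes = logits.argmax(dim=1)   # turn the scores into a class index

print(dummy_images.shape, logits.shape, predicted_classes.shape)
```

The input shape is set by the images and the output shape by the number of classes; everything in between is up to the model.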

If *color_channels=3*, each pixel would have three values, one each for red, green, and blue - RGB.

The order of the shape of the tensor that we've seen just now is referred to as *CHW* (Color Channels, Height, Width). 

However, there is debate about whether an image should be represented as *CHW*, which means color channels first, or *HWC*, which means color channels last.
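Switching between the two layouts is a matter of reordering dimensions. The sketch below uses random tensors as stand-in images; `permute` and `squeeze` are standard tensor methods. This comes up in practice because `matplotlib`'s `imshow` expects either (height, width) for grayscale or (height, width, channels) for color.

```python
import torch

# Hypothetical color image in CHW order (channels first)
image_chw = torch.rand(3, 28, 28)

# Reorder to HWC (channels last): old dims 1, 2, 0 become the new dims 0, 1, 2
image_hwc = image_chw.permute(1, 2, 0)
print(image_chw.shape, image_hwc.shape)

# A grayscale image has a single channel, so the channel dimension can be dropped
gray_chw = torch.rand(1, 28, 28)
gray_hw = gray_chw.squeeze(dim=0)
print(gray_chw.shape, gray_hw.shape)
```

Note that `permute` only changes how the data is indexed; the pixel values themselves are unchanged.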

In addition, you will also encounter the *NCHW* and *NHWC* formats. *N* just means the number of images. For example, let's say we have a *batch_size=32*. Our tensor shape would then be [32, 1, 28, 28]. We'll go deeper into batch sizes further on.
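As a quick sketch of where the extra dimension comes from, stacking 32 single images along a new leading dimension gives the *NCHW* shape. The images here are random placeholders rather than real data.

```python
import torch

# 32 hypothetical grayscale images, each in CHW order
images = [torch.rand(1, 28, 28) for _ in range(32)]

# Stack along a new first dimension to get NCHW
batch = torch.stack(images, dim=0)
print(batch.shape)

# A single image can also be turned into a batch of one
single_batch = images[0].unsqueeze(dim=0)
print(single_batch.shape)
```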

PyTorch's default is usually *NCHW* - channels first, which is what many operators expect. BUT PyTorch's own performance guidance also notes that *NHWC* - channels last can perform better for some workloads, and treats it as a best practice in those cases.

Though, in our case, this won't matter much since we're dealing with a small dataset and the models that we're making are small.
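For reference, here is a small sketch of how channels last is typically requested in PyTorch, using a random batch as a placeholder. One detail worth knowing is that `torch.channels_last` is a memory format: the tensor keeps reporting its shape in *NCHW* order, and only the way the values are laid out in memory (the strides) changes. Whether this gives any speedup depends on the model and hardware, so treat it as something to measure rather than assume.

```python
import torch

# Hypothetical batch of 32 RGB images in the default (contiguous NCHW) layout
batch = torch.rand(32, 3, 28, 28)

# Ask for the channels-last memory format
batch_cl = batch.to(memory_format=torch.channels_last)

# The logical shape is unchanged...
print(batch.shape, batch_cl.shape)

# ...but the strides differ: channels are now the fastest-moving dimension
print(batch.stride(), batch_cl.stride())
print(batch_cl.is_contiguous(memory_format=torch.channels_last))
```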

But these are some important details to keep in mind in the future.

Let's take a look at more shapes of the data that we are working with. 

In [5]:
len(train_data.data), len(train_data.targets), len(test_data.data), len(test_data.targets)

(60000, 60000, 10000, 10000)

So we are working with 60,000 training samples and 10,000 testing samples. That's quite a jump from the previous work we were doing, and considering that these are images, we're really taking it up a notch.

But you'll see that these are pretty much the same thing at the end of the day.

Let's take a look at the classes that we have. We can look at them by using the *.classes* attribute with our 