In [None]:
#!pip install portalocker sentencepiece sacremoses transformers

## 5.3 Lab 3 / Case 3: Classifying Images

Now it is YOUR turn to classify some images!

First, you will need to choose and load a [model for image classification](https://pytorch.org/vision/stable/models.html#classification) and its corresponding [weights](https://pytorch.org/vision/stable/models.html#table-of-all-available-classification-weights).

Don't forget to retrieve the prescribed transformation function or model corresponding to the model you chose. Also, take a look at its size and accuracy, so you have an idea of its performance.

TIP: try a very small model (e.g. MobileNet) and a very large model (e.g. VGG) and see how long they take to run inference on your images.

### 5.3.1 Load Weights

Load the weights from the model of your choice into its own object:

In [None]:
from torchvision.models import get_weight

weights = ...

### 5.3.2 Load Model

Load the model using Torch Hub and the weights you've just loaded:

In [None]:
import torch
repo = 'pytorch/vision'

model = ...

### 5.3.3 Extract Metadata

Retrieve the categories used to pretrain the model, and the transformation function that should be applied to the input images:

In [None]:
categories = ...

In [None]:
transforms_fn = ...
transforms_fn

Let's inspect the number of parameters and the metrics of the model you chose:

In [None]:
weights.meta['num_params']/1e6

In [None]:
weights.meta['_metrics']

### 5.3.4 ImageFolder Dataset

To make this lab more fun, let's build our own dataset of images from scratch! We'll be using torchvision's `ImageFolder` dataset, a very convenient way of building a dataset from a collection of images organized in folders, one for each category.

But first, we need to download some images! Keep in mind that these models were trained on the ImageNet-1K dataset, so we should choose images that fit into one or more of its 1,000 categories. In the next part of the course, we'll learn how to fine-tune them so we can classify images into new categories.

#### 5.3.4.1 ImageNet Dataset

Unfortunately, the original ImageNet dataset isn't publicly available; only the URLs of the original images were published. If you're a researcher, though, you can request access to versions of this dataset.

#### 5.3.4.2 Downloading and Saving Images

The function below, inspired by Nate Raw's [HuggingPics](https://github.com/nateraw/huggingpics) project, uses HuggingFace's experimental search API to retrieve image files and save them to disk.

This is just a quick and dirty way of retrieving a small collection of images that fall under the same search term. As long as your images are neatly organized in folders, one folder for each category, you're good to go.

In [None]:
import os
import requests
from io import BytesIO
from PIL import Image

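def get_image_from_url(url, headers=None):
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    # Convert to RGB: some images are RGBA or palette-based (which can't be
    # saved as JPEG) or grayscale (which doesn't match the 3-channel input)
    img = Image.open(BytesIO(resp.content)).convert('RGB')
    return img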

def save_images(folder, search_term, count=10):
    if not os.path.exists(folder):
        os.mkdir(folder)
        
    SEARCH_URL = "https://huggingface.co/api/experimental/images/search"

    params = {"q": search_term, "license": "public", "imageType": "photo", "count": count}

    resp = requests.get(SEARCH_URL, params=params)
    if resp.status_code == 200:
        content = resp.json()['value']
        urls = [img['thumbnailUrl'] for img in content]

        folder = os.path.join(folder, search_term)
        if not os.path.exists(folder):
            os.mkdir(folder)

        i = 0
        for url in urls:
            try:
                img = get_image_from_url(url)
                fname = os.path.join(folder, f'{i}.jpg')
                img.save(fname)
                i += 1
            except Exception:
                pass
        print(f'Retrieved {i} images for {search_term}')
    else:
        print(f'Failed to retrieve URLs for {search_term}')

Let's use the function above to fetch images for three existing categories in ImageNet: hedgehogs, ostriches, and armadillos. I chose those animals because I find them funny (and, sadly, raccoons aren't part of the original 1,000 categories). Feel free to choose any other categories!

We're saving the images to the `lab3` folder:

In [None]:
targets = ['hedgehog', 'ostrich', 'armadillo']

for term in targets:
    save_images('./lab3', term)

Inside the top folder, `lab3`, each search term will have its own folder:

In [None]:
!ls -l ./lab3

Inside each category folder, such as `armadillo`, there will be a collection of sequentially-numbered images:

In [None]:
!ls -l ./lab3/armadillo

Now we're set and we can actually create our `ImageFolder` dataset:

In [None]:
from torchvision.datasets import ImageFolder

dataset = ...
targets = dataset.classes

In [None]:
dataset, targets

Notice that we can pass the transforms function (or model) as an argument to the dataset, so it outputs preprocessed images out of the box. Moreover, we're reassigning `targets` because the `ImageFolder` dataset numerically encodes the categories using the alphabetically ordered folders inside the top folder, and that order may differ from the order of our original list of search terms.

In [None]:
x, y = dataset[0]
targets[y], x

The label tells us this is an armadillo, and the tensor is clearly a preprocessed image: it holds normalized values instead of actual pixel values in the [0, 255] range.

### 5.3.5 Making Predictions

Now, let's use the pretrained model you've already loaded to predict which category the image above belongs to:

In [None]:
mini_batch = x.unsqueeze(0)
mini_batch

In [None]:
# The mini-batch above has a single data point
# Call the model and get the corresponding predictions (logits)
logit = ...

# Fetch the index of the largest logit
idx = ...

# Find the corresponding category
categories[idx]
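The shape of the model's output matters here. For a mini-batch holding a single image, the logits have shape `[1, num_classes]`. The small sketch below uses made-up logits and a hypothetical three-class label list. It shows `argmax` over the class dimension and how to turn the result into a plain Python index.

```python
import torch

demo_labels = ["armadillo", "hedgehog", "ostrich"]  # hypothetical 3-class label list

# Made-up logits for a mini-batch holding ONE image: shape [1, num_classes]
demo_logits = torch.tensor([[0.3, -1.2, 2.5]])

# argmax over the last (class) dimension: one index per image in the batch
demo_idx = demo_logits.argmax(dim=-1)
print(demo_idx.shape)  # torch.Size([1])

# .item() turns the single-element tensor into a plain Python int
demo_label = demo_labels[demo_idx.item()]
print(demo_label)
```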

That can't be right! What if we try it one more time?

In [None]:
# You can either re-run the cell above, or copy and paste it here first, and run this cell instead

It's very likely that you'll not only get a wrong prediction again, but a DIFFERENT wrong prediction. Perhaps you've figured out that I (purposefully) forgot to set the model to evaluation mode. That should fix it:

In [None]:
# Set the model to evaluation mode
# write your code here
...


# Then find the predicted category as above
logit = ...
idx = ...
categories[idx]

#### 5.3.5.1 Dropout

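The changing predictions above are due to dropout layers. Dropout is probabilistic in nature: during training, it randomly drops (zeroes out) some of its inputs to force the model to learn more than one way to achieve its target, thus working as a regularizer. (Models with batch normalization layers, such as MobileNet or ResNet, are also affected in training mode: those layers normalize using the statistics of the current mini-batch instead of their running statistics, which hurts predictions on a single image, although deterministically.)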

The idea behind regularization is that, if left unchecked, a model will learn the "easy way out" of its problem, so forcing it to work with a random subset of features should reduce overfitting and improve generalization. In other words, the model needs to learn how to handle a distribution of values that is centered at the value the output would have if there were no dropout. That works really well, and many models have dropout layers to make them more robust during training.

Let's illustrate this with a dummy model that contains only one dropout layer:

In [None]:
import torch.nn as nn

dropping_model = nn.Sequential(nn.Dropout(p=0.5))

Now, let's create some random input for it:

In [None]:
random_input = torch.randn(10)
random_input

What happens to these inputs once they go through the dropout model?

In [None]:
dropping_model.train()
output_train = dropping_model(random_input)
output_train

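On average, half of the values should have been dropped. Don't forget that dropout is probabilistic, so you may get three, or maybe seven, or perhaps four, or exactly five zeros. If you run it a large number of times, the average number of dropped points should be five (since the probability is 0.5). Also notice that the surviving values were doubled: in training mode, PyTorch scales the kept values by 1/(1-p), so each output's expected value equals its input.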
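With only ten values, the pattern is hard to see. The sketch below uses a larger sample with an all-ones input, chosen so the effect is easy to read. It shows both effects of `nn.Dropout(p=0.5)` in training mode. Roughly half of the elements are zeroed. The survivors are multiplied by 1/(1-p) = 2, so the average output stays close to the average input.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
demo_dropout = nn.Dropout(p=0.5)
demo_dropout.train()

ones = torch.ones(10_000)
demo_out = demo_dropout(ones)

# Each element is either dropped (0) or kept and scaled by 1 / (1 - p) = 2
print(torch.unique(demo_out))
print(f"fraction dropped: {(demo_out == 0).float().mean().item():.3f}")
print(f"mean output:      {demo_out.mean().item():.3f}")  # close to the input mean of 1
```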

The problem is that we cannot keep this behavior once we deploy the model; otherwise, our users would get different predictions for the same input, as we've just seen above. That's why dropout layers don't drop (or rescale) anything once the model is switched to evaluation mode.

In [None]:
dropping_model.eval()
output_eval = dropping_model(random_input)
output_eval

In evaluation mode, nothing gets dropped!

#### 5.3.5.2 Probabilities

We can also use the softmax function to convert logits into probabilities:

In [None]:
import torch.nn.functional as F
# Drop the batch dimension (if any) so softmax runs across the 1,000 classes
probabilities = F.softmax(logit.squeeze(0), dim=-1)
probabilities
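Softmax turns each logit `z_i` into `exp(z_i) / sum_j exp(z_j)`, computed along the dimension passed as `dim`. The sketch below uses made-up logits for a one-image mini-batch and checks `F.softmax` against that formula. It also shows why the class dimension matters. Applied over the batch dimension (`dim=0`), softmax normalizes each single-element column, so every value becomes 1.

```python
import torch
import torch.nn.functional as F

# Made-up logits for a mini-batch of ONE image over four classes: shape [1, 4]
demo_logits = torch.tensor([[2.0, 1.0, 0.1, -1.0]])

# Softmax along the class dimension: exp(z_i) / sum_j exp(z_j)
demo_probs = F.softmax(demo_logits, dim=-1)
manual = demo_logits.exp() / demo_logits.exp().sum(dim=-1, keepdim=True)
print(demo_probs, demo_probs.sum(dim=-1))  # the row sums to 1 (up to rounding)

# Along dim=0 (the batch dimension) each column holds a single value,
# so every entry comes out as exactly 1: not a distribution over classes
print(F.softmax(demo_logits, dim=0))
```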

Or, better yet, use PyTorch's `topk` method to get only the top K values, together with their corresponding indices:

In [None]:
values, indices = torch.topk(probabilities, 1)
values, indices
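Here is a small sketch of `torch.topk` on a made-up probability vector. It returns the `k` largest values in descending order together with their positions. Those positions can then be paired with labels (the label list here is hypothetical).

```python
import torch

demo_probs = torch.tensor([0.10, 0.50, 0.15, 0.25])  # made-up probabilities

# The k largest values, sorted in descending order, plus their positions
top_values, top_indices = torch.topk(demo_probs, k=3)
print(top_values, top_indices)  # the indices are tensor([1, 3, 2])

# Pair positions with (hypothetical) labels, as a prediction API might return them
demo_labels = ["armadillo", "hedgehog", "ostrich", "porcupine"]
top3 = [{"label": demo_labels[i], "value": v}
        for i, v in zip(top_indices.tolist(), top_values.tolist())]
print(top3)
```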

The target or label is the class corresponding to the index above:

In [None]:
categories[indices[0]]

In a real-world deployment, though, you won't have the input data neatly assembled as a dataset. You will have to create a mini-batch from the user's input data, feed it to the model to get its predicted logits, and then convert them into one or more predictions and probabilities to return to the user.
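Here is a sketch of that flow, assuming each user image has already been preprocessed into a `[3, 224, 224]` tensor. `torch.stack` assembles the images into one `[N, 3, 224, 224]` mini-batch. Softmax and top-k then run per image along the class dimension. The tiny `toy_model` is a hypothetical stand-in for a real classifier, used only so the shapes can be checked without downloading anything.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins: three already-preprocessed images, and a toy
# "classifier" mapping each 3x224x224 image to 5 logits
user_images = [torch.randn(3, 224, 224) for _ in range(3)]
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 5)).eval()

# One mini-batch from the individual images: shape [N, C, H, W]
user_batch = torch.stack(user_images)

with torch.no_grad():
    user_logits = toy_model(user_batch)  # shape [N, num_classes]
user_probs = user_logits.softmax(dim=-1)  # one distribution per image
user_top_values, user_top_indices = user_probs.topk(2, dim=-1)  # top-2 per image
print(user_batch.shape, user_logits.shape, user_top_indices.shape)
```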

#### 5.3.5.3 Testing

Write a function that takes either a URL or a filepath, a model, its prescribed transformations, and a list of target categories, and returns a list of the top K predictions:

In [None]:
def predict(path_or_url, model, transforms_fn, categories, topk=1, headers=None):
    if path_or_url.startswith('http'):
        img = get_image_from_url(path_or_url, headers=headers)
    else:
        img = Image.open(path_or_url)
    # Make sure the image has three (RGB) channels, as the models expect
    img = img.convert('RGB')
        
    # Apply the transformation to the image
    preproc_img = ...
    
    # If the transformation doesn't return a mini-batch,
    # we make one ourselves by unsqueezing the first dimension
    if len(preproc_img.shape) == 3:
        preproc_img = preproc_img.unsqueeze(0)
    
    # Set the model to evaluation mode
    # write your code here
    ...
    
    # Send the mini-batch to the same device as the model's parameters
    device = next(model.parameters()).device
    preproc_img = preproc_img.to(device)
    
    # Make predictions (logits), shape [1, num_classes];
    # gradients aren't needed for inference
    with torch.no_grad():
        pred = model(preproc_img)
    
    # Compute probabilities out of the predicted logits (across the classes,
    # dropping the batch dimension first so values and indices end up 1-D)
    # and then get the topk values and indices
    probabilities = ...
    values, indices = ...
    
    return [{'label': categories[i], 'value': v.item()} for i, v in zip(indices, values)]

Use the metadata from your model's weights as arguments to the function you wrote:

In [None]:
transforms_fn = ...
categories = ...

# Call the predict function on an image you downloaded, for example ./lab3/ostrich/0.jpg
# write your code here
...

Let's make a prediction using an image's URL:

In [None]:
url = 'https://upload.wikimedia.org/wikipedia/commons/c/ce/Daisy_G%C3%A4nsebl%C3%BCmchen_Bellis_perennis_01.jpg'
# Complying with Wikimedia User Agent's policy: https://meta.wikimedia.org/wiki/User-Agent_policy
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

# Call the predict function on a URL of an image, like the one above
# Don't forget to pass the headers as argument
# write your code here
...