# Thinking in tensors, writing in PyTorch

Hands-on training by [Piotr Migdał](https://p.migdal.pl), part of [Thinking in tensors, writing in PyTorch](https://github.com/stared/thinking-in-tensors-writing-in-pytorch).
This notebook was supported by New Trends in Machine Learning by The University of Silesia in Katowice (2019).


## Extra: Using an ImageNet-pretrained model


<a href="https://colab.research.google.com/github/stared/thinking-in-tensors-writing-in-pytorch/blob/master/convnets/Using%20an%20ImageNet-pretrained%20model.ipynb" target="_parent">
    <img src="https://colab.research.google.com/assets/colab-badge.svg"/>
</a>

Do you want to use deep learning (so-called "AI") to detect cats and dogs in a picture?
Well, you can use one of many pre-trained ImageNet networks!

I see there are many tutorials on:

* training convolutional neural networks from scratch,
* using a pre-trained neural network to detect new objects.

But let's do something simpler: use a ready-made network. No training or tweaking needed!
Before we start, let's play with [some browser-based demos](http://p.migdal.pl/interactive-machine-learning-list/), in this case [SqueezeNet v1.1 in Keras.js](https://transcranial.github.io/keras-js/#/squeezenet-v1.1) (depicted below) or [in ONNX.js](https://microsoft.github.io/onnxjs-demo/#/squeezenet).

[![](imgs/squeezenet_fox_kerasjs.png)](https://transcranial.github.io/keras-js/#/squeezenet-v1.1)


What is **ImageNet**, anyway?

> The classification task is made up of 1.2 million images in the training set, each labeled with one of 1000 categories that cover a wide variety of objects, animals, scenes, and even some abstract geometric concepts such as “hook”, or “spiral”. - [What I learned from competing against a ConvNet on ImageNet](http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/) by Andrej Karpathy (2014) 

### Further references

* [ImageNet hierarchy diagram](https://observablehq.com/@mbostock/imagenet-hierarchy) by Mike Bostock (I created [a bit racy version of that (NSFW!)](https://observablehq.com/@stared/tree-of-reddit-sex-life))
* [ImageNet Neural Network Architectures](https://towardsdatascience.com/neural-network-architectures-156e5bad51ba) by Eugenio Culurciello (with their performance and sizes)
* [Measuring the Progress of AI Research](https://www.eff.org/ai/metrics) by Electronic Frontier Foundation
* [State of the Art - ImageNet Image Classification](https://paperswithcode.com/sota/image-classification-on-imagenet) by Papers with Code
* [ImageNet API](http://image-net.org/download-API) (hierarchy, examples, etc)
 
### Outline

In this notebook we show how to (using PyTorch):

* Load a pre-trained ImageNet model
* Load a picture
* Pass a picture through a neural network to make predictions...
* ...and make sense of that

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image
import requests
from IPython import display
from io import BytesIO

import torch
from torchvision import models, transforms

## Loading a pre-trained model

We could look for scripts scattered over different GitHub repositories. Fortunately, some ImageNet models are built into PyTorch.

* OLD: [torchvision.models](https://pytorch.org/docs/stable/torchvision/models.html) (AlexNet, VGG, ResNet, SqueezeNet, DenseNet, Inception v3, GoogLeNet, ShuffleNet v2, MobileNet v2, ResNeXt)
* NEW: [PyTorch Hub](https://pytorch.org/hub), also with other models, including [Natural Language Processing with GPT-2](https://pytorch.org/hub/huggingface_pytorch-pretrained-bert_gpt2/); see blog post [Towards Reproducible Research with PyTorch Hub](towards-reproducible-research-with-pytorch-hub/) (10 June 2019)

It is important that we:

* load it with pretrained weights, using `pretrained=True` (as opposed to only the architecture); note that some models are heavy (VGG16 weighs approximately 500 MB); torchvision 0.13 and later replace this flag with a `weights=` argument
* set it to evaluation mode with `.eval()` (some layers, such as dropout or batch normalization, work differently for training and evaluation)
* move it to a GPU with `.to('cuda:0')` (but only if we have a CUDA-enabled GPU)

We shouldn't worry about the last one. While a GPU gives a significant speedup, for prediction on single images a CPU is fine. We use SqueezeNet v1.1 as it is small and fast.
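
To see why `.eval()` matters, here is a minimal sketch with a standalone dropout layer (not the SqueezeNet model itself; the tensor `x` is made up for illustration). In training mode dropout zeroes random entries and rescales the rest; in evaluation mode it leaves the input untouched.

```python
import torch
from torch import nn

# A standalone dropout layer, used here only to illustrate train vs eval behavior
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
out_train = dropout(x)  # random: each entry is either 0.0 or 2.0 (kept values are scaled by 1 / (1 - p))

dropout.eval()
out_eval = dropout(x)   # deterministic: identical to the input

print(out_train)
print(out_eval)
```

A model left in training mode would therefore give slightly different predictions on every call.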

In [None]:
# we load the model with pretrained weights and set it to eval mode
model = models.squeezenet1_1(pretrained=True).eval()

In [None]:
# if we have GPU, let's use that!
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
device

## Loading images

Loading images is a simple step once you know the basics (e.g. that PyTorch uses [Pillow (PIL)](https://pillow.readthedocs.io/) to preprocess images).

* use PIL to load an image 
* use `transforms` to prepare it as a suitable tensor:
    * resize and crop to match size with a network input (here: 224 x 224) 
    * scale colors
* stack it into a 4-dimensional tensor (`batch x channels x height x width`)

Even if we use a single image we need to use a batch (sample) of size 1.
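
The color scaling and stacking steps can be sketched in plain NumPy. This is only an illustration of the arithmetic, assuming an already cropped 224 x 224 RGB image (random pixels stand in for a real photo); in the notebook itself `transforms` does this work.

```python
import numpy as np

# Stand-in for a cropped RGB image: height x width x channels, values 0-255
# (random pixels, purely for illustration)
rng = np.random.default_rng(0)
img_array = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# ImageNet channel statistics, the same values passed to transforms.Normalize in this notebook
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

scaled = img_array.astype(np.float32) / 255.0  # rescale values from 0-255 to 0-1
normalized = (scaled - mean) / std             # per-channel normalization (broadcast over the last axis)
chw = normalized.transpose(2, 0, 1)            # channels first: 3 x 224 x 224
batch = chw[np.newaxis, ...]                   # add the batch dimension: 1 x 3 x 224 x 224

print(batch.shape)
```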

In [None]:
#img_path = "../imgs/dog.jpeg"
img_path = "https://raw.githubusercontent.com/stared/thinking-in-tensors-writing-in-pytorch/master/imgs/dog.jpeg"

# treat the path as a URL only if it starts with an HTTP(S) scheme
if img_path.startswith(("http://", "https://")):
    response = requests.get(img_path)
    img = Image.open(BytesIO(response.content))
else:
    img = Image.open(img_path)

img

In [None]:
# we need to transform data for these models
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

In [None]:
# what the network sees (before conversion to a tensor and normalization)
transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop((224,224))
])(img)
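
The geometry of this resize-and-crop step can be sketched in plain Python. The helper name `resize_then_center_crop` is hypothetical, and torchvision's exact rounding may differ by a pixel; the point is that the shorter side is scaled to 256 with the aspect ratio preserved, and then the central 224 x 224 square is kept.

```python
def resize_then_center_crop(width, height, short_side=256, crop=224):
    """Approximate geometry of Resize(short_side) followed by CenterCrop(crop).

    Returns the resized (width, height) and the crop box (left, top, right, bottom).
    """
    # scale so that the shorter side equals short_side, keeping the aspect ratio
    scale = short_side / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # cut a crop x crop square out of the middle
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)

# A 640 x 480 landscape photo: the left and right margins are discarded
print(resize_then_center_crop(640, 480))
```

For wide or tall images, a noticeable part of the picture never reaches the network.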

In [None]:
# 1 image x 3 channels (RGB) x 224 x 224 pixels
img_tensors = transform(img).unsqueeze(dim=0).to(device)
img_tensors.size()

## Using model and making sense of it

Finally, we can use a model!

* it works with 4-dimensional tensors
* we need to pass it preprocessed data (otherwise it will perform worse, as it effectively sees different colors, contrasts or scale)

And for all that we are rewarded with:

* 1000 numbers per input image, no labels
* ...and not even probabilities, but logits
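
Logits are unnormalized scores; softmax turns them into positive numbers that sum to 1. A minimal sketch in plain Python, with made-up logits rather than real model output:

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # subtracting the maximum does not change the result but avoids overflow in exp
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.0, 0.1]  # made-up values, not real model output
probs = softmax(example_logits)
print(probs)
```

The ordering is preserved: the largest logit always gets the largest probability.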

In [None]:
# processing an image with the model 
pred_logits_tensor = model(img_tensors)

In [None]:
# we get many numbers
print(pred_logits_tensor.size())
pred_logits_tensor[:,:10]

In [None]:
# to turn them into probabilities we need to perform softmax;
# then, to use them as a NumPy array, we detach from the graph, move to CPU and convert
pred_probs = pred_logits_tensor.softmax(dim=1).detach().cpu().numpy()

In [None]:
pred_probs[:,:10]

OK, but what do these numbers mean? Well, without a legend we can only guess.

* http://www.image-net.org/synset?wnid=n01440764
* http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=n04154340 for URLs to images with a given class

Fortunately, there is an [imagenet_class_index.json](https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json) that I preprocessed below.
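
The JSON file maps each class index (as a string) to a WordNet ID and a human-readable name. A rough sketch of flattening it into rows, using a two-entry excerpt instead of the downloaded file; the `wnid` field name is an assumption for illustration, not necessarily the column name in the CSV:

```python
import json

# Two-entry excerpt in the format of imagenet_class_index.json:
# class index (as a string) -> [WordNet ID, human-readable name]
raw = '{"0": ["n01440764", "tench"], "1": ["n01443537", "goldfish"]}'
class_index = json.loads(raw)

# one row per class; "wnid" is a hypothetical field name
rows = [
    {"id": int(idx), "wnid": wnid, "name": name}
    for idx, (wnid, name) in class_index.items()
]
rows.sort(key=lambda row: row["id"])

# position in the model output -> class name
id_to_name = {row["id"]: row["name"] for row in rows}
print(id_to_name[1])
```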

In [None]:
# locally can be "../data/imagenet_classes.csv"
imagenet_classes = pd.read_csv("https://raw.githubusercontent.com/stared/thinking-in-tensors-writing-in-pytorch/master/data/imagenet_classes.csv", index_col='id')
imagenet_classes.head()

In [None]:
# let's add our predictions and show the most probable classes
imagenet_classes['prediction'] = pred_probs[0]

imagenet_classes.sort_values(by='prediction', ascending=False).head(10)

In [None]:
def show_predictions_visually(img, imagenet_classes_with_preds):
    """Show the image next to a bar chart of its 10 most probable classes."""

    fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, figsize=(7, 4))

    ax0.imshow(img)

    # class names as the index, probabilities as values
    top_preds = imagenet_classes_with_preds.set_index('name')['prediction']
    top_preds = top_preds.sort_values(ascending=False).head(10)
    top_preds *= 100  # show probabilities as percentages
    top_preds.index.name = ""
    top_preds.plot.bar(ax=ax1)

    fig.tight_layout()

In [None]:
show_predictions_visually(img, imagenet_classes)

## Let's wrap it... in a function

In [None]:
def show_predictions(img_path,
                     visually=True,
                     imagenet_classes_path="https://raw.githubusercontent.com/stared/thinking-in-tensors-writing-in-pytorch/master/data/imagenet_classes.csv"):
    
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    model = models.squeezenet1_1(pretrained=True).eval().to(device)
    
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop((224,224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
    
    # treat the path as a URL only if it starts with an HTTP(S) scheme
    if img_path.startswith(("http://", "https://")):
        response = requests.get(img_path)
        img = Image.open(BytesIO(response.content))
    else:
        img = Image.open(img_path)
    
    img_tensor = transform(img).unsqueeze(dim=0).to(device)
    
    pred_logits_tensor = model(img_tensor)
    pred_probs = pred_logits_tensor.softmax(dim=1).detach().cpu().numpy()
    
    imagenet_classes = pd.read_csv(imagenet_classes_path, index_col='id')
    imagenet_classes['prediction'] = pred_probs[0]
    
    if visually:
        return show_predictions_visually(img, imagenet_classes)
    else:
        return imagenet_classes.sort_values(by='prediction', ascending=False).head(10)

In [None]:
# locally can be "../imgs/dog.jpeg"
show_predictions("https://raw.githubusercontent.com/stared/thinking-in-tensors-writing-in-pytorch/master/imgs/dog.jpeg")

In [None]:
show_predictions("http://farm1.static.flickr.com/106/284682545_454d85f1b2.jpg",
                 visually=True)

In [None]:
show_predictions("https://live.staticflickr.com/8101/8557163376_ca33f48840_b.jpg",
                 visually=True)

## OK, what's next?


### Playing with it

What would happen if you:

* use different images?
* use different ImageNet networks?
* use a different resize procedure (e.g. only `transforms.Resize((224, 224))`)?
* compare the execution time of a model on CPU and GPU?
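
For the timing experiment, a small helper along these lines may be enough. The name `time_call` and its parameters are hypothetical; the workload below is a stand-in, and in the notebook it would be something like `lambda: model(img_tensors)`. On a GPU, work is queued asynchronously, so passing `torch.cuda.synchronize` as `sync` makes the clock readings meaningful.

```python
import time

def time_call(fn, repeats=10, sync=None):
    """Return the average wall-clock time of fn() in seconds.

    `sync` is an optional callable run before each clock reading;
    on a GPU, pass torch.cuda.synchronize so that queued work has finished.
    """
    fn()  # warm-up call, not timed
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeats

# Stand-in workload; replace with the model call to time a real prediction
avg_seconds = time_call(lambda: sum(i * i for i in range(10_000)))
print(f"{avg_seconds * 1000:.3f} ms per call")
```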


### Let's break things!

What would happen if you:

* remove `transforms.Normalize`?
* set `pretrained=False`?
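
As a hint for the first question: after `transforms.ToTensor` pixel values lie in [0, 1], but the network was trained on normalized values. A quick sketch of the range it expects per channel, using the statistics from this notebook (the helper `normalize` is hypothetical):

```python
# Channel statistics passed to transforms.Normalize in this notebook
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize(value, channel):
    """Map a pixel value in [0, 1] to the normalized value for that channel."""
    return (value - mean[channel]) / std[channel]

# Range of normalized inputs per channel, for the darkest (0.0) and brightest (1.0) pixel
for channel, name in enumerate("RGB"):
    low, high = normalize(0.0, channel), normalize(1.0, channel)
    print(f"{name}: [{low:.2f}, {high:.2f}]")
```

Without normalization, every image looks to the network like a narrow, shifted slice of that range.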


### Next tutorials

* Transfer learning - use a network to detect your own classes!
* Data augmentation - learn how to pre-process data

(Both are Work in Progress in https://github.com/stared/thinking-in-tensors-writing-in-pytorch/tree/master/extra)


### Footnote

Brought to you by [Thinking in tensors, writing in PyTorch](https://github.com/stared/thinking-in-tensors-writing-in-pytorch) by Piotr Migdał. Follow me [@pmigdal](https://twitter.com/pmigdal).

