# Lecture 3.16: Computer Vision Pt. 2

[**Lecture Slides**](https://docs.google.com/presentation/d/11OZUT30pmz3N45cch9wzwjYZy-Xn46Fsk5q_QR0dtYU/edit?usp=sharing)

This lecture, we are going to fine-tune a deep Convolutional Neural Network (CNN) in pytorch.

**Learning goals:**
- classify images using a pre-trained model from torchvision
- preprocess images with `torchvision.transforms`
- differentiate between `model.train()` and `model.eval()` modes
- apply image data augmentation
- fine-tune a pre-trained model to a binary image classification task
- predict using a fine-tuned model

## 1. Introduction

We seek to soothe these times of uncertainty with a sprinkle of sweetness. ✨ We want to open a dessert shop. We are now faced with a difficult decision: waffle 🧇, or ice cream 🍦 ? These two treats are unequivocally delicious, but which is the most irresistible? This critical choice warrants some market research. We turn to the internet, but closer inspection of instagram posts reveals _millions_ of mouth-watering pictures to sift through. It's difficult to assess what the world needs most right now! 

To help analyse these enticing images, we decide to channel the power of _computer vision_. ⚡️ In particular, we learned how modern CV machine learning models are _pre-trained_ on ImageNet, a large image classification dataset. These can then be _fine-tuned_ to solve specific tasks, with _transfer learning_. Let's try it out, and create a waffle/ice cream image classifier 🍴

To ensure that this notebook can run on any hardware, locally or on the cloud, we assign the correct `device` to pytorch:

In [0]:
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

This notebook will extensively use the [`torchvision`](https://github.com/pytorch/vision) library. It is maintained by pytorch, and according to them:

> The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

The dependency was added to this repo's `Pipfile`, so if running locally, remember to execute:

    pipenv install

## 2. ImageNet Predictions

### 2.1 Dogs

Before we jump into fine-tuning and optimization, let's try to use a state-of-the-art model as is. Most of these models were pre-trained on ImageNet with 1000 classes. This should be broad enough to classify simple common images, for example a _dog_.

We choose [ResNet-50](https://arxiv.org/abs/1512.03385) (although you could try any other!). `torchvision` makes it easy to download the neural network and its pre-trained weights:



In [0]:
from torchvision import models

model = models.resnet50(pretrained=True)
model = model.to(device)

🧠🧠 Can you list what's special about the ResNet architecture?  (this [blogpost](https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d#e4b1) might help)

ResNet-50 is the first _deep_ learning model of this course. 50 stands for 50 layers, and we can count its parameters:

In [0]:
sum(p.numel() for p in model.parameters() if p.requires_grad)

25.6 _million_ parameters 🤯 Good thing we don't need to train this beast from scratch!

We want to test the ResNet-50, but we need an image to classify first. This adorable dog will do nicely:

In [0]:
from PIL import Image
!wget --quiet https://github.com/pytorch/hub/raw/master/dog.jpg
dog_img = Image.open('dog.jpg')
print(f'Image size: {dog_img.size}')
dog_img

Before we can feed this cutie to the ResNet-50 monster however, it needs to be converted into a pytorch `Tensor`.

Last lecture, we manually transformed images into `ndarray`s. The task was manageable because all our inputs were already 64x64 grayscale icons. This time, the sizes are inconsistent:
- most `torchvision` pre-trained models require 3x224x224  inputs
- our doggo is a 3x1546x1213 image

Since we don't want to implement a cropping function ourselves, we can use `torchvision`'s transform package to help out. It provides a convenient `CenterCrop` class to reduce our doggo to a 224x224 image:

In [0]:
from torchvision import transforms

transforms.CenterCrop(224)(dog_img)

Ah. 😑 This might be tricky to classify as a dog... Instead, we can first _resize_ the image, and then _crop_ small edges to make it square:

In [0]:
dog_img_resize = transforms.Resize(256)(dog_img)
transforms.CenterCrop(224)(dog_img_resize)

What a good dog! 🐶 

On top of this resizing, `torchvision` models are trained on a _normalized_ version of ImageNet, meaning that we must _feature scale_ our inputs. This can be done with `transforms.Normalize()`, with the channel means and standard deviations specified in the [documentation](https://pytorch.org/docs/stable/torchvision/models.html#classification). 

However, chaining all these transforms can get messy, so in the spirit of sklearn's `Pipeline`, we stack them in a [`Compose`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose) class:

In [0]:
from torchvision import transforms
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(dog_img)
input_tensor.shape

`transforms.ToTensor` even did the PIL image to `Tensor` conversion for us! All that is left to do is to reshape the `Tensor`, and send it to the `device`:

In [0]:
input_batch = input_tensor.unsqueeze(0)
input_batch = input_batch.to(device)
print(input_batch.shape)

🧠 What do the dimensions in our 1x3x224x224 input represent?

Our data is ready, but there's one last thing to take care of before prediction.

Complex models often use advanced regularization methods, such as [batch normalization](https://youtu.be/nUUqwaxLnWs) or [dropout](https://youtu.be/ARq74QuavAo). These affect the neural network _optimization_, but aren't welcome during _prediction_. Instead of manually deactivating these behaviours, the `nn.Module` base class provides a method to switch between training & evaluation modes: `model.train()` and `model.eval()`.
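
The effect of the two modes is easiest to see on a single layer. The sketch below uses a standalone `nn.Dropout` module as a toy example; it is not part of the ResNet used in this lecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy module for illustration: dropout zeroes activations at random.
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()         # training mode: dropout is active
train_out = dropout(x)  # some entries are zeroed, the rest are scaled by 1 / (1 - p)

dropout.eval()          # evaluation mode: dropout becomes a no-op
eval_out = dropout(x)

print(f'training flag after eval(): {dropout.training}')
print(f'train output: {train_out}')
print(f'eval output:  {eval_out}')
```

Calling `.train()` or `.eval()` on a full model sets the same flag on every sub-module, which is why a single call is enough.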

To make sure that our ResNet-50 classifies our dog image correctly, we have to switch the model to evaluation mode. We also use `torch.no_grad()` to prevent pytorch from tracking unnecessary `.grad_fn`, and we're good to go!

In [0]:
model.eval()
with torch.no_grad():
  output = model(input_batch)
  print(output.shape)

1000 outputs? 🤔 That's because of the 1000 ImageNet classes this model was trained on. Just like last lecture's CNN, the `torchvision` ResNet-50 was trained with `CrossEntropyLoss` for improved numerical stability, so it returns raw scores (logits) rather than probabilities. To turn these outputs into probabilities, we must therefore apply a _softmax_. To turn these probabilities into predictions, we must follow with an _argmax_ operation to pick the most likely class.
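
As a small-scale illustration of these two steps, the sketch below runs them on made-up outputs for four classes instead of 1000.

```python
import torch

# Hypothetical raw outputs (logits) for a batch of one image and four classes.
logits = torch.tensor([[2.0, -1.0, 0.5, 4.0]])

# Softmax rescales each row into positive values that sum to 1.
probas = torch.softmax(logits, dim=1)

# torch.max over the class dimension returns the top probability and its index.
prob, pred = torch.max(probas, dim=1)

print(f'probabilities: {probas}')
print(f'sum: {probas.sum().item():.4f}')
print(f'predicted class index: {pred.item()}, probability: {prob.item():.4f}')
```

Softmax preserves the ordering of the scores, so the predicted index is the same as the argmax of the raw outputs; the softmax is only needed to read the value as a probability.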

In [0]:
probas = torch.softmax(output, dim=1)
prob, pred = torch.max(probas, dim=1)
print(f'ResNet-50 prediction: {pred.item()}, probability: {prob.item()}')

Our model is 87% confident about its choice... but what's a `258` though? 🤨 

We have to convert this index to its class name! Let's use a ready-made json dictionary so we don't have to download all of ImageNet.

In [0]:
!wget --quiet https://raw.githubusercontent.com/raghakot/keras-vis/master/resources/imagenet_class_index.json

import json

with open('imagenet_class_index.json') as f:
  class_dict = json.load(f)
print(class_dict["99"])

Index `99` is a `goose`, and index `258` is ...

In [0]:
pred_string = class_dict[str(pred.item())][1]
print(f'ResNet-50 prediction: {pred_string}, probability: {prob.item()}')

A Samoyed! A quick google search confirms that not only is a Samoyed a dog, it's the exact breed shown in the picture! 🐕

### 2.2 Ice cream

Classifying a dog is pretty impressive, but we're interested in ice cream and waffles. A `waffle_or_ice_cream` dataset is available on the course's S3 bucket, so we download it and extract its contents:

In [0]:
!wget --quiet https://introduction-to-machine-learning-ilia-university.s3.eu-west-2.amazonaws.com/waffle_or_ice_cream.tar.gz
!tar -xf waffle_or_ice_cream.tar.gz
!ls waffle_or_ice_cream

Two directories, `train` and `test`. Since we're interested in prediction, let's leave the `train` folder aside for now, and test our ResNet-50 on this ice cream picture:

In [0]:
ice_cream_path = 'waffle_or_ice_cream/test/ice_cream.jpg'
Image.open(ice_cream_path)

To avoid copy-pasting code everywhere, we wrap the preprocessing and prediction in a function. We also add a `print_top_predictions` function to print the top n probabilities of the network outputs:

In [0]:
import os

def predict_imagenet(path, model, top_n=None):
  input_image = Image.open(path)
  preprocess = transforms.Compose([
      transforms.Resize(256),
      transforms.CenterCrop(224),
      transforms.ToTensor(),
      transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
  ])
  input_tensor = preprocess(input_image)
  input_batch = input_tensor.unsqueeze(0)
  input_batch = input_batch.to(device)

  model.eval()
  with torch.no_grad():
    outputs = model(input_batch)
    probas = torch.nn.functional.softmax(outputs, dim=1)
    prob, index = torch.max(probas, 1)
    pred = class_dict[str(index.item())][1]
    filename = os.path.basename(path)
    print(f'ResNet-50 file: {filename}, prediction: {pred}, probability: {prob.item()}')
    if top_n:
      print_top_predictions(probas[0], top_n)
    return input_image
  

def print_top_predictions(probas, top_n):
  probas, indices = torch.sort(probas, descending=True)
  sorted_probas_indices = list(zip(probas, indices))
  print(f'top {top_n} predictions:')
  for prob, index in sorted_probas_indices[:top_n]:
    pred = get_class_name_imagenet(index.item())
    print(f'prediction: {pred}, probability: {prob.item()}')

def get_class_name_imagenet(index):
  return class_dict[str(index)][1]

predict_imagenet(ice_cream_path, model, 5)

99% ice cream! ResNet-50 is showing some promise, and might be useful for our market research. 😏

🧠 Can you describe how the `print_top_predictions()` function works?
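
As a point of comparison, the sort-then-slice pattern can also be written with `torch.topk`. This is an alternative sketch on made-up probabilities, not the implementation used above.

```python
import torch

# Hypothetical probabilities over five classes (they sum to 1).
probas = torch.tensor([0.05, 0.60, 0.10, 0.20, 0.05])
top_n = 3

# Approach used in the lecture: sort everything, then keep the first top_n entries.
sorted_probas, sorted_indices = torch.sort(probas, descending=True)

# Alternative: torch.topk returns the top_n values and their indices directly.
top_probas, top_indices = torch.topk(probas, k=top_n)

for prob, index in zip(top_probas, top_indices):
    print(f'class index: {index.item()}, probability: {prob.item():.2f}')
```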

### 2.3 Waffles

Let's check if the model is also fond of waffles.

In [0]:
waffle_path = 'waffle_or_ice_cream/test/waffle.jpg'
predict_imagenet(waffle_path, model, 5)

... `waffle_iron`? Close, but it's not quite what we're looking for. If you check the [list of ImageNet classes](https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a), you'll find that there is no `waffle` class ... only `waffle_iron`! Even though there is no iron in the picture, the model has clearly _associated_ the waffle visuals with this class. On one hand, this is an interesting insight into the limitations of A.I. On the other, it means that we can't use this model for our market research. 😓

This is typical of large pre-trained models. They are optimized on academic datasets, and can rarely be used out of the box. This is when _fine-tuning_ comes in. 😎

## 3. Fine-tuning

Let's try to fine-tune ResNet-50 with our `waffle_or_ice_cream` dataset.

### 3.1 Data Munging


In [0]:
!ls waffle_or_ice_cream/train

Inside the `train` directory, we find an `ice_cream` and a `waffle` folder. It is common to have different image classes split into named directories. So common in fact, that `torchvision` includes a `Dataset` implementation just for this use case. Instead of manually loading the images and giving them labels, we simply pass the main directory as argument to [`ImageFolder`](https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder):

In [0]:
from torchvision import datasets

train_dir = 'waffle_or_ice_cream/train'
img_dataset = datasets.ImageFolder(train_dir)
img_dataset

The `ImageFolder` has found the 694 images, but hasn't loaded them in memory yet. This is a python `iterable`, which we can use in a for-loop, or turn into an `iterator`:

In [0]:
first_element = next(iter(img_dataset))
first_element

`ImageFolder` elements are tuples of features (image) & labels (integer class index). It also stores the string class names in a list:

In [0]:
class_names = img_dataset.classes
class_names

Which allows us to format an example as such:

In [0]:
img1, label1 = first_element
print('First example')
print(f'Class: {img_dataset.classes[label1]}')
img1

`torchvision` pre-trained models expect normalized 3x224x224 images, but our ice creams and waffles are of many different shapes, sizes, and scales. Last section, we used `torchvision.transforms` to simplify the image preprocessing. We'll do the same for this training dataset, with an added twist.

There are only around 700 data points in our waffle or ice cream dataset. We wish to _augment_ our dataset by adding _randomness_. 🎲 Once again, `torchvision` offers two transforms to make this easy:
- [`RandomResizedCrop()`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomResizedCrop)
- [`RandomHorizontalFlip()`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomHorizontalFlip)

They both work as their names suggest, but feel free to check out the documentation for more details (and see [albumentations](https://github.com/albumentations-team/albumentations) for more advanced image augmentation).

We can stack these two transforms as part of our preprocessing pipeline:

In [0]:
from torchvision import transforms

train_preprocess = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

And then apply our `train_preprocess` directly into the `ImageFolder`. Neat! 👌

In [0]:
img_dataset = datasets.ImageFolder(train_dir, train_preprocess)
img_dataset

Despite the randomness, the length of our dataset hasn't changed. This is because the randomness is applied _lazily_ as we step through the examples: each image is transformed when it is loaded, so it can look different every epoch.

`Dataset`s feed in nicely with `DataLoader`s, which means our training data is ready for fine-tuning:

In [0]:
dataloader = torch.utils.data.DataLoader(img_dataset, batch_size=4, shuffle=True, num_workers=4)
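
To check what a `DataLoader` yields, the sketch below iterates over a made-up dataset of random tensors. The `TensorDataset` and its sizes are assumptions for illustration, and `num_workers=0` keeps loading in the main process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the image dataset: 10 random "images" with binary labels.
fake_images = torch.rand(10, 3, 224, 224)
fake_labels = torch.randint(0, 2, (10,))
toy_dataset = TensorDataset(fake_images, fake_labels)

# num_workers=0 loads data in the main process, which keeps the sketch portable.
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, num_workers=0)

for inputs, labels in toy_loader:
    print(inputs.shape, labels.shape)
```

The last batch is smaller whenever the dataset size is not a multiple of `batch_size`.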

### 3.2 Optimization

We could reuse the ResNet-50 from earlier in this notebook, but we chose to downgrade to ResNet-18, which "only" has 18 layers. This is because our problem and dataset size are similar to those of the [bees and ants](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) fine-tuning tutorial, which uses `resnet18`. Typically, using very large models with small datasets leads to overfitting, so going big isn't always the right choice.

In [0]:
from torchvision import models

model_ft = models.resnet18(pretrained=True)

Before we can fine-tune this neural network, we have to adapt the last classification layer to our task. 
We know that ResNet (and CNNs in general) use a _fully-connected layer_ as final output layer. Pre-trained `torchvision` models were optimized on 1000 class ImageNet classification, so their last layer has 1000 outputs. We are dealing with binary classification, so we must replace it with a 1 neuron output layer.

`nn.Linear` requires two arguments, the input dimension, and the output dimension. We need to know how many outputs are returned by the penultimate layer to construct our new `nn.Linear`. We find this by reading the `.in_features` attribute (input features) of the `fc` attribute (fully connected) of our `model_ft`.

In [0]:
import torch.nn as nn

num_ftrs = model_ft.fc.in_features
num_ftrs

This final layer maps 512 activations to the outputs, so in the case of our waffle/ice-cream classification, 512 inputs to 1 output. We can replace the model's fully connected layer directly 🔄:

In [0]:
model_ft.fc = nn.Linear(num_ftrs, 1)

🧠 Take the time to understand how we replaced the ResNet-18's last layer, and _why_.

Let's not forget to send the model parameters to the correct `device`:

In [0]:
model_ft = model_ft.to(device)

We'll use the numerically stable `BCEWithLogitsLoss`, and a simple `SGD` optimizer with momentum.

In [0]:
import torch.optim as optim

criterion = nn.BCEWithLogitsLoss()
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
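
To see what `BCEWithLogitsLoss` computes, the sketch below compares it with an explicit sigmoid followed by `BCELoss`. The logits and labels are made up.

```python
import torch
import torch.nn as nn

# Hypothetical logits for a batch of three images, and their 0/1 labels.
logits = torch.tensor([[2.0], [-1.0], [0.3]])
labels = torch.tensor([[1.0], [0.0], [1.0]])

# BCEWithLogitsLoss takes raw logits and applies the sigmoid internally.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, labels)

# Equivalent two-step version: explicit sigmoid followed by BCELoss.
loss_two_steps = nn.BCELoss()(torch.sigmoid(logits), labels)

print(f'{loss_with_logits.item():.4f} vs {loss_two_steps.item():.4f}')
```

The two values agree on well-behaved inputs like these; the fused version is preferred because it stays stable when the logits become very large or very small.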

The two new additions to our training loop are:
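- `model.train()` to switch ResNet's batch normalization layers to their training behaviour (normalizing with batch statistics and updating their running averages), acts like the opposite of `model.eval()`
- `with torch.set_grad_enabled(True):` to make sure gradients are tracked during the forward pass, acts like the opposite of `with torch.no_grad()`. Gradient tracking is already on by default, and all pre-trained weights are trainable unless their `requires_grad` is set to `False`, so this block is explicit rather than strictly necessary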

In [0]:
import numpy as np
import time

def train_model(model, criterion, optimizer, num_epochs=20):

    torch.manual_seed(1337)
    np.random.seed(666)

    start_time = time.time()
    epoch_losses = []

    # training mode: batch norm uses batch statistics and updates its running averages
    model.train()

    for epoch in range(num_epochs):
        running_losses = []

        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            # pytorch likes floats
            labels = labels.float().unsqueeze(1).to(device)
            optimizer.zero_grad()
            # make sure gradients are tracked (this is already the default)
            with torch.set_grad_enabled(True):
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
            running_losses.append(loss.item())
        
        epoch_loss = np.array(running_losses).mean()
        epoch_losses.append(epoch_loss)
        print(f'epoch: {epoch}, loss: {epoch_loss:.4f}')

    training_time = time.time() - start_time
    print(f'Training complete in {training_time//60:.0f}m {training_time%60:.0f}s')
    
    return model, epoch_losses

We are ready to train! 🏋️‍♀️

In [0]:
model_ft, epoch_losses = train_model(model_ft, criterion, optimizer_ft, num_epochs=20)

import matplotlib.pyplot as plt

fig = plt.figure(dpi=120)
ax = fig.add_subplot()
ax.plot(epoch_losses)
ax.set_ylabel('loss')
ax.set_xlabel('epochs')
ax.set_title('Loss Curve');

The loss decreases, but it's not the prettiest optimization. 😐 That's often the deal with the highly non-convex loss surfaces of deep learning!

We _are_ however using regular mini-batch gradient descent with momentum. We can do better than that! The loss is very _unstable_ towards the end of training, which might suggest that the learning rate is too high.

🧠 Why can a high learning rate prevent the loss from converging?

We learned in lecture 3.13 that _learning rate decay_ can help with this problem. In pytorch, this can be done with a learning rate scheduler, such as [`StepLR`](https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.StepLR). 
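
To get a feel for what a scheduler does before wiring it into a training loop, the sketch below tracks the learning rate over a few epochs. The single dummy parameter and the hyperparameter values are illustrative assumptions, not recommended settings.

```python
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

# A single dummy parameter stands in for a real model.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.1, momentum=0.9)

# Illustrative values: multiply the learning rate by 0.5 every 2 epochs.
scheduler = StepLR(optimizer, step_size=2, gamma=0.5)

learning_rates = []
for epoch in range(6):
    learning_rates.append(optimizer.param_groups[0]['lr'])
    # ... the mini-batch loop would run here ...
    optimizer.step()   # the optimizer steps before the scheduler
    scheduler.step()   # called once per epoch

print(learning_rates)
```

The learning rate stays constant for `step_size` epochs, and is then multiplied by `gamma`.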

💪💪 Fine-tune ResNet-18 with learning rate decay.
- rewrite the `train_model()` function to also include a `StepLR`. Check the [documentation](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) for more details.
- try hyperparameters `step_size=4`, `gamma=0.2` (or others!)
- reload a `resnet18` model from `torchvision`. Don't forget to modify the last layer
- train this new model
- plot the loss curve

In [0]:
# INSERT YOUR CODE HERE



🧠 Has the loss curve improved? Is this what you expected? 

🧠🧠 What other factors can explain the variance of this loss curve?

## 4. Waffle Or Ice Cream Prediction

Now that we have fine-tuned our ResNet-18 model, let's check if it fares better than ResNet-50 on our classification task. We write a `predict_waffle_or_ice_cream()` function inspired by `predict_imagenet()` from section 2, with a few differences.

🧠 Why are we using `torch.sigmoid()` instead of `torch.softmax()` here?

🧠 Can you explain how the `get_class_name()` function works?

In [0]:
import os

def predict_waffle_or_ice_cream(path, model):
  input_image = Image.open(path)
  preprocess = transforms.Compose([
      transforms.Resize(256),
      transforms.CenterCrop(224),
      transforms.ToTensor(),
      transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
  ])
  input_tensor = preprocess(input_image)
  input_batch = input_tensor.unsqueeze(0)
  input_batch = input_batch.to(device)

  model.eval()
  with torch.no_grad():
    outputs = model(input_batch)
    prob = torch.sigmoid(outputs).item()
    pred = get_class_name(prob)
    filename = os.path.basename(path)
    print(f'ResNet-18 file: {filename}, prediction: {pred}, waffle probability: {prob:.4f}')
  return input_image

def get_class_name(prob):
  class_names = img_dataset.classes
  return class_names[1] if prob > 0.5 else class_names[0]

ice_cream_path = 'waffle_or_ice_cream/test/ice_cream.jpg'
predict_waffle_or_ice_cream(ice_cream_path, model_ft)

In [0]:
waffle_path = 'waffle_or_ice_cream/test/waffle.jpg'
predict_waffle_or_ice_cream(waffle_path, model_ft)

Our model has managed to correctly classify both the ice cream and the waffle!

This was possible because we fine-tuned ResNet-18, and leveraged its pre-trained knowledge. Without transfer learning, this task would have taken more resources, more time, and more data.

So, are we ready to take over the world of desserts? 🍨 Some examples aren't as simple as the two images above. No model is perfect, and there are almost always exceptions to the classes we're trying to identify. For our waffle or ice cream problem, let's introduce:  
✨ the waffle cone ✨



In [0]:
waffle_path = 'waffle_or_ice_cream/test/waffle_ice_cream.jpg'
predict_waffle_or_ice_cream(waffle_path, model_ft)

🧠 What was the waffle cone prediction? Do you agree with it?

This image is confusing for the model ... and for humans! 🤨 In fact, if you retrain the model several times, you might end up with completely opposite results.

🧠🧠 If we were to use this classifier in a production setting, how would you remedy this instability in prediction? 

## 5. Bonus

We used the pre-trained ResNet-50 on a few dog, ice cream, and waffle images. But it's capable of predicting 1000 classes! You can use your laptop's webcam inside this notebook to take pictures and save them locally. 📸 These can then be classified by the original pre-trained ResNet-50.

In [0]:
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def take_photo(filename='photo.jpg', quality=0.8):
  js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);

      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

      // Wait for Capture to be clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename

In [0]:
import IPython

try:
  filename = take_photo()
  print('Saved to {}'.format(filename))
  
  # Show the image which was just taken.
  display(IPython.display.Image(filename))
except Exception as err:
  # Errors will be thrown if the user does not have a webcam or if they do not
  # grant the page permission to access it.
  print(str(err))

In [0]:
predict_imagenet('photo.jpg', model, 5)

Here's my attempt :) 

In [0]:
!wget --quiet https://introduction-to-machine-learning-ilia-university.s3.eu-west-2.amazonaws.com/wine_bottle.jpg
predict_imagenet('wine_bottle.jpg', model, 5)

## 6. Summary

Today we learned about **modern computer vision models** and **transfer learning**. We first revisited the **history of CNNs** to understand the improvements that led to the current state of the art. We pointed out a trend of **wider**, **deeper** models, and **larger** datasets, and noted that this restricts most data scientists from training modern CV models from scratch. We explored the **hidden representations** of deep CNNs and understood the relationship between the **depth**, **specificity**, and **power** of the layers. After noting that large CNNs learn **similar** low-level features in every training run, we defined **transfer learning** as techniques that **share** this information between models for similar tasks. We described two methods: **feature extraction** and **fine-tuning**, and showed how they can help create accurate CV models with **smaller datasets** and **less hardware**. We highlighted how this played into the unique machine learning **open-source** ecosystem, which democratises knowledge and tools to build cutting edge A.I systems. We reviewed a few pro-tips for CV, and explained how **data augmentation** can help further squeeze learnable information out of a dataset. Finally, we applied transfer learning to computer vision by fine-tuning a **ResNet-18** model pre-trained on ImageNet, to a waffle or ice cream image classification problem.

# Resources

## Core Resources

- [Transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)  
Official pytorch tutorial that this notebook was based on
- [cs231n - transfer learning](https://cs231n.github.io/transfer-learning/)  
Succinct and practical notes on transfer learning from Karpathy's classic computer vision course
- [torchvision](https://github.com/pytorch/vision)  
torchvision github repository
- [Waffle or ice cream dataset](https://www.kaggle.com/sapal6/waffles-or-icecream)  
Kaggle dataset used to make this notebook

## Additional Resources

- [Towards reproducibility with pytorch hub](https://pytorch.org/blog/towards-reproducible-research-with-pytorch-hub/)  
Pytorch hub introduction
- [Finetuning torchvision models](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html)  
More in depth official pytorch tutorial showing how to fine-tune _all_ torchvision pre-trained models
- [CNN architectures](https://towardsdatascience.com/neural-network-architectures-156e5bad51ba)  
Breakdown of major CNN architectures.
- [Illustrated 10 CNN architectures](https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d)  
Illustrated breakdown of CNN architectures
- [Feature visualization in NNs](https://distill.pub/2017/feature-visualization/)  
Article visually representing the hidden representations of GoogLeNet
- [albumentations](https://github.com/albumentations-team/albumentations)  
Image augmentation library, part of the pytorch ecosystem
- [The data that changed the world](https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/)  
Quartz article about the disruptive effect of ImageNet on CV and A.I
- [Why does batch normalization work](https://youtu.be/nUUqwaxLnWs)  
Deeplearning.ai video providing intuition on why batch normalization works
- [ResNets](https://www.coursera.org/lecture/convolutional-neural-networks/resnets-HAhz9)  
deeplearning.ai lecture on ResNets and skip connections
- [Understanding dropout](https://youtu.be/ARq74QuavAo)  
deeplearning.ai video providing intuition on why dropout works