# **Pretrained CNN models**
Pretrained CNN models are convolutional neural networks that have been trained on large datasets and can be fine-tuned for specific tasks. These models have already learned to recognize general features and patterns in images, making them a good starting point for many computer vision tasks.


**Advantages of Pretrained CNN Models**

**Reduced Training Time:** Pretrained models can be fine-tuned much faster than training a model from scratch.

**Improved Performance:** Pretrained models have already learned to recognize general features and patterns in images, which can improve performance on specific tasks.

**Transfer Learning:** Pretrained models can be used as a starting point for other tasks, allowing for transfer learning.

**Popular Pretrained CNN Models**


* **VGG16:** A 16-layer convolutional neural network trained on the ImageNet dataset.
* **ResNet50:** A 50-layer convolutional neural network trained on the ImageNet dataset.
* **InceptionV3:** A 48-layer convolutional neural network trained on the ImageNet dataset.
* **DenseNet121:** A 121-layer convolutional neural network trained on the ImageNet dataset.
* **MobileNetV2:** A lightweight convolutional neural network trained on the ImageNet dataset.

**Applications of Pretrained CNN Models**

* **Image Classification:** Pretrained models can be fine-tuned for image classification tasks.
* **Object Detection:** Pretrained models can be used as a starting point for object detection tasks.
* **Segmentation:** Pretrained models can be used for image segmentation tasks.
* **Image Generation:** Pretrained models can serve as feature extractors in image generation tasks, for example when computing perceptual losses.

**LOADING LIBRARIES**

We are loading the necessary libraries for data analysis, modeling, and visualization. These libraries will be used for data processing, model training, and evaluation.

In [1]:
pip install ISLP



Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np, pandas as pd
from matplotlib.pyplot import subplots
from sklearn.linear_model import \
     (LinearRegression,
      LogisticRegression,
      Lasso)
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from ISLP import load_data
from ISLP.models import ModelSpec as MS
from sklearn.model_selection import \
     (train_test_split,
      GridSearchCV)

### Torch-Specific Imports
There are a number of imports for `torch`. (These are not
included with `ISLP`, so must be installed separately.)
First we import the main library
and essential tools used to specify sequentially-structured networks.

In [3]:
import torch
from torch import nn
from torch.optim import RMSprop
from torch.utils.data import TensorDataset

There are several other helper packages for `torch`. For instance,
the `torchmetrics` package has utilities to compute
various metrics to evaluate performance when fitting
a model. The `torchinfo` package provides a useful
summary of the layers of a model. We use the `read_image()`
function when loading test images in Section 10.9.4.

In [4]:
from torchmetrics import (MeanAbsoluteError,
                          R2Score)
from torchinfo import summary

The package `pytorch_lightning` is a somewhat higher-level
interface to `torch` that simplifies the specification and
fitting of
models by reducing the amount of boilerplate code needed
(compared to using `torch` alone).

In [5]:
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import CSVLogger

In order to reproduce results we use `seed_everything()`. We will also instruct `torch` to use deterministic algorithms
where possible.

In [6]:
from pytorch_lightning import seed_everything
seed_everything(0, workers=True)
torch.use_deterministic_algorithms(True, warn_only=True)

Seed set to 0


We will use several datasets shipped with `torchvision` for our
examples, as well as a pretrained network for image classification
and some transforms used for preprocessing.

In [7]:
from torchvision.io import read_image
from torchvision.datasets import MNIST, CIFAR100
from torchvision.models import (resnet50,
                                ResNet50_Weights)
from torchvision.transforms import (Resize,
                                    Normalize,
                                    CenterCrop,
                                    ToTensor)

We have provided a few utilities in `ISLP` specifically for this lab.
The `SimpleDataModule` and `SimpleModule` are simple
versions of objects used in `pytorch_lightning`, the
high-level module for fitting `torch` models. Although more advanced
uses such as computing on graphical processing units (GPUs) and parallel data processing
are possible in this module, we will not be focusing much on these
in this lab. The `ErrorTracker` handles
collections of targets and predictions over each mini-batch
in the validation or test stage, allowing computation
of the metric over the entire validation or test data set.

In [8]:
from ISLP.torch import (SimpleDataModule,
                        SimpleModule,
                        ErrorTracker,
                        rec_num_workers)

In addition we have included some helper
functions to load the
`IMDb` database, as well as a lookup that maps integers
to particular keys in the database. We’ve included
a slightly modified copy of the preprocessed
`IMDb` data from `keras`, a separate package
for fitting deep learning models. This saves us significant
preprocessing and allows us to focus on specifying and fitting
the models themselves.

In [9]:
from ISLP.torch.imdb import (load_lookup,
                             load_tensor,
                             load_sparse,
                             load_sequential)

Finally, we introduce some utility imports  not directly related to
`torch`.
The `glob()` function from the `glob` module is used
to find all files matching wildcard characters, which we will use
in our example applying the `ResNet50` model
to some of our own images.
The `json` module will be used to load
a JSON file for looking up classes to identify the labels of the
pictures in the `ResNet50` example.

In [10]:
from glob import glob
import json


## Using Pretrained CNN Models
We now show how to use a CNN pretrained on the `imagenet` database to classify natural
images, and demonstrate how we produced Figure 10.10.
We copied six JPEG images from a digital photo album into the
directory `book_images`. These images are available
from the data section of <www.statlearning.com>, the ISLP book website. Download `book_images.zip`;
unzipping it creates the `book_images` directory.

The pretrained network we use is called `resnet50`; specification details can be found on the web.
We will read in the images, and
convert them into the array format expected by the `torch`
software to match the specifications in `resnet50`. 
The conversion involves a resize, a crop and then a predefined standardization for each of the three channels.
We now read in the images and preprocess them.

**Data Preprocessing**

**Resizing:** Resize images to 232x232 pixels with antialiasing.

**Cropping:** Crop resized images to 224x224 pixels using center crop.

**Normalization:** Scale pixel values to [0, 1], then standardize each channel by subtracting the means [0.485, 0.456, 0.406] and dividing by the standard deviations [0.229, 0.224, 0.225].

**Loading and Preprocessing:** Load images, apply resizing, cropping, and normalization, and stack them into a tensor.

**Output:** A tensor of size (num_images, 3, 224, 224) containing the preprocessed images.

In [11]:
resize = Resize((232,232), antialias=True)
crop = CenterCrop(224)
normalize = Normalize([0.485,0.456,0.406],
                      [0.229,0.224,0.225])
imgfiles = sorted([f for f in glob('book_images/*')])
imgs = torch.stack([torch.div(crop(resize(read_image(f))), 255)
                    for f in imgfiles])
imgs = normalize(imgs)
imgs.size()

torch.Size([6, 3, 224, 224])
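To make the standardization step concrete, here is a small sketch of the per-channel arithmetic that `Normalize` performs, written with `numpy` on a hypothetical 2x2 image. The pixel values are made up for illustration and are assumed to be already scaled to [0, 1].

```python
import numpy as np

# Per-channel constants from the preprocessing step above (RGB order).
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Hypothetical 2x2 RGB image in channel-first layout, already scaled
# to [0, 1]; every pixel within a channel has the same value here.
img = np.stack([np.full((2, 2), v) for v in (0.485, 0.680, 0.181)])

# Broadcast the constants over the two spatial dimensions.
normalized = (img - mean[:, None, None]) / std[:, None, None]

# One pixel per channel: approximately 0, 1 and -1.
print(normalized[:, 0, 0])
```

A pixel equal to the channel mean maps to 0, and a pixel one standard deviation above or below the mean maps to roughly 1 or -1.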

We now set up the trained network with the weights we imported in code block 7. The model has 50 layers, with a fair bit of complexity.

## ResNet-50 Architecture
This example uses the ResNet-50 model, a deep residual network.
ResNet-50 consists of 50 layers and is built from residual blocks with shortcut connections that let gradients flow more easily during backpropagation, mitigating the vanishing gradient problem common in very deep networks.

### Key components:

**Convolutional Layers:**  
It begins with a 7x7 convolution followed by max-pooling to reduce input dimensions.

**Residual Blocks:** The architecture includes 16 residual blocks, each with three convolutions (1x1, 3x3, and 1x1) that allow the model to learn deep features efficiently.

**Bottleneck Layers:** 1x1 convolutions reduce computational complexity while maintaining depth.

**Global Average Pooling:** Averages the features before a fully connected layer for classification.

**Parameters:** The model has about 25.6 million parameters.

ResNet-50 learns deep features without the performance degradation that affects plain deep networks, which makes it well suited to complex image classification tasks such as the 1000-class ImageNet problem it was pretrained on.

In [12]:
resnet_model = resnet50(weights=ResNet50_Weights.DEFAULT)
summary(resnet_model,
        input_data=imgs,
        col_names=['input_size',
                   'output_size',
                   'num_params'])

Layer (type:depth-idx)                   Input Shape               Output Shape              Param #
ResNet                                   [6, 3, 224, 224]          [6, 1000]                 --
├─Conv2d: 1-1                            [6, 3, 224, 224]          [6, 64, 112, 112]         9,408
├─BatchNorm2d: 1-2                       [6, 64, 112, 112]         [6, 64, 112, 112]         128
├─ReLU: 1-3                              [6, 64, 112, 112]         [6, 64, 112, 112]         --
├─MaxPool2d: 1-4                         [6, 64, 112, 112]         [6, 64, 56, 56]           --
├─Sequential: 1-5                        [6, 64, 56, 56]           [6, 256, 56, 56]          --
│    └─Bottleneck: 2-1                   [6, 64, 56, 56]           [6, 256, 56, 56]          --
│    │    └─Conv2d: 3-1                  [6, 64, 56, 56]           [6, 64, 56, 56]           4,096
│    │    └─BatchNorm2d: 3-2             [6, 64, 56, 56]           [6, 64, 56, 56]           128
│    │    └─ReLU: 3-3      
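The parameter counts in the summary can be checked by hand. The sketch below uses two small helper functions (hypothetical names, written for this illustration) and assumes square kernels and no bias terms in the convolutions.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=False):
    """Number of parameters in a 2D convolution with a square kernel."""
    n = in_channels * out_channels * kernel_size * kernel_size
    return n + (out_channels if bias else 0)


def batchnorm2d_params(channels):
    """One scale and one shift parameter per channel."""
    return 2 * channels


print(conv2d_params(3, 64, 7))    # Conv2d: 1-1, the 7x7 stem -> 9408
print(batchnorm2d_params(64))     # BatchNorm2d: 1-2 -> 128
print(conv2d_params(64, 64, 1))   # Conv2d: 3-1, a 1x1 convolution -> 4096
```

These values agree with the `Param #` column for the corresponding rows of the summary.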

We set the mode to `eval()` to ensure that the model is ready to predict on new data.

In [13]:
resnet_model.eval()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

Inspecting the output above, we see that when setting up the
`resnet_model`, the authors defined a `Bottleneck`, much like our
`BuildingBlock` module.
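To illustrate the idea behind such a block, here is a simplified sketch of a residual bottleneck in `torch`. The class name `BottleneckSketch` is hypothetical and this is not the `torchvision` implementation: it omits strides and always uses a 1x1 projection on the shortcut, which corresponds only to the first block of `layer1` shown above.

```python
import torch
from torch import nn


class BottleneckSketch(nn.Module):
    """Simplified residual bottleneck: 1x1 -> 3x3 -> 1x1 plus a shortcut."""

    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                      padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # A 1x1 projection so the shortcut matches the output channels.
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Add the shortcut to the transformed input, then apply ReLU.
        return self.relu(self.body(x) + self.shortcut(x))


block = BottleneckSketch(64, 64, 256).eval()
with torch.no_grad():
    out = block(torch.randn(1, 64, 56, 56))
print(out.shape)
n_params = sum(p.numel() for p in block.parameters())
print(n_params)
```

With 64 input channels and 256 output channels, the output keeps the 56x56 spatial size, matching the shapes reported for `Bottleneck: 2-1` in the summary.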

We now feed our six images through the fitted network.

**The predictions `img_preds` are obtained by applying the pretrained ResNet-50 model to the input images. The model outputs one logit (unnormalized score) for each of the 1000 ImageNet classes. No ground-truth labels accompany these six images, so we inspect the top predictions directly rather than computing accuracy or loss.**

In [14]:
img_preds = resnet_model(imgs)

Let’s look at the predicted probabilities for each of the top 3 choices. First we compute
the probabilities by applying the softmax to the logits in `img_preds`. Note that
we have had to call the `detach()` method on the tensor `img_preds` in order to convert
it to a more familiar `ndarray`.

In [18]:
img_probs = np.exp(np.asarray(img_preds.detach()))
img_probs /= img_probs.sum(1)[:,None]
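The direct exponentiation above works for logits of moderate size. As a hedged alternative, the sketch below shows a common numerically stable variant that subtracts each row's maximum before exponentiating; the function name `softmax` and the example logits are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax that subtracts each row's maximum first."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


# Hypothetical logits; the second row would overflow a naive np.exp.
logits = np.array([[2.0, 1.0, 0.0],
                   [1000.0, 999.0, 998.0]])
probs = softmax(logits)
print(probs.sum(axis=1))
```

Subtracting a constant from every logit in a row leaves the softmax unchanged, so both rows here yield the same probabilities and each row sums to 1.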

In order to see the class labels, we must download the index file associated with `imagenet`. This is available from the book website and [s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json](https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json).

In [19]:
# Use a context manager so the file is closed after reading.
with open('imagenet_class_index.json') as f:
    labs = json.load(f)
class_labels = pd.DataFrame([(int(k), v[1]) for k, v in 
                           labs.items()],
                           columns=['idx', 'label'])
class_labels = class_labels.set_index('idx')
class_labels = class_labels.sort_index()
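The indexing `v[1]` above relies on the layout of the index file, in which each class index (stored as a string) maps to a pair of WordNet ID and readable label. The sketch below parses a two-entry excerpt held in a string, so no file is needed; the variable names are chosen for this illustration.

```python
import json

# Two entries in the layout used by imagenet_class_index.json:
# class index (as a string) -> [WordNet ID, human-readable label].
sample = '{"0": ["n01440764", "tench"], "1": ["n01443537", "goldfish"]}'
labs_sample = json.loads(sample)

# Keep only the readable label, keyed by the integer class index.
idx_to_label = {int(k): v[1] for k, v in labs_sample.items()}
print(idx_to_label)
```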

We’ll now construct a data frame for each image file
with the labels with the three highest probabilities as
estimated by the model above.

In [20]:
for i, imgfile in enumerate(imgfiles):
    img_df = class_labels.copy()
    img_df['prob'] = img_probs[i]
    img_df = img_df.sort_values(by='prob', ascending=False)[:3]
    print(f'Image: {imgfile}')
    print(img_df.reset_index().drop(columns=['idx']))

Image: book_images/Cape_Weaver.jpg
      label      prob
0   jacamar  0.297499
1     macaw  0.068107
2  lorikeet  0.051104
Image: book_images/Flamingo.jpg
            label      prob
0        flamingo  0.609515
1       spoonbill  0.013586
2  American_egret  0.002132
Image: book_images/Hawk_Fountain.jpg
            label      prob
0            kite  0.184681
1           robin  0.084022
2  great_grey_owl  0.061274
Image: book_images/Hawk_cropped.jpg
            label      prob
0            kite  0.453834
1  great_grey_owl  0.015914
2             jay  0.012210
Image: book_images/Lhasa_Apso.jpg
             label      prob
0            Lhasa  0.260316
1         Shih-Tzu  0.097196
2  Tibetan_terrier  0.032820
Image: book_images/Sleeping_Cat.jpg
         label      prob
0  Persian_cat  0.163069
1        tabby  0.074143
2    tiger_cat  0.042578


We see that the model
is quite confident about `Flamingo.jpg`, but a little less so for the
other images.
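The top-3 selection can also be done without sorting a full data frame. The following sketch uses `np.argsort` on a single vector of probabilities; the five labels and their probabilities are made up for illustration and are not model output.

```python
import numpy as np

# Hypothetical probabilities for one image over five made-up classes.
labels = np.array(['flamingo', 'spoonbill', 'crane', 'pelican', 'heron'])
probs = np.array([0.61, 0.20, 0.05, 0.04, 0.10])

k = 3
# argsort is ascending, so take the last k indices and reverse them.
top_idx = np.argsort(probs)[-k:][::-1]

for label, prob in zip(labels[top_idx], probs[top_idx]):
    print(f'{label}: {prob:.2f}')
```

For tensors, `torch.topk` offers the same selection directly on the output of the model.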


### Observations by Image

**Image: Cape Weaver**

- The model's top prediction is jacamar with a probability of 29.75%, followed by macaw and lorikeet with much lower probabilities.

- None of the top three predictions is a weaver, so the model misclassifies this image, with the probability spread across other bird classes.

**Image: Flamingo**

- The model confidently predicts flamingo with a probability of 60.95%, which is a correct classification.
- Other less likely predictions include spoonbill and American egret, with minimal probabilities.

**Image: Hawk Fountain**

- The model predicts kite as the top class with a probability of 18.47%, though with relatively low confidence.
- Other predictions, such as robin and great grey owl, indicate uncertainty in classifying this image.

**Image: Hawk Cropped**

- The model once again predicts kite with higher confidence (45.38%).
- Lower-ranked predictions include great grey owl and jay, suggesting the cropped image gave clearer visual cues for the correct class.

**Image: Lhasa Apso**

- The model's top prediction is Lhasa with 26.03% probability, followed by Shih-Tzu and Tibetan terrier.
- These predictions are reasonable as the Lhasa Apso dog breed shares visual similarities with these other breeds, although the confidence is not very high.

**Image: Sleeping Cat**

- The model predicts Persian cat as the most likely class with 16.31% confidence, followed by tabby and tiger cat.
- The relatively low probabilities suggest the model is uncertain, which could be due to the visual similarity among the cat classes.


## Overall Observations:

**The model performs well on some images, such as the flamingo, but struggles with others, especially when visual similarity across classes (e.g., different types of birds or dogs) is high.**

**In cases like the Cape Weaver and Hawk Fountain, the model is uncertain or incorrect, highlighting potential areas where fine-tuning or more data might help improve accuracy.**