https://www.kaggle.com/code/vortexkol/alexnet-cnn-architecture-on-tensorflow-beginner

Step 1. Use one of the deep learning frameworks (PyTorch, TensorFlow, Keras, ...) and
load the pre-trained AlexNet model. Feed the 156 images to the pre-trained AlexNet
model and extract the feature maps/activations from the Conv 1, 2, 3, 4, 5, fc6, and fc7
layers. Vectorize the activations corresponding to each image. You should end up with
one activation vector per image for each of the layers listed above. (20 points)

In [61]:
import os

import numpy as np
import matplotlib.pyplot as plt

# The model is loaded from torchvision, so only the PyTorch stack is needed here.
import torch
import torch.nn as nn
from torchvision import models, transforms
from torchvision.models.feature_extraction import create_feature_extractor
from PIL import Image


In [62]:
model = models.alexnet(weights='DEFAULT')
model.eval()

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

In [87]:
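# Build the feature extractor once, outside the loop. Re-wrapping `model` on every
# iteration is unnecessary and overwrites the original network.
return_nodes = {
    "features.0": "conv1", "features.3": "conv2", "features.6": "conv3",
    "features.8": "conv4", "features.10": "conv5",
    "classifier.1": "fc6", "classifier.4": "fc7",
}
extractor = create_feature_extractor(model, return_nodes=return_nodes)

# Standard ImageNet preprocessing; it does not change between images.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

imgOutputs = []
path = "./Image Set"
# os.listdir returns files in arbitrary order. sorted() is lexicographic; use a
# numeric key instead if the file names are unpadded numbers.
for file in sorted(os.listdir(path)):
    # Force three channels so grayscale or RGBA files match the Normalize step.
    input_image = Image.open(os.path.join(path, file)).convert("RGB")
    input_batch = preprocess(input_image).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        out = extractor(input_batch)  # dict: layer name -> activation tensor
    imgOutputs.append(out)

# for k, v in out.items():
#     print(k, v.shape)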
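
Step 1 also asks for the activations to be vectorized, which the cell above does not do yet. The following is a minimal sketch of that step, not part of the original notebook. The helper name `stack_layer_vectors` and the `LAYERS` list are hypothetical, and the sketch assumes each entry of `imgOutputs` is a dict of CPU tensors (or arrays) produced under `torch.no_grad()`, so that `np.asarray` can convert them.

```python
import numpy as np

# Layer names as used for the feature extractor outputs (assumed order).
LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7"]


def stack_layer_vectors(outputs, layers=LAYERS):
    """Flatten each image's activation per layer and stack the results.

    outputs: list with one dict per image, mapping layer name -> activation.
    Returns a dict mapping layer name -> array of shape (n_images, n_features).
    """
    return {
        layer: np.stack([np.asarray(out[layer]).reshape(-1) for out in outputs])
        for layer in layers
    }
```

With the notebook's variables this would be called as `features = stack_layer_vectors(imgOutputs)`, giving one row per image for every layer.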

Step 2. Create a representational dissimilarity matrix (RDM), which is a 156 x 156
matrix. Each row and each column of this matrix is indexed by one of the images in the
image set, and each element is the Euclidean distance between the activation vectors
of the corresponding pair of images extracted in Step 1. (30 points)
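
One possible way to compute such an RDM is sketched below. It is an illustration under stated assumptions rather than the notebook's own solution: the function name `euclidean_rdm` is hypothetical, and `features` is assumed to be a 2-D array holding one flattened activation vector per image for a single layer. The sketch uses the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y so that all pairwise distances come from one matrix product.

```python
import numpy as np


def euclidean_rdm(features):
    """Return the (n_images, n_images) matrix of pairwise Euclidean distances.

    features: array-like of shape (n_images, n_features), one row per image.
    """
    x = np.asarray(features, dtype=np.float64)
    sq_norms = np.sum(x * x, axis=1)
    # Squared distances via the Gram matrix.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    # Rounding can leave tiny negative values; clamp them before the square root.
    np.clip(sq_dists, 0.0, None, out=sq_dists)
    np.fill_diagonal(sq_dists, 0.0)
    return np.sqrt(sq_dists)
```

For 156 images the result is a symmetric 156 x 156 matrix with zeros on the diagonal, computed once per layer.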

Step 3. Plot the RDM for each layer (Conv 1, 2, 3, 4, 5, fc6, fc7) together with its
corresponding multidimensional scaling (MDS) visualization in two dimensions. Use the
following class labels for the MDS plots: images 1 to 28 are animals, 29 to 64 are
objects, 65 to 100 are scenes, 101 to 124 are human activities, and 125 to 156 are
faces. (30 points)
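
The sketch below shows one way the labels and plots for this step might be put together. It is not the notebook's own solution. It uses classical (Torgerson) MDS written directly in NumPy, swapped in for a library routine such as scikit-learn's MDS, so the embedding can differ from what an iterative MDS would give. The names `CLASS_SIZES`, `LABELS`, `classical_mds` and `plot_rdm_and_mds` are hypothetical, and the labels are only valid if the rows of the RDM follow the image numbering 1 to 156.

```python
import numpy as np

# Class sizes taken from the image ranges in Step 3 (28 + 36 + 36 + 24 + 32 = 156).
CLASS_SIZES = {
    "Animals": 28,           # images 1-28
    "Objects": 36,           # images 29-64
    "Scenes": 36,            # images 65-100
    "Human activities": 24,  # images 101-124
    "Faces": 32,             # images 125-156
}
LABELS = np.repeat(list(CLASS_SIZES.keys()), list(CLASS_SIZES.values()))


def classical_mds(rdm, n_components=2):
    """Embed a distance matrix with classical (Torgerson) MDS."""
    d = np.asarray(rdm, dtype=np.float64)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def plot_rdm_and_mds(rdm, layer_name, labels=LABELS):
    """Draw the RDM heat map next to its 2-D MDS scatter plot."""
    # Imported here so the numeric helpers work without a plotting backend.
    import matplotlib.pyplot as plt

    coords = classical_mds(rdm, n_components=2)
    fig, (ax_rdm, ax_mds) = plt.subplots(1, 2, figsize=(12, 5))
    image = ax_rdm.imshow(rdm, cmap="viridis")
    ax_rdm.set_title(f"RDM: {layer_name}")
    fig.colorbar(image, ax=ax_rdm, label="Euclidean distance")
    for name in CLASS_SIZES:
        mask = labels == name
        ax_mds.scatter(coords[mask, 0], coords[mask, 1], s=15, label=name)
    ax_mds.set_title(f"MDS (2-D): {layer_name}")
    ax_mds.legend()
    return fig
```

Calling `plot_rdm_and_mds(rdm, "conv1")` once per layer, with that layer's 156 x 156 RDM, would produce the seven figure pairs the step asks for.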

Please write a short report presenting and discussing the results of this assignment. (20
points)