# Exploring DNN Activations with interdim

In this notebook, we demonstrate how the `interdim` package can be used to visualize and explore various types of data, such as deep neural network (DNN) activations. Specifically, we will explore the internal responses of AlexNet to the FashionMNIST dataset. We will extract the responses of the 'features' module to a random subset of 1000 sample images from the dataset. Although we are using a subset for speed, the package can handle many more samples.

---

# Data Preparation

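First, we define the necessary transformations and load the FashionMNIST dataset. The grayscale images are converted to three-channel RGB when the dataset is loaded. Resizing to 224x224 pixels (the input size AlexNet expects) and normalization are applied later, per batch, so that the original 28x28 images remain available for plotting.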

In [1]:
import torch
from torch.utils.data import Subset
import torchvision
import torchvision.transforms as transforms
from torchvision.models import alexnet
import numpy as np
from interdim import InterDimAnalysis
from interdim.vis import InteractionPlot
from tqdm import tqdm

In [2]:
# Define transformations (applied per batch, just before the forward pass)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Normalize((0.5,), (0.5,)),
])

# Load FashionMNIST dataset with conversion to RGB
full_trainset = torchvision.datasets.FashionMNIST(
    root='./data', 
    train=True, 
    download=True, 
    transform=transforms.Compose([
        transforms.Grayscale(num_output_channels=3),  # Convert grayscale to RGB
        transforms.ToTensor(),
    ])
)

# Create a random subset of 1000 samples
indices = np.random.choice(len(full_trainset), 1000, replace=False)
trainset = Subset(full_trainset, indices)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=False)


---

# Model Setup

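Next, we load a pre-trained AlexNet model and set it to evaluation mode. We also register a forward hook on the last layer of the 'features' module, which is AlexNet's convolutional stack. In torchvision's AlexNet that last layer is a max-pooling layer, so the hook captures the output of the entire convolutional feature extractor: the high-level features the network computes before its fully connected classifier.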


In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

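# Load pre-trained AlexNet (ImageNet weights).
# Note: recent torchvision releases deprecate `pretrained=` in favor of `weights=`.
model = alexnet(pretrained=True)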
model.eval()
model.to(device)

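# Define a hook factory that stores a layer's output under the given name
activation = {}
def get_activation(name):
    def hook(module, inputs, output):
        # Detach so the stored tensor does not keep the autograd graph alive
        activation[name] = output.detach()
    return hook

# Register the hook on the last layer of model.features (a max-pooling layer)
model.features[-1].register_forward_hook(get_activation('features'))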



<torch.utils.hooks.RemovableHandle at 0x7764775b8a40>
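To make the hook mechanism above easier to see in isolation, here is a minimal sketch on a small stand-in model. The `toy` network, the `captured` dictionary, and the `save_output` function are hypothetical names used only for illustration; they are not part of AlexNet or `interdim`. The sketch also shows that the returned handle can be used to remove the hook again.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for AlexNet's `features` stack
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
toy.eval()

captured = {}

def save_output(module, inputs, output):
    # Called automatically after `module` finishes its forward pass
    captured["last"] = output.detach()

# Attach the hook to the last layer; keep the handle so it can be removed
handle = toy[-1].register_forward_hook(save_output)

with torch.no_grad():
    out = toy(torch.randn(4, 3, 32, 32))

print(captured["last"].shape)  # same shape as `out`

# After removal, further forward passes no longer update `captured`
handle.remove()
with torch.no_grad():
    toy(torch.randn(2, 3, 32, 32))
```

Because the hook sits on the final layer here, the captured tensor equals the model output. Hooking an earlier layer would capture an intermediate result instead.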


---

# Extracting Activations

We will now pass the images through the model and extract the activations from the 'features' module. This will allow us to visualize and analyze the internal responses of the model to the FashionMNIST images.


In [4]:
latents, labels, images = [], [], []

with torch.no_grad():
    for batch_images, batch_targets in tqdm(trainloader):
        # Resize and normalize, then move the batch to the model's device
        output = model(transform(batch_images).to(device))
        latents.extend(activation['features'].mean(dim=[2, 3]).cpu().numpy())
        labels.extend(batch_targets.numpy())
        images.extend(batch_images.cpu().numpy())

latents = np.array(latents)
labels = np.array(labels)
images = np.array(images)

100%|██████████| 16/16 [00:04<00:00,  3.53it/s]
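The call `mean(dim=[2, 3])` in the loop above averages each feature map over its two spatial dimensions, leaving one value per channel. The sketch below illustrates the effect on shapes using random NumPy data; it assumes the hooked layer outputs 256 channels of 6x6 maps for a 224x224 input, as in torchvision's AlexNet. The array name `feature_maps` is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one batch of hooked activations: (batch, channels, height, width)
feature_maps = rng.standard_normal((4, 256, 6, 6))

# Spatial averaging, the NumPy equivalent of tensor.mean(dim=[2, 3])
pooled = feature_maps.mean(axis=(2, 3))

# Alternative (not used in this notebook): keep every spatial position
flattened = feature_maps.reshape(len(feature_maps), -1)

print(pooled.shape)     # one 256-dimensional vector per image
print(flattened.shape)  # one 9216-dimensional vector per image
```

Averaging discards where in the image a feature fired, but it keeps the latent vectors small, which speeds up the dimensionality reduction and clustering steps.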



---

# Visualizing with interdim

Finally, we will use the `interdim` package to visualize the extracted activations. By comparing true labels and clustering-derived labels, we can observe how well the model's internal representations align with the actual classes in the dataset. Additionally, we can explore the structure of the activation space by hovering over the scatter plot. Check it out!


We'll use UMAP for this demo, which requires the `umap-learn` library. If you don't have it already, you can install it with pip using the following command:


In [None]:
!pip install umap-learn


If you don't want to do this, you can change the `method` argument in the `reduce` call to 'tsne' instead.

In [7]:
ida = InterDimAnalysis(latents, true_labels=labels, verbose=True)
ida.reduce(method='umap', n_components=2)
ida.cluster(method='dbscan')
ida.score(method='adjusted_rand')

interaction_plot = InteractionPlot(images*255, plot_type='image')
ida.show(n_components=2, point_visualization=interaction_plot, marker_kwargs={"colorscale": 'Rainbow'})

Performing dimensionality reduction via UMAP with default arguments.
Reduced data shape: (1000, 2)
Performing clustering via DBSCAN with default arguments.
Clustering complete. Number of clusters: 4
Clustering Evaluation Result (adjusted_rand):
Score: 0.23713197635088457


<dash.dash.Dash at 0x7764d7f3a1b0>

After applying the `interdim` package to these activations, we can make some interesting observations. For example, by comparing true labels with clustering-derived labels (selected via the radio selectors above the scatter plot), we can see that the clustering pulls out some of the clothing categories as their own clusters. DBSCAN finds only 4 clusters for the 10 classes, and the adjusted Rand score is about 0.24, so the match is partial but well above chance. This is despite 1) having provided no information to the clustering about classes in the data, and 2) the fact that the AlexNet used here wasn't trained on this dataset at all! Additionally, by hovering your cursor around the scatter plot space, you can observe rough 'axes' within these clusters. These include an axis from 'pants' to 'shirts', with dresses in between, and other axes of clothing type and light versus dark. Cool!
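To give a feel for what the adjusted Rand score measures, here is a small pure-Python sketch of the standard adjusted Rand index formula. It is an illustration only: how `interdim` computes the score internally is not shown in this notebook, and the function name `adjusted_rand_index` is our own. In this sketch every distinct label, including any noise label a clustering method may assign, is simply treated as its own group.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare

    # Count sample pairs that share a group: jointly, by class, and by cluster
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected from random labelings with the same group sizes
    expected = pairs_true * pairs_pred / comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single group
    return (pairs_joint - expected) / (max_index - expected)

# Same grouping under different label names scores 1.0
print(adjusted_rand_index([0, 0, 1, 1], [5, 5, 7, 7]))
# A grouping that splits every class evenly scores below zero (about -0.5)
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

A score of 1 means the two groupings are identical up to label names, and a score near 0 means the agreement is about what chance would produce.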


---

# Bonus

What if we did this, but with an untrained network? You can do this by setting `pretrained=False` in cell 3, and then rerunning it and the following cells. Would you expect samples to be randomly distributed throughout the activation space?

Once you've done this, look at the results--there's structure here! The images aren't as clearly clustered as before, but there's still a clear class-based structure here. Furthermore, you may notice some similar axes like before by hovering your mouse around the space.

### Why is this?
<details>
<summary>Explanation</summary>
One possibility: Even an untrained network can exhibit some structure in its activations due to the inherent biases in the network architecture and the nature of the input data. The initial random weights can still capture some low-level features, and the network's layers can impose a form of organization on the data. Additionally, good weight initialization settings, which are designed to help the model learn more effectively, can manifest as starting the model in a 'good' spot. This initial structure likely aids in the formation of class-based clusters and axes in the activation space, even without any training.
</details>
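The idea that random weights can preserve structure already present in the input can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration: the data are two synthetic Gaussian blobs rather than images, and the "network" is a single random linear layer followed by a ReLU, not AlexNet. The `separation` helper is a hypothetical measure defined only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic "classes": Gaussian blobs with different means (toy data)
class_a = rng.normal(loc=0.0, scale=1.0, size=(100, 50))
class_b = rng.normal(loc=3.0, scale=1.0, size=(100, 50))

# One untrained layer: random weights followed by a ReLU
weights = rng.normal(scale=1.0 / np.sqrt(50), size=(50, 64))

def random_layer(x):
    return np.maximum(x @ weights, 0.0)

def separation(a, b):
    """Centroid distance divided by the mean distance of points to their own centroid."""
    centroid_gap = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    spread = np.mean([
        np.linalg.norm(a - a.mean(axis=0), axis=1).mean(),
        np.linalg.norm(b - b.mean(axis=0), axis=1).mean(),
    ])
    return centroid_gap / spread

input_separation = separation(class_a, class_b)
hidden_separation = separation(random_layer(class_a), random_layer(class_b))

print(f"input space:  {input_separation:.2f}")
print(f"hidden space: {hidden_separation:.2f}")
```

In this toy setting the two classes stay clearly separated after the random layer, which is consistent with the explanation above: the layer did not learn the structure, it only failed to destroy it.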