# Training Neural Networks Using PyTorch
The first part of this notebook demonstrates how to train neural networks with the PyTorch library. Almost everything in this section can be found in the official PyTorch tutorial series at https://pytorch.org/tutorials/, which you are encouraged to look over. The outline is as follows:

- Dataset and Dataloader
- Building your own model
- Using an ImageNet-trained model
- Training the model
- Extracting layer activations

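**Note: PyTorch uses tensors instead of NumPy arrays. Generally, a NumPy array can be converted to a tensor using `torch.Tensor()`, which by default returns a `float32` tensor; `torch.from_numpy()` and `torch.tensor()` keep the array's dtype.**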


In [2]:
import torch 
import numpy as np

X = [3,4,5]
A = np.array(X)
A

array([3, 4, 5])

In [3]:
tensor_A = torch.Tensor(A)
tensor_A

tensor([3., 4., 5.])
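
The float output above (`3., 4., 5.`) comes from `torch.Tensor()` casting the integer array to `float32`. The sketch below compares it with two other conversion functions. The variable names are illustrative only, and the array dtype is fixed to `int64` here (an assumption, so that the result does not depend on the platform).

```python
import numpy as np
import torch

A = np.array([3, 4, 5], dtype=np.int64)

t_float = torch.Tensor(A)        # copies the data and casts it to float32
t_shared = torch.from_numpy(A)   # keeps int64 and shares memory with A
t_copy = torch.tensor(A)         # keeps int64 but makes an independent copy

A[0] = 10                        # modify the NumPy array in place

print(t_float.dtype, t_float[0].item())    # unaffected by the change to A
print(t_shared.dtype, t_shared[0].item())  # sees the change to A
print(t_copy.dtype, t_copy[0].item())      # unaffected by the change to A
```

Sharing memory avoids a copy for large arrays, but it also means that in-place changes to the array show up in the tensor, so `torch.tensor()` is the safer choice when the two should stay independent.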

## Datasets 

For a simple feedforward neural network, PyTorch expects a dataset to provide, for each item, the data itself and its label. The torchvision library provides several popular datasets, including MNIST, FashionMNIST, CIFAR10, and CIFAR100. See https://pytorch.org/vision/stable/datasets.html#built-in-datasets for more.

In [19]:
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor

# Getting MNIST from the torchvision library. Each built-in dataset is split into a training set (train=True) and a test set (train=False)
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw

In [6]:
# You can define your own custom datasets. There are several ways of doing this depending on your needs, but PyTorch provides a template.
import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

# Provide the CSV file containing the image labels, the directory of the images, and any transformations to apply to the images (transform) or to the labels (target_transform)
class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

# my_dataset = CustomImageDataset("labels.csv", os.getcwd()+"/my_images")

**The most important thing for the Dataset structure is that the `__getitem__` function is defined so that it returns either the input data alone or the input data together with its corresponding label.**
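
As a smaller illustration of the same interface, here is a minimal sketch of a dataset that wraps tensors already held in memory. The class name `InMemoryDataset` and the random toy data are hypothetical and only stand in for real images and labels.

```python
import torch
from torch.utils.data import Dataset

class InMemoryDataset(Dataset):
    """Hypothetical dataset that wraps two tensors already held in memory."""

    def __init__(self, inputs, labels):
        assert len(inputs) == len(labels), "need one label per input"
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        # Number of items in the dataset
        return len(self.inputs)

    def __getitem__(self, idx):
        # Return the input and its label together
        return self.inputs[idx], self.labels[idx]

# Random stand-ins for 100 grayscale 28x28 images with labels in 0..9
toy_inputs = torch.randn(100, 1, 28, 28)
toy_labels = torch.randint(0, 10, (100,))
toy_dataset = InMemoryDataset(toy_inputs, toy_labels)

image, label = toy_dataset[0]
print(len(toy_dataset), image.shape, label.item())
```

Only `__len__` and `__getitem__` are needed for this kind of dataset; everything else (file paths, CSV parsing, transforms) depends on where your data lives.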

Before training your neural network, you must wrap your dataset in a DataLoader. A DataLoader lets us iterate through the dataset in batches and feed each batch into the neural network.


In [7]:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

In [8]:
for X,Y in train_dataloader:
  print(X.shape)
  print(Y.shape)
  break

torch.Size([64, 1, 28, 28])
torch.Size([64])
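
The number of batches depends on whether the dataset size is a multiple of the batch size. For example, the 60,000 MNIST training images with `batch_size=64` give 938 batches, the last of which holds only 32 images. The sketch below shows the same effect on a toy dataset of 10 items (the toy data and variable names are assumptions made for illustration).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 items, each a single feature with an integer label
toy_dataset = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))

# Default behaviour: the last, smaller batch is kept
loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)
batch_sizes = [len(x) for x, y in loader]

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(toy_dataset, batch_size=4, shuffle=False, drop_last=True)
batch_sizes_drop = [len(x) for x, y in loader_drop]

print(len(loader), batch_sizes, batch_sizes_drop)  # 3 [4, 4, 2] [4, 4]
```

This matters later when reshaping activations: code that hardcodes the batch size will fail on the final batch unless `drop_last=True` is set or the shape is read from the tensor.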


# Build your own feedforward model
 
Artificial neural networks (ANNs) are defined as classes in PyTorch, by subclassing `nn.Module`.

In [9]:
from torch import nn
class NeuralNetwork(nn.Module):
    # init defines the layers of the network in sequence. For each layer, you provide the input dimension and the output dimension
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [10]:
my_neural_network = NeuralNetwork()
for X, Y in train_dataloader:
  output = my_neural_network(X)
  print(output.shape)
  break

torch.Size([64, 10])
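
The 10 outputs per image are raw scores (logits), one per class. A minimal sketch of how to count the trainable parameters and turn logits into class probabilities follows; it rebuilds the same layer sizes with `nn.Sequential` so that it runs on its own, and uses random inputs in place of MNIST images.

```python
import torch
from torch import nn

# Same layer sizes as the NeuralNetwork class above, written as one Sequential
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Each Linear layer has in*out weights plus out biases
n_params = sum(p.numel() for p in model.parameters())

x = torch.rand(64, 1, 28, 28)            # random stand-in for a batch of images
logits = model(x)
probs = torch.softmax(logits, dim=1)     # each row now sums to 1
predicted_class = probs.argmax(dim=1)    # most probable class per image

print(n_params, logits.shape, predicted_class.shape)
```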


# Training feedforward model

Let's return to training a simple feedforward neural network, here on MNIST (the dataset loaded in the next cell).

In [24]:
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch import nn

training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

In [28]:
class MySimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super(MySimpleNeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 300),
            nn.ReLU(),
            nn.Linear(300, 100),
            nn.ReLU(),
            nn.Linear(100, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

my_model = MySimpleNeuralNetwork()

In [29]:
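# Define hyperparameters, loss function, and optimizer

learning_rate = 0.01   # step size for each SGD update
epochs = 20            # number of passes over the training set
batch_size = 64        # for reference; the batch size actually used is the one given to the DataLoader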

# Popular loss functions
# nn.MSELoss
# nn.CrossEntropyLoss
loss_fn = nn.CrossEntropyLoss()

# Popular Optimizers 
from torch.optim import SGD
optimizer = SGD(my_model.parameters(), lr = learning_rate)
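
To make the choice of loss concrete, the sketch below checks on made-up numbers that `nn.CrossEntropyLoss` takes raw logits and integer class labels, and equals the mean negative log-softmax probability of the correct class. This is why the model above returns logits without a final softmax layer. The logits and targets are arbitrary illustrative values.

```python
import torch
from torch import nn

# Two examples, three classes; values chosen arbitrarily
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])           # integer class indices, not one-hot vectors

loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the entry of the correct class, average, negate
log_probs = torch.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```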


In [30]:
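# We train our neural network using a loop that calls a training function once per epoch
def training_step(dataloader,model,loss_fn,optimizer):
  for X, Y in dataloader:
    # Forward pass: compute predictions and the loss for this batch
    pred = model(X)
    loss = loss_fn(pred,Y)
    # Backpropagation: clear old gradients, compute new ones, update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
  # Note: this is the loss of the last batch only, so it fluctuates from epoch to epoch
  return loss.item()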

for epoch in range(epochs):
  loss = training_step(train_dataloader,my_model,loss_fn,optimizer)
  print(f"Epoch: {epoch}, Loss: {loss:>7f}")


Epoch: 0, Loss: 0.812270
Epoch: 1, Loss: 0.587958
Epoch: 2, Loss: 0.613685
Epoch: 3, Loss: 0.353832
Epoch: 4, Loss: 0.491124
Epoch: 5, Loss: 0.295042
Epoch: 6, Loss: 0.346110
Epoch: 7, Loss: 0.402282
Epoch: 8, Loss: 0.480839
Epoch: 9, Loss: 0.425764
Epoch: 10, Loss: 0.293018
Epoch: 11, Loss: 0.117683
Epoch: 12, Loss: 0.404831
Epoch: 13, Loss: 0.283222
Epoch: 14, Loss: 0.409397
Epoch: 15, Loss: 0.270050
Epoch: 16, Loss: 0.282569
Epoch: 17, Loss: 0.450281
Epoch: 18, Loss: 0.175580
Epoch: 19, Loss: 0.221896
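
The printed loss jumps around because it is measured on a single mini-batch per epoch. A less noisy alternative is to average the loss over all batches. The sketch below shows one way to do this on a small synthetic problem; the data, the linear model, and the name `train_one_epoch` are assumptions made for illustration and are not part of the notebook's MNIST setup.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic, linearly separable toy problem (not MNIST): 20 features, 3 classes
features = torch.randn(256, 20)
labels = (features @ torch.randn(20, 3)).argmax(dim=1)
toy_loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

toy_model = nn.Linear(20, 3)
toy_loss_fn = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

def train_one_epoch(dataloader, model, loss_fn, optimizer):
    """Run one pass over the data and return the sample-weighted mean loss."""
    model.train()
    total_loss, n_samples = 0.0, 0
    for X, Y in dataloader:
        loss = loss_fn(model(X), Y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # loss is a per-batch mean, so weight it by the batch size
        total_loss += loss.item() * len(X)
        n_samples += len(X)
    return total_loss / n_samples

history = [train_one_epoch(toy_loader, toy_model, toy_loss_fn, toy_optimizer)
           for _ in range(5)]
print([f"{h:.3f}" for h in history])
```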



In [40]:
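# Testing step
def testing_step(dataloader,model,loss_fn):
  model.eval()                 # switch layers such as dropout/batchnorm to evaluation mode
  correct = 0
  with torch.no_grad():        # gradients are not needed for evaluation
    for X, Y in dataloader:
      pred = model(X)
      # The predicted class is the index of the largest logit
      correct += (pred.argmax(1) == Y).type(torch.float).sum().item()
  return correct

correct = testing_step(test_dataloader,my_model,loss_fn)
size = len(test_dataloader.dataset)
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}% \n")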

Test Error: 
 Accuracy: 85.8% 
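
The accuracy computation can be checked by hand on a tiny example. The logits and labels below are made up; three of the four predictions match their labels, giving an accuracy of 0.75.

```python
import torch

logits = torch.tensor([[2.0, 0.1, 0.3],   # largest entry at index 0
                       [0.2, 1.5, 0.1],   # largest entry at index 1
                       [0.3, 0.2, 0.9],   # largest entry at index 2
                       [1.1, 0.4, 0.2]])  # largest entry at index 0
labels = torch.tensor([0, 1, 2, 1])

predictions = logits.argmax(dim=1)                 # predicted class per row
n_correct = (predictions == labels).sum().item()   # number of matches
accuracy = n_correct / len(labels)

print(predictions.tolist(), n_correct, accuracy)   # [0, 1, 2, 0] 3 0.75
```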



# Get a pretrained model

PyTorch provides convolutional neural network models pretrained on ImageNet in the torchvision library.

In [None]:
from torchvision.models import resnet18, ResNet18_Weights

resnet_model = resnet18(weights=ResNet18_Weights.DEFAULT)
# Using pretrained models requires that your data be preprocessed in the way the model expects it
weights = ResNet18_Weights.DEFAULT
preprocess = weights.transforms()
preprocess 

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /Users/nyulo/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth

In [42]:
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader

# the preprocess functions go into transform for your dataset
training_data = datasets.CIFAR10(
    root="data",
    train=True,
    download=True,
    transform=preprocess
)

test_data = datasets.CIFAR10(
    root="data",
    train=False,
    download=True,
    transform=preprocess
)

train_dataloader = DataLoader(training_data, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=32, shuffle=True)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz
Extracting data/cifar-10-python.tar.gz to data
Files already downloaded and verified


In [43]:
resnet_model.eval()

X,Y = next(iter(test_dataloader))
out = resnet_model(X)
print(out.shape)


torch.Size([32, 1000])


# Extracting Layer Activations

Extracting layer activations can be useful for neuroscience, for example when comparing a network's internal representations with neural recordings. In PyTorch, extracting activations involves registering a forward hook on the layer you want and creating a dictionary to store the layer activations in.

In [44]:
activation = {}                                       # activation is a global dictionary mapping names to stored layer outputs
def get_activation(name):                             # get_activation takes a name for the layer activation and returns a hook
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

In [45]:
list(resnet_model.children())

[Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False),
 BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
 ReLU(inplace=True),
 MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False),
 Sequential(
   (0): BasicBlock(
     (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (relu): ReLU(inplace=True)
     (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
   )
   (1): BasicBlock(
     (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     (relu): ReLU(inplace=True)
     (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), pad

In [46]:
resnet_model.layer1[0].relu.register_forward_hook(get_activation('Resnet18_Relu1'))

<torch.utils.hooks.RemovableHandle at 0x7f7ef096df60>

In [47]:
X, Y = next(iter(test_dataloader))

output = resnet_model(X)
activation['Resnet18_Relu1'].shape

torch.Size([32, 64, 56, 56])
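
Two details of hooks are worth keeping in mind. First, `register_forward_hook` returns a handle whose `remove()` method detaches the hook. Second, a hook fires every time its module is called, so if a block reuses the same module within one forward pass, a dictionary keyed by name keeps only the last call. To the best of our knowledge, torchvision's `BasicBlock` reuses its `relu` module in this way (after `bn1` and again after the residual addition). The sketch below demonstrates both points on a hypothetical `TinyBlock`, which is not part of torchvision.

```python
import torch
from torch import nn

class TinyBlock(nn.Module):
    """Hypothetical block that, like a residual block, calls the same ReLU module twice."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 8)
        self.fc2 = nn.Linear(8, 8)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.fc1(x))        # first call to self.relu
        out = self.fc2(out)
        return self.relu(out + x)           # second call to self.relu

captured = []                               # one entry per hook call

def record(module, inputs, output):
    captured.append(output.detach().clone())

block = TinyBlock()
handle = block.relu.register_forward_hook(record)

x = torch.randn(4, 8)
with torch.no_grad():
    y = block(x)
calls_with_hook = len(captured)             # the hook fired twice in one forward pass

handle.remove()                             # detach the hook when it is no longer needed
with torch.no_grad():
    block(x)
calls_after_remove = len(captured)          # unchanged: the hook no longer fires

print(calls_with_hook, calls_after_remove)  # 2 2
```

Appending to a list, as done here, keeps every call; overwriting a dictionary entry keeps only the most recent one.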

# Comparing representations

Having extracted activations, we can compare the representations of stimuli in the neural network. Here is a simple demonstration of a representational similarity analysis ([Kriegeskorte, 2008][1]) on layer activations of neural networks. You can replace one of the neural network activations with neural data to compare neural network representations with neural representations.

[1]: https://www.frontiersin.org/articles/10.3389/neuro.06.004.2008/full#h3 

In [48]:
poor_resnet_model = resnet18()                        # no weights given: randomly initialized (untrained) network
poor_resnet_model.layer1[0].relu.register_forward_hook(get_activation('PoorResnet18_Relu1'))
output = poor_resnet_model(X)
activation['PoorResnet18_Relu1'].shape

torch.Size([32, 64, 56, 56])

In [49]:

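# Flatten each stimulus's activation into one feature vector: (32, 64, 56, 56) -> (32, 64*56*56)
X1 = activation["Resnet18_Relu1"].flatten(start_dim=1).numpy()
X2 = activation["PoorResnet18_Relu1"].flatten(start_dim=1).numpy()


import numpy as np
from scipy.stats import pearsonr 

# Representational dissimilarity matrix (upper triangle only):
# 1 - Pearson correlation for every pair of stimuli
def rdm(M):
  d = []
  for i in range(M.shape[0]):
    for j in range(i+1,M.shape[0]):
      r = pearsonr(M[i,:],M[j,:])[0]
      d.append(1-r)
  return d

# Representational similarity: correlation between the two dissimilarity vectors
def rsa(r1,r2):
  return pearsonr(r1,r2)[0]

r1 = rdm(X1)
r2 = rdm(X2)
print(rsa(r1,r2))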


0.4396650246878896
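
The pairwise loop becomes slow as the number of stimuli grows. A possible vectorized alternative uses `np.corrcoef`, which computes all pairwise correlations at once. The sketch below checks on synthetic data (a random matrix standing in for the flattened activations) that both approaches give the same dissimilarities in the same order.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 50))   # 6 stimuli, 50 features (synthetic stand-in)

# Pairwise loop over all stimulus pairs (i, j) with i < j
d_loop = []
for i in range(M.shape[0]):
    for j in range(i + 1, M.shape[0]):
        d_loop.append(1 - pearsonr(M[i, :], M[j, :])[0])

# Vectorized version: np.corrcoef treats each row as one variable
corr = np.corrcoef(M)
upper = np.triu_indices(M.shape[0], k=1)   # index pairs (i, j) with i < j, row by row
d_vectorized = 1 - corr[upper]

print(len(d_loop), np.allclose(d_loop, d_vectorized))  # 15 True
```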


# (Advanced) Manifold Geometry Analysis

Here, we perform the manifold geometry analysis on neural network data. We will use pre-written functions in utils to put the manifold data in the expected format. Then, we use manifold_analysis.py to analyze the geometric properties of the data. Functions in manifold_analysis.py expect the input data as a list of numpy arrays. The length of the list X should be the number of manifolds P. Each array should have shape (N,M), where N is the number of dimensions/features/neurons and M is the number of examples that the manifold consists of.

**X = [(N_1,M_1), (N_2,M_2), ... (N_P,M_P)]**

Please note that manifold_analysis.py does not account for manifold center correlation. Hence, capacity is poorly estimated. For a better estimate, use manifold_analysis_corr.py 
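
As a sketch of the expected input format only (it does not call `manifold_analysis`), the example below builds a list of P arrays of shape (N, M) from per-class activations. All sizes and the random data are hypothetical.

```python
import numpy as np

P, M, C, H, W = 5, 40, 4, 3, 3            # hypothetical sizes, chosen only for illustration
rng = np.random.default_rng(0)

# One array of layer activations per class, shaped (examples, channels, height, width)
activations_per_class = [rng.standard_normal((M, C, H, W)) for _ in range(P)]

# Flatten each example to a feature vector, then transpose to (features, examples)
X = [a.reshape(a.shape[0], -1).T for a in activations_per_class]

print(len(X), X[0].shape)   # 5 (36, 40)
```

Here N = C * H * W = 36 features and M = 40 examples per manifold; the transpose is what turns the usual (examples, features) layout into the (N, M) layout described above.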


In [52]:
from utils.activation_extractor import *
from utils.make_manifold_data import * 

# To make manifold data, first load the model. We will load a pre-trained model and use stimuli from CIFAR10 (the dataset loaded below)
model = resnet18(weights=ResNet18_Weights.DEFAULT)
weights = ResNet18_Weights.DEFAULT
preprocess = weights.transforms()

training_data = datasets.CIFAR10(
    root="data",
    train=True,
    download=True,
    transform=preprocess
)

test_data = datasets.CIFAR10(
    root="data",
    train=False,
    download=True,
    transform=preprocess
)

model.eval();

Files already downloaded and verified
Files already downloaded and verified


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [69]:
# Note: CIFAR10 has only 10 classes, so with this dataset sampled_classes presumably needs to be at most 10 (or the dataset switched to CIFAR100)
sampled_classes = 40
examples_per_class = 40
# Given the data, the number of classes, and the number of examples per class, we get the data we want as input to the neural network
data = make_manifold_data(training_data,sampled_classes,examples_per_class)
data = [d for d in data]
# extractor uses the same hook strategy we discussed previously to get activations from specified layers.
# We can specify layers by number or by type (e.g. conv2d, relu, etc.)
activations_dict = extractor(model,data,layer_nums=[3])

# Reshape the manifold data to the expected dimensions: one (N, M) array per manifold
my_activations_dict = {}
for layer, activations in activations_dict.items():
    X = [d.reshape(d.shape[0],-1).T for d in activations]
    my_activations_dict[layer] = X
    

In [71]:
from manifold_geometry.manifold_analysis import *

kappa = 0    # Specify the margin (usually 0) 
n_t = 300    # Specify the number of Gaussian vectors to sample (200 or 300 is a good default) 

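cap = []
rad = []
dim = []
# Analyze the reshaped activations of each layer, one layer at a time
for layer, X in my_activations_dict.items():
    alpha, radius, dimension = manifold_analysis(X,kappa,n_t)
    c = 1/np.mean(1/alpha)    # capacity: harmonic mean over manifolds
    r = np.mean(radius)       # mean manifold radius
    d = np.mean(dimension)    # mean manifold dimension
    cap.append(c)
    rad.append(r)
    dim.append(d)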
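
The expression `1/np.mean(1/alpha)` is the harmonic mean of the per-manifold capacities, which is pulled toward the smallest values more strongly than the arithmetic mean. A small worked example with made-up capacities:

```python
import numpy as np

alpha = np.array([1.0, 2.0, 4.0])          # made-up per-manifold capacities

harmonic_mean = 1 / np.mean(1 / alpha)     # 1 / mean([1, 0.5, 0.25]) = 12/7
arithmetic_mean = np.mean(alpha)           # 7/3

print(harmonic_mean, arithmetic_mean)
```

With these numbers the harmonic mean is about 1.71, compared with an arithmetic mean of about 2.33, so a single low-capacity manifold lowers the reported capacity noticeably.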
