# Brain-Score Benchmarking

In this notebook, I benchmark my own models with Brain-Score and, in a simplified way, test how effective it is to align object-detection models with neuroanatomy.

I tried to base my code on the [official tutorial](https://github.com/brain-score/candidate_models/blob/master/examples/score-model.ipynb), but it appears to be outdated: not even the imports work. The guides on the Brain-Score website did not help either, so I had to dig into the implementation of the packages themselves. After a lot of trial and error, I finally got it to work.

In [29]:
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
import functools
from brainscore_vision.model_helpers.activations.pytorch import load_preprocess_images
from brainscore_vision.model_helpers.activations.pytorch import PytorchWrapper
from brainscore_vision.model_helpers.brain_transformation import ModelCommitment
from brainscore_vision import model_registry
from brainscore_vision import benchmark_registry
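# Importing the plugin package runs its __init__.py, which appears to be what
# fills benchmark_registry with the MajajHong2015 benchmarks.
import brainscore_vision.benchmarks.majajhong2015  # noqa: F401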

We use a simple, non-brain-aligned model with only a few layers as a baseline.

In [None]:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = torch.nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3)
        self.relu1 = torch.nn.ReLU()
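        # A 3x3 conv without padding maps 224 -> (224 - 3 + 2 * 0) / 1 + 1 = 222 per side;
        # with 2 output channels, the flattened size is 222 * 222 * 2 = 98568.
        linear_input_size = np.power((224 - 3 + 2 * 0) / 1 + 1, 2) * 2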
        self.linear = torch.nn.Linear(int(linear_input_size), 1000)
        self.relu2 = torch.nn.ReLU()  # can't get named ReLU output otherwise

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = x.view(x.size(0), -1)
        x = self.linear(x)
        x = self.relu2(x)
        return x

We need a simple function that instantiates the model and wraps it in the Brain-Score wrapper for PyTorch. Make sure to give the model a unique identifier. We then pass this wrapper to a ```ModelCommitment```, where we reuse the identifier we chose and specify which layers should be considered for benchmarking. Our model is not neuroanatomically aligned, which is why we try all layers.

In [4]:
def get_model():
    model = MyModel()
    preprocessing = functools.partial(load_preprocess_images, image_size=224)
    # wrap the instance created above instead of building a second MyModel()
    activations_model = PytorchWrapper(identifier='my-model', model=model, preprocessing=preprocessing)
    model = ModelCommitment(identifier='my-model', activations_model=activations_model,
                            # specify layers to consider
                            layers=['conv1', 'relu1', 'relu2'])
    model.image_size = 224
    return model

Afterwards, we put the function into the model registry under the unique identifier we specified. We also pick the benchmark we want to use. I chose this one arbitrarily; it checks neural predictivity against recordings from the V4 region.

In [5]:
model_registry['my-model'] = get_model
benchmark = benchmark_registry['MajajHong2015public.V4-pls']()



We then just need to instantiate the model with our function and run it through the benchmark. Cross-validation, the dataset download, etc. all happen automatically, but it takes a while. Even for this simple model on a single benchmark, it took about 16 minutes.

In [6]:
model = get_model()
score = benchmark(model)

print(score)

cross-validation: 100%|██████████| 10/10 [15:59<00:00, 95.93s/it] 

<xarray.Score ()>
array(0.13852261)
Attributes:
    error:    <xarray.Score ()>\narray(0.00626453)
    raw:      <xarray.Score ()>\narray(0.3289301)\nAttributes:\n    raw:     ...
    ceiling:  <xarray.DataArray ()>\narray(0.88377819)\nAttributes:\n    raw:...





We receive a score of about 0.139, which is (unsurprisingly) not very brain-like. The reported error is small (about 0.006), so the estimate appears to be stable.

In [17]:
print(f"Score: {score.item()}")
print(f"Error: {score.error.item()}")
print(f"Raw: {score.raw.item()}")
print(f"Ceiling: {score.ceiling.item()}")

Score: 0.13852260950600967
Error: 0.006264534611139793
Raw: 0.32893009629761405
Ceiling: 0.8837781905660179
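
As a quick sanity check on how these four numbers relate: the printed values are consistent with a ceiling-normalized score of the form `(raw / ceiling) ** 2`. This relationship is inferred from the numbers above, not taken from the Brain-Score documentation, so treat it as an assumption. A minimal sketch:

```python
# Values copied from the output above.
raw = 0.32893009629761405
ceiling = 0.8837781905660179
reported_score = 0.13852260950600967

# Assumed normalization (inferred from the numbers, not from the docs):
# divide the raw score by the ceiling, then square the result.
normalized = (raw / ceiling) ** 2

print(f"(raw / ceiling)^2: {normalized:.6f}")
print(f"reported score:    {reported_score:.6f}")
```

If the assumption holds, both printed values agree to several decimal places.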


Next, we try a simplified version of the CORnet-S model proposed in the original paper. It has three convolutional layers that represent the areas V1, V2 and V4, and one linear output layer that represents IT.

In [37]:
class CortexInspiredModel(nn.Module):
    def __init__(self):
        super(CortexInspiredModel, self).__init__()
        # V1
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
        # V2
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=2, padding=2)
        # V4
        self.conv3 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1)
        # IT
        self.fc = nn.Linear(256 * 28 * 28, 1000)  # input 224x224 pixels

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
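
The `256 * 28 * 28` input size of the linear layer follows from the standard output-size formula for a convolution (dilation 1): `floor((size + 2 * padding - kernel_size) / stride) + 1`. The sketch below walks a 224x224 input through the three convolutions; the helper name `conv_output_size` is made up for this illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Side length after a 2D convolution with dilation 1 (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (kernel_size, stride, padding) as defined in CortexInspiredModel
conv_layers = {
    "conv1": (7, 2, 3),
    "conv2": (5, 2, 2),
    "conv3": (3, 2, 1),
}

side = 224  # input images are 224x224 pixels
sides = {}
for name, (kernel_size, stride, padding) in conv_layers.items():
    side = conv_output_size(side, kernel_size, stride, padding)
    sides[name] = side
    print(f"{name}: {side}x{side}")

# conv3 has 256 output channels, so this is the flattened size fed to self.fc
fc_in_features = 256 * side * side
print(f"fc input features: {fc_in_features}")
```

Each stride-2 convolution halves the side length, so the feature map shrinks from 224 to 112, 56 and finally 28.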

We once again define a function that returns our model. This model is now aligned to neuroanatomy. Since we are benchmarking on V4 neuronal recordings, we map the third convolutional layer directly instead of trying out other layers.

In [38]:
def get_model():
    model = CortexInspiredModel()
    preprocessing = functools.partial(load_preprocess_images, image_size=224)
    activations_model = PytorchWrapper(identifier='my-brain-model', model=model, preprocessing=preprocessing)
    model = ModelCommitment(identifier='my-brain-model', activations_model=activations_model,
                            layers=['conv3']) # directly map V4 for benchmark
    model.image_size = 224
    return model

We once again put the model into the registry and re-instantiate the benchmark. Make sure to use a different identifier than before!

In [39]:
model_registry['my-brain-model'] = get_model
benchmark = benchmark_registry['MajajHong2015public.V4-pls']()



This model is more complex, which is reflected in the time needed for benchmarking: we needed about 33 minutes to check a single layer instead of three.

In [40]:
model = get_model()
score = benchmark(model)

print(score)

layer principal components: 100%|██████████| 1/1 [00:34<00:00, 34.40s/it]
cross-validation: 100%|██████████| 10/10 [00:15<00:00,  1.53s/it]
layers: 100%|██████████| 1/1 [02:01<00:00, 121.75s/it]
cross-validation: 100%|██████████| 10/10 [32:47<00:00, 196.79s/it]

<xarray.Score ()>
array(0.3121215)
Attributes:
    error:    <xarray.Score ()>\narray(0.00634253)
    raw:      <xarray.Score ()>\narray(0.49374774)\nAttributes:\n    raw:    ...
    ceiling:  <xarray.DataArray ()>\narray(0.88377819)\nAttributes:\n    raw:...





The results suggest that this kind of neuroanatomical alignment pays off in terms of brain-likeness. **With this very simple feed-forward network, we already achieve a score of 0.312 instead of 0.139!**

In [41]:
print(f"Score: {score.item()}")
print(f"Error: {score.error.item()}")
print(f"Raw: {score.raw.item()}")
print(f"Ceiling: {score.ceiling.item()}")

Score: 0.3121214992564354
Error: 0.006342530207302218
Raw: 0.4937477416650767
Ceiling: 0.8837781905660179
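
To put the two results side by side, the sketch below compares the scores and relates their difference to the reported errors. Combining the two errors in quadrature assumes they are independent, which is an assumption made for illustration and not something the benchmark output states.

```python
import math

# Scores and errors copied from the two benchmark runs above.
baseline = {"score": 0.13852260950600967, "error": 0.006264534611139793}
aligned = {"score": 0.3121214992564354, "error": 0.006342530207302218}

difference = aligned["score"] - baseline["score"]
ratio = aligned["score"] / baseline["score"]

# Assumption: the two errors are independent, so they add in quadrature.
combined_error = math.hypot(baseline["error"], aligned["error"])

print(f"difference: {difference:.4f}")
print(f"ratio:      {ratio:.2f}x")
print(f"difference / combined error: {difference / combined_error:.1f}")
```

The cortex-inspired model scores a bit more than twice as high as the baseline, and the gap is large compared with the reported errors.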
