Benchmarks consist of a target assembly and a metric to compare assemblies.
They accept a source assembly to compare against and yield a score.

### Pre-defined benchmarks

Brain-Score defines benchmarks that can be run on brain models. To be tested, a model has to implement the `BrainModel` interface. A very simple implementation could look like this:

In [None]:
import numpy as np
from typing import List, Tuple

from brainscore.model_interface import BrainModel
from brainio_base.assemblies import DataAssembly

class LayerModel(BrainModel):
    def __init__(self, identifier, region, layer):
        self.identifier = identifier
        self.layer = layer
        self.region = region

    def look_at(self, stimuli):
        # mock activations: seeded random values of shape (stimuli, 5 neuroids, 1 time bin)
        rnd = np.random.RandomState(0)
        source = DataAssembly(rnd.rand(len(stimuli), 5, 1),
                              coords={'image_id': ('presentation', stimuli['image_id']),
                                      'object_name': ('presentation', stimuli['object_name']),
                                      'neuroid_id': ('neuroid', np.arange(5)),
                                      'time_bin_start': ('time_bin', [70]),
                                      'time_bin_end': ('time_bin', [170])},
                              dims=['presentation', 'neuroid', 'time_bin'])
        source.name = 'dicarlo.mock'
        return source

    def start_task(self, task, **kwargs):
        if task != BrainModel.Task.passive:
            raise NotImplementedError()

    def start_recording(self, recording_target: BrainModel.RecordingTarget, time_bins: List[Tuple[int]]):
        # type annotations (`:`) instead of default values (`=`); the mock only serves its own region
        if str(recording_target) != self.region:
            raise NotImplementedError(f"Region {recording_target} is not committed")


The implementation maps a given brain region to a neural network layer. The `look_at` method just creates a mock result and returns it. The other two methods only check that their inputs are valid.
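
Because `look_at` re-seeds its random generator on every call, the mock should return the same activations whenever it is asked for the same number of stimuli, regardless of what the stimuli are. The following is a minimal NumPy-only sketch of that behavior; `mock_activations` is a hypothetical helper that mirrors the body of `look_at` without the Brain-Score dependencies.

```python
import numpy as np


def mock_activations(num_stimuli, num_neuroids=5):
    """Hypothetical stand-in for the array created inside LayerModel.look_at."""
    rnd = np.random.RandomState(0)  # re-seeded on every call, as in look_at
    # dimensions: presentation x neuroid x time_bin
    return rnd.rand(num_stimuli, num_neuroids, 1)


first = mock_activations(10)
second = mock_activations(10)
print(first.shape)
print(np.array_equal(first, second))
```

The identical arrays are a property of the seeding, not of the stimuli, so this mock is only useful for exercising the interface.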

The following example loads the `dicarlo.Majaj2015.IT-pls` benchmark (consisting of neural recordings in macaque IT and a neural predictivity metric to compare),
and runs it on an instance of this model.

In [None]:
from brainscore.benchmarks import benchmark_pool

# run at the top level so that `score` is available to the following cells
it_benchmark = benchmark_pool['dicarlo.Majaj2015.IT-pls']
model = LayerModel('mock', 'IT', 'main.Conv2')
score = it_benchmark(model)

cross-validation: 100%|██████████| 10/10 [00:19<00:00,  1.91s/it]


The benchmark applied the neural predictivity metric to compare the model's recordings with the target recordings and, as the progress bar shows, cross-validated the comparison to estimate errors.
The resulting score contains the center (the average over the splits, in this case the mean) and the error (in this case the standard error of the mean).

In [9]:
center, error = score.sel(aggregation='center'), score.sel(aggregation='error')
print(f"score: {center.values:.3f}+-{error.values:.3f}")

score: 0.495+-0.003
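
To make the two aggregates concrete, here is a minimal sketch of how a center and an error can be derived from per-split values. The numbers in `split_scores` are made up for illustration, and the use of the sample standard deviation (`ddof=1`) is an assumption rather than something the output above confirms.

```python
import numpy as np

# made-up per-split scores, one value per cross-validation split
split_scores = np.array([0.42, 0.45, 0.44, 0.46, 0.43, 0.44, 0.45, 0.43, 0.46, 0.42])

# center: mean over the splits
center = split_scores.mean()
# error: standard error of the mean (sample standard deviation / sqrt(number of splits))
error = split_scores.std(ddof=1) / np.sqrt(len(split_scores))

print(f"score: {center:.3f}+-{error:.3f}")
```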


We can also check the raw values (correlations per neuroid, per split).
These are saved in the attributes under 'raw'.

In [10]:
raw_scores = score.attrs['raw']
print(raw_scores)

<xarray.DataAssembly (split: 10, neuroid: 168)>
array([[0.188818, 0.305646, 0.234136, ..., 0.6615  , 0.682484, 0.642704],
       [0.278896, 0.281422, 0.319581, ..., 0.670958, 0.584426, 0.584051],
       [0.223199, 0.260997, 0.184756, ..., 0.64629 , 0.732094, 0.69738 ],
       ...,
       [0.203238, 0.249746, 0.188271, ..., 0.697742, 0.648629, 0.590761],
       [0.153523, 0.333198, 0.160535, ..., 0.721704, 0.694001, 0.636323],
       [0.216352, 0.287344, 0.292694, ..., 0.690422, 0.718551, 0.61446 ]])
Coordinates:
  * split       (split) int64 0 1 2 3 4 5 6 7 8 9
  * neuroid     (neuroid) MultiIndex
  - neuroid_id  (neuroid) object 'Chabo_L_A_2_4' ... 'Chabo_L_A_8_4'
  - arr         (neuroid) object 'A' 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
  - col         (neuroid) int64 4 3 5 0 1 2 3 4 5 6 2 ... 6 1 2 3 4 5 6 7 2 3 4
  - hemisphere  (neuroid) object 'L' 'L' 'L' 'L' 'L' 'L' ... 'L' 'L' 'L' 'L' 'L'
  - subregion   (neuroid) object 'cIT' 'cIT' 'aIT' 'cIT' ... 'cIT' 'cIT' 'cIT'
  - a
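
The raw values can be reduced along either dimension. The sketch below uses a small made-up NumPy array in place of the real `(split, neuroid)` assembly; taking the median over neuroids before averaging over splits is an assumption about a sensible aggregation, not a statement of what the benchmark does internally.

```python
import numpy as np

rng = np.random.RandomState(0)
# made-up stand-in for the raw scores: 10 splits x 6 neuroids
raw = rng.uniform(0.1, 0.8, size=(10, 6))

# one value per neuroid: average each neuroid's correlation over the splits
per_neuroid = raw.mean(axis=0)
# one value per split: median over neuroids (assumed aggregation)
per_split = np.median(raw, axis=1)
# a single summary value: mean of the per-split values
center = per_split.mean()

print(per_neuroid.shape, per_split.shape)
print(f"best-predicted neuroid index: {per_neuroid.argmax()}")
```

Since the real `raw_scores` is an xarray-based assembly, the same reductions can likely be written with dimension names, for example `raw_scores.mean('split')`.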

### Custom benchmarks

We can also define our own benchmarks.

One way is to put together existing assemblies and metrics in new ways using the `build_benchmark` function:

In [13]:
from brainscore.benchmarks.neural import build_benchmark
from brainscore.assemblies.public import assembly_loaders
from brainscore.metrics.ceiling import InternalConsistency
from brainscore.metrics.regression import CrossRegressedCorrelation, pls_regression, pearsonr_correlation
similarity_metric = CrossRegressedCorrelation(
    regression=pls_regression(), correlation=pearsonr_correlation(),
    crossvalidation_kwargs=dict(stratification_coord='object_name'))
my_benchmark = build_benchmark(identifier='my-benchmark', assembly_loader=assembly_loaders['dicarlo.Majaj2015.lowvar.IT'],
                                similarity_metric=similarity_metric, ceiler=InternalConsistency())
model = LayerModel('mock', 'IT', 'main.Conv2')
score = my_benchmark(model)
print("\n", score)

cross-validation: 100%|██████████| 10/10 [00:20<00:00,  2.01s/it]


 Score(_variable=<xarray.Variable (aggregation: 2)>
array([0.494592, 0.003259])
Attributes:
    raw:      <xarray.DataAssembly (split: 10, neuroid: 168)>\narray([[0.1888...,_coords=OrderedDict([('aggregation', <xarray.IndexVariable 'aggregation' (aggregation: 2)>
array(['center', 'error'], dtype='<U6'))]),_name=None,_file_obj=None,_initialized=True)
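
The idea behind a cross-validated regressed correlation can be sketched in plain NumPy: fit a regression from source to target on training stimuli, predict held-out stimuli, and correlate predictions with the held-out target per neuroid. This sketch swaps in ordinary least squares for PLS regression and plain shuffled K-fold splits for stratified splitting; all function names and the synthetic data are hypothetical and not part of Brain-Score.

```python
import numpy as np


def pearson_per_column(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


def cross_regressed_correlation(source, target, n_splits=10, seed=0):
    """source: (stimuli, features), target: (stimuli, neuroids).
    Returns an array of correlations with shape (split, neuroid)."""
    rng = np.random.RandomState(seed)
    folds = np.array_split(rng.permutation(len(source)), n_splits)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(source), dtype=bool)
        train_mask[test_idx] = False
        # append a column of ones so the least-squares fit includes an intercept
        x_train = np.column_stack([source[train_mask], np.ones(train_mask.sum())])
        x_test = np.column_stack([source[test_idx], np.ones(len(test_idx))])
        weights, *_ = np.linalg.lstsq(x_train, target[train_mask], rcond=None)
        scores.append(pearson_per_column(x_test @ weights, target[test_idx]))
    return np.stack(scores)


# synthetic data: the target is a noisy linear mixture of the source features
rng = np.random.RandomState(1)
source = rng.rand(200, 5)
target = source @ rng.rand(5, 3) + 0.1 * rng.randn(200, 3)

raw = cross_regressed_correlation(source, target)
print(raw.shape)  # (10, 3)
```

Keeping the regression, the correlation, and the splitting as separate steps mirrors how the metric above is assembled from a `regression`, a `correlation`, and `crossvalidation_kwargs`.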





We can also create a custom benchmark from scratch, using our own methods.
To interface with the rest of Brain-Score, it is easiest to provide those methods to the `Benchmark` class.
(We could also skip inheriting from it and define the `__call__` method ourselves.)
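
As a rough illustration of the second option, the sketch below defines a benchmark as a plain class with a `__call__` method and no Brain-Score inheritance. Everything here is hypothetical: `MeanCorrelationBenchmark` and `EchoModel` are made-up names, the model is assumed to return a plain `(stimuli, neuroids)` array from `look_at`, and the metric is a direct per-neuroid Pearson correlation without regression or cross-validation.

```python
import numpy as np


class MeanCorrelationBenchmark:
    """Hypothetical stand-alone benchmark; not a Brain-Score class."""

    def __init__(self, identifier, stimuli, target):
        self.identifier = identifier
        self.stimuli = stimuli  # passed through to the model's look_at
        self.target = target    # (stimuli, neuroids) array of target recordings

    def __call__(self, candidate):
        source = np.asarray(candidate.look_at(self.stimuli))
        if source.shape != self.target.shape:
            raise ValueError(f"expected shape {self.target.shape}, got {source.shape}")
        # correlate each source column with the matching target column
        correlations = [np.corrcoef(source[:, i], self.target[:, i])[0, 1]
                        for i in range(self.target.shape[1])]
        return float(np.mean(correlations))


class EchoModel:
    """Hypothetical model that returns fixed responses for any stimuli."""

    def __init__(self, responses):
        self.responses = responses

    def look_at(self, stimuli):
        return self.responses


rng = np.random.RandomState(0)
target = rng.rand(50, 4)
benchmark = MeanCorrelationBenchmark('my-scratch-benchmark', stimuli=list(range(50)), target=target)

perfect = benchmark(EchoModel(target))            # model reproduces the target exactly
unrelated = benchmark(EchoModel(rng.rand(50, 4)))  # model responses independent of the target
print(f"perfect: {perfect:.3f}, unrelated: {unrelated:.3f}")
```

A class like this returns a bare float, so it gives up the center, error, and raw attributes that the Brain-Score scores above carry.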