This framework is concerned with comparing two sets of data, for instance source brain and target brain.
It does not take care of trying multiple combinations of source data (such as multiple layers in models), but only makes direct comparisons.

### Benchmarks

Benchmarks consist of a target assembly and a metric to compare assemblies.
They accept a source assembly to compare against and yield a score.

### Pre-defined benchmarks

The following example loads the `dicarlo.Majaj2015.IT` benchmark (consisting of neural recordings in macaque IT and a neural predictivity metric to compare),
and compares it against recordings from V4.

In [2]:
from brainscore import benchmarks

it_benchmark = benchmarks.load('dicarlo.Majaj2015.IT')
v4_data = benchmarks.load_assembly('dicarlo.Majaj2015').sel(region='V4')
score = it_benchmark(v4_data)

cross-validation:   0%|          | 0/10 [00:00<?, ?it/s]

cross-validation:  10%|█         | 1/10 [00:01<00:17,  1.91s/it]

cross-validation:  20%|██        | 2/10 [00:03<00:15,  1.91s/it]

cross-validation:  30%|███       | 3/10 [00:05<00:13,  1.88s/it]

cross-validation:  40%|████      | 4/10 [00:07<00:11,  1.86s/it]

cross-validation:  50%|█████     | 5/10 [00:09<00:09,  1.89s/it]

cross-validation:  60%|██████    | 6/10 [00:11<00:07,  1.88s/it]

cross-validation:  70%|███████   | 7/10 [00:13<00:05,  1.89s/it]

cross-validation:  80%|████████  | 8/10 [00:15<00:03,  1.93s/it]

cross-validation:  90%|█████████ | 9/10 [00:17<00:01,  1.94s/it]

cross-validation: 100%|██████████| 10/10 [00:19<00:00,  1.91s/it]




The benchmark applied the neural predictivity metric to compare the two recordings, and, as you can see, already cross-validated to estimate errors.
The resulting score now contains the center (i.e. the average of the splits, in this case the mean) and the error (in this case standard-error-of-the-mean).

In [9]:
center, error = score.sel(aggregation='center'), score.sel(aggregation='error')
print(f"score: {center.values:.3f}+-{error.values:.3f}")

score: 0.495+-0.003


We can also check the raw values (correlations per neuroid, per split).
These are saved in the attributes under 'raw'.

In [10]:
raw_scores = score.attrs['raw']
print(raw_scores)

<xarray.DataAssembly (split: 10, neuroid: 168)>
array([[0.188818, 0.305646, 0.234136, ..., 0.6615  , 0.682484, 0.642704],
       [0.278896, 0.281422, 0.319581, ..., 0.670958, 0.584426, 0.584051],
       [0.223199, 0.260997, 0.184756, ..., 0.64629 , 0.732094, 0.69738 ],
       ...,
       [0.203238, 0.249746, 0.188271, ..., 0.697742, 0.648629, 0.590761],
       [0.153523, 0.333198, 0.160535, ..., 0.721704, 0.694001, 0.636323],
       [0.216352, 0.287344, 0.292694, ..., 0.690422, 0.718551, 0.61446 ]])
Coordinates:
  * split       (split) int64 0 1 2 3 4 5 6 7 8 9
  * neuroid     (neuroid) MultiIndex
  - neuroid_id  (neuroid) object 'Chabo_L_A_2_4' ... 'Chabo_L_A_8_4'
  - arr         (neuroid) object 'A' 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
  - col         (neuroid) int64 4 3 5 0 1 2 3 4 5 6 2 ... 6 1 2 3 4 5 6 7 2 3 4
  - hemisphere  (neuroid) object 'L' 'L' 'L' 'L' 'L' 'L' ... 'L' 'L' 'L' 'L' 'L'
  - subregion   (neuroid) object 'cIT' 'cIT' 'aIT' 'cIT' ... 'cIT' 'cIT' 'cIT'
  - a

### Custom benchmarks

We can also define our own benchmarks.

One way is to put together existing assemblies and metrics in new ways using the `build` method:

In [13]:
from brainscore import benchmarks
my_benchmark = benchmarks.build(name='my-benchmark', assembly_name='dicarlo.Majaj2015', metric_name='rdm')
v4_data = benchmarks.load_assembly('dicarlo.Majaj2015').sel(region='V4')
score = it_benchmark(v4_data)
print("\n", score)

cross-validation:   0%|          | 0/10 [00:00<?, ?it/s]

cross-validation:  10%|█         | 1/10 [00:01<00:17,  1.92s/it]

cross-validation:  20%|██        | 2/10 [00:04<00:18,  2.26s/it]

cross-validation:  30%|███       | 3/10 [00:06<00:14,  2.13s/it]

cross-validation:  40%|████      | 4/10 [00:08<00:12,  2.03s/it]

cross-validation:  50%|█████     | 5/10 [00:10<00:09,  1.97s/it]

cross-validation:  60%|██████    | 6/10 [00:12<00:07,  1.96s/it]

cross-validation:  70%|███████   | 7/10 [00:14<00:05,  1.94s/it]

cross-validation:  80%|████████  | 8/10 [00:16<00:03,  1.91s/it]

cross-validation:  90%|█████████ | 9/10 [00:18<00:01,  1.91s/it]

cross-validation: 100%|██████████| 10/10 [00:20<00:00,  2.01s/it]


 Score(_variable=<xarray.Variable (aggregation: 2)>
array([0.494592, 0.003259])
Attributes:
    raw:      <xarray.DataAssembly (split: 10, neuroid: 168)>\narray([[0.1888...,_coords=OrderedDict([('aggregation', <xarray.IndexVariable 'aggregation' (aggregation: 2)>
array(['center', 'error'], dtype='<U6'))]),_name=None,_file_obj=None,_initialized=True)





We can also create a custom benchmark from scratch, using our own methods.
To interface with the rest of Brain-Score, it is easiest if we just provide those to the Benchmark class.
(But we could also not inherit and define the `__call__` method ourselves).

In [14]:
from brainscore import benchmarks
from brainscore.benchmarks import Benchmark
from brainscore.metrics.rdm import RDMCrossValidated
from brainscore.metrics.ceiling import InternalConsistency


class MyBenchmark(Benchmark):
    def __init__(self):
        assembly = benchmarks.load_assembly('dicarlo.Majaj2015')  # approximate V4 and IT together
        metric = RDMCrossValidated()
        ceiling = InternalConsistency()
        super(MyBenchmark, self).__init__(name='my-benchmark', target_assembly=assembly, metric=metric, ceiling=ceiling)


v4_data = benchmarks.load_assembly('dicarlo.Majaj2015').sel(region='V4')
score = it_benchmark(v4_data)
print("\n", score)


cross-validation:   0%|          | 0/10 [00:00<?, ?it/s]

cross-validation:  10%|█         | 1/10 [00:01<00:15,  1.73s/it]

cross-validation:  20%|██        | 2/10 [00:04<00:15,  1.90s/it]

cross-validation:  30%|███       | 3/10 [00:06<00:14,  2.03s/it]

cross-validation:  40%|████      | 4/10 [00:08<00:12,  2.09s/it]

cross-validation:  50%|█████     | 5/10 [00:12<00:12,  2.49s/it]

cross-validation:  60%|██████    | 6/10 [00:14<00:09,  2.40s/it]

cross-validation:  70%|███████   | 7/10 [00:16<00:06,  2.29s/it]

cross-validation:  80%|████████  | 8/10 [00:18<00:04,  2.41s/it]

cross-validation:  90%|█████████ | 9/10 [00:22<00:02,  2.73s/it]

cross-validation: 100%|██████████| 10/10 [00:24<00:00,  2.53s/it]


 Score(_variable=<xarray.Variable (aggregation: 2)>
array([0.494592, 0.003259])
Attributes:
    raw:      <xarray.DataAssembly (split: 10, neuroid: 168)>\narray([[0.1888...,_coords=OrderedDict([('aggregation', <xarray.IndexVariable 'aggregation' (aggregation: 2)>
array(['center', 'error'], dtype='<U6'))]),_name=None,_file_obj=None,_initialized=True)





## Metrics

### Pre-defined metrics

Brain-Score comes with many standard metrics used in the field.
For instance, we can easily use regression methods to compare two assemblies based on how well one predicts the neural firing rates in the other:

In [20]:
import numpy as np

from brainscore.assemblies import NeuroidAssembly
from brainscore.metrics.neural_predictivity import PlsPredictivity

source = target = NeuroidAssembly(np.ones((30, 25)),
                                  coords={'image_id': ('presentation', np.arange(30)),
                                          'object_name': ('presentation', ['a', 'b', 'c'] * 10),
                                          'neuroid_id': ('neuroid', np.arange(25)),
                                          'region': ('neuroid', [None] * 25)},
                                  dims=['presentation', 'neuroid'])
metric = PlsPredictivity()
score = metric(source=source, target=target)
print("\n", score)





cross-validation:   0%|          | 0/10 [00:00<?, ?it/s]

[A

  r = r_num / r_den



cross-validation:  10%|█         | 1/10 [00:00<00:01,  5.47it/s]

[A

  r = r_num / r_den





cross-validation:  20%|██        | 2/10 [00:00<00:01,  5.41it/s]

[A

  r = r_num / r_den



cross-validation:  30%|███       | 3/10 [00:00<00:01,  5.43it/s]

[A

  r = r_num / r_den





cross-validation:  40%|████      | 4/10 [00:00<00:01,  5.29it/s]

[A



  r = r_num / r_den



cross-validation:  50%|█████     | 5/10 [00:00<00:00,  5.29it/s]

[A

  r = r_num / r_den





cross-validation:  60%|██████    | 6/10 [00:01<00:00,  4.53it/s]

[A

  r = r_num / r_den



cross-validation:  70%|███████   | 7/10 [00:01<00:00,  4.80it/s]

[A

  r = r_num / r_den





cross-validation:  80%|████████  | 8/10 [00:01<00:00,  4.70it/s]

[A

  r = r_num / r_den





cross-validation:  90%|█████████ | 9/10 [00:01<00:00,  4.80it/s]

[A

  r = r_num / r_den





cross-validation: 100%|██████████| 10/10 [00:02<00:00,  4.80it/s]

[A




[A

  r = func(a, **kwargs)
  return np.nanmean(a, axis=axis, dtype=dtype)



 Score(_variable=<xarray.Variable (aggregation: 2)>
array([nan, nan])
Attributes:
    raw:      <xarray.DataAssembly (split: 10, neuroid: 25)>\narray([[nan, na...,_coords=OrderedDict([('aggregation', <xarray.IndexVariable 'aggregation' (aggregation: 2)>
array(['center', 'error'], dtype='<U6'))]),_name=None,_file_obj=None,_initialized=True)


  keepdims=keepdims)


In [1]:
# load and standardize data

import brainscore

neural_data = brainscore.get_assembly(name="dicarlo.Majaj2015")
neural_data.load()
neural_data = neural_data.sel(variation=6).multi_groupby(['category_name', 'object_name', 'image_id']) \
    .mean(dim='presentation').squeeze('time_bin').T
# Mostly, we compare neural data with computational models, 
# for deep neural networks see https://github.com/mschrimpf/brain-score-models.
# This repository is agnostic of the comparison system, 
# To show-case the functionality, we are going to compare different regions.
v4_data = neural_data.sel(region='V4')
it_data = neural_data.sel(region='IT')

# We can compare a set of assemblies directly by calling the metric 
# but for more sophisticated comparisons we will usually build a benchmark.

In [7]:
# To compare two systems, we instantiate the metric and call it on the source and target assembly.
# The neural fit also relies on training data to instantiate the regression.
from brainscore.metrics.neural_fit import NeuralFit

neural_fit = NeuralFit()
# For demonstration purposes, we will use the same data for training and testing. 
# (which you should obviously never do in practice)
score = neural_fit(v4_data, it_data, v4_data, it_data)
# This gives us a score, containing the correlations per neuroid. 
# For instance, there is one value per cross-validation split.
print("per neuroid: ", score[:10], "...\n")
# Usually we want to aggregate over neuroids to yield a single scalar value:
aggregate = neural_fit.aggregate(score)
print("median over neuroids:", aggregate)


per neuroid:  <xarray.DataAssembly (neuroid_id: 10)>
array([0.431195, 0.5617  , 0.554427, 0.554573, 0.551613, 0.446308, 0.500028,
       0.562305, 0.551316, 0.472134])
Coordinates:
  * neuroid_id  (neuroid_id) object 'Chabo_L_M_5_9' 'Chabo_L_M_6_9' ... ...

median over neuroids: <xarray.DataAssembly ()>
array(0.55406)


In the same way, we can use RDMs:

In [3]:
# We can easily swap out the specific metric and use e.g. RDMs.
# To compare two systems, we instantiate the metric and call it on the source and target assembly.
from brainscore.metrics.rdm import RDMMetric

rdm = RDMMetric()
score = rdm(v4_data, it_data)
print(score)
# Note how the score is much lower with RDMs due to missing re-mapping.

<xarray.DataAssembly ()>
array(0.289892)


### Custom metrics