This example is primarily concerned with loading raw data. Raw data is often not fully pre-processed: for example, neuroids that we don't trust have not been filtered out, repetitions have not been averaged, and hard stimuli have not been pre-selected.

If you only want to compare data with each other, you are probably better off 
using benchmarks directly (e.g. `from brainscore import benchmarks; benchmarks.load('dicarlo.MajajHong2015')`) or
loading the data through benchmarks (e.g. `from brainscore import benchmarks; benchmarks.load_assembly('dicarlo.MajajHong2015')`).

### Neural assembly

We can load a dataset (called an "assembly") using the `get_assembly` function.
In the following, we load neural data from the DiCarlo lab, published in Majaj, Hong et al. 2015.


In [None]:
import brainscore
neural_data = brainscore.get_assembly(name="dicarlo.MajajHong2015.public")
neural_data

This gives us a `NeuronRecordingAssembly`, a subclass of xarray's `DataArray`.
Behavioral and neural assemblies are always handled with the xarray framework.
An xarray `DataArray` is essentially a multi-dimensional array with annotating coordinates, similar to a pandas table generalized to more than two dimensions.
More info here: http://xarray.pydata.org.
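
To make the idea of "annotating coordinates" concrete, here is a minimal sketch using plain xarray on a made-up array. The values and the `neuroid_id` labels are invented for illustration and are not taken from any Brain-Score assembly; only the dimension and coordinate names mirror the ones used in this notebook.

```python
import numpy as np
import xarray as xr

# A toy 2 x 3 array: 2 recording sites, 3 stimulus presentations.
# All values and labels here are made up for illustration.
responses = xr.DataArray(
    np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]]),
    dims=("neuroid", "presentation"),
    coords={
        # several coordinates can annotate the same dimension
        "neuroid_id": ("neuroid", ["n1", "n2"]),
        "region": ("neuroid", ["V4", "IT"]),
        "image_id": ("presentation", ["a", "b", "c"]),
    },
)

print(responses.dims)
print(responses["region"].values)

# Use a coordinate to pick out all neuroids recorded in IT, then average.
it_mask = (responses["region"] == "IT").values
it_mean = float(responses.isel(neuroid=it_mask).mean())
print(it_mean)
```

Unlike a bare NumPy array, the metadata travels with the data, so a selection by region or image does not require keeping separate lookup tables in sync.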

The neural assembly `dicarlo.MajajHong2015.public`
is structured into the dimensions `neuroid x presentation x time_bin`.
`neuroid` is a MultiIndex containing information about the recording site, such as the animal and the region.
`presentation` refers to a single presentation of a stimulus, with coords annotating
e.g. the image_id and the repetition.
Finally, `time_bin` indicates the time window, in milliseconds, over which neural responses were collected.
This assembly contains averaged spike rates in the 70-170 ms window.

The data is in a raw format, but typically we use a pre-processed version.
We can further process the data e.g. as follows: 

1. average across repetitions,

2. filter neuroids from the IT region,

3. get rid of the scalar time_bin dimension,

4. and reshape into `presentation x neuroid`.

In [None]:
compact_data = neural_data.multi_groupby(['category_name', 'object_name', 'image_id']).mean(dim='presentation')  # (1)
compact_data = compact_data.sel(region='IT')  # (2)
compact_data = compact_data.squeeze('time_bin')  # (3)
compact_data = compact_data.transpose('presentation', 'neuroid')  # (4)
compact_data

The data now contains 3200 images and the responses of 168 neuroids.

In [None]:
print(compact_data.shape)
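
The same four steps can also be tried out on a small synthetic array without downloading any data. The following is only a sketch under stated assumptions: the toy values are random, plain xarray `groupby` over the single `image_id` coordinate stands in for the assembly's `multi_groupby`, and a boolean mask stands in for `.sel(region='IT')` because the toy array has no MultiIndex. After grouping, the resulting dimension is therefore called `image_id` rather than `presentation`.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Toy data: 4 neuroids, 3 images x 2 repetitions, one scalar time bin.
image_ids = np.repeat(["img0", "img1", "img2"], 2)
repetitions = np.tile([0, 1], 3)
toy = xr.DataArray(
    rng.random((4, 6, 1)),
    dims=("neuroid", "presentation", "time_bin"),
    coords={
        "region": ("neuroid", ["V4", "IT", "IT", "V4"]),
        "image_id": ("presentation", image_ids),
        "repetition": ("presentation", repetitions),
    },
)

# (1) average across repetitions of the same image
averaged = toy.groupby("image_id").mean(dim="presentation")
# (2) keep only neuroids from the IT region
it_only = averaged.isel(neuroid=(averaged["region"] == "IT").values)
# (3) drop the size-1 time_bin dimension
squeezed = it_only.squeeze("time_bin")
# (4) put images first, neuroids second
compact = squeezed.transpose("image_id", "neuroid")

print(compact.dims, compact.shape)
```

Each image ends up with one row holding the repetition-averaged response of every IT neuroid, which is the layout most downstream analyses expect.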

Note that the data used for benchmarking is typically already pre-processed.
For instance, the target assembly for the public benchmark `MajajHongITPublicBenchmark`
is the same as our pre-processed version here:

In [None]:
from brainscore.benchmarks.public_benchmarks import MajajHongITPublicBenchmark

benchmark = MajajHongITPublicBenchmark()
benchmark_assembly = benchmark._assembly
print(benchmark_assembly.shape)


### Stimulus Set

You may have noticed the attribute `stimulus_set` in the previous assembly.
A stimulus set contains the stimuli that were shown while the neural responses were recorded.
It includes metadata such as the image_id and the object_name, packaged in a pandas DataFrame.

In [None]:
stimulus_set = neural_data.attrs['stimulus_set']
print(stimulus_set[:3])
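
Because the stimulus set is a pandas DataFrame, the usual DataFrame operations can be used to query it. Here is a minimal sketch on a made-up table; the rows are invented, and only the column names `image_id` and `object_name` are taken from this notebook.

```python
import pandas as pd

# A toy stand-in for a stimulus set; all rows are made up for illustration.
toy_stimulus_set = pd.DataFrame({
    "image_id": ["a", "b", "c", "d"],
    "object_name": ["car", "car", "face", "face"],
})

# Look up the ids of all images showing a given object.
is_face = toy_stimulus_set["object_name"] == "face"
face_ids = toy_stimulus_set.loc[is_face, "image_id"].tolist()

# Count how many images exist per object.
images_per_object = toy_stimulus_set.groupby("object_name")["image_id"].count().to_dict()

print(face_ids)
print(images_per_object)
```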

We can also directly retrieve the local file path of any image using the `get_image` method.

In [None]:
image_path = stimulus_set.get_image(stimulus_set['image_id'][0])
print(image_path)


Images are automatically downloaded to local storage, so they can be loaded and displayed directly.

In [None]:
%matplotlib inline
from matplotlib import pyplot, image
img = image.imread(image_path)
pyplot.imshow(img)
pyplot.show()