In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
from neuro_data.movies.configs import DataConfig
from neuro_data.movies.data_schemas import MovieMultiDataset

# Configurations and Groups

Each set of recordings has a group ID, and each group contains one or more scans. Start by finding the group that contains the scan you want to work with.

In [None]:
MovieMultiDataset()

In [None]:
MovieMultiDataset.Member & 'group_id=2'

Once you have found the group for your scan, you need to pick a configuration to apply on top of it. A configuration determines, for example, which brain area and which layer to load. All configurations live in `neuro_data.movies.configs.DataConfig`.

In [None]:
DataConfig()

This table has many subtables that define and implement the actual configuration. A good place to start is `AreaLayer`.

In [None]:
DataConfig.AreaLayer() & 'layer="L2/3" and brain_area="V1"'

To load a config, you need its hash. Let's use natural movies (`stimulus.Clip`) from `V1`, layer `L2/3`.
Note that the config is independent of the `group_id`, so you need to specify both. Not every combination makes sense: for example, you should not try to load `LM` neurons from a recording that only contains `V1`.

In [None]:
# The restriction combines the config hash with the group it should be loaded for.
key = dict(data_hash='ecb7c24fafd19503a2eef756ac4a24a4', group_id=2)
trainsets, train_loaders = DataConfig().load_data(key, tier='train', cuda=True, batch_size=5)

# Dataloaders and Datasets

Both the datasets and the dataloaders are returned as dictionaries. Their entries correspond to the different scans.
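Because both return values are dictionaries, you can list their keys instead of typing a scan name by hand. The snippet below is a minimal sketch that uses a stand-in dictionary in place of the real `trainsets`, so it runs without a database connection. The placeholder value is hypothetical; the real values are dataset objects.

```python
# Stand-in for the dictionary returned by load_data. The real values are
# dataset objects; the string here is a placeholder for illustration only.
trainsets = {
    'group002-18142-6-3-pre0-seg3-spi5-pip1': '<dataset placeholder>',
}

# List the available keys rather than hard-coding one.
scan_keys = list(trainsets)
print(scan_keys)

# Pick the first entry; a group with several scans would have several keys.
first_key = scan_keys[0]
dataset = trainsets[first_key]
```

With the real `trainsets`, the same two lines give you a valid key to index into either dictionary.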

Each dataset has *transforms* that restrict the "columns" (neurons) in a certain way or transform the data. For instance, the example below selects a subsequence of 150 frames, normalizes the data (but not the inputs or responses), subsamples to the right set of neurons, and converts the result to a tensor.


In [None]:
trainsets['group002-18142-6-3-pre0-seg3-spi5-pip1']

This is configured by the corresponding subtable of `DataConfig`, and you should look at its code to understand exactly what it does. Note that the configuration can differ depending on whether the `train`, `test`, or `validation` set is loaded. For instance, subsampling to 150 frames is useful for training but does not make sense for testing.
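To make the idea of tier-dependent transforms concrete, here is a minimal, self-contained sketch. It is not the `neuro_data` implementation: `DataPoint`, `Subsequence`, `Subsample`, `transforms_for`, and `apply_all` are hypothetical names, and plain Python lists stand in for arrays. It only illustrates how a chain of transforms might crop frames for training while leaving test trials at full length.

```python
import random
from collections import namedtuple

# Hypothetical data point: both fields hold one entry per frame.
DataPoint = namedtuple('DataPoint', ['inputs', 'responses'])


class Subsequence:
    """Crop every field to the same random window of `frames` consecutive frames."""

    def __init__(self, frames, rng=None):
        self.frames = frames
        self.rng = rng or random.Random()

    def __call__(self, point):
        start = self.rng.randint(0, len(point.inputs) - self.frames)
        window = slice(start, start + self.frames)
        return DataPoint(*(field[window] for field in point))


class Subsample:
    """Keep only the response columns (neurons) listed in `idx`."""

    def __init__(self, idx):
        self.idx = list(idx)

    def __call__(self, point):
        kept = [[row[i] for i in self.idx] for row in point.responses]
        return point._replace(responses=kept)


def transforms_for(tier, neuron_idx, frames=150):
    """Assumed rule for this sketch: crop in time only when training."""
    chain = [Subsample(neuron_idx)]
    if tier == 'train':
        chain.insert(0, Subsequence(frames, random.Random(0)))
    return chain


def apply_all(point, chain):
    """Apply the transforms in order."""
    for transform in chain:
        point = transform(point)
    return point


# A fake trial with 300 frames and 4 neurons.
trial = DataPoint(inputs=list(range(300)),
                  responses=[[t, t, t, t] for t in range(300)])

train_point = apply_all(trial, transforms_for('train', neuron_idx=[0, 2]))
test_point = apply_all(trial, transforms_for('test', neuron_idx=[0, 2]))
print(len(train_point.inputs), len(train_point.responses[0]))  # 150 2
print(len(test_point.inputs), len(test_point.responses[0]))    # 300 2
```

Both tiers keep the same two neurons, but only the training point is cropped to 150 frames. This is why printing a train dataset and a test dataset for the same scan can show different transform lists.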

In [None]:
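# Same config hash and group as for training; only the tier changes.
# The second return value (the loaders) is not needed here.
key = dict(data_hash='ecb7c24fafd19503a2eef756ac4a24a4', group_id=2)
testsets, _ = DataConfig().load_data(key, tier='test', cuda=True)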

In [None]:
testsets['group002-18142-6-3-pre0-seg3-spi5-pip1']

The dataloader yields the right set of trials for the selected tier, with the dataset's transforms already applied.

In [None]:
# Each batch unpacks into four items; only the movie and the responses are used here.
for movie, _, _, responses in train_loaders['group002-18142-6-3-pre0-seg3-spi5-pip1']:
    print(movie.shape, responses.shape)
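The same loop extends to every scan in the group by iterating over the dictionary. The sketch below assumes only what the loop above shows: each batch unpacks into four items, with the movie first and the responses last. The stand-in loader and its string placeholders are hypothetical and replace the real tensors so that the snippet runs anywhere.

```python
# Stand-in loaders: each scan key maps to an iterable of 4-item batches.
# The strings are placeholders for the real tensors.
train_loaders = {
    'group002-18142-6-3-pre0-seg3-spi5-pip1': [
        ('movie batch 0', None, None, 'responses batch 0'),
        ('movie batch 1', None, None, 'responses batch 1'),
    ],
}

batches_per_scan = {}
for scan_key, loader in train_loaders.items():
    for movie, _, _, responses in loader:
        # With real data you would inspect movie.shape and responses.shape here.
        batches_per_scan[scan_key] = batches_per_scan.get(scan_key, 0) + 1

print(batches_per_scan)
```

Counting batches per scan is a quick sanity check that every loader in the group yields data before you start training.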