# Data Assembler
The data assembler uses a dataloader object underneath to load spikes and features. It provides training and test data as paired features and spikes for model fitting. 
This abstracts away the details of loading the data and pairing the spikes and features recorded in response to each stimulus.\
We can also sub-class the base data assembler class to customize the features, e.g. to use spectrogram features or to combine the features of all layers.
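
The sub-classing pattern can be sketched with a toy example. The classes and the `get_features` hook below are hypothetical stand-ins, not the `auditory_cortex` API; the real base class and the method to override may be named differently, so check the library source before adapting this.

```python
from itertools import chain


class ToyAssembler:
    """Hypothetical stand-in for a base data assembler (not the library class)."""

    def __init__(self, layer_features):
        # layer_features: dict mapping layer id -> list of rows (time x dim)
        self.layer_features = layer_features

    def get_features(self, layer_id):
        # Default behaviour: return the features of a single layer.
        return self.layer_features[layer_id]


class AllLayersAssembler(ToyAssembler):
    """Customized assembler that combines the features of all layers."""

    def get_features(self, layer_id=None):
        # Join the rows of every layer along the feature axis; layer_id is ignored.
        layers = [self.layer_features[k] for k in sorted(self.layer_features)]
        return [list(chain.from_iterable(rows)) for rows in zip(*layers)]


# Two toy layers, each with 3 time bins and 2 feature dimensions.
toy_features = {
    0: [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]],
    1: [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],
}
single = ToyAssembler(toy_features).get_features(1)
combined = AllLayersAssembler(toy_features).get_features()
```

Only the feature-producing method changes in the sub-class; everything else, including the pairing with spikes, would be inherited from the base class.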

In [1]:
from auditory_cortex.utils import set_up_logging
set_up_logging()

from auditory_cortex.neural_data import create_neural_dataset
from auditory_cortex.dnn_feature_extractor import create_feature_extractor
from auditory_cortex.data_assembler import DNNDataAssembler

dataset_name = 'ucdavis'
session_id = 3
neural_dataset = create_neural_dataset(dataset_name, session_id)

model_name = 'whisper_tiny'
feature_extractor = create_feature_extractor(model_name)



    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.
    


/bin/sh: line 1: sox: command not found
  torchaudio.set_audio_backend("sox_io")


INFO:Changing convolution kernels for: whisper_tiny


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [2]:
layer_id = 2
bin_width = 50
mVocs = False

data_assembler = DNNDataAssembler(
    neural_dataset, feature_extractor, layer_id, bin_width=bin_width, mVocs=mVocs,
)


INFO:creating Dataset for timit data.
INFO:Loading data for session at bin_width-50ms.
INFO:Reading features for model: whisper_tiny
INFO:Resamping ANN features at bin-width: 50


In [26]:
features_list, spikes_list = data_assembler.get_training_data()

In [27]:
print(f'Length of features list: {len(features_list)}')
print(f'Length of spikes  list: {len(spikes_list)}')

Length of features list: 451
Length of spikes  list: 451


In [28]:
features_list[0].shape

(48, 384)

In [29]:
spikes_list[0].shape

(41, 9)

In [30]:
data_assembler.channel_ids

[1001, 1002, 201, 202, 2001, 301, 3001, 4001, 4002]

Spikes for all channels (or units in general) are stacked together for ease of handling and modelling, but a channel index (e.g. 4) can be mapped back to the actual channel id, as shown in the next cell. Using the actual ids when interpreting or saving results keeps them unambiguous.

In [23]:
ch = 4
print(f"Unit id at index {ch} is {data_assembler.channel_ids[ch]}")

Unit id at index 4 is 2001
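
As a minimal sketch of saving results by actual id, per-unit results can be keyed by channel id rather than by column index. The channel ids are copied from the output above; the `scores` values are made-up placeholders for illustration, not results from this dataset.

```python
# Channel ids copied from the output of data_assembler.channel_ids above.
channel_ids = [1001, 1002, 201, 202, 2001, 301, 3001, 4001, 4002]

# Hypothetical per-unit scores, one per column of the stacked spikes array.
scores = [0.12, 0.30, 0.05, 0.22, 0.41, 0.09, 0.18, 0.27, 0.33]

# Guard against a silent mismatch between units and scores.
assert len(channel_ids) == len(scores)

# Key the results by actual channel id instead of positional index.
results = dict(zip(channel_ids, scores))

# Index 4 in the stacked array corresponds to channel id 2001.
score_for_index_4 = results[channel_ids[4]]
```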


#### Loading spikes for other sessions
Once a data assembler object has been created, we don't need to create a new object to get training and test data for a different session. We can reuse the same object, which avoids reloading the DNN features. \
We can use the **read_session_spikes** method to get the data pairs for the new session, as shown below. The change of session is visible in the different channel_ids (or unit_ids).

In [8]:
session = 4
neural_data = create_neural_dataset(dataset_name, session)
data_assembler.read_session_spikes(neural_data)

INFO:Loading data for session at bin_width-50ms.


In [9]:
data_assembler.channel_ids

[101, 201, 3001, 3002, 3003, 3004, 4001]

In [10]:
data_assembler.get_session_id()

4
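
The same reuse pattern extends to several sessions in a loop. The helper below is a sketch, not part of the library: `collect_channel_ids` and its `make_dataset` argument are hypothetical names, and it relies only on the `read_session_spikes` method and `channel_ids` attribute used in the cells above.

```python
def collect_channel_ids(assembler, make_dataset, session_ids):
    """Reuse one assembler across sessions and record each session's channel ids.

    assembler    : object with read_session_spikes(dataset) and channel_ids,
                   as used in the cells above.
    make_dataset : callable mapping a session id to a neural dataset.
    session_ids  : iterable of session ids to load, in order.
    """
    ids_per_session = {}
    for session in session_ids:
        # Swap in the spikes of the new session; DNN features are not reloaded.
        assembler.read_session_spikes(make_dataset(session))
        # Copy the list in case the assembler reuses it for the next session.
        ids_per_session[session] = list(assembler.channel_ids)
    return ids_per_session
```

With the objects from this notebook, a call could look like `collect_channel_ids(data_assembler, lambda s: create_neural_dataset(dataset_name, s), [3, 4])`, assuming both sessions are available locally. After the loop the assembler holds the data of the last session in the list.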