# Neural datasets
Neural dataset objects provide an interface for integrating neural data into this repository. To add a new neural dataset alongside those already supported, see the documentation [here](./../docs/1_neural_data.md).
This notebook is a good place to start working with the datasets that this repo already supports.

Each dataset is identified by a unique name or identifier. Retrieve a list of dataset names that are available.

In [1]:
from auditory_cortex.neural_data import list_neural_datasets
list_neural_datasets()

['ucsf', 'ucdavis']

There can be multiple recording sessions for each dataset. Each dataset is represented by two kinds of objects:
- **metadata**: common for all the sessions, provides general information about stimuli used e.g. sampling rate, stimulus ids, stimulus durations etc.
- **dataset**: separate object created for every session, provides functionality to access neural spikes for all the stimuli for a specific session.

The examples below create a `metadata` object and a `dataset` object.


In [1]:
from auditory_cortex.neural_data import create_neural_metadata

metadata = create_neural_metadata('ucsf')
metadata.get_all_available_sessions()

numexpr.utils - INFO - Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
numexpr.utils - INFO - NumExpr defaulting to 8 threads.


array(['180413', '180420', '180501', '180502', '180613', '180622',
       '180627', '180717', '180719', '180720', '180724', '180728',
       '180730', '180731', '180807', '180808', '180810', '180814',
       '190604', '190605', '190606', '190703', '190726', '190801',
       '191113', '191115', '191121', '191125', '191206', '191209',
       '191210', '191211', '191219', '200205', '200206', '200207',
       '200212', '200213', '200219', '200313', '200318'], dtype='<U6')

In [66]:
stim_ids = metadata.get_stim_ids()
print(f"{stim_ids.keys()}")

dict_keys(['repeated', 'unique'])


In [29]:
from auditory_cortex.neural_data import create_neural_dataset

dataset_name = 'ucsf'
session_id = '200206'   # can also be an int or a float
dataset = create_neural_dataset(dataset_name, session_id)

auditory_cortex.neural_data.ucsf_data.ucsf_dataset - INFO - NeuralData:  Creating object for session: 200206 ... 
auditory_cortex.neural_data.ucsf_data.ucsf_dataset - INFO - Done.


#### Spikes for unique stimuli

In [7]:
bin_width=50    # in milliseconds
repeated=False  # if True, gives data for repeated trials (test data)
mVocs=False     # if True, gives spikes for monkey vocalizations, otherwise for timit
spikes = dataset.extract_spikes(bin_width=bin_width, repeated=repeated, mVocs=mVocs)

In [None]:
print(f"stim_ids (for unique):\n {spikes.keys()}")

stim_ids: dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 213, 214, 215, 216, 217, 219, 220, 221, 222, 223, 224, 22

In [9]:
stim_id = 1  # example stim id
print(f"Channel ids for stim: {stim_id}")
print(f"{spikes[stim_id].keys()}")

Channel ids for stim: 1
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63])


In [10]:
stim_id = 1     # example stim id
channel_id = 0  # example channel id
print(f"Shape of spikes for stim {stim_id}, channel {channel_id}:")
print(f"{spikes[stim_id][channel_id].shape}")

Shape of spikes for stim 1, channel 0:
(1, 36)
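The second axis of each array is the number of time bins, so the approximate time span of a response follows from the bin count and the bin width. The sketch below is a minimal illustration: `bins_to_duration` is a hypothetical helper (not part of `auditory_cortex`), the shape is copied from the output above, and it assumes the bins are contiguous and non-overlapping. How the last partial bin is handled is not stated here.

```python
def bins_to_duration(n_bins: int, bin_width_ms: float) -> float:
    """Approximate time span in seconds covered by n_bins contiguous bins."""
    return n_bins * bin_width_ms / 1000.0

# Shape printed above for stim 1, channel 0; read here as (n_trials, n_bins).
shape = (1, 36)
n_trials, n_bins = shape

duration_s = bins_to_duration(n_bins, bin_width_ms=50)
print(f"{n_trials} trial(s), {n_bins} bins, about {duration_s:.2f} s")
```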


#### Spikes for repeated stimuli

In [30]:
bin_width=50    # in milliseconds
repeated=True  # if True, gives data for repeated trials (test data)
mVocs=False     # if True, gives spikes for monkey vocalizations, otherwise for timit
spikes = dataset.extract_spikes(bin_width=bin_width, repeated=repeated, mVocs=mVocs)

In [5]:
print(f"stim_ids (for repeated):\n {spikes.keys()}")

stim_ids (for repeated):
 dict_keys([12, 13, 32, 43, 56, 163, 212, 218, 287, 308])
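The ids printed for repeated stimuli appear to be exactly the ones missing from the unique list shown earlier (12, 13, 32, ...). A quick way to confirm that the two splits do not overlap is sketched below. `overlapping_ids` is a hypothetical helper, and `unique_excerpt` is only the first part of the unique ids printed above, copied by hand.

```python
# Ids copied from the outputs above.
repeated_ids = [12, 13, 32, 43, 56, 163, 212, 218, 287, 308]
unique_excerpt = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18, 19,
                  20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35]

def overlapping_ids(unique_ids, repeated_ids):
    """Return the sorted ids that appear in both splits (ideally none)."""
    return sorted(set(unique_ids) & set(repeated_ids))

overlap = overlapping_ids(unique_excerpt, repeated_ids)

# Ids in 1..35 that are absent from the unique excerpt.
gaps = sorted(set(range(1, 36)) - set(unique_excerpt))

print(f"overlap: {overlap}")
print(f"gaps in unique ids: {gaps}")
print(f"all gaps are repeated stimuli: {set(gaps) <= set(repeated_ids)}")
```

With the real objects, the same helper could be called on the keys of the two `spikes` dictionaries returned for `repeated=False` and `repeated=True`.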


In [31]:
stim_id = 12  # example stim id
print(f"Channel ids for stim: {stim_id}")
print(f"{spikes[stim_id].keys()}")

Channel ids for stim: 12
dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63])


In [32]:
stim_id = 12     # example stim id
channel_id = 3  # example channel id
print(f"Shape of spikes for stim {stim_id}, channel {channel_id}:")
print(f"{spikes[stim_id][channel_id].shape}")

Shape of spikes for stim 12, channel 3:
(11, 27)
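For analyses across channels it can be convenient to stack the per-channel arrays of one stimulus into a single array. The sketch below uses synthetic Poisson counts as a stand-in for `spikes[stim_id]`. `stack_channels` is a hypothetical helper, and it assumes every channel has the same `(n_trials, n_bins)` shape for a given stimulus.

```python
import numpy as np

def stack_channels(stim_spikes):
    """Stack {channel_id: (n_trials, n_bins)} into (n_channels, n_trials, n_bins)."""
    channel_ids = sorted(stim_spikes)
    stacked = np.stack([stim_spikes[ch] for ch in channel_ids], axis=0)
    return channel_ids, stacked

# Synthetic stand-in for spikes[stim_id]: 64 channels, 11 trials, 27 bins.
rng = np.random.default_rng(0)
fake_stim_spikes = {
    ch: rng.poisson(1.0, size=(11, 27)).astype(np.int32) for ch in range(64)
}

channel_ids, stacked = stack_channels(fake_stim_spikes)
print(stacked.shape)
```

Sorting the channel ids keeps the first axis in a stable order, so that row `i` of the stacked array corresponds to `channel_ids[i]`.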


In [33]:
spikes[stim_id][channel_id][0]

array([0, 0, 0, 1, 3, 1, 4, 2, 3, 0, 1, 4, 2, 2, 0, 0, 0, 1, 0, 1, 2, 1,
       1, 0, 1, 1, 1], dtype=int32)
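Because each entry is a spike count within one bin, dividing by the bin width in seconds gives a firing rate. The sketch below applies this to the trial printed above (copied by hand), assuming the 50 ms bin width used in the `extract_spikes` call.

```python
import numpy as np

# Trial 0 for stim 12, channel 3, copied from the output above.
trial = np.array([0, 0, 0, 1, 3, 1, 4, 2, 3, 0, 1, 4, 2, 2, 0, 0, 0, 1, 0, 1, 2, 1,
                  1, 0, 1, 1, 1], dtype=np.int32)
bin_width_ms = 50  # assumed to match the bin_width passed to extract_spikes

bin_width_s = bin_width_ms / 1000.0
rate_hz = trial / bin_width_s                          # per-bin firing rate
mean_rate_hz = trial.sum() / (trial.size * bin_width_s)  # rate over the whole trial

print(f"total spikes: {trial.sum()}, bins: {trial.size}")
print(f"mean rate: {mean_rate_hz:.1f} Hz, peak bin rate: {rate_hz.max():.1f} Hz")
```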

### ucdavis 

In [9]:
from auditory_cortex.neural_data import create_neural_metadata

dataset_name = 'ucdavis'
metadata = create_neural_metadata(dataset_name)
metadata.get_all_available_sessions()

[0, 1, 2, 3, 4, 5, 6, 7, 8]

Some sessions were recorded with 3 repeats of the test-set stimuli and others with 12 repeats. To get only the sessions with a specific number of repeats, pass *num_repeats*:

In [10]:
num_repeats = 12
metadata.get_all_available_sessions(num_repeats)

[3, 4, 5, 6, 7, 8]
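The remaining sessions can be derived from the two lists above with a set difference. This sketch copies the printed outputs by hand and assumes that every session has either 3 or 12 repeats, as described above.

```python
# Session lists copied from the two outputs above.
all_sessions = [0, 1, 2, 3, 4, 5, 6, 7, 8]
sessions_12_repeats = [3, 4, 5, 6, 7, 8]

# Assumption: every session has either 3 or 12 repeats.
sessions_3_repeats = sorted(set(all_sessions) - set(sessions_12_repeats))
print(sessions_3_repeats)
```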

Session IDs in this case are indices assigned to each session for ease of use. An additional method returns the full name of a recording session, which contains the recording date and monkey id.

The full name may be needed to look up further details about a session when interpreting results, but session IDs suffice for running code and computing results.

In [18]:
sess_id = 0 # example session id
metadata.full_session_name(sess_id)

'relayz_2024-10-28b_boilermaker.mat'

In [19]:
sess_id = 0 # example session id
metadata.num_repeats_for_sess(sess_id)

3

Each stimulus presented to the monkeys has a unique identifier. The metadata object also provides the lists of stimuli that were presented once (unique) or multiple times (repeated).

Note: These stimulus ids are valid ONLY for 12-repeat experiments. For 3-repeat experiments we rely on the neural recording files, so the metadata object MUST NOT be used for stimulus ids of 3-repeat experiments.

In [8]:
timit_stim_ids = metadata.get_stim_ids(mVocs=False)
print(f"For TIMIT:")
for stim_type in timit_stim_ids:
    print(f"\t: {stim_type}, number of stimuli: {len(timit_stim_ids[stim_type])}")

For TIMIT:
	: repeated, number of stimuli: 46
	: unique, number of stimuli: 451


In [9]:
mVocs = True  # if True, gives stimulus ids for monkey vocalizations, otherwise for timit
mVocs_stim_ids = metadata.get_stim_ids(mVocs=mVocs)
print(f"For mVocs:")
for stim_type in mVocs_stim_ids:
    print(f"\t: {stim_type}, number of stimuli: {len(mVocs_stim_ids[stim_type])}")

For mVocs:
	: repeated, number of stimuli: 153
	: unique, number of stimuli: 1415


In [4]:
metadata.total_stimuli_duration(mVocs=False)

{'unique': 762.4506249999997, 'repeated': 84.8345}

In [3]:
metadata.total_stimuli_duration(mVocs=True)

{'unique': 900.2547708333341, 'repeated': 100.06502083333329}
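To put these totals in perspective, the sketch below converts them to minutes and computes the share of each split. The values are copied from the two outputs above. Treating them as seconds is an assumption, and `summarize` is a hypothetical helper.

```python
# Values copied from the outputs above; the unit (seconds) is an assumption.
timit_s = {'unique': 762.4506249999997, 'repeated': 84.8345}
mvocs_s = {'unique': 900.2547708333341, 'repeated': 100.06502083333329}

def summarize(durations_s):
    """Return {stim_type: (minutes, share of total)} for a duration dict."""
    total = sum(durations_s.values())
    return {k: (v / 60.0, v / total) for k, v in durations_s.items()}

for name, durations in [('timit', timit_s), ('mVocs', mvocs_s)]:
    for stim_type, (minutes, share) in summarize(durations).items():
        print(f"{name:>5} {stim_type:>8}: {minutes:6.2f} min ({share:.1%} of total)")
```

Under that assumption, the repeated (test) stimuli make up roughly 10% of the total duration for both stimulus sets.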

Stimulus ids can also be obtained from the neural dataset object. I initially found that the stimulus ids in the metadata files did not match those in the dataset objects.

**06-23-25**: I found the *indUsed* field in the metadata files to be accurate and to match the experimental setup in the neural dataset objects.
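A consistency check of this kind can be sketched as a comparison of two id dictionaries. `compare_stim_ids` is a hypothetical helper, not part of `auditory_cortex`, and the ids below are made-up placeholders that only illustrate a mismatch.

```python
import numpy as np

def compare_stim_ids(meta_ids, data_ids):
    """Report, per stimulus type, the ids present in only one of the two sources."""
    report = {}
    for key in sorted(set(meta_ids) | set(data_ids)):
        a = set(np.asarray(meta_ids.get(key, [])).tolist())
        b = set(np.asarray(data_ids.get(key, [])).tolist())
        report[key] = {
            "only_in_metadata": sorted(a - b),
            "only_in_dataset": sorted(b - a),
        }
    return report

# Placeholder ids (not real stimulus names) with one deliberate mismatch.
meta_ids = {"unique": np.array(["a.wfm", "b.wfm", "c.wfm"]), "repeated": np.array(["d.wfm"])}
data_ids = {"unique": np.array(["a.wfm", "b.wfm"]), "repeated": np.array(["d.wfm"])}

report = compare_stim_ids(meta_ids, data_ids)
for stim_type, diff in report.items():
    print(stim_type, diff)
```

Once both objects exist for a 12-repeat session, the same helper could presumably be called on `metadata.get_stim_ids(mVocs=False)` and `dataset.get_stim_ids(mVocs=False)`, assuming both return dictionaries keyed by `'unique'` and `'repeated'`.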

In [12]:
from auditory_cortex.neural_data import create_neural_dataset
dataset_name = 'ucdavis'
session_id = 3
dataset = create_neural_dataset(dataset_name, session_id)

In [59]:
print("# of stimuli: ")
for mVocs in [False, True]:
    stim_type = 'mVocs' if mVocs else 'timit'   # set inside the loop so the label tracks mVocs
    stim_ids = dataset.get_stim_ids(mVocs=mVocs)
    print(f"For: {stim_type}")
    for typee in stim_ids:
        print(f"\t{typee}: {stim_ids[typee].shape}")

# of stimuli: 
For: timit
	unique: (451,)
	repeated: (46,)
For: mVocs
	unique: (1415,)
	repeated: (153,)


In [13]:
bin_width=50    # in milliseconds
repeated=True  # if True, gives data for repeated trials (test data)
mVocs=False     # if True, gives spikes for monkey vocalizations, otherwise for timit
spikes = dataset.extract_spikes(bin_width=bin_width, repeated=repeated, mVocs=mVocs)

In [15]:
stim_ids = list(spikes.keys())
print(f"stim_ids (for repeated):\n {stim_ids}")

stim_ids (for repeated):
 ['103-fmjf0_si1254.wfm', '120-frew0_si1030.wfm', '124-fsah0_si1874.wfm', '13-fcaj0_si1804.wfm', '138-ftaj0_si1329.wfm', '142-ftmg0_si2162.wfm', '145-fvkb0_si1789.wfm', '148-mabw0_si2294.wfm', '151-maeo0_si1956.wfm', '153-majc0_si2095.wfm', '16-fcft0_si1808.wfm', '164-mbcg0_si486.wfm', '174-mbom0_si1644.wfm', '183-mccs0_si2099.wfm', '186-mcem0_si1398.wfm', '187-mchh0_si1634.wfm', '189-mcmb0_si1898.wfm', '193-mctm0_si720.wfm', '195-mctw0_si2003.wfm', '206-mdhs0_si1530.wfm', '240-meal0_si2177.wfm', '263-mges0_si1481.wfm', '278-mjar0_si2247.wfm', '292-mjeb0_si656.wfm', '308-mjmm0_si625.wfm', '315-mjrh0_si1125.wfm', '340-mklt0_si583.wfm', '345-mljc0_si1855.wfm', '360-mmdm1_si2043.wfm', '362-mmds0_si1973.wfm', '373-mmrp0_si2034.wfm', '387-mpgh0_si924.wfm', '394-mrab0_si1224.wfm', '396-mram0_si1905.wfm', '4-fawf0_si1000.wfm', '401-mrdm0_si1595.wfm', '402-mrds0_si1167.wfm', '404-mreh1_si2229.wfm', '41-fdrw0_si1423.wfm', '412-mrgg0_si1199.wfm', '440-mrxb0_si1585.wfm', 
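Unlike the integer ids of the ucsf dataset, these ids are file-name strings, and the printed list is in lexicographic order ('103', '120', '124', '13', ...). The sketch below parses a few ids copied from the output above and sorts them numerically. `parse_stim_id` is a hypothetical helper. Reading the fields as a numeric index, a TIMIT speaker code, and a TIMIT sentence code is an interpretation of the file names, not something documented here.

```python
import re

# Assumed layout: <index>-<speaker>_<sentence>.wfm
STIM_PATTERN = re.compile(r"^(\d+)-([a-z0-9]+)_([a-z]+\d+)\.wfm$")

def parse_stim_id(stim_id):
    """Split a stimulus file name into (index, speaker, sentence)."""
    match = STIM_PATTERN.match(stim_id)
    if match is None:
        raise ValueError(f"Unexpected stimulus id format: {stim_id!r}")
    index, speaker, sentence = match.groups()
    return int(index), speaker, sentence

# A few ids copied from the output above.
example_ids = ['103-fmjf0_si1254.wfm', '120-frew0_si1030.wfm',
               '13-fcaj0_si1804.wfm', '4-fawf0_si1000.wfm']

# Sort by the numeric prefix instead of lexicographically.
sorted_ids = sorted(example_ids, key=lambda s: parse_stim_id(s)[0])
print(sorted_ids)
print(parse_stim_id(example_ids[0]))
```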

In [19]:
stim_id = stim_ids[0]  # example stim id
unit_ids = list(spikes[stim_id].keys())
print(f"unit_ids ids for stim: '{stim_id}' \n {unit_ids}")

unit_ids ids for stim: '103-fmjf0_si1254.wfm' 
 [1001, 1002, 201, 202, 2001, 301, 3001, 4001, 4002]


In [21]:
stim_id = stim_ids[0]  # example stim id
unit_id = unit_ids[0]  # example unit id
print(f"Shape of spikes for stim '{stim_id}', unit {unit_id}:")
print(f"{spikes[stim_id][unit_id].shape}")

Shape of spikes for stim '103-fmjf0_si1254.wfm', unit 1001:
(12, 41)


In [24]:
spikes[stim_id][unit_id][3]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
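For sparse responses like this one, it can be easier to look at which bins contain spikes. The sketch below copies the trial printed above and converts the indices of the non-empty bins to bin start times. It assumes that bin `i` covers the interval from `i * bin_width` to `(i + 1) * bin_width` relative to the start of the extracted window.

```python
import numpy as np

# Trial 3 for the example stimulus and unit, copied from the output above.
trial = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
bin_width_ms = 50  # assumed to match the bin_width passed to extract_spikes

active_bins = np.flatnonzero(trial)         # indices of bins with at least one spike
bin_start_ms = active_bins * bin_width_ms   # start time of each such bin

for b, t, n in zip(active_bins, bin_start_ms, trial[active_bins]):
    print(f"bin {b}: starts at {t} ms, {n} spike(s)")
```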