# Working with NWB in Python 

In [1]:
import numpy as np
import pandas as pd 
from matplotlib import pyplot as plt
from pynwb import NWBHDF5IO

## Reading our NWB file

To access the data in our NWB file, we must first read the file. This is done in two steps:
- create an `NWBHDF5IO` object that points to our file
- read the file

The first step uses the `NWBHDF5IO` class to open our file, which is stored in HDF5 format. Once we have done this, we can use the `read()` method to return the contents of our NWB file as an `NWBFile` object. For more information on how to read NWB files, please visit the *Reading data from an NWB file* section of the <a href = 'https://pynwb.readthedocs.io/en/latest/tutorials/general/file.html'> NWB Basics Tutorial</a>. For more information on the `NWBHDF5IO` class, please visit the <a href = 'https://pynwb.readthedocs.io/en/latest/pynwb.html#pynwb.NWBHDF5IO'> original documentation</a>.

In [2]:
# first read the file 
io = NWBHDF5IO('000017/sub-Cori/sub-Cori_ses-20161214T120000.nwb', 'r')
nwb_file = io.read()
nwb_file

root pynwb.file.NWBFile at 0x140457060178576
Fields:
  acquisition: {
    lickPiezo <class 'pynwb.base.TimeSeries'>,
    wheel_position <class 'pynwb.base.TimeSeries'>
  }
  devices: {
    0 <class 'pynwb.device.Device'>,
    1 <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    Probe1 <class 'pynwb.ecephys.ElectrodeGroup'>,
    Probe2 <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  experiment_description: Large-scale Neuropixels recordings across brain regions of mice during a head-fixed visual discrimination task. 
  experimenter: ['Nick Steinmetz']
  file_create_date: [datetime.datetime(2019, 11, 26, 13, 54, 42, 972670, tzinfo=tzoffset(None, -28800))]
  identifier: Cori_2016-12-14
  institution: University College London
  intervals: {
    spontaneous <class 'pynwb.epoch.TimeIntervals'>,
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  keywords: <HDF5 dataset "keywords": shape (7,), type "|O">
  lab: The Cara
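
The cell above leaves the file open for the rest of the session. As an alternative, `NWBHDF5IO` can also be used as a context manager so the file is closed automatically. The sketch below assumes that pattern fits your workflow; the helper name `read_nwb_metadata` and the choice of fields are illustrative, not part of PyNWB.

```python
def read_nwb_metadata(path):
    """Return a few session-level metadata fields from an NWB file."""
    # Imported inside the function so defining the helper does not require pynwb yet.
    from pynwb import NWBHDF5IO

    # The context manager closes the underlying HDF5 file on exit,
    # so anything we want to keep is copied out inside the `with` block.
    with NWBHDF5IO(path, "r") as io:
        nwb_file = io.read()
        return {
            "identifier": nwb_file.identifier,
            "institution": nwb_file.institution,
            "lab": nwb_file.lab,
            "experiment_description": nwb_file.experiment_description,
        }


# Example call (requires the file used in this notebook to be present locally):
# metadata = read_nwb_metadata('000017/sub-Cori/sub-Cori_ses-20161214T120000.nwb')
```

Because datasets are read lazily from disk, arrays such as `data[:]` should also be sliced inside the `with` block if you use this pattern.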

In [3]:
# dictionary of all neurodata_type objects in the NWBFile
print(nwb_file.objects)

{'f9d101b7-2424-450b-9826-78367d4d4789': root pynwb.file.NWBFile at 0x140457060178576
Fields:
  acquisition: {
    lickPiezo <class 'pynwb.base.TimeSeries'>,
    wheel_position <class 'pynwb.base.TimeSeries'>
  }
  devices: {
    0 <class 'pynwb.device.Device'>,
    1 <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    Probe1 <class 'pynwb.ecephys.ElectrodeGroup'>,
    Probe2 <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  experiment_description: Large-scale Neuropixels recordings across brain regions of mice during a head-fixed visual discrimination task. 
  experimenter: ['Nick Steinmetz']
  file_create_date: [datetime.datetime(2019, 11, 26, 13, 54, 42, 972670, tzinfo=tzoffset(None, -28800))]
  identifier: Cori_2016-12-14
  institution: University College London
  intervals: {
    spontaneous <class 'pynwb.epoch.TimeIntervals'>,
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  keywords: <HDF5 dataset "keywords

In [4]:
# loop through the objects dictionary
# print the object ID along with the neurodata type
# and the object name
for obj in nwb_file.objects.values():
    print('%s: %s "%s"' % (obj.object_id, obj.neurodata_type, obj.name))

f9d101b7-2424-450b-9826-78367d4d4789: NWBFile "root"
9534de6d-c9a2-4306-ad4c-b60b70d8ae4b: Units "units"
c4e74cad-fecd-4445-a148-d1a3f9f1336a: VectorData "spike_depths"
ea2f8bd7-0248-45cf-b164-5edca411c5eb: VectorIndex "spike_depths_index"
252aa4db-072c-4934-bc9b-b2a248be7502: VectorData "spike_amps"
14378ac2-a755-48df-b930-fb47738ee0ec: VectorIndex "spike_amps_index"
a6a600f7-0018-4411-8d36-873218e1a878: VectorData "waveform_mean"
53d491c8-4abc-4584-a0ff-f142805f5b4f: VectorData "electrode_group"
96c9443d-2a36-4229-b731-9a73dc63a57f: DynamicTableRegion "electrodes"
28e40a56-4578-45a2-8c3c-9bac8fbf6168: VectorIndex "electrodes_index"
3a9a3535-d574-4e0d-b547-f5e93b4c03c5: VectorData "spike_times"
83a61368-49c5-4a34-914b-c5661736491e: VectorIndex "spike_times_index"
654c5fad-188b-493c-896f-b4aa61bb330e: VectorData "sampling_rate"
706e6dca-7742-4da7-9b99-cd72b3806ba0: VectorData "cluster_depths"
c2da8ac8-bc8c-4209-bf5e-01762a9be25c: VectorData "phy_annotations"
06cfe852-7554-40d9-8a71-0c2
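
A long listing like the one above is easier to scan once it is summarized by type. The following is a minimal sketch that counts objects per `neurodata_type`. It runs on a small stand-in dictionary built from the first five entries printed above; with the real file you would pass `nwb_file.objects` instead. The helper name `count_neurodata_types` is hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def count_neurodata_types(objects):
    """Count how many objects of each neurodata type a mapping contains."""
    return Counter(obj.neurodata_type for obj in objects.values())


# Stand-in for `nwb_file.objects`, using the first five entries printed above.
demo_objects = {
    "f9d101b7-2424-450b-9826-78367d4d4789": SimpleNamespace(neurodata_type="NWBFile", name="root"),
    "9534de6d-c9a2-4306-ad4c-b60b70d8ae4b": SimpleNamespace(neurodata_type="Units", name="units"),
    "c4e74cad-fecd-4445-a148-d1a3f9f1336a": SimpleNamespace(neurodata_type="VectorData", name="spike_depths"),
    "ea2f8bd7-0248-45cf-b164-5edca411c5eb": SimpleNamespace(neurodata_type="VectorIndex", name="spike_depths_index"),
    "252aa4db-072c-4934-bc9b-b2a248be7502": SimpleNamespace(neurodata_type="VectorData", name="spike_amps"),
}

type_counts = count_neurodata_types(demo_objects)
for neurodata_type, count in type_counts.most_common():
    print(f"{neurodata_type}: {count}")
```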

## File Hierarchy: Groups, Datasets, and Attributes

The NWB file is composed of various Groups, Datasets, and Attributes. The datasets and corresponding metadata are encapsulated within these Groups. The `fields` attribute returns a dictionary containing the metadata of the Groups of our NWB file. The dictionary `keys` are the various Groups within the file, which we will use to access our datasets.

In [5]:
# nwb_file.fields

In [6]:
# Get the Groups for the nwb file 
nwb_fields = nwb_file.fields
print(nwb_fields.keys())

dict_keys(['acquisition', 'analysis', 'scratch', 'stimulus', 'stimulus_template', 'processing', 'devices', 'electrode_groups', 'imaging_planes', 'icephys_electrodes', 'ogen_sites', 'intervals', 'lab_meta_data', 'session_description', 'identifier', 'session_start_time', 'timestamps_reference_time', 'file_create_date', 'keywords', 'epoch_tags', 'electrodes', 'subject', 'trials', 'units', 'experiment_description', 'lab', 'institution', 'experimenter', 'related_publications'])
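
The key list above appears to include entries that this file likely does not use (for example `imaging_planes` in an electrophysiology recording). A small filter can narrow the list to entries that actually hold something. This is a sketch under assumptions: `populated_fields` is a hypothetical helper, and `demo_fields` is a stand-in dictionary whose empty entries are invented for illustration rather than read from the file.

```python
def populated_fields(fields):
    """Return the names of fields that are set and, if they are containers, non-empty."""
    names = []
    for name, value in fields.items():
        if value is None:
            continue
        # Only test the length of plain containers; any other value counts as populated.
        if isinstance(value, (dict, list, tuple, str)) and len(value) == 0:
            continue
        names.append(name)
    return names


# Stand-in for `nwb_file.fields` (which entries are empty is an assumption).
demo_fields = {
    "acquisition": {"lickPiezo": object(), "wheel_position": object()},
    "analysis": {},
    "imaging_planes": {},
    "lab": "The Carandini and Harris Lab",
    "related_publications": None,
}

print(populated_fields(demo_fields))
```

With the real file, the equivalent call would be `populated_fields(nwb_file.fields)`.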


Each NWB file will have information on where the experiment was conducted, which lab conducted the experiment, and a description of the experiment. This information can be accessed using the `institution`, `lab`, and `experiment_description` attributes of our `nwb_file`, respectively.

In [7]:
# Get Meta-Data from NWB file 
print('The experiment within this NWB file was conducted at {} in the lab of {}. The experiment is detailed as follows: {}'.format(nwb_file.institution, nwb_file.lab, nwb_file.experiment_description))

The experiment within this NWB file was conducted at University College London in the lab of The Carandini and Harris Lab. The experiment is detailed as follows: Large-scale Neuropixels recordings across brain regions of mice during a head-fixed visual discrimination task. 


We can access each group in our `nwb_file` with the following syntax: `nwb_file.group`, where `group` is the name of the group. This is ordinary attribute access on the `nwb_file` object. The `acquisition` group contains datasets of acquisition data. We can look at the `description` field in the metadata to understand what each dataset in the group contains.

In [8]:
# example showing how to return meta data from groups in nwb file 
# 'acquisition' is the first group in our file 
nwb_file.acquisition

{'lickPiezo': lickPiezo pynwb.base.TimeSeries at 0x140457108043024
 Fields:
   comments: no comments
   conversion: 1.0
   data: <HDF5 dataset "data": shape (1314000,), type "<f8">
   description: Voltage values from a thin-film piezo connected to the lick spout, so that values are proportional to deflection of the spout and licks can be detected as peaks of the signal.
   rate: 0.002000031887945625
   resolution: -1.0
   starting_time: 33.65250410481991
   starting_time_unit: seconds
   unit: V,
 'wheel_position': wheel_position pynwb.base.TimeSeries at 0x140457108043152
 Fields:
   comments: The wheel has radius 31 mm and 1440 ticks per revolution, so multiply by 2*pi*r/tpr=0.135 to convert to millimeters. Positive velocity (increasing numbers) correspond to clockwise turns (if looking at the wheel from behind the mouse), i.e. turns that are in the correct direction for stimuli presented to the left. Likewise negative velocity corresponds to right choices.
   conversion: 0.135
   dat
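
The `comments` field of `wheel_position` above gives the formula behind its `conversion` value. As a quick check of that arithmetic, the sketch below recomputes `2*pi*r/tpr` from the stated radius and tick count. The constant and function names are illustrative only.

```python
import math

WHEEL_RADIUS_MM = 31.0        # wheel radius, from the `comments` field
TICKS_PER_REVOLUTION = 1440   # encoder ticks per revolution, from the `comments` field

# One revolution covers the wheel circumference, so each tick covers circumference / ticks.
mm_per_tick = 2 * math.pi * WHEEL_RADIUS_MM / TICKS_PER_REVOLUTION


def ticks_to_mm(ticks):
    """Convert a wheel position in encoder ticks to millimeters."""
    return ticks * mm_per_tick


print(round(mm_per_tick, 3))
print(round(ticks_to_mm(TICKS_PER_REVOLUTION), 2))
```

The first printed value matches the `conversion: 0.135` shown in the output once rounded to three decimals.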

In this file, the acquisition group contains two different datasets, `lickPiezo` and `wheel_position`. To access the actual data array of these datasets, we must first select our dataset of interest from the group. We can then use `data[:]` to return the full data array, or a slice such as `data[:20]` to return only the first 20 values.

In [9]:
# select our dataset of interest 
dataset = 'lickPiezo'
lickPiezo_ds = nwb_file.acquisition[dataset]

# return data array 
lickPiezo_data_array = lickPiezo_ds.data[:20]

print(lickPiezo_data_array)

In [12]:
# access the 'behavior' processing module 
# it holds the processed behavioral data interfaces 
nwb_file.processing['behavior']

behavior pynwb.base.ProcessingModule at 0x140350722931408
Fields:
  data_interfaces: {
    BehavioralEpochs <class 'pynwb.behavior.BehavioralEpochs'>,
    BehavioralEvents <class 'pynwb.behavior.BehavioralEvents'>,
    BehavioralTimeSeries <class 'pynwb.behavior.BehavioralTimeSeries'>,
    PupilTracking <class 'pynwb.behavior.PupilTracking'>
  }
  description: behavior module

The `trials` Group contains data from our experimental trials such as start/stop time, response time, feedback time, etc. You can return the trials data as a dataframe by using the `to_dataframe` method.

In [34]:
# trials table
trials = nwb_file.trials
trials_df = trials.to_dataframe()
trials_df.head()

id,start_time,stop_time,included,go_cue,visual_stimulus_time,visual_stimulus_left_contrast,visual_stimulus_right_contrast,response_time,response_choice,feedback_time,feedback_type,rep_num
0,62.900284,67.423484,True,66.296625,65.269408,1.0,0.0,66.419612,1.0,66.456227,1,1.0
1,68.420838,73.604476,True,72.077117,71.202703,0.0,0.5,72.602206,-1.0,72.640326,1,1.0
2,74.602902,78.006757,True,76.877593,76.05238,1.0,0.5,77.001671,1.0,77.038396,1,1.0
3,79.003653,84.506778,True,81.996875,81.235263,0.0,0.0,83.502065,0.0,83.531699,1,1.0
4,85.501795,88.621336,True,87.462962,86.800952,0.5,1.0,87.617727,1.0,87.628565,-1,1.0
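
Once the trials are in a dataframe, ordinary pandas operations apply. The sketch below rebuilds three columns of the five rows shown above and derives a per-trial delay between the go cue and the response, then keeps only trials with positive feedback. The column name `reaction_time` is an assumption introduced here; with the real data you would apply the same two lines to `trials_df`.

```python
import pandas as pd

# The first five trials, copied from the table above (only the columns used here).
trials_head = pd.DataFrame(
    {
        "go_cue": [66.296625, 72.077117, 76.877593, 81.996875, 87.462962],
        "response_time": [66.419612, 72.602206, 77.001671, 83.502065, 87.617727],
        "feedback_type": [1, 1, 1, 1, -1],
    },
    index=pd.Index([0, 1, 2, 3, 4], name="id"),
)

# Delay from go cue to response, in the same units as the table's time columns.
trials_head["reaction_time"] = trials_head["response_time"] - trials_head["go_cue"]

# Keep only trials that ended with positive feedback (feedback_type == 1).
rewarded = trials_head[trials_head["feedback_type"] == 1]
print(rewarded[["go_cue", "response_time", "reaction_time"]])
```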


The `intervals` group also contains a `trials` dataset and can be used to access the experimental trials similar to what was accomplished in the cell above. 

In [18]:
# Select the group of interest 
intervals = nwb_file.intervals

# Subset the dataset from the group and assign it as a dataframe
interval_trials_df = intervals['trials'].to_dataframe()
interval_trials_df.head()

id,start_time,stop_time,included,go_cue,visual_stimulus_time,visual_stimulus_left_contrast,visual_stimulus_right_contrast,response_time,response_choice,feedback_time,feedback_type,rep_num
0,62.900284,67.423484,True,66.296625,65.269408,1.0,0.0,66.419612,1.0,66.456227,1,1.0
1,68.420838,73.604476,True,72.077117,71.202703,0.0,0.5,72.602206,-1.0,72.640326,1,1.0
2,74.602902,78.006757,True,76.877593,76.05238,1.0,0.5,77.001671,1.0,77.038396,1,1.0
3,79.003653,84.506778,True,81.996875,81.235263,0.0,0.0,83.502065,0.0,83.531699,1,1.0
4,85.501795,88.621336,True,87.462962,86.800952,0.5,1.0,87.617727,1.0,87.628565,-1,1.0


The `description` attribute provides a short description of each column of the dataframe. 

In [22]:
print(intervals['trials']['feedback_type'].description)

Enumerated type. -1 for negative feedback (white noise burst); +1 for positive feedback (water reward delivery).
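
To see every column's description at once, the same lookup can be repeated over the table's `colnames`. This is a minimal sketch: `column_descriptions` is a hypothetical helper, and `DemoTable` is a stand-in that only mimics the two features used here (`colnames` and `table[name]`). The `feedback_type` text is copied from the output above; the `rep_num` text is a placeholder, not the file's real description.

```python
def column_descriptions(table):
    """Map each column name of a table to its description string."""
    return {name: table[name].description for name in table.colnames}


class DemoColumn:
    """Stand-in for a table column that carries a description."""

    def __init__(self, description):
        self.description = description


class DemoTable:
    """Stand-in exposing only `colnames` and `table[name]`."""

    def __init__(self, columns):
        self._columns = columns
        self.colnames = tuple(columns)

    def __getitem__(self, name):
        return self._columns[name]


demo_trials = DemoTable(
    {
        "feedback_type": DemoColumn(
            "Enumerated type. -1 for negative feedback (white noise burst); "
            "+1 for positive feedback (water reward delivery)."
        ),
        "rep_num": DemoColumn("placeholder description (hypothetical)"),
    }
)

for name, description in column_descriptions(demo_trials).items():
    print(f"{name}: {description}")
```

With the real file, the call would be `column_descriptions(intervals['trials'])`.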


In [23]:
# inspect the full intervals group 
nwb_file.intervals

{'spontaneous': spontaneous pynwb.epoch.TimeIntervals at 0x140350724349648
 Fields:
   colnames: ['start_time' 'stop_time']
   columns: (
     start_time <class 'hdmf.common.table.VectorData'>,
     stop_time <class 'hdmf.common.table.VectorData'>
   )
   description: Intervals of sufficient duration when nothing else is going on (no task or stimulus presentation
   id: id <class 'hdmf.common.table.ElementIdentifiers'>,
 'trials': trials pynwb.epoch.TimeIntervals at 0x140350724348176
 Fields:
   colnames: ['start_time' 'stop_time' 'included' 'go_cue' 'visual_stimulus_time'
  'visual_stimulus_left_contrast' 'visual_stimulus_right_contrast'
  'response_time' 'response_choice' 'feedback_time' 'feedback_type'
  'rep_num']
   columns: (
     start_time <class 'hdmf.common.table.VectorData'>,
     stop_time <class 'hdmf.common.table.VectorData'>,
     included <class 'hdmf.common.table.VectorData'>,
     go_cue <class 'hdmf.common.table.VectorData'>,
     visual_stimulus_time <class 'hdmf.co