# Cosyne 2019 NWB:N Tutorial - Extracellular Electrophysiology

## Introduction
In this tutorial, we will create fake data for a hypothetical extracellular electrophysiology experiment with a freely moving animal and store it in an NWB file. The types of data we will store are:
- Animal position
- LFP
- Spike times
- Trials
- Subject (species, strain, age, etc.) 

## Installing PyNWB
If you are following the tutorial on DANDI Hub, PyNWB is already installed.
If you are participating from your own machine, install PyNWB using pip or conda:
- `pip install pynwb`
- `conda install -c conda-forge pynwb`

## Set up the NWB file
An NWB file represents a single session of an experiment. Each file must have a session description, identifier, and session start time. Create a new `NWBFile` object with those and additional metadata. For all PyNWB constructors, we recommend using keyword arguments.

In [1]:
from pynwb import NWBFile
from datetime import datetime
from dateutil import tz

start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz('US/Pacific'))

nwb = NWBFile(session_description='Mouse exploring an open field',
              identifier='Mouse5_Day3',
              session_start_time=start_time,
              session_id='session_1234',                                # optional
              experimenter='My Name',                                   # optional
              lab='My Lab Name',                                        # optional
              institution='University of My Institution',               # optional
              related_publications='DOI:10.1016/j.neuron.2016.12.011')  # optional
nwb

root pynwb.file.NWBFile at 0x1827505564808
Fields:
  experimenter: ['My Name']
  file_create_date: [datetime.datetime(2020, 2, 24, 19, 1, 8, 460192, tzinfo=tzlocal())]
  identifier: Mouse5_Day3
  institution: University of My Institution
  lab: My Lab Name
  related_publications: ['DOI:10.1016/j.neuron.2016.12.011']
  session_description: Mouse exploring an open field
  session_id: session_1234
  session_start_time: 2018-04-25 02:30:03-07:00
  timestamps_reference_time: 2018-04-25 02:30:03-07:00
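
The time zone is part of `session_start_time` (note the `-07:00` offset in the output above), so it is worth making sure the datetime you pass is timezone-aware. The sketch below uses only the standard library and assumes a fixed UTC-7 offset (Pacific Daylight Time on this date) in place of dateutil's `'US/Pacific'` zone:

```python
from datetime import datetime, timedelta, timezone

# A naive datetime carries no time zone information.
naive = datetime(2018, 4, 25, 2, 30, 3)

# Assumption: attach a fixed UTC-7 offset instead of a named time zone.
pdt = timezone(timedelta(hours=-7))
aware = naive.replace(tzinfo=pdt)

print(naive.tzinfo)  # None
print(aware)         # 2018-04-25 02:30:03-07:00
```

A named zone such as dateutil's `tz.gettz('US/Pacific')` is usually preferable in practice, because it picks the correct daylight-saving offset for the date automatically.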

## Subject information
Create a `Subject` object to store information about the experimental subject, such as age, species, genotype, sex, and a freeform description. Then set `nwb.subject` to the `Subject` object.


<img src="images/subject_diagram.png" width="800">

In [2]:
from pynwb.file import Subject

nwb.subject = Subject(age='P9M',  # 9 months, as an ISO 8601 duration
                      description='mouse 5',
                      species='Mus musculus',
                      sex='M')
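
The PyNWB documentation recommends storing `age` as an ISO 8601 duration, for example `P90D` for 90 days. The function below, `iso8601_age`, is a hypothetical helper (not part of PyNWB) that builds such a string from whole years, months, and days:

```python
def iso8601_age(years=0, months=0, days=0):
    """Hypothetical helper: format an age as an ISO 8601 duration string."""
    parts = [(years, 'Y'), (months, 'M'), (days, 'D')]
    body = ''.join('{}{}'.format(n, unit) for n, unit in parts if n)
    # ISO 8601 durations need at least one component; fall back to zero days.
    return 'P' + (body or '0D')

print(iso8601_age(months=9))  # P9M
print(iso8601_age(days=90))   # P90D
```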

## Position
The `Position` object is a special type of object called a `MultiContainerInterface`. It holds one or more `SpatialSeries` objects. 

<img src="images/spatial_series.png" width="800">

## SpatialSeries
`SpatialSeries` is a subclass of `TimeSeries`. `TimeSeries` is a common base class for measurements sampled over time, and provides fields for data and time (regularly or irregularly sampled).

<img src="images/position.png" width="800">
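
A `TimeSeries` describes time either with explicit `timestamps` (one per sample, for irregular sampling) or with `starting_time` and `rate` (for regular sampling). The NumPy sketch below, using made-up values, shows that a regularly sampled series implies the timestamps `starting_time + i / rate`:

```python
import numpy as np

# Regular sampling: a starting time (s) and a sampling rate (Hz).
starting_time = 0.0
rate = 200.0
n_samples = 5

# Implied timestamp of sample i is starting_time + i / rate.
implied = starting_time + np.arange(n_samples) / rate

# Irregular sampling stores one timestamp per sample explicitly.
explicit = np.array([0.0, 0.005, 0.010, 0.015, 0.020])

print(np.allclose(implied, explicit))  # True
```

Storing `starting_time` and `rate` is more compact, so explicit timestamps are mainly useful when sampling is irregular.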

First, create a `SpatialSeries` object named `'position'` that holds fake (x, y) coordinates at 100 time points:

In [3]:
import numpy as np
from pynwb.behavior import SpatialSeries, Position

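# Fake (x, y) positions at 100 time points; time is the first dimension
position_data = np.array([np.linspace(0, 10, 100),
                          np.linspace(1, 8, 100)]).T
spatial_series_object = SpatialSeries(
    name='position',
    data=position_data,
    reference_frame='unknown',
    timestamps=np.linspace(0, 100, 100) / 200)  # one timestamp (s) per row of data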
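
In NWB the first dimension of `data` is time, so `timestamps` needs one entry per row. A common pitfall is that `np.linspace` returns 50 points unless the count is given. This NumPy-only sketch checks the lengths:

```python
import numpy as np

# Two columns (x, y) sampled at 100 time points; time is the first axis.
position_data = np.array([np.linspace(0, 10, 100),
                          np.linspace(1, 8, 100)]).T

too_few = np.linspace(0, 100) / 200                            # default: 50 points
matching = np.linspace(0, 100, position_data.shape[0]) / 200   # one per row

print(position_data.shape)                      # (100, 2)
print(len(too_few), len(matching))              # 50 100
print(len(matching) == position_data.shape[0])  # True
```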

Then, create a `Position` object which contains the `SpatialSeries` object you created above.

In [4]:
pos_obj = Position(spatial_series=spatial_series_object)

Finally, create a new processing module named `'behavior'` in the NWB file and add the `Position` object to the processing module.

In [5]:
behavior_module = nwb.create_processing_module(
    name='behavior',
    description='processed behavioral data')

behavior_module.add(pos_obj)

Position pynwb.behavior.Position at 0x1825642697992
Fields:
  spatial_series: {
    position <class 'pynwb.behavior.SpatialSeries'>
  }

## Write to file

In [6]:
from pynwb import NWBHDF5IO

with NWBHDF5IO('ecephys_tutorial.nwb', 'w') as io:
    io.write(nwb)

## Trials

<img src="images/trials.png" width="500"> <img src="images/trials_example.png" width="500">

`DynamicTable` objects are used to store tabular metadata throughout NWB, including electrodes and sorted units. They offer flexibility for tabular data by allowing required columns, optional columns, and custom columns.
Trials are stored in a `TimeIntervals` object, which subclasses `DynamicTable`. Here, we are adding a column named `'correct'`, which will be a boolean array.

In [7]:
nwb.add_trial_column(name='correct', description='whether the trial was correct')
nwb.add_trial(start_time=1.0, stop_time=5.0, correct=True)
nwb.add_trial(start_time=6.0, stop_time=10.0, correct=False)
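
Each trial is an interval defined by `start_time` and `stop_time`. As an illustration (plain NumPy, not a PyNWB function), the hypothetical helper `trial_index` below finds which of the two trials above contains each time point. It assumes trials are sorted and non-overlapping, and it treats each trial as the half-open interval [start, stop):

```python
import numpy as np

# Trial boundaries (s), matching the two trials added above.
start = np.array([1.0, 6.0])
stop = np.array([5.0, 10.0])

def trial_index(times, start, stop):
    """Return the index of the trial containing each time, or -1 if none."""
    times = np.asarray(times)
    # Index of the last trial that starts at or before each time.
    idx = np.searchsorted(start, times, side='right') - 1
    # Clip before indexing so idx == -1 does not wrap to the last trial.
    inside = (idx >= 0) & (times < stop[np.clip(idx, 0, None)])
    return np.where(inside, idx, -1)

print(trial_index([0.5, 2.0, 5.5, 7.0], start, stop))  # [-1  0 -1  1]
```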

## Electrodes table
Extracellular electrodes are stored in the `electrodes` table, which is a `DynamicTable`. `electrodes` has several required fields: `x`, `y`, `z`, `imp` (impedance), `location`, `filtering`, and `group` (the electrode group). Here, we also demonstrate how to add an optional column to a table by adding the `'label'` column.

<img src="images/electrodes_table.png" width="300">

In [8]:
nwb.add_electrode_column(name='label', description='label of electrode')
shank_channels = [4, 3]  # number of electrodes on each shank

electrode_counter = 0
device = nwb.create_device(name='implant')
for shankn, nelecs in enumerate(shank_channels):
    electrode_group = nwb.create_electrode_group(
        name='shank{}'.format(shankn),
        description='electrode group for shank {}'.format(shankn),
        device=device,
        location='brain area')
    for ielec in range(nelecs):
        nwb.add_electrode(
            x=5.3, y=1.5, z=8.5, imp=np.nan,
            location='unknown', filtering='unknown',
            group=electrode_group,
            label='shank{}elec{}'.format(shankn, ielec))
        electrode_counter += 1

# A region that references every row of the electrodes table
all_table_region = nwb.create_electrode_table_region(
    region=list(range(electrode_counter)),
    description='all electrodes')
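
Each `add_electrode` call appends one row, so the nested loop produces rows 0 through 6, and the region covers all of them. This pure-Python sketch (no PyNWB calls) mirrors the loop to show which row gets which group and label:

```python
shank_channels = [4, 3]  # electrodes per shank, as in the cell above

rows = []
for shankn, nelecs in enumerate(shank_channels):
    for ielec in range(nelecs):
        # The new row's index is the number of rows added before it.
        rows.append({'group': 'shank{}'.format(shankn),
                     'label': 'shank{}elec{}'.format(shankn, ielec)})

row_indices = list(range(len(rows)))
print(row_indices)  # [0, 1, 2, 3, 4, 5, 6]
print(rows[4])      # {'group': 'shank1', 'label': 'shank1elec0'}
```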

## LFP
`LFP` is another `MultiContainerInterface`. It holds one or more `ElectricalSeries` objects, which are subclasses of `TimeSeries`. Here, we put an `ElectricalSeries` named `'lfp'` in an `LFP` object, in a `ProcessingModule` named `'ecephys'`.
<img src="images/lfp.png" width="800">

In [9]:
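from pynwb.ecephys import ElectricalSeries, LFP

# Fake LFP: 100 time points x 7 electrodes (one column per table row)
lfp_data = np.random.randn(100, 7)
ecephys_module = nwb.create_processing_module(
    name='ecephys',
    description='extracellular electrophysiology data')

lfp_electrical_series = ElectricalSeries(
    name='lfp',
    data=lfp_data,
    electrodes=all_table_region,
    starting_time=0.0,  # seconds, relative to session start
    rate=1000.,         # sampling rate in Hz
    resolution=.001,
    conversion=1.)
ecephys_module.add(LFP(electrical_series=lfp_electrical_series))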
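
A few quantities follow directly from these arguments: the columns of `data` line up with the 7 electrode rows in the region, 100 samples at 1000 Hz span 0.1 s, and multiplying the stored values by `conversion` gives volts. A NumPy sketch, assuming the series starts at 0 s:

```python
import numpy as np

shank_channels = [4, 3]                     # electrodes per shank
n_electrodes = sum(shank_channels)          # rows referenced by the region: 7

lfp_data = np.random.randn(100, n_electrodes)  # (time, electrodes)
rate = 1000.0                               # sampling rate in Hz
conversion = 1.0                            # stored value * conversion = volts

n_samples = lfp_data.shape[0]
duration = n_samples / rate                 # seconds spanned by the series
sample_times = np.arange(n_samples) / rate  # implied timestamps, assuming start at 0 s
lfp_volts = lfp_data * conversion

print(lfp_data.shape[1] == n_electrodes)    # True
print(duration)                             # 0.1
print(sample_times[-1])                     # 0.099
```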

## Spike Times
Spike times are stored in another `DynamicTable` of subtype `Units`. The main `Units` table is at `/units` in the HDF5 file. You can add columns to the `Units` table just like you did for `electrodes`.

In [10]:
for _ in shank_channels:
    n_units = np.random.poisson(lam=5)  # random number of units on this shank
    for _ in range(n_units):
        n_spikes = np.random.poisson(lam=10)
        spike_times = np.abs(np.random.randn(n_spikes))  # fake spike times (s)
        nwb.add_unit(spike_times=spike_times)
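
Because units have different numbers of spikes, the `spike_times` column is a ragged array. In the file it is stored as one flat `spike_times` dataset plus a `spike_times_index` dataset holding the end offset of each unit's spikes. The NumPy sketch below reproduces that layout with made-up values; `unit_spikes` is a hypothetical helper:

```python
import numpy as np

# Spike times for three hypothetical units (ragged: different lengths).
per_unit = [np.array([0.1, 0.5]),
            np.array([0.2]),
            np.array([0.3, 0.7, 0.9])]

# Flat data column plus an index of end offsets, one per unit.
spike_times = np.concatenate(per_unit)
spike_times_index = np.cumsum([len(s) for s in per_unit])

def unit_spikes(i):
    """Return unit i's spike times by slicing the flat array."""
    start = 0 if i == 0 else spike_times_index[i - 1]
    return spike_times[start:spike_times_index[i]]

print(spike_times_index)  # [2 3 6]
print(unit_spikes(2))     # [0.3 0.7 0.9]
```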

## Write and read
Data arrays are read lazily from the file. That means `TimeSeries.data` does not load the entire dataset into memory; instead, it is an `h5py.Dataset` that can be indexed to read data. Index it just like a numpy array to read only a specific section, or use `[:]` to read the entire array.

In [12]:
from pynwb import NWBHDF5IO

with NWBHDF5IO('ecephys_tutorial.nwb', 'w') as io:
    io.write(nwb)

with NWBHDF5IO('ecephys_tutorial.nwb', 'r') as io:
    nwb2 = io.read()

    print(nwb2.processing['ecephys']['LFP'].electrical_series['lfp'].data[:])

[[-3.17687559e-01  1.06933885e+00 -4.08771453e-01 -1.06724034e-01
   5.87710158e-01  2.25953747e+00 -5.04159854e-01]
 [-1.95261753e-01  1.56374651e+00  7.90528097e-02  9.93210229e-01
  -6.04640965e-01 -1.12765390e+00 -1.67279966e+00]
 [-1.84317950e-01  8.65828644e-01  2.24389884e-01 -3.63046239e-01
  -1.49122565e-01 -1.92375357e-01 -1.32347669e-01]
 [-8.44949610e-01  1.81481397e-01 -9.16683279e-01  7.07325154e-02
  -1.47630223e+00  9.19206264e-01  1.04187081e+00]
 [-3.54100562e-02  2.80254289e-01  7.01632385e-01  2.10090599e+00
   1.72031089e+00 -3.39981709e-01  7.96286622e-01]
 [ 8.57053292e-01  2.22558113e+00 -5.51899760e-01 -1.66995000e-01
  -4.65639339e-01  1.01620188e+00  1.42636820e+00]
 [ 3.03997163e-01  1.05783759e+00 -6.25460133e-01  4.32042099e-01
  -2.31971252e+00 -1.22470397e+00  1.12757601e+00]
 [-1.48534248e+00 -9.81686823e-02  9.74332776e-01 -4.52162590e-01
  -1.86399852e-01 -8.68657061e-01  1.02129340e-01]
 [ 1.19039340e+00  1.15019647e+00  1.01500627e+00 -3.89435139e-0
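
The lazy reading that PyNWB relies on comes from h5py. This sketch uses h5py directly, with a throwaway file path chosen for illustration, to show that opening a dataset returns an `h5py.Dataset` and that slicing it reads only the requested block into a NumPy array:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'lazy_demo.h5')  # illustrative path

# Write a (100, 7) array to an HDF5 file.
with h5py.File(path, 'w') as f:
    f.create_dataset('lfp', data=np.random.randn(100, 7))

with h5py.File(path, 'r') as f:
    dset = f['lfp']           # h5py.Dataset: only metadata has been read
    section = dset[:10, :5]   # reads just this block into a numpy array
    everything = dset[:]      # reads the whole dataset
    print(type(dset).__name__, dset.shape)        # Dataset (100, 7)
    print(type(section).__name__, section.shape)  # ndarray (10, 5)
```

The slices are ordinary NumPy arrays, so they remain usable after the file is closed, while `dset` itself does not.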

## Accessing data regions
You can read a subsection of a dataset by slicing it; only the selected region is read from the file.

In [13]:
with NWBHDF5IO('ecephys_tutorial.nwb', 'r') as io:
    nwb2 = io.read()

    print('section of lfp:')
    # first 10 time points of the first 5 channels
    print(nwb2.processing['ecephys']['LFP'].electrical_series['lfp'].data[:10, :5])
    print('')
    print('')
    print('spike times from first unit:')
    print(nwb2.units['spike_times'][0])

section of lfp:
[[-0.31768756  1.06933885 -0.40877145 -0.10672403  0.58771016]
 [-0.19526175  1.56374651  0.07905281  0.99321023 -0.60464097]
 [-0.18431795  0.86582864  0.22438988 -0.36304624 -0.14912257]
 [-0.84494961  0.1814814  -0.91668328  0.07073252 -1.47630223]
 [-0.03541006  0.28025429  0.70163238  2.10090599  1.72031089]
 [ 0.85705329  2.22558113 -0.55189976 -0.166995   -0.46563934]
 [ 0.30399716  1.05783759 -0.62546013  0.4320421  -2.31971252]
 [-1.48534248 -0.09816868  0.97433278 -0.45216259 -0.18639985]
 [ 1.1903934   1.15019647  1.01500627 -0.38943514  2.16404781]
 [ 1.43304683  0.8129997   0.65595608  0.94499753 -0.08953424]]


spike times from first unit:
[1.70096116 0.84163096 1.41842164 1.79395251 0.29065762 0.6197467
 0.44287034 0.08426759 0.05116897 0.77593199 2.07200258 0.83680109
 0.689722  ]
