# Chapter 2:
## This chapter demonstrates how to work with data that is in NWB format and in the DataJoint pipeline.

The first part of the chapter walks through retrieving data from the NWB file created in Chapter 1.  The second part walks through retrieving the same data from the DataJoint pipeline.

Important: Before continuing, update ImportsAndTableDefinitions.py with your DataJoint credentials. 

In [1]:
#Import libraries
import numpy as np
import os
import pynwb #NWB API
import pandas as pd
import h5py #Provides methods to supplement NWB
import datajoint as dj #DataJoint API
import csv 
from datetime import datetime
from dateutil import tz
from pynwb import NWBHDF5IO
from nwbwidgets import nwb2widget #Displays file contents in Jupyter
from matplotlib import pylab as plt #Needed for ERD diagrams

#Import our classes and functions
from ImportsAndTableDefinitions import *

Connecting marikelreimer@tutorial-db.datajoint.io:3306


# Metadata setup

So that we can access the table definitions, they are saved in a Python script called ImportsAndTableDefinitions.py, which resides in the root directory. This script should also be updated with your DataJoint credentials so that you can connect to DataJoint's tutorial server. To promote standardization across a large group of people, it is advisable to share these definitions through Git/GitHub. Having a single source of record greatly streamlines updates, especially with multiple users, because each user can run git pull to update their local definitions.

As in Chapter 1, we store variables in a Jupyter notebook.  It is better practice to store data in files, which will be covered in Chapters 3 and 4.

In [2]:
#Update these variables as needed:
subject_id = 'Mouse_5025'
session_id = 'Session_22'
experiment_path = 'C:/Users/meowm/OneDrive/DataWarehouse/Experimenter1' #Update with your path

In [3]:
#These variables don't need to be updated

#Construct path to data directory
data_directory = experiment_path + '/' + subject_id + "/" + session_id + "/"

#Uniquely identify NWB file
identifier = subject_id + "_" + session_id #This uniquely identifies a file

#Create a unique name ending in '.nwb'
nwbfile_name = identifier + '.nwb'

#Navigate to data directory
os.chdir(data_directory)
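
The path above is assembled by string concatenation. As an alternative sketch (not the notebook's own method), the standard-library `pathlib` module can build the same directory and file name without hand-placed separators. The values are copied from the cells above; `PurePosixPath` is used here only so that the result prints with forward slashes on any operating system, and the resulting path has no trailing slash.

```python
from pathlib import PurePosixPath

subject_id = 'Mouse_5025'
session_id = 'Session_22'
experiment_path = 'C:/Users/meowm/OneDrive/DataWarehouse/Experimenter1'

# Join path components; pathlib inserts the separators
data_directory = PurePosixPath(experiment_path) / subject_id / session_id

# Same identifier and file name as in the cell above
identifier = f"{subject_id}_{session_id}"
nwbfile_name = identifier + '.nwb'

print(data_directory)
print(data_directory / nwbfile_name)
```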

## Part 1:  Reading data from NWB files

To read an NWB file, we use the NWBHDF5IO class with mode 'r' to indicate read mode.  Unlike the example in the previous chapter, we do not use a 'with' statement here.  This means the data from the file remain available in all later cells of the notebook; however, we must explicitly close the file when finished.

In [4]:
#Read the file
io = NWBHDF5IO(nwbfile_name, mode='r')
nwbfile = io.read()
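
The trade-off between the two styles can be sketched without an NWB file. In the example below, `DemoIO` is a hypothetical stand-in (not part of PyNWB) that mimics only a `read()` method, a `close()` method, and context-manager behavior. It shows that a `with` block closes the resource automatically, while the manual style keeps it open until `close()` is called; wrapping the work in `try`/`finally` is one way to make sure that call is not skipped if an error occurs.

```python
class DemoIO:
    """Hypothetical stand-in for a file reader with read() and close()."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def read(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return {"identifier": self.name}

    def close(self):
        self.closed = True

    # Context-manager protocol: close automatically on exit
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions


# Style 1: 'with' closes the resource at the end of the block
with DemoIO("Mouse_5025_Session_22") as io_with:
    contents = io_with.read()
print(io_with.closed)

# Style 2: manual open; the resource stays open until close() is called
io_manual = DemoIO("Mouse_5025_Session_22")
try:
    contents = io_manual.read()
    print(io_manual.closed)
finally:
    io_manual.close()
print(io_manual.closed)
```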

The NWB API lets us retrieve data from our subject table (from the 'General' group) like so:   

In [5]:
subject = nwbfile.subject
subject

subject pynwb.file.Subject at 0x1932907376328
Fields:
  date_of_birth: 2017-04-03 11:00:00-04:00
  description: Fuzzy
  genotype: B6
  sex: F
  species: Mus musculus
  subject_id: Mouse_5025
  weight: 25g

These operations are substantially streamlined compared to the equivalent operations done with visit() and visititems() from the h5py library.

The '.' used to specify a path will be familiar to MATLAB users, as it is the same syntax used to access fields of structure arrays (structs), which are analogous to the group organization in HDF5. MATLAB's v7.3 MAT-files are themselves HDF5-based.

As before, we can retrieve the subject ID like so:

In [6]:
subject.subject_id

'Mouse_5025'

Timeseries data works similarly, with the caveat that we must know the name of the module and timeseries before we can retrieve data.  We begin by displaying the modules in the processing group.

In [7]:
#Display the modules in the Processing group
nwbfile.processing

{'behavior': behavior pynwb.base.ProcessingModule at 0x1932907376584
 Fields:
   data_interfaces: {
     Session_22 <class 'pynwb.behavior.SpatialSeries'>
   }
   description: Spatial series containing coordinates representating the location of a mouse}

To retrieve the name of the module, we use the list() function, which returns the module names as a list.

In [8]:
module_name = list(nwbfile.processing)

#Take the first (and only) name from the list, giving a plain string
module_name = module_name[0]
print(module_name)

behavior
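
Calling `list()` on a dictionary-like object returns its keys, and indexing with `[0]` takes the first one. The sketch below uses a plain Python dict as a stand-in for `nwbfile.processing` (an assumption made so the example runs without the file) and also shows `next(iter(...))`, an equivalent idiom that avoids building the intermediate list.

```python
# Plain dict standing in for nwbfile.processing (assumption for illustration)
processing = {'behavior': 'ProcessingModule placeholder'}

# list() on a dict yields its keys
module_names = list(processing)
print(module_names)

# Indexing selects the first key as a plain string
module_name = module_names[0]
print(module_name)

# Equivalent: take the first key without building a list
first_name = next(iter(processing))
print(first_name)
```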


The module name gives us access to its data interfaces, which allows us to retrieve the name of our timeseries.

In [9]:
module = nwbfile.processing[module_name]

#This displays a detailed description of the data interfaces in the processing module
print(module.data_interfaces)

#This retrieves the name of the timeseries from the processing module
timeseries_name = list(module.data_interfaces)

#Take the first (and only) name from the list, giving a plain string
timeseries_name = timeseries_name[0]
print(timeseries_name)

{'Session_22': Session_22 pynwb.behavior.SpatialSeries at 0x1932907376072
Fields:
  comments: Will need to correct camera jitter
  conversion: 1.0
  data: <HDF5 dataset "data": shape (17768, 2), type "<f8">
  description: The position of a mouse in an arena is transformed into X,Y coordinates
  rate: 30.0
  reference_frame: Zero refers to the bottom left corner of the rig, when viewed from above
  resolution: -1.0
  starting_time: 0.0
  starting_time_unit: seconds
  unit: meters
}
Session_22


The names of the containers are used to create a path to the data.  To reach our data we must access the Processing group, then the behavior module, then the timeseries, and ultimately the contents of the data stored in it.  

In [10]:
#Retrieve coordinate data
timeseries_data = nwbfile.processing[module_name][timeseries_name].data[:]
print(timeseries_data)

[[ 90.  72.]
 [ 91.  67.]
 [ 89.  70.]
 ...
 [316. 192.]
 [314. 194.]
 [312. 195.]]
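
The SpatialSeries displayed earlier stores a sampling `rate` and a `starting_time` rather than one timestamp per sample, so the time of each row can be derived as `starting_time + index / rate`. The sketch below applies that formula using the values displayed in the field listing (rate 30.0, starting time 0.0, 17768 samples); the variable names are illustrative.

```python
rate = 30.0           # samples per second, from the SpatialSeries fields
starting_time = 0.0   # seconds
n_samples = 17768     # number of rows in the data array

# Time of each sample in seconds
timestamps = [starting_time + i / rate for i in range(n_samples)]

# Total recording duration implied by the sample count and rate
duration = n_samples / rate

print(timestamps[:3])
print(round(duration, 2))
```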


We are now done working with the file so we close it.

In [11]:
io.close()

## Part 2:  Working with data from DataJoint

We begin by instantiating the tables we defined in Chapter 1:

In [12]:
#Before we can use them, the tables from chapter 1 must be instantiated:
mouse = Mouse()
session = Session()
position = Position()
positionStatistics = PositionStatistics()

DataJoint's fetch() method can return data as dictionaries, a Python structure that stores key-value pairs.  The following call returns a list containing one dictionary for each row in the Mouse table.

In [13]:
our_data = mouse.fetch(as_dict=True)
our_data

[{'subject_id': 'Mouse_5025',
  'date_of_birth': datetime.datetime(2017, 4, 3, 11, 0),
  'genotype': 'B6',
  'sex': 'F',
  'species': 'Mus musculus',
  'subject_description': 'Fuzzy',
  'weight': '25g'}]
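
Because the result is a list with one dictionary per row, individual values are reached by indexing the list and then the key. The sketch below rebuilds the output shown above as a literal, so it runs without a database connection.

```python
from datetime import datetime

# Literal copy of the fetch output shown above
our_data = [{'subject_id': 'Mouse_5025',
             'date_of_birth': datetime(2017, 4, 3, 11, 0),
             'genotype': 'B6',
             'sex': 'F',
             'species': 'Mus musculus',
             'subject_description': 'Fuzzy',
             'weight': '25g'}]

# First row, then a single attribute
first_row = our_data[0]
print(first_row['subject_id'])

# One attribute across all rows
genotypes = [row['genotype'] for row in our_data]
print(genotypes)

# Datetime values support the usual datetime attributes
print(first_row['date_of_birth'].year)
```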

To retrieve a complete dataset, we use the join operator * to combine our tables and store the result in a variable called dataset.

In [14]:
dataset = mouse * session * position * positionStatistics
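
Conceptually, a join pairs up rows from two tables whose shared attributes have equal values and merges each pair into one wider row. The function below is a pure-Python sketch of that idea over lists of dictionaries; `natural_join` is a hypothetical helper written for illustration, not DataJoint's implementation. The sample rows use a few attributes that appear in this chapter, but which attribute lives in which table is assumed, and `Mouse_0000` is a made-up subject included only to show a row that does not match.

```python
def natural_join(left, right):
    """Merge rows whose shared keys hold equal values (illustrative only)."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


# Sample rows; the assignment of attributes to tables is assumed here
mouse_rows = [{'subject_id': 'Mouse_5025', 'genotype': 'B6'}]
session_rows = [
    {'subject_id': 'Mouse_5025', 'experimenter': 'Experimenter 1'},
    # Made-up subject with no matching mouse row; it is excluded by the join
    {'subject_id': 'Mouse_0000', 'experimenter': 'Experimenter 1'},
]

result = natural_join(mouse_rows, session_rows)
print(result)
```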

With as_dict=True, DataJoint's fetch() method returns each row of the joined dataset as a dictionary, so the same command can be used to programmatically access data from one or many sessions.

In [15]:
our_data = dataset.fetch(as_dict=True)
our_data

[{'subject_id': 'Mouse_5025',
  'name': 'Session_22',
  'date_of_birth': datetime.datetime(2017, 4, 3, 11, 0),
  'genotype': 'B6',
  'sex': 'F',
  'species': 'Mus musculus',
  'subject_description': 'Fuzzy',
  'weight': '25g',
  'comments': 'Will need to correct camera jitter',
  'description': 'The position of a mouse in an arena is transformed into X,Y coordinates',
  'experiment_description': 'Demonstrate creating a single NWB file and loading a DataJoint pipeline from an experimental session',
  'experimenter': 'Experimenter 1',
  'identifier': 'Mouse_5025_Session_22',
  'institution': 'Yale University',
  'lab': 'Tan lab',
  'rate': 30.0,
  'raw_data_link': 'C:/Users/meowm/OneDrive/DataWarehouse/Experimenter1/Mouse_5025/Session_22/raw_video.avi',
  'reference_frame': 'Zero refers to the bottom left corner of the rig, when viewed from above',
  'session_description': 'Determine if mouse has a place preference',
  'session_start_time': datetime.datetime(2020, 7, 7, 15, 36, 13),
  's

In the next chapter, our use case addresses creating NWB files and a DataJoint dataset from existing data.  It demonstrates bulk loading of data from diverse raw data sources in a standard directory structure.

The row of data currently in our pipeline would cause an error in the next chapter, because loading it again would create a duplicate entry in the database.  To address this, we clear the pipeline with mouse.drop().

In [16]:
mouse.drop()

`marikelreimer_tutorial`.`mouse` (1 tuples)
`marikelreimer_tutorial`.`session` (1 tuples)
`marikelreimer_tutorial`.`_position` (1 tuples)
`marikelreimer_tutorial`.`__position_statistics` (1 tuples)
Proceed? [yes, No]: yes
Tables dropped.  Restart kernel.


Notice that DataJoint removed all dependent tables, preventing the possibility of orphaned data.
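
The cascade can be pictured as a walk over a dependency graph: starting from the table being dropped, every table that depends on it, directly or indirectly, is collected. The sketch below illustrates that idea and is not DataJoint's code; the `dependents` mapping assumes a simple chain in the order the tables are listed in the confirmation prompt above, and `tables_to_drop` is a hypothetical helper.

```python
from collections import deque

# Assumed dependency chain: each table maps to the tables that reference it
dependents = {
    'mouse': ['session'],
    'session': ['_position'],
    '_position': ['__position_statistics'],
    '__position_statistics': [],
}


def tables_to_drop(start, graph):
    """Return start plus all downstream tables, in breadth-first order."""
    ordered, queue, seen = [], deque([start]), {start}
    while queue:
        table = queue.popleft()
        ordered.append(table)
        for child in graph.get(table, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return ordered


print(tables_to_drop('mouse', dependents))
print(tables_to_drop('_position', dependents))
```

Under this assumed chain, dropping a table in the middle would leave the upstream tables in place and remove only the tables below it.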