# Simple data extraction script (NWB data only)

This notebook goes through the steps for loading data from all **tracked ROIs** across sessions 1 to 3.

In [1]:
import glob
import os
from pathlib import Path

import h5py
import numpy as np
import pandas as pd

from util import gen_util
gen_util.extend_sys_path(Path("").resolve(), parents=1)
from sess_util import sess_gen_util, sess_trace_util

### Data directory 

The data directory should contain the session data in **NWB format** (at any depth).
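
To illustrate what "at any depth" means in practice, here is a minimal sketch of a recursive search for NWB files. The helper name `find_nwb_files` and the throwaway directory layout are illustrative assumptions, not part of the codebase; the loading cell further down performs a comparable search with `glob.glob(..., recursive=True)`.

```python
import tempfile
from pathlib import Path


def find_nwb_files(datadir, pattern="*.nwb"):
    """Return all files under datadir matching pattern, at any depth (sorted)."""
    return sorted(Path(datadir).rglob(pattern))


# Demo on a throwaway directory tree (hypothetical layout, for illustration only)
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp, "sub-0001", "extra_level")
    nested.mkdir(parents=True)
    Path(tmp, "top_level.nwb").touch()
    Path(nested, "nested_file.nwb").touch()
    Path(nested, "notes.txt").touch()  # ignored: not an NWB file

    found_names = sorted(p.name for p in find_nwb_files(tmp))
    print(found_names)
```

Both the top-level file and the nested one are picked up, so the NWB files do not need to sit directly inside the data directory.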

### Running in a Docker container, or specifically in Binder
If the notebook is **running in a Docker container**, the dataset is first downloaded in NWB format from the [Dandi archive](https://gui.dandiarchive.org/#/dandiset/000037), and the data directory is set accordingly.  

If the notebook is **not running in a Docker container**, the dataset should be downloaded manually.

In [2]:
# check where the notebook is running
running_in_docker = False
running_in_binder = False
if "jovyan" in str(Path().resolve()):
    # 'jovyan' is the default home directory name in Jupyter Docker images
    running_in_docker = True
    # default to "" so the check does not fail if JUPYTER_IMAGE is unset
    if "binderhub" in os.getenv("JUPYTER_IMAGE", ""):
        running_in_binder = True

In [3]:
if running_in_docker:
    datadir_nwb = datadir = Path("..", "data")
    
    # download data needed for this notebook
    %run ../sess_util/sess_download_util --output $datadir_nwb --mouse_df ../mouse_df.csv

else:
    datadir_nwb = Path("..", "..", "data", "OSCA_NWB") # can be identical to datadir
    print(
        "Be sure to download the dataset, if needed, and update `datadir_nwb` to point to the "
        f"correct location(s).\nCurrently it points to {datadir_nwb}."
    )

Identifying the URLs of dandi assets to download...
Downloading 33 assets from dandiset 000037...
PATH                                              SIZE     DONE            DONE% CHECKSUM STATUS          MESSAGE
sub-408021_ses-20180926T172917_behavior+ophys.nwb 238.0 MB 238.0 MB         100%    ok    done                   
Summary:                                          238.0 MB 238.0 MB                       1 done                 
                                                           100.00%                                               
PATH                                              SIZE     DONE            DONE% CHECKSUM STATUS          MESSAGE
sub-408021_ses-20180927T182632_behavior+ophys.nwb 209.9 MB 209.9 MB         100%    ok    done                   
Summary:                                          209.9 MB 209.9 MB                       1 done                 
                                                           100.00%                                      

### Identifying mice and sessions of interest

The following code identifies the mice for which to load data, i.e., only the mice with tracked ROI data.

The type of **imaging plane** to include can be specified: somatic only ("soma"), dendritic only ("dend") or both ("all").
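
As a rough picture of what the plane selection does, here is a minimal sketch on a toy table. The rows are made up, the column names are assumed to mirror the keyword arguments passed to `get_sess_vals`, and `select_mice` is a hypothetical helper; the actual selection logic lives in `sess_gen_util` and may apply further criteria.

```python
import pandas as pd

# Toy stand-in for mouse_df.csv (hypothetical rows; column names are assumed
# to mirror the keyword arguments of get_sess_vals)
toy_df = pd.DataFrame({
    "mouse_n": [1, 1, 2, 2],
    "sess_n": [1, 2, 1, 2],
    "plane": ["soma", "soma", "dend", "dend"],
    "runtype": ["prod"] * 4,
})


def select_mice(df, plane="all", sess_ns=(1, 2), runtype="prod"):
    """Return sorted mouse numbers with rows matching plane, sessions and runtype."""
    keep = df["sess_n"].isin(sess_ns) & (df["runtype"] == runtype)
    if plane != "all":
        # "soma" or "dend": restrict to that imaging plane
        keep &= df["plane"] == plane
    return sorted(int(n) for n in df.loc[keep, "mouse_n"].unique())


mice_all = select_mice(toy_df, plane="all")
mice_dend = select_mice(toy_df, plane="dend")
print(mice_all, mice_dend)
```

With `plane="all"` no plane filter is applied, whereas `"soma"` or `"dend"` keeps only the mice imaged in that plane.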

In [4]:
PLANES = "all"

In [5]:
mouse_df_path = Path("..", "mouse_df.csv")
sess_ns = [1, 2, 3] # tracked sessions 
mouse_ns = sess_gen_util.get_sess_vals(
    mouse_df=mouse_df_path, 
    returnlab="mouse_n", 
    sess_n=sess_ns,
    plane=PLANES, 
    runtype="prod" # tracking was not done for pilot data
)

sess_str = ", ".join([str(n) for n in sess_ns])
mouse_str = ", ".join([str(n) for n in mouse_ns])
print(f"Will load data from sessions {sess_str} for mice {mouse_str}.")

Will load data from sessions 1, 2, 3 for mice 1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13.


### Extracting tracked ROI data 
The following code **extracts the tracked ROI data (sorted into tracking order)** for the selected mice and sessions.  
The Session and Stim objects are not used, to avoid extraneous steps (loading of stimulus information, calculation of ROI scaling factors).

The extracted data can optionally be **written to an HDF5 file**. 

**Note:** The imaging data is recorded at ~31 frames per second, and each recording session lasts ~70 min.
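
As a quick sanity check on these figures, the frame counts printed by the loading cell can be converted to an approximate duration. This is only a back-of-the-envelope sketch: the frame rate is the rounded value from the note, and the frame count is the one reported for mouse 1, session 1 in the output further down.

```python
FPS_APPROX = 31     # approximate frame rate, as stated in the note
n_frames = 126741   # mouse 1, session 1, taken from the printed output

# frames -> seconds -> minutes
duration_min = n_frames / FPS_APPROX / 60
print(f"{n_frames} frames at ~{FPS_APPROX} Hz is about {duration_min:.0f} min")
```

The result is a little under 70 minutes, in line with the stated session length.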

In [6]:
SAVE_TO_H5 = True

In [7]:
save_str = ""
if SAVE_TO_H5:
    save_path = Path(datadir_nwb, "tracked_roi_traces.h5")
    save_str = f" and saving to {save_path}"
    if save_path.is_file():
        raise OSError(f"{save_path} already exists. Delete or choose a different file name.")

In [10]:
print(f"Loading tracked ROI traces{save_str} for...")
for mouse_n in mouse_ns:    
    # identify the line and plane in which recordings were made for this mouse
    lines, planes = sess_gen_util.get_sess_vals(
        mouse_df_path, ["line", "plane"], 
        mouse_n=mouse_n, sess_n=1, runtype="prod", unique=False
    )
    line = lines[0]
    plane = planes[0]
    print(f"    Mouse {mouse_n} ({line} {plane}):")
    
    for sess_n in sess_ns:
        # identify the Dandi session ID
        dandi_id = sess_gen_util.get_sess_vals(
            mouse_df_path, ["dandi_session_id"], 
            mouse_n=mouse_n, sess_n=sess_n, runtype="prod"
        )[0]
            
        # identify nwb file
        nwb_files = glob.glob(str(Path(datadir_nwb, "**", f"*{dandi_id}*.nwb")), recursive=True)
        if len(nwb_files) == 0:
            raise OSError(f"No data found for Dandi session ID {dandi_id} under {datadir_nwb}.")
        
        # load tracked ROIs (sorted into tracking order)
        tracked_rois = sess_trace_util.get_tracked_rois_nwb(nwb_files)
        roi_traces = sess_trace_util.load_roi_traces_nwb(nwb_files, roi_ns=tracked_rois)
        print(f"\tSess {sess_n} ({len(roi_traces)} ROIs x {roi_traces.shape[1]} frames)")
        
        # optionally save traces, along with plane and line information.
        if SAVE_TO_H5:
            with h5py.File(save_path, "a") as f:
                subgroup = f"data/M{mouse_n}_S{sess_n}"
                f.create_dataset(f"{subgroup}/tracked_traces", data=roi_traces)
                f.create_dataset(f"{subgroup}/plane", data=plane, dtype=h5py.string_dtype())
                f.create_dataset(f"{subgroup}/line", data=line, dtype=h5py.string_dtype())

Loading tracked ROI traces and saving to ../data/tracked_roi_traces.h5 for...
    Mouse 1 (L23-Cux2 soma):
	Sess 1 (59 ROIs x 126741 frames)
	Sess 2 (59 ROIs x 126744 frames)
	Sess 3 (59 ROIs x 126730 frames)
    Mouse 3 (L23-Cux2 soma):
	Sess 1 (55 ROIs x 126743 frames)
	Sess 2 (55 ROIs x 126797 frames)
	Sess 3 (55 ROIs x 126748 frames)
    Mouse 4 (L5-Rbp4 soma):
	Sess 1 (47 ROIs x 126741 frames)
	Sess 2 (47 ROIs x 126734 frames)
	Sess 3 (47 ROIs x 126729 frames)
    Mouse 6 (L23-Cux2 dend):
	Sess 1 (136 ROIs x 126741 frames)
	Sess 2 (136 ROIs x 126739 frames)
	Sess 3 (136 ROIs x 126738 frames)
    Mouse 7 (L5-Rbp4 soma):
	Sess 1 (12 ROIs x 126727 frames)
	Sess 2 (12 ROIs x 126733 frames)
	Sess 3 (12 ROIs x 126741 frames)
    Mouse 8 (L5-Rbp4 dend):
	Sess 1 (51 ROIs x 126737 frames)
	Sess 2 (51 ROIs x 126740 frames)
	Sess 3 (51 ROIs x 126744 frames)
    Mouse 9 (L23-Cux2 dend):
	Sess 1 (118 ROIs x 127074 frames)
	Sess 2 (118 ROIs x 126728 frames)
	Sess 3 (118 ROIs x 126734 frames)
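
If the traces were saved, they can be read back with `h5py`. The following is a minimal sketch, assuming the `data/M{mouse_n}_S{sess_n}` layout written by the cell above and h5py version 3 or later (for `asstr()`). The helper name `load_saved_traces` is hypothetical, and the demo writes a tiny stand-in file with random values and placeholder metadata rather than touching the real data.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np


def load_saved_traces(h5_path, mouse_n, sess_n):
    """Read one session's traces, plane and line from the layout used above."""
    with h5py.File(h5_path, "r") as f:
        grp = f[f"data/M{mouse_n}_S{sess_n}"]
        traces = grp["tracked_traces"][()]  # full array, ROIs x frames
        plane = grp["plane"].asstr()[()]    # variable-length string -> str
        line = grp["line"].asstr()[()]
    return traces, plane, line


# Demo on a small stand-in file (random values, placeholder metadata)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp, "demo_traces.h5")
    demo_traces = np.random.default_rng(0).random((3, 10))
    with h5py.File(demo_path, "a") as f:
        f.create_dataset("data/M1_S1/tracked_traces", data=demo_traces)
        f.create_dataset("data/M1_S1/plane", data="soma", dtype=h5py.string_dtype())
        f.create_dataset("data/M1_S1/line", data="L23-Cux2", dtype=h5py.string_dtype())

    traces, plane, line = load_saved_traces(demo_path, mouse_n=1, sess_n=1)
    print(traces.shape, plane, line)
```

Because tracked ROIs are stored in tracking order, row `i` of `tracked_traces` should refer to the same ROI in each of a mouse's sessions, so the arrays can be compared row by row across sessions.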
  