# Getting Experimental Metadata from DANDI
It can be helpful to view general information about the experimental sessions that produced your data. Because each NWB file typically represents one session, a dandiset's files can be examined to get an overview of each session. The metadata that is available varies depending on who produced the NWB file. In this notebook, NWB files within one of the Allen Institute's datasets are opened, and some basic information from each is used to build a table of the experimental sessions and their properties.

### Environment Setup
⚠️**Note: If running on a new environment, run this cell once and then restart the kernel**⚠️

In [1]:
try:
    from dandi_utils import dandi_stream_open
except ImportError:
    !git clone https://github.com/AllenInstitute/openscope_databook.git
    %cd openscope_databook
    %pip install -e .

In [2]:
import os

import fsspec
import h5py
import pandas as pd

from dandi import dandiapi
from fsspec.implementations.cached import CachingFileSystem
from pynwb import NWBHDF5IO

%matplotlib inline

### Getting Dandiset Metadata
To view other data, change `dandiset_id` to be the id of the dandiset you're interested in. If the dandiset is embargoed, set `dandi_api_key` to your DANDI API key. 

In [3]:
dandiset_id = "000021"
dandi_api_key = None
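To avoid pasting a key directly into the notebook, the key can be read from an environment variable instead. The snippet below is a minimal sketch: the helper name `resolve_api_key` is hypothetical, and it assumes the key has been exported as `DANDI_API_KEY`, a variable name used here by convention rather than something this notebook sets up.

```python
import os


def resolve_api_key(environ=os.environ):
    """Return the API key stored in DANDI_API_KEY, or None if it is unset or blank."""
    # Assumption: the key was exported under the name DANDI_API_KEY
    key = environ.get("DANDI_API_KEY", "").strip()
    return key or None


# None means anonymous access, which is enough for public dandisets
dandi_api_key = resolve_api_key()
```

Passing a plain dictionary as `environ` makes the helper easy to try without touching the real environment.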

In [4]:
my_dandiset = dandiapi.DandiAPIClient(token=dandi_api_key).get_dandiset(dandiset_id)
print(f"Got dandiset {my_dandiset}")

A newer version (0.55.1) of dandi/dandi-cli is available. You are using 0.46.6


Got dandiset DANDI:000021/draft


### Get NWB Info
Below are two definitions of the function `get_nwb_info`, tailored to our *Ophys* and *Ecephys* NWB files respectively. Each retrieves a series of important metadata values from the NWB file object. The code for accessing the fields of interest will likely be slightly different for your files. The function can easily be altered to extract any other information from an NWB file, as long as you're familiar with the internal layout of your files. If you change what the function returns, make sure to update the `columns` argument of the pandas dataframe below to match.

In [5]:
# get experimental information from within ophys file
# getattr is used because not all nwb files have all properties. If not handled like this, errors will arise
# def get_nwb_info(nwb):
#         session_time = getattr(nwb, "session_start_time", None)

#         metadata_obj = getattr(nwb, "lab_meta_data", {})
#         metadata = metadata_obj.get("metadata", None)
#         session_id = getattr(metadata, "ophys_session_id", None)
#         experiment_id = getattr(metadata, "ophys_experiment_id", None)

#         fov_height = getattr(metadata, "field_of_view_height", None)
#         fov_width = getattr(metadata, "field_of_view_width", None)
#         imaging_depth = getattr(metadata, "imaging_depth", None)
#         group = getattr(metadata, "imaging_plane_group", None)
#         group_count = getattr(metadata, "imaging_plane_group_count", None)
#         container_id = getattr(metadata, "experiment_container_id", None)
        
#         subject = getattr(nwb, "subject", None)
#         specimen_name = getattr(subject, "subject_id", None)
#         age = getattr(subject, "age", None)
#         sex = getattr(subject, "sex", None)
#         genotype = getattr(subject, "genotype", None)
        
#         try: n_rois = nwb.processing["ophys"]["dff"].roi_response_series["traces"].data.shape[1]
#         except: n_rois = None
#         try: location = list(nwb.imaging_planes.values())[0].location
#         except: location = None
        
#         intervals = getattr(nwb, "intervals", {})
#         stim_types = set(intervals.keys())
#         stim_tables = [intervals[table_name] for table_name in intervals]
#         # gets highest value among final "stop times" of all stim tables in intervals
#         session_end = max([table.stop_time[-1] for table in stim_tables if len(table) > 1])

#         return [session_time, session_id, experiment_id, container_id, group, group_count, imaging_depth, location, fov_height, fov_width, specimen_name, sex, age, genotype, stim_types, n_rois, session_end]

In [6]:
# get experimental information from within ecephys file
# getattr is used because not all nwb files have all properties. If not handled like this, errors will arise
def get_nwb_info(nwb):
    session_time = getattr(nwb, "session_start_time", None)

    subject = getattr(nwb, "subject", None)
    specimen_name = getattr(subject, "specimen_name", None)
    age = getattr(subject, "age_in_days", None)
    sex = getattr(subject, "sex", None)
    genotype = getattr(subject, "genotype", None)

    probes = set(getattr(nwb, "devices", {}).keys())
    units = getattr(nwb, "units", [])
    n_units = len(units) if hasattr(units, '__len__') else 0

    intervals = getattr(nwb, "intervals", {})
    stim_types = set(intervals.keys())
    stim_tables = [intervals[table_name] for table_name in intervals]
    # gets highest value among final "stop times" of all stim tables in intervals
    # default=None avoids a ValueError when a file has no usable stim tables
    session_end = max((table.stop_time[-1] for table in stim_tables if len(table) > 1), default=None)

    return [session_time, specimen_name, sex, age, genotype, probes, stim_types, n_units, session_end]
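The defensive `getattr` pattern used above can be tried without opening any NWB file. The following is a small sketch on stand-in objects: `SimpleNamespace` instances and plain dictionaries play the role of the NWB file and its interval tables, and the helper names `subject_fields` and `latest_stop_time` are hypothetical.

```python
from types import SimpleNamespace


def subject_fields(nwb):
    """Read subject metadata, falling back to None for anything that is missing."""
    subject = getattr(nwb, "subject", None)
    # getattr on None with a default simply returns the default
    return {
        "sex": getattr(subject, "sex", None),
        "age": getattr(subject, "age_in_days", None),
    }


def latest_stop_time(tables):
    """Return the largest final stop time, or None when no table has any rows."""
    return max(
        (table["stop_time"][-1] for table in tables if len(table["stop_time"]) > 0),
        default=None,
    )


# Stand-ins for real NWB objects (assumptions for illustration only)
full_file = SimpleNamespace(subject=SimpleNamespace(sex="M", age_in_days=118.0))
bare_file = SimpleNamespace()

full_info = subject_fields(full_file)
bare_info = subject_fields(bare_file)
end_time = latest_stop_time([{"stop_time": [1.0, 5.5]}, {"stop_time": [2.0, 9.25]}, {"stop_time": []}])
no_end_time = latest_stop_time([])
```

A file that lacks a `subject` yields `None` values instead of raising, which keeps the resulting row the same length as the table's column list.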

### Getting Table
Here, each relevant file in the dandiset is streamed and opened remotely, the information of interest is extracted with the function `get_nwb_info` defined above, and the result is added to a table of sessions and their metadata. Files that hold data for a specific probe rather than an entire session are skipped. Opening each NWB file can take several minutes, so depending on how many files the loop goes through, this step can take a very long time.

In [7]:
# set up streaming filesystem
fs = fsspec.filesystem("http")

nwb_table = []
# skip files that aren't main session files
files = [asset for asset in my_dandiset.get_assets() if "probe" not in asset.path]
# swap this with line above for one of our ophys dandisets
# files = [asset for asset in my_dandiset.get_assets() if "raw" not in asset.path]
n_files = len(files)
print(f"{n_files} files retrieved")

for i, file in enumerate(files):
    print(f"Examining file {i+1}/{n_files}: {file.identifier}")    
    # get basic file metadata
    row = [file.identifier, file.size, file.path]
    
    base_url = file.client.session.head(file.base_download_url)
    file_url = base_url.headers["Location"]

    # open and read nwb file with streaming
    with fs.open(file_url, "rb") as f:
        # use a distinct name so the loop variable `file` is not shadowed
        with h5py.File(f, "r") as h5_file:
            with NWBHDF5IO(file=h5_file, mode="r", load_namespaces=True) as io:
                nwb = io.read()
                # extract experimental info from within file
                row += get_nwb_info(nwb)
                nwb_table.append(row)
                del nwb

    # don't run full loop if running in test environment
    if os.environ.get("TESTING", False):
        break


32 files retrieved
Examining file 1/32: 58703c97-c0a9-4736-b684-73c85c1a444a


  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."


Examining file 2/32: 02291b99-e583-498b-9929-b68bba2c50e2
Examining file 3/32: 224b57e5-c9a3-46ef-85db-966713f3ccbe
Examining file 4/32: c7f32379-adce-4961-9211-d07790be9cab
Examining file 5/32: b4aeeb19-cdc6-4895-ab7b-bc8a688cf6f5
Examining file 6/32: 106e84a7-1c41-43f1-a675-7df17e4aba69
Examining file 7/32: eb36f94f-d6e7-45c6-aa02-7d4ed23453d3
Examining file 8/32: 5a58bf3d-a1b9-444b-8ab0-ef5478aa42a6
Examining file 9/32: 522e8054-34ca-4579-ae80-350d0b24e0f4
Examining file 10/32: 96c200cf-29c2-457a-b2f3-99f11de5b039
Examining file 11/32: 286c7b06-3cde-4261-9090-e6fbe6c81945
Examining file 12/32: 3876c5f1-f38a-4c89-8f54-6128538f0066
Examining file 13/32: 2e6df882-f31a-440b-a572-ba717a95bf80
Examining file 14/32: 488a9bea-96fb-4028-a33a-08650f50be63
Examining file 15/32: dfc3db15-066a-4a07-b615-a4d7e85c44e1
Examining file 16/32: edf10182-5a4c-454f-ad23-47987a5ca256
Examining file 17/32: be9f8fd8-8f16-4a66-acc6-9e04697650f3
Examining file 18/32: 5f204c90-2005-4143-b880-07d80aa32b20
Exami
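Because the loop can run for a long time, it may be worth making sure that one unreadable file does not discard the rows already collected. The sketch below shows one way to do that with stand-in data: the helper names `is_session_file` and `collect_rows`, the example paths, and the `fake_extract` function are all hypothetical, and no file is actually opened.

```python
def is_session_file(path):
    """Keep main session files; skip per-probe files, as in the loop above."""
    return "probe" not in path


def collect_rows(paths, extract):
    """Apply extract to each path, recording failures instead of stopping."""
    rows, failures = [], []
    for path in paths:
        try:
            rows.append([path] + extract(path))
        except Exception as err:
            # remember what went wrong and move on to the next file
            failures.append((path, repr(err)))
    return rows, failures


# Hypothetical asset paths, for illustration only
demo_paths = [
    "sub-1/sub-1_ses-10.nwb",
    "sub-1/sub-1_ses-10_probe-11_ecephys.nwb",
    "sub-2/sub-2_ses-20.nwb",
]


def fake_extract(path):
    """Stand-in for opening a file and calling get_nwb_info on it."""
    if "sub-2" in path:
        raise KeyError("intervals")
    return ["M", 118.0]


session_paths = [p for p in demo_paths if is_session_file(p)]
rows, failures = collect_rows(session_paths, fake_extract)
```

Afterwards, `failures` lists the files that need a second look, while `rows` can still be turned into a dataframe.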

In [8]:
# convert table to pandas dataframe
sessions = pd.DataFrame(nwb_table, columns=("identifier", "size", "path", "session time", "sub name", "sub sex", "sub age", "sub genotype", "probes", "stim types", "# units", "session length"))
# swap this with line above for one of our ophys dandisets
# sessions = pd.DataFrame(nwb_table, columns=("identifier", "size", "path", "session time", "session id", "experiment id", "container id", "group", "group count", "imaging depth", "location", "fov height", "fov width", "specimen name", "sex", "age", "genotype", "stim types", "# rois", "session end"))
sessions

Unnamed: 0,identifier,size,path,session time,sub name,sub sex,sub age,sub genotype,probes,stim types,# units,session length
0,58703c97-c0a9-4736-b684-73c85c1a444a,2856232912,sub-699733573/sub-699733573_ses-715093703.nwb,2019-01-19 00:54:18-08:00,Sst-IRES-Cre;Ai32-386129,M,118.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",2779,9135.140323
1,02291b99-e583-498b-9929-b68bba2c50e2,3071442940,sub-703279277/sub-703279277_ses-719161530.nwb,2019-01-08 16:25:16-08:00,Sst-IRES-Cre;Ai32-387858,M,122.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",3232,9151.497467
2,224b57e5-c9a3-46ef-85db-966713f3ccbe,1736516600,sub-707296975/sub-707296975_ses-721123822.nwb,2019-01-08 16:25:35-08:00,Pvalb-IRES-Cre;Ai32-388521,M,125.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",1603,9420.86
3,c7f32379-adce-4961-9211-d07790be9cab,2156671312,sub-716813540/sub-716813540_ses-739448407.nwb,2019-01-08 16:27:24-08:00,C57BL/6J-404551,M,112.0,wt/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",2370,9156.67649
4,b4aeeb19-cdc6-4895-ab7b-bc8a688cf6f5,2912508032,sub-717038285/sub-717038285_ses-732592105.nwb,2019-01-08 16:26:20-08:00,C57BL/6J-404553,M,100.0,wt/wt,"{probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",3076,9143.415844
5,106e84a7-1c41-43f1-a675-7df17e4aba69,2182040332,sub-718643564/sub-718643564_ses-737581020.nwb,2018-09-25 14:03:59-07:00,C57BL/6J-404568,M,108.0,wt/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",2055,9156.65539
6,eb36f94f-d6e7-45c6-aa02-7d4ed23453d3,2646264724,sub-719817799/sub-719817799_ses-744228101.nwb,2018-09-25 11:43:40-07:00,C57BL/6J-404569,M,122.0,wt/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",2695,9152.19317
7,5a58bf3d-a1b9-444b-8ab0-ef5478aa42a6,1859545192,sub-719828686/sub-719828686_ses-754312389.nwb,2018-09-25 11:02:57-07:00,C57BL/6J-404570,M,140.0,wt/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",1891,9152.415139
8,522e8054-34ca-4579-ae80-350d0b24e0f4,2120570640,sub-722882751/sub-722882751_ses-743475441.nwb,2018-10-26 12:50:58-07:00,C57BL/6J-404555,M,121.0,wt/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",2133,9152.251155
9,96c200cf-29c2-457a-b2f3-99f11de5b039,2917686364,sub-723627600/sub-723627600_ses-742951821.nwb,2018-10-26 12:47:04-07:00,C57BL/6J-404571,M,120.0,wt/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",3008,9152.343504


In [9]:
# output all session metadata to local CSV file
sessions.to_csv("sessions.csv")
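One thing to keep in mind when reading this CSV back later: columns that hold Python sets, such as `probes` and `stim types`, are written as their string form, so they come back as text rather than sets. The sketch below shows one possible way to convert such a cell back; the helper name `parse_set_cell` is hypothetical.

```python
import ast


def parse_set_cell(text):
    """Convert the string form of a set, as stored in the CSV, back into a set."""
    if text == "set()":
        # the empty set has no literal syntax, so handle it explicitly
        return set()
    value = ast.literal_eval(text)
    if not isinstance(value, set):
        raise ValueError(f"expected a set literal, got {type(value).__name__}")
    return value


# Round trip of a value shaped like one cell of the "probes" column
stored = str({"probeA", "probeB"})
restored = parse_set_cell(stored)
```

After reloading with `pd.read_csv("sessions.csv", index_col=0)`, the helper could be applied with something like `sessions["probes"].map(parse_set_cell)`.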

### Selecting Files
**Pandas** syntax can be used to filter the table above and select individual sessions.

In [10]:
selected_sessions = sessions[sessions["size"] <= 2_100_000_000]
# selected_sessions = sessions[sessions["sub sex"] == "F"]
# selected_sessions = sessions[sessions["# units"] > 2900]
selected_sessions

Unnamed: 0,identifier,size,path,session time,sub name,sub sex,sub age,sub genotype,probes,stim types,# units,session length
2,224b57e5-c9a3-46ef-85db-966713f3ccbe,1736516600,sub-707296975/sub-707296975_ses-721123822.nwb,2019-01-08 16:25:35-08:00,Pvalb-IRES-Cre;Ai32-388521,M,125.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",1603,9420.86
7,5a58bf3d-a1b9-444b-8ab0-ef5478aa42a6,1859545192,sub-719828686/sub-719828686_ses-754312389.nwb,2018-09-25 11:02:57-07:00,C57BL/6J-404570,M,140.0,wt/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",1891,9152.415139
21,47634abd-db85-48f5-9c33-01887a59d3bc,1960982972,sub-739783158/sub-739783158_ses-760345702.nwb,2019-01-19 00:36:58-08:00,Pvalb-IRES-Cre;Ai32-407972,M,103.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeA, probeC, probeB, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",1784,9152.378596
22,c9fc315f-5145-45a3-8183-3ad8fe499c4d,1862261144,sub-740268983/sub-740268983_ses-759883607.nwb,2018-10-26 12:55:45-07:00,C57BL/6J-412799,M,113.0,wt/wt,"{probeC, probeB, probeE, probeA}","{gabors_presentations, natural_movie_three_pre...",1763,9152.463399
25,cea32745-0d1a-4884-b06f-158d830bcbf3,2021557700,sub-744915196/sub-744915196_ses-762602078.nwb,2018-10-26 00:58:47-07:00,Sst-IRES-Cre;Ai32-408152,M,110.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",2159,9727.980945
27,17a5432b-7193-42dd-a9ae-e9e8b3fef19c,1928749868,sub-757329617/sub-757329617_ses-773418906.nwb,2018-12-10 15:16:20-08:00,Pvalb-IRES-Cre;Ai32-410315,F,124.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeA, probeC, probeB, probeF, probeE, probeD}","{gabors_presentations, natural_movie_three_pre...",1880,9157.16352
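Several conditions can also be combined into a single filter. The sketch below uses a small made-up dataframe with the same column names as the table above; the values and file names are illustrative and not taken from the dandiset.

```python
import pandas as pd

# Made-up rows that mimic the structure of the sessions table
sessions_demo = pd.DataFrame({
    "path": ["a.nwb", "b.nwb", "c.nwb"],
    "size": [1_700_000_000, 2_900_000_000, 1_900_000_000],
    "sub sex": ["M", "M", "F"],
    "# units": [1603, 3076, 1880],
})

# Each comparison produces a boolean Series; & combines them element-wise
is_small = sessions_demo["size"] <= 2_100_000_000
has_many_units = sessions_demo["# units"] > 1700

selected_demo = sessions_demo[is_small & has_many_units]
```

Naming each mask before combining them keeps the filter readable, and using `|` instead of `&` selects rows that satisfy either condition.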


### Downloading Selected Files
To download the files, we use the same method that is explained in [Downloading an NWB File](./download_nwb.ipynb). Combined with the paths from the selected sessions above, it downloads only the files of interest. Set `download_loc` to the relative path where the files should be downloaded. Note that if the files are large, this can take a long time.

In [11]:
download_loc = "."

In [12]:
selected_paths = set(selected_sessions.path)
selected_paths

{'sub-707296975/sub-707296975_ses-721123822.nwb',
 'sub-719828686/sub-719828686_ses-754312389.nwb',
 'sub-739783158/sub-739783158_ses-760345702.nwb',
 'sub-740268983/sub-740268983_ses-759883607.nwb',
 'sub-744915196/sub-744915196_ses-762602078.nwb',
 'sub-757329617/sub-757329617_ses-773418906.nwb'}

In [13]:
for dandi_filepath in selected_paths:
    filename = dandi_filepath.split("/")[-1]
    file = my_dandiset.get_asset_by_path(dandi_filepath)
    file.download(f"{download_loc}/{filename}")
    print(f"Downloaded file to {download_loc}/{filename}")

Downloaded file to ./sub-740268983_ses-759883607.nwb
Downloaded file to ./sub-757329617_ses-773418906.nwb
Downloaded file to ./sub-719828686_ses-754312389.nwb
Downloaded file to ./sub-739783158_ses-760345702.nwb
Downloaded file to ./sub-707296975_ses-721123822.nwb
Downloaded file to ./sub-744915196_ses-762602078.nwb
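Since these downloads are slow, it can help to skip files that are already present when the cell is re-run. The sketch below only plans which files still need downloading and does not contact DANDI; the helper name `plan_downloads` and the example paths are hypothetical. It treats any existing file as complete, so a partially downloaded file would have to be deleted by hand first.

```python
import os
import tempfile


def plan_downloads(dandi_paths, download_loc):
    """Return (dandi_path, local_path) pairs for files not yet in download_loc."""
    plan = []
    for dandi_path in sorted(dandi_paths):
        local_path = os.path.join(download_loc, os.path.basename(dandi_path))
        if not os.path.exists(local_path):
            plan.append((dandi_path, local_path))
    return plan


# Demonstration in a temporary directory where one file already exists
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, "sub-1_ses-10.nwb"), "w").close()
    demo_plan = plan_downloads(
        {"sub-1/sub-1_ses-10.nwb", "sub-2/sub-2_ses-20.nwb"}, tmp_dir
    )
    demo_targets = [os.path.basename(local_path) for _, local_path in demo_plan]
```

Each pair in the plan could then be passed to the download loop above, with `get_asset_by_path` called on the first element and `download` on the second.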
