# Getting Experimental Metadata from DANDI
Viewing general information about the experimental sessions that produced your data is often helpful. Because each NWB file typically represents one session (though this can vary depending on who produced the file), a Dandiset's files can be examined to get an overview of its sessions. In this notebook, each relevant NWB file within the Allen Institute's `project name` dataset is opened, and some basic information is used to build a table of the experimental sessions and their properties.

### Environment Setup

In [1]:
import os

import fsspec
import h5py
import pandas as pd

from dandi import dandiapi
from fsspec.implementations.cached import CachingFileSystem
from pynwb import NWBHDF5IO

%matplotlib inline

### Getting Dandiset Metadata
To view other data, change `dandiset_id` to the ID of the Dandiset of interest. If the Dandiset is embargoed, set `authenticate` to `True` and `dandi_api_key` to your DANDI API key.

In [2]:
dandiset_id = "000248"
authenticate = True
dandi_api_key = os.environ['DANDI_API_KEY']

In [3]:
if authenticate:
    my_dandiset = dandiapi.DandiAPIClient(token=dandi_api_key).get_dandiset(dandiset_id)
else:
    my_dandiset = dandiapi.DandiAPIClient().get_dandiset(dandiset_id)
print(f"Got dandiset {my_dandiset}")

A newer version (0.48.1) of dandi/dandi-cli is available. You are using 0.46.3


Got dandiset DANDI:000248/draft
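
Reading the key with `os.environ['DANDI_API_KEY']` raises a `KeyError` when the variable is unset, even if the Dandiset is public. One possible way around that is sketched below. The helper name `resolve_api_key` is hypothetical and not part of the DANDI client, and the sketch assumes that an unset or empty variable means anonymous access is intended.

```python
import os

def resolve_api_key(env=os.environ, var_name="DANDI_API_KEY"):
    """Return the API key stored in `env`, or None if it is unset or blank."""
    key = env.get(var_name, "").strip()
    return key or None

# Plain dicts stand in for os.environ so the example has no side effects.
with_key = resolve_api_key({"DANDI_API_KEY": "example-token"})
without_key = resolve_api_key({})

# Authenticate only when a key was actually found.
should_authenticate = with_key is not None
```

The returned value could then be passed as `token=` to `dandiapi.DandiAPIClient`, as in the cell above.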


### Get NWB Info
This function is tailored to our NWB files. The code for accessing the fields of interest to you will likely be slightly different for your files. The function can easily be altered to extract any other information you want from an NWB file. If you do, make sure to change the `columns` argument of the Pandas dataframe below to match.

In [4]:
# get experimental information from within nwb file
def get_nwb_info(nwb):
    session_time = nwb.session_start_time
    sub = nwb.subject
    probes = set(nwb.devices.keys())
    n_units = len(nwb.units)
    stim_types = set(nwb.intervals.keys())
    stim_tables = [nwb.intervals[table_name] for table_name in nwb.intervals]
    # gets highest value among final "stop times" of all stim tables with more than one row
    session_end = max(table.stop_time[-1] for table in stim_tables if len(table) > 1)

    return [session_time, sub.specimen_name, sub.sex, sub.age_in_days, sub.genotype, probes, stim_types, n_units, session_end]
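
Because the function above returns a positional list, its order has to match the dataframe's column names exactly. A minimal sketch of an alternative is shown here: returning a dict keyed by column name, with `getattr` defaults for fields such as `specimen_name` and `age_in_days` that other producers' files may not have. The function name `describe_subject` and the `SimpleNamespace` stand-in are assumptions made for illustration; a real call would pass the object returned by `io.read()`.

```python
from types import SimpleNamespace

def describe_subject(nwb):
    """Return session and subject metadata as a dict keyed by column name."""
    sub = nwb.subject
    return {
        "session time": nwb.session_start_time,
        # missing attributes become None instead of raising AttributeError
        "sub name": getattr(sub, "specimen_name", None),
        "sub sex": getattr(sub, "sex", None),
        "sub age": getattr(sub, "age_in_days", None),
        "sub genotype": getattr(sub, "genotype", None),
    }

# Stand-in object that mimics only the attributes used above (not a real NWBFile).
fake_nwb = SimpleNamespace(
    session_start_time="2022-06-29",
    subject=SimpleNamespace(specimen_name="619296", sex="M", genotype="wt"),
)
info = describe_subject(fake_nwb)
```

A list of such dicts can be passed directly to `pd.DataFrame`, which takes the column names from the keys.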

### Getting Table
Here, each relevant file in the Dandiset is streamed and opened remotely, the information of interest is extracted with the `get_nwb_info` function defined above, and the result is added to a table of sessions and their metadata. Files that hold data for a specific probe rather than an entire session are skipped. Opening each NWB file can take several minutes, so depending on how many files the loop visits, this step can take a very long time.

In [5]:
# set up streaming filesystem
fs = fsspec.filesystem("http")

nwb_table = []
for file in my_dandiset.get_assets():
    # skip files that aren't main session files
    if "probe" in file.path:
        continue

    print(f"Examining file {file.identifier}")    
    # get basic file metadata
    row = [file.identifier, file.size, file.path]
    
    base_url = file.client.session.head(file.base_download_url)
    file_url = base_url.headers['Location']

    # open and read nwb file with streaming
    with fs.open(file_url, "rb") as f:
        # use a distinct name so the asset variable `file` is not shadowed
        with h5py.File(f) as h5_file:
            with NWBHDF5IO(file=h5_file, mode='r', load_namespaces=True) as io:
                nwb = io.read()
                # extract experimental info from within file
                row += get_nwb_info(nwb)
                nwb_table.append(row)

Examining file 67ff2b14-6f23-40f2-b811-57003aeea8e3


  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."


Examining file dbc426a0-aafa-460b-a25a-a86bb31b9ddc
Examining file 181b7651-5f5c-491b-be70-e5d0354439d4
Examining file 85bfd56c-f104-4c83-937c-be0d58fce48e
Examining file c5e97840-4988-4da8-9f57-a24fb0a4a865
Examining file 3de250b0-2fc6-40eb-ae51-2395d0062819
Examining file 8c064e94-a858-4fad-a15a-5047d303e3f9
Examining file 46a94a32-c5de-44ae-a2a5-2a38958ef0bf
Examining file ec8dabd7-f925-48ba-9dbe-ab67dd6ba83f
Examining file a2bd39ed-3f98-4f48-b34c-394db4ce15c3
Examining file a7ff352c-0b00-47d6-a49f-97027d18264e
Examining file a8bc8aaf-ccba-4c27-bb5c-f1bc3c232c84
Examining file 4e618045-9c11-48a0-9134-95e2f01b71dd
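
The skip rule in the loop above is a plain substring test on the asset path. The sketch below applies the same test to a list of strings. The helper name `is_session_file` is hypothetical, the first path is taken from this Dandiset, and the second is a probe-file name invented for illustration.

```python
def is_session_file(path):
    """Return True for main session files, i.e. paths that do not mention a probe."""
    return "probe" not in path

example_paths = [
    "sub_1171903433/sub_1171903433sess_1181330601/sub_1171903433+sess_1181330601_ecephys.nwb",
    # hypothetical probe-specific file, invented for this example
    "sub_1171903433/sub_1171903433sess_1181330601/sub_1171903433+sess_1181330601_probe-0_ecephys.nwb",
]
session_paths = [p for p in example_paths if is_session_file(p)]
```

Note that a substring test also skips any path that contains "probe" for another reason, so it is worth checking against the naming scheme of your own Dandiset.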


In [6]:
# convert table to pandas dataframe
sessions = pd.DataFrame(nwb_table, columns=("identifier", "size", "path", "session time", "sub name", "sub sex", "sub age", "sub genotype", "probes", "stim types", "# units", "session length"))
sessions

,identifier,size,path,session time,sub name,sub sex,sub age,sub genotype,probes,stim types,# units,session length
0,67ff2b14-6f23-40f2-b811-57003aeea8e3,2242666496,sub_1175512783/sub_1175512783sess_1187930705/s...,2022-06-29 00:00:00-07:00,619296,M,154.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",1918,7278.15799
1,dbc426a0-aafa-460b-a25a-a86bb31b9ddc,2242666496,sub_1175512783/sub_1175512783sess_1187930705/s...,2022-06-29 00:00:00-07:00,619296,M,154.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",1918,7278.15799
2,181b7651-5f5c-491b-be70-e5d0354439d4,2803525629,sub_1172968426/sub_1172968426sess_1182865981/s...,2022-06-08 00:00:00-07:00,625545,M,89.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2793,7279.234305
3,85bfd56c-f104-4c83-937c-be0d58fce48e,2372313526,sub_1172969394/sub_1172969394sess_1183070926/s...,2022-06-09 00:00:00-07:00,625555,F,90.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2621,7278.592876
4,c5e97840-4988-4da8-9f57-a24fb0a4a865,2466318464,sub_1181585608/sub_1181585608sess_1194644312/s...,2022-07-27 00:00:00-07:00,630507,F,99.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2464,7278.96487
5,3de250b0-2fc6-40eb-ae51-2395d0062819,2451127136,sub_1186544726/sub_1186544726sess_1196157974/s...,2022-08-03 00:00:00-07:00,631510,F,99.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2386,7339.232324
6,8c064e94-a858-4fad-a15a-5047d303e3f9,2653386200,sub_1194090570/sub_1194090570sess_1208667752/s...,2022-09-08 00:00:00-07:00,637484,M,92.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2373,7349.27246
7,46a94a32-c5de-44ae-a2a5-2a38958ef0bf,2491394276,sub_1177693342/sub_1177693342sess_1189887297/s...,2022-07-06 00:00:00-07:00,620334,M,154.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2092,7279.915735
8,ec8dabd7-f925-48ba-9dbe-ab67dd6ba83f,2483160990,sub_1182593231/sub_1182593231sess_1192952695/s...,2022-07-20 00:00:00-07:00,630506,F,92.0,Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2517,7279.167735
9,a2bd39ed-3f98-4f48-b34c-394db4ce15c3,3393216313,sub_1171903433/sub_1171903433sess_1181330601/s...,2022-06-01 00:00:00-07:00,625554,M,82.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2930,7315.456085
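
The `size` column is in bytes, and because NWB interval times are stored in seconds, `session length` is in seconds as well. A small sketch of converting both to friendlier units follows. The helper names are hypothetical, and the input values are copied from the first row of the table above.

```python
def bytes_to_gb(n_bytes):
    """Convert a size in bytes to gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9

def seconds_to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600

size_gb = bytes_to_gb(2242666496)
length_h = seconds_to_hours(7278.15799)
print(f"{size_gb:.2f} GB, {length_h:.2f} h")  # prints "2.24 GB, 2.02 h"
```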


### Selecting Files
Pandas syntax can be used to filter the table above and select individual sessions.

In [7]:
# selected_sessions = sessions[sessions["size"] >= 2_500_000_000]
# selected_sessions = sessions[sessions["sub sex"] == "F"]
selected_sessions = sessions[sessions["# units"] > 2900]
selected_sessions

,identifier,size,path,session time,sub name,sub sex,sub age,sub genotype,probes,stim types,# units,session length
9,a2bd39ed-3f98-4f48-b34c-394db4ce15c3,3393216313,sub_1171903433/sub_1171903433sess_1181330601/s...,2022-06-01 00:00:00-07:00,625554,M,82.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2930,7315.456085
11,a8bc8aaf-ccba-4c27-bb5c-f1bc3c232c84,3393216313,sub_1174569641/sub_1174569641sess_1184671550/s...,2022-06-01 00:00:00-07:00,625554,M,82.0,Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt,"{probeE, probeB, probeD, probeA, probeC, probeF}","{ICwcfg0_presentations, ICkcfg0_presentations,...",2930,7315.456085
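
Conditions can also be combined. The sketch below rebuilds a small stand-in table from five rows of the output above (only three columns, to keep it short) and selects female subjects with more than 2400 units. The names `sessions_demo` and `selected_demo` are illustrative; on the real table, the same expression would be applied to `sessions`.

```python
import pandas as pd

# Stand-in for the sessions table, with values copied from rows 3, 4, 5, 8 and 9 above.
sessions_demo = pd.DataFrame(
    {
        "identifier": [
            "85bfd56c-f104-4c83-937c-be0d58fce48e",
            "c5e97840-4988-4da8-9f57-a24fb0a4a865",
            "3de250b0-2fc6-40eb-ae51-2395d0062819",
            "ec8dabd7-f925-48ba-9dbe-ab67dd6ba83f",
            "a2bd39ed-3f98-4f48-b34c-394db4ce15c3",
        ],
        "sub sex": ["F", "F", "F", "F", "M"],
        "# units": [2621, 2464, 2386, 2517, 2930],
    }
)

# Each condition needs its own parentheses; `&` combines the boolean masks element-wise.
is_female = sessions_demo["sub sex"] == "F"
many_units = sessions_demo["# units"] > 2400
selected_demo = sessions_demo[is_female & many_units]
```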


### Downloading Selected Files
To download the files, we use the same method explained in [Downloading an NWB File](./download_nwb.ipynb). Applying it to the paths of the sessions selected above downloads only the files of interest. Note that large files can take a long time to download.

In [8]:
download_loc = "."

In [9]:
selected_paths = set(selected_sessions.path)
selected_paths

{'sub_1171903433/sub_1171903433sess_1181330601/sub_1171903433+sess_1181330601_ecephys.nwb',
 'sub_1174569641/sub_1174569641sess_1184671550/sub_1174569641+sess_1184671550_ecephys.nwb'}

In [10]:
for dandi_filepath in selected_paths:
    filename = dandi_filepath.split("/")[-1]
    file = my_dandiset.get_asset_by_path(dandi_filepath)
    file.download(f"{download_loc}/{filename}")
    print(f"Downloaded file to {download_loc}/{filename}")

Downloaded file to ./sub_1171903433+sess_1181330601_ecephys.nwb
Downloaded file to ./sub_1174569641+sess_1184671550_ecephys.nwb
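
Since these downloads can be slow, it may be worth checking whether a file is already present locally before fetching it again. A minimal sketch under that assumption is shown here. The helper name `local_destination` is hypothetical, and the example only builds and checks the path without contacting DANDI.

```python
from pathlib import Path, PurePosixPath

def local_destination(dandi_filepath, download_loc="."):
    """Build the local path for an asset, keeping only the final path component."""
    filename = PurePosixPath(dandi_filepath).name
    return Path(download_loc) / filename

dest = local_destination(
    "sub_1171903433/sub_1171903433sess_1181330601/sub_1171903433+sess_1181330601_ecephys.nwb"
)
# True if a previous run already saved this file in the download location
already_downloaded = dest.exists()
```

In the loop above, the download call could be skipped whenever `already_downloaded` is true.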
