In [6]:
from hdmf.common import DynamicTable, VectorData
from hdmf.term_set import TermSet

from pynwb.resources import ExternalResources
from pynwb import NWBFile, NWBHDF5IO
from pynwb import get_type_map as tm
from pynwb.file import Subject

from datetime import datetime
from dateutil import tz
import numpy as np

# NERD and TermSet QuickStart

This tutorial is a quick introduction to using the `NERD` data structure together with the `TermSet` class. For a detailed guide that covers more examples as well as the rules of `NERD`, please refer to the [NERD guide](NERD_TermSet_How_to_Guide.ipynb).

The core of `NERD` is the `ExternalResources` class, which provides a way
to organize and map user terms from their data (keys) to multiple entities
from external resources. A typical use case is to link data stored in datasets or attributes within an `NWBFile` to ontologies and digital identifiers.
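
As a rough mental model only (this is not how `ExternalResources` is implemented), the key-to-entity mapping can be pictured with plain Python containers. The `Entity` dataclass, the `refs` dictionary, and the second `EXAMPLE:` entity below are hypothetical names introduced purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for one external reference (ID plus URI)."""
    entity_id: str
    entity_uri: str


# Each key (a term taken from the data) maps to a list of entities,
# so one term may point at more than one external resource.
refs = defaultdict(list)

refs["Macaca mulatta"].append(
    Entity(
        "NCBI_TAXON:9544",
        "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=info&id=9544",
    )
)
# Made-up second entity, only to show the one-to-many shape.
refs["Macaca mulatta"].append(
    Entity("EXAMPLE:0001", "https://example.org/terms/0001")
)

for key, entities in refs.items():
    print(key, "->", [e.entity_id for e in entities])
```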


## NERD Example

In the following example, we highlight that `NERD` is written separately from the `NWBFile`, which allows users to add metadata references to existing files. After loading the file, we can see several cases where contextual metadata matters for creating and sharing FAIR data: the experimenter can be mapped to a digital identifier such as an ORCID, the electrode group location can be mapped to a brain atlas, and the `Subject` species attribute can be mapped to the NCBI Taxonomy.

In [7]:
with NWBHDF5IO("sub-Haydn_desc-train_ecephys.nwb", "r") as io:
    read_nwbfile = io.read()
read_nwbfile

We recommend linking the `ExternalResources` instance to the file so that `ExternalResources` can correctly track which files contain the objects that have external references.

In [8]:
er = ExternalResources() 
read_nwbfile.link_resources(er)

  warn(_exp_warn_msg(cls))


To link the experimenter, electrode group location, and subject species to their respective external references, use the `add_ref` method. The user provides:
1. file: Optional if the `ExternalResources` instance has been linked to a file. When linked, that file is used automatically.
2. container: The NWB object that is being linked, or the object that stores the attribute being linked. In the case of "experimenter", the container is the object that stores the experimenter attribute.
3. attribute: Optional. Set it when the reference is being added to an attribute of an NWB object.
4. key: The term from the data that the reference represents. For example, the `NWBFile` we loaded has "Hansem Sohn" as the value of experimenter, so that is the value for key.
5. entity_id: The ID of the resource the user wants to use.
6. entity_uri: The URI of the resource the user wants to use.
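
The `entity_id` and `entity_uri` values tend to follow a fixed pattern per resource. As a small convenience sketch, the hypothetical helpers below (`orcid_entity` and `ncbi_taxon_entity` are not part of HDMF or PyNWB) build both fields from a bare identifier, following the ORCID and NCBI Taxonomy patterns used in this tutorial.

```python
def orcid_entity(orcid: str) -> dict:
    """Build entity fields from a bare ORCID iD such as '0000-0001-8593-7473'."""
    return {
        "entity_id": f"ORCID:{orcid}",
        "entity_uri": f"https://orcid.org/{orcid}",
    }


def ncbi_taxon_entity(taxon_id: int) -> dict:
    """Build entity fields from an NCBI Taxonomy ID such as 9544."""
    base = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi"
    return {
        "entity_id": f"NCBI_TAXON:{taxon_id}",
        "entity_uri": f"{base}?mode=info&id={taxon_id}",
    }


experimenter_ref = orcid_entity("0000-0001-8593-7473")
species_ref = ncbi_taxon_entity(9544)
print(experimenter_ref)
print(species_ref)
```

The resulting dictionary could then be unpacked into the call, for example `er.add_ref(container=read_nwbfile, attribute="experimenter", key="Hansem Sohn", **experimenter_ref)`.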

### NWBFile Experimenter

In [9]:
er.add_ref(
    container=read_nwbfile,
    attribute="experimenter",
    key="Hansem Sohn",
    entity_id='ORCID:0000-0001-8593-7473', 
    entity_uri='https://orcid.org/0000-0001-8593-7473')

(<hdmf.common.resources.Key at 0x1247f4eb0>,
 <hdmf.common.resources.Entity at 0x12506b460>)

### ElectrodeGroup Location

In [10]:
er.add_ref(
    container=read_nwbfile.electrode_groups['electrode_group_1'],
    attribute="location",
    key="Dorsomedial frontal cortex",
    entity_id="ID", 
    entity_uri="URI", 
)

(<hdmf.common.resources.Key at 0x124edcb20>,
 <hdmf.common.resources.Entity at 0x124edcd00>)

### Subject Species

In [11]:
er.add_ref(
    container=read_nwbfile.subject,
    attribute='species',
    key='Macaca mulatta',
    entity_id='NCBI_TAXON:9544',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=info&id=9544'
)

(<hdmf.common.resources.Key at 0x124edc9a0>,
 <hdmf.common.resources.Entity at 0x124edc8e0>)

We can see that the linked `ExternalResources` instance has been populated.

In [12]:
read_nwbfile.get_linked_resources()

We can visualize `ExternalResources` as a single table:

In [15]:
df = read_nwbfile.get_linked_resources().to_dataframe()
df

Unnamed: 0,file_object_id,objects_idx,object_id,files_idx,object_type,relative_path,field,keys_idx,key,entities_idx,entity_id,entity_uri
0,9c3a5c45-316c-493d-a712-03a01b662ee9,0,9c3a5c45-316c-493d-a712-03a01b662ee9,0,NWBFile,general/experimenter,,0,Hansem Sohn,0,ORCID:0000-0001-8593-7473,https://orcid.org/0000-0001-8593-7473
1,9c3a5c45-316c-493d-a712-03a01b662ee9,1,f8641805-f93c-446f-8194-5fce08d22dbb,0,ElectrodeGroup,location,,1,Dorsomedial frontal cortex,1,ID,URI
2,9c3a5c45-316c-493d-a712-03a01b662ee9,2,5ee39486-8625-4ac3-9691-ce9d724812a4,0,Subject,species,,2,Macaca mulatta,2,NCBI_TAXON:9544,https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...


As mentioned earlier, `NERD` and the `NWBFile` are written separately.

In [None]:
with NWBHDF5IO("NWBfile_ER_Example.nwb", "w") as io:
    io.write(nwbfile)

In [None]:
er.to_norm_tsv(path='./')

## NERD Example with TermSet

`TermSet` allows users to create their own subset of ontological references and is built upon the resources from LinkML.

Use Cases:
1. Validation of data. Currently, validation with a `TermSet` is only supported for `Data`, but we are discussing extending it to other fields, e.g., experimenters.
2. `TermSet` streamlines the user experience for adding new references to `ExternalResources` using `add_ref_term_set`.

To see how to create a `TermSet`, refer to the [NERD guide](NERD_TermSet_How_to_Guide.ipynb) and these relevant [LinkML resources](https://linkml.io/linkml/intro/tutorial06.html).

![title](taxon.png)

In this example, we will create a brand new `NWBFile` that stores a `DynamicTable` of species data. We create a new column, i.e., a new instance of `VectorData`, that uses the optional `term_set` field. When a `TermSet` is provided, the data is validated against that set of terms.

For more details on how we handle validation with a `TermSet` please refer to the [NERD guide](NERD_TermSet_How_to_Guide.ipynb).
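
To give a feel for what validating data against a set of terms means, here is a minimal sketch in plain Python. It is not HDMF's validation logic: `allowed_terms` and `find_invalid` are hypothetical names, and the set is assumed to contain only the two species used later in this example.

```python
# Assumed contents of the term set: the permissible species names.
allowed_terms = {"Homo sapiens", "Ursus arctos horribilis"}


def find_invalid(values, allowed):
    """Return the values that are not permissible terms, in input order."""
    return [value for value in values if value not in allowed]


good = ["Homo sapiens", "Ursus arctos horribilis"]
bad = ["Homo sapiens", "Mus musculus"]

print(find_invalid(good, allowed_terms))  # every value is permissible
print(find_invalid(bad, allowed_terms))   # 'Mus musculus' is not in the set
```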

In [16]:
terms = TermSet(term_schema_path='./species_term_set.yaml')

In [17]:
session_start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz("US/Pacific"))

nwbfile = NWBFile(
    session_description="Mouse exploring an open field",  # required
    identifier="Mouse5_Day3",  # required
    session_start_time=session_start_time,  # required
    session_id="session_1234",  # optional
    experimenter=["Dichter, Benjamin K.", "Smith, Alex"],  # optional
    lab="My Lab Name",  # optional
    institution="University of My Institution",  # optional
    related_publications="DOI:10.1016/j.neuron.2016.12.011",  # optional
)

In [18]:
col1 = VectorData(
    name='Species_Data',
    description='species from NCBI and Ensembl',
    data=['Homo sapiens', 'Ursus arctos horribilis'],
    term_set=terms,
)

species = DynamicTable(name='species', description='My species', columns=[col1])


The `add_ref_term_set` method streamlines the original `add_ref` method. The `key` field is removed because the data values themselves are used as keys, and the `entity_id` and `entity_uri` fields are populated from the values in the `TermSet`. If the user linked the `NWBFile` to the `ExternalResources` instance as in the prior example, the call is streamlined further, requiring only the `container` and possibly an `attribute`.

In [19]:
er.add_ref_term_set(
    file=nwbfile,
    container=species,
    attribute='Species_Data',
)

True
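
Conceptually, a call like the one above does something similar to the loop below. This is a plain-Python sketch under stated assumptions, not the library's code: `term_lookup` stands in for the `TermSet`, `refs_from_terms` is a hypothetical helper, and only the entity IDs are filled in (the URIs are left out here).

```python
# Stand-in for a TermSet: each permissible value maps to its entity ID.
term_lookup = {
    "Homo sapiens": "NCBI_TAXON:9606",
    "Ursus arctos horribilis": "NCBI_TAXON:116960",
}


def refs_from_terms(data, lookup):
    """Pair each data value (used as the key) with its entity ID."""
    return [(value, lookup[value]) for value in data if value in lookup]


column_data = ["Homo sapiens", "Ursus arctos horribilis"]
for key, entity_id in refs_from_terms(column_data, term_lookup):
    print(f"key={key!r} entity_id={entity_id!r}")
```

Because the keys come straight from the column data, no separate `key` argument is needed, which is the simplification `add_ref_term_set` offers over `add_ref`.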

In [20]:
er.to_dataframe()

Unnamed: 0,file_object_id,objects_idx,object_id,files_idx,object_type,relative_path,field,keys_idx,key,entities_idx,entity_id,entity_uri
0,9c3a5c45-316c-493d-a712-03a01b662ee9,0,9c3a5c45-316c-493d-a712-03a01b662ee9,0,NWBFile,general/experimenter,,0,Hansem Sohn,0,ORCID:0000-0001-8593-7473,https://orcid.org/0000-0001-8593-7473
1,9c3a5c45-316c-493d-a712-03a01b662ee9,1,f8641805-f93c-446f-8194-5fce08d22dbb,0,ElectrodeGroup,location,,1,Dorsomedial frontal cortex,1,ID,URI
2,9c3a5c45-316c-493d-a712-03a01b662ee9,2,5ee39486-8625-4ac3-9691-ce9d724812a4,0,Subject,species,,2,Macaca mulatta,2,NCBI_TAXON:9544,https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
3,d8b53b75-c1cd-4ebd-bc2c-8067ed91438a,3,1966081e-cf7b-4137-8f57-0319dfa31355,1,VectorData,,,3,Homo sapiens,3,NCBI_TAXON:9606,https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
4,d8b53b75-c1cd-4ebd-bc2c-8067ed91438a,3,1966081e-cf7b-4137-8f57-0319dfa31355,1,VectorData,,,4,Ursus arctos horribilis,4,NCBI_TAXON:116960,https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/...
