In [43]:
from hdmf.common import DynamicTable, VectorData
from hdmf.term_set import TermSet

from pynwb.resources import ExternalResources
from pynwb import NWBFile, NWBHDF5IO
from pynwb import get_type_map as tm
from pynwb.file import Subject

from datetime import datetime
from dateutil import tz
import numpy as np

## Dev Days Note:

To run this notebook, please download the NWB files under the "DynamicTermset and ExternalResources" project, in the "Materials" subsection marked "NWB Files for Tutorials".

# NERD and TermSet QuickStart

The NWB External Resources Data (NERD) data structure supports annotation of NWB data files by linking terms used in the data to external resources, such as ontologies, brain atlases, and persistent digital identifiers. NERD files are external to NWB files, enabling annotation of both new and existing data without requiring modification of existing data. 

This tutorial is a quick introduction to using the `NERD` data structure together with the `TermSet` class. For a detailed guide with more examples, please refer to the [NERD guide](NERD_TermSet_How_to_Guide.ipynb).



![er_img.png](er_img.png)

## NERD Example

In the following example, we highlight that `NERD` is written separately from the `NWBFile`. This allows users to add metadata references to existing files. After loading the file, we can see several places where contextual metadata matters for creating and sharing FAIR data. We can map the experimenter to a digital identifier (e.g., ORCID), the electrode group location to a brain atlas, and the `Subject` species attribute to the NCBI Taxonomy.

Check out the following links to explore [ExternalResources](https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/common/resources.py) and [NWBFile](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/file.py).

In [18]:
with NWBHDF5IO("sub-Haydn_desc-train_ecephys.nwb", "r") as io:
    read_nwbfile = io.read()
read_nwbfile

First, we link the `ExternalResources` instance to the file we want to annotate. This ensures that we correctly track the location of every data object containing terms we want to describe with external references. The same can be accomplished by setting the `file` field, as we will see later on.

In [35]:
er = ExternalResources() 
read_nwbfile.link_resources(er)



To link the experimenter, electrode group location, and subject species to their respective external references, use the `add_ref` method of `ExternalResources`. The user provides:
1. `file`: This parameter is optional if the `ExternalResources` instance has been linked to a file. When linked, the file is used automatically.
2. `container`: This is the NWB object that is either being linked or that stores the attribute being linked. In the case of "experimenter", the container is the object that stores the experimenter attribute.
3. `attribute`: This is an optional field. It is set when the reference is being added for an attribute of an NWB object.
4. `key`: This is the term, as it appears in the data, that the reference describes. For example, the `NWBFile` we loaded has "Hansem Sohn" as the value of experimenter, so that is the value for `key`.
5. `entity_id`: This is the ID of the resource the user wants to use.
6. `entity_uri`: This is the URI of the resource the user wants to use.
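
To make the roles of these parameters concrete, here is a minimal, purely illustrative sketch of the kind of record each `add_ref` call captures. `Reference` and `ToyResources` are hypothetical names invented for this sketch; they are not part of HDMF or PyNWB, and the real `ExternalResources` class stores its data differently (as linked tables of files, objects, keys, and entities).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reference:
    """One annotation: a term used in the data linked to an external entity."""
    object_id: str            # ID of the container that holds the term
    attribute: Optional[str]  # attribute on that container, if any
    key: str                  # the term exactly as it appears in the data
    entity_id: str            # compact ID of the external resource
    entity_uri: str           # resolvable URI of the external resource


@dataclass
class ToyResources:
    """Hypothetical stand-in for ExternalResources; not the HDMF class."""
    references: List[Reference] = field(default_factory=list)

    def add_ref(self, object_id, key, entity_id, entity_uri, attribute=None):
        # Record the link between a data term and an external entity.
        ref = Reference(object_id, attribute, key, entity_id, entity_uri)
        self.references.append(ref)
        return ref

    def entities_for(self, key):
        """Return the entity IDs recorded for a given key."""
        return [r.entity_id for r in self.references if r.key == key]


toy = ToyResources()
toy.add_ref(
    object_id="nwbfile-0001",  # placeholder ID, made up for this sketch
    attribute="experimenter",
    key="Hansem Sohn",
    entity_id="ORCID:0000-0001-8593-7473",
    entity_uri="https://orcid.org/0000-0001-8593-7473",
)
print(toy.entities_for("Hansem Sohn"))
```

The point of the sketch is only that a reference ties together where a term lives (`container` and `attribute`), the term itself (`key`), and what it refers to (`entity_id` and `entity_uri`).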

### NWBFile Experimenter

In [36]:
er.add_ref(
    container=read_nwbfile,
    attribute="experimenter",
    key="Hansem Sohn",
    entity_id='ORCID:0000-0001-8593-7473', 
    entity_uri='https://orcid.org/0000-0001-8593-7473')

(<hdmf.common.resources.Key at 0x123a48160>,
 <hdmf.common.resources.Entity at 0x123a49b70>)

### ElectrodeGroup Location

In [37]:
er.add_ref(
    container=read_nwbfile.electrode_groups['electrode_group_1'],
    attribute="location",
    key="Dorsomedial frontal cortex",
    entity_id="DB09", 
    entity_uri="https://scalablebrainatlas.incf.org/macaque/DB09")

(<hdmf.common.resources.Key at 0x123a48880>,
 <hdmf.common.resources.Entity at 0x123a48a30>)

### Subject Species

In [38]:
er.add_ref(
    container=read_nwbfile.subject,
    attribute='species',
    key='Macaca mulatta',
    entity_id='NCBI_TAXON:9544',
    entity_uri='https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/id=9544')

(<hdmf.common.resources.Key at 0x123a4ac20>,
 <hdmf.common.resources.Entity at 0x123a4b430>)

We can see that the linked `ExternalResources` instance has been populated.

In [39]:
read_nwbfile.get_linked_resources()

We can visualize `ExternalResources` as a single table:

In [40]:
df=read_nwbfile.get_linked_resources().to_dataframe()
df

| | file_object_id | objects_idx | object_id | files_idx | object_type | relative_path | field | keys_idx | key | entities_idx | entity_id | entity_uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9c3a5c45-316c-493d-a712-03a01b662ee9 | 0 | 9c3a5c45-316c-493d-a712-03a01b662ee9 | 0 | NWBFile | general/experimenter | | 0 | Hansem Sohn | 0 | ORCID:0000-0001-8593-7473 | https://orcid.org/0000-0001-8593-7473 |
| 1 | 9c3a5c45-316c-493d-a712-03a01b662ee9 | 1 | f8641805-f93c-446f-8194-5fce08d22dbb | 0 | ElectrodeGroup | location | | 1 | Dorsomedial frontal cortex | 1 | DB09 | https://scalablebrainatlas.incf.org/macaque/DB09 |
| 2 | 9c3a5c45-316c-493d-a712-03a01b662ee9 | 2 | 5ee39486-8625-4ac3-9691-ce9d724812a4 | 0 | Subject | species | | 2 | Macaca mulatta | 2 | NCBI_TAXON:9544 | https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/... |


As mentioned earlier, `NERD` and the `NWBFile` are written separately.

In [None]:
# Write the annotated NWBFile; the NERD data is saved separately in the next cell.
with NWBHDF5IO("NWBfile_ER_Example.nwb", "w") as io:
    io.write(read_nwbfile)

In [None]:
er.to_norm_tsv(path='./')

To see the various query methods and the explicit set of rules within `NERD`, please refer to the [NERD guide](NERD_TermSet_How_to_Guide.ipynb).

## NERD Example with TermSet

`TermSet` allows users to create their own subset of terms with ontological references and is built upon the resources from LinkML.

Use Cases:
1. Validation of data. Currently, validation with a `TermSet` is only supported for `Data`, but we are discussing ways to expand this to any attribute, e.g., experimenters. 
2. `TermSet` streamlines the user experience for adding new references to `ExternalResources` using `add_ref_term_set`.

To see how to create a [TermSet](https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/term_set.py), refer to the [NERD guide](NERD_TermSet_How_to_Guide.ipynb) and to these relevant [LinkML resources](https://linkml.io/linkml/intro/tutorial06.html).

<img src="experimenters_schema.png" width=600 align='left'>

In this example, we will create a brand new `NWBFile` that stores a `DynamicTable` of species data. We create a new column, i.e., a new instance of `VectorData`, that uses the optional `term_set` field. When a `TermSet` is provided, the data is validated against that set of terms.

For more details on how we handle validation with a `TermSet` please refer to the [NERD guide](NERD_TermSet_How_to_Guide.ipynb).
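
The cell below loads `./experimenter_term_set.yaml`, whose contents are shown only as an image above. As a rough guide to what such a file can look like, the sketch below writes a small LinkML-style schema to a temporary directory. Treat it as an assumption: the schema `id`, `name`, `version`, and the enum name `Experimenters` are made up for illustration, and the real file used in this notebook may differ. The two ORCID identifiers are taken from the output table further down.

```python
from pathlib import Path
import tempfile

# Hypothetical term set modeled on a LinkML enum; the real
# experimenter_term_set.yaml used in this notebook may differ.
SCHEMA = """\
id: termset/experimenter_example
name: Experimenter
version: 0.0.1
prefixes:
  ORCID: https://orcid.org/
imports:
  - linkml:types
default_range: string

enums:
  Experimenters:
    permissible_values:
      "Dichter, Benjamin K.":
        meaning: ORCID:0000-0001-5725-6910
      "Rubel, Oliver":
        meaning: ORCID:0000-0001-9902-1984
"""

# Write to a scratch directory so the notebook's own files are not touched.
out_dir = Path(tempfile.mkdtemp())
schema_path = out_dir / "experimenter_term_set_sketch.yaml"
schema_path.write_text(SCHEMA)
print(schema_path.read_text())
```

Each permissible value is a term as it appears in the data, and its `meaning` is a compact ID whose prefix expands to a full URI through the `prefixes` section.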

In [44]:
terms = TermSet(term_schema_path='./experimenter_term_set.yaml')
er = ExternalResources() 



In [45]:
session_start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz("US/Pacific"))

nwbfile = NWBFile(
    session_description="Mouse exploring an open field",  
    identifier="Mouse5_Day3",  
    session_start_time=session_start_time, 
    session_id="session_1234",  
    experimenter=["Dichter, Benjamin K.", "Rubel, Oliver"], 
    lab="My Lab Name", 
    institution="University of My Institution", 
    related_publications="DOI:10.1016/j.neuron.2016.12.011",  
)
nwbfile.subject = Subject(
    subject_id="001",
    age="P90D",
    description="mouse 5",
    species="Mus musculus",
    sex="M",
)

In [46]:
nwbfile.get_linked_resources()

As mentioned earlier, the `add_ref_term_set` method streamlines the original `add_ref` method. The `key` field is dropped because the data values themselves are used as keys, and the `entity_id` and `entity_uri` fields are populated from the values in the `TermSet`. If the user has linked the `NWBFile` to the `ExternalResources` instance, as in the previous example, the call is simpler still: besides the `term_set`, it requires only the `container` and possibly an `attribute`.

In [47]:
er.add_ref_term_set(container=nwbfile,
                    attribute='experimenter',
                    term_set=terms
                   ) 

True

In [48]:
er.to_dataframe()

| | file_object_id | objects_idx | object_id | files_idx | object_type | relative_path | field | keys_idx | key | entities_idx | entity_id | entity_uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7cc452e3-925d-489d-9d64-01c6b227a906 | 0 | 7cc452e3-925d-489d-9d64-01c6b227a906 | 0 | NWBFile | general/experimenter | | 0 | Dichter, Benjamin K. | 0 | ORCID:0000-0001-5725-6910 | https://orcid.org/0000-0001-5725-6910 |
| 1 | 7cc452e3-925d-489d-9d64-01c6b227a906 | 0 | 7cc452e3-925d-489d-9d64-01c6b227a906 | 0 | NWBFile | general/experimenter | | 1 | Rubel, Oliver | 1 | ORCID:0000-0001-9902-1984 | https://orcid.org/0000-0001-9902-1984 |
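
The way each experimenter name ends up paired with an ORCID can be pictured with the following plain-Python sketch. It is a conceptual illustration only, not HDMF code: `PREFIXES`, `TERMS`, `expand`, and `resolve` are hypothetical names, and how the real `add_ref_term_set` handles terms missing from the `TermSet` may differ from the error raised here.

```python
# Hypothetical, simplified stand-in for a TermSet: term -> compact ID.
PREFIXES = {"ORCID": "https://orcid.org/"}
TERMS = {
    "Dichter, Benjamin K.": "ORCID:0000-0001-5725-6910",
    "Rubel, Oliver": "ORCID:0000-0001-9902-1984",
}


def expand(curie):
    """Turn a compact ID such as 'ORCID:0000-...' into a full URI."""
    prefix, local_id = curie.split(":", 1)
    return PREFIXES[prefix] + local_id


def resolve(values):
    """Map each data value to (key, entity_id, entity_uri).

    This sketch chooses to raise on unknown terms; that is an assumption,
    not a description of HDMF's behavior.
    """
    missing = [v for v in values if v not in TERMS]
    if missing:
        raise ValueError(f"Terms not in the term set: {missing}")
    return [(v, TERMS[v], expand(TERMS[v])) for v in values]


rows = resolve(["Dichter, Benjamin K.", "Rubel, Oliver"])
for row in rows:
    print(row)
```

Because the data values double as keys and the term set supplies the IDs and URIs, the caller no longer passes `key`, `entity_id`, or `entity_uri` by hand.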


In [None]:
schema_path = 'tests/unit/example_dynamic_term_set.yaml'
termset = TermSet(term_schema_path=schema_path, dynamic=True)

In [None]:
termset = TermSet(schemasheets_folder=folder)