## Generate a JSON events sidecar template from a BIDS dataset

The general strategy for machine-actionable annotation using HED in BIDS is
to create a single `events.json` sidecar file in the BIDS dataset root directory.
Ideally, this sidecar will contain all the annotations needed for users to
understand and analyze the data.
(See the [**BIDS annotation quickstart**](https://hed-examples.readthedocs.io/en/latest/BidsAnnotationQuickstart.html)
for additional information on this strategy.)

This notebook shows how to create a JSON sidecar template from the information in all
the event files in a BIDS dataset.
The process first constructs a dictionary of the event files in the dataset
and then consolidates their contents to produce the sidecar.

The dictionary keys are specified by an `entities` tuple that lists the BIDS entity names
to include in the key.
BIDS base file names consist of entity *name*-*value* pairs separated
by underscores and followed by an ending *_suffix*.

For the file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple `('sub', 'task')` gives a key of `sub-001_task-target`,
while the tuple `('sub', 'ses', 'run')` gives a key of `sub-001_ses-3_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
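
The key construction described above can be sketched in plain Python. This is only an illustration: `make_key` is a hypothetical helper, not part of the HED tools, and it assumes that every underscore-separated piece containing a hyphen is an entity name-value pair.

```python
import os


def make_key(file_path, entities):
    """Build a key from selected entity name-value pairs of a BIDS file name.

    Hypothetical helper for illustration only; not part of the HED tools.
    """
    base = os.path.basename(file_path)
    stem = base.split('.', 1)[0]          # drop the extension
    pieces = stem.split('_')              # entity pairs plus the trailing suffix
    # Pieces with a hyphen are name-value pairs; the suffix has no hyphen.
    entity_dict = dict(piece.split('-', 1) for piece in pieces if '-' in piece)
    # Keep only the requested entities, in the order given by the tuple.
    return '_'.join(f"{name}-{entity_dict[name]}"
                    for name in entities if name in entity_dict)


file_name = 'sub-001_ses-3_task-target_run-01_events.tsv'
print(make_key(file_name, ('sub', 'task')))
print(make_key(file_name, ('sub', 'ses', 'run')))
```

In this sketch, entities that are missing from a file name are silently skipped, so files without a `ses` entity still receive a key.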

To use this notebook, substitute the specifics of your BIDS
dataset for the following variables:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing the list of event files. |
| entities  | Tuple of entity names used to construct unique keys representing filenames.<br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the key.)|
| skip_columns  |  List of column names in the `events.tsv` files to skip in the analysis. |
| value_columns | List of column names in the `events.tsv` files to annotate<br>as a whole rather than by individual column value. |


For large datasets, be sure to list columns such as `onset` and `sample`
in `skip_columns`. The summary counts the number of times each unique value
appears in the dataset's event files, and such columns hold a different
value in nearly every row.

When run, the script creates a dictionary of the unique values in each column
by consolidating the information in all of the `events.tsv` files in the dataset.
It then outputs the result as a JSON string representing a JSON sidecar.
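
To make the consolidation step concrete, here is a minimal sketch using only the standard library. It is not the `TabularSummary` implementation: `count_column_values` and `make_categorical_template` are hypothetical helpers, the two in-memory TSV strings are made-up stand-ins for real event files, and the template layout simply mimics the `Description`/`HED`/`Levels` structure of the output shown in this notebook.

```python
import csv
import io
import json
from collections import Counter, defaultdict


def count_column_values(tsv_texts, skip_columns=()):
    """Count how often each unique value appears in each column (hypothetical helper)."""
    counts = defaultdict(Counter)
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter='\t')
        for row in reader:
            for column, value in row.items():
                if column not in skip_columns:
                    counts[column][value] += 1
    return counts


def make_categorical_template(counts):
    """Build a sidecar-like template with one entry per unique value (assumed layout)."""
    template = {}
    for column in sorted(counts):
        values = sorted(counts[column])
        template[column] = {
            "Description": f"Description for {column}",
            "HED": {v: f"(Label/{column}, Label/{v})" for v in values},
            "Levels": {v: f"Description for {v} of {column}" for v in values},
        }
    return template


# Two tiny in-memory stand-ins for events.tsv files (made-up data)
file_1 = "onset\tduration\ttrial_type\n1.0\t0.5\thit\n2.0\t0.5\tmiss\n"
file_2 = "onset\tduration\ttrial_type\n1.5\t0.5\thit\n"

counts = count_column_values([file_1, file_2], skip_columns=("onset", "duration"))
template = make_categorical_template(counts)
print(json.dumps(template, indent=4))
```

Skipping `onset` and `duration` here keeps the counts dictionary small; without the skip list, every distinct onset time would become its own entry in the template.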

The example below uses a
[small version](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
of the Wakeman-Henson face-processing dataset available on OpenNeuro as
[ds003645](https://openneuro.org/datasets/ds003645/versions/2.0.0).


In [1]:
import json
from hed.tools import BidsTabularDictionary, TabularSummary, get_file_list

# Variables to set for the specific dataset
bids_root_path = 'Q:/PerceptionalON'
name = 'Perceptional'
exclude_dirs = ['stimuli', 'derivatives', 'sub-patient']
entities = ('sub', 'ses', 'task', 'run')
skip_columns = ["onset", "duration", "event_sample"]
value_columns = ["response_time", "response_time2", "stimamp", "stimon", "confidence"]

# Construct the event file dictionary for the BIDS event files
event_files = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events", exclude_dirs=exclude_dirs)
file_dict = BidsTabularDictionary(name, event_files, entities=entities)

# Summarize the unique values in each column across all event files
value_summary = TabularSummary(value_cols=value_columns, skip_cols=skip_columns, name="Wakeman-Hanson test data")
value_summary.update(event_files)

# Generate a sidecar template for the dataset, print it, and save it to a file
sidecar_template = value_summary.extract_sidecar_template()
str_json = json.dumps(sidecar_template, indent=4)
print(str_json)
with open('d:/perception.json', 'w') as fp:
    json.dump(sidecar_template, fp, indent=4)

{
    "trial_type": {
        "Description": "Description for trial_type",
        "HED": {
            "conf": "(Label/trial_type, Label/conf)",
            "conf-resp": "(Label/trial_type, Label/conf-resp)",
            "cr": "(Label/trial_type, Label/cr)",
            "fa": "(Label/trial_type, Label/fa)",
            "hit": "(Label/trial_type, Label/hit)",
            "miss": "(Label/trial_type, Label/miss)",
            "stim-adapt": "(Label/trial_type, Label/stim-adapt)",
            "stim-thr": "(Label/trial_type, Label/stim-thr)"
        },
        "Levels": {
            "conf": "Description for conf of trial_type",
            "conf-resp": "Description for conf-resp of trial_type",
            "cr": "Description for cr of trial_type",
            "fa": "Description for fa of trial_type",
            "hit": "Description for hit of trial_type",
            "miss": "Description for miss of trial_type",
            "stim-adapt": "Description for stim-adapt of trial_type",
        