## Initial summary of event files

**Dataset:** BCIT RSVP Expertise (in progress)

This script produces a preliminary summary of the contents of the events files.
The summary includes the column names of each events file so
that they can be checked manually for differences.
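
Checking column names by eye amounts to grouping files by their header row. The following is a minimal standard-library sketch of that idea. `group_by_columns` is a hypothetical helper written for illustration and is not part of hedtools. It assumes tab-separated files whose first row is the header.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_columns(paths):
    """Group TSV files by their header row (a tuple of column names).

    Hypothetical helper for illustration; not part of hedtools.
    """
    groups = defaultdict(list)
    for path in map(Path, paths):
        with path.open(newline="") as fp:
            # The first row of a BIDS-style TSV holds the column names.
            header = next(csv.reader(fp, delimiter="\t"), [])
        groups[tuple(header)].append(path.name)
    return dict(groups)


# Demo: write three small events files, one of which has a different column set.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub-01_events.tsv").write_text("onset\tduration\ttrial_type\n0.5\t0\tgo\n")
    (root / "sub-02_events.tsv").write_text("onset\tduration\ttrial_type\n1.0\t0\tstop\n")
    (root / "sub-03_events.tsv").write_text("onset\tduration\tvalue\n2.0\t0\t7\n")
    groups = group_by_columns(sorted(root.glob("*_events.tsv")))

for columns, names in groups.items():
    print(columns, names)
```

More than one key means the files do not all share the same columns. The keys themselves show which columns differ.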

The script assumes that the data is in BIDS format and that each BIDS events
file (suffix `_events.tsv`) has a corresponding events file with
suffix `_eventstemp.tsv` that was previously exported from the `EEG.set` files.
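
This pairing assumption can be checked before any summary is run. The sketch below assumes that an `_events.tsv` file and its `_eventstemp.tsv` counterpart share the same base name up to the suffix. `pair_event_files` is a hypothetical helper for illustration only.

```python
from pathlib import Path

EVENTS_SUFFIX = "_events.tsv"
TEMP_SUFFIX = "_eventstemp.tsv"


def pair_event_files(paths):
    """Match each *_events.tsv file with the *_eventstemp.tsv file of the same base name.

    Returns (pairs, missing_temp, missing_events), all keyed by base file name.
    Hypothetical helper; assumes paired files differ only in their suffix.
    """
    events, temps = {}, {}
    for path in map(Path, paths):
        if path.name.endswith(TEMP_SUFFIX):
            temps[path.name[: -len(TEMP_SUFFIX)]] = path
        elif path.name.endswith(EVENTS_SUFFIX):
            events[path.name[: -len(EVENTS_SUFFIX)]] = path
    pairs = {base: (events[base], temps[base]) for base in events.keys() & temps.keys()}
    missing_temp = sorted(events.keys() - temps.keys())    # events file with no temp file
    missing_events = sorted(temps.keys() - events.keys())  # temp file with no events file
    return pairs, missing_temp, missing_events


pairs, missing_temp, missing_events = pair_event_files([
    "sub-01/eeg/sub-01_task-x_events.tsv",
    "sub-01/eeg/sub-01_task-x_eventstemp.tsv",
    "sub-02/eeg/sub-02_task-x_events.tsv",
])
print(sorted(pairs))    # ['sub-01_task-x']
print(missing_temp)     # ['sub-02_task-x']
print(missing_events)   # []
```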

Keys are specified by an `entities` tuple listing the BIDS entity names
to include in the key.
BIDS base file names are constructed of entity *name*-*value* pairs separated
by underscores and followed by an ending *_suffix*.

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple `('sub', 'task')` gives a key of `sub-001_task-target`,
while the tuple `('sub', 'ses', 'run')` gives a key of `sub-001_ses-3_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
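
The key construction described above can be sketched in a few lines. `parse_bids_name` and `make_key` are hypothetical helpers written for illustration; they are not the hedtools implementation. They assume a well-formed name in which every entity contains a `-` and the suffix follows the last underscore. They also silently skip any requested entity that the file name lacks, which is an assumption of this sketch.

```python
def parse_bids_name(filename):
    """Split a BIDS file name into ({entity: value}, suffix, extension)."""
    stem, dot, ext = filename.partition(".")
    *pairs, suffix = stem.split("_")          # last underscore-separated piece is the suffix
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, dot + ext


def make_key(filename, entities):
    """Join the requested entity name-value pairs, in the order given."""
    values, _, _ = parse_bids_name(filename)
    return "_".join(f"{name}-{values[name]}" for name in entities if name in values)


name = "sub-001_ses-3_task-target_run-01_events.tsv"
print(make_key(name, ("sub", "task")))         # sub-001_task-target
print(make_key(name, ("sub", "ses", "run")))   # sub-001_ses-3_run-01
```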

The setup requires setting the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| entities  | Tuple of entity names used to construct unique keys representing filenames. <br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the keys.)|
| bids_skip_columns  |  List of column names in the `events.tsv` files to skip in the analysis. |
| eeg_skip_columns  | List of column names in the `eventstemp.tsv` files (from `EEG.event`) to skip in the analysis.|
| log_name | Name of the log file (saved in the `code/curation_logs` subdirectory). |
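
A likely reason for the skip lists is that some columns, such as `onset` or `latency`, take a different value on nearly every row. Tallying their values therefore adds little to a summary. Below is a minimal sketch of a per-column value count that ignores listed columns. `summarize_columns` is hypothetical, and its output format is not that of `TabularSummary`.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_columns(tsv_text, skip_cols=()):
    """Count the distinct values in each column of a TSV string, ignoring skip_cols.

    Hypothetical helper; illustrates the idea of skip columns only.
    """
    counts = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for column, value in row.items():
            if column not in skip_cols:
                counts[column][value] += 1
    return {column: dict(counter) for column, counter in counts.items()}


sample = "onset\tduration\ttrial_type\n0.5\t0\tgo\n1.2\t0\tstop\n2.0\t0\tgo\n"
print(summarize_columns(sample, skip_cols=["onset"]))
# {'duration': {'0': 3}, 'trial_type': {'go': 2, 'stop': 1}}
```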

```python
import os
import datetime
from hed.tools import BidsTabularDictionary, get_file_list, HedLogger, TabularSummary

# Variables to set for the specific dataset
bids_root_path = '/XXX/RSVPExpertiseWorking'
exclude_dirs = ['sourcedata', 'stimuli', 'code']
entities = ('sub', 'ses', 'run')
bids_skip_columns = ['onset']
eeg_skip_columns = ['latency', 'urevent', 'imageid', 'gid', 'buttonpressduration', 'reactiontime',
                    'luminance', 'tgtsize', 'tgtdistfromcenter', 'usertags']
log_name = 'bcit_rsvp_expertise_01_initial_summary_log'

# Set up the logger
log_file_name = f"code/curation_logs/{log_name}.txt"
logger = HedLogger(name=log_name)

# Construct the event file dictionaries for the BIDS and for EEG.event files
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events", exclude_dirs=exclude_dirs)
bids_dict = BidsTabularDictionary("Bids event files", files_bids, entities=entities)
files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_eventstemp", exclude_dirs=exclude_dirs)
eeg_dict = BidsTabularDictionary("EEG event files", files_eeg, entities=entities)

# Output a list of files for the two cases
print(f"\n{bids_dict.report_diffs(eeg_dict, logger)}\n\n")

# Create summary dictionaries of the original BIDS events files and output column names
bids_sum_all, bids_sum = TabularSummary.make_combined_dicts(bids_dict, skip_cols=bids_skip_columns)
print(f"\nSummary of all BIDS events files:\n{bids_sum_all}")

# Create summary dictionaries of the EEG.set events files and output column names
eeg_sum_all, eeg_sum = TabularSummary.make_combined_dicts(eeg_dict, skip_cols=eeg_skip_columns)
print(f"\nSummary of all EEG.set events files:\n{eeg_sum_all}")

# Output and save the log
log_string = "\n\nLog output:\n" + logger.get_log_string()
error_string = "\n\nERROR Summary:\n" + logger.get_log_string(level="ERROR")
print(log_string)
print(error_string)

save_path = os.path.join(bids_root_path, log_file_name)
os.makedirs(os.path.dirname(save_path), exist_ok=True)  # create code/curation_logs if it does not exist
with open(save_path, "w") as fp:
    fp.write(f"{log_file_name} {datetime.datetime.now()}\n")
    fp.write(f"\n{bids_sum_all}\n")
    fp.write(f"\n{eeg_sum_all}\n")
    fp.write(log_string)
    fp.write(error_string)
```


Bids event files has 59 event files
EEG event files has 59 event files

Bids event files event files (59 files)
sub-01_ses-01_run-1: sub-01_ses-01_task-RSVPObjectRestBlink_run-1_events.tsv
sub-01_ses-02_run-1: sub-01_ses-02_task-RSVPObjectRestBlink_run-1_events.tsv
sub-01_ses-03_run-1: sub-01_ses-03_task-RSVPObjectRestBlink_run-1_events.tsv
sub-01_ses-04_run-1: sub-01_ses-04_task-RSVPObjectRestBlink_run-1_events.tsv
sub-01_ses-05_run-1: sub-01_ses-05_task-RSVPObjectRestBlink_run-1_events.tsv
sub-02_ses-01_run-1: sub-02_ses-01_task-RSVPObjectRestBlink_run-1_events.tsv
sub-02_ses-02_run-1: sub-02_ses-02_task-RSVPObjectRestBlink_run-1_events.tsv
sub-02_ses-03_run-1: sub-02_ses-03_task-RSVPObjectRestBlink_run-1_events.tsv
sub-02_ses-04_run-1: sub-02_ses-04_task-RSVPObjectRestBlink_run-1_events.tsv
sub-02_ses-05_run-1: sub-02_ses-05_task-RSVPObjectRestBlink_run-1_events.tsv
sub-03_ses-01_run-1: sub-03_ses-01_task-RSVPObjectRestBlink_run-1_events.tsv
sub-03_ses-01_run-2: sub-03_ses-01_task-