## Checks that the event codes are consistent with conditions

This script checks that event codes are consistent with condition codes. It runs after the `attention_shift_02_initial_combination.ipynb` notebook has produced the initial `_eventstemp1.tsv` files.
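The check below reads every `_eventstemp1.tsv` file under the dataset root. To preview which files it will cover, a plain directory walk is enough. The following is a minimal standard-library sketch that assumes the same excluded directories as the notebook code. `find_temp_event_files` is a hypothetical helper and is not how the HED tools' `get_file_list` is implemented.

```python
import os


def find_temp_event_files(root, suffix="_eventstemp1.tsv",
                          exclude_dirs=("sourcedata", "stimuli", "code")):
    """Return sorted paths under root whose file names end with suffix.

    Hypothetical helper: directories named in exclude_dirs are skipped.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        matches.extend(os.path.join(dirpath, name)
                       for name in filenames if name.endswith(suffix))
    return sorted(matches)
```

Calling `find_temp_event_files(bids_root_path)` should list the same kind of files that the notebook passes to `BidsTabularDictionary`, although ordering and file representation may differ from the HED tools.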

### Checking for forbidden codes
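In the `cond_code` column, values 1 and 2 mark the focus condition and value 3 marks the shift condition. The `event_code` values must obey these rules:

* Codes 1 and 2 can appear anywhere.
* Codes 3 through 6 should appear only in the focus condition.
* Codes 7 through 14 should appear only in the shift condition.
* Codes 199, 201, 202, and 255 are not related to condition.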
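The rules above amount to a small lookup on (`cond_code`, `event_code`) pairs. Here is a minimal standard-library sketch of that lookup. Like the masks in the notebook code, it assumes that `cond_code` values 1 and 2 mark the focus condition and 3 marks the shift condition. `find_violations` is a hypothetical helper, not part of the HED tools.

```python
from collections import Counter

# Condition codes as assumed from the notebook's masks (1, 2 = focus; 3 = shift).
FOCUS_CONDS = {1, 2}
SHIFT_CONDS = {3}

# Event codes restricted to one condition. Codes 1 and 2 and the
# condition-independent codes (199, 201, 202, 255) are deliberately absent.
FOCUS_ONLY_EVENTS = set(range(3, 7))    # 3 through 6
SHIFT_ONLY_EVENTS = set(range(7, 15))   # 7 through 14


def find_violations(rows):
    """Count forbidden combinations in (cond_code, event_code) integer pairs."""
    counts = Counter()
    for cond, event in rows:
        if cond in FOCUS_CONDS and event in SHIFT_ONLY_EVENTS:
            counts["shift event in focus condition"] += 1
        elif cond in SHIFT_CONDS and event in FOCUS_ONLY_EVENTS:
            counts["focus event in shift condition"] += 1
        elif cond == 0:
            counts["cond_code 0"] += 1
    return counts


# Made-up rows for illustration only (not taken from the dataset).
rows = [(1, 3), (1, 7), (3, 5), (3, 12), (0, 1), (2, 201)]
print(find_violations(rows))
```

Each of the three violation types occurs once in the sample rows. The allowed pairs `(1, 3)`, `(3, 12)`, and `(2, 201)` are not counted.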

Running this script produced the following findings:
* sub_005_run_01 has 5 shift event codes in a focus condition.
* sub_008_run_01 has 2874 shift event codes in a focus condition.
* sub_015_run_01 has 239 focus event codes in a shift condition.
* sub_031_run_01 has 6067 cond_code values of 0.
* sub_036_run_02 has 721 focus event codes in a shift condition.

Other issues with the data are detected later in the curation process.
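The findings above come from targeted masks. A broader diagnostic counts every (`cond_code`, `event_code`) pair in a file, so any unexpected pairing is visible at once. The sketch below uses only the standard library and works on the text of a tab-separated events file. `cond_event_table` and the sample text are hypothetical. With pandas, `pd.crosstab(df['cond_code'], df['event_code'])` gives an equivalent table.

```python
import csv
import io
from collections import Counter


def cond_event_table(tsv_text):
    """Count (cond_code, event_code) pairs in events TSV text.

    Assumes the header row contains 'cond_code' and 'event_code'.
    Values stay as strings, mirroring the notebook's .map(str) comparisons.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter((row["cond_code"], row["event_code"]) for row in reader)


# Hypothetical sample text (not taken from the dataset).
sample = (
    "onset\tduration\tcond_code\tevent_code\n"
    "0.5\tn/a\t1\t3\n"
    "1.0\tn/a\t1\t7\n"
    "1.5\tn/a\t3\t12\n"
    "2.0\tn/a\t1\t3\n"
)
for (cond, event), n in sorted(cond_event_table(sample).items()):
    print(f"cond_code={cond} event_code={event}: {n}")
```

In this sample, the pair `cond_code=1 event_code=7` is the one that the focus-condition rule would flag.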

import os
import datetime
from hed.tools import BidsTabularDictionary, get_file_list, get_new_dataframe, HedLogger, TabularSummary

# Variables to set for the specific dataset
bids_root_path = '/XXX/AttentionShiftWorking'
exclude_dirs = ['sourcedata', 'stimuli', 'code']
entities = ('sub', 'run')
skip_cols = ['onset', 'duration', 'sample']
log_name = 'attention_shift_03_check_cond_consistency_log'

# Set up the logger
log_file_name = f"code/curation_logs/{log_name}.txt"
logger = HedLogger(name=log_name)

# Construct the event file dictionary and summary
bids_files = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_eventstemp1",
                           exclude_dirs=exclude_dirs)
bids_dict = BidsTabularDictionary("Bids event files", bids_files, entities=entities)

# Create summary dictionaries of the combined event files
bids_sum_all, bids_sum = TabularSummary.make_combined_dicts(bids_dict, skip_cols=skip_cols)
print(f"\nSummary of all BIDS events files after combination:\n{bids_sum_all}")

# Find which studies have bad codes
print("Isolating the bad codes:")
for key, file, rowcount, columns in bids_dict.iter_tsv_info():
    df_bids = get_new_dataframe(file.file_path)

    focus_cond_mask = df_bids['cond_code'].map(str).isin(['1', '2'])
    shift_cond_mask = df_bids['cond_code'].map(str).isin(['3'])
    focus_event_mask = df_bids['event_code'].map(str).isin(['3', '4', '5', '6'])
    shift_event_mask = df_bids['event_code'].map(str).isin(['7', '8', '9', '10', '11', '12', '13', '14'])
    bad_focus = sum(focus_cond_mask & shift_event_mask)
    if bad_focus:
        logger.add(key, f"{bad_focus} shift event codes in a focus condition", level="WARNING")

    bad_shift = sum(shift_cond_mask & focus_event_mask)
    if bad_shift:
        logger.add(key, f"{bad_shift} focus event codes in a shift condition", level="WARNING")

    bad_cond_mask = df_bids['cond_code'].map(str).isin(['0'])
    if sum(bad_cond_mask):
        logger.add(key, f"{sum(bad_cond_mask)} cond_code values of 0", level="WARNING")

    pulse_code_mask = df_bids['event_code'].map(str).isin(['199'])
    if sum(pulse_code_mask):
        logger.add(key, f"{sum(pulse_code_mask)} event_code values of 199", level="WARNING")

    pulse_combo_count = sum(pulse_code_mask & bad_cond_mask)
    if pulse_combo_count:
        logger.add(key, f"{pulse_combo_count} event_code values of 199 with cond_code 0", level="WARNING")

    unknown_count = sum(df_bids['event_code'].map(str).isin(['255']))
    if unknown_count:
        logger.add(key, f"{unknown_count} event_code values of 255", level="WARNING")

    pause_count = sum(df_bids['event_code'].map(str).isin(['202']))
    if pause_count:
        logger.add(key, f"{pause_count} event_code values of 202", level="WARNING")

# Output and save the log
log_string = "\n\nLog output:\n" + logger.get_log_string()
error_string = "\n\nERROR Summary:\n" + logger.get_log_string(level="ERROR")
warning_string = "\n\nWARNING Summary:\n" + logger.get_log_string(level="WARNING")
print(log_string)
print(error_string)
print(warning_string)

save_path = os.path.join(bids_root_path, log_file_name)
with open(save_path, "w") as fp:
    fp.write(f"{log_file_name} {datetime.datetime.now()}\n")
    fp.write(f"\n{bids_sum_all}\n")
    fp.write(log_string)
    fp.write(error_string)
    fp.write(warning_string)


Summary of all BIDS events files after combination:
Summary for column dictionary :
   Categorical columns (2):
      cond_code (4 distinct values):
         0: 6067
         1: 58184
         2: 54044
         3: 168840
      event_code (16 distinct values):
         1: 11703
         10: 4702
         11: 37548
         12: 37524
         13: 18778
         14: 18779
         2: 11701
         201: 29028
         202: 927
         3: 9296
         4: 9301
         5: 37171
         6: 37167
         7: 9406
         8: 9408
         9: 4696
   Value columns (0):
Isolating the bad codes:


Log output:
attention_shift_03_check_cond_consistency_log: Level None
sub-001_run-01:
sub-002_run-01:
sub-003_run-01:
sub-004_run-01:
sub-004_run-02:
sub-005_run-01:
sub-006_run-01:
sub-007_run-01:
sub-008_run-01:
sub-009_run-01:
sub-010_run-01:
sub-011_run-01:
sub-012_run-01:
sub-013_run-01:
sub-014_run-01:
sub-015_run-01:
sub-016_run-01:
sub-017_run-01:
sub-018_run-01:
sub-019_run-01:
sub-020_run