This notebook explores the input and output for SST.

`multiconds.py` has the following output steps:

 - create_masks
 - ONE OF the following two:
     - write_bids_events, or
     - the following two:
         - write_betaseries
         - write_conditions

         
If we pass `multiconds.py` a BIDS path, it writes events files to the BIDS directory; otherwise it writes the betaseries and conditions files.

So it appears to have two modes and to be designed to run twice: once to generate events files for BIDS, and once to create the betaseries and conditions files.

## bids data

### write_bids_events

e.g.

In [1]:
bids_tsv_file_path = '/gpfs/projects/sanlab/shared/DEV/bids_data/sub-DEV085/ses-wave1/func/sub-DEV085_ses-wave1_task-SST_acq-1_events.tsv'
bids_json_file_path = '/gpfs/projects/sanlab/shared/DEV/bids_data/sub-DEV085/ses-wave1/func/sub-DEV085_ses-wave1_task-SST_acq-1_events.json'

In [2]:
import pandas as pd

In [3]:
pd.read_csv(bids_tsv_file_path,sep='\t')

Unnamed: 0,onset,duration,trial_type
0,0.00000,2.25834,failed-stop
1,2.25834,0.50000,
2,2.75834,2.00556,failed-go
3,4.76390,0.75000,
4,5.51390,0.47261,correct-go
...,...,...,...
251,412.82572,1.75000,
252,414.57572,0.54468,correct-go
253,417.08682,0.75000,
254,417.83682,1.50000,correct-stop


In [4]:
json_file_contents = pd.read_json(bids_json_file_path)

In [5]:
json_file_contents

Unnamed: 0,onset,duration,trial_type
LongName,Onset,Duration,Categorization of a response inhibition task
Description,Onset of the event measured from the beginning...,"Duration of the event, measured from onset.","Education level, self-rated by participant"
Units,s,s,
Levels,,,"{'correct-go': 'Go trial, correct response', '..."


In [6]:
json_file_contents.loc['Levels','trial_type']

{'correct-go': 'Go trial, correct response',
 'failed-go': 'Go trial, incorrect or no response',
 'correct-stop': 'No-go or stop trial, correct response',
 'failed-stop': 'No-go or stop trial, incorrect response',
 'null': 'Null trial where cue stimulus is presented for duration'}

Good, so the BIDS folder contains data about the trials. If we can use that to move the output around, we have exactly what we need. We just need to work out exactly what is in the betaseries files.
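As a minimal sketch of how the trial data could be summarised, the snippet below rebuilds the first five rows of the events table shown above in memory and counts rows per `trial_type`. It uses only the standard library so it does not depend on the cluster path. The label `unlabeled` for rows with an empty `trial_type` is a placeholder made up here, and the sample omits the unnamed index column that the real file appears to carry.

```python
import csv
import io
from collections import Counter

# First five rows of the events table shown above (index column omitted).
sample_tsv = (
    "onset\tduration\ttrial_type\n"
    "0.00000\t2.25834\tfailed-stop\n"
    "2.25834\t0.50000\t\n"
    "2.75834\t2.00556\tfailed-go\n"
    "4.76390\t0.75000\t\n"
    "5.51390\t0.47261\tcorrect-go\n"
)

rows = list(csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"))

# Rows with an empty trial_type are counted under a placeholder label.
trial_counts = Counter(row["trial_type"] or "unlabeled" for row in rows)
print(trial_counts)
```

With pandas, the same count should be available from `value_counts(dropna=False)` on the `trial_type` column.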

## nonbids data

### write_betaseries

In [29]:
multiconds_sst_dir = '/gpfs/projects/sanlab/shared/DEV/DEV_scripts/fMRI/fx/multiconds/SST/'

In [30]:
betaseries_output_filename = multiconds_sst_dir + 'betaseries/DEV085_1_SST1.mat'




In [31]:
import scipy
import scipy.io

In [32]:
dev_085_betaseries_mat = scipy.io.loadmat(
    betaseries_output_filename,
    simplify_cells=True            
)

Obviously a design file:

In [41]:
dev_085_betaseries_mat.keys()

dict_keys(['__header__', '__version__', '__globals__', 'names', 'onsets', 'durations'])

How do these compare with the betaseries files generated for WTP?

#### Comparison

In [42]:
comparable_sst_mat_filepath = '/gpfs/projects/sanlab/shared/DEV/DEV_scripts/fMRI/fx/multiconds/SST/betaseries/DEV004_1_SST1.mat'
comparable_wtp_mat_filepath = '/gpfs/projects/sanlab/shared/DEV/DEV_scripts/fMRI/fx/multiconds/WTP/betaseries/DEV004_1_WTP1.mat'

In [43]:
wtp_004 = scipy.io.loadmat(comparable_wtp_mat_filepath)
wtp_004.keys()

dict_keys(['__header__', '__version__', '__globals__', 'durations', 'names', 'onsets'])

In [44]:
sst_004 = scipy.io.loadmat(comparable_sst_mat_filepath)
sst_004.keys()

dict_keys(['__header__', '__version__', '__globals__', 'names', 'onsets', 'durations'])

Seems to be the same structure. I wonder what's wrong.
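The two key listings above differ only in order, so "same structure" can be checked without eyeballing. Here is a minimal sketch that compares the variable names of two `loadmat`-style dictionaries while ignoring the `__header__`-style metadata. The helper names `user_keys` and `same_variables` are made up here, and the dictionaries are stand-ins built from the printed keys, not the real files.

```python
def user_keys(mat):
    """Return the variable names in a loadmat-style dict, skipping '__...__' metadata."""
    return {key for key in mat if not key.startswith("__")}


def same_variables(mat_a, mat_b):
    """True when both dicts hold the same variable names, regardless of order."""
    return user_keys(mat_a) == user_keys(mat_b)


# Stand-ins built from the key listings above; the values are placeholders (None).
keys_a = ['__header__', '__version__', '__globals__', 'names', 'onsets', 'durations']
keys_b = ['__header__', '__version__', '__globals__', 'durations', 'names', 'onsets']
mat_a = dict.fromkeys(keys_a)
mat_b = dict.fromkeys(keys_b)

print(same_variables(mat_a, mat_b))
```

This only compares names. Whether the arrays behind `names`, `onsets` and `durations` have matching shapes would need a separate check on the loaded files.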

### write_conditions

In [34]:
conditions_output_filename = multiconds_sst_dir + 'conditions/DEV085_1_SST1.mat'




So this MAT file lists the onsets and durations of items, grouped by their class.

In [35]:
dev_085_conditions_mat = scipy.io.loadmat(
    conditions_output_filename,
    simplify_cells=True            
)

In [36]:
dev_085_conditions_mat

{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Tue Nov 12 12:50:21 2019',
 '__version__': '1.0',
 '__globals__': [],
 'names': array(['CorrectGo', 'CorrectStop', 'FailedStop', 'Cue', 'FailedGo'],
       dtype=object),
 'onsets': array([array([  5.5139 ,   9.02224,  12.5278 ,  15.90975,  18.35212,  24.42644,
         33.87298,  38.87993,  42.5723 ,  45.20147,  50.46121,  53.46955,
         56.03692,  58.60429,  62.60846,  65.11402,  73.50431,  76.07168,
         78.63905,  81.20642,  84.02379,  87.90574,  92.2863 , 107.81272,
        110.56828, 114.01065, 116.51482, 120.14677, 123.40094, 126.4065 ,
        137.11553, 140.5579 , 143.37527, 145.87944, 148.32181, 160.8385 ,
        166.34684, 168.78921, 171.23158, 175.04895, 177.67951, 182.18507,
        189.69619, 192.82536, 195.45731, 205.4726 , 210.73234, 216.17471,
        225.75182, 228.38377, 230.95114, 234.0817 , 240.27963, 243.41019,
        251.48173, 254.61783, 257.62893, 260.6963 , 263.57547, 273.96715,
      

In [38]:
dev_085_conditions_mat.keys()

dict_keys(['__header__', '__version__', '__globals__', 'names', 'onsets', 'durations'])
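To line the conditions file up against the BIDS events, it may help to flatten `names`, `onsets` and `durations` into one row per event. The sketch below assumes that `durations` holds one sequence per condition with the same length as the matching `onsets` entry. The durations are not printed above, so that is unverified, and with `simplify_cells=True` a single-element entry may come back as a scalar instead. The function name `conditions_to_rows` is made up here, and the toy values other than the first two `CorrectGo` onsets are invented for illustration.

```python
def conditions_to_rows(names, onsets, durations):
    """Flatten per-condition onsets/durations into one dict per event, sorted by onset."""
    rows = []
    for name, cond_onsets, cond_durations in zip(names, onsets, durations):
        for onset, duration in zip(cond_onsets, cond_durations):
            rows.append({"condition": name, "onset": onset, "duration": duration})
    return sorted(rows, key=lambda row: row["onset"])


# Toy input: only the two CorrectGo onsets come from the output above;
# the CorrectStop onset and all durations are invented.
toy_names = ["CorrectGo", "CorrectStop"]
toy_onsets = [[5.5139, 9.02224], [7.0]]
toy_durations = [[0.5, 0.5], [1.5]]

toy_rows = conditions_to_rows(toy_names, toy_onsets, toy_durations)
for row in toy_rows:
    print(row)
```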

OK, so where do we generate the SPM inputs?

The general info flow for DEV is in: https://docs.google.com/presentation/d/1K-nFrZYE6rR8t0myNyacB7frBzV3B1--nMqPhVkwL8E/edit#slide=id.p


For WTP, betaseries.m generates mat files that then 


## Bids data processing

In [4]:
import glob
import os
import re
import pandas as pd

In [8]:
ml_data_folderpath = "/gpfs/projects/sanlab/shared/DEV/nonbids_data/fMRI/ml"

In [6]:
bids_data_folder_path = '/gpfs/projects/sanlab/shared/DEV/bids_data/'
subject_folder_pattern = 'sub-DEV*'


# get subjects in the folder path
dataframe_list = []
subject_folderpaths = glob.glob(bids_data_folder_path + subject_folder_pattern)
for subj_folderpath in subject_folderpaths:
    wave_folder_pattern = "ses-wave*"
    #loop through waves
    subj_wave_folderpaths = glob.glob(subj_folderpath + "/" + wave_folder_pattern) 
    #get the subject ID

    subj_id = re.search(r'sub-(DEV\d\d\d)',subj_folderpath).group(1)
    print(subj_id,end=": ")
    
    for wave_folderpath in subj_wave_folderpaths:
        #print(wave_folderpath)
        #get the wave ID
        wave_name = re.search(r'ses-(wave\d+)',wave_folderpath).group(1)
        print(wave_name, end= ", ")
        
        tsv_name_pattern = 'func/sub-' + subj_id + '_ses-' + wave_name + '_task-SST_acq-*_events.tsv'
        #I think we need to ensure there isn't more than one acquisition for each SST
        #that would indicate improper data processing in earlier steps

        acquisition_filepath_list = glob.glob(wave_folderpath + "/" + tsv_name_pattern)
        if(len(acquisition_filepath_list)==0):
            #no acquisition for this wave; pass on this wave
            #this will happen often. no need to do anything about it.
            continue
        elif (len(acquisition_filepath_list)>1):
            raise Exception("more than one acquisition for subject " + subj_id + " wave " + wave_name + ". " +
                            "We only expect ONE SST for wave 1 and wave 2 and none for further waves. " +
                            "Further waves contain SST but should not contain fMRI records of such."
                           )
        else: #len(acquisition_filepath_list)==1
            #get the file
            acquisition_filepath = acquisition_filepath_list[0]
            tsv_filesize = os.path.getsize(acquisition_filepath)
            if (tsv_filesize==0):
                warning_message = ("Filesize was zero for subject " + subj_id + " wave " + wave_name + "." +
                              " This indicates a problem with the SST behavioral data. Unclear whether the data was not properly recorded originally " +
                              "or whether there was a problem in processing. Could have been processing problem in fx/multiconds/.../multiconds.py. " +
                              "For now, skipping this wave and continuing, but it's important to investigate further.")
                print("\n" + warning_message+ "\n")
                #raise Warning()
                #skip only this wave; `break` would also drop the subject's remaining waves
                continue
            sst_acquisition_record = pd.read_csv(acquisition_filepath,sep='\t')
            sst_acquisition_record['subject'] = subj_id
            sst_acquisition_record['wave'] = wave_name
            #add it to the list
            dataframe_list.append(sst_acquisition_record)
            #print(sst_acquisition_record)
    print(";", end=" ")
    #break

    
all_sst_events = pd.concat(dataframe_list)

DEV033: wave1, wave5, wave4, wave3, wave2, ; DEV079: wave1, wave4, wave3, wave2, ; DEV017: wave1, wave3, wave2, ; DEV207: wave1, wave3, wave2, ; DEV131: wave1, ; DEV122: wave1, wave5, wave3, wave2, ; DEV115: wave1, wave5, wave4, wave3, wave2, ; DEV138: wave1, wave5, wave3, wave2, ; DEV054: wave1, wave5, wave4, wave3, wave2, ; DEV194: wave1, wave3, wave2, ; DEV147: wave1, wave5, wave3, wave2, ; DEV047: wave1, wave4, wave3, wave2, ; DEV220: wave1, ; DEV215: wave1, wave2, ; DEV224: wave1, ; DEV111: wave1, ; DEV014: wave1, wave5, wave4, wave3, wave2, ; DEV083: wave1, wave4, wave3, wave2, ; DEV190: wave1, wave4, wave3, wave2, ; DEV161: wave1, wave5, wave2, ; DEV151: wave1, wave5, wave3, wave2, ; DEV026: wave1, wave5, wave4, wave3, wave2, ; DEV225: wave1, ; DEV044: wave1, wave2, ; DEV200: wave1, wave3, wave2, ; DEV132: wave1, wave5, wave3, wave2, ; DEV143: wave1, wave3, wave2, ; DEV005: wave1, wave3, wave2, ; DEV095: wave1, ; DEV085: wave1, wave2, ; DEV195: wave1, wave4, wave3, wave2, ; DEV1

In [12]:
all_sst_events.to_csv(ml_data_folderpath + "/"+"all_sst_events.csv",index=False)
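The loop above recovers the subject and wave from folder names. As a hedged alternative, the same fields could be read from the events filename alone, which also exposes the acquisition number. The sketch below assumes three-digit subject IDs (as in the loop's own regex) and a numeric `acq` label. The pattern `EVENTS_NAME` and the helper `parse_events_path` are made up here.

```python
import re
from pathlib import PurePosixPath

# Assumed filename layout, inferred from the example path used earlier in this notebook.
EVENTS_NAME = re.compile(
    r"sub-(?P<subject>DEV\d{3})_ses-(?P<wave>wave\d+)"
    r"_task-(?P<task>[A-Za-z0-9]+)_acq-(?P<acq>\d+)_events\.tsv"
)


def parse_events_path(path):
    """Return subject, wave, task and acq parsed from an events filename, or None."""
    match = EVENTS_NAME.fullmatch(PurePosixPath(path).name)
    return match.groupdict() if match else None


example_path = (
    "/gpfs/projects/sanlab/shared/DEV/bids_data/sub-DEV085/ses-wave1/func/"
    "sub-DEV085_ses-wave1_task-SST_acq-1_events.tsv"
)
info = parse_events_path(example_path)
print(info)
```

Returning `None` for names that do not match lets the caller decide whether to skip the file or raise.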
