Timeseries preparation
=========================

Before clustering the time-series into discrete brain states, all time-series were concatenated into one large $N \times P$ array containing $N$ observations and $P$ features. $N$ equals 100620, the result of concatenating 4 sessions of dual n-back data (340 time-points each) and 4 sessions of resting-state data (305 time-points each) for the 39 subjects retained after exclusion: $39 \times 4 \times (340 + 305) = 100620$. $P$ equals 400, one feature per brain area of the Schaefer et al. (2018) parcellation, each holding the mean signal extracted from that area.

Concatenating all data before clustering ensures that brain-state labels correspond across subjects, sessions, and tasks.
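
As a rough check on these dimensions, the number of rows can be computed from the per-task shapes. The sketch below only uses the counts reported in this notebook's outputs (39 retained subjects, 4 sessions, 340 and 305 time-points, 400 parcels); the variable names are illustrative and are not part of the analysis code.

```python
# Illustrative arithmetic only; these names do not appear in the analysis code.
n_subjects = 39                                   # subjects retained after exclusion
n_sessions = 4
n_timepoints = {"dualnback": 340, "rest": 305}    # time-points per session
n_features = 400                                  # Schaefer parcels (P)

# Each task contributes subjects x sessions x time-points rows.
rows_per_task = {task: n_subjects * n_sessions * t for task, t in n_timepoints.items()}
n_observations = sum(rows_per_task.values())      # N

print(rows_per_task)
print((n_observations, n_features))
```

The per-task row counts and the total should match the shapes printed at the end of Step 3.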

Step 1: Data reduction
---------------------------

Before running the k-means clustering algorithm, subjects with high motion or missing data in at least one session were excluded from the analyses.

In [24]:
import pandas as pd
import numpy as np

# Selecting subjects for analysis
groups = pd.read_csv('data/behavioral/group_assignment.csv')

dualnback_motion = ['sub-13', 'sub-21', 'sub-23']  # high motion in at least one of four dual n-back sessions
rest_motion = ['sub-21', 'sub-46', 'sub-47']  # high motion in at least one of four rest sessions
rest_missing = ['sub-20', 'sub-44']  # missing rest data

exclude = np.unique(dualnback_motion + rest_motion + rest_missing)
print(f'Subjects to exclude due to motion or missing data: {exclude}')

groups['included'] = ((groups.group == 'Experimental') | (groups.group == 'Control')) & ~groups['sub'].isin(exclude)
groups_clean = groups[groups['included'].values].reset_index()
groups_clean.to_csv("./data/behavioral/groups_clean_dualnback_rest.csv", index=False)

n_sub = groups.included.values.sum()
print(f'Number of subjects included in analyses: {n_sub}')

Subjects to exclude due to motion or missing data: ['sub-13' 'sub-20' 'sub-21' 'sub-23' 'sub-44' 'sub-46' 'sub-47']
Number of subjects included in analyses: 39
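
One thing the cell above does not check is whether every ID in the exclusion list actually occurs in the group table; a mistyped ID would be ignored silently by `isin`. A minimal sketch of such a check, using a small made-up table (the rows and the `sub-09` ID are hypothetical; only the `sub` and `group` column names come from the cell above):

```python
import pandas as pd

# Hypothetical toy table; the real one is read from group_assignment.csv.
groups = pd.DataFrame({
    "sub": ["sub-01", "sub-02", "sub-03", "sub-04"],
    "group": ["Experimental", "Control", "Control", None],
})
exclude = ["sub-02", "sub-09"]  # 'sub-09' is deliberately absent from the table

# IDs listed for exclusion that do not occur in the table at all.
unknown = sorted(set(exclude) - set(groups["sub"]))
print(f"Excluded IDs not found in the table: {unknown}")

# Same inclusion rule as above, written with isin for the group labels.
included = groups["group"].isin(["Experimental", "Control"]) & ~groups["sub"].isin(exclude)
print(included.tolist())
```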


Step 2: Loading time-series
---------------------------

In [29]:
# Loading time-series data

parcellation = "schaefer"
tasks = ["dualnback", "rest"]
n_roi = 400

ts_dualnback_raw = np.load("timeseries_schaefer400_dualnback.npy")
ts_rest_raw = np.load("timeseries_schaefer400_rest.npy")

ts_dualnback = ts_dualnback_raw[groups['included'].values]
ts_rest = ts_rest_raw[groups['included'].values]

print(f'Original dualnback data shape: {ts_dualnback.shape}')
print(f'Original rest data shape: {ts_rest.shape}')

Original dualnback data shape: (39, 4, 340, 400)
Original rest data shape: (39, 4, 305, 400)
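
Boolean indexing along the first axis only selects the right subjects if the rows of the group table are in the same order as the first axis of the time-series arrays. The notebook does not state this explicitly, so it is treated here as an assumption. A minimal sketch with toy arrays (all sizes and values are made up) shows the selection and a cheap length check:

```python
import numpy as np

# Toy stand-ins: 5 subjects, 2 sessions, 3 time-points, 4 regions.
ts_raw = np.arange(5 * 2 * 3 * 4, dtype=float).reshape(5, 2, 3, 4)
included = np.array([True, False, True, True, False])

# The mask needs one entry per subject, in the same order as axis 0.
assert included.shape[0] == ts_raw.shape[0]

# Boolean indexing keeps only the subjects flagged True; other axes are untouched.
ts_selected = ts_raw[included]
print(ts_selected.shape)
```

The length check catches a table and an array of different sizes, but not a mismatch in subject order, which would have to be verified against the way the `.npy` files were created.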


Step 3: Concatenating time-series
---------------------------

In [26]:
# Concatenating time-series
n_ses = ts_dualnback.shape[1]
n_rois = ts_dualnback.shape[3]

cts_dualnback = ts_dualnback.reshape(n_sub*n_ses*ts_dualnback.shape[2], n_rois)  # stack all included subjects and sessions along the first axis
cts_rest = ts_rest.reshape(n_sub*n_ses*ts_rest.shape[2], n_rois)

# Concatenating task and rest (dual n-back rows first, then rest rows)
cts_all = np.concatenate((cts_dualnback, cts_rest), axis=0)

np.save("./data/neuroimaging/timeseries_concat_all_schaefer400.npy", cts_all)

print(f"Shape of dualnback timeseries: {cts_dualnback.shape}")
print(f"Shape of rest timeseries: {cts_rest.shape}")
print(f"Shape of all timeseries: {cts_all.shape}")

Shape of dualnback timeseries: (53040, 400)
Shape of rest timeseries: (47580, 400)
Shape of all timeseries: (100620, 400)
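
Because `reshape` uses row-major order and the dual n-back rows precede the rest rows, each row of the concatenated array can be traced back to its task, subject, session, and time-point. The sketch below is one way to do this with `np.unravel_index`; the helper `locate_row` is hypothetical and not part of the original analysis, and the subject index refers to the position among the included subjects, not to the `sub-XX` identifier.

```python
import numpy as np

n_sub, n_ses = 39, 4
n_tp = {"dualnback": 340, "rest": 305}
n_rows_dualnback = n_sub * n_ses * n_tp["dualnback"]  # rows occupied by the task data

def locate_row(row):
    """Map a row of the concatenated array to (task, subject idx, session idx, time-point)."""
    if row < n_rows_dualnback:
        task, offset = "dualnback", row
    else:
        task, offset = "rest", row - n_rows_dualnback
    # Row-major (C-order) layout: subject varies slowest, time-point fastest.
    sub, ses, tp = np.unravel_index(offset, (n_sub, n_ses, n_tp[task]))
    return task, int(sub), int(ses), int(tp)

print(locate_row(0))       # first dual n-back row
print(locate_row(53040))   # first rest row
print(locate_row(100619))  # last row of the concatenated array
```

A mapping like this is useful later for splitting the cluster labels back into per-subject, per-session sequences.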
