# Extracting time-series features from resting-state fMRI data

In [6]:
import os
import numpy as np
import nibabel as nib
import pandas as pd
from nilearn import plotting
from nilearn.image import math_img, resample_img, index_img, threshold_img
from matplotlib import pyplot as plt

%load_ext rpy2.ipython



## Loading in the data

We will start by loading the mean framewise displacement (FD) for each participant so that we know which participant(s) to exclude from further analysis.
FD is computed using the method from [Power et al. (2012)](https://doi.org/10.1016/j.neuroimage.2011.10.018) and we apply the 'lenient' threshold described in [Parkes et al. (2018)](https://doi.org/10.1016/j.neuroimage.2017.12.073).
FD data should be stored in a comma-delimited `.txt` file with a header row and two columns: `Sample_ID` and `Mean_FD_Power`.
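
For reference, FD in the Power et al. (2012) formulation is the sum of the absolute frame-to-frame changes in the six rigid-body realignment parameters, with rotations converted to millimetres of displacement on a sphere of radius 50 mm. The NumPy sketch below illustrates that calculation. It assumes the parameters are ordered as three translations (mm) followed by three rotations (radians), which is not universal across preprocessing packages, and the helper name `framewise_displacement_power` is hypothetical. It is not the code that produced the `Mean_FD_Power` values loaded here.

```python
import numpy as np

def framewise_displacement_power(motion_params, head_radius_mm=50.0):
    """Framewise displacement (Power et al., 2012) from a (T, 6) realignment array.

    Assumes columns are 3 translations (mm) followed by 3 rotations (radians).
    """
    params = np.asarray(motion_params, dtype=float)
    # Absolute frame-to-frame change in each of the six parameters
    deltas = np.abs(np.diff(params, axis=0))
    # Convert rotations (radians) to arc length (mm) on a sphere of the given radius
    deltas[:, 3:] *= head_radius_mm
    # No preceding frame exists for the first volume, so its FD is set to 0 here
    return np.concatenate([[0.0], deltas.sum(axis=1)])

# Toy realignment parameters (made up): a 0.1 mm shift along the first axis,
# then a 0.002 rad rotation about the first axis
motion = np.array([
    [0.0, 0.0, 0.0, 0.0,   0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0,   0.0, 0.0],
    [0.1, 0.0, 0.0, 0.002, 0.0, 0.0],
])
fd = framewise_displacement_power(motion)
print(fd)         # per-frame FD in mm
print(fd.mean())  # mean FD across frames
```

Conventions differ on whether the first frame (FD = 0 here) is included when averaging, so check against your own preprocessing pipeline before comparing a mean FD to the 0.55 mm threshold.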


In [2]:
UCLA_CNP_mean_FD = pd.read_table('data/input_data/UCLA_CNP_Mean_FD_Power.txt', delimiter=',', header=0)
ABIDE_mean_FD = pd.read_table('data/input_data/ABIDE_Mean_FD_Power.txt', delimiter=',', header=0)

# Print the first five rows of the UCLA CNP Mean FD
UCLA_CNP_mean_FD.head()

|   | Sample_ID | Mean_FD_Power |
|---|-----------|---------------|
| 0 | sub-10159 | 0.11049 |
| 1 | sub-10171 | 0.21475 |
| 2 | sub-10189 | 0.20426 |
| 3 | sub-10206 | 0.064667 |
| 4 | sub-10217 | 0.062641 |


We will drop any participants with a mean FD > 0.55 mm, per the 'lenient' threshold criteria, and save the list of participants to drop to a `.txt` file:

In [26]:
# Identify any participants with Mean_FD_Power > 0.55 and write their sample IDs to a text file
UCLA_CNP_mean_FD[UCLA_CNP_mean_FD['Mean_FD_Power'] > 0.55]['Sample_ID'].to_csv('data/input_data/UCLA_CNP_participants_to_drop_lenient.txt', index=False, header=False)
ABIDE_mean_FD[ABIDE_mean_FD['Mean_FD_Power'] > 0.55]['Sample_ID'].to_csv('data/input_data/ABIDE_participants_to_drop_lenient.txt', index=False, header=False)
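
The saved list can later be used to remove the high-motion participants from the long-format time series. Here is a minimal pandas sketch on made-up data (the toy tables and the `FD_THRESHOLD_MM` constant are illustrative only), using an in-memory buffer in place of the `.txt` file:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the mean FD table and the long-format time series
mean_fd = pd.DataFrame({
    "Sample_ID": ["sub-01", "sub-02", "sub-03"],
    "Mean_FD_Power": [0.12, 0.71, 0.30],
})
ts_long = pd.DataFrame({
    "Sample_ID": ["sub-01", "sub-02", "sub-03"] * 2,
    "Brain_Region": ["regionA"] * 3 + ["regionB"] * 3,
    "timepoint": [1] * 6,
    "values": [0.5, -1.2, 0.3, 1.1, 0.0, -0.4],
})

FD_THRESHOLD_MM = 0.55  # 'lenient' threshold used above

# Write the IDs to drop as above (one per line, no header, no index)
buffer = io.StringIO()
mean_fd.loc[mean_fd["Mean_FD_Power"] > FD_THRESHOLD_MM, "Sample_ID"].to_csv(
    buffer, index=False, header=False)
buffer.seek(0)

# Read the list back and remove those participants from the time series
to_drop = pd.read_csv(buffer, header=None)[0].tolist()
ts_filtered = ts_long[~ts_long["Sample_ID"].isin(to_drop)]
print(to_drop)  # ['sub-02']
print(ts_filtered["Sample_ID"].unique())
```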


Next, we load our resting-state fMRI time-series data, which is stored in a [`.feather` file](https://arrow.apache.org/docs/python/feather.html) (for easy conversion between R and Python).
Data should be organized in a long format, such that there is one row for each brain region and timepoint per participant.

In [3]:
# Define input time-series feather files for the two datasets

UCLA_CNP_input_time_series_data = pd.read_feather('data/input_data/UCLA_CNP_AROMA_2P_GMR_fMRI_TS.feather')
ABIDE_input_time_series_data = pd.read_feather('data/input_data/ABIDE_ASD_FC1000_fMRI_TS.feather')


Let's print out the first five rows of this time-series dataset for the UCLA CNP cohort:

In [4]:
UCLA_CNP_input_time_series_data.head()

|   | Sample_ID | Noise_Proc | Brain_Region | timepoint | values |
|---|-----------|------------|--------------|-----------|--------|
| 0 | sub-10159 | AROMA+2P+GMR | ctx-lh-bankssts | 1 | 4.071963 |
| 1 | sub-10159 | AROMA+2P+GMR | ctx-lh-caudalanteriorcingulate | 1 | 1.078613 |
| 2 | sub-10159 | AROMA+2P+GMR | ctx-lh-caudalmiddlefrontal | 1 | -0.191693 |
| 3 | sub-10159 | AROMA+2P+GMR | ctx-lh-cuneus | 1 | 0.888717 |
| 4 | sub-10159 | AROMA+2P+GMR | ctx-lh-entorhinal | 1 | -4.071331 |


The data should be structured such that there are five columns:
1. `Sample_ID`: The unique ID mapping to an individual participant.
2. `Noise_Proc`: Name of the noise-processing procedure, useful when multiple noise-processing pipelines are evaluated.
3. `Brain_Region`: The name of the brain region.
4. `timepoint`: The timepoint corresponding to the BOLD frame.
5. `values`: The BOLD signal amplitude at the given timepoint.
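
Before extracting features, it can help to check that a table actually follows this layout. The sketch below is a hypothetical helper (`check_long_format` is not part of this repository) that verifies the five columns are present, that no region/timepoint pair is duplicated, and that every region within a participant has the same number of timepoints. It then pivots one participant into a timepoints × regions view. The toy values are made up.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Sample_ID", "Noise_Proc", "Brain_Region", "timepoint", "values"]

def check_long_format(ts):
    """Sanity-check a long-format time-series table; return timepoints per participant."""
    missing = [c for c in REQUIRED_COLUMNS if c not in ts.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Each participant/pipeline/region/timepoint combination should appear exactly once
    key = ["Sample_ID", "Noise_Proc", "Brain_Region", "timepoint"]
    if ts.duplicated(subset=key).any():
        raise ValueError("Duplicate rows for the same region and timepoint")
    # Number of distinct timepoints for each regional time series
    n_tp = ts.groupby(["Sample_ID", "Noise_Proc", "Brain_Region"])["timepoint"].nunique()
    # All regions within a participant (and pipeline) should share the same length
    per_subject = n_tp.groupby(level=["Sample_ID", "Noise_Proc"]).agg(["min", "max"])
    uneven = per_subject[per_subject["min"] != per_subject["max"]]
    if not uneven.empty:
        raise ValueError(f"Regions differ in length for: {list(uneven.index)}")
    return per_subject["min"].rename("n_timepoints")

# Toy table with made-up values: one participant, two regions, two timepoints
toy = pd.DataFrame({
    "Sample_ID": ["sub-01"] * 4,
    "Noise_Proc": ["AROMA+2P+GMR"] * 4,
    "Brain_Region": ["ctx-lh-bankssts", "ctx-lh-cuneus"] * 2,
    "timepoint": [1, 1, 2, 2],
    "values": [0.4, -1.3, 0.9, 0.2],
})

print(check_long_format(toy))
# Wide view of one participant: rows = timepoints, columns = regions
print(toy.pivot(index="timepoint", columns="Brain_Region", values="values"))
```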

## Extracting univariate time-series features

First, we will extract 25 univariate time-series features comprising the [`catch22`](https://doi.org/10.1007/s10618-019-00647-x) feature set, mean, standard deviation, and fractional amplitude of low-frequency fluctuations (fALFF).
The `catch22` features, mean, and SD can all be computed in R using the [`theft`](https://cran.r-project.org/web/packages/theft/vignettes/theft.html) package (collectively referred to as the `catch24` feature set), while fALFF will be computed in MATLAB.
Computing the `catch24` features will take several minutes, so feel free to hit play on the next code chunk and grab a coffee ☕️
(Alternatively, you can run this on a high-performance computing cluster if you prefer.)
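
Of the 25 features, the mean and standard deviation are simple enough to illustrate directly. The pandas sketch below computes them per participant, noise-processing pipeline, and region on toy data. It is only meant to show the grouping structure, not to replace `theft`, and its numerical conventions (for example, the sample SD with `ddof=1`) may not exactly match those used in `catch24`.

```python
import pandas as pd

# Toy long-format data (made-up values): one participant, two regions, three timepoints
toy = pd.DataFrame({
    "Sample_ID": ["sub-01"] * 6,
    "Noise_Proc": ["AROMA+2P+GMR"] * 6,
    "Brain_Region": ["regionA"] * 3 + ["regionB"] * 3,
    "timepoint": [1, 2, 3] * 2,
    "values": [1.0, 2.0, 3.0, -1.0, 0.0, 1.0],
})

# Mean and sample standard deviation (ddof=1) of each regional time series
mean_sd = (
    toy.groupby(["Sample_ID", "Noise_Proc", "Brain_Region"])["values"]
       .agg(mean="mean", sd="std")
       .reset_index()
)
print(mean_sd)
```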

In [30]:
%%R -i UCLA_CNP_input_time_series_data,ABIDE_input_time_series_data -o UCLA_CNP_catch24_features,ABIDE_catch24_features
# Load the theft and tidyr packages (tidyr also provides the %>% pipe)
library(theft)
library(tidyr)

# Helper function to compute the `catch24` time-series features for every
# participant x brain region x noise-processing combination using `theft`
catch24_all_samples <- function(full_TS_data,
                                output_column_names = unique_columns,
                                unique_columns = c("Sample_ID", "Brain_Region", "Noise_Proc")) {

  # Merge the identifying columns into a single unique ID; all_of() avoids the
  # tidyselect deprecation warning for external character vectors
  full_TS_data <- full_TS_data %>%
    tidyr::unite("Unique_ID", tidyselect::all_of(unique_columns), sep = "__")

  # Compute the set of 24 time-series features using theft, then split the
  # unique ID back into its component columns
  TS_catch24 <- theft::calculate_features(data = full_TS_data,
                                          id_var = "Unique_ID",
                                          time_var = "timepoint",
                                          values_var = "values",
                                          feature_set = "catch22",
                                          catch24 = TRUE)[[1]] %>%
    tidyr::separate("id", output_column_names, sep = "__")

  # Return the resulting set of 24 features computed per brain region
  return(TS_catch24)
}

# Compute the 24 time-series features for the UCLA CNP and ABIDE time-series data
UCLA_CNP_catch24_features <- catch24_all_samples(UCLA_CNP_input_time_series_data,
                                                 output_column_names = c("Sample_ID", "Brain_Region", "Noise_Proc"))
ABIDE_catch24_features <- catch24_all_samples(ABIDE_input_time_series_data,
                                              output_column_names = c("Sample_ID", "Brain_Region", "Noise_Proc"))


No IDs removed. All value vectors good for feature extraction.
Running computations for catch22...

Calculations completed for catch22.
No IDs removed. All value vectors good for feature extraction.
Running computations for catch22...

Calculations completed for catch22.
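
The R helper above relies on merging the `Sample_ID`, `Brain_Region`, and `Noise_Proc` columns into one `__`-delimited ID (with `tidyr::unite()`) and splitting it back afterwards (with `tidyr::separate()`). This round trip only works if `__` never occurs inside any of those values. A small pandas analogue on toy rows makes that assumption explicit:

```python
import pandas as pd

SEP = "__"  # same separator as the R helper
key_cols = ["Sample_ID", "Brain_Region", "Noise_Proc"]

toy = pd.DataFrame({
    "Sample_ID": ["sub-10159", "sub-10159"],
    "Brain_Region": ["Left-Accumbens-area", "ctx-lh-cuneus"],
    "Noise_Proc": ["AROMA+2P+GMR"] * 2,
})

# The separator must not occur inside any key value, or the split below is ambiguous
assert not toy[key_cols].apply(lambda col: col.str.contains(SEP, regex=False)).any().any()

# Analogue of tidyr::unite(): merge the key columns into one ID string
unique_id = toy["Sample_ID"].str.cat([toy["Brain_Region"], toy["Noise_Proc"]], sep=SEP)
print(unique_id.iloc[0])  # sub-10159__Left-Accumbens-area__AROMA+2P+GMR

# Analogue of tidyr::separate(): split the ID back into its component columns
restored = unique_id.str.split(SEP, expand=True)
restored.columns = key_cols
print(restored.equals(toy[key_cols]))
```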

We will separately compute fALFF for each brain region using the MATLAB script `compute_regional_fALFF.m`, as follows (note: a MATLAB license is required to run this):

In [2]:
os.getcwd()

'/Users/abry4213/Library/CloudStorage/OneDrive-TheUniversityofSydney(Students)/github/fMRI_FeaturesDisorders'

In [5]:
# First, convert the time-series feather files to MATLAB .mat files so that they can be read in properly
UCLA_CNP_time_series_file_base = 'data/input_data/UCLA_CNP_AROMA_2P_GMR_fMRI_TS'
ABIDE_time_series_file_base = 'data/input_data/ABIDE_ASD_FC1000_fMRI_TS'

# Run the feather_to_mat.py script with the file base as the input argument; 'mat' indicates that the output file should be a .mat file
os.system(f"python3 code/feature_extraction/feather_to_mat.py {UCLA_CNP_time_series_file_base} mat")
os.system(f"python3 code/feature_extraction/feather_to_mat.py {ABIDE_time_series_file_base} mat")

# Run the compute_regional_fALFF.m script (you might need to update your MATLAB path here).
# Each os.system() call starts a fresh shell, so the `cd` must be chained with the MATLAB call in a single command.
TS_mat_file = "data/input_data/UCLA_CNP_AROMA_2P_GMR_fMRI_TS.mat"
output_mat_file = "data/time_series_features/UCLA_CNP_fALFF.mat"
current_dir = os.getcwd()
# os.system(f"cd {current_dir}/code/feature_extraction; /Applications/MATLAB_R2023a.app/bin/matlab -nodisplay -singleCompThread -r 'compute_regional_fALFF {TS_mat_file} {output_mat_file}; exit'")
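
The contents of `feather_to_mat.py` are not shown here, so the array layout it writes to the `.mat` file is not documented in this notebook. As a hypothetical illustration only, the sketch below stacks a long-format table into a regions × timepoints × participants NumPy array (the helper name `long_to_array` and this ordering are assumptions); an array like this could then be written with `scipy.io.savemat`.

```python
from itertools import product

import numpy as np
import pandas as pd

def long_to_array(ts):
    """Stack a long-format table into a (regions, timepoints, participants) array.

    Hypothetical layout for illustration; assumes every participant has the
    same regions and the same number of timepoints.
    """
    subjects = sorted(ts["Sample_ID"].unique())
    regions = sorted(ts["Brain_Region"].unique())
    mats = []
    for sub in subjects:
        wide = (ts[ts["Sample_ID"] == sub]
                .pivot(index="Brain_Region", columns="timepoint", values="values")
                .reindex(regions)       # same region order for every participant
                .sort_index(axis=1))    # timepoints in ascending order
        mats.append(wide.to_numpy())
    return np.stack(mats, axis=-1), subjects, regions

# Toy long-format data: 2 participants x 2 regions x 3 timepoints, values 0..11
combos = list(product(["sub-01", "sub-02"], ["regionA", "regionB"], [1, 2, 3]))
toy = pd.DataFrame(combos, columns=["Sample_ID", "Brain_Region", "timepoint"])
toy["values"] = np.arange(len(toy), dtype=float)

arr, subjects, regions = long_to_array(toy)
print(arr.shape)  # (2, 3, 2)
```

`np.stack` raises an error if participants have different numbers of timepoints (a missing region instead shows up as a row of NaN values after `reindex`), which doubles as a basic consistency check.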

In [18]:
%%bash 

# Run the feather_to_mat.py script with the file base as the input argument, indicating that the output file should be a mat file
python3 code/feature_extraction/feather_to_mat.py data/input_data/UCLA_CNP_AROMA_2P_GMR_fMRI_TS mat
python3 code/feature_extraction/feather_to_mat.py data/input_data/ABIDE_ASD_FC1000_fMRI_TS mat

# Define the path to the data
data_path="$(pwd)"

# Run the compute_regional_fALFF.m script (you might need to update your MATLAB path here)
cd code/feature_extraction

# # UCLA CNP
# TS_mat_file="$data_path/data/input_data/UCLA_CNP_AROMA_2P_GMR_fMRI_TS.mat"
# output_mat_file="data/time_series_features/UCLA_CNP_fALFF.mat"
# /Applications/MATLAB_R2023a.app/bin/matlab -nodisplay -singleCompThread -r "compute_regional_fALFF $data_path $TS_mat_file $output_mat_file; exit"

# # ABIDE
# TS_mat_file="data/input_data/ABIDE_ASD_FC1000_fMRI_TS.mat"
# output_mat_file="data/time_series_features/ABIDE_fALFF.mat"
# /Applications/MATLAB_R2023a.app/bin/matlab -nodisplay -singleCompThread -r "compute_regional_fALFF $data_path $TS_mat_file $output_mat_file; exit"

# Convert the mat file back to feather for fALFF
python3 feather_to_mat.py "${data_path}/data/time_series_features/UCLA_CNP_fALFF" feather
python3 feather_to_mat.py "${data_path}/data/time_series_features/ABIDE_fALFF" feather
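
For intuition about what the MATLAB script computes: fALFF is commonly defined as the summed spectral amplitude within a low-frequency band divided by the summed amplitude over the whole detectable frequency range. The NumPy sketch below follows that definition under stated assumptions (a 0.01 to 0.08 Hz band, amplitude rather than power, mean removal only, and the 0 Hz bin excluded from the denominator). `compute_regional_fALFF.m` may differ in any of these choices, and the repetition time `tr` must be supplied by you, so treat this as illustrative rather than a reimplementation.

```python
import numpy as np

def falff(x, tr, band=(0.01, 0.08)):
    """Fractional ALFF of a single time series (sketch).

    tr: repetition time in seconds; band: low-frequency range in Hz (assumed).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the DC component
    freqs = np.fft.rfftfreq(x.size, d=tr)     # frequency of each bin in Hz
    amplitude = np.abs(np.fft.rfft(x))        # amplitude spectrum
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = amplitude[freqs > 0].sum()        # whole range, excluding 0 Hz
    return amplitude[in_band].sum() / total

# Toy signals: 200 volumes at TR = 2 s (made-up acquisition parameters)
t = np.arange(200) * 2.0
low = np.sin(2 * np.pi * 0.05 * t)   # inside the assumed low-frequency band
high = np.sin(2 * np.pi * 0.2 * t)   # outside it
print(falff(low, tr=2.0), falff(high, tr=2.0))
```

A sine wave at 0.05 Hz gives a value close to 1, while one at 0.2 Hz gives a value close to 0.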

In [32]:
# We can save the results as .feather files

UCLA_CNP_catch24_features.reset_index().to_feather('data/time_series_features/UCLA_CNP_catch24_features.feather')
ABIDE_catch24_features.reset_index().to_feather('data/time_series_features/ABIDE_catch24_features.feather')