# Session 3: Group analysis

In this session, we will learn how to aggregate results from all the runs in each subject as well as all the subjects in a group.

## Tools We'll Use

### **Nilearn**
Nilearn is still the tool that helps us achieve our goals in this session.

Please run the cell below at the start of each notebook runtime. It installs and imports the packages you need for this tutorial.

In [None]:
# --- Basic Setup (always run this first) ---

# Install dependencies 
%pip install -q gdown
%pip install -q git+https://github.com/Yuan-fang/fMRI-tutorial.git

# Import essential packages
import warnings
from pathlib import Path
import shutil
from nilearn import image, plotting
from nilearn.image import index_img
from nilearn.glm.first_level import FirstLevelModel
from nilearn.glm.contrasts import compute_fixed_effects
from nilearn.glm.second_level import SecondLevelModel
from nilearn.plotting import plot_design_matrix
from bids import BIDSLayout
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt

from tutorial.utils.paths import PathManager

warnings.filterwarnings("ignore")

We also need to set up the data directories. For your own project you can choose other directories and names, but for this tutorial let's stick with the ones below.

In [None]:
# --- Set up data directories ---

DATASET = "Haxby2001" # name of the dataset
BASE_DIR = Path.home() / "fmri_tutorial"   # base directory for the tutorial
DATA_DIR = BASE_DIR /  DATASET # data directory for the dataset
DERIV_DIR = BASE_DIR / "derivatives" # derivatives directory for processed data

for p in (DATA_DIR, DERIV_DIR):
    p.mkdir(parents=True, exist_ok=True)

# Print out the data directories
print("Data directory:       ", DATA_DIR.resolve())
print("Derivatives directory:", DERIV_DIR.resolve())

In this session, you need to download data from the link on Canvas.

In [None]:
# Create a BIDSLayout object to interface with the BIDS dataset
layout = BIDSLayout(DATA_DIR, validate=False)  
print(f"BIDS dataset with {len(layout.get_subjects())} subjects loaded.")

In [None]:
# Create a PathManager object 'fsl_manager', which is specifically for managing file paths related to FSL processing.
fsl_manager = PathManager(
    BIDSlayout=layout,
    deriv_base=DERIV_DIR,
    pipeline="fsl_preproc"
    )

print("fsl_manager is set up.")

In [None]:
# Initialize the model
fmri_glm = FirstLevelModel(
    t_r=2.5,
    hrf_model='glover',
    drift_model=None,          # Already high-pass filtered
    high_pass=None,            # No additional filtering
    smoothing_fwhm=None,       # Already smoothed
)

In [None]:
# Get the file path of the preprocessed functional image for subject "sub-1", run "1"
func_run_1 = fsl_manager.find_path(subject="1", run="1", proc="clean", extension=".nii.gz")[0]
print("Preprocessed functional files for subject 1, run 1:", func_run_1)

# Get the event timing file for subject "sub-1", run "1"
event_run_1 = layout.get(subject="1", suffix="events", extension=".tsv", run="1", return_type="file")[0]
print("Event timing file for subject 1, run 1:", event_run_1)

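# Fit the model. For simplicity, we do not include any confounds here.
# Note: fit() returns the same model object, so compute the contrast before refitting on another run.
fmri_glm_run_1 = fmri_glm.fit(func_run_1, events=event_run_1)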

# Get contrast map between 'house' and 'bottle' conditions for run 1 (subject 1)
# Note that here we specify output_type='all' to get all the maps (z, t, effect size, variance)
# If we do not specify output_type, only z-map will be returned by default
contrast_run_1 = fmri_glm.compute_contrast('house - bottle',
                                             output_type='all')

# Visualize the contrast map between 'house' and 'bottle' conditions
plotting.plot_stat_map(
    contrast_run_1['z_score'],
    threshold=3.1, # z=3.1 corresponds to p = 0.001 (1-tailed), uncorrected
    title='Contrast Map: House vs Bottle for run 1 (subject 1)'
)

#### 🤔 Do it yourself:
Get and display the contrast map between the 'house' and 'bottle' conditions for run 1 (subject 2).

_Type your answer in the cell below._

In [None]:
# Write and execute your code below to display the contrast map between 'house' and 'bottle' conditions for run 1 (subject 2)
# --- YOUR CODE HERE ---



We can also get the contrast maps for run 2 and run 3 of subject 1 using code similar to the above.

In [None]:
#--- Get the contrast map for run 2 for subject 1 ---

# Get the file path of the preprocessed functional image for subject "sub-1", run "2"
func_run_2 = fsl_manager.find_path(subject="1", run="2", proc="clean", extension=".nii.gz")[0]
print("Preprocessed functional files for subject 1, run 2:", func_run_2)

# Get the event timing file for subject "sub-1", run "2"
event_run_2 = layout.get(subject="1", suffix="events", extension=".tsv", run="2", return_type="file")[0]
print("Event timing file for subject 1, run 2:", event_run_2)

# Fit the model. For simplicity, we do not include any confounds here.
fmri_glm_run_2 = fmri_glm.fit(func_run_2, events=event_run_2)

# Get contrast map between 'house' and 'bottle' conditions for run 2 (subject 1)
# Note that here we specify output_type='all' to get all the maps (z, t, effect size, variance)
# If we do not specify output_type, only z-map will be returned by default
contrast_run_2 = fmri_glm.compute_contrast('house - bottle',
                                             output_type='all')

# Visualize the contrast map between 'house' and 'bottle' conditions
plotting.plot_stat_map(
    contrast_run_2['z_score'],
    threshold=3.1, # z=3.1 corresponds to p = 0.001 (1-tailed), uncorrected
    title='Contrast Map: House vs Bottle for run 2 (subject 1)'
)

In [None]:
#--- Get the contrast map for run 3 for subject 1 ---

# Get the file path of the preprocessed functional image for subject "sub-1", run "3"
func_run_3 = fsl_manager.find_path(subject="1", run="3", proc="clean", extension=".nii.gz")[0]
print("Preprocessed functional files for subject 1, run 3:", func_run_3) 
  
# Get the event timing file for subject "sub-1", run "3"
event_run_3 = layout.get(subject="1", suffix="events", extension=".tsv", run="3", return_type="file")[0]
print("Event timing file for subject 1, run 3:", event_run_3)

# Fit the model. For simplicity, we do not include any confounds here.
fmri_glm_run_3 = fmri_glm.fit(func_run_3, events=event_run_3)

# Get contrast map between 'house' and 'bottle' conditions for run 3 (subject 1)
# Note that here we specify output_type='all' to get all the maps (z, t, effect size, variance)
# If we do not specify output_type, only z-map will be returned by default
contrast_run_3 = fmri_glm.compute_contrast('house - bottle',
                                             output_type='all') 

# Visualize the contrast map between 'house' and 'bottle' conditions
plotting.plot_stat_map(
    contrast_run_3['z_score'],
    threshold=3.1, # z=3.1 corresponds to p = 0.001 (1-tailed), uncorrected
    title='Contrast Map: House vs Bottle for run 3 (subject 1)'
)

#### 🤔 Do it yourself:
Get and display the contrast maps between the 'house' and 'bottle' conditions for run 2 and run 3 (subject 2).

_Type your answer in the cell below._

In [None]:
# Write and execute your code below to display the contrast maps between 'house' and 'bottle' conditions for run 2 and run 3 (subject 2)
# --- YOUR CODE HERE ---



As you can see, for the same contrast (house vs bottle), the activation patterns vary across different runs, even within the same subject. This variability can be due to several factors, including differences in attention, fatigue, or other cognitive states during each run. It highlights the importance of considering multiple runs and subjects when analyzing fMRI data to obtain robust and generalizable results.

Next, we will use a fixed-effects model to combine the results from multiple runs within the same subject. To compute fixed effects, we need the effect size and variance maps from all runs.
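
To make the role of the effect sizes and variances concrete, here is a minimal NumPy sketch of how a fixed-effects estimate can be formed at a single voxel. The numbers and variable names are hypothetical and chosen only for illustration. It shows a simple unweighted average next to a precision-weighted alternative, and it is not meant as a description of Nilearn's internal code.

```python
import numpy as np

# Hypothetical effect sizes and variances at one voxel for three runs
effects = np.array([2.0, 1.0, 3.0])
variances = np.array([1.0, 4.0, 1.0])
n_runs = len(effects)

# Unweighted fixed effects: average the effects; the variance of that
# average is the sum of the run variances divided by n_runs squared
fx_effect = effects.mean()
fx_variance = variances.sum() / n_runs**2
fx_stat = fx_effect / np.sqrt(fx_variance)

# Precision-weighted alternative: runs with smaller variance count more
weights = 1.0 / variances
pw_variance = 1.0 / weights.sum()
pw_effect = (weights * effects).sum() * pw_variance

print(f"unweighted: effect={fx_effect:.3f}, variance={fx_variance:.3f}, stat={fx_stat:.3f}")
print(f"precision-weighted: effect={pw_effect:.3f}, variance={pw_variance:.3f}")
```

In this toy example the combined variance is smaller than that of any single run, which is why pooling runs tends to give a more sensitive statistic than any run alone.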

In [None]:
# --- Compute fixed effects across the three runs for subject 1 ---

# get the list of effect size maps from the three runs
contrast_list = [contrast_run_1["effect_size"], contrast_run_2["effect_size"], contrast_run_3["effect_size"]]

# get the list of variance maps from the three runs
variance_list = [contrast_run_1["effect_variance"], contrast_run_2["effect_variance"], contrast_run_3["effect_variance"]]

# Compute fixed effects
# With return_z_score=True, the function returns four maps:
# effect size, variance, t-statistic, and z-score (in that order)
# _ is used to ignore the t-statistic map (not needed here); fixed_effects_stat holds the z-score map
fixed_effects_size, fixed_effects_variance, _, fixed_effects_stat = compute_fixed_effects(
    contrast_list, variance_list, return_z_score=True
)

# Visualize the fixed effects contrast map
plotting.plot_stat_map(
    fixed_effects_stat,
    threshold=3.1, # z=3.1 corresponds to p = 0.001 (1-tailed), uncorrected
    title='Fixed Effects Contrast Map: House vs Bottle (subject 1)'
)

In [None]:
# --- Save the fixed effects results ---

# Create output directory for fixed effects results
output_dir = DERIV_DIR / "fsl_preproc" / "sub-1" / "fixed_effects"
output_dir.mkdir(parents=True, exist_ok=True)

# Save the fixed effects maps for subject 1
fixed_effects_size.to_filename(output_dir / "fixed_effects_size.nii.gz")
fixed_effects_variance.to_filename(output_dir / "fixed_effects_variance.nii.gz")
fixed_effects_stat.to_filename(output_dir / "fixed_effects_stat.nii.gz")

#### 🤔 Do it yourself:
Compute fixed effects across the three runs for subject 2 and save the fixed effects results.

_Type your answer in the cell below._

In [None]:
# --- YOUR CODE HERE ---



After computing fixed effects for all subjects, we can proceed to group-level (second-level) analysis, which is similar to first-level analysis. We now treat each subject's fixed-effects contrast image (i.e., the beta image, or effect size image) as one data point in a group GLM.

The code below shows how to get a list of all subjects' fixed effects size maps. The maps are read from the files we saved in each subject's `fixed_effects` folder.

In [None]:
# --- Get a list of all subjects' fixed effect size maps ---

# first, get the list of all subjects
subject_list = layout.get_subjects()

# Initialize lists to hold fixed effects maps from all subjects
fixed_effects_size_list = []

# Load fixed effects results for subjects 1 to 4
# (adjust the range if you have saved fixed effects for a different set of subjects)
# Only the fixed effects size maps are used for group-level analysis
for subj in range(1, 5):
    fixed_effects_dir = DERIV_DIR / "fsl_preproc" / f"sub-{subj}" / "fixed_effects"
    size_map = image.load_img(fixed_effects_dir / "fixed_effects_size.nii.gz")
    fixed_effects_size_list.append(size_map) # append to the list
    
print(f"Loaded fixed effects size maps for {len(fixed_effects_size_list)} subjects.")

### Design matrix in second-level analysis

The `fixed_effects_size_list` is simply a collection of subject-level 3D contrast images (one image per subject). Each image serves as a single data point in the group-level GLM.

Just like the first-level GLM, the group model also requires a design matrix. The key difference is that at the group level each row corresponds to a subject, not a time point. This makes the design matrix much simpler: for a basic one-sample t-test, it reduces to a single column of ones (an intercept). Additional columns can be included to model between-subject variables such as group membership or behavioral scores.
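
To see why a single column of ones amounts to a one-sample t-test, here is a small NumPy sketch for one voxel. The per-subject values are made up and the variable names are hypothetical. It fits the intercept-only model by ordinary least squares and shows that the estimated beta is the group mean, with the t-statistic being that mean divided by its standard error.

```python
import numpy as np

# Hypothetical effect sizes at a single voxel, one value per subject
y = np.array([1.2, 0.8, 1.5, 0.9])
n_subjects = len(y)

# Intercept-only design matrix: one row per subject, a single column of ones
X = np.ones((n_subjects, 1))

# Ordinary least squares: the only beta is the group mean
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
group_mean = beta[0]

# Residual variance with n - 1 degrees of freedom, then the t-statistic
residuals = y - X @ beta
sigma2 = (residuals**2).sum() / (n_subjects - 1)
t_stat = group_mean / np.sqrt(sigma2 / n_subjects)

print(f"group mean = {group_mean:.3f}, t({n_subjects - 1}) = {t_stat:.3f}")
```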

In [None]:
# --- Create design matrix for group-level analysis ---

# For one-sample t-test, the design matrix contains only a single column of ones (an intercept)
# pd.DataFrame is a convenient way to create design matrices
design_matrix = pd.DataFrame({'intercept': [1] * len(subject_list)})

# Visualize the design matrix
plot_design_matrix(design_matrix)
plt.title("Second-level design matrix")
plt.ylabel("Subjects")
plt.show()

The design matrix shown above is the simplest case, capturing only the group average. But you may have other questions. For example, participants may fall into two groups that you want to compare, or you might be interested in the influence of a continuous covariate like age or a behavioral score. To address these questions, you simply add additional columns to the design matrix to model those effects.

In [None]:
# If you want to compare two groups (e.g., patients vs controls), you can create a design matrix with an additional column indicating group membership.
# Hypothetical example with six subjects: the first three are controls (0) and the last three are patients (1)
group = [0, 0, 0, 1, 1, 1]
design_matrix = pd.DataFrame({
    'intercept': [1] * len(group),  # one row per subject; must match the length of the group column
    'group': group
})

# Visualize the design matrix
plot_design_matrix(design_matrix)
plt.title("Second-level design matrix with group")
plt.ylabel("Subjects")
plt.show()

In [None]:
# If you want to look at the effect of a continuous covariate (e.g., age, behavioral score), you can add that as an additional column in the design matrix.
# Hypothetical example with six subjects
age = [25, 30, 22, 45, 50, 48]  # example ages
design_matrix = pd.DataFrame({
    'intercept': [1] * len(age),  # one row per subject; must match the length of the age column
    'age': age
})

# Visualize the design matrix
plot_design_matrix(design_matrix)
plt.title("Second-level design matrix with age")
plt.ylabel("Subjects")
plt.show()
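
One common choice when adding a continuous covariate (an assumption here, not something this tutorial requires) is to mean-center it. With raw ages, the intercept estimates the response at age zero. After centering, it estimates the response at the average age, which is usually closer to what "group average" is meant to capture. A minimal sketch with hypothetical ages and a hypothetical variable name `design_matrix_centered`:

```python
import pandas as pd

# Hypothetical ages for six subjects (illustrative values only)
ages = pd.Series([25, 30, 22, 45, 50, 48], dtype=float)

# Mean-center the covariate so the intercept reflects the response at the average age
design_matrix_centered = pd.DataFrame({
    "intercept": [1] * len(ages),
    "age": ages - ages.mean(),
})

print(design_matrix_centered)
```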

### Run the second-level GLM

In [None]:
# Here we are only interested in the group average, so we use the intercept-only design matrix for the group-level analysis.
# The number of rows must match the number of subject maps loaded above.
design_matrix = pd.DataFrame({'intercept': [1] * len(fixed_effects_size_list)})

# Create a second-level model object
second_level_model = SecondLevelModel()

# Fit the second-level model with the fixed effects size maps and the design matrix as inputs
second_level_model = second_level_model.fit(fixed_effects_size_list, design_matrix=design_matrix)

# To get the group-level contrast map for the intercept (i.e., group average)
# We can just provide the name of the column in the design matrix, which is 'intercept' here
# The output_type='z_score' will return the z-score map
group_contrast = second_level_model.compute_contrast('intercept', output_type='z_score')

In [None]:
# Let's visualize the group-level contrast map
plotting.plot_stat_map(
    group_contrast,
    threshold=1.96, # z=1.96 corresponds to p = 0.05 (2-tailed), uncorrected
    title='Group-level Contrast Map: Average response to House vs Bottle'
)
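
The z thresholds used in the plots (3.1 and 1.96) can be related to uncorrected p-values with the standard normal distribution. The sketch below uses only Python's standard library; the helper names `one_tailed_p` and `two_tailed_p` are hypothetical and introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def one_tailed_p(z):
    """Probability of a value greater than z under the standard normal."""
    return 1.0 - standard_normal.cdf(z)

def two_tailed_p(z):
    """Probability of a value more extreme than |z| in either direction."""
    return 2.0 * one_tailed_p(abs(z))

print(f"z = 3.1  -> one-tailed p = {one_tailed_p(3.1):.5f}")
print(f"z = 1.96 -> two-tailed p = {two_tailed_p(1.96):.5f}")

# Going the other way: the z threshold for a one-tailed p of 0.001
print(f"p = 0.001 (one-tailed) -> z = {standard_normal.inv_cdf(1 - 0.001):.3f}")
```

These values are uncorrected, so they do not account for the large number of voxels tested at once.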