## Download dataset

The developmental fMRI dataset available through Nilearn contains functional MRI (fMRI) data from 155 children and adults who watched a short Pixar film, Partly Cloudy, during scanning. The data were originally collected by Rebecca Saxe's lab at MIT and are hosted on OpenNeuro.



**Pixar film:** https://www.youtube.com/watch?v=Hb7yykqb85U

**Dataset:** https://openneuro.org/datasets/ds000228/versions/1.0.0

**Nilearn:** https://nilearn.github.io/dev/modules/generated/nilearn.datasets.fetch_development_fmri.html



```python
import os
import numpy as np
import pandas as pd
from nilearn import datasets
from nilearn.maskers import NiftiMapsMasker
from nilearn.connectome import ConnectivityMeasure


# -------------------------------
# 1. LOAD DEVELOPMENT FMRI DATASET
# -------------------------------
path_folder = '/home/jaizor/jaizor/xtra/data/nilearn_data'  # Local download directory; adjust for your machine
dataset = datasets.fetch_development_fmri(verbose=0, data_dir=path_folder)
fmri_files = dataset.func          # List of 4D fMRI filenames (155 subjects)
confound_files = dataset.confounds # List of confound files (155)
pheno = pd.DataFrame(dataset.phenotypic)

print(f"Loaded {len(fmri_files)} subjects.")
print("Phenotypic columns:", pheno.columns.tolist())
```

```
Loaded 155 subjects.
Phenotypic columns: ['participant_id', 'Age', 'AgeGroup', 'Child_Adult', 'Gender', 'Handedness']
```
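The feature loop below pairs `dataset.func[i]` and `dataset.confounds[i]` with `pheno['participant_id'][i]` purely by position, and the save step only checks that the lengths agree. A slightly stronger sanity check is to compare the subject ID encoded in each file name against the phenotypic table. The sketch assumes BIDS-style file names whose basename begins with the participant ID followed by an underscore (e.g. `sub-pixar123_...`); `ids_from_paths` and `check_alignment` are hypothetical helpers, not Nilearn functions.

```python
import os
from typing import Sequence


def ids_from_paths(paths: Sequence[str]) -> list[str]:
    """Return the text before the first underscore of each file's basename.

    Assumes BIDS-style names such as 'sub-XXX_task-..._bold.nii.gz'.
    """
    return [os.path.basename(p).split("_", 1)[0] for p in paths]


def check_alignment(func_paths, confound_paths, participant_ids) -> None:
    """Raise ValueError if the file lists and the phenotypic IDs disagree on order."""
    func_ids = ids_from_paths(func_paths)
    conf_ids = ids_from_paths(confound_paths)
    expected = [str(s) for s in participant_ids]
    if len(func_ids) != len(expected) or len(conf_ids) != len(expected):
        raise ValueError("file lists and phenotypic table differ in length")
    for i, (f, c, e) in enumerate(zip(func_ids, conf_ids, expected)):
        if f != e or c != e:
            raise ValueError(f"row {i}: func={f}, confounds={c}, pheno={e}")


# In the notebook (hypothetical usage with the objects loaded above):
# check_alignment(fmri_files, confound_files, pheno["participant_id"])
```

If the check passes silently, the row order of `pheno` can be trusted when the features are saved with their subject IDs.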


```python
pheno.head()
```

| | participant_id | Age | AgeGroup | Child_Adult | Gender | Handedness |
|---|---|---|---|---|---|---|
| 154 | sub-pixar155 | 26.0 | Adult | adult | M | R |
| 122 | sub-pixar123 | 27.06 | Adult | adult | F | R |
| 123 | sub-pixar124 | 33.44 | Adult | adult | M | R |
| 124 | sub-pixar125 | 31.0 | Adult | adult | M | R |
| 125 | sub-pixar126 | 19.0 | Adult | adult | F | R |
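`Child_Adult` and `Age` are among the labels saved at the end of this section, so it can help to look at group sizes and age ranges before any modelling. A minimal pandas sketch, assuming only the column names shown above; `summarize_groups` is a hypothetical helper, and the toy frame reuses the five rows printed by `pheno.head()` so the example runs without downloading anything.

```python
import pandas as pd


def summarize_groups(pheno: pd.DataFrame) -> pd.DataFrame:
    """Per-group subject count and age statistics from a phenotypic table."""
    return (
        pheno.groupby("Child_Adult")["Age"]
        .agg(n="count", mean_age="mean", min_age="min", max_age="max")
        .round(2)
    )


# Toy input: the five rows shown by pheno.head() above (all adults).
toy_pheno = pd.DataFrame({
    "participant_id": ["sub-pixar155", "sub-pixar123", "sub-pixar124",
                       "sub-pixar125", "sub-pixar126"],
    "Age": [26.0, 27.06, 33.44, 31.0, 19.0],
    "Child_Adult": ["adult"] * 5,
})
toy_summary = summarize_groups(toy_pheno)
print(toy_summary)

# In the notebook: summarize_groups(pheno)
```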


```python
# -------------------------------
# 2. SET UP ATLAS & CONNECTIVITY
# -------------------------------

# Atlas: DiFuMo probabilistic atlas, 64 components at 2 mm resolution
difumo = datasets.fetch_atlas_difumo(dimension=64, resolution_mm=2)

# Masker: extracts one time series per atlas map
masker = NiftiMapsMasker(
    maps_img=difumo.maps,
    standardize="zscore_sample",     # Standardize BOLD time series (per region)
    standardize_confounds=True,      # Standardize each confound regressor (mean=0, std=1)
    memory="nilearn_cache",          # Cache intermediate results on disk
    verbose=0
)

# Connectivity
connectome_measure = ConnectivityMeasure(
    kind="correlation",        # Compute Pearson correlation between region time series
    vectorize=True,            # Convert each subject's NxN matrix into a 1D feature vector
    discard_diagonal=True      # Exclude the diagonal (self-connections), which are always 1.0
)
```
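With `vectorize=True` and `discard_diagonal=True`, each symmetric 64 × 64 correlation matrix is reduced to one triangle without the diagonal, i.e. n(n-1)/2 = 64 × 63 / 2 = 2016 values, which is where the `(2016,)` feature length in the next cell comes from. The NumPy sketch below shows the same idea and the round trip back to a full matrix. `matrix_to_vec` and `vec_to_matrix` are hypothetical helpers; Nilearn ships its own (`sym_matrix_to_vec` and `vec_to_sym_matrix` in `nilearn.connectome`), and the element order here is not guaranteed to match theirs.

```python
import numpy as np


def matrix_to_vec(mat: np.ndarray) -> np.ndarray:
    """Flatten the strictly lower triangle (diagonal excluded) of a square matrix."""
    rows, cols = np.tril_indices(mat.shape[0], k=-1)
    return mat[rows, cols]


def vec_to_matrix(vec: np.ndarray, n: int, diagonal: float = 1.0) -> np.ndarray:
    """Rebuild a symmetric n x n matrix from matrix_to_vec output."""
    rows, cols = np.tril_indices(n, k=-1)
    mat = np.full((n, n), diagonal, dtype=float)
    mat[rows, cols] = vec
    mat[cols, rows] = vec  # mirror into the upper triangle
    return mat


n_regions = 64
rng = np.random.default_rng(0)
toy_ts = rng.standard_normal((100, n_regions))    # toy data: 100 volumes x 64 regions
toy_corr = np.corrcoef(toy_ts, rowvar=False)      # (64, 64), ones on the diagonal
toy_vec = matrix_to_vec(toy_corr)
print(toy_vec.shape)  # (2016,)
```

Dropping the diagonal removes 64 constant features, and keeping a single triangle removes the duplicated half, so every feature is one distinct region pair.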

```python
# -------------------------------
# 3. EXTRACT FEATURES FOR ALL SUBJECTS
# -------------------------------

all_features = []
for i, (fmri, conf) in enumerate(zip(fmri_files, confound_files)):
    print(f"Processing subject {i+1}/{len(fmri_files)}")

    # Confounds are non-neural signals (such as head motion) that can distort brain activity estimates
    confounds = pd.read_csv(conf, sep='\t').values  # shape: (n_time, 15)

    # Extract denoised time series (confounds are regressed out by the masker)
    time_series = masker.fit_transform(fmri, confounds=confounds)  # (n_time, 64)

    # Compute connectivity vector: 64 * 63 / 2 = 2016 unique region pairs
    conn_vec = connectome_measure.fit_transform([time_series])[0]  # (2016,)
    all_features.append(conn_vec)


# Convert to numpy array
X_features = np.array(all_features)  # Shape: (155, 2016)
print("✅ Feature matrix shape:", X_features.shape)
```

```
Processing subject 1/155
Processing subject 2/155
...
Processing subject 39/155
Processing subject 4
```
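To our understanding, a maps masker first estimates one signal per DiFuMo map and then cleans those region signals (confound regression, then standardization), after which `ConnectivityMeasure` correlates them. The NumPy sketch below approximates only the cleaning and correlation steps on an already-extracted `(n_time, n_regions)` array, so the arithmetic is visible: ordinary least-squares regression on z-scored confounds plus an intercept, then z-scoring with the sample standard deviation (ddof=1, which is what `"zscore_sample"` refers to), after which `Z.T @ Z / (n_time - 1)` is exactly the Pearson correlation. It is a simplified stand-in, not Nilearn's implementation; the helper names are hypothetical and the data are random.

```python
import numpy as np


def regress_out(ts: np.ndarray, confounds: np.ndarray) -> np.ndarray:
    """Remove the least-squares fit of z-scored confounds (plus intercept) from ts."""
    conf = (confounds - confounds.mean(0)) / confounds.std(0, ddof=1)
    design = np.column_stack([np.ones(len(conf)), conf])
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta


def zscore_sample(x: np.ndarray) -> np.ndarray:
    """Column-wise z-score using the sample standard deviation (ddof=1)."""
    return (x - x.mean(0)) / x.std(0, ddof=1)


def correlation_from_zscores(z: np.ndarray) -> np.ndarray:
    """Pearson correlation of columns already z-scored with ddof=1."""
    return z.T @ z / (z.shape[0] - 1)


rng = np.random.default_rng(42)
n_time, n_regions, n_conf = 150, 64, 6   # toy sizes, not the real data
toy_confounds = rng.standard_normal((n_time, n_conf))
toy_ts = (rng.standard_normal((n_time, n_regions))
          + toy_confounds @ rng.standard_normal((n_conf, n_regions)))  # confound-contaminated

toy_clean = zscore_sample(regress_out(toy_ts, toy_confounds))
toy_corr = correlation_from_zscores(toy_clean)
print(toy_corr.shape)  # (64, 64)
```

Because the residuals are orthogonal to every confound column, none of the shared confound signal survives into the correlations, which is the point of denoising before computing connectivity.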

```python
# -------------------------------
# 4. SAVE OUTPUT
# -------------------------------
# Match subject IDs (the length check assumes pheno rows follow the order of dataset.func)
subject_ids = pheno['participant_id'].to_numpy(dtype=str)
assert len(subject_ids) == X_features.shape[0]

# Save features + metadata. Strings are stored as fixed-width Unicode instead of
# Python objects, so np.load() can read the file without allow_pickle=True.
np.savez_compressed(
    'development_fmri_connectivity_features.npz',
    X=X_features,
    subject_ids=subject_ids,
    labels=pheno[['Child_Adult', 'Age', 'Gender']].to_records(
        index=False, column_dtypes={'Child_Adult': 'U10', 'Gender': 'U10'}
    )
)

# Also save phenotypic data separately (for easy loading)
pheno.to_csv('development_fmri_pheno.csv', index=False)

print("💾 Features and phenotypes saved successfully!")
```

```
💾 Features and phenotypes saved successfully!
```
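The archive is meant to be reloaded in a later step. One pitfall: string columns taken straight from pandas are object arrays, which NumPy stores pickled, and `np.load` refuses them unless called with `allow_pickle=True` (only advisable for trusted files). The sketch below writes a tiny toy archive with fixed-width Unicode strings instead, reads it back with the default `allow_pickle=False`, and derives a 0/1 child label. The file name and all values are made up for illustration; in the notebook the same loading code would point at `development_fmri_connectivity_features.npz`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy phenotypic table (made-up values, same column names as the real one).
toy_pheno = pd.DataFrame({
    "participant_id": ["sub-a", "sub-b", "sub-c"],
    "Age": [8.5, 25.0, 4.2],
    "Child_Adult": ["child", "adult", "child"],
    "Gender": ["F", "M", "F"],
})
X_toy = np.random.default_rng(0).standard_normal((3, 2016))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_connectivity_features.npz")
    np.savez_compressed(
        path,
        X=X_toy,
        # Fixed-width Unicode instead of object dtype, so no pickling is needed.
        subject_ids=toy_pheno["participant_id"].to_numpy(dtype=str),
        labels=toy_pheno[["Child_Adult", "Age", "Gender"]].to_records(
            index=False, column_dtypes={"Child_Adult": "U10", "Gender": "U10"}
        ),
    )
    with np.load(path) as data:              # allow_pickle defaults to False
        X_loaded = data["X"]
        ids = data["subject_ids"]
        labels = data["labels"]

y = (labels["Child_Adult"] == "child").astype(int)  # 1 = child, 0 = adult
print(X_loaded.shape, ids.tolist(), y.tolist())
```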
