## Meta matching v1.0
This Jupyter notebook demonstrates how to load and use the meta-matching algorithm. In this demonstration, we perform meta-matching with 20 example subjects.

Packages needed (and the versions this notebook was tested with):
* NumPy (1.19.2)
* SciPy (1.5.2)
* PyTorch (1.7.1)
* Scikit-learn (0.23.2)


### Step 0. Setup
Please set `path_repo` below to the location of this repository:


In [2]:
path_repo = './'

In [3]:
# Initialization and random seed setting

import os
import sys
import random
import scipy
import torch
import pickle
import sklearn
import numpy as np

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

import warnings
warnings.filterwarnings("ignore")

### Step 1. Load data
Load the example (synthetic) data that we provide. It contains:
* Example input T1-weighted structural MRI `x` with size (20, 182, 218, 182)
    * 20 is the number of subjects
    * 182x218x182 is the dimension of the 3D T1 data
* Example output phenotypes `y` with size (20, 2)
    * 2 is the number of phenotypes
* Example intracranial volume (ICV) data `icv` with size (20, 1)
    * 1 is the dimension of the ICV data

In [4]:
data_path = os.path.join(path_repo, 'data')
model_path = os.path.join(path_repo, 'model')

from CBIG_model_pytorch import znorm_icv

npz = np.load(os.path.join(data_path, 'meta_matching_v1.0_data.npz'))
x_input = npz['x']
y_input = npz['y']
icv_input = npz['icv']
icv_input = znorm_icv(icv_input)
print(x_input.shape, y_input.shape, icv_input.shape)

(20, 182, 218, 182) (20, 2) (20, 1)
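To run the notebook on your own data, the arrays need to be packed into an `.npz` file with the same keys the notebook reads (`x`, `y`, `icv`). The sketch below shows one way to do this with NumPy. The file name `my_data.npz` and the random arrays are hypothetical placeholders, and a tiny volume shape is used only to keep the example light; real inputs must match the (n_subjects, 182, 218, 182) T1 shape expected by the pretrained model.

```python
import os
import tempfile

import numpy as np

# Hypothetical placeholder arrays; replace with your own preprocessed data.
n_subj = 20
vol_shape = (4, 5, 4)  # illustration only; the real T1 shape is (182, 218, 182)
rng = np.random.default_rng(0)
x = rng.standard_normal((n_subj, *vol_shape)).astype(np.float32)  # T1 volumes
y = rng.standard_normal((n_subj, 2))                              # 2 phenotypes
icv = rng.uniform(1.2e6, 1.8e6, size=(n_subj, 1))                 # hypothetical ICV values

# Save with the same keys the notebook reads: 'x', 'y', 'icv'.
out_dir = tempfile.mkdtemp()
out_file = os.path.join(out_dir, 'my_data.npz')
np.savez(out_file, x=x, y=y, icv=icv)

# Reload and check the shapes, mirroring Step 1.
npz = np.load(out_file)
print(npz['x'].shape, npz['y'].shape, npz['icv'].shape)
```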


### Step 2. Split data
Here, we split the 20 subjects 80/20: 80% (16 subjects) for training and 20% (4 subjects) for testing. The phenotypes are then z-normalized with `mics_z_norm`.

In [5]:
from sklearn.model_selection import train_test_split
from CBIG_model_pytorch import mics_z_norm

x_train, x_test, icv_train, icv_test, y_train, y_test = train_test_split(x_input, icv_input, y_input, test_size=0.2, random_state=42)
n_subj_train, n_subj_test = x_train.shape[0], x_test.shape[0]
y_train, y_test, _, _ = mics_z_norm(y_train, y_test)
print(x_train.shape, x_test.shape, icv_train.shape, icv_test.shape, y_train.shape, y_test.shape)

(16, 182, 218, 182) (4, 182, 218, 182) (16, 1) (4, 1) (16, 2) (4, 2)
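With 20 subjects and `test_size=0.2`, `train_test_split` keeps 4 subjects for testing and 16 for training. The sketch below reproduces that split on placeholder data and then z-normalizes the phenotypes using only the training-set mean and standard deviation, so no information from the test subjects leaks into the normalization. This is an assumption about what `mics_z_norm` does, written as a hypothetical helper `znorm_train_test`; check `CBIG_model_pytorch` for the actual implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def znorm_train_test(y_train, y_test):
    """Hypothetical stand-in for mics_z_norm: z-score both splits
    using statistics computed on the training split only."""
    mean = y_train.mean(axis=0)
    std = y_train.std(axis=0)
    return (y_train - mean) / std, (y_test - mean) / std, mean, std


rng = np.random.default_rng(42)
y = rng.standard_normal((20, 2)) * 5 + 10  # placeholder phenotypes
subj_idx = np.arange(20)

idx_train, idx_test, y_train, y_test = train_test_split(
    subj_idx, y, test_size=0.2, random_state=42)
y_train_z, y_test_z, mean, std = znorm_train_test(y_train, y_test)

print(len(idx_train), len(idx_test))  # 16 4
```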


### Step 3. Meta-matching model predictions
Here we apply the model pretrained on a large source dataset (UK Biobank) to predict the source phenotypes from `x_train` and `x_test`. This yields 67 predicted source phenotypes for each of the 16 training subjects and each of the 4 test subjects.

In [8]:
from CBIG_model_pytorch import metamatching_infer

y_train_pred = metamatching_infer(x_train, icv_train, y_train, model_path)
y_test_pred = metamatching_infer(x_test, icv_test, y_test, model_path)

print(y_train_pred.shape, '\n', y_train_pred)
print(y_test_pred.shape, '\n', y_test_pred)

./model/CBIG_ukbb_dnn_run_0_epoch_98.pkl_torch
./model/CBIG_ukbb_dnn_run_0_epoch_98.pkl_torch


RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted
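The `RuntimeError` above is raised while PyTorch reads the checkpoint as a zip archive, which suggests the model file on disk is incomplete or corrupted (for example, after an interrupted download). A quick sanity check, assuming the checkpoint was written in PyTorch's zip-based format (the default for `torch.save` since PyTorch 1.6), is to test the archive with Python's standard `zipfile` module before calling `metamatching_infer`. The helper name `check_checkpoint` is hypothetical.

```python
import os
import tempfile
import zipfile


def check_checkpoint(path):
    """Hypothetical helper: report whether `path` looks like an intact zip archive."""
    if not os.path.isfile(path):
        return 'missing'
    if not zipfile.is_zipfile(path):
        return 'not a readable zip archive'
    try:
        with zipfile.ZipFile(path) as zf:
            bad_member = zf.testzip()  # name of the first member failing its CRC check, or None
    except (zipfile.BadZipFile, OSError, EOFError):
        return 'corrupted'
    return 'ok' if bad_member is None else 'corrupted member: ' + bad_member


# Demonstration on a valid archive and on a truncated copy of it.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, 'good.zip')
with zipfile.ZipFile(good, 'w') as zf:
    zf.writestr('data.pkl', b'x' * 1000)
with open(good, 'rb') as f:
    payload = f.read()
bad = os.path.join(tmp, 'bad.zip')
with open(bad, 'wb') as f:
    f.write(payload[:len(payload) // 2])  # simulate an interrupted download

print(check_checkpoint(good))  # ok
print(check_checkpoint(bad))
```

On the notebook's model file this would be called as `check_checkpoint('./model/CBIG_ukbb_dnn_run_0_epoch_98.pkl_torch')`; any result other than `'ok'` suggests a damaged file that should be replaced with a fresh copy before rerunning Step 3.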

### Step 4. Stacking
Perform stacking with `y_train_pred`, `y_test_pred`, and `y_train`: the predictions for the 16 training subjects (`y_train_pred`, input) and their observed phenotypes (`y_train`, output) are used to train the stacking model, which is then applied to `y_test_pred` to obtain the final predictions of the 2 phenotypes for the 4 test subjects. For simplicity, this example uses all 67 outputs of the pretrained model as input to the stacking KRR (kernel ridge regression) model; if you want to select only the top K outputs, please see our [CBIG repo](https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/predict_phenotypes/Naren2024_MMT1) for more details.
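The CBIG repo linked above describes the top-K selection used in the original work. As a rough illustration only, the sketch below keeps the K source-phenotype predictions whose absolute Pearson correlation with the target phenotype is highest on the training subjects. This ranking criterion, the helper name `select_top_k`, and K = 10 are assumptions, not necessarily the procedure used in the repo.

```python
import numpy as np


def select_top_k(y_train_pred, y_train_target, k):
    """Hypothetical helper: indices of the k columns of y_train_pred with the
    highest absolute Pearson correlation with y_train_target (training data only)."""
    pred_c = y_train_pred - y_train_pred.mean(axis=0)
    targ_c = y_train_target - y_train_target.mean()
    corr = pred_c.T @ targ_c / (
        np.sqrt((pred_c ** 2).sum(axis=0)) * np.sqrt((targ_c ** 2).sum()))
    return np.argsort(-np.abs(corr))[:k]


rng = np.random.default_rng(0)
y_train_pred = rng.standard_normal((16, 67))  # placeholder: 67 source predictions
y_train_target = y_train_pred[:, 5] + 0.1 * rng.standard_normal(16)  # target tracks column 5
top_idx = select_top_k(y_train_pred, y_train_target, k=10)
print(top_idx)
```

The selected columns would then replace the full 67 outputs, e.g. `stacking(y_train_pred[:, top_idx], y_test_pred[:, top_idx], y_train[:, i])`, with the selection repeated for each phenotype and computed on training subjects only so the test data stay unseen.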

#### Hyperparameter Tuning
In the `stacking()` function, the candidate values of `alpha` are `[5, 10, 15, 20, 25, 30, 35, 40, 45, 50]`. You are welcome to modify this range to get better performance on your own data.

In [None]:
from CBIG_model_pytorch import stacking

y_test_final_arr = np.zeros((y_test_pred.shape[0], y_train.shape[1]))
y_train_final_arr = np.zeros((y_train_pred.shape[0], y_train.shape[1]))
for i in range(y_train.shape[1]):
    # For each target phenotype, train a stacking KRR model on the 67 source-phenotype predictions
    y_test_final, y_train_final = stacking(y_train_pred, y_test_pred, y_train[:,i])
    y_test_final_arr[:,i] = y_test_final
    y_train_final_arr[:,i] = y_train_final
print(y_test_final_arr.shape, '\n', y_test_final_arr)
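For readers who want to see what a stacking step of this kind involves, the sketch below fits a kernel ridge regression from the 67 source-phenotype predictions to one target phenotype, choosing `alpha` from the same grid `[5, 10, ..., 50]` by cross-validation on the training subjects. It uses scikit-learn's `KernelRidge` and `GridSearchCV`; the RBF kernel, the 4-fold CV, and the helper name `krr_stacking` are assumptions and may differ from what `stacking()` in `CBIG_model_pytorch` actually does.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV


def krr_stacking(y_train_pred, y_test_pred, y_train_target):
    """Hypothetical sketch of a KRR stacking step for one target phenotype."""
    alphas = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]  # same grid as stacking()
    search = GridSearchCV(KernelRidge(kernel='rbf'), {'alpha': alphas}, cv=4)
    search.fit(y_train_pred, y_train_target)  # refits the best alpha on all training subjects
    best_alpha = search.best_params_['alpha']
    # Same return order as the notebook: test predictions first, then training predictions.
    return search.predict(y_test_pred), search.predict(y_train_pred), best_alpha


rng = np.random.default_rng(0)
y_train_pred = rng.standard_normal((16, 67))  # placeholder source predictions (train)
y_test_pred = rng.standard_normal((4, 67))    # placeholder source predictions (test)
y_train_target = rng.standard_normal(16)      # placeholder z-scored phenotype

y_test_final, y_train_final, best_alpha = krr_stacking(
    y_train_pred, y_test_pred, y_train_target)
print(y_test_final.shape, y_train_final.shape, best_alpha)
```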

### Step 5. Evaluation
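Evaluate the prediction performance by computing, for each phenotype, the Pearson correlation between the final predictions and the observed values of the 4 test subjects.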

In [None]:
from scipy.stats import pearsonr  # scipy.stats.stats is a deprecated private path

corr = np.zeros((y_train.shape[1]))
for i in range(y_train.shape[1]):
    corr[i] = pearsonr(y_test_final_arr[:, i], y_test[:, i])[0]
print(corr)
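Pearson's r between predicted and observed values is their covariance divided by the product of their standard deviations. The sketch below computes it column by column with NumPy on placeholder arrays (4 test subjects, 2 phenotypes) and checks the result against `scipy.stats.pearsonr`; the helper name `columnwise_pearson` is hypothetical. With only 4 test subjects, as in this toy example, such correlations are very noisy.

```python
import numpy as np
from scipy.stats import pearsonr


def columnwise_pearson(pred, obs):
    """Pearson r between matching columns of pred and obs (subjects x phenotypes)."""
    pred_c = pred - pred.mean(axis=0)
    obs_c = obs - obs.mean(axis=0)
    return (pred_c * obs_c).sum(axis=0) / np.sqrt(
        (pred_c ** 2).sum(axis=0) * (obs_c ** 2).sum(axis=0))


rng = np.random.default_rng(1)
y_obs = rng.standard_normal((4, 2))   # placeholder observed phenotypes
y_pred = rng.standard_normal((4, 2))  # placeholder predictions

r_numpy = columnwise_pearson(y_pred, y_obs)
r_scipy = np.array([pearsonr(y_pred[:, i], y_obs[:, i])[0] for i in range(2)])
print(r_numpy, np.allclose(r_numpy, r_scipy))

# Worked check: a perfect positive linear relation gives r = 1 for every column.
print(columnwise_pearson(2 * y_obs + 3, y_obs))
```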

### Step 6. Computing predictive network features (PNFs) with the Haufe transform
Here we compute the PNFs for the stacking model we just trained. For each of the 2 phenotype predictions, we compute its covariance with every voxel of the 3D T1 data across the 16 training subjects. The resulting PNF matrix has shape (87571, 2), where 87571 is the number of voxels after cropping and 2 is the number of phenotypes.

In [None]:
from CBIG_model_pytorch import covariance_rowwise, load_3D_input

x_train = load_3D_input(x_train)
cov = covariance_rowwise(x_train, y_train_final_arr)
print(cov, '\n', cov.shape)
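The covariance described above can be computed for all voxels at once: for voxel v and phenotype j, cov(v, j) is the sum over subjects of (x_sv − mean_v)(ŷ_sj − mean_j), divided by (n − 1). The sketch below does this with a single matrix product on placeholder arrays. It assumes the voxels are already flattened into an (n_subjects, n_voxels) matrix (a 4D array could be flattened with `x.reshape(x.shape[0], -1)`) and uses an (n − 1) denominator; both are assumptions that may differ from how `load_3D_input` and `covariance_rowwise` handle shapes and normalization. The helper name `haufe_covariance` is hypothetical.

```python
import numpy as np


def haufe_covariance(x, y_pred):
    """Covariance between every voxel (column of x) and every prediction (column of y_pred).

    x: (n_subjects, n_voxels), y_pred: (n_subjects, n_phenotypes)
    returns: (n_voxels, n_phenotypes)
    """
    n = x.shape[0]
    x_c = x - x.mean(axis=0)       # center each voxel across subjects
    y_c = y_pred - y_pred.mean(axis=0)  # center each prediction across subjects
    return x_c.T @ y_c / (n - 1)


rng = np.random.default_rng(0)
x_train = rng.standard_normal((16, 1000))     # placeholder: 16 subjects, 1000 voxels
y_train_final = rng.standard_normal((16, 2))  # placeholder stacked predictions

cov = haufe_covariance(x_train, y_train_final)
print(cov.shape)  # (1000, 2)

# Cross-check one voxel/phenotype pair against np.cov (which also divides by n - 1).
print(np.isclose(cov[0, 1], np.cov(x_train[:, 0], y_train_final[:, 1])[0, 1]))  # True
```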