# Fit models to be transferred
This notebook fits and saves an HBR model on a small dataset.

In [1]:
from pcntoolkit import (
    load_fcon1000,
    NormData,
)
from modelspec import shashb1

In [3]:
# Download the dataset
norm_data: NormData = load_fcon1000(save_path="../data")

# Select only a few features
features_to_model = [
    "WM-hypointensities",
    "Right-Lateral-Ventricle",
    "Right-Amygdala",
    "CortexVol",
]
norm_data = norm_data.sel({"response_vars": features_to_model})

# Hold out two sites for the transfer and extend steps later
transfer_sites = ["Milwaukee_b", "Oulu"]
transfer_data, fit_data = norm_data.split_batch_effects(
    {"site": transfer_sites}, names=("transfer", "fit")
)

# Split into train and test sets
train, test = fit_data.train_test_split()

Process: 37199 - 2025-05-20 10:41:12 - Dataset "fcon1000" created.
    - 1078 observations
    - 1078 unique subjects
    - 1 covariates
    - 217 response variables
    - 2 batch effects:
    	sex (2)
	site (23)
    
Process: 37199 - 2025-05-20 10:41:12 - Dataset "transfer" created.
    - 148 observations
    - 148 unique subjects
    - 1 covariates
    - 4 response variables
    - 2 batch effects:
    	sex (2)
	site (2)
    
Process: 37199 - 2025-05-20 10:41:12 - Dataset "fit" created.
    - 930 observations
    - 930 unique subjects
    - 1 covariates
    - 4 response variables
    - 2 batch effects:
    	sex (2)
	site (21)
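
The site hold-out above can be illustrated without the toolkit. The following is a minimal sketch, assuming each observation is a plain dictionary with a `site` key; `split_by_sites` and the toy rows are hypothetical and do not reflect how `split_batch_effects` is implemented.

```python
from collections import Counter


def split_by_sites(rows, held_out_sites):
    """Partition rows into (held_out, remaining) by their 'site' value."""
    held_out_sites = set(held_out_sites)
    held_out = [r for r in rows if r["site"] in held_out_sites]
    remaining = [r for r in rows if r["site"] not in held_out_sites]
    return held_out, remaining


# Toy rows: "Milwaukee_b" and "Oulu" come from the cell above,
# "Site_A" and the subject numbers are made up for illustration.
rows = [
    {"subject": 1, "site": "Milwaukee_b"},
    {"subject": 2, "site": "Oulu"},
    {"subject": 3, "site": "Site_A"},
    {"subject": 4, "site": "Site_A"},
    {"subject": 5, "site": "Oulu"},
]
transfer_rows, fit_rows = split_by_sites(rows, ["Milwaukee_b", "Oulu"])

# The two parts are disjoint and together cover every row.
assert len(transfer_rows) + len(fit_rows) == len(rows)
site_counts = Counter(r["site"] for r in fit_rows)
```

The same bookkeeping holds in the log above: 148 + 930 = 1078 observations, and 2 + 21 = 23 sites.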
    


In [4]:
# Use the model specification imported from modelspec
model = shashb1
model.fit_predict(train, test)

Process: 37199 - 2025-05-20 10:41:19 - Fitting models on 4 response variables.
Process: 37199 - 2025-05-20 10:41:19 - Fitting model for WM-hypointensities.


Progress,Draws,Divergences,Step Size,Gradients/Draw
,2000,0,0.01,1023
,2000,0,0.01,1023
,2000,0,0.01,1023
,2000,0,0.01,1023


Process: 37199 - 2025-05-20 10:44:28 - Fitting model for Right-Lateral-Ventricle.


Progress,Draws,Divergences,Step Size,Gradients/Draw
,2000,0,0.01,1023
,2000,0,0.02,255
,2000,0,0.02,255
,2000,0,0.01,1023


Process: 37199 - 2025-05-20 10:47:04 - Fitting model for Right-Amygdala.


Progress,Draws,Divergences,Step Size,Gradients/Draw
,2000,0,0.01,255
,2000,0,0.01,1023
,2000,0,0.01,1023
,2000,0,0.01,767


Process: 37199 - 2025-05-20 10:49:34 - Fitting model for CortexVol.


Progress,Draws,Divergences,Step Size,Gradients/Draw
,2000,0,0.01,767
,2000,0,0.01,1023
,2000,0,0.01,255
,2000,0,0.02,255


Sampling: []


Process: 37199 - 2025-05-20 10:51:48 - Making predictions on 4 response variables.
Process: 37199 - 2025-05-20 10:51:48 - Computing z-scores for 4 response variables.
Process: 37199 - 2025-05-20 10:51:48 - Computing z-scores for WM-hypointensities.


Sampling: []


Process: 37199 - 2025-05-20 10:51:56 - Computing z-scores for Right-Amygdala.


Sampling: []


Process: 37199 - 2025-05-20 10:52:03 - Computing z-scores for CortexVol.


Sampling: []


Process: 37199 - 2025-05-20 10:52:10 - Computing z-scores for Right-Lateral-Ventricle.


Sampling: []


Process: 37199 - 2025-05-20 10:52:18 - Computing centiles for 4 response variables.
Process: 37199 - 2025-05-20 10:52:18 - Computing centiles for WM-hypointensities.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 10:52:55 - Computing centiles for Right-Amygdala.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 10:53:31 - Computing centiles for CortexVol.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 10:54:07 - Computing centiles for Right-Lateral-Ventricle.


Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 10:54:44 - Computing log-probabilities for 4 response variables.
Process: 37199 - 2025-05-20 10:54:44 - Computing log-probabilities for WM-hypointensities.
Process: 37199 - 2025-05-20 10:54:45 - Computing log-probabilities for Right-Amygdala.
Process: 37199 - 2025-05-20 10:54:46 - Computing log-probabilities for CortexVol.
Process: 37199 - 2025-05-20 10:54:47 - Computing log-probabilities for Right-Lateral-Ventricle.


Sampling: []


Process: 37199 - 2025-05-20 10:54:48 - Dataset "synthesized" created.
    - 150 observations
    - 150 unique subjects
    - 1 covariates
    - 4 response variables
    - 2 batch effects:
    	sex (2)
	site (21)
    
Process: 37199 - 2025-05-20 10:54:48 - Synthesizing data for 4 response variables.
Process: 37199 - 2025-05-20 10:54:48 - Synthesizing data for WM-hypointensities.


Sampling: []


Process: 37199 - 2025-05-20 11:02:11 - Synthesizing data for Right-Lateral-Ventricle.


Sampling: []


Process: 37199 - 2025-05-20 11:02:13 - Synthesizing data for Right-Amygdala.


Sampling: []


Process: 37199 - 2025-05-20 11:02:15 - Synthesizing data for CortexVol.


Sampling: []


Process: 37199 - 2025-05-20 11:02:16 - Computing centiles for 4 response variables.
Process: 37199 - 2025-05-20 11:02:16 - Computing centiles for WM-hypointensities.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:02:24 - Computing centiles for Right-Amygdala.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:02:33 - Computing centiles for CortexVol.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:02:41 - Computing centiles for Right-Lateral-Ventricle.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:02:49 - Harmonizing data on 4 response variables.
Process: 37199 - 2025-05-20 11:02:49 - Harmonizing data for WM-hypointensities.


Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:03:03 - Harmonizing data for Right-Lateral-Ventricle.


Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:03:17 - Harmonizing data for Right-Amygdala.


Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:03:32 - Harmonizing data for CortexVol.


Sampling: []
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_be_df["marker"] = ["Other data"] * len(non_be_df)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_be_df["marker"] = ["Other data"] * len(non_be_df)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_be_df["marker"] = ["Other data"] * len(non_be_df)
A value is trying to be set on

Process: 37199 - 2025-05-20 11:03:47 - Saving model to:
	../models/model_to_transfer.
Process: 37199 - 2025-05-20 11:03:47 - Making predictions on 4 response variables.
Process: 37199 - 2025-05-20 11:03:47 - Computing z-scores for 4 response variables.
Process: 37199 - 2025-05-20 11:03:47 - Computing z-scores for WM-hypointensities.


Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:03:50 - Computing z-scores for Right-Amygdala.


Sampling: []


Process: 37199 - 2025-05-20 11:03:52 - Computing z-scores for CortexVol.


Sampling: []


Process: 37199 - 2025-05-20 11:03:54 - Computing z-scores for Right-Lateral-Ventricle.


Sampling: []


Process: 37199 - 2025-05-20 11:03:56 - Computing centiles for 4 response variables.
Process: 37199 - 2025-05-20 11:03:56 - Computing centiles for WM-hypointensities.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:04:06 - Computing centiles for Right-Amygdala.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:04:16 - Computing centiles for CortexVol.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:04:26 - Computing centiles for Right-Lateral-Ventricle.


Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:04:36 - Computing log-probabilities for 4 response variables.
Process: 37199 - 2025-05-20 11:04:36 - Computing log-probabilities for WM-hypointensities.
Process: 37199 - 2025-05-20 11:04:37 - Computing log-probabilities for Right-Amygdala.
Process: 37199 - 2025-05-20 11:04:37 - Computing log-probabilities for CortexVol.
Process: 37199 - 2025-05-20 11:04:38 - Computing log-probabilities for Right-Lateral-Ventricle.


Sampling: []


Process: 37199 - 2025-05-20 11:04:39 - Dataset "synthesized" created.
    - 150 observations
    - 150 unique subjects
    - 1 covariates
    - 4 response variables
    - 2 batch effects:
    	sex (2)
	site (21)
    
Process: 37199 - 2025-05-20 11:04:39 - Synthesizing data for 4 response variables.
Process: 37199 - 2025-05-20 11:04:39 - Synthesizing data for WM-hypointensities.


Sampling: []


Process: 37199 - 2025-05-20 11:04:41 - Synthesizing data for Right-Lateral-Ventricle.


Sampling: []


Process: 37199 - 2025-05-20 11:04:42 - Synthesizing data for Right-Amygdala.


Sampling: []


Process: 37199 - 2025-05-20 11:04:44 - Synthesizing data for CortexVol.


Sampling: []


Process: 37199 - 2025-05-20 11:04:46 - Computing centiles for 4 response variables.
Process: 37199 - 2025-05-20 11:04:46 - Computing centiles for WM-hypointensities.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:04:54 - Computing centiles for Right-Amygdala.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:05:02 - Computing centiles for CortexVol.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:05:11 - Computing centiles for Right-Lateral-Ventricle.


Sampling: []
Sampling: []
Sampling: []
Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:05:19 - Harmonizing data on 4 response variables.
Process: 37199 - 2025-05-20 11:05:19 - Harmonizing data for WM-hypointensities.


Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:05:23 - Harmonizing data for Right-Lateral-Ventricle.


Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:05:27 - Harmonizing data for Right-Amygdala.


Sampling: []
Sampling: []


Process: 37199 - 2025-05-20 11:05:31 - Harmonizing data for CortexVol.


Sampling: []
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_be_df["marker"] = ["Other data"] * len(non_be_df)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_be_df["marker"] = ["Other data"] * len(non_be_df)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_be_df["marker"] = ["Other data"] * len(non_be_df)
A value is trying to be set on

Process: 37199 - 2025-05-20 11:05:36 - Saving model to:
	../models/model_to_transfer.
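
The sampler tables printed during fitting (the `Progress,Draws,Divergences,Step Size,Gradients/Draw` blocks) can be checked programmatically. The sketch below parses one such table with the standard library, assuming each row corresponds to one chain; `summarize_chains` is a hypothetical helper, not part of the toolkit.

```python
import csv
import io

# Rows copied from the WM-hypointensities table in the output above.
table_text = """Progress,Draws,Divergences,Step Size,Gradients/Draw
,2000,0,0.01,1023
,2000,0,0.01,1023
,2000,0,0.01,1023
,2000,0,0.01,1023
"""


def summarize_chains(text):
    """Aggregate per-chain sampler statistics from a CSV-like table."""
    chains = list(csv.DictReader(io.StringIO(text)))
    return {
        "n_chains": len(chains),
        "total_draws": sum(int(c["Draws"]) for c in chains),
        "total_divergences": sum(int(c["Divergences"]) for c in chains),
        "max_gradients_per_draw": max(int(c["Gradients/Draw"]) for c in chains),
    }


summary = summarize_chains(table_text)
print(summary)
```

Zero divergences across all rows is a good sign but not proof of convergence. A `Gradients/Draw` value of 1023 equals 2^10 - 1, which may indicate long sampler trajectories and could be worth a closer look.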
