# Pybids transformers and generating BIDS stats models

Adapted from the slides made by Jeanette Mumford:
https://docs.google.com/presentation/d/1Bsfx9K4jz-YveUA4JpmqK-s1LnFNqFaGemEDbgFBoOc/edit#slide=id.g1269976e58a_0_85

In general, see the pybids transformers specification for more information:
https://docs.google.com/document/d/1uxN6vPWbC7ciAx2XWtT5Y-lBrdckZKpPdNUNpwRxHoU/edit#heading=h.kuzdziksbkpm

The pybids transformers let you perform a wide range of operations on the many "variables" that exist in a BIDS dataset.
The transformers are mostly used by the BIDS stats model, and it is rare that you will have to call them exactly as described below.

However, this notebook may help you understand how they are used within the context of the BIDS stats model.


In [69]:
%load_ext autoreload
%autoreload 2

---

Transformers act on the `collections` that you can get from the layout of a dataset.

In [70]:
from os.path import join
from bids import BIDSLayout
from bids.tests import get_test_data_path

layout_path = join(get_test_data_path(), "7t_trt")
layout = BIDSLayout(layout_path)

In [71]:
# get a collection at the dataset level
dataset = layout.get_collections("dataset", merge=True)
dataset.variables

{'age_at_first_scan_years': <bids.variables.variables.SimpleVariable at 0x7fd2e3e0daf0>,
 'number_of_scans_before': <bids.variables.variables.SimpleVariable at 0x7fd2e2b751f0>,
 'handedness': <bids.variables.variables.SimpleVariable at 0x7fd2e2b76340>,
 'sex': <bids.variables.variables.SimpleVariable at 0x7fd2e2cd0130>}

In [72]:
# which you can view more easily by turning them into a pandas dataframe
dataset.to_df()

Unnamed: 0,subject,age_at_first_scan_years,handedness,number_of_scans_before,sex,suffix
0,1,29,100,17,F,participants
1,2,23,100,6,F,participants
2,3,25,86,18,M,participants
3,4,26,100,8,M,participants
4,5,27,-84,28,M,participants
5,6,23,100,27,F,participants
6,7,29,100,9,M,participants
7,8,25,90,28,M,participants
8,9,25,100,37,M,participants
9,10,24,89,30,M,participants


In [73]:
# you can do the same at the session, subject, or run level
session_df = layout.get_collections(level='session', merge=True).to_df()
session_df.head()

subject_df = layout.get_collections(level='subject', merge=True).to_df()
subject_df.head()

Unnamed: 0,session,subject,CCPT_FN_count,CCPT_FP_count,CCPT_avg_FN_RT,CCPT_avg_FP_RT,CCPT_avg_succ_RT,CCPT_succ_count,caffeine_daily,diastolic_blood_pressure_left,...,specific_vague,subject_id,surroundings,systolic_blood_pressure_left,systolic_blood_pressure_right,thirst,vigilance,vigilance_nyc-q,words,suffix
0,1,1,0.0,1.0,,507.0,500.770833,96.0,0.5,64,...,95,1,0,108,109,9,9,100,100,sessions
1,1,2,0.0,5.0,,297.6,351.729167,96.0,0.0,65,...,100,2,70,99,100,2,7,100,100,sessions
2,1,3,0.0,1.0,,441.0,426.71875,96.0,1.0,69,...,100,3,10,122,128,3,8,100,0,sessions
3,1,4,0.0,1.0,,443.0,417.90625,96.0,0.1,74,...,80,4,0,130,110,6,5,100,85,sessions
4,1,5,0.0,2.0,,355.5,372.114583,96.0,0.0,69,...,75,5,80,105,117,7,7,60,30,sessions


---

For the sake of this notebook, however, we will create some collections starting from pandas dataframes.

In [74]:
import pandas as pd
from bids.variables.collections import BIDSVariableCollection

In [113]:
dataset_df = pd.DataFrame({
    "particiant_id": ["sub-01", "sub-02", "sub-03", "sub-04",],
    "sex": ["M", "M", "F", "F"],
    "age": [25, 18, 22, 25]
})

dataset_df

Unnamed: 0,particiant_id,sex,age
0,sub-01,M,25
1,sub-02,M,18
2,sub-03,F,22
3,sub-04,F,25


In [114]:
dataset = BIDSVariableCollection.from_df(dataset_df, source="dataset")

dataset.to_df()

Unnamed: 0,index,age,particiant_id,sex
0,amplitude,25,sub-01,M


We will also create a collection that would correspond to the content of an `events.tsv`

In [77]:
run_df = pd.DataFrame({
    "onset": [20, 37.5, 60, 180, 182.5, 230],
    "duration": [2, 2, 2, 2, 2, 2],
    "trial_type": ["word", "word", "word", "pseudoword", "pseudoword", "pseudoword"],
    "rt_pretend": [0.5, 0.6, 0.55, 0.5, 0.7, 0.8],
})

run_df

Unnamed: 0,onset,duration,trial_type,rt_pretend
0,20.0,2,word,0.5
1,37.5,2,word,0.6
2,60.0,2,word,0.55
3,180.0,2,pseudoword,0.5
4,182.5,2,pseudoword,0.7
5,230.0,2,pseudoword,0.8


Add an `amplitude` column, as it seems to be required when building the collection with `BIDSVariableCollection.from_df`.

In [78]:
run_df["amplitude"] = [1, 1, 1, 1, 1, 1]
run_df

Unnamed: 0,onset,duration,trial_type,rt_pretend,amplitude
0,20.0,2,word,0.5,1
1,37.5,2,word,0.6,1
2,60.0,2,word,0.55,1
3,180.0,2,pseudoword,0.5,1
4,182.5,2,pseudoword,0.7,1
5,230.0,2,pseudoword,0.8,1


In [79]:
run = BIDSVariableCollection.from_df(run_df)
run.variables

{'onset': <bids.variables.variables.SimpleVariable at 0x7fd2c4256f10>,
 'duration': <bids.variables.variables.SimpleVariable at 0x7fd2c4256340>,
 'trial_type': <bids.variables.variables.SimpleVariable at 0x7fd30e3c7f10>,
 'rt_pretend': <bids.variables.variables.SimpleVariable at 0x7fd2c4256f70>,
 'amplitude': <bids.variables.variables.SimpleVariable at 0x7fd2c42414c0>}

Let's create 2 functions to easily reuse those later:

In [103]:
def dataset_collection():
    dataset_df = pd.DataFrame({
        "particiant_id": ["sub-01", "sub-02", "sub-03", "sub-04"],
        "sex": ["M", "M", "F", "F"],
        "age": [25, 18, 22, 25]
    })
    return BIDSVariableCollection.from_df(dataset_df)

dataset = dataset_collection()
dataset.to_df()

Unnamed: 0,index,age,particiant_id,sex
0,amplitude,25,sub-01,M


In [101]:
def run_collection():
    run_df = pd.DataFrame({
        "onset": [20, 37.5, 60, 180, 182.5, 230],
        "duration": [2, 2, 2, 2, 2, 2],
        "trial_type": ["word", "word", "word", "pseudoword", "pseudoword", "pseudoword"],
        "rt_pretend": [0.5, 0.6, 0.55, 0.5, 0.7, 0.8],
    })
    run_df["amplitude"] = [1, 1, 1, 1, 1, 1]
    return BIDSVariableCollection.from_df(run_df)
    
run = run_collection()
run.to_df()

Unnamed: 0,index,amplitude,duration,onset,rt_pretend,trial_type
0,amplitude,1,2,20.0,0.5,word


---

## Factor

```json
{"Instructions":
 [
     {"Name": "Factor",
      "Input": "sex"}
 ]
}
```

In [81]:
from bids.modeling.transformations.munge import Factor

In [82]:
Factor?

Init signature: Factor(collection, variables, *args, **kwargs)
Docstring:      <no docstring>
File:           ~/github/pybids/bids/modeling/transformations/munge.py
Type:           ABCMeta
Subclasses:


In [118]:
layout_path = join(get_test_data_path(), 'ds005')
layout = BIDSLayout(layout_path)
c = layout.get_collections('dataset', merge=True)

Factor(c, 'sex')

c.to_df().head()

Unnamed: 0,subject,age,sex.0,sex.1,suffix
0,1,28.0,1.0,0.0,participants
1,2,21.0,0.0,1.0,participants
2,3,27.0,0.0,1.0,participants
3,4,25.0,1.0,0.0,participants
4,5,20.0,0.0,1.0,participants
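
To make the output above more concrete, here is a minimal sketch of the dummy coding that `Factor` appears to perform, written in plain Python rather than with pybids. The helper name `dummy_code` is hypothetical, and the `sex` values are chosen to be consistent with the first five rows of the table above.

```python
def dummy_code(name, values):
    """Return one 0/1 indicator column per distinct level found in `values`."""
    levels = sorted(set(values))
    return {
        f"{name}.{level}": [1.0 if v == level else 0.0 for v in values]
        for level in levels
    }


# Assumption: sex is coded 0/1 for the first five subjects of ds005.
sex = [0, 1, 1, 0, 1]
columns = dummy_code("sex", sex)

for column_name, column in columns.items():
    print(column_name, column)
```

Each level of the original variable becomes its own column, named after the input variable and the level, which is why `sex` is replaced by `sex.0` and `sex.1`.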


## Factor and product

```json
{"Instructions":
 [
     {"Name": "Factor",
      "Input": "sex"},
     {"Name": "Product",
      "Input": ["sex.1", "age"],
      "Output": "ageM"},
     {"Name": "Product",
      "Input": ["sex.0", "age"],
      "Output": "ageF"}
 ]
}
```
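
Comparing the JSON instructions with the Python calls used in this notebook suggests a simple correspondence: `Name` selects the transformer, `Input` gives the variables, and the remaining keys become lowercase keyword arguments. The sketch below illustrates that correspondence only. The helper `to_call` is hypothetical and is not how pybids dispatches transformations internally.

```python
import json

instructions = json.loads("""
{"Instructions":
 [
     {"Name": "Factor",
      "Input": "sex"},
     {"Name": "Product",
      "Input": ["sex.1", "age"],
      "Output": "ageM"},
     {"Name": "Product",
      "Input": ["sex.0", "age"],
      "Output": "ageF"}
 ]
}
""")["Instructions"]


def to_call(instruction):
    """Split one instruction into (name, variables, keyword arguments)."""
    options = dict(instruction)  # copy so the original is left untouched
    name = options.pop("Name")
    variables = options.pop("Input")
    # Assumption: remaining keys map to lowercase keyword arguments.
    kwargs = {key.lower(): value for key, value in options.items()}
    return name, variables, kwargs


calls = [to_call(instruction) for instruction in instructions]

for name, variables, kwargs in calls:
    parts = [repr(variables)] + [f"{k}={v!r}" for k, v in kwargs.items()]
    print(f"{name}(c, {', '.join(parts)})")
```

The printed lines have the same shape as the calls made on the collection `c` in the cells of this notebook.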

In [84]:
from bids.modeling.transformations.compute import Product

In [85]:
Product?

Init signature: Product(collection, variables, *args, **kwargs)
Docstring:      <no docstring>
File:           ~/github/pybids/bids/modeling/transformations/compute.py
Type:           ABCMeta
Subclasses:


In [116]:
layout_path = join(get_test_data_path(), 'ds005')
layout = BIDSLayout(layout_path)
c = layout.get_collections('dataset', merge=True)

Factor(c, 'sex')
Product(c, ["sex.1", "age"], output="ageM")
Product(c, ["sex.0", "age"], output="ageF")

c.to_df().head()

Unnamed: 0,subject,age,ageF,ageM,sex.0,sex.1,suffix
0,1,28.0,28.0,0.0,1.0,0.0,participants
1,2,21.0,0.0,21.0,0.0,1.0,participants
2,3,27.0,0.0,27.0,0.0,1.0,participants
3,4,25.0,25.0,0.0,1.0,0.0,participants
4,5,20.0,0.0,20.0,0.0,1.0,participants
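
The arithmetic behind `ageM` and `ageF` is a row-wise multiplication of the indicator column by `age`. A minimal plain-Python sketch, using the first five rows of the table above (the helper name `product` is hypothetical):

```python
from math import prod


def product(*columns):
    """Multiply the given columns together, row by row."""
    return [prod(row) for row in zip(*columns)]


# Values copied from the first five rows of the table above.
age = [28.0, 21.0, 27.0, 25.0, 20.0]
sex_0 = [1.0, 0.0, 0.0, 1.0, 0.0]
sex_1 = [0.0, 1.0, 1.0, 0.0, 1.0]

age_m = product(sex_1, age)
age_f = product(sex_0, age)

print("ageM", age_m)
print("ageF", age_f)
```

Because the indicator is 0 for everyone outside the group, each new column holds the age for one group and 0 elsewhere.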


## Threshold

```json
{"Instructions":
 [
     {"Name": "Threshold",
      "Input": "age",
      "Threshold": 20,
      "Binarize": true,
      "Output": "age_gt_20"}
 ]
}
```

In [87]:
from bids.modeling.transformations.compute import Threshold

In [88]:
Threshold?

Init signature: Threshold(collection, variables, *args, **kwargs)
Docstring:
Threshold and/or binarize a variable.

Parameters
----------
data :obj:`pandas.Series` or :obj:`pandas.DataFrame`
    The pandas structure to threshold.
threshold : float
    The value to binarize around (values above will
    be assigned 1, values below will be assigned 0).
binarize : bool
    If True, binarizes all non-zero values (i.e., every
    non-zero value will be set to 1).
above : bool
    Specifies which values to retain with respect to the
    cut-off. If True, all value above the threshold will be kept; if
    False, all values below the threshold will be kept. Defaults to
    True.
signed : bool
    Specifies whether to treat the threshold as signed
    (default) or unsigned. For example, when passing above=True and
    threshold

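If “Binarize” is false (the default), values below the threshold are set to zero and the remaining values are left unchanged. Since “Above” defaults to true, set “Above”: false if you want to reverse the threshold and keep the values below it instead.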

In [89]:
layout_path = join(get_test_data_path(), 'ds005')
layout = BIDSLayout(layout_path)
c = layout.get_collections('dataset', merge=True)

Threshold(c, "age", threshold=20, binarize=True, output="age_gt_20")

c.to_df().head()

Unnamed: 0,subject,age,age_gt_20,sex,suffix
0,1,28,1,0,participants
1,2,21,1,1,participants
2,3,27,1,1,participants
3,4,25,1,0,participants
4,5,20,1,1,participants
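
Note that the subject aged exactly 20 gets `age_gt_20 = 1` in the table above, which suggests the comparison is inclusive. The sketch below reproduces that behavior in plain Python. It is an approximation based on the docstring and on this output, not the pybids implementation: the helper name `threshold_values` is hypothetical, and the `signed` option is left out.

```python
def threshold_values(values, threshold=0.0, binarize=False, above=True):
    """Zero out values on the wrong side of `threshold`; optionally binarize."""
    result = []
    for value in values:
        # Assumption: the comparison is inclusive (>= or <=).
        keep = value >= threshold if above else value <= threshold
        if not keep:
            result.append(0)
        elif binarize:
            result.append(1)
        else:
            result.append(value)
    return result


age = [28, 21, 27, 25, 20]  # first five subjects of ds005, as shown above

age_gt_20 = threshold_values(age, threshold=20, binarize=True)
kept_above_25 = threshold_values(age, threshold=25)
kept_below_25 = threshold_values(age, threshold=25, above=False)

print(age_gt_20)
print(kept_above_25)
print(kept_below_25)
```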


## Scale

```json
{"Instructions":
 [
     {"Name": "Scale",
      "Input": "age",
      "Output": "age_centered_scaled"},
     {"Name": "Scale",
      "Input": "age",
      "Demean": true,
      "Rescale": false,
      "Output": "age_centered_not_scaled"}
 ]
}
```

In [90]:
from bids.modeling.transformations.compute import Scale

In [91]:
Scale?

Init signature: Scale(collection, variables, *args, **kwargs)
Docstring:
Scale a variable.

Parameters
----------
data : :obj:`pandas.Series` or :obj:`pandas.DataFrame`
    The variables to scale.
demean : bool
    If True, demean each column.
rescale : bool
    If True, divide variables by their standard deviation.
replace_na : str
    Whether/when to replace missing values with 0. If
    None, no replacement is performed. If 'before', missing values are
    replaced with 0's before scaling. If 'after', missing values are
    replaced with 0 after scaling.

Notes
-----
If a constant column is passed in, and replace_na is None or 'before', an
exception will be raised.
File:           ~/github/pybids/bids/modeling/transformations/compute.py
Type:           ABCMeta
Subclasses:


In [92]:
layout_path = join(get_test_data_path(), 'ds005')
layout = BIDSLayout(layout_path)
c = layout.get_collections('dataset', merge=True)

Scale(c, "age", output="age_centered_scaled")
Scale(c, "age", demean=True, rescale=False, output="age_centered_not_scaled")

c.to_df().head()

Unnamed: 0,subject,age,age_centered_not_scaled,age_centered_scaled,sex,suffix
0,1,28.0,5.9375,2.073992,0.0,participants
1,2,21.0,-1.0625,-0.371135,1.0,participants
2,3,27.0,4.9375,1.724688,1.0,participants
3,4,25.0,2.9375,1.02608,0.0,participants
4,5,20.0,-2.0625,-0.720439,1.0,participants
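
As a worked example of what demeaning and rescaling mean, here is a plain-Python sketch applied to the four ages of the hand-made dataset created earlier in this notebook. The helper name `scale_values` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def scale_values(values, demean=True, rescale=True):
    """Optionally subtract the mean and divide by the standard deviation."""
    center = mean(values) if demean else 0.0
    # Assumption: sample standard deviation (n - 1 in the denominator).
    spread = stdev(values) if rescale else 1.0
    return [(value - center) / spread for value in values]


age = [25, 18, 22, 25]  # ages from the hand-made dataset in this notebook

age_centered_not_scaled = scale_values(age, rescale=False)
age_centered_scaled = scale_values(age)

print(age_centered_not_scaled)
print([round(value, 3) for value in age_centered_scaled])
```

With `rescale=False` the values keep their original units (years) and only the mean is removed. With both options on, the result has mean 0 and standard deviation 1.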


## And / Or / Not

```json
{"Instructions":
 [
     {"Name": "Factor",
      "Input": "sex"},
     {"Name": "Threshold",
      "Input": "age",
      "Threshold": 20,
      "Binarize": true,
      "Output": "age_gt_20"},
     {"Name": "And",
      "Input": ["sex.1", "age_gt_20"],
      "Output": "men_older_than_20"}
 ]
}
```


In [93]:
from bids.modeling.transformations.compute import And_
from bids.modeling.transformations.compute import Or_
from bids.modeling.transformations.compute import Not

In [94]:
And_?

Init signature: And_(collection, variables, *args, **kwargs)
Docstring:
Logical AND on two or more variables.

Parameters
----------
dfs : list of :obj:`pandas.DataFrame`
    variables to enter into the conjunction.
File:           ~/github/pybids/bids/modeling/transformations/compute.py
Type:           ABCMeta
Subclasses:


In [95]:
layout_path = join(get_test_data_path(), 'ds005')
layout = BIDSLayout(layout_path)
c = layout.get_collections('dataset', merge=True)

Factor(c, 'sex')
Threshold(c, "age", threshold=20, binarize=True, output="age_gt_20")

And_(c, ["sex.1", "age_gt_20"], output="men_older_than_20")

c.to_df().head()

Unnamed: 0,subject,age,age_gt_20,men_older_than_20,sex.0,sex.1,suffix
0,1,28.0,1.0,0.0,1.0,0.0,participants
1,2,21.0,1.0,1.0,0.0,1.0,participants
2,3,27.0,1.0,1.0,0.0,1.0,participants
3,4,25.0,1.0,0.0,1.0,0.0,participants
4,5,20.0,1.0,1.0,0.0,1.0,participants
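
Only `And_` is demonstrated above. The plain-Python sketch below shows the row-wise logic for all three operations on 0/1 columns, using the first five rows of the table above. The helper names are hypothetical, and the sketch does not call `Or_` or `Not` from pybids.

```python
def logical_and(*columns):
    """1.0 where every column is non-zero in that row, else 0.0."""
    return [float(all(row)) for row in zip(*columns)]


def logical_or(*columns):
    """1.0 where at least one column is non-zero in that row, else 0.0."""
    return [float(any(row)) for row in zip(*columns)]


def logical_not(column):
    """Flip a 0/1 column."""
    return [float(not value) for value in column]


# Values copied from the first five rows of the table above.
sex_1 = [0.0, 1.0, 1.0, 0.0, 1.0]
age_gt_20 = [1.0, 1.0, 1.0, 1.0, 1.0]

men_older_than_20 = logical_and(sex_1, age_gt_20)
either = logical_or(sex_1, age_gt_20)
not_sex_1 = logical_not(sex_1)

print(men_older_than_20)
print(either)
print(not_sex_1)
```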


## Generating BIDS stats models

pybids can build a "default" model for a given dataset to help you get started, so that you do not have to build yours from scratch.

In [96]:
from bids.modeling import auto_model
import json

layout_path = join(get_test_data_path(), 'ds005')
layout = BIDSLayout(layout_path)

# because the test datasets of pybids have no images 
# we need to give it a dummy scan_length for this to run
model = auto_model(layout, scan_length=600, one_vs_rest=True)

with open("model-ds005_smdl.json", "w") as outfile:
    json.dump(model[0], outfile)

In [97]:
!cat model-ds005_smdl.json

{"Name": "ds005_mixedgamblestask", "Description": "Autogenerated model for the mixedgamblestask task from ds005", "Input": {"Task": "mixedgamblestask"}, "Nodes": [{"Level": "Run", "Name": "Run", "Transformations": [{"Name": "Factor", "Input": ["trial_type"]}, {"Name": "Convolve", "Input": ["trial_type.parametric gain"]}], "Model": {"X": ["trial_type.parametric gain"]}, "Contrasts": [{"Name": "run_parametric gain", "ConditionList": ["trial_type.parametric gain"], "Weights": [1.0], "Test": "t"}]}, {"Level": "Subject", "Name": "Subject", "Model": {"X": ["run_parametric gain"]}, "Contrasts": [{"Name": "subject_run_parametric gain", "ConditionList": ["run_parametric gain"], "Weights": [1], "Test": "FEMA"}]}, {"Level": "Dataset", "Name": "Dataset", "Model": {"X": ["subject_run_parametric gain"]}, "Contrasts": [{"Name": "dataset_subject_run_parametric gain", "ConditionList": ["subject_run_parametric gain"], "Weights": [1], "Test": "t"}]}]}
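
The one-line JSON above is hard to read. The sketch below walks through a shortened excerpt of it (contrasts and descriptions removed by hand for brevity) and lists, for each node, its level, its transformations, and the design matrix columns in `Model.X`. Only the standard `json` module is used.

```python
import json

# Shortened excerpt of the autogenerated model shown above.
model_excerpt = json.loads("""
{"Name": "ds005_mixedgamblestask",
 "Nodes": [
   {"Level": "Run", "Name": "Run",
    "Transformations": [
      {"Name": "Factor", "Input": ["trial_type"]},
      {"Name": "Convolve", "Input": ["trial_type.parametric gain"]}],
    "Model": {"X": ["trial_type.parametric gain"]}},
   {"Level": "Subject", "Name": "Subject",
    "Model": {"X": ["run_parametric gain"]}},
   {"Level": "Dataset", "Name": "Dataset",
    "Model": {"X": ["subject_run_parametric gain"]}}
 ]}
""")

summary = []
for node in model_excerpt["Nodes"]:
    # Only the Run node carries transformations in this model.
    transformations = [t["Name"] for t in node.get("Transformations", [])]
    summary.append((node["Level"], transformations, node["Model"]["X"]))

for level, transformations, design in summary:
    print(f"{level}: transformations={transformations}, X={design}")
```

The transformers covered in this notebook appear in the `Transformations` list of a node, here `Factor` followed by `Convolve` at the run level, and each higher level takes the contrasts of the level below as its inputs.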