### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests
response = requests.get("https://oracle.com")
assert response.status_code==200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation </font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>

# A notebook to test running RBAs at the voxel level on fMRI data.
Brainhack-Aus 2022 project

Authors:
Gang Chen @afni-gangc
Christopher Nolan @crnolan
Kelly Garner @kel-github 
Lea Waller @HippocampusGirl
Daniel Tomasz @danieltomasz
Megan Campbell @meganEJcampbell
Preetom Pal @preqon
Adam @a-manoogian *
Bella @isabellaorlando *
Darin Leiter @dsleiter *
Arshiyan @Arshiyasan *
Judy Zhu @jd-zhu 


Modelling task-based fMRI data often involves performing a GLM at each voxel and then correcting for many many many multiple comparisons.

Here instead, we try performing a single hierarchical mixed effects model on all the voxels at once.

This provides advantages typical of Bayesian hiearchal modelling; information at upper levels of the hierarchy (e.g. across voxels) can help inform estimates at lower levels (the estimate for each voxel) - aka shrinkage - and we avoid the multiple comparisons problem by instead providing the strength of evidenve for the effect of interest at each voxel.

For a comprehensive introduction to this approach, see this paper and this paper by Gang Chen.

Here we test the feasibility of running Bayesian hierarchal modelling at the voxel level, by determining compute time across varying data sizes; both randomly generated and fMRI data.

# Running the notebook

To build the environment to run this notebook, follow the instructions [here](https://github.com/crnolan/pyrba)

# Import modules

In [1]:
import arviz as az
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import seaborn as sns
import numpy as np
import pandas as pd
import pymc as pm
import xarray as xr
import bambi as bmb
sns.set_theme(style='white')

# Generate data

Here is an example of how to generate a test dataset. We are simulating voxel data from a task with 2 conditions (e.g. go and stop conditions of a stop signal task), with additive values per voxel and per subject. 

In [2]:
# define the number of voxels and subjects for which you wish to simulate data
nvoxels = 1000
nsubs = 30
mean_cond_a = .5
mean_cond_b = .3
sd_cond_a = .15
sd_cond_b = .15

In [3]:
#create voxel noise parameters
noise_list = []
for i in range(0,nvoxels): #hardcoded
    mean =  np.random.normal(0, .08) #voxel noise paramerer - normal distribution
    sd =  abs(np.random.normal(0, .05)) #voxel noise paramerer - uniform distribution or absolute normal distribution
    noise_list.append([mean, sd])


In [4]:
noise_list

[[-0.07953100439618851, 0.04545793670293904],
 [-0.00724696417374766, 0.007833597661327556],
 [0.0359430574324949, 0.0287049113669399],
 [-0.05547943199512097, 0.068634687734844],
 [0.02985574036490054, 0.02871853492902675],
 [0.04920993146015775, 0.09201911025111985],
 [0.06550086473800425, 0.023149538246841288],
 [-0.03632252820133593, 0.018024362096888784],
 [-0.043934737384787786, 0.048308785836757615],
 [0.005405758926370034, 0.06090926437245822],
 [0.050556212404230395, 0.020342639744276938],
 [0.07032466007448636, 0.016021492545644177],
 [-0.16780343193155392, 0.05437774114603064],
 [0.10620306487153773, 0.13939220489913562],
 [-0.05025138008708896, 0.009835124877944988],
 [0.04787705354084043, 0.02406457613713628],
 [-0.15506923547597237, 0.034333399504129856],
 [0.08137417460284656, 0.03611597683825958],
 [-0.008554096728544291, 0.07482178373054187],
 [0.13000277256930812, 0.02422292044933326],
 [0.07977945985087896, 0.007179363479408453],
 [-0.03441417095397666, 0.01830636009

In [5]:
#define function to generate random voxel values
def generate_random_voxels(mean, sd, noise_list, length=nvoxels): #hardcoded
    # mean [1 value] - reflects the mean of the condition + subject
    # sd [as above, but sd]
    # noise_list arr[nvoxels, 2] rfx for each voxel
    voxels = []
    for v in range(length):
        mean = mean + noise_list[v][0] + np.random.normal(0, 0.2)# mean = condition + subrfx + voxel mean + residual noise
        sd = sd + noise_list[v][1] # as above but with sd
        voxels.append(np.random.normal(mean,sd)) 
    return voxels

In [6]:
participants = []
conditions = []

In [7]:
#create multi level index matrix
for i in range(nsubs):
    participants.append(i)
    participants.append(i)
    conditions.append(0)
    conditions.append(1)
arrays = [participants, conditions]
tuples = list(zip(*arrays))
multi_index = pd.MultiIndex.from_tuples(tuples, names=["participant", "condition"])

In [8]:
#initiate voxel list
data = np.zeros((nsubs*2,nvoxels)) #hardcoded to be *2 participants (for 2 conditions)
df = pd.DataFrame(data, index = multi_index)

In [9]:
data.shape

(60, 1000)

In [10]:
#populate the multi level index matrix
for participant in range(nsubs):
    # unique number for each paritipcant
    random_effect_mean = np.random.normal(0, .1) # drawn from common distribution
    random_effect_sd = abs(np.random.normal(0, .05))
    
    for condition in range(2): # 2 conditions
        if condition == 0:
            mean = mean_cond_a + random_effect_mean
            sd = sd_cond_a + random_effect_sd
            df.loc[participant, condition] = generate_random_voxels(mean,sd,noise_list) 
        if condition == 1:
            mean = mean_cond_b + random_effect_mean
            sd = sd_cond_b + random_effect_sd
            df.loc[participant, condition] = generate_random_voxels(mean,sd,noise_list) 

In [11]:
df

Unnamed: 0_level_0,Unnamed: 1_level_0,0,1,2,3,4,5,6,7,8,9,...,990,991,992,993,994,995,996,997,998,999
participant,condition,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
0,0,0.930762,0.978346,1.964742,0.915747,1.653046,1.995382,2.729353,2.023593,2.594368,2.700282,...,9.806326,48.338443,-1.72584,56.723521,-52.331836,65.467132,-2.96548,-5.36153,24.444902,40.343169
0,1,0.202021,0.790483,0.576306,0.824341,0.877298,1.030965,-0.249118,1.059485,0.610556,0.578149,...,-23.808601,6.211899,40.314585,-48.226948,69.463941,-19.458385,65.364787,-11.294869,30.19667,19.531155
1,0,0.670959,0.512161,0.079051,0.853072,0.448462,1.290849,1.626115,0.288019,1.521675,1.197015,...,-49.791734,-59.530882,-10.27804,-11.942479,-8.100754,-12.03872,-22.435984,-21.8954,1.873444,2.109579
1,1,0.038567,0.106768,0.414091,-0.064505,0.313942,-0.239963,1.338032,0.294052,0.872823,-0.48854,...,-9.511364,51.400172,-50.412902,-15.945365,-37.986043,38.733728,9.200616,-42.663034,-66.797297,-27.728152
2,0,0.785029,0.518225,0.498828,0.452685,0.386068,0.582072,0.481246,0.336648,1.137976,0.557137,...,38.917203,5.632842,-18.967563,74.932299,-79.708225,26.611696,-1.862321,-34.693676,52.571692,-77.816508
2,1,0.689356,0.941617,1.895802,1.75805,0.951013,0.4591,1.192534,1.267307,1.191887,0.950657,...,19.275353,71.681844,9.985358,35.114833,-21.968536,8.218871,-46.642605,36.725675,-35.06137,6.339713
3,0,1.075865,1.618171,1.823876,0.970041,0.826132,0.559834,1.084697,0.78764,1.126394,-0.174851,...,-56.1867,4.114319,29.120245,-4.229992,-37.419427,19.86401,13.797715,-11.923691,-70.456812,111.704775
3,1,0.142324,-0.193734,-0.16892,-0.038075,-0.240468,-0.559589,0.770021,-0.121628,1.238955,1.40997,...,27.602489,-12.501849,-87.855239,75.349079,58.170046,-30.964553,-34.462328,54.840857,-0.144583,-31.874501
4,0,0.598515,0.738268,0.63381,0.239825,0.949515,0.68077,1.119247,1.286659,0.761067,0.924098,...,54.657097,42.298509,34.573479,-35.617696,-12.175672,45.939745,21.234119,-28.953549,-10.896534,40.856722
4,1,0.091647,0.451676,0.514468,0.626707,0.265814,0.916883,-0.015165,-0.439699,-0.440372,-0.324254,...,-16.615263,-39.170897,-73.725782,-71.335826,-7.783456,-11.934665,21.463439,-33.46287,45.87493,19.759441


In [12]:
#melt to satisfy bambi long form
df = pd.melt(df, ignore_index=False, var_name="voxel_id", value_name = "BOLD")

df


Unnamed: 0_level_0,Unnamed: 1_level_0,voxel_id,BOLD
participant,condition,Unnamed: 2_level_1,Unnamed: 3_level_1
0,0,0,0.930762
0,1,0,0.202021
1,0,0,0.670959
1,1,0,0.038567
2,0,0,0.785029
...,...,...,...
27,1,999,-9.583776
28,0,999,-2.823201
28,1,999,14.581491
29,0,999,35.614322


In [13]:
#write to csv
tmp = df


In [14]:
tmp.head(5)

Unnamed: 0_level_0,Unnamed: 1_level_0,voxel_id,BOLD
participant,condition,Unnamed: 2_level_1,Unnamed: 3_level_1
0,0,0,0.930762
0,1,0,0.202021
1,0,0,0.670959
1,1,0,0.038567
2,0,0,0.785029


In [15]:
tmp.to_csv('df2.txt', sep=' ')

In [16]:
df = pd.read_csv('df2.txt', delimiter = ' ')
df.head(5)

Unnamed: 0,participant,condition,voxel_id,BOLD
0,0,0,0,0.930762
1,0,1,0,0.202021
2,1,0,0,0.670959
3,1,1,0,0.038567
4,2,0,0,0.785029


Now I want to define the following model to apply to the data:


In [17]:
model = bmb.Model("BOLD ~ condition + (1|participant) + (1|voxel_id)", data=df)

In [None]:
%%time

fitted = model.fit(tune=4000, 
                   draws=1000, 
                   chains=16, 
                   method='nuts_numpyro',
                   nuts_kwargs=dict(max_tree_depth=100))

warmup:   2%|▏         | 33/1500 [02:37<2:20:21,  5.74s/it]

In [None]:
fitted = model.fit(tune=4000, 
                   draws=1000, 
                   chains=16, 
                   method='nutpie',
                   nuts_kwargs=dict(max_tree_depth=100))

In [None]:
model.graph()

In [None]:
az.plot_trace(fitted, figsize=(20, 35))
az.summary(fitted)