# Make BData

This notebook provides a hands-on tutorial for making BData from public datasets.
As an example dataset, we use THINGS-fMRI.

GitHub: <https://github.com/KamitaniLab/bdata-datasets>

## 0. Setup

In [None]:
import bdpy
import numpy as np
import pandas as pd


## 1. Examine the dataset

Checklist:

- [ ] Raw data or preprocessed? BOLD signals or beta?
- [ ] What format (NIfTI? Original format?)
- [ ] What additional data besides brain activity is available (stimulus, behaviour, physiological measures, ...)?
- [ ] ROI or brain parcellation available?

## 2. Download data

<https://plus.figshare.com/articles/dataset/THINGS-data_fMRI_Single_Trial_Responses_table_format_/20492835>

This notebook assumes that the THINGS-fMRI data has been downloaded and extracted under `src` as shown below:

```
src
└── fMRI-Single-Trial-Responses-table-format
    ├── betas_csv
    │   ├── sub-01_ResponseData.h5
    │   ├── sub-01_StimulusMetadata.csv
    │   ├── sub-01_VoxelMetadata.csv
    │   ├── sub-02_ResponseData.h5
    │   ├── sub-02_StimulusMetadata.csv
    │   ├── sub-02_VoxelMetadata.csv
    │   ├── sub-03_ResponseData.h5
    │   ├── sub-03_StimulusMetadata.csv
    │   └── sub-03_VoxelMetadata.csv
    └── betas_csv.tar.gz
```
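
Before loading anything, it can help to confirm that the files are where the notebook expects them. The following is a minimal sketch, assuming the directory layout shown above; the helper `missing_files` and the constants are hypothetical names introduced here for illustration and are not part of bdpy or the dataset.

```python
from pathlib import Path

# Assumed layout: the directory tree shown above, relative to this notebook.
BETAS_DIR = Path("src") / "fMRI-Single-Trial-Responses-table-format" / "betas_csv"
SUBJECTS = ["sub-01", "sub-02", "sub-03"]
SUFFIXES = ["ResponseData.h5", "StimulusMetadata.csv", "VoxelMetadata.csv"]


def missing_files(betas_dir=BETAS_DIR, subjects=SUBJECTS, suffixes=SUFFIXES):
    """Return the expected files that are not present under betas_dir."""
    expected = [
        Path(betas_dir) / f"{sub}_{suffix}"
        for sub in subjects
        for suffix in suffixes
    ]
    return [path for path in expected if not path.is_file()]


missing = missing_files()
if missing:
    print("Missing files:")
    for path in missing:
        print(f"  {path}")
else:
    print("All expected files are in place.")
```

If any file is reported missing, check that `betas_csv.tar.gz` was extracted in place.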

## 3. Load THINGS-fMRI dataset

In [None]:
voxel_file = "./src/fMRI-Single-Trial-Responses-table-format/betas_csv/sub-01_ResponseData.h5"
stim_file  = "./src/fMRI-Single-Trial-Responses-table-format/betas_csv/sub-01_StimulusMetadata.csv"
meta_file  = "./src/fMRI-Single-Trial-Responses-table-format/betas_csv/sub-01_VoxelMetadata.csv"

voxel_data = pd.read_hdf(voxel_file)
stim_data  = pd.read_csv(stim_file)
meta_data  = pd.read_csv(meta_file)


In [None]:
display(voxel_data)
display(stim_data)
display(meta_data)


THINGS-fMRI:

- We will make BData for GLM-beta values.
- Each sample has the following attributes:
    - trial_type (str)
    - session (int)
    - run (int)
    - subject_id (str)
    - trial_id (int)
    - stimulus (str)
- Each voxel has the following attributes (all numerical):
    - voxel_id, subject_id
    - voxel_x, voxel_y, voxel_z
    - nc_singletrial, nc_testset, splithalf_uncorrected, splithalf_corrected
    - pRF information: prf-eccentricity, prf-polarangle, prf-rsquared, prf-size
    - ROI flags: V1, V2, V3, hV4, VO1, VO2, LO1 (prf), LO2 (prf), TO1, TO2, V3b, V3a, lEBA, rEBA, lFFA, rFFA, lOFA, rOFA, lSTS, rSTS, lPPA, rPPA, lRSC, rRSC, lTOS, rTOS, lLOC, rLOC

## 4. Make data for BData

In [None]:
# Arrays to hold the data
n_voxels = len(voxel_data)
n_stimuli = len(stim_data)

voxel_data_ary = np.zeros([n_stimuli, n_voxels])

session_array = np.zeros([n_stimuli,])
run_ary       = np.zeros([n_stimuli,])
trial_array   = np.zeros([n_stimuli,])
stimulus_list  = []

# Iterate over the rows of stim_data and fill the arrays
for i in range(n_stimuli):
    voxel_data_ary[i, :] = voxel_data[i].values
    session_array[i] = stim_data['session'][i]
    run_ary[i]       = stim_data['run'][i]
    trial_array[i]   = stim_data['trial_id'][i]
    stimulus_list.append(stim_data['stimulus'][i])
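
The loop above copies one column of the response table per sample. As a hedged alternative, the same samples x voxels array can be built with a column selection and a transpose. The sketch below uses toy stand-in tables (hypothetical values, not the real dataset) and assumes, as the loop implies, one row per voxel and integer column labels `0, 1, 2, ...`, one per sample.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (hypothetical values): 4 voxels (rows) x 3 samples (columns).
toy_voxel_data = pd.DataFrame(np.arange(12).reshape(4, 3))
toy_stim_data = pd.DataFrame({
    "session": [1, 1, 2],
    "run": [1, 2, 1],
    "trial_id": [0, 1, 2],
    "stimulus": ["dog_01.jpg", "cat_03.jpg", "dog_01.jpg"],
})

n_samples = len(toy_stim_data)

# Loop version, as in the cell above: one column per sample.
looped = np.zeros([n_samples, len(toy_voxel_data)])
for i in range(n_samples):
    looped[i, :] = toy_voxel_data[i].values

# Vectorized version: select the sample columns, then transpose
# so that rows are samples and columns are voxels.
vectorized = toy_voxel_data[list(range(n_samples))].to_numpy(dtype=float).T

# Per-sample attributes can be taken from the table columns directly.
toy_session = toy_stim_data["session"].to_numpy(dtype=float)

print(vectorized.shape)
print(np.array_equal(looped, vectorized))
```

Both versions give the same array; the vectorized form mainly saves time when the table has many columns.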


In [None]:
# Convert stimulus names (str) to integer codes

display(len(stimulus_list))

stimulus_set = np.unique(stimulus_list)
display(stimulus_set.shape)

stimulus_name_vmap     = {i: s for i, s in enumerate(stimulus_set)}
stimulus_name_vmap_rev = {s: i for i, s in enumerate(stimulus_set)}

display(stimulus_name_vmap)

stimulus_array = np.array([stimulus_name_vmap_rev[s] for s in stimulus_list])
display(stimulus_array)
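
The name-to-integer conversion above can also be written with the `return_inverse` option of `np.unique`, which returns the integer codes together with the sorted unique names. This is a small sketch on hypothetical stimulus names, shown only to make the mapping and its round trip explicit.

```python
import numpy as np

# Hypothetical stimulus names, for illustration only.
names = ["dog_01.jpg", "cat_03.jpg", "dog_01.jpg", "apple_02.jpg"]

# unique_names is sorted; codes[i] is the index of names[i] in unique_names.
unique_names, codes = np.unique(names, return_inverse=True)

# Integer code -> name, in the same form as stimulus_name_vmap above.
vmap = {i: str(s) for i, s in enumerate(unique_names)}

# Round trip: decoding the integer codes recovers the original names.
decoded = [vmap[c] for c in codes.tolist()]

print(codes)
print(decoded == names)
```

Because the codes index into the sorted unique names, the resulting dictionary has the same integer-to-name form as `stimulus_name_vmap`.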

In [None]:
# ROIs

rois = ["V1", "V2", "V3"]
roi_masks = {}

for roi in rois:
    roi_mask = meta_data[roi].values
    print(f"{roi}: {np.sum(roi_mask == 1)} / {roi_mask.size}")
    roi_masks.update({roi: roi_mask})
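
Each ROI column holds one flag per voxel, with 1 marking voxels that belong to the ROI. The toy NumPy sketch below (hypothetical values) shows how such a flag picks voxel columns out of a samples x voxels array. It illustrates the idea only and does not describe how bdpy implements selection internally.

```python
import numpy as np

# Toy data: 2 samples x 5 voxels, with a hypothetical ROI flag per voxel.
toy_data = np.arange(10).reshape(2, 5)
toy_v1_flag = np.array([1, 0, 1, 0, 0])

# Keep only the columns (voxels) whose flag is 1.
toy_v1_data = toy_data[:, toy_v1_flag == 1]

print(toy_v1_data.shape)
```

The number of selected columns equals the number of voxels flagged 1, which is the count printed for each ROI in the cell above.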

## 5. Create an empty BData

In [None]:
bdata = bdpy.BData()
display(bdata.dataset.shape)

## 6. Add data to BData

In [None]:
# Add dataset

bdata.add(voxel_data_ary, "VoxelData")

display(bdata.dataset.shape)

bdata.add(session_array, "Session")
bdata.add(run_ary, "Run")
bdata.add(trial_array, "Trial")
bdata.add(stimulus_array, "stimulus_name")

display(bdata.dataset.shape)

display(bdata.select("VoxelData").shape)

In [None]:
# Add vmap

display(bdata.select("stimulus_name"))

bdata.add_vmap("stimulus_name", stimulus_name_vmap)

display(bdata.get_labels("stimulus_name"))

In [None]:
# Add metadata (ROIs)

for roi in rois:
    bdata.add_metadata(roi, roi_masks[roi], description=f"Mask for {roi}", where="VoxelData")

display(bdata.select("V1").shape)

## 7. Save BData

In [None]:
bdata.save("sub-01_betas.h5")