# String labels in BData

The main body of BData, `dataset`, is a numpy array of numerical values and cannot contain strings. To keep string information such as stimulus labels, BData instead provides a *value-label map*, or *vmap*.

A vmap defines a mapping between the numerical values in a specified column of `BData.dataset` and strings.

## Example

Here, the example data (`./data/sample_vmap.h5`) contains a column 'Label', which holds the stimulus label of each sample as a numerical value. The original labels are strings such as 'label-01', 'label-02', and so on.

In [1]:
import bdpy

bdata = bdpy.BData('./data/sample_vmap.h5')  # Example data

Column 'Label' in the BData contains stimulus labels as numbers.

In [2]:
bdata.get('Label')

array([[1.],
       [3.],
       [4.],
       [2.],
       [3.],
       [1.],
       [2.],
       [4.]])

You can get the value-label map for the column 'Label' with `BData.get_vmap`.

In [3]:
bdata.get_vmap('Label')

{1.0: 'label-01', 2.0: 'label-02', 3.0: 'label-03', 4.0: 'label-04'}

`BData.get_label` returns the values in the specified column as string labels, converting each value to its label based on the vmap.

In [4]:
bdata.get_label('Label')

['label-01',
 'label-03',
 'label-04',
 'label-02',
 'label-03',
 'label-01',
 'label-02',
 'label-04']

You can regard the function as shorthand for the following code.

In [5]:
[bdata.get_vmap('Label')[v] for v in bdata.get('Label').flatten()]

['label-01',
 'label-03',
 'label-04',
 'label-02',
 'label-03',
 'label-01',
 'label-02',
 'label-04']
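The lookup can also be run in the opposite direction, for example to find which samples carry a given label. The following is a minimal sketch in plain Python, with the values and the vmap copied by hand from the outputs above; `label_to_value` and `rows` are illustrative names and not part of bdpy.

```python
# Values and vmap copied from the outputs of bdata.get('Label') and bdata.get_vmap('Label')
values = [1.0, 3.0, 4.0, 2.0, 3.0, 1.0, 2.0, 4.0]
vmap = {1.0: 'label-01', 2.0: 'label-02', 3.0: 'label-03', 4.0: 'label-04'}

# Invert the map: label -> value (only valid when every label is unique)
label_to_value = {label: value for value, label in vmap.items()}

# Indices of the samples labelled 'label-03'
target = label_to_value['label-03']
rows = [i for i, v in enumerate(values) if v == target]
print(rows)  # [1, 4]
```

With real data, `values` would presumably come from `bdata.get('Label').flatten()` and `vmap` from `bdata.get_vmap('Label')`, as in the cells above.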

## How to add a vmap to BData

### BData created by `create_bdata_fmriprep`

When `bdpy.mri.create_bdata_fmriprep` (version 0.14rc2 or later) creates BData from fmriprep outputs, it automatically defines a vmap in the resulting BData based on the given `label_mapper` (a label-to-value mapping specified as a tsv file).

### Manually defining vmap in BData

You can manually define a new vmap in BData by using `BData.add_vmap`.

In [6]:
import bdpy

bdata = bdpy.BData('./data/sample_vmap_nomap.h5')  # BData without vmap

bdata.get_label('Label') # This should cause an error since no vmap is defined in the BData

UnboundLocalError: local variable 'vmap' referenced before assignment
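Code that has to handle files both with and without a vmap may want to guard this call. The sketch below assumes the behavior shown above, where a missing vmap surfaces as `UnboundLocalError`; other bdpy versions may raise a different exception. `get_label_or_none` is a hypothetical helper, not a bdpy function.

```python
def get_label_or_none(bdata, key):
    """Return the string labels for `key`, or None if no vmap is defined.

    Assumes a missing vmap raises UnboundLocalError, as in the output above.
    """
    try:
        return bdata.get_label(key)
    except UnboundLocalError:
        return None
```

A caller could then test `get_label_or_none(bdata, 'Label') is None` and add a vmap only when needed.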

In [7]:
# Define value-label mapping as a dictionary
# Note that the keys should be numerical values included in the specified column.
label_map = {1: 'label-01',
             2: 'label-02',
             3: 'label-03',
             4: 'label-04'}

# Add vmap
bdata.add_vmap('Label', label_map)  # Define vmap for column 'Label'

# Get labels
bdata.get_label('Label')

['label-01',
 'label-03',
 'label-04',
 'label-02',
 'label-03',
 'label-01',
 'label-02',
 'label-04']
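Before adding a hand-written map, it can help to confirm that every value in the column has a key. The sketch below is plain Python with the column values copied from the earlier output; `find_unmapped_values` is a hypothetical helper, and whether `add_vmap` performs a similar check itself is not stated here.

```python
def find_unmapped_values(values, label_map):
    """Return the sorted distinct values that have no key in label_map."""
    return sorted({v for v in values if v not in label_map})

# Column values as floats, copied from the output of bdata.get('Label')
values = [1.0, 3.0, 4.0, 2.0, 3.0, 1.0, 2.0, 4.0]
label_map = {1: 'label-01', 2: 'label-02', 3: 'label-03', 4: 'label-04'}

# Integer keys match float values: in Python, 1 == 1.0 and both hash equally
print(find_unmapped_values(values, label_map))  # []

# A map that misses two of the values in the column
incomplete_map = {1: 'label-01', 2: 'label-02'}
print(find_unmapped_values(values, incomplete_map))  # [3.0, 4.0]
```

The comparison of integer keys with float values is also why the map above is written with keys `1`, `2`, ... while `get_vmap` earlier displayed `1.0`, `2.0`, ....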

## Appendix: creating sample data

In [None]:
import bdpy
import numpy as np

x = np.random.rand(8, 10)
labels = np.array([1, 3, 4, 2, 3, 1, 2, 4]).reshape(8, 1)

value_label_map = {1: 'label-01',
                   2: 'label-02',
                   3: 'label-03',
                   4: 'label-04'}

bdata = bdpy.BData()
bdata.add(x, 'Data')
bdata.add(labels, 'Label')

bdata.save('./data/sample_vmap_nomap.h5')

bdata.add_vmap('Label', value_label_map)

bdata.save('./data/sample_vmap.h5')

bdata.get_label('Label')