# BICAMS z-normalization

**Objective**: Normalize raw scores on the subtests of the Brief International Cognitive Assessment for Multiple Sclerosis (BICAMS) relative to a Belgian, Dutch-speaking reference population.

**Development**: Artificial Intelligence and Modelling in clinical Sciences (AIMS) lab, Vrije Universiteit Brussel (VUB)

**Reference**: [Costers et al. 2017](https://doi.org/10.1016/j.msard.2017.08.018)

**More information**: 
- Entire project: `README.md`
- [Input data, output data and conversion tables](#Additional-information)

***

# Preparation

Reminder: make sure the input data are stored in the `data` folder in the following format:
- File name: 'data_to_transform.xlsx'
- Column names in the following order:
    - age
    - sex
    - education
    - sdmt
    - bvmt
    - cvlt

Note that of the last three columns (sdmt, bvmt, cvlt), only one is strictly required.
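
The format requirements above can be checked before running the pipeline. The following is a minimal sketch and not part of the project code: `check_input_columns`, `REQUIRED_COLUMNS` and `TEST_COLUMNS` are hypothetical names, and the mock dataframe only stands in for the contents of 'data_to_transform.xlsx'.

```python
import pandas as pd

# Hypothetical constants mirroring the column list above
REQUIRED_COLUMNS = ['age', 'sex', 'education']
TEST_COLUMNS = ['sdmt', 'bvmt', 'cvlt']


def check_input_columns(df):
    """Return the cognitive test columns present in df; raise if the format is wrong."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    present_tests = [col for col in TEST_COLUMNS if col in df.columns]
    if not present_tests:
        raise ValueError(f"At least one of {TEST_COLUMNS} is required")
    return present_tests


# Mock data with only one cognitive test column, which is still a valid input
mock = pd.DataFrame({'age': [52, 19],
                     'sex': [1, 2],
                     'education': [12, 21],
                     'sdmt': [42, 60]})
found_tests = check_input_columns(mock)
```

Here `found_tests` lists only the test columns that are actually present, so a file with a single test passes the check.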

Import libraries

In [1]:
import pandas as pd

# Project-specific functions
from functions import normalization_pipeline

# Load data and conversion tables for raw-scaled conversion
from load_data import InputData
from load_data import ConversionTable

Load data (either mock data or your data)

In [2]:
# Instantiate InputData once and reuse it for all three attributes
input_obj = InputData()
input_data = input_obj.data_all
demographics = input_obj.demographics
cognitive_raw = input_obj.cognitive

Load the conversion tables

In [3]:
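# Instantiate ConversionTable once and map each test name to its table
conversion_tables = ConversionTable()
conversion_table_dict = {'sdmt': conversion_tables.sdmt,
                         'bvmt': conversion_tables.bvmt,
                         'cvlt': conversion_tables.cvlt}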

View the head of the original dataframe

In [4]:
print(input_data.head())

   age  age^2  sex  education  sdmt  bvmt  cvlt
0   52   2704    1         12    42    25    50
1   19    361    2         21    60    20    30
2   32   1024    2         17    85    35    70
3   76   5776    1         12    75    10    60
4   50   2500    1         15    65    15    60


***

# Z normalization pipeline

Choose the z-score cutoff below which a score is declared impaired

In [5]:
z_cutoff = -1.5
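
As a minimal sketch of how such a cutoff could translate into an impairment flag: the helper `is_impaired` below is hypothetical and assumes a strict comparison, so a z-score exactly equal to the cutoff counts as preserved. The actual rule lives in `normalization_pipeline` and may differ.

```python
z_cutoff = -1.5


def is_impaired(z_score, z_cutoff):
    """Hypothetical helper: 1 if z_score lies strictly below the cutoff, else 0."""
    return int(z_score < z_cutoff)


# Example z-scores of the kind produced by the pipeline
example_z_scores = [-0.977061, -2.450896, 1.974552]
flags = [is_impaired(z, z_cutoff) for z in example_z_scores]
```

With a cutoff of -1.5, only the second example score is flagged as impaired.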

Fill `transform_matrix` with the z-scores and impairment booleans (1: impaired, 0: preserved) for each included test

In [6]:
transform_matrix = []

for subject in range(input_data.shape[0]):

    # Initialize the per-subject lists
    z_row = []
    imp_row = []

    for test in cognitive_raw.columns:

        # Extract raw data from dataframe
        raw_scores = cognitive_raw[test]

        # Get correct conversion table
        conv_table = conversion_table_dict.get(test)

        # Calculate z-score and whether it is impaired or not for the test and subject
        z_score, imp_bool = normalization_pipeline(data_vector=demographics.iloc[subject],
                                                   raw_score=raw_scores.iloc[subject],
                                                   test=test,
                                                   conversion_table=conv_table,
                                                   z_cutoff=z_cutoff)
        # Append lists
        z_row.append(z_score)
        imp_row.append(imp_bool)

    # Append to general matrix
    transform_matrix.append(z_row + imp_row)

Convert to a pandas dataframe with new column names

In [7]:
# Define new column names for the dataframe
z_score_columns = [element + '_z' for element in cognitive_raw.columns]
imp_columns = [element + '_imp' for element in cognitive_raw.columns]
new_columns = z_score_columns + imp_columns

# Convert matrix to pandas dataframe
transform_matrix = pd.DataFrame(data=transform_matrix,
                                columns=new_columns)

***

# Merge calculations with original data

Concatenate the original data with the z-score and impairment boolean columns

In [8]:
transformed_data = pd.concat([input_data, transform_matrix], axis = 1)
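
`pd.concat` with `axis=1` aligns rows on the index, and `transform_matrix` carries a default integer index. The following sketch, using made-up dataframes, illustrates what would happen if the original data had a different index, and one possible way to guard against it. Whether `input_data` ever has a non-default index is an assumption here, not something the project states.

```python
import pandas as pd

# Stand-ins for input_data and transform_matrix
original = pd.DataFrame({'sdmt': [42, 60]}, index=[10, 11])    # non-default index
computed = pd.DataFrame({'sdmt_z': [-0.977061, -2.450896]})    # default index 0, 1

# Indexes do not overlap, so rows are not paired and NaN values appear
misaligned = pd.concat([original, computed], axis=1)

# Resetting the index first pairs the rows by position
aligned = pd.concat([original.reset_index(drop=True), computed], axis=1)
```

In this sketch `misaligned` has four rows with missing values, while `aligned` has the expected two complete rows.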

Save the complete dataframe to the `data` folder

In [9]:
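# index=False keeps the row index out of the file, so its columns match the output description
transformed_data.to_excel('data/transformed_data.xlsx', index=False)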

View the head of the new dataframe

In [10]:
print(transformed_data.head())

   age  age^2  sex  education  sdmt  bvmt  cvlt    sdmt_z    bvmt_z    cvlt_z  \
0   52   2704    1         12    42    25    50 -0.977061 -0.175797  0.644770   
1   19    361    2         21    60    20    30 -2.450896 -3.232009 -2.078900   
2   32   1024    2         17    85    35    70  1.974552  1.135338  2.546591   
3   76   5776    1         12    75    10    60  2.891039 -3.759040  1.493038   
4   50   2500    1         15    65    15    60  0.955914 -2.663802  1.351303   

   sdmt_imp  bvmt_imp  cvlt_imp  
0         0         0         0  
1         1         1         1  
2         0         0         0  
3         0         1         0  
4         0         1         0  


***

# Additional information

### Description of the input data

In [11]:
print(InputData().description)

------------------------------------------------------
Input data to be transformed: 'data_to_transform.xlsx'
------------------------------------------------------
Required format of the input data:

The input data 'data_to_transform.xlsx' should contain the following features and data types:
- age: int, age in years
- sex: int
    - 1: Male
    - 2: Female
- education: int, number of years of education
    - 6 years: Finished primary school
    - 12 years: Finished high school
    - 13 years: Professional education
    - 15 years: BSc
    - 17 years: MSc
    - 21 years: PhD

Furthermore, the following features are optional (but at least 1 required):
- sdmt: int, raw sdmt score to be normalised
- bvmt: int, raw bvmt score to be normalised
- cvlt: int, raw cvlt score to be normalised


### Description of the output data

In [12]:
print(open('data_descriptions/transformed_data_description.txt', 'r').read())

--------------------------------------------------
Output data (transformed): 'transformed_data.xlsx'
--------------------------------------------------
Data description:

demographics:
- age: int, age in years
- age^2: int, age column squared
- sex: int
    - 1: Male
    - 2: Female
- education: int, number of years of education
    - 12 years: Finished high school
    - 15 years: BSc
    - 17 years: MSc
    - 21 years: PhD

raw cognitive scores:
- sdmt: int, raw sdmt score to be normalised
- bvmt: int, raw bvmt score to be normalised
- cvlt: int, raw cvlt score to be normalised

z-scores:
- sdmt_z: float, z-normalized score of sdmt
- bvmt_z: float, z-normalized score of bvmt
- cvlt_z: float, z-normalized score of cvlt

impairment booleans:
- sdmt_imp: 1 (impaired), 0 (preserved)
- bvmt_imp: 1 (impaired), 0 (preserved)
- cvlt_imp: 1 (impaired), 0 (preserved)



### Description of the conversion tables

In [13]:
print(ConversionTable().description)

-------------------------------------------------
Conversion tables to convert raw to scaled scores
-------------------------------------------------

Every conversion table corresponds to one of the 3 cognitive tests from BICAMS (SDMT, BVMT-R and CVLT-II).
It consists of the following columns:
- scaled_score: Categorical variable, the scaled score assigned to a raw score that falls within the interval given by the lower and upper bound.
- lower bound: lower bound of the raw score to yield a certain scaled score
- upper bound: upper bound of the raw score to yield a certain scaled score

Thus: scaled_score applies when lower_bound <= raw_score <= upper_bound
Note: Both bounds are inclusive, so a raw score equal to either bound belongs to the interval.
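
The inclusive interval rule can be sketched as a simple lookup. This is an illustration only: the function `raw_to_scaled` is hypothetical, the column names `lower_bound` and `upper_bound` are assumed, and the values in `mock_table` are made up and do not come from the real conversion tables.

```python
import pandas as pd

# Made-up conversion table, for illustration only
mock_table = pd.DataFrame({'scaled_score': [1, 2, 3],
                           'lower_bound': [0, 11, 21],
                           'upper_bound': [10, 20, 30]})


def raw_to_scaled(raw_score, table):
    """Return the scaled score whose inclusive interval contains raw_score."""
    in_interval = (table['lower_bound'] <= raw_score) & (raw_score <= table['upper_bound'])
    match = table[in_interval]
    if match.empty:
        raise ValueError(f"Raw score {raw_score} is outside the table range")
    return int(match['scaled_score'].iloc[0])


# Scores on a boundary fall inside the interval they border
scaled_at_upper = raw_to_scaled(10, mock_table)
scaled_at_lower = raw_to_scaled(11, mock_table)
```

Because both comparisons use `<=`, a raw score of 10 still maps to the first interval and 11 to the second.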

