# Initial study

This notebook identifies PPMI subjects to reproduce the following paper:


<div class="alert alert-block alert-success">
Scherfler, Christoph, et al. <a href=https://onlinelibrary.wiley.com/doi/pdf/10.1002/ana.22245>White and gray matter abnormalities in idiopathic rapid eye movement sleep behavior disorder: a diffusion‐tensor imaging and voxel‐based morphometry study.</a> Annals of neurology 69.2 (2011): 400-407. 
</div>

This study recruited 34 patients with iRBD and confirmed the iRBD diagnosis with polysomnography (PSG). Patients had no PD or dementia at the time of MRI. The demographic parameters were as follows (table extracted from the paper):

<img src="demographics.png"/>

# PPMI metadata download

Let's download the required metadata files from PPMI:

In [1]:
import os
import ppmi_metadata

data_dir = 'data'
required_files = ['Age_at_visit.csv',  'Demographics.csv',  'Magnetic_Resonance_Imaging__MRI_.csv', 
                   'REM_Sleep_Behavior_Disorder_Questionnaire.csv']

missing_files = [x for x in required_files if not os.path.exists(os.path.join(data_dir, x))]

if len(missing_files) > 0:
    ppmi = ppmi_metadata.PPMIMetaDataDownloader('<ppmi_email>', '<ppmi_pass>')
    ppmi.download_metadata(missing_files, destination_dir=data_dir, headless=False, timeout=600)


# PPMI subjects

PPMI subjects underwent the REM Sleep Behavior Disorder Screening Questionnaire (RBDSQ) [[1]](https://movementdisorders.onlinelibrary.wiley.com/doi/10.1002/mds.21740). While this tool was developed to assess RBD, there have been concerns about its use in de novo Parkinson's disease patients [[2]](https://movementdisorders.onlinelibrary.wiley.com/doi/10.1002/mdc3.12591). We will use a cutoff score of 5 to identify RBD subjects among non-PD subjects. The cutoff score of 6 suggested in [[3]](https://www.sciencedirect.com/science/article/pii/S138994571100164X) applies to PD patients.
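
As a minimal sketch of the cutoff rule described above, the snippet below labels a few toy subjects from their total RBDSQ score. The frame `toy`, its values, the `PD` column and the helper `screen_rbd` are hypothetical and are not PPMI data; only the cutoffs (5 for non-PD, 6 for PD) come from the text.

```python
import pandas as pd

def screen_rbd(scores, has_pd, cutoff_non_pd=5, cutoff_pd=6):
    """Return a boolean Series: True where the score reaches the applicable cutoff."""
    # Pick the cutoff per subject depending on PD status
    cutoff = has_pd.map({True: cutoff_pd, False: cutoff_non_pd})
    return scores >= cutoff

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'PATNO': [1, 2, 3, 4],
    'RBDSQ': [4, 5, 5, 6],
    'PD': [False, False, True, True],
})
toy['RBD_screen'] = screen_rbd(toy['RBDSQ'], toy['PD'])
print(toy)
```

A score of 5 is therefore screen-positive for a non-PD subject but not for a PD subject.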

## Get RBD subjects from PPMI

In [2]:
# Load PPMI RSBDQ data
import pandas as pd
import os.path as op

df = pd.read_csv(op.join('data', 'REM_Sleep_Behavior_Disorder_Questionnaire.csv'))

In [3]:
# Compute and add RBDSQ score
df['RBDSQ'] = (df['DRMVIVID'] +  # Q1
               df['DRMAGRAC'] +  # Q2
               df['DRMNOCTB'] +  # Q3
               df['SLPLMBMV'] +  # Q4
               df['SLPINJUR'] +  # Q5
               df['DRMVERBL'] + df['DRMFIGHT'] + df['DRMUMV'] + df['DRMOBJFL'] + # Q6
               df['MVAWAKEN'] +  # Q7
               df['DRMREMEM'] +  # Q8
               df['SLPDSTRB'] +  # Q9
               df[['BRNINFM', 'DEPRS', 'EPILEPSY', 'HETRA',  # Q10
                   'NARCLPSY', 'PARKISM', 'RLS', 'STROKE']].max(axis=1))

df['Q6'] = df['DRMVERBL'] + df['DRMFIGHT'] + df['DRMUMV'] + df['DRMOBJFL']

# Note: CNSOTHCM isn't present in data

In [4]:
# Check that max RBDSQ score is <= 13
assert(df['RBDSQ'].max() <= 13)
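
The bound of 13 follows from the scoring used above: Q1 to Q5 and Q7 to Q9 contribute at most one point each (8 in total), the four Q6 items contribute up to 4, and Q10 contributes at most 1 because only the maximum of its items is counted. A minimal sketch on a hypothetical subject who answers yes to every item, assuming each item is coded 0/1:

```python
import pandas as pd

single_items = ['DRMVIVID', 'DRMAGRAC', 'DRMNOCTB', 'SLPLMBMV', 'SLPINJUR',
                'MVAWAKEN', 'DRMREMEM', 'SLPDSTRB']            # Q1-Q5, Q7-Q9
q6_items = ['DRMVERBL', 'DRMFIGHT', 'DRMUMV', 'DRMOBJFL']      # Q6.1-Q6.4
q10_items = ['BRNINFM', 'DEPRS', 'EPILEPSY', 'HETRA',
             'NARCLPSY', 'PARKISM', 'RLS', 'STROKE']           # Q10

# Hypothetical subject with every item set to 1 (assumed 0/1 coding)
all_yes = pd.DataFrame([{c: 1 for c in single_items + q6_items + q10_items}])

score = (all_yes[single_items].sum(axis=1)
         + all_yes[q6_items].sum(axis=1)
         + all_yes[q10_items].max(axis=1))
print(score.iloc[0])  # 8 + 4 + 1 = 13
```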

In [5]:
# Filter RBD subjects with no Parkinsonism
# Keep only evaluations done at BL
# Non-PD subjects have RBD if RBDSQ >= 5. For PD subjects, the cutoff score should be 6
rbd_nopark = df[(df['RBDSQ'] >= 5) & (df['PARKISM'] == 0) & (df['EVENT_ID'] == 'BL')]

## Filter out subjects with no MRI at BL

In [6]:
mri = pd.read_csv(op.join('data', 'Magnetic_Resonance_Imaging__MRI_.csv'))
mri = mri[(mri['EVENT_ID'] == 'BL') & (mri['MRICMPLT'] == 1)]

In [7]:
rbd_nopark_mri = rbd_nopark[rbd_nopark['PATNO'].isin(mri['PATNO'])]

## Add age and sex

In [8]:
dem = pd.read_csv(op.join('data', 'Demographics.csv'))
age = pd.read_csv(op.join('data', 'Age_at_visit.csv'))
age = age[age['EVENT_ID']=='BL']
dem = dem[['PATNO', 'SEX']]

In [9]:
rbd_age_sex = pd.merge(rbd_nopark_mri, dem, on='PATNO').merge(age, on='PATNO')
rbd_age_sex = rbd_age_sex[['PATNO', 'RBDSQ', 'SEX', 'AGE_AT_VISIT']]
rbd_age_sex['RBD_group'] = 1
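
Both merges join on `PATNO`, so a subject with several baseline rows in either table would appear more than once in the result. A hedged sketch of guarding against this with the `validate` argument of `pandas.merge`, using hypothetical toy frames rather than PPMI data:

```python
import pandas as pd

# Hypothetical toy data: subject 1 has two rows in the age table
left = pd.DataFrame({'PATNO': [1, 2], 'RBDSQ': [6, 7]})
dup_age = pd.DataFrame({'PATNO': [1, 1, 2], 'AGE_AT_VISIT': [60.0, 60.1, 71.0]})

try:
    pd.merge(left, dup_age, on='PATNO', validate='one_to_one')
    duplicated = False
except pd.errors.MergeError:
    # Raised because PATNO is not unique in the right-hand table
    duplicated = True

# Keeping the first row per subject is one possible (assumed) resolution
clean_age = dup_age.drop_duplicates(subset='PATNO', keep='first')
merged = pd.merge(left, clean_age, on='PATNO', validate='one_to_one')
print(duplicated, len(merged))
```

Whether keeping the first row is appropriate depends on why the duplicates exist, so this choice is an assumption of the sketch.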

## Controls


In [10]:
# Keep subjects with an RBDSQ < 5 and 0 at Q6.1-6.4. See email thread with Madeleine Sharp and Ron Postuma
no_rbd_nopark = df[(df['RBDSQ'] < 5) & (df['Q6'] == 0) & (df['PARKISM'] == 0) & (df['EVENT_ID'] == 'BL')]

In [11]:
no_rbd_nopark_mri = no_rbd_nopark[no_rbd_nopark['PATNO'].isin(mri['PATNO'])]
norbd_age_sex = pd.merge(no_rbd_nopark_mri, dem, on='PATNO').merge(age, on='PATNO')
norbd_age_sex = norbd_age_sex[['PATNO', 'RBDSQ', 'SEX', 'AGE_AT_VISIT']]
norbd_age_sex['RBD_group'] = 0

## Cohort matching

We implemented the following function to match RBD and non-RBD groups for age and sex.

In [12]:
def nn_match(sample1, df_2, n2, cat_variables, num_variables, random_state=0):
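    '''
    Find n2 rows in df_2 whose variables are matched with sample1.
    Rows of sample1 are cycled through until n2 matches are found.

    sample1: samples in group 1
    df_2: dataframe with subjects in group 2
    n2: desired sample size for group 2 (must be >= len(sample1))
    cat_variables: categorical variables to match exactly
    num_variables: numerical variables to match by nearest neighbor
    random_state: currently unused; the matching is deterministic
    '''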

    def nn(x, df, variables):
        '''
        Find index of nearest neighbor of x in df

        * x: a dataframe row
        * df: a dataframe
        * variables: variables to match. Should be normalized.
        '''
        # Squared Euclidean distance over the matched variables,
        # computed without modifying df
        dist = sum((df[var] - x[var])**2 for var in variables)
        return dist.idxmin()

    # Check assumptions
    n1 = len(sample1)
    assert(n1 <= n2)
    for v in num_variables + cat_variables:
        assert(v in sample1 and v in df_2)
    
    # Copy original dataframe to leave them untouched
    df_2_ = df_2.copy()
    sample1_ = sample1.copy()
    
    # Normalize variables to match to compute meaningful distances
    for v in num_variables:
        m = df_2_[v].mean()
        s = df_2_[v].std()
        for df in (df_2_, sample1_):
            df[v] = (df[v] - m)/s
        
    # For each subject in group 1,
    # find one or more subjects in group 2, without replacement.
    indices = []
    for i in range(n2):
        j = i % n1  # loop over sample1
        df_2_cat = df_2_.copy()
        for c in cat_variables:
            df_2_cat = df_2_cat[df_2_cat[c] == sample1_.iloc[j][c]]
        assert(len(df_2_cat) > 0)
        index = nn(sample1_.iloc[j], df_2_cat, num_variables)
        df_2_.drop(index=index, inplace=True)
        indices.append(index)
    
    sample2 = df_2[df_2.index.isin(indices)]
    
    return sample2
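
To illustrate the idea behind the matching, here is a simplified standalone sketch of greedy nearest-neighbor matching without replacement. It is not the function above: `greedy_match` is a hypothetical helper that handles a single categorical and a single numerical variable, skips normalization, and runs on made-up toy data.

```python
import pandas as pd

def greedy_match(reference, pool, num_var, cat_var):
    """For each reference row, pick the closest unused pool row in the same category."""
    available = pool.copy()
    chosen = []
    for _, row in reference.iterrows():
        # Exact match on the categorical variable
        candidates = available[available[cat_var] == row[cat_var]]
        # Nearest neighbor on the numerical variable
        idx = (candidates[num_var] - row[num_var]).abs().idxmin()
        chosen.append(idx)
        # Drop the selected row so it cannot be matched twice
        available = available.drop(index=idx)
    return pool.loc[chosen]

# Hypothetical toy data, for illustration only
reference = pd.DataFrame({'SEX': [0, 1], 'AGE_AT_VISIT': [60.0, 70.0]})
pool = pd.DataFrame({'SEX': [0, 0, 1, 1],
                     'AGE_AT_VISIT': [50.0, 61.0, 69.0, 80.0]},
                    index=[10, 11, 12, 13])

matched = greedy_match(reference, pool, 'AGE_AT_VISIT', 'SEX')
print(matched.index.tolist())  # [11, 12]
```

Because the procedure is greedy, the result can depend on the order of the reference rows.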
    


In [13]:
# Randomly select 4 control women and 10 control men, to reproduce F/M balance in original paper
controls = pd.concat([norbd_age_sex[norbd_age_sex['SEX']==0].sample(n=4),
                      norbd_age_sex[norbd_age_sex['SEX']==1].sample(n=10)])

# Match with RBD subjects
rbds = nn_match(controls, rbd_age_sex, 26, ['SEX'], ['AGE_AT_VISIT'], random_state=1)

In [14]:
import os
print("\t\t\t| iRBD Patients\t| Controls" + os.linesep +
      f"Subjects, No. \t\t| {len(rbds)} \t\t| {len(controls)}" + os.linesep + 
      f"F/M, No. \t\t| {len(rbds[rbds['SEX']==0])}/{len(rbds[rbds['SEX']==1])} \t\t| {len(controls[controls['SEX']==0])}/{len(controls[controls['SEX']==1])}" + os.linesep +
      f"Age, mean +/- SD \t| {round(rbds['AGE_AT_VISIT'].mean(),1)} +/- {round(rbds['AGE_AT_VISIT'].std(),1)} \t| {round(controls['AGE_AT_VISIT'].mean(),1)} +/- {round(controls['AGE_AT_VISIT'].std(),1)}"
)

			| iRBD Patients	| Controls
Subjects, No. 		| 26 		| 14
F/M, No. 		| 8/18 		| 4/10
Age, mean +/- SD 	| 60.7 +/- 13.1 	| 60.4 +/- 13.2


The demographic parameters of the selected PPMI subjects seem comparable to those in the initial study. The sex balance looks better in our cohort.
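
As a rough check of how closely the two PPMI groups are matched on age, one option is the standardized mean difference (SMD) computed from the summary values printed above. This is only a sketch: the helper `smd_from_summary` is introduced here for illustration, and pooling the two standard deviations with equal weight is an assumption.

```python
import math

def smd_from_summary(mean1, sd1, mean2, sd2):
    """Standardized mean difference using an equally weighted pooled SD."""
    pooled_sd = math.sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / pooled_sd

# Age, mean +/- SD, taken from the table printed above
age_smd = smd_from_summary(60.7, 13.1, 60.4, 13.2)
print(round(age_smd, 3))
```

A value close to 0 indicates that the age distributions of the two groups have similar means relative to their spread.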