* **Original paper:** [*Tessa C, Lucetti C, Giannelli M, Diciotti S, Poletti M, Danti S, Baldacci F, Vignali C, Bonuccelli U, Mascalchi M, Toschi N. Progression of brain atrophy in the early stages of Parkinson's disease: a longitudinal tensor-based morphometry study in de novo patients without cognitive impairment. Hum Brain Mapp. 2014 Aug;35(8):3932-44. doi: 10.1002/hbm.22449. Epub 2014 Jan 22. PMID: 24453162; PMCID: PMC6868950.*](https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.22449)

# Introduction

* **Cohort quote from the paper:** "*Overall, 22 patients (4 women and 18 men, mean age 61.5 ± 8.8) and 17 control subjects (8 women and 9 men, mean age 59.1 ± 8.5 years) completed the study and underwent a second MRI examination. The mean (± standard deviation) follow-up time for patients and controls was 2.8 ± 0.6 (range 2–4) years and 3.9 ± 2.2 (range 2–7) years, respectively. Differences for age between PD patients and control subjects were not significant (P = 0.48, MannWhitney U-test)*"


* Demographics for the PD patients (table taken from the paper):
<img src="./images/original-cohort.png" alt="" width="30%" height="30%">


* At the baseline evaluation "*No difference in local volume between patients and control subjects was revealed*".


* A (very) brief summary of the main results of the longitudinal evaluation:


  * Control subjects: Baseline versus follow-up
    * Control subjects showed atrophy in several white matter and grey matter regions (Fig. 1a), together with cerebrospinal fluid enlargement. The atrophy clusters lay mainly in white matter and were more widespread in the frontal lobe.
    
    
  * PD patients: Baseline versus follow-up
    * PD patients showed clusters of reduced white and grey matter volume. These were more evident in the white matter, especially in the frontal lobe (Fig. 1b), and were accompanied by cerebrospinal fluid enlargement. Grey matter involvement was more widespread than in the control subjects.
    <img src="./images/original-fig1.png" alt="" width="50%" height="50%">
    
  * PD patients versus control subjects
    * "*PD patients developed bilateral clusters of increased atrophy*" (Fig. 2).
    <img src="./images/original-fig2.png" alt="" width="50%" height="50%">
    
    
  * Correlation analyses
    * "*In PD patients, no significant correlation between warprates and motor or neuropsychological test scores or their average changes per year between baseline and follow-up were identified*".

# Setup

In [1]:
# Importing the modules we need

import numpy as np
import pandas as pd
import os

import livingpark_utils
from livingpark_utils import download
from livingpark_utils import clinical
from livingpark_utils.scripts import run
from livingpark_utils.scripts import mri_metadata

This notebook was run on 2023-03-17 21:38:59 UTC +0000
['COR', 'Coronal', 'Cal Head 24', 'Transverse', 'tra_T1_MPRAGE', 'TRA']
['AX', 'Ax', 'axial', 'Phantom', 'T2']
{'Screening': 'SC', 'Baseline': 'BL', 'Month 6': 'V02', 'Month 12': 'V04', 'Month 24': 'V06', 'Month 36': 'V08', 'Month 48': 'V10', 'Symptomatic Therapy': 'ST', 'Unscheduled Visit 01': 'U01', 'Unscheduled Visit 02': 'U02', 'Premature Withdrawal': 'PW'}
Saved in MRI_info.csv


In [2]:
# Notebook data initialization
inputs_dir = os.path.join(os.getcwd(), "inputs/study_files")
outputs_dir = os.path.join(os.getcwd(), "outputs")
data_dir = os.path.join(os.getcwd(), "data")

utils = livingpark_utils.LivingParkUtils()
downloader = download.ppmi.Downloader(utils.study_files_dir)
#random_seed = 1
utils.notebook_init()

This notebook was run on 2023-03-17 21:38:59 UTC +0000


# PPMI Cohort Data Acquisition

In [3]:
# PPMI study data files to reconstruct cohort and demographics table

required_files = [
    "Demographics.csv",                                   # Sex
    "Age_at_visit.csv",                                   # Age
    "Montreal_Cognitive_Assessment__MoCA_.csv",           # MMSE
    "PD_Diagnosis_History.csv",                           # Disease duration
    "Cognitive_Categorization.csv",                       # MCI diagnosis
    "Participant_Status.csv",                             # Parkinson's vs healthy diagnosis
    "Primary_Clinical_Diagnosis.csv",                     # Subjects with no PD nor other neurological disorder
    "Geriatric_Depression_Scale__Short_Version_.csv",     # GDSS - depression screening
    "Family_History.csv",                                 # PD familial history
    "General_Physical_Exam.csv",                          # Cardio-vascular dysfunction (exclusion)
    "Magnetic_Resonance_Imaging__MRI_.csv",               # Baseline & ~24 month follow-up T1w images
    #"MDS-UPDRS_Part_I.csv",                              # UPDRS (unsure which of all the files I need)
    #"MDS_UPDRS_Part_II__Patient_Questionnaire.csv",      # ""
    "MDS-UPDRS_Part_IV__Motor_Complications.csv",         # ""
    #"MDS-UPDRS_Part_I_Patient_Questionnaire.csv",        # ""
    "MDS-UPDRS_Part_III.csv",                             # UPDRS III, and Hoehn and Yahr scores
    "Medical_Conditions_Log.csv"                          # Depression diagnosis (and other neuro/psych conditions)
]

utils.get_study_files(required_files, default=downloader)

Download skipped: No missing files!


In [4]:
# Reading in all the demographic info files for the relevant variables

# Age
age_df = pd.read_csv(os.path.join(inputs_dir, "Age_at_visit.csv"), usecols=["PATNO", "EVENT_ID", "AGE_AT_VISIT"])


# Sex
sex_df = pd.read_csv(os.path.join(inputs_dir,"Demographics.csv"), usecols=["PATNO", "SEX"])


# Diagnosis
diagnosis_df = pd.read_csv(os.path.join(inputs_dir, "Primary_Clinical_Diagnosis.csv"), usecols=["PATNO", "EVENT_ID", "PRIMDIAG"])


# Disease duration
disease_duration_df = pd.read_csv(os.path.join(inputs_dir,"PD_Diagnosis_History.csv"), usecols=["PATNO", "EVENT_ID", "PDDXDT"])


# Cognitive categorization (MCI/Dementia/Healthy)
cog_cat_df = pd.read_csv(os.path.join(inputs_dir, "Cognitive_Categorization.csv"), usecols=["PATNO", "EVENT_ID", "COGSTATE"])


# UPDRS3 and HY
updrs3_df = pd.read_csv(os.path.join(inputs_dir,"MDS-UPDRS_Part_III.csv"), usecols=["PATNO", "EVENT_ID", "NP3TOT", "NHY"])


# Medical condition (Depression)
med_cond_df = (
    pd.read_csv(os.path.join(inputs_dir, "Medical_Conditions_Log.csv"), usecols=["PATNO", "EVENT_ID", "MHCAT"])
    .groupby(["PATNO", "EVENT_ID"])[["MHCAT"]]
    .aggregate(lambda x: tuple(set(x)))  # aggregate all codes of a visit in a tuple
)
# Flag visits whose aggregated codes include 115 (115 = depression)
med_cond_df["Depression"] = [1 if 115 in codes else 0 for codes in med_cond_df["MHCAT"]]
new_med = med_cond_df.reset_index().drop(["EVENT_ID", "MHCAT"], axis=1)


# MoCA --> MMSE
moca_df = pd.read_csv(os.path.join(inputs_dir,"Montreal_Cognitive_Assessment__MoCA_.csv"), usecols=["PATNO", "EVENT_ID", "MCATOT"])
moca_df["MMSETOT"] = moca_df["MCATOT"].apply(clinical.moca2mmse)


# Parkinson's (COHORT=1) vs healthy control (COHORT=2)
part_stat_df = pd.read_csv(os.path.join(inputs_dir, "Participant_Status.csv"), usecols=["PATNO", "COHORT"])


# GDSS
gdsshort_df = pd.read_csv(os.path.join(inputs_dir,"Geriatric_Depression_Scale__Short_Version_.csv"))
gdsshort_df = gdsshort_df.drop(["REC_ID","PAG_NAME", "INFODT","ORIG_ENTRY","LAST_UPDATE"], axis=1)
gds = gdsshort_df.iloc[:, 2:] # Calculate GDS score for each patient
gds = gds.agg(['sum'], axis="columns").rename(columns={"sum": "GDSTOT"})
gdsshort_df = pd.concat([gdsshort_df[['PATNO', 'EVENT_ID']], gds], axis=1) #Add gds score to df


# Physical Examination (For now just need PESEQ=6 for cardiovascular...Might need neurological too?)
physical_df = pd.read_csv(os.path.join(inputs_dir, "General_Physical_Exam.csv"), usecols=["PATNO", "PESEQ", "ABNORM"])
physical_df_mod = physical_df.loc[(physical_df['PESEQ'] == 6)]
physical_df_mod = physical_df_mod.drop('PESEQ', axis=1)

# MRI availability
run.mri_metadata()
mri_df = pd.read_csv(os.path.join(inputs_dir,"MRI_info.csv"))
mri_df["EVENT_ID"] = mri_df["Visit code"]
mri_df["PATNO"] = mri_df["Subject ID"]
# "Sex" is dropped together with the other unused columns, so it is not recoded here
mri_df = mri_df.drop(["Subject ID", "Visit code", "Visit", "Age", "Sex", "Description", "Imaging Protocol"], axis=1)


# Family PD history - Probably not needed since PPMI healthy controls were checked for PD family history
#fam_df = pd.read_csv(os.path.join(inputs_dir, "Family_History.csv"), usecols=["PATNO", "EVENT_ID", "ANYFAMPD"])

0.0
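
The depression flag built in the cell above is computed per (PATNO, EVENT_ID) pair and EVENT_ID is then dropped, so a participant with several log entries appears in several rows. A minimal sketch of a per-participant alternative follows, run on made-up data rather than PPMI values. The helper name `depression_flag` is hypothetical, and the code 115 is simply taken from the comment in the cell above.

```python
import pandas as pd

DEPRESSION_CODE = 115  # MHCAT code for depression, as stated in the cell above


def depression_flag(med_cond: pd.DataFrame) -> pd.DataFrame:
    """Return one row per PATNO, with Depression = 1 if any log entry has the code."""
    flagged = med_cond["MHCAT"].eq(DEPRESSION_CODE)
    per_patient = flagged.groupby(med_cond["PATNO"]).any().astype(int)
    return per_patient.rename("Depression").reset_index()


# Toy medical-conditions log (illustrative values only)
toy_log = pd.DataFrame(
    {
        "PATNO": [1, 1, 2, 3, 3],
        "EVENT_ID": ["SC", "V04", "SC", "SC", "V06"],
        "MHCAT": [101, 115, 102, 115, 115],
    }
)
toy_flags = depression_flag(toy_log)
print(toy_flags)
```

With one row per participant, a later merge on PATNO cannot multiply visit rows. Whether depression should be flagged per participant or per visit is a design choice this sketch does not settle.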


# Data aggregation

In [5]:
# Merging all of the relevant study file variables into one big dataframe

result = pd.merge(age_df, sex_df, on=["PATNO"], how="outer")
result = pd.merge(result, part_stat_df, on=["PATNO"], how="outer")
result = pd.merge(result, diagnosis_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, disease_duration_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, cog_cat_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, updrs3_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, moca_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, gdsshort_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, physical_df_mod, on=["PATNO"], how="outer")
result = pd.merge(result, new_med, on=["PATNO"], how="outer") 
result = pd.merge(result, mri_df, on=["PATNO", "EVENT_ID"], how="outer")
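
Some of the merges above join on PATNO alone. If the right-hand table holds more than one row per participant, every visit row is repeated once per match. The sketch below, on toy data, shows the effect and one way to detect it with the `validate` argument of `pd.merge`. The frame names are illustrative, not the notebook's variables.

```python
import pandas as pd

# Toy visit table: one row per (PATNO, EVENT_ID)
visits = pd.DataFrame({"PATNO": [1, 1, 2], "EVENT_ID": ["BL", "V06", "BL"]})
# Toy exam table with two rows for participant 1
exam = pd.DataFrame({"PATNO": [1, 1, 2], "ABNORM": [0, 1, 0]})

# Merging on PATNO alone repeats each visit of participant 1 twice
merged = pd.merge(visits, exam, on="PATNO", how="outer")
rows_before, rows_after = len(visits), len(merged)
print(rows_before, rows_after)

# validate="many_to_one" raises MergeError when right-hand keys are not unique
try:
    pd.merge(visits, exam, on="PATNO", how="outer", validate="many_to_one")
    merge_is_safe = True
except pd.errors.MergeError:
    merge_is_safe = False
print(merge_is_safe)
```

Comparing row counts before and after each merge is a cheap additional check when a strict `validate` rule is too restrictive.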

# Cohort Matching

We now attempt to reconstruct a cohort which is as similar to the original study as possible using the PPMI data.

* ### Inclusion/Exclusion criteria I will use to replicate this study:


  * Baseline and ~24 month follow-up T1-weighted MRIs available and usable for TBM. (MRI_info.csv: **Study Date**)
  
  
  * Disease duration ~1 year at baseline (PD_Diagnosis_History.csv: **PDDXDT**).
  
  
  * No MCI or dementia (Cognitive_Categorization.csv: **COGSTATE** != 3 for dementia, 2 for MCI)
  
  
  * No depression (Medical_Conditions_Log.csv: **MHCAT** != 115 --> **Depression** != 1 in aggregation )
  
  
  * No cardio-vascular autonomic dysfunction (General_Physical_Exam.csv: **PESEQ** == 6 && **ABNORM** == 0 --> **ABNORM** != 1 in aggregation)
  
  
  * Subjects: 4 PD women and 18 PD men (Participant_Status.csv: **COHORT**==1 PD)
  * Controls: 8 HC women and 9 HC men (Participant_Status.csv: **COHORT**==2 HC)
  * Demographics.csv: **SEX**==1 Male, **SEX**==0 Female
  
  
  * Controls: a normal neurological examination (unsure which variable to use: Primary_Clinical_Diagnosis.csv: **PRIMDIAG==17**, the Neurological_exam.csv battery, or just Participant_Status.csv: **COHORT**==2?) and no relatives with PD (*see the notes below on how this differs from the original study*).
  
* ### Replication Limitations, and other points to consider


  * For the control subjects, the original study states that they must have "No history of familial or personal neurological diseases". I can't find this level of familial detail in PPMI; the existing familial data only concerns PD.
  
  
  * We are using the PPMI's cognitive state diagnosis for MCI instead of the original paper's battery of standardized neuropsychological tests.
  
  
  * While it's not an inclusion/exclusion criterion, the paper states that "At follow-up examination, all patients were receiving L-dopa". It makes no mention of L-dopa at baseline.
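
Two of the criteria listed above, disease duration at baseline and the availability of both a baseline and a ~24 month visit, need a small computation. The sketch below illustrates one possible approach on toy data. It rests on several assumptions: that dates are stored as "MM/YYYY" strings, that a visit date is available in a column (here the hypothetical `VISIT_DATE`), and that an upper bound on duration (`MAX_DURATION_MONTHS`, a placeholder value) is an acceptable reading of "~1 year". The visit codes `BL` and `V06` (Month 24) come from the visit map printed in the Setup output.

```python
import pandas as pd

MAX_DURATION_MONTHS = 24  # placeholder threshold, not taken from the paper

# Toy data (illustrative values only)
toy = pd.DataFrame(
    {
        "PATNO": [1, 1, 2, 3, 3],
        "EVENT_ID": ["BL", "V06", "BL", "BL", "V06"],
        "VISIT_DATE": ["03/2012", "04/2014", "06/2012", "01/2013", "02/2015"],
        "PDDXDT": ["09/2011", "09/2011", "01/2012", "01/2010", "01/2010"],
    }
)

# Disease duration in whole months between diagnosis and visit
visit = pd.to_datetime(toy["VISIT_DATE"], format="%m/%Y")
diagnosis = pd.to_datetime(toy["PDDXDT"], format="%m/%Y")
toy["DURATION_MONTHS"] = (visit.dt.year - diagnosis.dt.year) * 12 + (
    visit.dt.month - diagnosis.dt.month
)

# Participants with both a baseline (BL) and a Month 24 (V06) row
visit_table = pd.crosstab(toy["PATNO"], toy["EVENT_ID"])
has_both = (visit_table["BL"] > 0) & (visit_table["V06"] > 0)

# Duration criterion evaluated at baseline only
baseline = toy[toy["EVENT_ID"] == "BL"].set_index("PATNO")
short_duration = baseline["DURATION_MONTHS"] <= MAX_DURATION_MONTHS

combined = has_both & short_duration
eligible = combined[combined].index.tolist()
print(eligible)
```

In this toy example only participant 1 meets both criteria: participant 2 has no follow-up row and participant 3 exceeds the placeholder duration threshold.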

In [6]:
#PD Subjects query

result = result.drop('PATNO', axis=1) #Drop for display/privacy
cohort = result.loc[(result['ABNORM'] == 0.0) &
           (result['COGSTATE'] != 2.0) & 
           (result['COGSTATE'] != 3.0) &
           (result['Depression'] != 1.0) &
           (result['Study Date'].notnull()) &
           (result['COHORT'] == 1.0)
            # TODO: disease duration, MRI follow-up availability, and sex are not yet part of this query
          ]

In [7]:
#Control Group query

# Cohort Summary Statistics

In [8]:
# PD Subjects Cohort

In [9]:
# Control Group Cohort
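
Once both cohorts are defined, their demographics could be tabulated in the same form as the cohort quote from the paper (sex counts and mean ± standard deviation of age). The sketch below is one possible layout on toy data, not the notebook's actual result. It uses the codings stated earlier (COHORT 1 = PD, 2 = HC; SEX 1 = male, 0 = female), and the `GROUP`, `WOMAN` and `MAN` columns are helper names introduced only for illustration.

```python
import pandas as pd

# Toy cohort (illustrative values only)
toy_cohort = pd.DataFrame(
    {
        "COHORT": [1, 1, 1, 2, 2],
        "SEX": [1, 1, 0, 0, 1],
        "AGE_AT_VISIT": [60.0, 64.0, 62.0, 58.0, 60.0],
    }
)

# Helper columns: readable group label and boolean sex indicators
toy_cohort["GROUP"] = toy_cohort["COHORT"].map({1: "PD", 2: "HC"})
toy_cohort["WOMAN"] = toy_cohort["SEX"].eq(0)
toy_cohort["MAN"] = toy_cohort["SEX"].eq(1)

# One row per group: size, sex counts, mean and sample SD of age
summary = toy_cohort.groupby("GROUP").agg(
    n=("AGE_AT_VISIT", "size"),
    women=("WOMAN", "sum"),
    men=("MAN", "sum"),
    age_mean=("AGE_AT_VISIT", "mean"),
    age_sd=("AGE_AT_VISIT", "std"),
)
print(summary)
```

The paper compares age between groups with a Mann-Whitney U test. A comparable test could be added here, for instance with `scipy.stats.mannwhitneyu`, once the real cohorts are available.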