* **Original paper:** [*Tessa C, Lucetti C, Giannelli M, Diciotti S, Poletti M, Danti S, Baldacci F, Vignali C, Bonuccelli U, Mascalchi M, Toschi N. Progression of brain atrophy in the early stages of Parkinson's disease: a longitudinal tensor-based morphometry study in de novo patients without cognitive impairment. Hum Brain Mapp. 2014 Aug;35(8):3932-44. doi: 10.1002/hbm.22449. Epub 2014 Jan 22. PMID: 24453162; PMCID: PMC6868950.*](https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.22449)

# Introduction

* **Cohort quote from the paper:** "*Overall, 22 patients (4 women and 18 men, mean age 61.5 ± 8.8) and 17 control subjects (8 women and 9 men, mean age 59.1 ± 8.5 years) completed the study and underwent a second MRI examination. The mean (± standard deviation) follow-up time for patients and controls was 2.8 ± 0.6 (range 2–4) years and 3.9 ± 2.2 (range 2–7) years, respectively. Differences for age between PD patients and control subjects were not significant (P = 0.48, Mann-Whitney U-test)*"


* Demographics for the PD patients (table taken from the paper):
<img src="./images/original-cohort.png" alt="" width="30%" height="30%">


* At the baseline evaluation "*No difference in local volume between patients and control subjects was revealed*".


* A (very) brief summary of the main results of the longitudinal evaluation is:


  * Control subjects: Baseline versus follow-up
    * Control subjects showed atrophy in several white matter and grey matter regions (Fig. 1a), together with cerebrospinal fluid enlargement. The atrophy clusters mainly involved white matter and were more widespread in the frontal lobe.
    
    
  * PD patients: Baseline versus follow-up
    * PD patients showed clusters of reduced white and grey matter volume, together with cerebrospinal fluid enlargement. The clusters were more evident in the white matter, especially in the frontal lobe (Fig. 1b). Grey matter involvement was more widespread than in the control subjects.
    <img src="./images/original-fig1.png" alt="" width="50%" height="50%">
    
  * PD patients versus control subjects
    * "*PD patients developed bilateral clusters of increased atrophy*" (Fig. 2).
    <img src="./images/original-fig2.png" alt="" width="50%" height="50%">
    
    
  * Correlation analyses
    * "*In PD patients, no significant correlation between warprates and motor or neuropsychological test scores or their average changes per year between baseline and follow-up were identified*".

# Setup

In [1]:
# Importing the modules we need

import numpy as np
import pandas as pd
import os
from datetime import datetime
import statistics
from scipy.stats import mannwhitneyu

import livingpark_utils
from livingpark_utils import download
from livingpark_utils import clinical
from livingpark_utils.scripts import run
from livingpark_utils.scripts import mri_metadata

This notebook was run on 2023-03-29 00:39:44 UTC +0000
['COR', 'Coronal', 'Cal Head 24', 'Transverse', 'tra_T1_MPRAGE', 'TRA']
['AX', 'Ax', 'axial', 'Phantom', 'T2']
{'Screening': 'SC', 'Baseline': 'BL', 'Month 6': 'V02', 'Month 12': 'V04', 'Month 24': 'V06', 'Month 36': 'V08', 'Month 48': 'V10', 'Symptomatic Therapy': 'ST', 'Unscheduled Visit 01': 'U01', 'Unscheduled Visit 02': 'U02', 'Premature Withdrawal': 'PW'}
Saved in MRI_info.csv


In [2]:
# Notebook data initialization
inputs_dir = os.path.join(os.getcwd(), "inputs/study_files")
outputs_dir = os.path.join(os.getcwd(), "outputs")
data_dir = os.path.join(os.getcwd(), "data")

utils = livingpark_utils.LivingParkUtils()
downloader = download.ppmi.Downloader(utils.study_files_dir)
#random_seed = 1
utils.notebook_init()

This notebook was run on 2023-03-29 00:39:44 UTC +0000


# PPMI Cohort Data Acquisition

In [3]:
# PPMI study data files to reconstruct cohort and demographics table

required_files = [
    "Demographics.csv",                                   # Sex
    "Age_at_visit.csv",                                   # Age
    "Montreal_Cognitive_Assessment__MoCA_.csv",           # MMSE
    "PD_Diagnosis_History.csv",                           # Disease duration
    "Cognitive_Categorization.csv",                       # MCI diagnosis
    "Participant_Status.csv",                             # Parkinson's vs healthy diagnosis
    "Primary_Clinical_Diagnosis.csv",                     # Subjects with no PD nor other neurological disorder
    "Geriatric_Depression_Scale__Short_Version_.csv",     # GDSS - depression screening
    "Family_History.csv",                                 # PD familial history
    "General_Physical_Exam.csv",                          # Cardio-vascular dysfunction (exclusion)
    "Magnetic_Resonance_Imaging__MRI_.csv",               # Baseline & ~24 month follow-up T1w images
    #"MDS_UPDRS_Part_II__Patient_Questionnaire.csv",      # UPDRS II
    "MDS-UPDRS_Part_III.csv",                             # UPDRS III, and Hoehn and Yahr scores
    "Medical_Conditions_Log.csv"                          # Depression diagnosis (and other neuro/psych conditions)
]

utils.get_study_files(required_files, default=downloader)

Download skipped: No missing files!


In [4]:
# Reading in all the demographic info files for the relevant variables

# Age
age_df = pd.read_csv(os.path.join(inputs_dir, "Age_at_visit.csv"), usecols=["PATNO", "EVENT_ID", "AGE_AT_VISIT"])


# Sex
sex_df = pd.read_csv(os.path.join(inputs_dir,"Demographics.csv"), usecols=["PATNO", "SEX"])


# Diagnosis
diagnosis_df = pd.read_csv(os.path.join(inputs_dir, "Primary_Clinical_Diagnosis.csv"), usecols=["PATNO", "EVENT_ID", "PRIMDIAG"])


# Disease duration
disease_duration_df = pd.read_csv(os.path.join(inputs_dir,"PD_Diagnosis_History.csv"), usecols=["PATNO", "EVENT_ID", "PDDXDT"])
disease_duration_df = disease_duration_df.drop(["EVENT_ID"], axis=1) 


# Cognitive categorization (MCI/Dementia/Healthy)
cog_cat_df = pd.read_csv(os.path.join(inputs_dir, "Cognitive_Categorization.csv"), usecols=["PATNO", "EVENT_ID", "COGSTATE"])


# UPDRS3 and HY
updrs3_df = pd.read_csv(os.path.join(inputs_dir,"MDS-UPDRS_Part_III.csv"), usecols=["PATNO", "EVENT_ID", "NP3TOT", "NHY"])


# Medical condition (Depression)
med_cond_df = pd.read_csv(os.path.join(inputs_dir, "Medical_Conditions_Log.csv"), usecols=["PATNO", "EVENT_ID", "MHCAT"]).groupby(['PATNO', 'EVENT_ID'])[['MHCAT']].aggregate(lambda x: tuple(set(x))) # aggregate all codes in a tuple
# Flag each (PATNO, EVENT_ID) entry whose aggregated codes include 115 (depression)
new_med = med_cond_df.copy()
new_med['Depression'] = [1 if 115 in codes else 0 for codes in new_med['MHCAT']]
new_med = new_med.reset_index()
new_med = new_med.drop(["EVENT_ID", "MHCAT"], axis=1)


# MoCA --> MMSE
moca_df = pd.read_csv(os.path.join(inputs_dir,"Montreal_Cognitive_Assessment__MoCA_.csv"), usecols=["PATNO", "EVENT_ID", "MCATOT"])
moca_df["MMSETOT"] = moca_df["MCATOT"].apply(clinical.moca2mmse)


# Parkinson's (COHORT=1) vs healthy control (COHORT=2)
part_stat_df = pd.read_csv(os.path.join(inputs_dir, "Participant_Status.csv"), usecols=["PATNO", "COHORT"])


# GDSS
gdsshort_df = pd.read_csv(os.path.join(inputs_dir,"Geriatric_Depression_Scale__Short_Version_.csv"))
gdsshort_df = gdsshort_df.drop(["REC_ID","PAG_NAME", "INFODT","ORIG_ENTRY","LAST_UPDATE"], axis=1)
gds = gdsshort_df.iloc[:, 2:] # Calculate GDS score for each patient
gds = gds.agg(['sum'], axis="columns").rename(columns={"sum": "GDSTOT"})
gdsshort_df = pd.concat([gdsshort_df[['PATNO', 'EVENT_ID']], gds], axis=1) #Add gds score to df


# Physical Examination (For now just need PESEQ=6 for cardiovascular...Might need neurological too?)
physical_df = pd.read_csv(os.path.join(inputs_dir, "General_Physical_Exam.csv"), usecols=["PATNO", "PESEQ", "ABNORM"])
physical_df_mod = physical_df.loc[(physical_df['PESEQ'] == 6)]
physical_df_mod = physical_df_mod.drop('PESEQ', axis=1)

# MRI availability
run.mri_metadata()
mri_df = pd.read_csv(os.path.join(inputs_dir,"MRI_info.csv"))
mri_df["EVENT_ID"] = mri_df["Visit code"]
mri_df["PATNO"] = mri_df["Subject ID"]
mri_df["Sex"] = mri_df["Sex"].map({"F": 0, "M": 1})
mri_df = mri_df.drop(["Subject ID", "Visit code", "Visit", "Age", "Sex", "Description", "Imaging Protocol"], axis=1)

0.0


# Data aggregation

In [5]:
# Merging all of the relevant study file variables into one big dataframe

result = pd.merge(age_df, sex_df, on=["PATNO"], how="outer")
result = pd.merge(result, part_stat_df, on=["PATNO"], how="outer")
result = pd.merge(result, diagnosis_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, cog_cat_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, updrs3_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, moca_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, gdsshort_df, on=["PATNO", "EVENT_ID"], how="outer")
result = pd.merge(result, physical_df_mod, on=["PATNO"], how="outer")
result = pd.merge(result, new_med, on=["PATNO"], how="outer") 
result = pd.merge(result, disease_duration_df, on=["PATNO"], how="outer")
result = pd.merge(result, mri_df, on=["PATNO", "EVENT_ID"], how="outer")

# Cohort Matching

We now attempt to reconstruct a cohort which is as similar to the original study as possible using the PPMI data.

* ### Inclusion/Exclusion criteria I will use to replicate this study:


  * Baseline and ~36 month follow-up T1-weighted MRIs available and usable for TBM. (MRI_info.csv: **Study Date**)
  
  
  * Disease duration ~1 year at baseline (PD_Diagnosis_History.csv: **PDDXDT**).
  
  
  * No MCI or dementia (Cognitive_Categorization.csv: **COGSTATE** != 3 for dementia, 2 for MCI)
  
  
  * No depression (Medical_Conditions_Log.csv: **MHCAT** != 115 --> **Depression** != 1 in aggregation )
  
  
  * No cardio-vascular autonomic dysfunction (General_Physical_Exam.csv: **PESEQ** == 6 && **ABNORM** == 0 --> **ABNORM** != 1 in aggregation)
  
  
  * Subjects: 4 PD women and 18 PD men (Participant_Status.csv: **COHORT**==1 PD)
  * Controls: 8 HC women and 9 HC men (Participant_Status.csv: **COHORT**==2 HC)
  * Demographics.csv: **SEX**==1 Male, **SEX**==0 Female
  
  
  * Controls: a normal neurological examination (Unsure: Primary_Clinical_Diagnosis.csv: **PRIMDIAG==17**, Or Neurological_exam.csv battery, or just Participant_Status.csv: **COHORT**==2?), and no relatives with PD (*See the notes below on how this differs from the original study*).
  
* ### Replication Limitations, and other points to consider


  * For the control subjects, the original study states that they must have "No history of familial or personal neurological diseases". I can't find this detailed familial data in PPMI; the existing familial data only covers PD.
  
  
  * We are using the PPMI's cognitive state diagnosis for MCI instead of the original paper's battery of standardized neuropsychological tests.
  
  
  * While it's not an inclusion/exclusion criterion, the paper states that "At follow-up examination, all patients were receiving L-dopa". It makes no mention of L-dopa at baseline.
  
  * There are not enough female healthy controls in PPMI that meet our criteria, so we are using 6 female controls instead of 8, and using 11 men instead of 9.
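
The criteria above reduce to simple row-level tests. The following is a minimal sketch in plain Python on invented records (not PPMI data); the field names mirror the merged columns used in this notebook, and the helper `meets_criteria` is hypothetical. Like the queries in this notebook, it lets a missing COGSTATE or Depression value pass, but requires ABNORM to be explicitly 0 and a scan date to be present.

```python
# Toy records standing in for rows of the merged table (all values are invented).
records = [
    {"PATNO": 1, "COHORT": 1, "ABNORM": 0, "COGSTATE": 1, "Depression": 0, "Study Date": "01/15/2012"},
    {"PATNO": 2, "COHORT": 1, "ABNORM": 0, "COGSTATE": 2, "Depression": 0, "Study Date": "02/20/2012"},     # MCI
    {"PATNO": 3, "COHORT": 1, "ABNORM": 0, "COGSTATE": None, "Depression": 1, "Study Date": "03/05/2012"},  # depression
    {"PATNO": 4, "COHORT": 2, "ABNORM": 0, "COGSTATE": None, "Depression": None, "Study Date": "04/11/2012"},
    {"PATNO": 5, "COHORT": 2, "ABNORM": 1, "COGSTATE": 1, "Depression": 0, "Study Date": "05/02/2012"},     # abnormal cardiovascular exam
    {"PATNO": 6, "COHORT": 1, "ABNORM": 0, "COGSTATE": 1, "Depression": 0, "Study Date": None},             # no MRI
]


def meets_criteria(row, cohort):
    """Return True if a row passes the inclusion/exclusion tests for the given cohort code."""
    return (
        row["COHORT"] == cohort                # 1 = PD, 2 = healthy control
        and row["ABNORM"] == 0                 # normal cardiovascular exam, explicitly recorded
        and row["COGSTATE"] not in (2, 3)      # no MCI (2) or dementia (3)
        and row["Depression"] != 1             # no depression code in the medical log
        and row["Study Date"] is not None      # an MRI scan exists for this visit
    )


pd_ids = [r["PATNO"] for r in records if meets_criteria(r, cohort=1)]
hc_ids = [r["PATNO"] for r in records if meets_criteria(r, cohort=2)]
print(pd_ids, hc_ids)
```

Only participants 1 (PD) and 4 (control) survive in this toy table; every other record fails exactly one criterion.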

In [6]:
#PD Subjects query

# ~36 month followup, range 24-48 months within ~1yr of diagnoses (needs EVENT_ID = BL)
visit2month = {"V06": 24, "V10": 48} # Visits with MRI scans

#Initial filtering for inclusion/exclusion criteria
PD_query = result.loc[(result['ABNORM'] == 0.0) &
           (result['COGSTATE'] != 2.0) & 
           (result['COGSTATE'] != 3.0) &
           (result['Depression'] != 1.0) &
           (result['Study Date'].notnull()) &
           (result['COHORT'] == 1.0)] #Need PRIMDIAG == 1, but doesn't seem necessary upon inspecting 'result'.

PATNO_list = PD_query["PATNO"].unique() #Get unique python list of PATNOs to iterate over.
events = visit2month.keys() #Get list of Event_IDs to form combinations
PD_followup = pd.DataFrame(columns=PD_query.columns.values.tolist()) #empty dataframe to store followup queries

#Make a dataframe containing a single entry for the BL, and followup EVENT_IDs
for PATNO_curr in PATNO_list: #iterate over all patients, check if desired EVENT_ID combination exists
        PD_temp = pd.DataFrame(columns=PD_query.columns.values.tolist())
        event1 = PD_query.loc[(PD_query['PATNO'] == PATNO_curr) & (PD_query['EVENT_ID'] == 'BL')] #Baseline
        PD_temp = pd.concat([PD_temp, event1])
        for c in events: #Iterate over followup EVENT_IDs
            event2 = PD_query.loc[(PD_query['PATNO'] == PATNO_curr) & (PD_query['EVENT_ID'] == c)] #Followups
            PD_temp = pd.concat([PD_temp, event2])
            if (event1.empty == False and event2.empty == False): # follow up exists
                PD_followup = pd.concat([PD_followup, PD_temp])

#Drop duplicate entries for same Event_ID within a patient
PD_followup = PD_followup.drop_duplicates(subset=['PATNO', 'EVENT_ID'], keep="first")                

#Randomly choose 18 males, 4 females at Baseline
PD_cohort_M = PD_followup.loc[(PD_followup['SEX'] == 1.0) & 
                              (PD_followup['EVENT_ID'] == "BL")].sample(n=18, random_state=1)
PD_cohort_F = PD_followup.loc[(PD_followup['SEX'] == 0.0) &
                              (PD_followup['EVENT_ID'] == "BL")].sample(n=4, random_state=1)
PD_cohort_final = pd.concat([PD_cohort_M, PD_cohort_F])

#Append followup event to final cohort: first try v06, then v10
used_PATNO_list = []
final_PATNO_list = PD_cohort_final["PATNO"].unique()
for PATNO_curr in final_PATNO_list:
    followup = PD_followup.loc[(PD_followup['PATNO'] == PATNO_curr) & (PD_followup['EVENT_ID'] == "V06")]
    PD_cohort_final = pd.concat([PD_cohort_final, followup])
    if followup.empty == False:
        used_PATNO_list.append(PATNO_curr)
    if(len(used_PATNO_list) >= 13): #Gets us closest to the 2.8yr follow-up average with range 2-4yrs
        break    
not_used_PATNO_list = list(set(final_PATNO_list) - set(used_PATNO_list))
for PATNO_curr in not_used_PATNO_list: #Use some V10 visits too
    followup = PD_followup.loc[(PD_followup['PATNO'] == PATNO_curr) & (PD_followup['EVENT_ID'] == "V10")]
    PD_cohort_final = pd.concat([PD_cohort_final, followup])

#display(PD_cohort_final.sort_values(['PATNO', 'EVENT_ID']))
print("Number of scans in PD cohort: " + str(len(PD_cohort_final.index)))

Number of scans in PD cohort: 44
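
The cap of 13 V06 follow-ups in the cell above can be checked with a little arithmetic. The sketch below assumes nominal visit times (V06 = 24 months, V10 = 48 months, as in `visit2month`) rather than actual scan dates, and assumes each of the 22 patients contributes exactly one follow-up. The helper name `mean_followup_years` is hypothetical.

```python
N_PATIENTS = 22
TARGET_YEARS = 2.8  # mean follow-up for patients reported in the original paper


def mean_followup_years(n_v06, n_total=N_PATIENTS):
    """Mean nominal follow-up (years) when n_v06 patients use V06 and the rest use V10."""
    months = n_v06 * 24 + (n_total - n_v06) * 48
    return months / n_total / 12


# Try every possible split and keep the one closest to the paper's mean.
best_n_v06 = min(
    range(N_PATIENTS + 1),
    key=lambda n: abs(mean_followup_years(n) - TARGET_YEARS),
)
print(best_n_v06, round(mean_followup_years(best_n_v06), 2))  # 13 2.82
```

Under these assumptions, 13 V06 and 9 V10 follow-ups give a nominal mean of about 2.8 years, which is consistent with the threshold used in the loop.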


In [7]:
#Control Group query

# ~47 month followup, range 24-84 months in the original study; controls have no diagnosis date (needs EVENT_ID = BL)
visit2month = {"V06": 24, "V10": 48} # Visits with MRI scans

#Initial filtering for inclusion/exclusion criteria
HC_query = result.loc[(result['ABNORM'] == 0.0) &
           (result['COGSTATE'] != 2.0) & 
           (result['COGSTATE'] != 3.0) &
           (result['Depression'] != 1.0) &
           (result['Study Date'].notnull()) &
           (result['COHORT'] == 2.0)] #Need PRIMDIAG == 17, but seems unnecessary upon inspecting 'result'

PATNO_list_HC = HC_query["PATNO"].unique() #Get unique python list of PATNOs to iterate over.
events = visit2month.keys() #Get list of Event_IDs to form combinations
HC_followup = pd.DataFrame(columns=HC_query.columns.values.tolist()) #empty dataframe to store followup queries

for PATNO_curr in PATNO_list_HC: #iterate over all controls, check if desired EVENT_ID combination exists
        HC_temp = pd.DataFrame(columns=HC_query.columns.values.tolist())
        event1 = HC_query.loc[(HC_query['PATNO'] == PATNO_curr) & (HC_query['EVENT_ID'] == 'BL')] #Baseline
        HC_temp = pd.concat([HC_temp, event1])
        for c in events: #Iterate over followup EVENT_IDs
            event2 = HC_query.loc[(HC_query['PATNO'] == PATNO_curr) & (HC_query['EVENT_ID'] == c)] #Followups
            HC_temp = pd.concat([HC_temp, event2])
            if (event1.empty == False and event2.empty == False): # follow up exists
                HC_followup = pd.concat([HC_followup, HC_temp])

#Drop duplicate entries for same Event_ID within a patient
HC_followup = HC_followup.drop_duplicates(subset=['PATNO', 'EVENT_ID'], keep="first")                

#Randomly choose 11 males, 6 females at Baseline - note that we are not replicating the original gender ratio!
HC_cohort_M = HC_followup.loc[(HC_followup['SEX'] == 1.0) & 
                              (HC_followup['EVENT_ID'] == "BL")].sample(n=11, random_state=1)
HC_cohort_F = HC_followup.loc[(HC_followup['SEX'] == 0.0) &
                              (HC_followup['EVENT_ID'] == "BL")].sample(n=6, random_state=1)
HC_cohort_final = pd.concat([HC_cohort_M, HC_cohort_F])

#Append followup event to final cohort: first v10, then v06, to get us closer to their ~47 month followup mean
used_PATNO_list = []
final_PATNO_list = HC_cohort_final["PATNO"].unique()
for PATNO_curr in final_PATNO_list:
    followup = HC_followup.loc[(HC_followup['PATNO'] == PATNO_curr) & (HC_followup['EVENT_ID'] == "V10")]
    HC_cohort_final = pd.concat([HC_cohort_final, followup])
    if followup.empty == False:
        used_PATNO_list.append(PATNO_curr)  
not_used_PATNO_list = list(set(final_PATNO_list) - set(used_PATNO_list))
for PATNO_curr in not_used_PATNO_list: 
    followup = HC_followup.loc[(HC_followup['PATNO'] == PATNO_curr) & (HC_followup['EVENT_ID'] == "V06")]
    HC_cohort_final = pd.concat([HC_cohort_final, followup])

#display(HC_cohort_final.sort_values(['PATNO', 'EVENT_ID']))
print("Number of scans in HC cohort: " + str(len(HC_cohort_final.index)))

Number of scans in HC cohort: 34
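
Both cohort cells draw a fixed number of men and women with a fixed random seed so that the selection is repeatable. The idea can be sketched in plain Python as follows; `sample_by_sex` and the toy participant pool are hypothetical, and because Python's `random` module and pandas use different generators, the same seed does not select the same participants as `DataFrame.sample(random_state=1)`.

```python
import random


def sample_by_sex(participants, n_male, n_female, seed=1):
    """Draw n_male men (SEX == 1) and n_female women (SEX == 0) reproducibly."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    males = [p for p in participants if p["SEX"] == 1]
    females = [p for p in participants if p["SEX"] == 0]
    if len(males) < n_male or len(females) < n_female:
        raise ValueError("not enough eligible participants in one stratum")
    return rng.sample(males, n_male) + rng.sample(females, n_female)


# Invented pool of 40 eligible participants: 20 women (SEX == 0) and 20 men (SEX == 1).
pool = [{"PATNO": 100 + i, "SEX": i % 2} for i in range(40)]
cohort = sample_by_sex(pool, n_male=11, n_female=6)
print(sum(p["SEX"] == 1 for p in cohort), sum(p["SEX"] == 0 for p in cohort))  # 11 6
```

The explicit size check makes a shortage in one stratum visible, which is the situation that forced the 11/6 split for the controls.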


In the original study, the authors used a Mann-Whitney U test to check that the age difference between PD patients and controls was not significant.
Our null hypothesis is that the ages of the PD subjects and the healthy controls come from the same distribution.
We reject the null hypothesis only if the p-value falls below a significance level of 0.05:

In [8]:
PD_BL_age_df = PD_cohort_final.loc[(PD_cohort_final['EVENT_ID'] == "BL")]
PD_BL_ages = PD_BL_age_df["AGE_AT_VISIT"].tolist() #List of PD subject ages at baseline

HC_BL_age_df = HC_cohort_final.loc[(HC_cohort_final['EVENT_ID'] == "BL")]
HC_BL_ages = HC_BL_age_df["AGE_AT_VISIT"].tolist() #List of Healthy Control ages at baseline

U1, p = mannwhitneyu(PD_BL_ages, HC_BL_ages, method="exact")
print(p) # two-sided p-value, computed under the null hypothesis

0.26665385103212025


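The p-value (about 0.27) is well above the 0.05 significance level: if both groups came from the same age distribution, a difference at least as large as the one observed would still arise by chance roughly 27% of the time.
We therefore do not reject the null hypothesis, and we do not consider the age difference between the groups statistically significant.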
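
To make the test less of a black box, here is a brute-force sketch of what an exact two-sided Mann-Whitney test computes. It is an illustration for tiny samples only, not SciPy's implementation: the function names are hypothetical, and the p-value is defined here as the fraction of group relabelings whose U statistic lies at least as far from its null mean as the observed one.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count pairs (xi, yj) with xi > yj; ties count as one half."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value obtained by enumerating every relabeling of the pooled sample."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2  # mean of U under the null hypothesis
    observed = abs(u_statistic(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_x, group_y) - center) >= observed:
            extreme += 1
    return extreme / total


# Completely separated toy ages: only 2 of the 70 relabelings are this extreme.
print(exact_two_sided_p([51, 52, 53, 54], [61, 62, 63, 64]))
```

The number of relabelings grows combinatorially, so for cohorts of the size used in this notebook a library routine such as `scipy.stats.mannwhitneyu` is the practical choice.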

# Cohort Summary Statistics

While we do not have all of the demographic data reported in the original study, the cells below compute statistics comparable to its 'Table 1'.

In [9]:
#Function to calculate disease duration at visit (in years, like the original study)

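def disease_duration_at_visit(pddxdt, StudyDate):
    """Return years elapsed between diagnosis (MM/YYYY) and scan date (MM/DD/YYYY)."""
    d1 = datetime.strptime(pddxdt, "%m/%Y")        # diagnosis month; day defaults to the 1st
    d2 = datetime.strptime(StudyDate, "%m/%d/%Y")  # MRI study date
    delta = d2 - d1
    return delta.days/365.0 #days into years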
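
As a quick sanity check of the duration calculation, the sketch below reproduces the same logic in a standalone helper (`years_between` is a hypothetical name) and applies it to invented dates. It assumes the same two date formats as the function above.

```python
from datetime import datetime


def years_between(diagnosis_month, scan_date):
    """Years from a diagnosis month (MM/YYYY) to a scan date (MM/DD/YYYY)."""
    start = datetime.strptime(diagnosis_month, "%m/%Y")  # parsed as the 1st of the month
    end = datetime.strptime(scan_date, "%m/%d/%Y")
    return (end - start).days / 365.0


# Invented example: diagnosed in January 2011, scanned on 15 March 2012 (439 days later).
print(round(years_between("01/2011", "03/15/2012"), 2))  # 1.2
```

Because the diagnosis date is only known to the month, each duration can be off by up to about one month.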

In [10]:
# PD Subjects Cohort

PD_BL_df = PD_cohort_final.loc[(PD_cohort_final['EVENT_ID'] == "BL")]
PD_BL_df = PD_BL_df.drop(["PATNO", "SEX", "COHORT", "PRIMDIAG", "COGSTATE", "ABNORM", "Depression"], axis=1)
PD_FU_df = PD_cohort_final.loc[(PD_cohort_final['EVENT_ID'] != "BL")]
PD_FU_df = PD_FU_df.drop(["PATNO", "SEX", "COHORT", "PRIMDIAG", "COGSTATE", "ABNORM", "Depression"], axis=1)

duration_list = []
for index, row in PD_BL_df.iterrows():
    dur = disease_duration_at_visit(row['PDDXDT'], row['Study Date'])
    duration_list.append(dur)

print("PD Subject Cohort")
print("\nAt baseline: \tmean (std)~~~~~~~~~~~~~~~~~~~~~~")
print("Age at Visit: \t" + str(PD_BL_df["AGE_AT_VISIT"].mean()) + " (" + str(PD_BL_df["AGE_AT_VISIT"].std()) + ")")
print("PD Duration yr: " + str(statistics.mean(duration_list)) + " (" + str(statistics.stdev(duration_list)) + ")")
print("UPDRS III: \t" + str(PD_BL_df["NP3TOT"].mean()) + " (" + str(PD_BL_df["NP3TOT"].std()) + ")")
print("Hoehn & Yahr: \t" + str(PD_BL_df["NHY"].mean()) + " (" + str(PD_BL_df["NHY"].std()) + ")")
#print("MoCA: \t" + str(PD_BL_df["MCATOT"].mean()) + " (" + str(PD_BL_df["MCATOT"].std()) + ")")
print("MMSE: \t\t" + str(PD_BL_df["MMSETOT"].mean()) + " (" + str(PD_BL_df["MMSETOT"].std()) + ")")
print("GDS: \t\t" + str(PD_BL_df["GDSTOT"].mean()) + " (" + str(PD_BL_df["GDSTOT"].std()) + ")")
print("\nAt follow-up: \tmean (std)~~~~~~~~~~~~~~~~~~~~~")
print("Age at Visit: \t" + str(PD_FU_df["AGE_AT_VISIT"].mean()) + " (" + str(PD_FU_df["AGE_AT_VISIT"].std()) + ")")
print("UPDRS III: \t" + str(PD_FU_df["NP3TOT"].mean()) + " (" + str(PD_FU_df["NP3TOT"].std()) + ")")
print("Hoehn & Yahr: \t" + str(PD_FU_df["NHY"].mean()) + " (" + str(PD_FU_df["NHY"].std()) + ")")
#print("MCATOT: \t" + str(PD_FU_df["MCATOT"].mean()) + " (" + str(PD_FU_df["MCATOT"].std()) + ")")
print("MMSE: \t\t" + str(PD_FU_df["MMSETOT"].mean()) + " (" + str(PD_FU_df["MMSETOT"].std()) + ")")
print("GDS: \t\t" + str(PD_FU_df["GDSTOT"].mean()) + " (" + str(PD_FU_df["GDSTOT"].std()) + ")")

PD Subject Cohort

At baseline: 	mean (std)~~~~~~~~~~~~~~~~~~~~~~
Age at Visit: 	60.57727272727272 (10.89918877794655)
PD Duration yr: 1.1064757160647571 (1.3905080433471115)
UPDRS III: 	22.40909090909091 (10.711182811876437)
Hoehn & Yahr: 	1.5454545454545454 (0.5958006000151015)
MMSE: 		nan (nan)
GDS: 		5.0 (1.5118578920369088)

At follow-up: 	mean (std)~~~~~~~~~~~~~~~~~~~~~
Age at Visit: 	63.449999999999996 (10.628387415201756)
UPDRS III: 	27.227272727272727 (10.364462259185371)
Hoehn & Yahr: 	1.8181818181818181 (0.3947710169758613)
MMSE: 		29.59090909090909 (0.7963662060880874)
GDS: 		5.181818181818182 (1.2587357087234732)


In [11]:
# Control Group Cohort

HC_BL_df = HC_cohort_final.loc[(HC_cohort_final['EVENT_ID'] == "BL")]
HC_BL_df = HC_BL_df.drop(["PATNO", "SEX", "COHORT", "PRIMDIAG", "COGSTATE", "ABNORM", "Depression"], axis=1)
HC_FU_df = HC_cohort_final.loc[(HC_cohort_final['EVENT_ID'] != "BL")]
HC_FU_df = HC_FU_df.drop(["PATNO", "SEX", "COHORT", "PRIMDIAG", "COGSTATE", "ABNORM", "Depression"], axis=1)

print("Control Group Cohort")
print("\nAt baseline: \tmean (std)~~~~~~~~~~~~~~~~~~~~~~~~")
print("Age at Visit: \t" + str(HC_BL_df["AGE_AT_VISIT"].mean()) + " (" + str(HC_BL_df["AGE_AT_VISIT"].std()) + ")")
print("UPDRS III: \t" + str(HC_BL_df["NP3TOT"].mean()) + " (" + str(HC_BL_df["NP3TOT"].std()) + ")")
print("Hoehn & Yahr: \t" + str(HC_BL_df["NHY"].mean()) + " (" + str(HC_BL_df["NHY"].std()) + ")")
#print("MoCA: \t" + str(HC_BL_df["MCATOT"].mean()) + " (" + str(HC_BL_df["MCATOT"].std()) + ")")
print("MMSE: \t\t" + str(HC_BL_df["MMSETOT"].mean()) + " (" + str(HC_BL_df["MMSETOT"].std()) + ")")
print("GDS: \t\t" + str(HC_BL_df["GDSTOT"].mean()) + " (" + str(HC_BL_df["GDSTOT"].std()) + ")")
print("\nAt follow-up: \tmean (std)~~~~~~~~~~~~~~~~~~~~~~~~")
print("Age at Visit: \t" + str(HC_FU_df["AGE_AT_VISIT"].mean()) + " (" + str(HC_FU_df["AGE_AT_VISIT"].std()) + ")")
print("UPDRS III: \t" + str(HC_FU_df["NP3TOT"].mean()) + " (" + str(HC_FU_df["NP3TOT"].std()) + ")")
print("Hoehn & Yahr: \t" + str(HC_FU_df["NHY"].mean()) + " (" + str(HC_FU_df["NHY"].std()) + ")")
#print("MoCA: \t" + str(HC_FU_df["MCATOT"].mean()) + " (" + str(HC_FU_df["MCATOT"].std()) + ")")
print("MMSE: \t\t" + str(HC_FU_df["MMSETOT"].mean()) + " (" + str(HC_FU_df["MMSETOT"].std()) + ")")
print("GDS: \t\t" + str(HC_FU_df["GDSTOT"].mean()) + " (" + str(HC_FU_df["GDSTOT"].std()) + ")")

Control Group Cohort

At baseline: 	mean (std)~~~~~~~~~~~~~~~~~~~~~~~~
Age at Visit: 	56.141176470588235 (12.90542806455531)
UPDRS III: 	0.4117647058823529 (1.0036697371030325)
Hoehn & Yahr: 	0.0 (0.0)
MMSE: 		nan (nan)
GDS: 		5.117647058823529 (0.6966305460192359)

At follow-up: 	mean (std)~~~~~~~~~~~~~~~~~~~~~~~~
Age at Visit: 	59.517647058823535 (12.848114418763831)
UPDRS III: 	1.0625 (1.4361406616345072)
Hoehn & Yahr: 	0.0 (0.0)
MMSE: 		29.529411764705884 (0.6242642728467979)
GDS: 		5.176470588235294 (0.6359337738364604)
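
The summaries above print full-precision floats, which are hard to compare with the "mean ± SD" entries in the paper. A small formatting helper along the following lines could make the comparison easier; `mean_sd` is a hypothetical name, and it uses the sample standard deviation (n - 1 denominator), which is also the default of the pandas `.std()` calls above.

```python
import math
import statistics


def mean_sd(values, digits=1):
    """Format values as 'mean ± SD', ignoring missing entries (None or NaN)."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if len(clean) < 2:
        return "n/a"  # the sample standard deviation needs at least two values
    mean = statistics.mean(clean)
    sd = statistics.stdev(clean)
    return f"{mean:.{digits}f} ± {sd:.{digits}f}"


print(mean_sd([58.0, 61.0, 64.0]))              # 61.0 ± 3.0
print(mean_sd([float("nan"), float("nan")]))    # n/a
```

Returning "n/a" for a column with no usable values reads more clearly than `nan (nan)`.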
