## Sorting data into cohorts
In this notebook I sort the data in MRIT1_elim.csv into 4 cohorts, defined by each patient's (NACCUDSD, NACCALZD) values across all visits:<br>
cohort name:         (UDSD, ALZD)<br>
normal cognition:    (1, 8)<br>
Alzheimer's disease: (4, 1)<br>
MCI on way to ALZD:  (3, 1)<br>
transition cohort:   every transition, for example (3, 1) ---> (4, 1)<br>

In [18]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [19]:
loadPath = "../data/"
writePath = "../../NACC_data/sorted_cohorts/"
savePath = "../results/"

Reading the .csv file.

In [20]:
df = pd.read_csv(loadPath + 'MRIT1_elim.csv')

In [21]:
print(df['NACCID'].nunique())

898


Creating 4 empty lists (one per cohort) that will store the NACCIDs matching each cohort's criteria.

In [38]:
nc = []                 # normal cognition
mci = []                # MCI on the way to Alzheimer's disease
alz = []                # Alzheimer's disease
trans = []              # transition cohort (everything else)

In [39]:
for naccid in df['NACCID'].unique():

    # All visits (rows) belonging to this patient
    patient = df[df['NACCID'] == naccid]

    a = len(patient)    # number of visits

    # Per-visit diagnosis codes as plain lists
    udsd = patient['NACCUDSD'].tolist()
    alzd = patient['NACCALZD'].tolist()

    # A patient joins a stable cohort only if every visit carries the same codes
    if (udsd.count(1), alzd.count(8)) == (a, a):
        nc.append(naccid)

    elif (udsd.count(4), alzd.count(1)) == (a, a):
        alz.append(naccid)

    elif (udsd.count(3), alzd.count(1)) == (a, a):
        mci.append(naccid)

    else:
        # Everything else, including patients whose codes change between visits
        trans.append(naccid)

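The same rule can be applied in a single pass with `groupby`, which avoids re-filtering the whole frame once per patient. This is only a sketch on a toy frame with made-up IDs and values; `STABLE`, `cohort_of` and `labels` are names introduced here, not part of the notebook. It assumes, as the loop does, that a patient belongs to a stable cohort only when every visit has the same (NACCUDSD, NACCALZD) pair.

```python
import pandas as pd

# Toy stand-in for MRIT1_elim.csv (hypothetical IDs and values)
toy = pd.DataFrame({
    "NACCID":   ["A", "A", "B", "B", "C", "D", "D"],
    "NACCUDSD": [1, 1, 4, 4, 3, 3, 4],
    "NACCALZD": [8, 8, 1, 1, 1, 1, 1],
})

# (NACCUDSD, NACCALZD) pair that defines each stable cohort
STABLE = {(1, 8): "nc", (4, 1): "alz", (3, 1): "mci"}

def cohort_of(visits):
    """Return the cohort label for one patient's visits."""
    pairs = set(zip(visits["NACCUDSD"].tolist(), visits["NACCALZD"].tolist()))
    if len(pairs) == 1:
        # Same codes at every visit: stable cohort, or "trans" if the pair is not listed
        return STABLE.get(pairs.pop(), "trans")
    # Codes differ between visits
    return "trans"

labels = {naccid: cohort_of(visits) for naccid, visits in toy.groupby("NACCID")}
print(labels)
```

Note that, like the loop, this puts every patient who does not match one of the three stable pairs into `trans`, whether or not their codes actually change.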

IMPORTANT: I removed 3 patients from trans by hand because they were extreme cases.

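A manual removal like this could also be recorded in code so that the step is repeatable. The sketch below is one possible way to do it. The IDs are hypothetical placeholders, since the three real NACCIDs are not recorded in this notebook, and `EXCLUDED`, `drop_excluded` and `trans_demo` are names introduced only for illustration.

```python
# Hypothetical placeholder IDs; the three real NACCIDs are not recorded here.
EXCLUDED = {"NACC_EXAMPLE_1", "NACC_EXAMPLE_2", "NACC_EXAMPLE_3"}

def drop_excluded(ids, excluded):
    """Return ids without the excluded ones, preserving the original order."""
    return [i for i in ids if i not in excluded]

# Toy list standing in for `trans`
trans_demo = ["NACC_EXAMPLE_1", "NACC_KEEP_1", "NACC_EXAMPLE_3", "NACC_KEEP_2"]
kept = drop_excluded(trans_demo, EXCLUDED)
print(kept)
```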
In [42]:
print(len(nc) + len(mci) + len(alz) + len(trans))
print(len(alz))

898
64


Filtering the data.

In [43]:
df_nc = df[df['NACCID'].isin(nc)]
df_mci = df[df['NACCID'].isin(mci)]
df_alz = df[df['NACCID'].isin(alz)]
df_trans = df[df['NACCID'].isin(trans)]

Sanity check.

In [45]:
print(df_nc['NACCID'].nunique() + df_mci['NACCID'].nunique() +
      df_alz['NACCID'].nunique() + df_trans['NACCID'].nunique())

898

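Matching totals show that the counts add up, but on their own they do not show that no ID sits in two cohorts. A slightly stricter check, sketched below on toy IDs, tests that the cohorts are pairwise disjoint and together cover every ID. `check_partition` is a helper introduced here for illustration, not something the notebook defines.

```python
from itertools import combinations

def check_partition(all_ids, cohorts):
    """Raise AssertionError unless the cohorts are disjoint and cover all_ids exactly."""
    for (name_a, ids_a), (name_b, ids_b) in combinations(cohorts.items(), 2):
        overlap = set(ids_a) & set(ids_b)
        assert not overlap, f"{name_a} and {name_b} share IDs: {sorted(overlap)}"
    covered = set().union(*(set(ids) for ids in cohorts.values()))
    missing = set(all_ids) - covered
    extra = covered - set(all_ids)
    assert not missing, f"IDs in no cohort: {sorted(missing)}"
    assert not extra, f"IDs not in the source data: {sorted(extra)}"
    return True

# Toy example with made-up IDs
ok = check_partition(
    ["a", "b", "c", "d"],
    {"nc": ["a"], "mci": ["b"], "alz": ["c"], "trans": ["d"]},
)
print(ok)
```

With the notebook's variables, the call would look like `check_partition(df['NACCID'].unique(), {'nc': nc, 'mci': mci, 'alz': alz, 'trans': trans})`.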

In [48]:
df_nc.to_csv(writePath + 'nc.csv', index=False)
df_mci.to_csv(writePath + 'mci.csv', index=False)
df_alz.to_csv(writePath + 'alzd.csv', index=False)
df_trans.to_csv(writePath + 'trans.csv', index=False)


print("Data writing to CSV complete!")

Data writing to CSV complete!


### Exploring data structure
I need to find out how the NACC data is structured at a deeper level, so I will look at some NIfTI files.

In [49]:
import os
import nibabel as nib

In [56]:
dataPath = '../../NACC_data/within1yr/nifti/'

In [57]:
example = os.path.join(dataPath, '943_NACC154191_20200228ni/943_NACC154191_20200228/t1sag_208_3_128401136192408473843014414135288321582809643317.nii')

In [58]:
img = nib.load(example)

FileNotFoundError: No such file or no access: '../../NACC_data/within1yr/nifti/943_NACC154191_20200228ni/943_NACC154191_20200228/t1sag_208_3_128401136192408473843014414135288321582809643317.nii'

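A hard-coded path fails as soon as the data directory is laid out differently or is not mounted. One way to explore the structure without guessing file names is to list the NIfTI files that actually exist under the root directory first. The sketch below uses only the standard library and a temporary directory with made-up folder names. `find_nifti` is a helper introduced here, and the `.nii` / `.nii.gz` suffixes are an assumption about how the files are named.

```python
from pathlib import Path
import tempfile

def find_nifti(root):
    """Return all .nii / .nii.gz files under root, sorted; empty list if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".nii", ".nii.gz"))
    )

# Demo on a throwaway directory tree (hypothetical folder and file names)
with tempfile.TemporaryDirectory() as tmp:
    session = Path(tmp) / "session_a" / "series_1"
    session.mkdir(parents=True)
    (session / "scan.nii").touch()
    (session / "notes.txt").touch()
    found_names = [p.name for p in find_nifti(tmp)]
    missing = find_nifti(Path(tmp) / "does_not_exist")

print(found_names, missing)
```

In the notebook, `find_nifti(dataPath)` would return an empty list if the directory is not reachable. Otherwise any returned path could be passed to `nib.load`.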
In [None]:
header = img.header
print(header)