# Project
For this project, let's assume that we have measured the brain volumes for all 400 images in the OASIS dataset and stored them in a CSV file. We also have access to all of the image data sets.

This demo is a Jupyter notebook, i.e. it is intended to be run step by step.

Author: Eric Einspänner
<br>
Contributor: Nastaran Takmilhomayouni

First version: 6th of July 2023

Copyright 2023 Clinic of Neuroradiology, Magdeburg, Germany

License: Apache-2.0

## Project description - Part I
In this part of the project, we use the data stored in the CSV file.

With this data, we want to answer some crucial questions:
- ...
- ...


For this question, we can use a two-sample t-test. We will test the null hypothesis that the mean brain volume is the same in people with and without Alzheimer's. The t-statistic measures how far apart the two group means are relative to the variability within the groups. If the magnitude of t is very far from 0, then we can reject the null hypothesis that the groups are the same.
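
As a rough illustration of how the t-statistic is formed, the sketch below computes it by hand for two small samples and compares the result with `scipy.stats.ttest_ind`. The sample values and variable names are invented for illustration and are not taken from the OASIS data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Made-up samples (NOT OASIS data), purely for illustration
group_a = np.array([940.0, 960.0, 955.0, 970.0, 935.0])
group_b = np.array([980.0, 990.0, 965.0, 1000.0, 975.0])

n_a, n_b = len(group_a), len(group_b)

# Pooled variance of both groups (Student's t-test assumes equal variances)
pooled_var = (
    (n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)
) / (n_a + n_b - 2)

# Difference of the means divided by its standard error
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(
    pooled_var * (1 / n_a + 1 / n_b)
)

# SciPy computes the same statistic (equal_var=True is the default)
res = ttest_ind(group_a, group_b)
print(t_manual, res.statistic, res.pvalue)
```

The sign of the statistic follows the order of the arguments: it is negative here because the first sample has the smaller mean.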

### 1. Import Python libraries

In [1]:
# Make sure figures appear inline and animations work
# Edit this to "%matplotlib notebook" when using the "classic" Jupyter notebook interface
%matplotlib widget

In [2]:
# Used to change filepaths
from pathlib import Path

# We set up matplotlib and the display function
import matplotlib.pyplot as plt
from IPython.display import display

# import numpy, pandas, pydicom and ...
# ... YOUR CODE FOR TASK 1 ...

import pydicom
import nibabel as nib

### 2. Import and read the dataset

In [None]:
# Load OASIS csv file
df = # ... YOUR CODE FOR TASK 2 ...

### 3. Check the dataset

In [None]:
# Print the first five rows of the table
print(# ... YOUR CODE FOR TASK 3 ...)

# Print prevalence of Alzheimer's Disease
print(# ... YOUR CODE FOR TASK 3 ...)

# Print a correlation table excluding non-numeric columns
print(# ... YOUR CODE FOR TASK 3 ...)
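
The three checks can be tried out on a small stand-in table first. This is only a sketch: the column names `alzheimers`, `brain_vol` and `skull_vol` mirror those used in this notebook, while the values and the `sex` column are invented.

```python
import pandas as pd

# Hypothetical toy table; the values and the 'sex' column are invented
toy = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F", "M"],
    "alzheimers": [True, False, False, True, False, False],
    "brain_vol": [920.0, 1010.0, 980.0, 935.0, 995.0, 1020.0],
    "skull_vol": [1400.0, 1580.0, 1450.0, 1530.0, 1460.0, 1600.0],
})

# First rows of the table
print(toy.head())

# Prevalence: share of rows per diagnosis value
prevalence = toy["alzheimers"].value_counts(normalize=True)
print(prevalence)

# Correlation table restricted to numeric columns
corr = toy.select_dtypes(include="number").corr()
print(corr)
```

Selecting the numeric columns explicitly keeps the text and boolean columns out of the correlation table.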

### 4. Testing Group Differences
Let's test the hypothesis that Alzheimer's Disease is characterized by reduced brain volume. To run the t-test, we need the brain volumes of all patients in our sample with and without diagnosed Alzheimer's. Select the rows where 'alzheimers' is `True` with `df.loc`, then specify the column with the "brain volume" values. For the healthy cohort, change the selected value to `False`. To run the t-test, import the `ttest_ind()` function from SciPy's stats module. Then, pass the two vectors as the first and second populations. The results object contains the test statistic and the p-value. The p-value is the probability of observing a test statistic at least as extreme as the one measured, assuming that the null hypothesis is true.

In this case, the two population samples are independent from each other because they are all separate subjects.

For this exercise, use the OASIS dataset (`df`) and `ttest_ind` to evaluate the hypothesis.

In [None]:
# Import independent two-sample t-test
from scipy.stats import ttest_ind

To better understand `ttest_ind`, look up its documentation with the `help()` function. Which parameters does the function need? What is the output?

In [None]:
help(# ... YOUR CODE FOR TASK 4 ...)

Use DataFrame operations to extract the `brain_vol` column for the Alzheimer's and typical groups.

In [None]:
# Select data from "alzheimers" and "typical" groups
brain_alz = # ... YOUR CODE FOR TASK 4 ...
brain_typ = # ... YOUR CODE FOR TASK 4 ...
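
The `df.loc` selection pattern described above can be sketched on a toy table as follows. The values are invented, and the sketch assumes that `alzheimers` is a boolean column.

```python
import pandas as pd

# Invented toy values (NOT OASIS data); column names follow this notebook
toy = pd.DataFrame({
    "alzheimers": [True, False, True, False, False],
    "brain_vol": [930.0, 1005.0, 915.0, 990.0, 1020.0],
})

# Row selector: boolean mask, column selector: column label
vol_alz = toy.loc[toy["alzheimers"] == True, "brain_vol"]
vol_typ = toy.loc[toy["alzheimers"] == False, "brain_vol"]

print(vol_alz.tolist())
print(vol_typ.tolist())
```

Each result is a pandas Series that can be passed directly to `ttest_ind`.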

We can now perform a two-sample t-test between the brain volumes of elderly adults with and without Alzheimer's Disease. Using `results.statistic` and `results.pvalue` as your guide, answer the question: Is there strong evidence that Alzheimer's Disease is marked by smaller brain size?

In [None]:
# Perform t-test of "alz" > "typ"
results = ttest_ind(# ... YOUR CODE FOR TASK 4 ...)
print(# ... YOUR CODE FOR TASK 4 ...)
print(# ... YOUR CODE FOR TASK 4 ...)

Solution: There is some evidence for decreased brain volume in individuals with Alzheimer's Disease, but it is not statistically significant. Since the p-value for this t-test is greater than 0.05, we do not reject the null hypothesis that the two groups are equal.
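
`ttest_ind` is two-sided by default, whereas the hypothesis here has a direction (smaller volumes with Alzheimer's). As a sketch, and assuming SciPy 1.6 or newer, the `alternative` parameter requests a one-sided test. The sample values below are made up.

```python
import numpy as np
from scipy.stats import ttest_ind

# Made-up samples (NOT OASIS data)
sample_alz = np.array([930.0, 915.0, 960.0, 945.0, 925.0, 950.0])
sample_typ = np.array([955.0, 940.0, 985.0, 930.0, 970.0, 965.0])

# Two-sided test: do the means differ in either direction?
two_sided = ttest_ind(sample_alz, sample_typ)

# One-sided test: is the mean of the first sample smaller than the second?
one_sided = ttest_ind(sample_alz, sample_typ, alternative="less")

print(two_sided.statistic, two_sided.pvalue)
print(one_sided.statistic, one_sided.pvalue)
```

When the statistic is negative, the one-sided p-value for `alternative="less"` is half of the two-sided p-value.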

Visualize the distribution of brain volumes based on whether individuals have Alzheimer's disease or not.

In [None]:
# Show boxplot of brain_vol differences
df.boxplot(# ... YOUR CODE FOR TASK 4 ...)
plt.show()
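
A grouped boxplot can be sketched on toy data like this. The values are invented; `column` and `by` are the `DataFrame.boxplot` parameters that choose the plotted variable and the grouping variable.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented toy values (NOT OASIS data); column names follow this notebook
toy = pd.DataFrame({
    "alzheimers": [True, False, True, False, False, True],
    "brain_vol": [930.0, 1005.0, 915.0, 990.0, 1020.0, 950.0],
})

# One box per value of 'alzheimers', showing the spread of 'brain_vol'
fig, ax = plt.subplots()
toy.boxplot(column="brain_vol", by="alzheimers", ax=ax)
ax.set_ylabel("brain_vol")
```

Outside a notebook, call `plt.show()` afterwards to render the figure.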

### 5. Normalizing metrics
We previously saw that there was not a significant difference between the brain volumes of elderly individuals with and without Alzheimer's Disease.

But could a correlated measure, such as "skull volume", be masking the differences? To account for this potential confound, we can normalize brain volume with respect to skull size by calculating the brain-to-skull ratio.

For this exercise, calculate a new test statistic for the comparison of brain volume between groups, after adjusting for the subject's skull size.

In [None]:
# Adjust `brain_vol` by `skull_vol`
df['adj_brain_vol'] = # ... YOUR CODE FOR TASK 5 ...
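
The brain-to-skull ratio is an element-wise division of two columns. A minimal sketch with invented values:

```python
import pandas as pd

# Invented toy values (NOT OASIS data); column names follow this notebook
toy = pd.DataFrame({
    "brain_vol": [900.0, 1000.0, 1100.0],
    "skull_vol": [1500.0, 1600.0, 2000.0],
})

# Brain-to-skull ratio: element-wise division of the two columns
toy["adj_brain_vol"] = toy["brain_vol"] / toy["skull_vol"]
print(toy)
```

In this toy table the third subject has the largest brain volume but the smallest ratio, which is the kind of effect the adjustment is meant to expose.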

Use DataFrame operations to extract the new column `adj_brain_vol` for the Alzheimer's and typical groups, and perform a two-sample t-test between the adjusted brain volumes of elderly adults with and without Alzheimer's Disease. Print the test statistic and the p-value.

Using `results.statistic` and `results.pvalue` as your guide, answer the question: Is there strong evidence that Alzheimer's Disease is marked by smaller brain size, relative to skull size?

In [None]:
# Select brain measures by group
brain_alz = df.loc[# ... YOUR CODE FOR TASK 5 ...]
brain_typ = df.loc[# ... YOUR CODE FOR TASK 5 ...]

# Evaluate null hypothesis
results = ttest_ind(# ... YOUR CODE FOR TASK 5 ...)
print(# ... YOUR CODE FOR TASK 5 ...)
print(# ... YOUR CODE FOR TASK 5 ...)

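Solution: Yes, reject the null hypothesis! Based on `results.statistic` and `results.pvalue`, there is strong evidence that Alzheimer's Disease is marked by smaller brain size relative to skull size.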

## Project description - Part II
For this project part, we are using DICOM images saved in the .dcm format.

We want to compare two DICOM images. In order to do that, we can load one DICOM image and create a new DICOM image by modifying it.
Load one OASIS dataset and try to generate a new dataset by:
<br>
    1- changing the scan day
<br>
    2- applying one manual modification (affine transformation, DICOM header modification)
<br>
At the end you'll have two datasets of one patient.
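
One possible way to apply a manual affine modification to the voxel data is `scipy.ndimage.affine_transform`. The sketch below shifts a small synthetic volume by two voxels along the first axis. The volume, the shift and the choice of SciPy for this step are assumptions, not part of the project files.

```python
import numpy as np
from scipy.ndimage import affine_transform

# Synthetic 3D volume with a single bright voxel (stand-in for image data)
volume = np.zeros((8, 8, 8))
volume[3, 3, 3] = 1.0

# affine_transform maps each OUTPUT coordinate o to the INPUT coordinate
# matrix @ o + offset, so offset=(-2, 0, 0) moves content by +2 along axis 0
matrix = np.eye(3)
offset = (-2.0, 0.0, 0.0)

# order=0 uses nearest-neighbour interpolation, so intensities are preserved
shifted = affine_transform(volume, matrix, offset=offset, order=0)

print(np.argwhere(volume == 1.0))
print(np.argwhere(shifted == 1.0))
```

Replacing the identity matrix with a rotation or scaling matrix gives other affine modifications with the same call.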

### 1- Load both datasets
Load the datasets with pydicom. Print and compare the DICOM headers.
What is the scan date/time of each dataset?

In [1]:
import pydicom

In [4]:
# Load images stored in the following path
file_path = 'files/Day1/MR000000.dcm'
dcm = pydicom.dcmread(file_path)

file_path2 = 'files/Day2/MR000000_2.dcm'
dcm2 = pydicom.dcmread(file_path2)

# dcm =  # ... YOUR CODE FOR TASK 1 ...

Our two sets of images were taken on different days. We now need to check this. Take a look at the StudyDate, for example. Are there any other such attributes?

In [6]:
# dcm =  # ... YOUR CODE FOR TASK 1 ...
print('Study Date of the first image is:', dcm.StudyDate)
print('Series Date of the first image is:', dcm.SeriesDate)
print('Acquisition Date of the first image is:', dcm.AcquisitionDate)
print('Content Date of the first image is:', dcm.ContentDate)
print('Instance Creation Date of the first image is:', dcm.InstanceCreationDate)
print('--------------------------------------------------------------')
print('Study Date of the second image is:', dcm2.StudyDate)
print('Series Date of the second image is:', dcm2.SeriesDate)
print('Acquisition Date of the second image is:', dcm2.AcquisitionDate)
print('Content Date of the second image is:', dcm2.ContentDate)
print('Instance Creation Date of the second image is:', dcm2.InstanceCreationDate)

Study Date of the first image is: 20150114
Series Date of the first image is: 20150114
Acquisition Date of the first image is: 20150114
Content Date of the first image is: 20150114
Instance Creation Date of the first image is: 20150114
--------------------------------------------------------------
Study Date of the second image is: 20150315
Series Date of the second image is: 20150315
Acquisition Date of the second image is: 20150315
Content Date of the second image is: 20150315
Instance Creation Date of the second image is: 20150315


### 2- Convert DICOM to NIfTI
In this task we are going to convert the DICOM images to NIfTI format and continue working with the NIfTI files. We would like to use `dicom2nifti` for this. Search the documentation for a suitable function that converts a folder of .dcm files into a .nii file.

In [25]:
import dicom2nifti
folder_path = 'files/Day1'
folder_path2 = 'files/Day2'
dicom2nifti.convert_directory(folder_path, folder_path)
dicom2nifti.convert_directory(folder_path2, folder_path2)

# Alternative solution with SimpleITK (imported as sitk)
# loaded_dicom_image = # ... YOUR CODE FOR TASK 2 ...
# sitk.WriteImage(loaded_dicom_image, path_to_save_nifti_file)

Now we want to load the transformed nii files with `nibabel`:

In [None]:
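import os
import nibabel as nib
# Join folder and file name so that a path separator is inserted between them
imnii = nib.load(os.path.join(folder_path, '201_t2w_tse.nii.gz'))
imnii2 = nib.load(os.path.join(folder_path2, '201_t2w_tse.nii.gz'))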

### 3- Plot average slice
Let's plot the average slice of the NIfTI images. To do that, you can use the nibabel Python package.
After loading the files, you will have a three-dimensional array. To work with arrays, here to take an average, you can use the NumPy Python package.
<br>
Finally, plot the average slices of the two images side by side using subplots.


In [3]:
# Open the NIfTI images containing all slices as 3D arrays
img1=#....  
img2=#....

# average of the 3D array
img1_avg_data=#...
img2_avg_data=#...
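
The averaging and plotting steps can be sketched with synthetic arrays. The random volumes stand in for the arrays obtained from nibabel (for a loaded image, `get_fdata()` returns such an array), and treating the last axis as the slice axis is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np

# Two synthetic volumes standing in for the loaded image arrays
rng = np.random.default_rng(0)
vol1 = rng.random((16, 16, 10))
vol2 = rng.random((16, 16, 10))

# Average over the last axis, assumed here to be the slice axis
avg1 = vol1.mean(axis=2)
avg2 = vol2.mean(axis=2)

# Show both average slices side by side
fig, axes = plt.subplots(nrows=1, ncols=2)
axes[0].imshow(avg1, cmap="gray")
axes[0].set_title("Image 1")
axes[1].imshow(avg2, cmap="gray")
axes[1].set_title("Image 2")
```

Averaging along one axis reduces each 3D volume to a single 2D image that `imshow` can display.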

### 4- Image Comparison
Here we can compare the two images. In order to do that, you can select different criteria. Here, it is recommended to compare the two images using MAE, SSIM and IoU, which you have learnt during the bootcamp.

In [3]:
# You can use the following function to simplify plotting the results
def format_and_render_plot():
    '''
    Custom function to simplify common formatting operations for exercises. Operations include: 
    1. Turning off axis grids.
    2. Calling `plt.tight_layout` to improve subplot spacing.
    3. Calling `plt.show()` to render plot.
    '''
    fig = plt.gcf()
    for ax in fig.axes:
        ax.axis('off')    
    plt.tight_layout()
    plt.show()

In [None]:
# MAE (mean absolute error)
# Calculate image difference
err = img1_avg_data - img2_avg_data

# SSIM (structural similarity index)
from skimage.metrics import structural_similarity as ssim
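
As a small worked example, the sketch below computes the MAE and, assuming the third metric refers to intersection over union (IoU), a mask overlap score for two tiny arrays. The values and the foreground threshold of 0 are invented for illustration.

```python
import numpy as np

# Two tiny synthetic "average slices" (invented values)
a = np.array([[0.0, 1.0], [2.0, 3.0]])
b = np.array([[1.0, 1.0], [0.0, 5.0]])

# MAE: mean of the absolute pixel-wise differences
mae = np.mean(np.abs(a - b))

# IoU on foreground masks; the threshold of 0 is an arbitrary choice here
mask_a = a > 0
mask_b = b > 0
iou = np.logical_and(mask_a, mask_b).sum() / np.logical_or(mask_a, mask_b).sum()

print(mae)  # 1.25
print(iou)  # 0.5
```

Taking the absolute value before averaging matters: the signed differences would partly cancel each other out.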

### 5- Image Registration
The final step is to register one image to the other. We often need to register two images to each other to get more information from them.
<br>
You can register the first image to the second image using the affine registration offered by dipy (`AffineRegistration` from `dipy.align.imaffine` together with a transform from `dipy.align.transforms`).

In [None]:
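from dipy.align.imaffine import AffineRegistration
from dipy.align.transforms import AffineTransform3D
# Register the first image (moving) to the second image (static)
static, moving = imnii2.get_fdata(), imnii.get_fdata()
affreg = AffineRegistration()
transform = AffineTransform3D()
affine = affreg.optimize(static, moving, transform, params0=None,
                         static_grid2world=imnii2.affine, moving_grid2world=imnii.affine)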