**Final Project: Quantifying Hippocampus Volume for Alzheimer's Progression**
Name: Anup Sharma, MD PhD

**Background**

Alzheimer's disease (AD) is a progressive neurodegenerative disorder that results in impaired neuronal (brain cell) function and eventually, cell death. AD is the most common cause of dementia. Clinically, it is characterized by memory loss, inability to learn new material, loss of language function, and other manifestations.

For patients exhibiting early symptoms, quantifying disease progression over time can help direct therapy and disease management.

A radiological study via MRI exam is currently one of the most advanced methods to quantify the disease. In particular, the measurement of hippocampal volume has proven useful to diagnose and track progression in several brain disorders, most notably in AD. Studies have shown a reduced volume of the hippocampus in patients with AD.

The hippocampus is a critical structure of the human brain (and the brain of other vertebrates) that plays important roles in the consolidation of information from short-term memory to long-term memory. In other words, the hippocampus is thought to be responsible for memory and learning. 

According to Nobis et al., 2019, the volume of the hippocampus varies across a population within certain boundaries, depending on various parameters, and it is possible to identify a "normal" range that takes into account age, sex, and brain hemisphere.

There is one problem with measuring the volume of the hippocampus using MRI scans, though - namely, the process tends to be quite tedious since every slice of the 3D volume needs to be analyzed, and the shape of the structure needs to be traced. The fact that the hippocampus has a non-uniform shape only makes it more challenging. 

As you might have guessed by now, we are going to build a piece of AI software that could help clinicians perform this task faster and more consistently.

You have seen throughout the course that a large part of AI development effort is taken up by curating the dataset and proving clinical efficacy. In this project, we will focus on the technical aspects of building a segmentation model and integrating it into the clinician's workflow, leaving the dataset curation and model validation questions largely outside the scope of this project.

**What Will Be Built**

In this project you will build an end-to-end AI system which features a machine learning algorithm that integrates into a clinical-grade viewer and automatically measures hippocampal volumes of new patients as their studies are committed to the clinical imaging archive.
Fortunately, you won't have to deal with full heads of patients. Our (fictional) radiology department runs a HippoCrop tool which cuts out a rectangular portion of a brain scan from every image series, making your job a bit easier. Our committed radiologists have also collected and annotated a dataset of relevant volumes, and even converted them to NIFTI format!
You will use the dataset that contains the segmentations of the right hippocampus and you will use the U-Net architecture to build the segmentation model.
After that, you will proceed to integrate the model into a working clinical PACS such that it runs on every incoming study and produces a report with volume measurements.

**The Dataset**

We are using the "Hippocampus" dataset from the Medical Decathlon competition (http://medicaldecathlon.com/). This dataset is stored as a collection of NIFTI files, with one file per volume and one file per corresponding segmentation mask. The original images here are T2 MRI scans of the full brain. As noted, in this dataset we are using cropped volumes where only the region around the hippocampus has been cut out. This makes our dataset quite a bit smaller and our machine learning problem a bit simpler, and allows us to have reasonable training times. You should not think of it as a "toy" problem, though. Algorithms that crop rectangular regions of interest are quite common in medical imaging, and segmentation is still hard.

**Local Environment**

If you would like to run the project locally, you would need a Python 3.7+ environment with the following libraries for the first two sections of the project:
• PyTorch (preferably with CUDA)
• nibabel
• matplotlib
• numpy
• pydicom
• Pillow (should be installed with pytorch)
• tensorboard

In the 3rd section of the project we will be working with three software products for emulating the clinical network. You would need to install and configure:

• Orthanc server for PACS emulation
• OHIF zero-footprint web viewer for viewing images. Note that if you deploy OHIF from its GitHub repository, at the time of writing the repo includes a yarn script (orthanc:up) that downloads and runs the Orthanc server from a Docker container. If that works for you, you won't need to install Orthanc separately.
• If you are using Orthanc (or another DICOMWeb server), you will need to configure OHIF to read data from your server. OHIF has instructions for this: https://docs.ohif.org/configuring/datasource.html
• In order to fully emulate the Udacity workspace, you will also need to configure Orthanc for auto-routing of studies so that they are automatically directed to your AI algorithm. For this you will need to take the script that you can find at section3/src/deploy_scripts/route_dicoms.lua and install it to Orthanc as explained on this page: https://book.orthancserver.com/users/lua.html
• DCMTK tools for testing and emulating a modality. Note that if you are running a Linux distribution, you might be able to install dcmtk directly from the package manager (e.g. apt-get install dcmtk in Ubuntu)

**Project Rubric:**

**Part 1: Curating a Dataset of Brain MRIs**

1. Dataset has been cleaned and outliers have been removed 
Correctly identified and removed the irrelevant files from the given dataset through inspection of the dataset

2. The project shows an understanding of how to apply medical metadata inspection methods to discover the physical dimensions of anatomical structures.

3. The project shows an understanding of how to extract pixel data for visualization. 
Jupyter Notebook contains renderings of medical volume slices that help inspect dataset slices and validate assumptions that one might have about how pixel data is stored in the arrays read from disk.
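
The physical-dimension criterion above comes down to multiplying a voxel count by the physical size of one voxel. The following is a minimal sketch, not the starter code's method. It assumes the voxel spacing in millimetres has already been read from the NIFTI header (for example, nibabel exposes it through `img.header.get_zooms()`), and the function name `mask_volume_mm3` is hypothetical.

```python
import numpy as np


def mask_volume_mm3(mask, voxel_spacing_mm):
    """Return the physical volume (mm^3) covered by all non-zero voxels of a label mask.

    voxel_spacing_mm is assumed to be the per-axis voxel size in millimetres,
    e.g. the tuple taken from the NIFTI header.
    """
    voxel_volume = float(np.prod(voxel_spacing_mm))  # mm^3 occupied by one voxel
    return int(np.count_nonzero(mask)) * voxel_volume


# Toy example (synthetic data, not from the dataset):
# a 2x2x2 block of labelled voxels with 1 x 1 x 2 mm spacing.
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[1:3, 1:3, 1:3] = 1
volume = mask_volume_mm3(mask, (1.0, 1.0, 2.0))
print(volume)  # 8 voxels * 2 mm^3 per voxel
```

A volume that falls far outside the plausible range for a hippocampus is one possible signal that a file does not belong in the dataset.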

**Part 2: Training a Segmentation CNN**

Machine learning scripts run without errors and perform training and validation of the machine learning model.
There should be no <YOUR CODE HERE> blocks in the .py files of the project. All the TASK comments should be followed by blocks of code that perform the required actions or answers to questions.
The out folder contains a model.pth file, about 100 MB in size.

Project shows evidence that a system was established allowing the monitoring of progress via Tensorboard
Script establishes proper logging of scalar and image data into Tensorboard folders, and monitoring is performed using the Tensorboard server.
Output folder includes screenshots of train/validation loss plots.
Plots could look like this: [Figure: example TensorBoard scalar plots by step]

Create test code that runs without errors and computes volumetric performance measurements.
Code in utils/volume_stats.py/Jaccard3D should contain no <YOUR CODE HERE> blocks, should contain an implementation of the metric, and should return the computed score.
The out folder contains a results.json file that is valid JSON and has at least Dice and Jaccard metrics.
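
For orientation, the two required metrics can be sketched on binary masks as follows. This is an illustrative sketch only: the function names `dice3d` and `jaccard3d`, their signatures, and the convention of returning 1.0 for two empty masks are assumptions, and the starter code's `Jaccard3D` may be specified differently.

```python
import numpy as np


def dice3d(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) of two 3D masks (non-zero = foreground)."""
    a = np.asarray(a) > 0
    b = np.asarray(b) > 0
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def jaccard3d(a, b):
    """Jaccard index |A∩B| / |A∪B| of two 3D masks (non-zero = foreground)."""
    a = np.asarray(a) > 0
    b = np.asarray(b) > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # same assumption as above
    return float(np.logical_and(a, b).sum() / union)


# Synthetic example: two 32-voxel slabs that overlap in 16 voxels.
pred = np.zeros((4, 4, 4))
pred[0:2] = 1
truth = np.zeros((4, 4, 4))
truth[1:3] = 1
dice = dice3d(pred, truth)        # 2 * 16 / (32 + 32)
jaccard = jaccard3d(pred, truth)  # 16 / 48
print(dice, jaccard)
```

Both scores lie between 0 and 1, and for the same pair of masks the Jaccard index is never larger than the Dice coefficient.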

**Part 3: Integrating into a Clinical Network**

The inferencing code for DICOM volumes is complete

All TASK items in inference_dcm.py should be addressed. A sample report file should be included along with a screenshot/png/jpg version of the said report.
A good report screenshot may look like this:

Complete inferencing code for creating reports and pushing them back.
The student’s report can be viewed in the OHIF image viewer solution. The report at least has numerical values of the volume of the hippocampus structure. Here is an example:

Create a validation plan.
The out folder contains a validation plan. The plan should be free-form, about 1-2 pages long, and should address the following topics:

What is the intended use of the product?
How was the training data collected?
How did you label your training data?
How was the training performance of the algorithm measured and how is the real-world performance going to be estimated?
What data will the algorithm perform well on in the real world, and what data might it not perform well on?

**Suggestions to Make Your Project Stand Out**

Write an explanation of how the algorithm works for clinicians.
Explain requirements for the training process (compute, memory), suggestions for making it more efficient (model architecture, data pipeline, loss functions, data augmentation). What kind of data augmentations would NOT add value?
Implement additional metrics in testing reports - sensitivity, specificity, accuracy, etc. Include an explanation of those in the #1 writeup.
Propose a better way of filtering a study for the correct series.
Can you think of what would make the report you generate from your inference better? What would be the relevant information that you could present which would help a clinician better reason about whether your model performed well or not? Can you make it look nicer by making it an RGB image (hint: look it up in the DICOM spec)?
Try to construct a fully valid DICOM as your model output (per DICOM PS3.3#A8) with all relevant fields. Construction of valid DICOM has a very calming effect on the mind and body.
Try constructing a DICOM image with your segmentation mask so that you can overlay it on the original image using the clinical image viewer.
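
As one way to approach the additional-metrics suggestion above, voxel-wise sensitivity and specificity can be derived from the confusion counts of two binary masks. This is a sketch under assumptions: the function name `sensitivity_specificity` is hypothetical, any non-zero voxel is treated as foreground, and an undefined ratio is returned as NaN.

```python
import numpy as np


def sensitivity_specificity(pred, truth):
    """Return (sensitivity, specificity) for two binary masks of the same shape."""
    pred = np.asarray(pred) > 0
    truth = np.asarray(truth) > 0
    tp = np.sum(pred & truth)    # foreground voxels correctly predicted
    tn = np.sum(~pred & ~truth)  # background voxels correctly predicted
    fp = np.sum(pred & ~truth)   # background predicted as foreground
    fn = np.sum(~pred & truth)   # foreground missed by the prediction
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return float(sensitivity), float(specificity)


# Synthetic example: the prediction covers the whole structure plus 16 extra voxels.
truth = np.zeros((4, 4, 4))
truth[1:2] = 1  # 16 foreground voxels
pred = np.zeros((4, 4, 4))
pred[0:2] = 1   # 32 predicted voxels, 16 of them correct
sens, spec = sensitivity_specificity(pred, truth)
print(sens, spec)  # tp=16, fn=0, fp=16, tn=32
```

Because cropped volumes are still mostly background, specificity tends to look high even for a weak model, which is worth noting when explaining these metrics.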

**Submission Checklist:**

[ ] Everything in the Rubric is complete.

The following are in Section 1's /section1/out/ folder/directory.

[ ] Curated dataset with labels, as collection of NIFTI files.

[ ] A Python Notebook or Python File with the results of your Exploratory Data Analysis.

The following are in Section 2's /section2/out/ folder/directory.

[ ] Functional code that trains the segmentation model.

[ ] Test report with Dice scores on test set (can be json file).

[ ] Screenshots from your Tensorboard (or other visualization engine) output.

[ ] Your trained model PyTorch parameter file (model.pth)

The following are in Section 3's /section3/out/ folder/directory.

[ ] Code that runs inference on a DICOM volume and produces a DICOM report.

[ ] A report.dcm file with a sample report.

[ ] Screenshots of your report shown in the OHIF viewer.

[ ] 1-2 page Validation Plan.



**Section 1: Curating a Dataset of Brain MRIs**

Project Master: https://github.com/udacity/nd320-c3-3d-imaging-starter/tree/master

Data is located at: https://github.com/udacity/nd320-c3-3d-imaging-starter/tree/master/data/TrainingSet

In the project directory called section1 you will find a Python Notebook that has a few instructions in it that will help you inspect the dataset, understand the clinical side of the problem a bit better, and get it ready for consumption by your algorithm in Section 2 (later). 

The notebook has 2 types of comments:
- Comments marked with # TASK: are tasks, instructions, or questions you have to complete.
- Comments not marked are not mandatory but are suggestions, questions, or background that will help you get a better understanding of the subject and apply your newly acquired medical imaging dataset EDA skills.

Once you complete the tasks, copy the following to the directory section1/out:
- Curated dataset with labels, as a collection of NIFTI files. The number of training image volumes should match the number of label volumes.
- An updated Python Notebook for Section 1 with the results of your Exploratory Data Analysis.
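
Before copying the curated dataset, the image/label pairing requirement can be checked with a few lines of standard-library Python. The sketch below makes assumptions that may not match your layout: images and labels live in two separate directories (here called `images` and `labels`), a label file shares the file name of its image, and files end in `.nii.gz`. The helper name `unpaired_volumes` is hypothetical.

```python
import tempfile
from pathlib import Path


def unpaired_volumes(images_dir, labels_dir, pattern="*.nii.gz"):
    """Return (images without a label, labels without an image), matched by file name."""
    image_names = {p.name for p in Path(images_dir).glob(pattern)}
    label_names = {p.name for p in Path(labels_dir).glob(pattern)}
    return sorted(image_names - label_names), sorted(label_names - image_names)


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    labels = Path(tmp) / "labels"
    images.mkdir()
    labels.mkdir()
    for name in ("hippocampus_001.nii.gz", "hippocampus_002.nii.gz"):
        (images / name).touch()
    (labels / "hippocampus_001.nii.gz").touch()
    missing_labels, missing_images = unpaired_volumes(images, labels)

print(missing_labels)  # images that have no matching label file
print(missing_images)  # labels that have no matching image file
```

Both lists should be empty for a correctly curated dataset.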


