# Download the Head CT dataset from Qure.ai
## Goal
Download publicly available sample data from the Qure.ai Head CT dataset.

## Data license
Please register and acknowledge the dataset license before use:
http://headctstudy.qure.ai/dataset

## Directory setup
Create a data download folder called `zips`.

In [1]:
import os

zip_dir = 'zips'
if not os.path.isdir(zip_dir):
    os.mkdir(zip_dir)

## Download using wget
Now we can download the data from the AWS cloud using `wget`.

* **Please check the included `dicom2deployment-initial-setup.ipynb` for `wget` installation instructions.**
* Ensure that the provided text file `qure_headCT_100.txt` is in your current working directory.
* <span style="color:red">WARNING: The following download code may take ~30 mins or longer to run and will use over 7 GB of disk space</span>

In [10]:
import wget
import csv

# get list of URLs for dataset (one URL per line; blank lines are skipped)
urls_file = './qure_headCT_100.txt'
with open(urls_file, 'r') as f:
    reader = csv.reader(f)
    urls = [row[0] for row in reader if row]

# download the data into the zips directory
for i, url in enumerate(urls, 1):
    print(f"Downloading on zip {i:03d} of {len(urls):03d}")
    _ = wget.download(url, out=zip_dir)

Downloading on zip 001 of 4
Downloading on zip 002 of 4
Downloading on zip 003 of 4
Downloading on zip 004 of 4
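
Because the download is long and can be interrupted, it may help to check each archive before unzipping. The following is a minimal sketch using only the standard library; the helper name `find_bad_zips` is hypothetical and is not part of the original notebook.

```python
import os
import zipfile
from glob import glob


def find_bad_zips(zip_dir):
    """Return paths of '*.zip' files in zip_dir that cannot be read as zip archives."""
    bad = []
    for path in sorted(glob(os.path.join(zip_dir, '*.zip'))):
        # is_zipfile only checks that the file looks like a zip archive
        if not zipfile.is_zipfile(path):
            bad.append(path)
            continue
        try:
            with zipfile.ZipFile(path, 'r') as zref:
                # testzip returns the name of the first corrupt member, or None
                if zref.testzip() is not None:
                    bad.append(path)
        except zipfile.BadZipFile:
            bad.append(path)
    return bad
```

Calling `find_bad_zips(zip_dir)` after the download loop returns an empty list when every archive passes the check; any path it lists is a candidate for re-downloading.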


## Unzip files
Unzip the image DICOM data zip files into a new directory named `unzip`:

In [12]:
import zipfile
from glob import glob

# make an output directory for the unzipped DICOM data called unzip
unzip_dir = 'unzip'
if not os.path.isdir(unzip_dir):
    os.mkdir(unzip_dir)

# get a list of all the zipfiles that we downloaded
zips = sorted(glob(os.path.join(zip_dir, '*.zip')))

# loop through zips and unzip them into the dicom directory
for i, z in enumerate(zips, 1):
    print(f"Working on zip {z}, {i:03d} of {len(zips):03d}")
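    # get the patient ID from the filename (e.g. 'CQ500-CT-0.zip' -> '000'),
    # ignoring the directory part so dots or hyphens in the path cannot interfere
    patientID = os.path.splitext(os.path.basename(z))[0].split("-")[-1].zfill(3)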
    # make an output directory to unzip files into
    out_path = os.path.join(unzip_dir, patientID)
    # unzip all
    with zipfile.ZipFile(z, 'r') as zref:
        zref.extractall(out_path)

Working on zip zips/CQ500-CT-0.zip, 001 of 100
Working on zip zips/CQ500-CT-102.zip, 002 of 100
Working on zip zips/CQ500-CT-103.zip, 003 of 100
Working on zip zips/CQ500-CT-107.zip, 004 of 100
Working on zip zips/CQ500-CT-110.zip, 005 of 100
Working on zip zips/CQ500-CT-113.zip, 006 of 100
Working on zip zips/CQ500-CT-114.zip, 007 of 100
Working on zip zips/CQ500-CT-12.zip, 008 of 100
Working on zip zips/CQ500-CT-121.zip, 009 of 100
Working on zip zips/CQ500-CT-126.zip, 010 of 100
Working on zip zips/CQ500-CT-130.zip, 011 of 100
Working on zip zips/CQ500-CT-140.zip, 012 of 100
Working on zip zips/CQ500-CT-142.zip, 013 of 100
Working on zip zips/CQ500-CT-15.zip, 014 of 100
Working on zip zips/CQ500-CT-150.zip, 015 of 100
Working on zip zips/CQ500-CT-151.zip, 016 of 100
Working on zip zips/CQ500-CT-153.zip, 017 of 100
Working on zip zips/CQ500-CT-156.zip, 018 of 100
Working on zip zips/CQ500-CT-160.zip, 019 of 100
Working on zip zips/CQ500-CT-162.zip, 020 of 100
Working on zip zips/CQ50
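
The patient ID used for each output folder comes from the zip filename. As a small illustration of that parsing step, here is a sketch with a hypothetical helper, `patient_id_from_zip`, which assumes the `CQ500-CT-<number>.zip` naming seen in the output above.

```python
import os


def patient_id_from_zip(zip_path):
    """Derive a zero-padded, three-character patient ID from a zip file path."""
    # drop the directory and the '.zip' extension, e.g. 'CQ500-CT-0'
    stem = os.path.splitext(os.path.basename(zip_path))[0]
    # keep the text after the last hyphen and left-pad it with zeros
    return stem.split('-')[-1].zfill(3)


print(patient_id_from_zip('zips/CQ500-CT-0.zip'))    # 000
print(patient_id_from_zip('zips/CQ500-CT-12.zip'))   # 012
print(patient_id_from_zip('zips/CQ500-CT-102.zip'))  # 102
```

Zero-padding keeps the output folders in numeric order when they are sorted alphabetically.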

# Find the Correct DICOM Series
## Goal
Find the thin-slice non-contrast head CT series for each patient. For this specific dataset, this series is most commonly called `CT plain thin`.

## Rationale
Each of the extracted zip files represents an entire CT study, which may contain multiple non-contrast and post-contrast series. Here is an example of the directory structure:

```
unzip
└── 000
    └── CQ500CT0 CQ500CT0
        └── Unknown Study
            ├── CT 4cc sec 150cc D3D on
            │   ├── CT000000.dcm
            │   ├── CT000001.dcm
            │   ...
            ├── CT 4cc sec 150cc D3D on-2
            │   ├── CT000000.dcm
            │   ├── CT000001.dcm
            │   ...
            ├── CT 4cc sec 150cc D3D on-3
            │   ├── CT000000.dcm
            │   ├── CT000001.dcm
            │   ...
            ├── CT Plain
            │   ├── CT000000.dcm
            │   ├── CT000001.dcm
            │   ...
            └── CT PLAIN THIN
                ├── CT000000.dcm
                ├── CT000001.dcm
                ...
...
```
Our goal is to find only the specific DICOM series that we want to work with (in this case `CT PLAIN THIN`) and copy it to a new directory called `dicom` with the following structure:

```
dicom
├── 001
│   ├── CT000000.dcm
│   ├── CT000001.dcm
│   ...
└── 100
    ...
```
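
The name-matching rule can be illustrated on its own. This is a minimal sketch with a hypothetical helper, `is_target_series`; it only covers the keyword test. The notebook code applies the same test to the full directory path and additionally requires more than 100 `.dcm` files.

```python
def is_target_series(series_name, keywords=('thin', 'plain')):
    """Return True if every keyword appears in the series name, ignoring case."""
    lowered = series_name.lower()
    return all(keyword in lowered for keyword in keywords)


for name in ['CT PLAIN THIN', 'CT Plain', 'CT 4cc sec 150cc D3D on', 'CT Thin Plain']:
    print(f"{name!r}: {is_target_series(name)}")
```

Of these example names, only `CT PLAIN THIN` and `CT Thin Plain` match, because the test requires both keywords and ignores case and word order.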

In [13]:
import shutil

# make a data output directory
dicom_dir = 'dicom'
if not os.path.isdir(dicom_dir):
    os.mkdir(dicom_dir)

# define the target study names
target_study_names = ['thin', 'plain']

# get the list of study directories
study_dirs = sorted(glob(os.path.join(unzip_dir, '*', '')))

# search through each study directory for the correct series
for i, study_dir in enumerate(study_dirs, 1):
    print(f"Working on study {study_dir}, {i:03d} of {len(study_dirs):03d}")
    series_dirs = glob(os.path.join(study_dir, '**', ''), recursive=True)
    found_series = None
    for series_dir in series_dirs:
        # find a series directory whose path includes the words 'thin' AND 'plain'
        # and that contains more than 100 DICOM files
        name_match = all(name in series_dir.lower() for name in target_study_names)
        if name_match and len(glob(os.path.join(series_dir, '*.dcm'))) > 100:
            # if several series match, the last one found is kept
            found_series = series_dir

    # copy the series we identified to the data directory
    if found_series and os.path.isdir(found_series):
        patientID = os.path.basename(study_dir.rstrip(os.path.sep))
        out_dir = os.path.join(dicom_dir, patientID)
        shutil.copytree(found_series, out_dir, dirs_exist_ok=True)
    

Working on study unzip/000/, 001 of 100
Working on study unzip/003/, 002 of 100
Working on study unzip/008/, 003 of 100
Working on study unzip/012/, 004 of 100
Working on study unzip/015/, 005 of 100
Working on study unzip/022/, 006 of 100
Working on study unzip/025/, 007 of 100
Working on study unzip/026/, 008 of 100
Working on study unzip/027/, 009 of 100
Working on study unzip/029/, 010 of 100
Working on study unzip/030/, 011 of 100
Working on study unzip/041/, 012 of 100
Working on study unzip/054/, 013 of 100
Working on study unzip/067/, 014 of 100
Working on study unzip/068/, 015 of 100
Working on study unzip/073/, 016 of 100
Working on study unzip/092/, 017 of 100
Working on study unzip/102/, 018 of 100
Working on study unzip/103/, 019 of 100
Working on study unzip/107/, 020 of 100
Working on study unzip/110/, 021 of 100
Working on study unzip/113/, 022 of 100
Working on study unzip/114/, 023 of 100
Working on study unzip/121/, 024 of 100
Working on study unzip/126/, 025 of 100


# Convert relevant series to NIfTI
## Goal
Convert the correct series from DICOM to NIfTI for further processing.

## Rationale
Most of the image processing tools that we will be using require NIfTI files and will not work on raw DICOM data.

## Convert DICOM to NIfTI using dcm2niix
Now we will convert each DICOM series identified in the previous step into a single NIfTI file.
* **Please check the included `dicom2deployment-initial-setup.ipynb` for `dcm2niix` installation instructions.**

## Example dcm2niix command
```dcm2niix -b n -z y -s y -f %f -w 1 ./dicom/<studyID>```

The terminal command above is an example of how to run dcm2niix for an individual series directory in our data directory. We will automate this process using Python below.

### Explanation of options
* `-b n`: do not save the BIDS sidecar (a JSON file containing metadata from the study).
* `-z y`: enable gzip compression.
* `-s y`: single file mode; per the dcm2niix documentation, other images in the folder are not converted.
* `-f %f`: format string for the output filename. We simply use the folder name, which is specified using `%f`.
* `-w 1`: handle filename conflicts by overwriting existing files.

For more details on the available options, please refer to: https://manpages.ubuntu.com/manpages/jammy/en/man1/dcm2niix.1.html
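
To make the mapping between the options and a Python call concrete, here is a minimal sketch that builds the same command as an argument list. The helper name `build_dcm2niix_cmd` and the example path `./dicom/000` are illustrative assumptions; the flags are the ones listed above.

```python
import shlex


def build_dcm2niix_cmd(input_dir):
    """Build the dcm2niix argument list for one DICOM series directory."""
    return [
        'dcm2niix',
        '-b', 'n',   # no JSON sidecar
        '-z', 'y',   # gzip-compress the output
        '-s', 'y',   # single file mode
        '-f', '%f',  # name the output after the input folder
        '-w', '1',   # overwrite on filename conflicts
        input_dir,
    ]


cmd = build_dcm2niix_cmd('./dicom/000')
# shlex.join renders the list as a shell-style string for display
print(shlex.join(cmd))  # dcm2niix -b n -z y -s y -f %f -w 1 ./dicom/000
```

Passing a list like this to `subprocess` avoids shell quoting problems when a directory name contains spaces.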

## Automating dcm2niix with a python loop
The following code will automate the necessary dcm2niix calls for each DICOM directory.
<span style="color:red">WARNING: The following NIfTI conversion code may take ~8-10 mins to run.</span>
<span style="color:yellow">WARNING: If you installed dcm2niix using pip, not all of the exams will be successfully converted to NIfTI. Please follow the recommended installation instructions in `dicom2deployment-initial-setup.ipynb` to avoid this issue.</span>

In [14]:
import subprocess

# get list of DICOM directories to convert
data_dirs = sorted(glob(os.path.join(dicom_dir, '*', '')))

# loop through DICOM directories and convert to NIfTI using options above
for i, input_dir in enumerate(data_dirs, 1):
    print(f"Working on study {input_dir}, {i:03d} of {len(data_dirs):03d}")
    # build the directory-specific command as an argument list so that paths with spaces are handled safely
    cmd = ["dcm2niix", "-b", "n", "-z", "y", "-s", "y", "-f", "%f", "-w", "1", input_dir]
    # run command and suppress terminal output
    _ = subprocess.call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
    

Working on study dicom/000/, 001 of 100
Working on study dicom/003/, 002 of 100
Working on study dicom/008/, 003 of 100
Working on study dicom/012/, 004 of 100
Working on study dicom/015/, 005 of 100
Working on study dicom/022/, 006 of 100
Working on study dicom/025/, 007 of 100
Working on study dicom/026/, 008 of 100
Working on study dicom/027/, 009 of 100
Working on study dicom/029/, 010 of 100
Working on study dicom/030/, 011 of 100
Working on study dicom/041/, 012 of 100
Working on study dicom/054/, 013 of 100
Working on study dicom/067/, 014 of 100
Working on study dicom/068/, 015 of 100
Working on study dicom/073/, 016 of 100
Working on study dicom/092/, 017 of 100
Working on study dicom/102/, 018 of 100
Working on study dicom/103/, 019 of 100
Working on study dicom/107/, 020 of 100
Working on study dicom/110/, 021 of 100
Working on study dicom/113/, 022 of 100
Working on study dicom/114/, 023 of 100
Working on study dicom/121/, 024 of 100
Working on study dicom/126/, 025 of 100


# Selection of output NIfTI and gantry tilt correction
## Goal
Identify the gantry tilt corrected NIfTI image, if available.

## Rationale

The output of dcm2niix may include both standard and gantry tilt corrected NIfTI files:

* `*.nii.gz`: The head CT NIfTI image without gantry tilt correction.
* `*_Tilt_1.nii.gz`: The head CT NIfTI image with gantry tilt correction, if applicable.

![Gantry Tilt](img/gantry-tilt.jpg)

Gantry tilt correction will only be applied when there was a CT gantry tilt during acquisition. Not all datasets will be affected by gantry tilt; however, gantry tilt correction can be important, particularly for 3D segmentation tasks. 

## Selecting gantry tilt corrected NIfTIs
We will select the gantry tilt corrected image if available, otherwise we will select the non-corrected image. Finally, once we identify the correct NIfTI file, we will move it to a separate output directory for subsequent processing:

In [15]:
# make the output NIfTI directory
nifti_dir = 'nifti'
if not os.path.isdir(nifti_dir):
    os.mkdir(nifti_dir)

# get the list of study directories
study_dirs = sorted(glob(os.path.join(dicom_dir, '*', '')))

# loop through study directories and find the correct NIfTI file
for i, study_dir in enumerate(study_dirs, 1):
    print(f"Working on study {study_dir}, {i:03d} of {len(study_dirs):03d}")
    # get list of niftis for this study sorted in alphabetical order
    niftis = sorted(glob(os.path.join(study_dir, "*.nii.gz")))
    # case: no NIfTI files
    if not niftis:
        fpath = None
    # case: only 1 NIfTI
    elif len(niftis) == 1:
        fpath = niftis[0]
    # case: more than 1 NIfTI - select last one, which will be gantry tilt corrected due to alphabetical file sorting
    else:
        fpath = niftis[-1]
    
    # move the NIfTI file to the output directory
    if fpath:
        nifti_out_path = os.path.join(nifti_dir, os.path.basename(fpath).split('.')[0].split('_')[0] + '.nii.gz')
        shutil.move(fpath, nifti_out_path)

Working on study dicom/000/, 001 of 100
Working on study dicom/003/, 002 of 100
Working on study dicom/008/, 003 of 100
Working on study dicom/012/, 004 of 100
Working on study dicom/015/, 005 of 100
Working on study dicom/022/, 006 of 100
Working on study dicom/025/, 007 of 100
Working on study dicom/026/, 008 of 100
Working on study dicom/027/, 009 of 100
Working on study dicom/029/, 010 of 100
Working on study dicom/030/, 011 of 100
Working on study dicom/041/, 012 of 100
Working on study dicom/054/, 013 of 100
Working on study dicom/067/, 014 of 100
Working on study dicom/068/, 015 of 100
Working on study dicom/073/, 016 of 100
Working on study dicom/092/, 017 of 100
Working on study dicom/102/, 018 of 100
Working on study dicom/103/, 019 of 100
Working on study dicom/107/, 020 of 100
Working on study dicom/110/, 021 of 100
Working on study dicom/113/, 022 of 100
Working on study dicom/114/, 023 of 100
Working on study dicom/121/, 024 of 100
Working on study dicom/126/, 025 of 100
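
The selection above relies on alphabetical sorting to place the tilt corrected file last. As an alternative sketch, the same preference can be expressed as an explicit filename check. The helper `select_nifti` is hypothetical, and it assumes that tilt corrected outputs contain `_Tilt` in their filename, as in the pattern described earlier.

```python
import os


def select_nifti(nifti_paths, tilt_marker='_Tilt'):
    """Prefer a gantry tilt corrected file; otherwise fall back to the first file."""
    if not nifti_paths:
        return None
    # keep only files whose name (not directory) contains the tilt marker
    tilted = [p for p in nifti_paths if tilt_marker in os.path.basename(p)]
    return sorted(tilted)[-1] if tilted else sorted(nifti_paths)[0]


print(select_nifti(['dicom/000/000.nii.gz', 'dicom/000/000_Tilt_1.nii.gz']))
print(select_nifti(['dicom/003/003.nii.gz']))
```

Unlike the notebook code, this sketch falls back to the first file rather than the last when several files exist and none is tilt corrected; which fallback is preferable depends on what other outputs dcm2niix produces for your data.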


# Resampling NIfTI images
## Goal
Resample NIfTI images with differing, anisotropic voxel sizes to a single, uniform isotropic voxel size.

## Rationale
There are many reasons why you may want to resample all of your image data to the same isotropic resolution. In this case, we are primarily doing this to reduce the dataset size and improve the speed of our subsequent code.
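
To give a sense of the size reduction, here is a small worked example. The input shape and voxel sizes are hypothetical values chosen for easy arithmetic, not measurements from this dataset, and `resampled_shape` is an illustrative helper that rounds each scaled axis length to the nearest integer.

```python
def resampled_shape(shape, voxel_sizes, target_size=1.0):
    """Estimate the array shape after resampling to an isotropic voxel size."""
    # zoom factor per axis: above 1 adds voxels along that axis, below 1 removes them
    factors = [size / target_size for size in voxel_sizes]
    return tuple(int(round(n * f)) for n, f in zip(shape, factors))


# hypothetical thin-slice volume: 512 x 512 x 240 voxels at 0.5 x 0.5 x 0.625 mm
old_shape = (512, 512, 240)
new_shape = resampled_shape(old_shape, (0.5, 0.5, 0.625), target_size=1.0)

old_voxels = old_shape[0] * old_shape[1] * old_shape[2]
new_voxels = new_shape[0] * new_shape[1] * new_shape[2]

print(new_shape)                             # (256, 256, 150)
print(f"{old_voxels / new_voxels:.1f}x fewer voxels")  # 6.4x fewer voxels
```

In this example, resampling to 1 mm isotropic voxels reduces the voxel count by a factor of 6.4 while covering the same physical volume.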

## Resampling
Below is the code for resampling the NIfTI images and placing the resampled outputs in a new directory. Note that this section of code requires some python packages that are not installed by default. Please check the `dicom2deployment-initial-setup.ipynb` notebook for instructions on installing the necessary python packages.

In [16]:
import nibabel as nib
import numpy as np
from scipy.ndimage import zoom

# make the resampled output NIfTI directory
resampled_nifti_dir = 'nifti_resampled'
if not os.path.isdir(resampled_nifti_dir):
    os.mkdir(resampled_nifti_dir)

# function to resample NIfTIs to a desired isotropic voxel size
def resample_nifti(input_path, output_path, isotropic_voxel_size=1.):
    # Load the NIfTI image
    nii = nib.load(input_path)
    img = nii.get_fdata()

    # Determine the current voxel sizes (first three spatial dimensions)
    current_voxel_sizes = nii.header.get_zooms()[:3]

    # Calculate the target voxel sizes for the requested isotropic resolution
    target_voxel_sizes = [isotropic_voxel_size] * 3

    # Calculate the resampling factor for each axis
    resampling_factor = [current_size / target_size for current_size, target_size in zip(current_voxel_sizes, target_voxel_sizes)]
    
    # Perform resampling using scipy.ndimage.zoom
    resampled_img = zoom(img, tuple(resampling_factor), order=1, mode='nearest')
    
    # update header and affine for the output NIfTI
    new_affine = np.copy(nii.header.get_best_affine())
    new_affine[:3, :3] /= resampling_factor
    new_header = nii.header.copy()
    new_header.set_zooms(target_voxel_sizes)
    new_header.set_qform(new_affine, code=1)
    new_header.set_sform(new_affine, code=1)
    
    # Create a new NIfTI image with resampled data and updated header
    resampled_nii = nib.Nifti1Image(resampled_img, new_affine, new_header)

    # Save the resampled image
    nib.save(resampled_nii, output_path)

# get a list of the existing NIfTI files
nifti_files = sorted(glob(os.path.join(nifti_dir,  "*.nii.gz")))

# loop through existing NIfTI files, resample to desired resolution, and save in resampled output directory
for i, input_nifti in enumerate(nifti_files, 1):
    print(f"Working on NIfTI {input_nifti}, {i:03d} of {len(nifti_files):03d}")
    output_nifti = os.path.join(resampled_nifti_dir, os.path.basename(input_nifti))
    resample_nifti(input_nifti, output_nifti, isotropic_voxel_size=1.)
    

Working on NIfTI nifti/000.nii.gz, 001 of 100
Working on NIfTI nifti/003.nii.gz, 002 of 100
Working on NIfTI nifti/008.nii.gz, 003 of 100
Working on NIfTI nifti/012.nii.gz, 004 of 100
Working on NIfTI nifti/015.nii.gz, 005 of 100
Working on NIfTI nifti/022.nii.gz, 006 of 100
Working on NIfTI nifti/025.nii.gz, 007 of 100
Working on NIfTI nifti/026.nii.gz, 008 of 100
Working on NIfTI nifti/027.nii.gz, 009 of 100
Working on NIfTI nifti/029.nii.gz, 010 of 100
Working on NIfTI nifti/030.nii.gz, 011 of 100
Working on NIfTI nifti/041.nii.gz, 012 of 100
Working on NIfTI nifti/054.nii.gz, 013 of 100
Working on NIfTI nifti/067.nii.gz, 014 of 100
Working on NIfTI nifti/068.nii.gz, 015 of 100
Working on NIfTI nifti/073.nii.gz, 016 of 100
Working on NIfTI nifti/092.nii.gz, 017 of 100
Working on NIfTI nifti/102.nii.gz, 018 of 100
Working on NIfTI nifti/103.nii.gz, 019 of 100
Working on NIfTI nifti/107.nii.gz, 020 of 100
Working on NIfTI nifti/110.nii.gz, 021 of 100
Working on NIfTI nifti/113.nii.gz,

# Finished!
Data preparation is now complete! The `nifti_resampled` directory will be the dataset directory that we use for MONAI Label.