<a href="https://colab.research.google.com/github/kirbyju/TCIA_Notebooks/blob/main/TCIA_PROSTATEx_MR_Classification_Challenge.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PROSTATEx Summary

The PROSTATEx Challenge ("SPIE-AAPM-NCI Prostate MR Classification Challenge") focused on quantitative image analysis methods for the diagnostic classification of clinically significant prostate cancers and was held in conjunction with the 2017 SPIE Medical Imaging Symposium. PROSTATEx ran from November 21, 2016 to January 15, 2017. A "live" version has since been established at https://prostatex.grand-challenge.org, which serves as an ongoing way for researchers to benchmark their performance on this task.

## Acknowledgements
This notebook was created by [Justin Kirby](https://www.linkedin.com/in/justinkirby82/) and [Ahmed Harouni](https://www.linkedin.com/in/ahmed-el-harouni-019a8a22/) as a pilot for the [MONAI](https://monai.io/) Datasets Program. 

If you publish based on this dataset please include the following citations in your paper as required by the TCIA Data Usage Policy:

* Geert Litjens, Oscar Debats, Jelle Barentsz, Nico Karssemeijer, and Henkjan Huisman. "ProstateX Challenge data", The Cancer Imaging Archive (2017). DOI: 10.7937/K9TCIA.2017.MURS5CL
* Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. DOI: 10.1007/s10278-013-9622-7

## Data Description
The PROSTATEx dataset is described in full on The Cancer Imaging Archive at https://doi.org/10.7937/K9TCIA.2017.MURS5CL. Most of the challenge instructions and details about how the data are organized are covered on the "Detailed Description" tab beneath the Summary section.

Training and test cohorts were established by the challenge organizers and the following types of data were provided:

* Images (.tcia manifest for the DICOM files)
* Ktrans images (.zip containing .mhd files)
* Lesion information (.zip containing .doc and .xls files)
* Lesion reference thumbnails (.zip containing .bmp files)

Next we will walk through how to download these data from TCIA.


### Downloading and preparing the label data

First, let's create our directory structure and download the relevant label files from TCIA. Then we'll unzip the label data into their respective train/test directories.

In [None]:
# create directories
!mkdir -p /content/PROSTATEx/
!mkdir -p /content/PROSTATEx/Train
!mkdir -p /content/PROSTATEx/Test

In [None]:
# download the Ktrans images (.zip containing .mhd files)
# NOTE: URLs are quoted so the shell does not interpret "&" or "?" in the query strings
!wget -O /content/PROSTATEx/Train/Ktrans-Train.zip "https://app.box.com/shared/static/y871i386j4o9rqwcsms5mqni63nojzx1"
!wget -O /content/PROSTATEx/Test/Ktrans-Test.zip "https://app.box.com/shared/static/k3iofc0r3ktjnb4f7105lssdzjon95ie"

# download the Lesion information (.zip containing .doc and .xls files)
!wget -O /content/PROSTATEx/Train/LesionInfo-Train.zip "https://wiki.cancerimagingarchive.net/download/attachments/23691656/ProstateX-TrainingLesionInformationv2.zip?version=2&modificationDate=1483479231532&api=v2"
!wget -O /content/PROSTATEx/Test/LesionInfo-Test.zip "https://wiki.cancerimagingarchive.net/download/attachments/23691656/ProstateX-TestLesionInformation.zip?version=2&modificationDate=1483479234096&api=v2"

# download the Lesion reference thumbnails (.zip containing .bmp files)
!wget -O /content/PROSTATEx/Train/LesionThumb-Train.zip "https://wiki.cancerimagingarchive.net/download/attachments/23691656/ProstateX-Screenshots-Train.zip?version=1&modificationDate=1479401241653&api=v2"
!wget -O /content/PROSTATEx/Test/LesionThumb-Test.zip "https://app.box.com/shared/static/7jn4jtd3pbi9rlurc4pvnkx554lnfljw"
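
Before unzipping, it can help to confirm that each download is a real archive, since a failed request may still leave an HTML page saved under a `.zip` name. The following is a minimal sketch using only the Python standard library; the helper name `check_zip_downloads` is hypothetical and not part of any TCIA tooling.

```python
import zipfile
from pathlib import Path


def check_zip_downloads(root):
    """Return {path: status} for every .zip file found under `root`.

    status is "ok" for a readable zip archive and "not a zip" otherwise
    (for example, an HTML error page saved under a .zip name).
    """
    results = {}
    for path in sorted(Path(root).rglob("*.zip")):
        results[str(path)] = "ok" if zipfile.is_zipfile(str(path)) else "not a zip"
    return results


# Usage with the directory created earlier in this notebook.
# Prints nothing if the directory does not exist or holds no .zip files.
for name, status in check_zip_downloads("/content/PROSTATEx").items():
    print(f"{status:>9}  {name}")
```

Any file reported as "not a zip" should be downloaded again before running the unzip step.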

In [None]:
# unzip the Ktrans images (.zip containing .mhd files)
!unzip /content/PROSTATEx/Train/Ktrans-Train.zip -d /content/PROSTATEx/Train/PROSTATExKTrans-Train
!unzip /content/PROSTATEx/Test/Ktrans-Test.zip -d /content/PROSTATEx/Test/

# unzip the Lesion information (.zip containing .doc and .xls files)
!unzip /content/PROSTATEx/Train/LesionInfo-Train.zip -d /content/PROSTATEx/Train/
!unzip /content/PROSTATEx/Test/LesionInfo-Test.zip -d /content/PROSTATEx/Test/

# unzip the Lesion reference thumbnails (.zip containing .bmp files)
!unzip /content/PROSTATEx/Train/LesionThumb-Train.zip -d /content/PROSTATEx/Train/
!unzip /content/PROSTATEx/Test/LesionThumb-Test.zip -d /content/PROSTATEx/Test/

### Perform the DICOM data download
Next, we'll download the DICOM data. This involves installing a command-line tool called NBIA Data Retriever, which reads "manifest" files and downloads the corresponding images from TCIA. There are two separate manifest files (train and test), and each must be passed to the NBIA Data Retriever. If there are network hiccups or other issues, the Data Retriever will automatically retry downloading the affected scans.

### Understanding the DICOM organization and available metadata
The NBIA Data Retriever download directory will contain a file called **metadata.csv** with one row per DICOM scan/series. Its columns cover a wide variety of potentially useful DICOM metadata, including Patient IDs, Study UIDs, Series UIDs, Study Descriptions, Study Dates, Series Descriptions, Image Modality, Scanner Manufacturer, Number of Images per Scan, and more.
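
As a rough illustration of how metadata.csv might be used, the sketch below counts how many series share each value of one column, using only the standard library. The helper name `summarize_metadata` is hypothetical, and the exact header spelling (`"Series Description"` here) is an assumption; check the header row of your own file and adjust the `column` argument as needed.

```python
import csv
from collections import Counter


def summarize_metadata(csv_path, column="Series Description"):
    """Count how many rows (series) share each value of `column`.

    The default column name is an assumption about the header spelling.
    """
    # utf-8-sig tolerates a byte-order mark at the start of the file
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            raise KeyError(f"{column!r} not found; available columns: {fieldnames}")
        return Counter(row[column] for row in reader)


# Hypothetical usage once a download has finished (the path is a placeholder):
# counts = summarize_metadata("path/to/metadata.csv")
# for description, n in counts.most_common():
#     print(n, description)
```

Passing a different `column`, such as the one holding the modality or manufacturer, gives the same kind of summary for that field.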

All image data will be saved in the following directory structure:

* Collection Name 
* Patient ID 
* part of Study Date + part of Study ID + part of Study Description + last 5 digits of Study Instance UID 
* part of Series Number + part of Series Description + last 5 digits of Series Instance UID 
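
The four levels above can be explored programmatically. The following is a minimal sketch, assuming the series folders are the directories that directly contain `.dcm` files and that the four path components ending at a series folder are collection, patient, study, and series, in that order. The function names are hypothetical.

```python
from pathlib import Path


def find_series_dirs(download_root):
    """Return the sorted directories under `download_root` that directly contain .dcm files."""
    return sorted({p.parent for p in Path(download_root).rglob("*.dcm")})


def describe_series_dir(series_dir):
    """Map the last four path components onto the four levels listed above."""
    collection, patient_id, study, series = Path(series_dir).parts[-4:]
    return {
        "collection": collection,
        "patient_id": patient_id,
        "study": study,
        "series": series,
    }


# Usage with the download directory used later in this notebook.
# Prints nothing until DICOM files have been downloaded.
for series_dir in find_series_dirs("/content/PROSTATEx/Train/DICOM"):
    print(describe_series_dir(series_dir))
```

Searching recursively rather than at a fixed depth keeps the sketch usable even if the Data Retriever adds extra parent folders above the collection level.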

The DICOM files in each series folder are first ordered by ordinal position of acquisition number and then by ordinal position of instance number. The files are then assigned numbers with the lowest acquisition being 1 and the lowest instance number within that acquisition being 1, separated by a dash. The numbers are incremented by 1 as the next values are encountered. All values are left-padded with zeros to provide for the ordering within the file system. 

As an example, a series with 2 acquisition numbers, each having 42 instance numbers, would start with 1-01.dcm for the lowest acquisition number and the lowest instance number in that acquisition. The last file in that acquisition would be 1-42.dcm. The second acquisition would then start with 2-01.dcm and end with 2-42.dcm. For the purposes of ordering, an empty value in either acquisition number or instance number is lower than any file that has a value for those numbers.
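
The naming rule can be sketched in a few lines of Python. This is only an illustration of the scheme as described above, not the Data Retriever's actual implementation. In particular, the padding width (the number of digits of the largest ordinal position) is an assumption inferred from the 1-01.dcm example, and `None` stands in for an empty value.

```python
def retriever_style_names(pairs):
    """Map (acquisition_number, instance_number) pairs to file names.

    Either number may be None (empty); None sorts before any real value.
    The zero-padding width is an assumption inferred from the example above.
    """
    if not pairs:
        return {}

    def order(value):
        # Empty values sort lower than any file that has a value
        return (value is not None, value if value is not None else 0)

    # Group the distinct instance numbers under each acquisition number
    by_acq = {}
    for acq, inst in pairs:
        by_acq.setdefault(acq, set()).add(inst)

    # Ordinal position (1-based) of each acquisition number
    acq_rank = {a: i for i, a in enumerate(sorted(by_acq, key=order), start=1)}
    acq_width = len(str(len(by_acq)))
    inst_width = max(len(str(len(insts))) for insts in by_acq.values())

    names = {}
    for acq, insts in by_acq.items():
        # Instance ordinals restart at 1 within each acquisition
        for rank, inst in enumerate(sorted(insts, key=order), start=1):
            names[(acq, inst)] = f"{acq_rank[acq]:0{acq_width}d}-{rank:0{inst_width}d}.dcm"
    return names


# The example from the text: 2 acquisitions with 42 instances each.
# The acquisition numbers 5 and 9 are arbitrary; only their order matters.
example = retriever_style_names([(a, i) for a in (5, 9) for i in range(1, 43)])
print(example[(5, 1)], example[(5, 42)], example[(9, 1)], example[(9, 42)])
# prints: 1-01.dcm 1-42.dcm 2-01.dcm 2-42.dcm
```

Because the names are zero-padded, sorting the file names alphabetically within a series folder should reproduce the acquisition and instance order.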

In [None]:
# install NBIA Data Retriever software for downloading images 
!mkdir -p /usr/share/desktop-directories/
!wget -P /content/NBIA-Data-Retriever https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.2/nbia-data-retriever-4.2.deb
!dpkg -i /content/NBIA-Data-Retriever/nbia-data-retriever-4.2.deb

# NOTE: If you're running this notebook on something that doesn't support .deb packages you can also try changing the wget line above to point to
#       https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.2/NBIADataRetriever-4.2-1.x86_64.rpm
#       Currently it is not possible to use the command line version of the Data Retriever on Mac or Windows, but you can install a GUI version using
#       MAC: https://apps.apple.com/us/app/downloader-app/id1399207860?mt=12
#       Windows: https://cbiit-download.nci.nih.gov/nbia/releases/ForTCIA/NBIADataRetriever_4.2/NBIA%20Data%20Retriever-4.2.msi

In [None]:
# download the manifest for the DICOM training dataset
# NOTE: URLs are quoted so the shell does not interpret "&" or "?" in the query strings
!wget -O /content/PROSTATEx/Train/PROSTATEx-Train.tcia "https://wiki.cancerimagingarchive.net/download/attachments/23691656/PROSTATEx-train.tcia?version=1&modificationDate=1534787030035&api=v2"

# download the manifest for the DICOM test dataset
!wget -O /content/PROSTATEx/Test/PROSTATEx-Test.tcia "https://wiki.cancerimagingarchive.net/download/attachments/23691656/PROSTATEx-test.tcia?version=1&modificationDate=1534787450806&api=v2"

In [None]:
# create a copy of the DICOM training manifest with only the first 10 scans for testing purposes
# (the first 16 lines are assumed to be the manifest header followed by 10 series UIDs)
!head -n 16 /content/PROSTATEx/Train/PROSTATEx-Train.tcia > /content/PROSTATEx/Train/PROSTATEx-Train-Sample.tcia

# create a copy of the DICOM test manifest with only the first 10 scans for testing purposes
!head -n 16 /content/PROSTATEx/Test/PROSTATEx-Test.tcia > /content/PROSTATEx/Test/PROSTATEx-Test-Sample.tcia
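
The `head -n 16` approach relies on a fixed number of header lines. A slightly more defensive alternative is sketched below. It assumes the manifest lists one Series Instance UID per line after a `ListOfSeriesToDownload=` marker line; that marker, and the function name `sample_manifest`, are assumptions to verify against your own .tcia file.

```python
def sample_manifest(src_path, dst_path, n_series=10, marker="ListOfSeriesToDownload="):
    """Copy the manifest header plus the first `n_series` series UIDs.

    The `marker` line separating header from UIDs is an assumption about
    the manifest layout. Returns the number of series UIDs written.
    """
    with open(src_path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()

    stripped = [line.strip() for line in lines]
    if marker not in stripped:
        raise ValueError(f"marker line {marker!r} not found in {src_path}")

    start = stripped.index(marker) + 1
    header = lines[:start]
    # Ignore blank lines so they do not count toward n_series
    series = [line for line in lines[start:] if line.strip()]
    kept = series[:n_series]

    with open(dst_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(header + kept) + "\n")
    return len(kept)


# Hypothetical usage with the manifest paths from this notebook:
# sample_manifest(
#     "/content/PROSTATEx/Train/PROSTATEx-Train.tcia",
#     "/content/PROSTATEx/Train/PROSTATEx-Train-Sample.tcia",
#     n_series=10,
# )
```

Changing `n_series` makes it easy to try a larger subset before committing to the full download.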


In [None]:
# Start by downloading the sample (first 10 training scans) before uncommenting and downloading the full cohort in the steps below
# NOTE: when prompted, click into this cell's output and type "y" to agree to the data usage policy and initiate the download

!/opt/nbia-data-retriever/nbia-data-retriever --cli /content/PROSTATEx/Train/PROSTATEx-Train-Sample.tcia -d /content/PROSTATEx/Train/DICOM/

In [None]:
# Start by downloading the sample (first 10 test scans) before uncommenting and downloading the full cohort in the steps below
# NOTE: when prompted, click into this cell's output and type "y" to agree to the data usage policy and initiate the download

!/opt/nbia-data-retriever/nbia-data-retriever --cli /content/PROSTATEx/Test/PROSTATEx-Test-Sample.tcia -d /content/PROSTATEx/Test/DICOM/

In [None]:
# execute the download of the Training Cohort using the name of the file you saved earlier
# NOTE: user will have to click to activate the cell below and type "y" to agree to the data usage policy and initiate the download
# NOTE: This may take several hours while <x gbytes are downloaded>.  Please verify you have enough disk space before starting.

#!/opt/nbia-data-retriever/nbia-data-retriever --cli /content/PROSTATEx/Train/PROSTATEx-Train.tcia -d /content/PROSTATEx/Train/DICOM/

In [None]:
# execute the download of the Test Cohort using the name of the file you saved earlier
# NOTE: user will have to activate the cell below and type "y" to agree to the data usage policy and initiate the download
# NOTE: This may take several hours while <x gbytes are downloaded>. Please verify you have enough disk space before starting.

#!/opt/nbia-data-retriever/nbia-data-retriever --cli /content/PROSTATEx/Test/PROSTATEx-Test.tcia -d /content/PROSTATEx/Test/DICOM/

# MONAI Section!! Insert your next steps here and let me know if you have any questions.

In [None]:
## import DICOM metadata from the metadata.csv?

