# Download a dataset from the NIH Cancer Imaging Archive
The Cancer Imaging Archive (TCIA) is a great source of medical imaging datasets.
This notebook walks you through:
1- Selecting a dataset
2- Downloading your dataset
3- Converting DICOM images to NIfTI
4- Converting DICOM SEG images to NIfTI
5- Creating your dataset JSON to begin training your AI models


# 0. Clone code 
Clone the code from the tcia_downloader repo and move it into this directory.

In [None]:
!git clone https://github.com/lescientifik/tcia_downloader /claraDevDay/Data/TCIA/tmp
!mv -vn /claraDevDay/Data/TCIA/tmp/* /claraDevDay/Data/TCIA/
!mv -vn /claraDevDay/Data/TCIA/tmp/src/* /claraDevDay/Data/TCIA/src/

# 1. Select images to download
You can browse the available collections on [the TCIA site](https://www.cancerimagingarchive.net/collections/).
Using the [online search tool](https://nbia.cancerimagingarchive.net/nbia-search/),
you can select the images you would like to download.
<br><img src="../screenShots/TCIADownload.png" alt="Drawing" style="width: 600px; height: 400px;"/><br>
After adding the images to your cart, download the `.tcia` manifest file, which you then use to download your dataset.
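
To see what a manifest will fetch before starting a long download, it can help to list the series it names. The sketch below assumes the `.tcia` file is plain text with `key=value` header lines followed by a `ListOfSeriesToDownload=` line and one Series Instance UID per line; check this against your own file. The function name `parse_tcia_manifest` and the sample content are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_tcia_manifest(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split manifest text into header settings and Series Instance UIDs.

    Assumed layout (verify against a real manifest): key=value header lines,
    then "ListOfSeriesToDownload=", then one UID per line.
    """
    header: Dict[str, str] = {}
    series_uids: List[str] = []
    in_series_list = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if in_series_list:
            series_uids.append(line)
        elif line.startswith("ListOfSeriesToDownload="):
            in_series_list = True
        elif "=" in line:
            key, value = line.split("=", 1)
            header[key] = value
    return header, series_uids


# Made-up sample content; the URL and UIDs are placeholders, not real series.
sample = """downloadServerUrl=https://example.org/download
manifestVersion=3.0
ListOfSeriesToDownload=
1.2.3.4
1.2.3.5
"""
header, uids = parse_tcia_manifest(sample)
print(header["manifestVersion"], len(uids))
```

For a real file, pass `open(path).read()` as the `text` argument.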

In [None]:
DataRoot="/claraDevDay/Data/"
CodeRoot="/claraDevDay/Data/tcia-master"
%cd $CodeRoot
!pwd


### Download the required conversion tool (`segimage2itkimage` from dcmqi)

In [None]:
! wget https://github.com/QIICR/dcmqi/releases/download/v1.2.2/dcmqi-1.2.2-linux.tar.gz && \
    tar xf dcmqi-1.2.2-linux.tar.gz && \
    cp dcmqi-1.2.2-linux/bin/segimage2itkimage /usr/local/bin/ && \
    rm -rf dcmqi-1.2.2-linux*
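
A quick way to confirm the install step worked is to check that the executable is on the `PATH`. This is a minimal sketch using only the standard library; the helper name `locate_tools` is hypothetical.

```python
import shutil
from typing import Dict, Iterable, Optional


def locate_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


found = locate_tools(["segimage2itkimage"])
for name, path in found.items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If the tool is reported as missing, rerun the download cell and check that `/usr/local/bin` is on your `PATH`.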


# 1. Set up data directories


In [None]:
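# Choose one manifest. Each assignment overwrites the previous one,
# so only the last line takes effect; comment out the others as needed.
TCIA_FileName="TCGA-GBM_2Patients_1.tcia"
TCIA_FileName="C4KC-KiTS_5.tcia"
TCIA_FileName="NSCLC_5.tcia" # 6 labels + background; needs the -f ap flag when converting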


In [None]:
TCIA_FilePath=DataRoot+"tcia-master/SampleTCIA/"+TCIA_FileName
FLD_NAME=DataRoot+TCIA_FileName[:-5]  # drop the 5-character ".tcia" extension
FLD_NAME_ZIP=FLD_NAME+"/ZIP"
FLD_NAME_NII=FLD_NAME+"/nii"
FLD_NAME_DCM=FLD_NAME+"/DCM"
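
The same folder layout can also be derived with `pathlib`, which avoids counting extension characters by hand and creates the folders up front. This is a sketch rather than part of the original notebook; the helper name `make_dataset_dirs` is hypothetical, and the demo runs in a temporary directory instead of `/claraDevDay/Data/`.

```python
import tempfile
from pathlib import Path
from typing import Dict


def make_dataset_dirs(data_root: str, tcia_file_name: str) -> Dict[str, Path]:
    """Create <data_root>/<manifest name without extension>/{ZIP,DCM,nii}."""
    dataset_dir = Path(data_root) / Path(tcia_file_name).stem
    dirs = {name: dataset_dir / name for name in ("ZIP", "DCM", "nii")}
    for folder in dirs.values():
        folder.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return dirs


# Demo in a throwaway directory so nothing is written to the real data root.
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_dataset_dirs(tmp, "NSCLC_5.tcia")
    created = {name: path.is_dir() for name, path in dirs.items()}
    dataset_name = dirs["nii"].parent.name
print(dataset_name, created)
```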

# 2. Download DICOM data from TCIA

In [None]:
!python -m src.tcia $TCIA_FilePath $FLD_NAME_ZIP

#### Check on downloaded files 

In [None]:
! ls -la $FLD_NAME_ZIP

# 3. Unzip files 

In [None]:
!python src/unzip.py $FLD_NAME_ZIP $FLD_NAME_DCM
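
The source does not show what `src/unzip.py` does internally. As a rough illustration of the step, the sketch below extracts every `.zip` archive in one folder into its own subfolder of another, using only the standard library. The function name `unzip_all` and the per-archive subfolder layout are assumptions, not the repo's actual behavior.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def unzip_all(zip_dir: str, out_dir: str) -> List[Path]:
    """Extract each *.zip in zip_dir into out_dir/<archive name without .zip>/."""
    extracted: List[Path] = []
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        target = Path(out_dir) / zip_path.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
        extracted.append(target)
    return extracted


# Demo with a tiny archive built on the fly; the content is a placeholder,
# not a real DICOM file.
with tempfile.TemporaryDirectory() as tmp:
    zip_dir = Path(tmp) / "ZIP"
    zip_dir.mkdir()
    with zipfile.ZipFile(zip_dir / "series_a.zip", "w") as archive:
        archive.writestr("slice001.dcm", b"placeholder bytes")
    folders = unzip_all(str(zip_dir), str(Path(tmp) / "DCM"))
    folder_names = [p.name for p in folders]
    extracted_names = [p.name for p in folders[0].iterdir()]
print(folder_names, extracted_names)
```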

#### Check on unzipped DICOM files

In [None]:
! ls -la $FLD_NAME_DCM

# 4. Convert DICOM SEG to NIfTI

In [None]:
!python convert2nii.py $FLD_NAME_DCM -o $FLD_NAME_NII -f ap
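
The source does not show how `convert2nii.py` performs the conversion. Since this notebook installs dcmqi's `segimage2itkimage`, one plausible route is to call that tool directly for a single DICOM SEG file, as sketched below. The helper names and the example paths are hypothetical; the command is only built and printed here, not executed.

```python
import subprocess
from pathlib import Path
from typing import List


def build_seg_command(seg_dicom: str, out_dir: str, prefix: str) -> List[str]:
    """Build a segimage2itkimage call that writes NIfTI label files."""
    return [
        "segimage2itkimage",
        "--inputDICOM", str(seg_dicom),      # the DICOM SEG object to convert
        "--outputDirectory", str(out_dir),   # where the label files are written
        "--outputType", "nii",               # NIfTI instead of the default NRRD
        "--prefix", prefix,                  # file name prefix for the outputs
    ]


def convert_seg(seg_dicom: str, out_dir: str, prefix: str) -> None:
    """Run the conversion; raises CalledProcessError if the tool fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_seg_command(seg_dicom, out_dir, prefix), check=True)


# Hypothetical paths, shown only to illustrate the resulting command line.
cmd = build_seg_command("/data/DCM/patient1/seg.dcm", "/data/nii", "patient1_seg")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.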

#### Check on label NIfTI files

In [None]:
! ls -la $FLD_NAME_NII

# 5. Convert DICOM images to NIfTI

In [None]:
# ! /claraDevDay/Data/tcia-master/3rdParty/dcm2niix -h
! /claraDevDay/Data/tcia-master/3rdParty/dcm2niix -f %i -z y -o $FLD_NAME_NII $FLD_NAME_DCM
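
The flags in the `dcm2niix` call above are easy to misread, so here is the same command built as an argument list with each flag annotated. This is a sketch; the helper name `build_dcm2niix_command` and the example paths are hypothetical, and the command is only printed, not run.

```python
from typing import List


def build_dcm2niix_command(dcm_dir: str, out_dir: str, exe: str = "dcm2niix") -> List[str]:
    """Build the dcm2niix call used in this notebook as an argument list."""
    return [
        exe,
        "-f", "%i",     # output file name template: %i expands to the patient ID
        "-z", "y",      # gzip the output, producing .nii.gz files
        "-o", out_dir,  # output directory for the converted volumes
        dcm_dir,        # input folder containing the DICOM series
    ]


# Hypothetical paths, shown only to illustrate the resulting command line.
cmd = build_dcm2niix_command("/data/NSCLC_5/DCM", "/data/NSCLC_5/nii")
print(" ".join(cmd))
```

Because the file name comes from the patient ID alone, several series from the same patient map to the same base name, so check the output folder when a patient has more than one series.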
# !python convert2nii.py $FLD_NAME_DCM -o $FLD_NAME_NII -d

#### Check on NIfTI image files

In [None]:
! ls -la $FLD_NAME_NII



# 6. Create the dataset JSON file

In [None]:
import fileIO
import glob

DATA_ROOT = FLD_NAME_NII
wrtfilename = DATA_ROOT + "/dataSet.json"  # DATA_ROOT has no trailing slash, so add the separator

print("finding files in "+DATA_ROOT)
dataJson=fileIO.DataJson(DATA_ROOT)
for fName in glob.glob(DATA_ROOT+'/*_seg.nii.gz',recursive=True):
#     print (fName)
    fName= fName.replace(DATA_ROOT,"") # remove full path
    gtName = fName
    imgName = fName.replace("_seg", "")
    dataJson.appendDataPt(imgName,gtName)
dataJson.write2file(wrtfilename)
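
The `fileIO.DataJson` helper is not shown here, so its exact output format is unknown. The sketch below illustrates the same pairing idea with the standard library only: every `*_seg.nii.gz` label is matched to the image with the same name minus `_seg`, and labels without an image are skipped. The `{"training": [{"image": ..., "label": ...}]}` layout and the function name `build_dataset_index` are assumptions; adapt them to whatever your training framework expects.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

LABEL_SUFFIX = "_seg.nii.gz"


def build_dataset_index(nii_dir: str) -> Dict[str, List[Dict[str, str]]]:
    """Pair each label file with its image and return an index dictionary."""
    pairs: List[Dict[str, str]] = []
    for label_path in sorted(Path(nii_dir).glob("*" + LABEL_SUFFIX)):
        # "P1_seg.nii.gz" -> "P1.nii.gz"
        image_name = label_path.name[: -len(LABEL_SUFFIX)] + ".nii.gz"
        if (label_path.parent / image_name).exists():
            pairs.append({"image": image_name, "label": label_path.name})
    return {"training": pairs}


# Demo with empty placeholder files; P2 has a label but no image, so it is skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("P1.nii.gz", "P1_seg.nii.gz", "P2_seg.nii.gz"):
        (Path(tmp) / name).touch()
    index = build_dataset_index(tmp)
    out_file = Path(tmp) / "dataSet.json"
    out_file.write_text(json.dumps(index, indent=2))
    reloaded = json.loads(out_file.read_text())
print(index)
```

Skipping unmatched labels keeps the index free of entries that would fail at load time.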



# 7. Visualization
Now let's visualize one image from the downloaded data.


In [None]:
import matplotlib.pyplot as plt

vol,_ = fileIO.openNifti(dataJson.getItemAt(0))
seg,_ = fileIO.openNifti(dataJson.getItemAt(0,"label"))

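plt.figure(figsize=(40,40))
z_slices = list(range(30,vol.shape[2],20))
rows = max(len(z_slices), 1)  # one row per plotted slice, so subplot indices never exceed the grid
for i,z in enumerate(z_slices):
    print("in ",str(i) ,"z = ",z)
    plt.subplot(rows, 2, 2*i+1 )
    plt.imshow(vol[:,:,z])  # image slice
    plt.subplot(rows, 2, 2*i+2 )
    plt.imshow(seg[:,:,z])  # matching label slice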
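
Volumes differ in depth, so a fixed start and step can yield too many or too few slices. As an alternative, the sketch below picks a bounded number of evenly spaced slice indices for any depth. The helper name `pick_slices` is hypothetical and it is not part of `fileIO`.

```python
from typing import List


def pick_slices(depth: int, count: int = 5) -> List[int]:
    """Return up to `count` evenly spaced slice indices within [0, depth)."""
    if depth <= 0 or count <= 0:
        return []
    count = min(count, depth)   # never ask for more slices than exist
    step = depth / (count + 1)  # leaves a margin at both ends of the volume
    return [int(step * (k + 1)) for k in range(count)]


print(pick_slices(120))
```

With a loaded volume, `pick_slices(vol.shape[2])` gives the `z` values to plot, and its length gives the number of subplot rows.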
