<a href="https://colab.research.google.com/github/AIM-Harvard/mhub/blob/colab/mhub/totalsegmentator/notebooks/totalsegmentator_mwe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **ModelHub - Whole Body CT Segmentation**

This notebook provides a minimal working example of TotalSegmentator, a tool for the segmentation of 104 anatomical structures from CT images. The model was trained on a wide range of CT imaging data with different pathologies, acquired with several scanners and protocols at several institutions.

We test TotalSegmentator by implementing an end-to-end (cloud-based) pipeline on publicly available whole body CT scans hosted on the [Imaging Data Commons (IDC)](https://portal.imaging.datacommons.cancer.gov/), starting from raw DICOM CT data and ending with a DICOM SEG object storing the segmentation masks generated by the AI model. The testing dataset we use is external and independent from the data used in the development phase of the model (training and validation) and is composed of a wide variety of image types (from the area covered by the scan, to the presence of contrast and various types of artefacts).

The way all the operations are executed, from pulling the data to post-processing and standardising the results, is designed to promote transparency and reproducibility.

Please cite the following article if you use this code or pre-trained models:

Wasserthal, J., Meyer, M., Breit, H.C., Cyriac, J., Yang, S. and Segeroth, M., 2022. TotalSegmentator: robust segmentation of 104 anatomical structures in CT images. arXiv preprint arXiv:2208.05868, [https://doi.org/10.48550/arXiv.2208.05868](https://doi.org/10.48550/arXiv.2208.05868).

The original code is published on [GitHub](https://github.com/wasserth/TotalSegmentator) under the [Apache-2.0 license](https://github.com/wasserth/TotalSegmentator/blob/master/LICENSE).

# **Disclaimer**

The code and data of this repository are provided to promote reproducible research. They are not intended for clinical care or commercial use.

The software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

## **Environment Setup**

This demo notebook is intended to be run using a GPU.

To access a free GPU on Colab:
`Edit > Notebook settings`.

From the dropdown menu under `Hardware accelerator`, select `GPU`. Let's check the Colab instance is indeed equipped with a GPU.

In [1]:
import os
import sys
import shutil

import yaml

import time
import tqdm


# useful information
curr_dir = !pwd
curr_droid = !hostname
curr_pilot = !whoami

print(time.asctime(time.localtime()))
print("\nCurrent directory :", curr_dir[-1])
print("Hostname          :", curr_droid[-1])
print("Username          :", curr_pilot[-1])
print("Python version    :", sys.version.split('\n')[0])

Wed Feb  1 19:13:13 2023

Current directory : /content
Hostname          : c1bf05b02622
Username          : root
Python version    : 3.8.10 (default, Nov 14 2022, 12:59:47) 
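
The cell above prints general information about the instance but does not query the GPU itself. A minimal sketch of such a check follows, assuming the NVIDIA driver tools (`nvidia-smi`) are on the `PATH`, as is typically the case on Colab GPU runtimes. `gpu_available` is a hypothetical helper name, not part of the notebook or of any library.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` is on the PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # no NVIDIA driver tools found: most likely a CPU-only runtime
        return False
    try:
        # `-L` lists the GPUs visible to the driver
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU available:", gpu_available())
```

If this prints `False`, the hardware accelerator setting is the first thing to check.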


Authenticating with Google is required to run BigQuery queries.

Every operation in this notebook (BigQuery queries, fetching data from the IDC buckets) is completely free. The only requirement is a Google Cloud project. For the notebook to work as intended, specify the ID of your project in the cell that follows the authentication cell.

In [2]:
from google.colab import auth
auth.authenticate_user()

In [3]:
from google.colab import files
from google.cloud import storage
from google.cloud import bigquery as bq

# INSERT THE ID OF YOUR PROJECT HERE!
# FIXME
#project_id = "idc-external-030"
project_id = "idc-sandbox-000"

In [4]:
!gcloud projects list | grep "idc-"

idc-dev-etl                     idc-dev-etl                     500129912605
idc-external-018                idc-external-018                907949205650
idc-nlst                        idc-nlst                        821657986137
idc-sandbox-000                 idc-sandbox-000                 370953977065


Throughout this Colab notebook, we will use [Plastimatch](https://plastimatch.org), a reliable open source software for image computation, for image pre-processing. We will run Plastimatch through the simple [PyPlastimatch](https://github.com/AIM-Harvard/pyplastimatch/tree/main/pyplastimatch) Python wrapper.

In [5]:
%%capture
!pip install yamlmagic

In [6]:
%load_ext yamlmagic

In [7]:
%%capture
!apt install plastimatch

In [8]:
# check plastimatch was correctly installed
!plastimatch --version

plastimatch version 1.8.0


Install Apache Subversion.

We will use Subversion to check out only specific subfolders of the mhub repository.

In [9]:
%%capture
!apt install subversion

---

Start by cloning the AIMI hub repository on the Colab instance.

The AIMI hub repository stores all the code we will use for pulling, preprocessing, processing, and postprocessing the data for this use case, as well as for the other use cases shared through AIMI hub.

In [10]:
%%capture
!svn checkout https://github.com/AIM-Harvard/aimi_alpha/trunk/aimi/general_utils/ mhub/aimi_utils

!svn checkout https://github.com/AIM-Harvard/mhub/trunk/mhub/mhubio mhub/mhubio
!svn checkout https://github.com/AIM-Harvard/mhub/trunk/mhub/ymldicomseg mhub/ymldicomseg
!svn checkout https://github.com/AIM-Harvard/mhub/trunk/mhub/totalsegmentator mhub/totalsegmentator

To organise the DICOM data in a more common (and human-understandable) fashion after downloading it from the buckets, we will make use of [DICOMSort](https://github.com/pieper/dicomsort).

DICOMSort is an open source tool for custom sorting and renaming of DICOM files based on their DICOM tags. In our case, we will use DICOMSort to organise the DICOM data by `PatientID` and `Modality`, so that the final directory looks like the following:

```
data/raw/nsclc-radiomics/dicom/$PatientID
 ├─── CT
 │     ├─── $SOPInstanceUID_slice0.dcm
 │     ├─── $SOPInstanceUID_slice1.dcm
 │     └───  ...
 ├─── RTSTRUCT
 │     └─── $SOPInstanceUID_RTSTRUCT.dcm
 └─── SEG
       └─── $SOPInstanceUID_RTSEG.dcm
```
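
To make the role of the tag placeholders concrete, here is a minimal sketch of how a `%Tag`-style target pattern (such as the one passed to `dicomsort` later in this notebook) can be expanded into a file path. This is not DICOMSort's implementation: `expand_pattern` is a hypothetical helper and the tag values are made up.

```python
import re


def expand_pattern(pattern: str, tags: dict) -> str:
    """Replace every %TagName placeholder with the matching value from `tags`."""
    return re.sub(r"%(\w+)", lambda m: str(tags[m.group(1)]), pattern)


# made-up tag values, for illustration only
tags = {"PatientID": "LUNG1-001", "Modality": "CT", "SOPInstanceUID": "1.2.3.4"}

print(expand_pattern("%PatientID/%Modality/%SOPInstanceUID.dcm", tags))
# prints: LUNG1-001/CT/1.2.3.4.dcm
```

A placeholder without a matching key raises a `KeyError` in this sketch, which makes missing tags easy to spot.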

In [11]:
!git clone https://github.com/pieper/dicomsort dicomsort

fatal: destination path 'dicomsort' already exists and is not an empty directory.


We will also use DCMQI for converting the resulting segmentation into standard DICOM SEG objects.

In [12]:
%%capture
dcmqi_release_url = "https://github.com/QIICR/dcmqi/releases/download/v1.2.4/dcmqi-1.2.4-linux.tar.gz"
dcmqi_download_path = "/content/dcmqi-1.2.4-linux.tar.gz"
dcmqi_path = "/content/dcmqi-1.2.4-linux"

!wget -O $dcmqi_download_path $dcmqi_release_url

!tar -xvf $dcmqi_download_path

!mv $dcmqi_path/bin/* /bin

In [13]:
!printf '#!/bin/bash\npython3 /content/dicomsort/dicomsort.py "$@"\n' > /usr/bin/dicomsort
!chmod +x /usr/bin/dicomsort

---

In [14]:
%%capture
!pip install pyplastimatch nnunet ipywidgets
!pip install TotalSegmentator

In [15]:
import shutil
import random

import json
import pprint
import numpy as np
import pandas as pd

import pydicom
import nibabel as nib
import SimpleITK as sitk
import pyplastimatch as pypla

print("Python version               : ", sys.version.split('\n')[0])
print("Numpy version                : ", np.__version__)

# ----------------------------------------

#everything that has to do with plotting goes here below
import matplotlib
matplotlib.use("agg")

import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from matplotlib.patches import Patch

%matplotlib inline
%config InlineBackend.figure_format = "png"

import ipywidgets as ipyw

## ----------------------------------------

# create new colormap appending the alpha channel to the selected one
# (so that we don't get a \"color overlay\" when plotting the segmask superimposed to the CT)
cmap = plt.cm.Reds
my_reds = cmap(np.arange(cmap.N))
my_reds[:, -1] = np.linspace(0, 1, cmap.N)
my_reds = ListedColormap(my_reds)

cmap = plt.cm.Greens
my_greens = cmap(np.arange(cmap.N))
my_greens[:, -1] = np.linspace(0, 1, cmap.N)
my_greens = ListedColormap(my_greens)

cmap = plt.cm.Blues
my_blues = cmap(np.arange(cmap.N))
my_blues[:, -1] = np.linspace(0, 1, cmap.N)
my_blues = ListedColormap(my_blues)

cmap = plt.cm.spring
my_spring = cmap(np.arange(cmap.N))
my_spring[:, -1] = np.linspace(0, 1, cmap.N)
my_spring = ListedColormap(my_spring)
## ----------------------------------------

import seaborn as sns

Python version               :  3.8.10 (default, Nov 14 2022, 12:59:47) 
Numpy version                :  1.21.6


Provided everything was set up correctly, we can run the BigQuery query and get all the information we need to download the testing data from the IDC platform.

For this specific use case, we will work with the `nsclc_radiomics` collection hosted on IDC, as selected by the `collection_id` filter in the query below.

In [16]:
%%bigquery cohort_df --project=$project_id 

SELECT
  dicom_pivot_v11.PatientID,
  dicom_pivot_v11.collection_id,
  dicom_pivot_v11.source_DOI,
  dicom_pivot_v11.StudyInstanceUID,
  dicom_pivot_v11.SeriesInstanceUID,
  dicom_pivot_v11.SOPInstanceUID,
  dicom_pivot_v11.Modality,
  dicom_pivot_v11.gcs_url
FROM
  `bigquery-public-data.idc_v11.dicom_pivot_v11` dicom_pivot_v11
WHERE
  StudyInstanceUID IN (
    SELECT
      StudyInstanceUID
    FROM
      `bigquery-public-data.idc_v11.dicom_pivot_v11` dicom_pivot_v11
    WHERE
      (
        dicom_pivot_v11.collection_id IN ('Community', 'nsclc_radiomics')
      )
    GROUP BY
      StudyInstanceUID
  )
GROUP BY
  dicom_pivot_v11.PatientID,
  dicom_pivot_v11.collection_id,
  dicom_pivot_v11.source_DOI,
  dicom_pivot_v11.StudyInstanceUID,
  dicom_pivot_v11.SeriesInstanceUID,
  dicom_pivot_v11.SOPInstanceUID,
  dicom_pivot_v11.Modality,
  dicom_pivot_v11.gcs_url
ORDER BY
  dicom_pivot_v11.PatientID ASC,
  dicom_pivot_v11.collection_id ASC,
  dicom_pivot_v11.source_DOI ASC,
  dicom_pivot_v11.StudyInstanceUID ASC,
  dicom_pivot_v11.SeriesInstanceUID ASC,
  dicom_pivot_v11.SOPInstanceUID ASC,
  dicom_pivot_v11.Modality ASC,
  dicom_pivot_v11.gcs_url ASC


In [17]:
# this works as intended only if the BQ query parses data from a single dataset
# if not, feel free to set the name manually!
dataset_name = cohort_df["collection_id"].values[0]

dataset_name

'nsclc_radiomics'

In [18]:
# create the directory tree
!mkdir -p data

!mkdir -p data/input_data 
!mkdir -p data/output_data 
!mkdir -p data/tmp

## **Parsing Cohort Information from BigQuery Tables**

We can check the various fields of the table we populated by running the BigQuery query.

This table will store one entry for each DICOM file in the dataset (therefore, expect thousands of rows!)

In [19]:
pat_id_list = sorted(list(set(cohort_df["PatientID"].values)))

print("Total number of unique Patient IDs:", len(pat_id_list))

display(cohort_df.info())

display(cohort_df.head())

Total number of unique Patient IDs: 422
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52073 entries, 0 to 52072
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PatientID          52073 non-null  object
 1   collection_id      52073 non-null  object
 2   source_DOI         52073 non-null  object
 3   StudyInstanceUID   52073 non-null  object
 4   SeriesInstanceUID  52073 non-null  object
 5   SOPInstanceUID     52073 non-null  object
 6   Modality           52073 non-null  object
 7   gcs_url            52073 non-null  object
dtypes: object(8)
memory usage: 3.2+ MB


None

Unnamed: 0,PatientID,collection_id,source_DOI,StudyInstanceUID,SeriesInstanceUID,SOPInstanceUID,Modality,gcs_url
0,LUNG1-001,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.2393413539117143687725...,1.2.276.0.7230010.3.1.3.2323910823.20524.15972...,1.2.276.0.7230010.3.1.4.2323910823.20524.15972...,SEG,gs://idc-open-cr/553521b9-f9e8-4103-b04d-5f032...
1,LUNG1-001,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.2393413539117143687725...,1.3.6.1.4.1.32722.99.99.2279381215866080725084...,1.3.6.1.4.1.32722.99.99.6468474582136099606367...,RTSTRUCT,gs://idc-open-cr/5bcda93e-ef26-4a58-a7b4-47832...
2,LUNG1-001,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.2393413539117143687725...,1.3.6.1.4.1.32722.99.99.2989917765213423750108...,1.3.6.1.4.1.32722.99.99.1047764232230739912736...,CT,gs://idc-open-cr/2b028478-80a6-4cc4-95d8-36bd1...
3,LUNG1-001,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.2393413539117143687725...,1.3.6.1.4.1.32722.99.99.2989917765213423750108...,1.3.6.1.4.1.32722.99.99.1064644568755722921755...,CT,gs://idc-open-cr/fdbe15bb-a030-4a8d-b041-b4a73...
4,LUNG1-001,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.2393413539117143687725...,1.3.6.1.4.1.32722.99.99.2989917765213423750108...,1.3.6.1.4.1.32722.99.99.1077431100943926205431...,CT,gs://idc-open-cr/d17ff2f7-de4f-4084-b1bc-eed8a...
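
Since the table holds one row per DICOM file, counting rows per series is a quick way to see how many files each series contains. The sketch below does this with the standard library on a handful of made-up rows that mimic the columns of `cohort_df`. It is an illustration under that assumption, not part of the original pipeline.

```python
from collections import Counter

# toy rows mimicking the columns of cohort_df (all values are made up)
rows = [
    {"PatientID": "P1", "SeriesInstanceUID": "s1", "Modality": "CT"},
    {"PatientID": "P1", "SeriesInstanceUID": "s1", "Modality": "CT"},
    {"PatientID": "P1", "SeriesInstanceUID": "s2", "Modality": "SEG"},
    {"PatientID": "P2", "SeriesInstanceUID": "s3", "Modality": "CT"},
]

# one counter entry per (patient, series, modality); the value is the number of files
files_per_series = Counter(
    (r["PatientID"], r["SeriesInstanceUID"], r["Modality"]) for r in rows
)

for (patient, series, modality), n_files in sorted(files_per_series.items()):
    print(patient, series, modality, n_files)
```

On the actual DataFrame, `cohort_df.groupby(["PatientID", "SeriesInstanceUID", "Modality"]).size()` should give the equivalent counts.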


## **Setup mhubio**

Import the mhubio modules used in the rest of the pipeline.


In [20]:
import sys, os
sys.path.append('.')

from mhub.mhubio.Config import Config, DataType, FileType, CT, SEG
from mhub.mhubio.modules.importer.UnsortedDicomImporter import UnsortedInstanceImporter
from mhub.mhubio.modules.importer.DataSorter import DataSorter
from mhub.mhubio.modules.convert.NiftiConverter import NiftiConverter
from mhub.mhubio.modules.convert.DsegConverter import DsegConverter
from mhub.mhubio.modules.organizer.DataOrganizer import DataOrganizer
from mhub.totalsegmentator.utils.TotalSegmentatorRunner import TotalSegmentatorRunner

## **Running the Analysis for a Single Patient**

The following cells run the entire processing pipeline, from pre-processing to post-processing.

In [21]:
# select one patient from the cohort
pat_id = random.choice(cohort_df["PatientID"].values)
pat_df = cohort_df[cohort_df["PatientID"] == pat_id].reset_index(drop = True)

# select only data for which the modality is CT 
pat_df = pat_df[pat_df["Modality"] == "CT"].reset_index(drop = True)

# if more than one series is available for the selected patient, pick one
if len(np.unique(pat_df["SeriesInstanceUID"].values)) > 1:
  series_uid = random.choice(pat_df["SeriesInstanceUID"].values)
  pat_df = pat_df[pat_df["SeriesInstanceUID"] == series_uid].reset_index(drop = True)

# sanity check
assert len(np.unique(pat_df["SeriesInstanceUID"].values)) == 1

display(pat_df.info())
display(pat_df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197 entries, 0 to 196
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PatientID          197 non-null    object
 1   collection_id      197 non-null    object
 2   source_DOI         197 non-null    object
 3   StudyInstanceUID   197 non-null    object
 4   SeriesInstanceUID  197 non-null    object
 5   SOPInstanceUID     197 non-null    object
 6   Modality           197 non-null    object
 7   gcs_url            197 non-null    object
dtypes: object(8)
memory usage: 12.4+ KB


None

Unnamed: 0,PatientID,collection_id,source_DOI,StudyInstanceUID,SeriesInstanceUID,SOPInstanceUID,Modality,gcs_url
0,LUNG1-021,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.1083883755847649334112...,1.3.6.1.4.1.32722.99.99.1745927143060515205794...,1.3.6.1.4.1.32722.99.99.1033403515433535768726...,CT,gs://idc-open-cr/928ffe8f-6652-46f6-b1a3-6be8b...
1,LUNG1-021,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.1083883755847649334112...,1.3.6.1.4.1.32722.99.99.1745927143060515205794...,1.3.6.1.4.1.32722.99.99.1046397329010069196558...,CT,gs://idc-open-cr/ae0192cf-5ff0-430a-b8df-d9634...
2,LUNG1-021,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.1083883755847649334112...,1.3.6.1.4.1.32722.99.99.1745927143060515205794...,1.3.6.1.4.1.32722.99.99.1056725405308863745188...,CT,gs://idc-open-cr/4b08764c-d1de-4483-a9ef-d9a1c...
3,LUNG1-021,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.1083883755847649334112...,1.3.6.1.4.1.32722.99.99.1745927143060515205794...,1.3.6.1.4.1.32722.99.99.1057248079481956132043...,CT,gs://idc-open-cr/bb3b1d75-dccd-420d-9966-0b5b2...
4,LUNG1-021,nsclc_radiomics,10.7937/K9/TCIA.2015.PF0M9REI,1.3.6.1.4.1.32722.99.99.1083883755847649334112...,1.3.6.1.4.1.32722.99.99.1745927143060515205794...,1.3.6.1.4.1.32722.99.99.1062983987412563298112...,CT,gs://idc-open-cr/7aa5c35a-96df-4f07-acff-42a50...
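
Note that `random.choice` over `cohort_df["PatientID"].values` draws from one entry per file, so patients with more files are more likely to be picked, and the selection changes on every run. A minimal sketch of a repeatable, uniform alternative follows, assuming a list of unique patient IDs such as `pat_id_list`. The IDs and the seed value below are placeholders.

```python
import random

# stand-in for pat_id_list (the sorted unique PatientIDs computed earlier)
pat_id_list = ["LUNG1-001", "LUNG1-002", "LUNG1-003", "LUNG1-004"]

# a dedicated generator with a fixed seed: the same patient is picked on every run
rng = random.Random(42)
picked = rng.choice(pat_id_list)

print("Selected patient:", picked)
```

Using a separate `random.Random` instance keeps the seed local, so other code relying on the global random state is unaffected.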


In [22]:
# init

print("Processing patient: %s"%(pat_id))

Processing patient: LUNG1-021


In [23]:
# FIXME
# data cross-loading
from mhub.aimi_utils.gcs import download_patient_data

download_patient_data(raw_base_path = "data/tmp",
                      sorted_base_path = "data/input_data",
                      patient_df = pat_df,
                      remove_raw = True)

Copying files from IDC buckets to data/tmp/LUNG1-021...
Done in 9.35082 seconds.

Sorting DICOM files...
Done in 0.620543 seconds.
Sorted DICOM data saved at: data/input_data/LUNG1-021
Removing un-sorted data at data/tmp/LUNG1-021...
... Done.


Write a custom configuration file containing the settings for all the MHub modules we are going to use, using the built-in `%%writefile` IPython magic.

This file is tailored to this specific use case.

In [24]:
%%writefile totalsegmentator_config.yml

general:
  data_base_dir: /content/data
modules:
  UnsortedInstanceImporter:
    input_dir: input_data
  DataSorter:
    base_dir: /content/data/sorted
    structure: '%SeriesInstanceUID/dicom/%SOPInstanceUID.dcm'
  DsegConverter:
    #dicomseg_json_path: /content/mhub/totalsegmentator/config/dicomseg_metadata_whole.json
    dicomseg_yml_path: /content/mhub/totalsegmentator/config/dseg.yml
    skip_empty_slices: True
  TotalSegmentatorRunner:
    use_fast_mode: true

Overwriting totalsegmentator_config.yml


In [25]:
# config
config = Config('totalsegmentator_config.yml')
config.verbose = True  # TODO: define levels of verbosity and integrate consistently. 

In [26]:
# import a collection of unsorted DICOM data
UnsortedInstanceImporter(config).execute()


--------------------------
Start UnsortedInstanceImporter
Done in 8.96454e-05 seconds.


In [27]:
# sort such collection of DICOM data using dicomsort
DataSorter(config).execute()


--------------------------
Start DataSorter
sorting schema: /content/data/sorted/%SeriesInstanceUID/dicom/%SOPInstanceUID.dcm
>> run:  dicomsort -k -u /content/data/input_data /content/data/sorted/%SeriesInstanceUID/dicom/%SOPInstanceUID.dcm
adding ct in dicom format with resolved path:  /content/data/sorted/1.3.6.1.4.1.32722.99.99.174592714306051520579451223294652406755/dicom
Done in 0.591842 seconds.


In [28]:
# convert the DICOM data to NIfTI, as required by TotalSegmentator, using plastimatch
NiftiConverter(config).execute()


--------------------------
Start NiftiConverter

Running 'plastimatch convert' with the specified arguments:
  --input /content/data/sorted/1.3.6.1.4.1.32722.99.99.174592714306051520579451223294652406755/dicom
  --output-img /content/data/sorted/1.3.6.1.4.1.32722.99.99.174592714306051520579451223294652406755/image.nii.gz
... Done.
Done in 18.2066 seconds.
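
The log above shows the two arguments passed to `plastimatch convert`. As a rough sketch of the same call made directly from Python, the snippet below builds the argument list and only runs it when Plastimatch is installed and the input folder exists. `plastimatch_convert_cmd` is a hypothetical helper, the paths are placeholders, and this is not how `NiftiConverter` is implemented.

```python
import os
import shutil
import subprocess


def plastimatch_convert_cmd(dicom_dir: str, output_nifti: str) -> list:
    """Build the argument list for `plastimatch convert` (hypothetical helper)."""
    return ["plastimatch", "convert", "--input", dicom_dir, "--output-img", output_nifti]


# placeholder paths, for illustration only
dicom_dir = "data/sorted/SERIES_UID/dicom"
cmd = plastimatch_convert_cmd(dicom_dir, "data/sorted/SERIES_UID/image.nii.gz")
print(" ".join(cmd))

# only attempt the conversion if plastimatch is installed and the input folder exists
if shutil.which("plastimatch") and os.path.isdir(dicom_dir):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.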


In [29]:
# run the inference phase 
TotalSegmentatorRunner(config).execute()


--------------------------
Start TotalSegmentatorRunner
Running TotalSegmentator in fast mode ('--fast', 3mm): 
>> run ts:  TotalSegmentator -i /content/data/sorted/1.3.6.1.4.1.32722.99.99.174592714306051520579451223294652406755/image.nii.gz -o /app/tmp/d51b9b0c-1cc8-42a8-b7a1-e8b275f67eea --fast
Done in 40.7197 seconds.


In [32]:
# convert the results to DICOM SEG
DsegConverter(config).execute()


--------------------------
Start DsegConverter
{'AORTA': 'aorta.nii.gz', 'BRAIN': 'brain.nii.gz', 'COLON': 'colon.nii.gz', 'DUODENUM': 'duodenum.nii.gz', 'ESOPHAGUS': 'esophagus.nii.gz', 'FACE': 'face.nii.gz', 'GALLBLADDER': 'gallbladder.nii.gz', 'INFERIOR_VENA_CAVA': 'inferior_vena_cava.nii.gz', 'LEFT_ADRENAL_GLAND': 'adrenal_gland_left.nii.gz', 'LEFT_ATRIUM': 'heart_atrium_left.nii.gz', 'LEFT_AUTOCHTHONOUS_BACK_MUSCLE': 'autochthon_left.nii.gz', 'LEFT_CLAVICLE': 'clavicula_left.nii.gz', 'LEFT_FEMUR': 'femur_left.nii.gz', 'LEFT_GLUTEUS_MAXIMUS': 'gluteus_maximus_left.nii.gz', 'LEFT_GLUTEUS_MEDIUS': 'gluteus_medius_left.nii.gz', 'LEFT_GLUTEUS_MINIMUS': 'gluteus_minimus_left.nii.gz', 'LEFT_HIP': 'hip_left.nii.gz', 'LEFT_HUMERUS': 'humerus_left.nii.gz', 'LEFT_ILIAC_ARTERY': 'iliac_artery_left.nii.gz', 'LEFT_ILIAC_VEIN': 'iliac_vena_left.nii.gz', 'LEFT_ILIOPSOAS': 'iliopsoas_left.nii.gz', 'LEFT_KIDNEY': 'kidney_left.nii.gz', 'LEFT_LOWER_LUNG_LOBE': 'lung_lower_lobe_left.nii.gz', 'LEFT_RI
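
The dictionary printed above maps each segment label to the NIfTI mask file it is read from. Assuming the masks are stored as one file per structure in a single folder, a check like the one sketched below can report which labels have no mask on disk before the conversion. `missing_masks` is a hypothetical helper, and the folder is a temporary stand-in filled with empty placeholder files.

```python
import os
import tempfile


def missing_masks(mapping: dict, seg_dir: str) -> list:
    """Return the segment names whose mask file is absent from `seg_dir`."""
    return sorted(
        name for name, fname in mapping.items()
        if not os.path.isfile(os.path.join(seg_dir, fname))
    )


# three entries copied from the log above
mapping = {"AORTA": "aorta.nii.gz", "BRAIN": "brain.nii.gz", "COLON": "colon.nii.gz"}

with tempfile.TemporaryDirectory() as seg_dir:
    # create empty placeholder files for two of the three structures
    for fname in ("aorta.nii.gz", "colon.nii.gz"):
        open(os.path.join(seg_dir, fname), "wb").close()
    missing = missing_masks(mapping, seg_dir)

print(missing)
# prints: ['BRAIN']
```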

In [33]:
# organize data into output folder
# FIXME: don't save stuff under /app
organizer = DataOrganizer(config, set_file_permissions = sys.platform.startswith('linux'))
organizer.setTarget(DataType(FileType.NIFTI, CT), "/content/data/output_data/[i:SeriesID]/[path]")
organizer.setTarget(DataType(FileType.DICOMSEG, SEG), "/content/data/output_data/[i:SeriesID]/TotalSegmentator.seg.dcm")
organizer.execute()


--------------------------
Start DataOrganizer
organizing instance <I:/content/data/sorted/1.3.6.1.4.1.32722.99.99.174592714306051520579451223294652406755>
created directory /content/data/output_data/1.3.6.1.4.1.32722.99.99.174592714306051520579451223294652406755
Done in 0.119234 seconds.


---

## **Data Download**

In [61]:
%%capture

archive_fn = "%s.zip"%(pat_id)

try:
  os.remove(archive_fn)
except OSError:
  pass

!zip -j -r $archive_fn "/content/data/output_data" "/content/data/input_data"

In [62]:
# approximate archive size in MB (the divisor is 1024 * 1000 bytes)
filesize = os.stat(archive_fn).st_size/1024e03
print('Starting the download of "%s" (%2.1f MB)...\n'%(archive_fn, filesize))

files.download(archive_fn)

Starting the download of "LUNG1-021.zip" (96.8 MB)...
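
The `zip -j -r` call above stores every file under its base name only, discarding the folder structure. A minimal sketch of the same idea with Python's `zipfile` module follows, run on a throwaway folder. `zip_flat` is a hypothetical helper and the file names are made up.

```python
import os
import tempfile
import zipfile


def zip_flat(archive_path: str, *source_dirs: str) -> int:
    """Zip every file under `source_dirs`, storing base names only. Returns the file count."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in source_dirs:
            for root, _dirs, fnames in os.walk(src):
                for fname in fnames:
                    # arcname=fname drops the folder part, like `zip -j`
                    zf.write(os.path.join(root, fname), arcname=fname)
                    count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    # build a small nested folder with a single demo file
    os.makedirs(os.path.join(tmp, "out", "series"))
    with open(os.path.join(tmp, "out", "series", "a.txt"), "w") as f:
        f.write("demo")

    archive = os.path.join(tmp, "demo.zip")
    n_files = zip_flat(archive, os.path.join(tmp, "out"))
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    size_mib = os.stat(archive).st_size / 1024**2

print(n_files, names, "%.6f MiB" % size_mib)
```

Because folder names are dropped, files that share a base name would collide in the archive. This is worth keeping in mind when zipping several series together.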


