You can download and run this notebook locally, or you can run it for free in a cloud environment using Colab or Sagemaker Studio Lab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github.com/SL2027/TCIA_Notebooks/blob/main/TCIA_REST_API_Complete_Documentation.ipynb)

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github.com/SL2027/TCIA_Notebooks/blob/main/TCIA_REST_API_Complete_Documentation.ipynb)

# Summary

Access to large, high-quality datasets is essential for researchers to understand disease and precision medicine pathways, especially in cancer. However, HIPAA constraints make sharing medical images outside an individual institution a complex process. [The Cancer Imaging Archive (TCIA)](https://www.cancerimagingarchive.net/) is a public service funded by the National Cancer Institute that addresses this challenge by providing hosting and de-identification services, taking major burdens of data sharing off researchers.

# 1. Learn about Available Collections on the TCIA Website

[Browsing Collections](https://www.cancerimagingarchive.net/collections) and viewing [Analysis Results](https://www.cancerimagingarchive.net/tcia-analysis-results/) of TCIA datasets are the easiest ways to become familiar with what is available. These pages will help you quickly identify datasets of interest, find valuable supporting data that are not available via our APIs (e.g. clinical spreadsheets and non-DICOM segmentation data), and answer the most common questions you might have about the datasets.

# 2. REST API Overview 
TCIA uses software called NBIA to manage DICOM data. The NBIA REST APIs are provided for the search and download functions used in the TCIA radiology portal and allow access to both public and limited access collections.
1. The [NBIA Search REST APIs](https://wiki.cancerimagingarchive.net/x/fILTB) allow you to perform basic queries and download data from **public** collections. These APIs do not require a TCIA account.
2. The [NBIA Search with Authentication REST APIs](https://wiki.cancerimagingarchive.net/x/X4ATBg) allow you to perform basic queries and download data from **public and limited-access** collections. These APIs require a TCIA account to create authentication tokens.
3. The [NBIA Advanced REST APIs](https://wiki.cancerimagingarchive.net/x/YoATBg) also allow access to **public and limited-access** collections, but provide query endpoints mostly geared towards developers seeking to integrate searching and downloading TCIA data into web and desktop applications. This API requires a TCIA account to create authentication tokens.

# 3. Query Functions

Detailed usage of some of these functions can be found in [section 7.2](#7.2)

## queryData

**Params: (endpoint, api_url)**

* Checks for valid security tokens where needed. 
* Because it is called by other functions to select base URL, ***<font color='red'>please do NOT use this function</font>***.

## getCollections

**Params: (api_url = "", format = "")**

* *Optional: api_url, format*
* Gets a list of collections from a specified api_url

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getCollections(format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getCollections(format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getCollections(format = "csv")

## getBodyPart

**Params: (collection = "", modality = "", api_url = "", format = "")**

* *Optional: api_url, format*
* Gets Body Part Examined metadata from a specified api_url
* Allows filtering by collection and modality

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getBodyPart(collection = "CPTAC-SAR", modality = "CT", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getBodyPart(collection = "CPTAC-SAR", modality = "CT", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getBodyPart(collection = "CPTAC-SAR", modality = "CT", format = "csv")

## getModality

**Params: (collection = "", bodyPart = "", api_url = "", format = "")**

* *Optional: api_url, format*
* Gets Modalities metadata from a specified api_url
* Allows filtering by collection and bodyPart

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getModality(collection = "CPTAC-SAR", bodyPart = "EXTREMITY", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getModality(collection = "CPTAC-SAR", bodyPart = "EXTREMITY", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getModality(collection = "CPTAC-SAR", bodyPart = "EXTREMITY", format = "csv")

## getPatient

**Params: (collection = "", api_url = "", format = "")**

* *Optional: api_url, format*
* Gets Patient metadata from a specified api_url
* Allows filtering by collection

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getPatient(collection = "CPTAC-SAR", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getPatient(collection = "CPTAC-SAR", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getPatient(collection = "CPTAC-SAR", format = "csv")

## getPatientByCollectionAndModality

**Params: (collection, modality, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets Patient IDs from a specified api_url
* Returns a list of patient IDs

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getPatientByCollectionAndModality(collection = "CPTAC-SAR", modality = "CT", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getPatientByCollectionAndModality(collection = "CPTAC-SAR", modality = "CT", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getPatientByCollectionAndModality(collection = "CPTAC-SAR", modality = "CT", format = "csv")

## getNewPatientsInCollection

**Params: (collection, date, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets "new" patient metadata from a specified api_url
* The date format is YYYY/MM/DD

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getNewPatientsInCollection(collection = "CPTAC-SAR", date = "2000/08/20", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getNewPatientsInCollection(collection = "CPTAC-SAR", date = "2000/08/20", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getNewPatientsInCollection(collection = "CPTAC-SAR", date = "2000/08/20", format = "csv")

## getStudy

**Params: (collection, patientId = "", studyUid = "", api_url = "", format = "")**
* *Optional: patientId, studyUid, api_url, format*
* Gets Study (visit/timepoint) metadata from a specified api_url

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getStudy(collection = "CPTAC-SAR", patientId = "", studyUid = "", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getStudy(collection = "CPTAC-SAR", patientId = "", studyUid = "", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getStudy(collection = "CPTAC-SAR", patientId = "", studyUid = "", format = "csv")

## getNewStudiesInPatient
**Params: (collection, patientId, date, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets "new" study metadata for a patient from a specified api_url
* The date format is YYYY/MM/DD

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getNewStudiesInPatient(collection = "CPTAC-SAR", patientId = "C3N-00843", date = "2010/09/06", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getNewStudiesInPatient(collection = "CPTAC-SAR", patientId = "C3N-00843", date = "2010/09/06", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getNewStudiesInPatient(collection = "CPTAC-SAR", patientId = "C3N-00843", date = "2010/09/06", format = "csv")

## getSeries

**Params: (collection = "", patientId = "", studyUid = "", seriesUid = "", modality = "", bodyPart = "",<br>
manufacturer = "", manufacturerModel = "", api_url = "", format = "")**

* *All parameters are optional.*
* Gets Series (scan) metadata from a specified api_url
* Allows filtering by collection, patient ID, study UID, series UID, modality, body part, manufacturer & model

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getSeries(collection = "CPTAC-SAR", patientId = "", studyUid = "", seriesUid = "", 
                    modality = "", bodyPart = "", manufacturer = "", manufacturerModel = "", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getSeries(collection = "CPTAC-SAR", patientId = "", studyUid = "", seriesUid = "", 
                    modality = "", bodyPart = "", manufacturer = "", manufacturerModel = "", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getSeries(collection = "CPTAC-SAR", patientId = "", studyUid = "", seriesUid = "", 
                    modality = "", bodyPart = "", manufacturer = "", manufacturerModel = "", format = "csv")

## getUpdatedSeries
**Params: (date, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets "new" series metadata from a specified api_url
* The date format is YYYY/MM/DD
* ***NOTE: Unlike other API endpoints, the underlying endpoint expects MM/DD/YYYY. This function converts from YYYY/MM/DD so that tcia-utils stays consistent.***

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getUpdatedSeries(date = "2010/09/06", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getUpdatedSeries(date = "2010/09/06", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getUpdatedSeries(date = "2010/09/06", format = "csv")
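
The date conversion mentioned in the note above can be pictured with a small standard-library sketch. The helper name `to_mmddyyyy` is hypothetical and is not part of tcia_utils; it only illustrates turning a YYYY/MM/DD string into MM/DD/YYYY, and it is not a copy of the library's internal code.

```python
from datetime import datetime


def to_mmddyyyy(date_str: str) -> str:
    """Convert 'YYYY/MM/DD' to 'MM/DD/YYYY' (hypothetical helper, not tcia_utils code).

    Raises ValueError if the input does not match YYYY/MM/DD.
    """
    parsed = datetime.strptime(date_str, "%Y/%m/%d")
    return parsed.strftime("%m/%d/%Y")


print(to_mmddyyyy("2010/09/06"))  # 09/06/2010
```

Parsing with `strptime` rather than splitting on "/" also rejects impossible dates such as 2010/13/40 before any request is sent.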

## getSeriesMetadata
**Params: (seriesUid, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets Series (scan) metadata from a specified api_url
* Output includes DOI and license details that are not in the getSeries() function

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getSeriesMetadata(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getSeriesMetadata(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getSeriesMetadata(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "csv")

## getSeriesSize
**Params: (seriesUid, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets the file count and disk size of a series/scan

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getSeriesSize(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getSeriesSize(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getSeriesSize(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "csv")

## getSopInstanceUids
**Params: (seriesUid, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets SOP Instance UIDs from a specific series/scan

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getSopInstanceUids(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getSopInstanceUids(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getSopInstanceUids(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "csv")

## getManufacturer
**Params: (collection = "", modality = "", bodyPart = "", api_url = "", format = "")**

* *All parameters are optional.*
* Gets manufacturer metadata from a specified api_url
* Allows filtering by collection, body part & modality

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getManufacturer(collection = "", modality = "", bodyPart = "", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getManufacturer(collection = "", modality = "", bodyPart = "", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getManufacturer(collection = "", modality = "", bodyPart = "", format = "csv")

## getCollectionDescriptions

**Params: (api_url = "", format = "")**

* *All parameters are optional.*
* Gets HTML-formatted descriptions of collections and their DOIs

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getCollectionDescriptions(format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getCollectionDescriptions(format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getCollectionDescriptions(format = "csv")

## getCollectionPatientCounts
**Params: (api_url = "", format = "")**

* *All parameters are optional.*
* Gets patient counts by collection from the Advanced API

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getCollectionPatientCounts(format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getCollectionPatientCounts(format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getCollectionPatientCounts(format = "csv")

## getModalityCounts

**Params: (collection = "", bodyPart = "", api_url = "", format = "")**

* *All parameters are optional.*
* Gets counts of Modality metadata from Advanced API
* Allows filtering by collection and bodyPart

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getModalityCounts(collection = "CPTAC-SAR", bodyPart = "", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getModalityCounts(collection = "CPTAC-SAR", bodyPart = "EXTREMITY", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getModalityCounts(collection = "CPTAC-SAR", bodyPart = "EXTREMITY", format = "csv")

## getBodyPartCounts

**Params: (collection = "", modality = "", api_url = "", format = "")**

* *All parameters are optional.*
* Gets counts of Body Part metadata from Advanced API
* Allows filtering by collection and modality

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getBodyPartCounts(collection = "CPTAC-SAR", modality = "CT", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getBodyPartCounts(collection = "CPTAC-SAR", modality = "CT", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getBodyPartCounts(collection = "CPTAC-SAR", modality = "CT", format = "csv")

## getManufacturerCounts

**Params: (collection = "", modality = "", bodyPart = "", api_url = "", format = "")**

* *All parameters are optional.*
* Gets counts of Manufacturer metadata from Advanced API
* Allows filtering by collection, body part and modality

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getManufacturerCounts(collection = "CPTAC-SAR",  modality = "CT", bodyPart = "EXTREMITY", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getManufacturerCounts(collection = "CPTAC-SAR",  modality = "CT", bodyPart = "EXTREMITY", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getManufacturerCounts(collection = "CPTAC-SAR",  modality = "CT", bodyPart = "EXTREMITY", format = "csv")

## getSeriesList

**Params: (list, api_url = "", csv_filename = "")**

* *Optional: api_url, csv_filename*
* Gets series metadata from the Advanced API
* Allows submission of a list of UIDs
* Returns result as dataframe and CSV

In [None]:
series_list = ["1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", "1.3.6.1.4.1.14519.5.2.1.6834.5010.215193814203822462481389051414"]
nbia.getSeriesList(list = series_list)

## getDicomTags

**Params: (seriesUid, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets DICOM tag metadata for a given series UID (scan)

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getDicomTags(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getDicomTags(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getDicomTags(seriesUid = "1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621", format = "csv")

## getDoiMetadata

**Params: (doi, output, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets a list of Collections or Series associated with a DOI
* The result includes whether the data are 3rd party analyses or not.

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getDoiMetadata(doi = "https://doi.org/10.7937/K9/TCIA.2018.PAT12TBS", output = "CPTAC-LUAD", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getDoiMetadata(doi = "https://doi.org/10.7937/K9/TCIA.2018.PAT12TBS", output = "CPTAC-LUAD", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getDoiMetadata(doi = "https://doi.org/10.7937/K9/TCIA.2018.PAT12TBS", output = "CPTAC-LUAD", format = "csv")

## getSimpleSearchWithModalityAndBodyPartPaged

**Params: (collections = [], species = [], modalities = [], bodyParts = [], manufacturers  = [], <br>
fromDate = "", toDate = "", patients = [], minStudies: int = 0, modalityAnded = False, <br>
start = 0, size = 10, sortDirection = 'ascending', sortField = 'subject', api_url = "", format = "")**

* *All parameters are optional.*
* Takes the same parameters as the SimpleSearch GUI
* Use more parameters to narrow the number of subjects received.
* Please refer to [section 7.2.2](#7.2.2) for a more in-depth tutorial.

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getSimpleSearchWithModalityAndBodyPartPaged(collections = ["CPTAC-LUAD"], modalities = ["CT"], format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getSimpleSearchWithModalityAndBodyPartPaged(collections = ["CPTAC-LUAD"], modalities = ["CT"], format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getSimpleSearchWithModalityAndBodyPartPaged(collections = ["CPTAC-LUAD"], modalities = ["CT"], format = "csv")

## getSharedCart

**Params: (name, api_url = "", format = "")**

* *Optional: api_url, format*
* Gets "Shared Cart" (scan) metadata from a specified api_url<br>
* First use https://nbia.cancerimagingarchive.net/nbia-search/ to create a cart, then add data to your basket, then click "Share" > "Share my cart".
* The "name" parameter is part of the URL that is generated. For example, https://nbia.cancerimagingarchive.net/nbia-search/?saved-cart=nbia-49121659384603347 has a cart "name" of "nbia-49121659384603347".

In [None]:
# If the format is not specified, it returns a JSON object.
nbia.getSharedCart(name = "nbia-49121659384603347", format = "")

In [None]:
# If the format is set to "df", it returns a pandas dataframe object.
nbia.getSharedCart(name = "nbia-49121659384603347", format = "df")

In [None]:
# If the format is set to "csv", it saves a csv file to the workspace.
nbia.getSharedCart(name = "nbia-49121659384603347", format = "csv")
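
If you only have the shared-cart URL, the cart name can be pulled out of its query string. The sketch below uses only the Python standard library; `cart_name_from_url` is a hypothetical helper introduced for illustration and is not a tcia_utils function.

```python
from urllib.parse import urlparse, parse_qs


def cart_name_from_url(url: str) -> str:
    """Return the value of the 'saved-cart' query parameter (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)
    values = query.get("saved-cart")
    if not values:
        raise ValueError("URL has no 'saved-cart' query parameter")
    return values[0]


url = "https://nbia.cancerimagingarchive.net/nbia-search/?saved-cart=nbia-49121659384603347"
print(cart_name_from_url(url))  # nbia-49121659384603347
```

The returned string is what you would pass as the `name` parameter of getSharedCart().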

# 4. Download Functions

Detailed usage of these functions can be found in [section 7.3](#7.3)

## downloadSeries

**Params: (series_data, number = 0, path = "", hash = "", api_url = "",<br>
input_type = "", format = "", csv_filename = "")**

* Ingests a set of seriesUids and downloads them
* By default, series_data expects JSON containing "SeriesInstanceUID" elements.
* Set number = n to download the first n series if you don't want the full dataset.
* Set hash = "y" if you'd like to retrieve MD5 hash values for each image.
* Saves to tciaDownload folder in current directory if no path is specified
* Set input_type = "list" to pass a list of Series UIDs instead of JSON.
* Set input_type = "manifest" to pass the path of a *.TCIA manifest file as series_data.
* Format can be set to "df" or "csv" to return series metadata.
* Setting a csv_filename will create the csv even if format isn't specified.
* The metadata includes info about series that have previously been downloaded.
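
To make the `series_data`, `input_type`, and `number` options more concrete, here is a minimal sketch of how JSON records or a plain list could be reduced to a list of series UIDs. The function `series_uids` is hypothetical and only mirrors the behavior described in the bullets above; it is not the implementation used by downloadSeries(), and it does not cover the "manifest" input type.

```python
def series_uids(series_data, input_type="", number=0):
    """Return series UIDs from JSON records or a plain list (hypothetical helper).

    number = 0 means "no limit"; a positive number keeps only the first n UIDs.
    """
    if input_type == "list":
        uids = list(series_data)
    else:
        # Default: JSON-style records that each contain a "SeriesInstanceUID" element.
        uids = [record["SeriesInstanceUID"] for record in series_data]
    return uids[:number] if number > 0 else uids


# Placeholder UIDs for illustration only.
records = [
    {"SeriesInstanceUID": "1.2.3"},
    {"SeriesInstanceUID": "4.5.6"},
    {"SeriesInstanceUID": "7.8.9"},
]
print(series_uids(records, number=2))  # ['1.2.3', '4.5.6']
```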

## downloadImage
**Params: (seriesUID, sopUID, path = "", api_url = "")**

* Ingests a seriesUid and a SopInstanceUid and downloads the image

# 5. Image Visualization Functions

Detailed usage of these functions can be found in [section 7.4](#7.4)

## viewSeries

**Params: (seriesUid = "", path = "")**

* Visualizes a Series (scan) you've downloaded in the notebook
* Requires EITHER a seriesUid or path parameter
* Leave seriesUid empty if you want to provide a custom path.
* The function assumes "tciaDownload/\<seriesUid\>/" as path if seriesUid is provided since this is where downloadSeries() saves data.
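
The choice between `seriesUid` and `path` can be sketched as follows. `resolve_series_path` is a hypothetical helper written for this explanation; it follows the rules listed above (a seriesUid maps to the default download folder, otherwise a custom path is used) and is not taken from the tcia_utils source.

```python
import os


def resolve_series_path(seriesUid="", path=""):
    """Pick the directory to visualize (hypothetical helper, not tcia_utils code)."""
    if seriesUid:
        # Default location where downloadSeries() saves data.
        return os.path.join("tciaDownload", seriesUid)
    if path:
        return path
    raise ValueError("Provide either seriesUid or path")


print(resolve_series_path(path="my_scans/series_01"))  # my_scans/series_01
```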

## makeVizLinks

**Params: (series_data, csv_filename="")**

* Ingests JSON output of getSeries() or getSharedCart()
* Creates URLs to visualize them in a browser
* The links appear in the last 2 columns of the dataframe.
* TCIA links display the individual series described in each row.
* IDC links display the entire study (all scans from that time point).
* IDC links may not work if IDC hasn't yet mirrored the series from TCIA.
* This function only works with fully public datasets (no limited-access data).
* Optionally accepts a csv_filename parameter if you'd like to export a CSV file.

# 6. Other Functions

## setApiUrl

**Params: (endpoint, api_url)**

* Checks for valid security tokens where needed
* Because it is called by other functions to select base URL, ***<font color='red'>please do NOT use this function</font>***.

## getToken
**Params: (user = "", pw = "", api_url = "")**

* Retrieves security token to access APIs that require authorization
* Provides interactive prompts for user/pw if they're not specified as parameters
* Use getToken() for querying restricted collections with the "Search API"
* Use getToken(api_url = "nlst") for "Advanced API" queries of the National Lung Screening Trial
* Sets expiration time for tokens (2 hours from creation)

In [None]:
nbia.getToken()
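
Because tokens expire 2 hours after creation, long-running scripts may want to track when a new token is needed. The sketch below is one way to do that with the standard library, assuming you record the creation time yourself; `token_expiry` and `is_expired` are hypothetical helpers and not part of tcia_utils.

```python
from datetime import datetime, timedelta

# Token lifetime stated above: 2 hours from creation.
TOKEN_LIFETIME = timedelta(hours=2)


def token_expiry(created_at: datetime) -> datetime:
    """Return the moment a token created at created_at stops being valid."""
    return created_at + TOKEN_LIFETIME


def is_expired(created_at: datetime, now: datetime) -> bool:
    """True once the 2-hour lifetime has elapsed."""
    return now >= token_expiry(created_at)


created = datetime(2024, 1, 1, 12, 0, 0)
print(token_expiry(created))  # 2024-01-01 14:00:00
```

A script could call `is_expired(created, datetime.now())` before each batch of requests and call getToken() again when it returns True.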

## makeCredentialFile
**Params: (user = "", pw = "")**

* Create a credential file to use with NBIA Data Retriever
* Provides interactive prompts for user/pw if they're not specified as parameters
* Documentation at https://wiki.cancerimagingarchive.net/x/2QKPBQ

## manifestToList

**Params: (manifest)**

* Ingests a TCIA manifest file and removes header
* Returns a list of series UIDs
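
As a rough illustration of what "removes header" means, the sketch below parses manifest text held in a string. It assumes the header ends with a line reading `ListOfSeriesToDownload=` and that every non-empty line after it is a series UID; that layout is an assumption about the manifest format, the sample header is abbreviated, and `manifest_text_to_list` is a hypothetical helper rather than the library's code.

```python
def manifest_text_to_list(text: str) -> list:
    """Return series UIDs listed after the header marker (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines()]
    marker = "ListOfSeriesToDownload="  # assumed last line of the header
    start = lines.index(marker) + 1     # raises ValueError if the marker is missing
    return [line for line in lines[start:] if line]


# Abbreviated sample manifest; header contents are assumed for illustration.
sample_manifest = """manifestVersion=3.0
ListOfSeriesToDownload=
1.3.6.1.4.1.14519.5.2.1.3320.3273.106936860187940539374736870621
1.3.6.1.4.1.14519.5.2.1.6834.5010.215193814203822462481389051414
"""

for uid in manifest_text_to_list(sample_manifest):
    print(uid)
```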

## makeSeriesReport

**Params: (series_data, input_type = "", format = "", filename = None, api_url = "")**

* Ingests JSON output from any function that returns series-level data and creates summary report
* Specify input_type = "manifest" to ingest a *.TCIA manifest file or "list" for a python list of UIDs.
* If input_type = "manifest" or "list" and there are series UIDs that are restricted, you must call getToken() with a user ID that has access to all UIDs before calling this function.
* Specifying api_url is only necessary if you are using input_type = "manifest" or "list" with NLST data (e.g. api_url = "nlst").
* Specify format = "var" to return the report values as a dictionary.
* Example of accessing a value after saving the function output to report_data: subjects = report_data["subjects"].
* Specify format = "file" to save the report to a file.
* Specify a filename parameter to set a filename if you don't want the default.
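
To give a feel for what a series-level summary involves, here is a small sketch that counts subjects and modalities from JSON-style records. It assumes each record has "PatientID" and "Modality" fields, uses made-up sample values, and returns illustrative keys; the actual contents of the makeSeriesReport() dictionary may differ.

```python
from collections import Counter


def summarize_series(series_data):
    """Build a tiny summary dictionary (hypothetical; not makeSeriesReport's output)."""
    return {
        "subjects": len({s["PatientID"] for s in series_data}),
        "series": len(series_data),
        "modalities": dict(Counter(s["Modality"] for s in series_data)),
    }


# Made-up records for illustration only.
sample = [
    {"PatientID": "P-001", "Modality": "CT"},
    {"PatientID": "P-001", "Modality": "MR"},
    {"PatientID": "P-002", "Modality": "CT"},
]
report = summarize_series(sample)
print(report["subjects"])  # 2
```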

# 7. Examples

## 7.1 Preparation

### Import tcia_utils

The following cells import [**tcia_utils**](https://github.com/kirbyju/tcia_utils), which contains a variety of useful functions for accessing TCIA via Jupyter/Python. We'll step through many of its functions in the following section.

By default, most functions from tcia_utils return results in JSON.  However, you can use **format = "df"** to return the results as a dataframe, or **format = "csv"** to save a CSV file in addition to returning a dataframe.

Nearly all functions allow you to specify **api_url** as a query parameter.  This allows you to specify if you'd like to access restricted collections or the [National Lung Screening Trial (NLST)](https://doi.org/10.7937/TCIA.HMQ8-J677) collection, which lives on a separate server due to its size (>26,000 patients!).  We'll provide examples to show how this works later in the notebook.

In [None]:
!pip install --upgrade -q tcia-utils
!pip install --upgrade -q pandas
!pip install --upgrade -q requests

In [None]:
## imports
import requests
import pandas as pd
from tcia_utils import nbia

### Set logging level to INFO in Google Colab (optional)
This step should be skipped unless you're running on **Google Colab** as its logging root handler only shows warnings and errors by default.  If you'd like to see INFO statements you can run the following code.  This is particularly helpful when running some of the download examples so you can see the progress as downloads complete.

In [None]:
import logging

# Check current handlers
#print(logging.root.handlers)

# Remove all handlers associated with the root logger object.
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)
#print(logging.root.handlers)

# Set handler with level = info
logging.basicConfig(format='%(asctime)s:%(levelname)s:%(message)s', 
                    level=logging.INFO)

print("Logging set to INFO")

<a id = "7.2"></a>
## 7.2 Query Data

The majority of data-querying functions work in the same manner, so let's use a couple of them as examples to showcase their usage.<br>
Let's first check what collections are available in the database with **getCollections()**.

In [None]:
nbia.getCollections()

### 7.2.1 getPatient()
The **getPatient()** function returns available patient information (e.g. species, gender, and ethnicity). You can also learn whether the subject is a [phantom](https://www.nist.gov/physics/what-are-imaging-phantoms) or not.  The only query parameter for this function is **collection**.

Let's try looking at the **CPTAC-LUAD** collection this time.  We'll also set the output format to a dataframe to make it easier to view.

In [None]:
df = nbia.getPatient(collection = "CPTAC-LUAD", format = "df")
display(df)

Here's an example that does the same thing with the [National Lung Screening Trial (NLST) Collection](https://doi.org/10.7937/TCIA.HMQ8-J677).  In this case we have to set **api_url = "nlst"** to talk to the NLST server, but everything else works the same. 

In [None]:
df = nbia.getPatient(collection = "NLST", format = "df", api_url = "nlst")
display(df)

In [None]:
df = nbia.getStudy(collection = "CPTAC-LUAD", format = "df")
display(df)

<a id = "7.2.2"></a>
### 7.2.2 getSimpleSearchWithModalityAndBodyPartPaged()
The **getSimpleSearchWithModalityAndBodyPartPaged()** function retrieves a list of collections based on the search criteria entered.<br>
Available filters are: 
>1. collections: list[str]   -- The DICOM collections of interest to you
>2. species: list[str]       -- Filter collections by species. Possible values are 'human', 'mouse', and 'dog'
>3. modalities: list[str]    -- Filter collections by modality
>4. modalityAnded: bool      -- If true, only return subjects with all requested modalities, as opposed to any
>5. minStudies: int          -- The minimum number of studies a collection must have to be included in the results
>6. manufacturers: list[str] -- Imaging device manufacturers, e.g. SIEMENS
>7. bodyParts: list[str]     -- Body parts of interest, e.g. CHEST, ABDOMEN
>8. fromDate: str            -- First cutoff date, in YYYY/MM/DD format. Defaults to 1900/01/01
>9. toDate: str              -- Second cutoff date, in YYYY/MM/DD format. Defaults to today's date
>10. patients: list[str]      -- Patients to include in the output
>11. start: int               -- Start of returned series page. Defaults to 0.
>12. size: int                -- Size of returned series page. Defaults to 10.

Available sorting options are:
>1. sortDirection            -- 'ascending' or 'descending'. Defaults to 'ascending'.
>2. sortField                -- 'subject', 'studies', 'series', or 'collection'. Defaults to 'subject'.

In [None]:
myJSON = nbia.getSimpleSearchWithModalityAndBodyPartPaged(collections = ["CPTAC-LUAD"], modalities = ["CT"], format = "")
pd.DataFrame(myJSON["resultSet"])
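
The difference between "all" and "any" matching for **modalityAnded** can be illustrated locally with sets. This sketch does not call the API; `matches_modalities` is a hypothetical helper that only demonstrates the semantics described in the filter list above.

```python
def matches_modalities(subject_modalities, requested, modalityAnded=False):
    """Illustrate modalityAnded: True requires all requested modalities, False any of them."""
    have, want = set(subject_modalities), set(requested)
    if modalityAnded:
        return want <= have      # every requested modality is present
    return bool(want & have)     # at least one requested modality is present


subject = ["CT", "PT"]
print(matches_modalities(subject, ["CT", "MR"]))                      # True
print(matches_modalities(subject, ["CT", "MR"], modalityAnded=True))  # False
```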

<a id = "7.3"></a>
## 7.3 Download Data

**tcia_utils** contains a **downloadSeries()** function that has multiple options for specifying the seriesUids you'd like to download.  By default, the function expects JSON data containing "SeriesInstanceUID" elements, which can be generated using **getSeries()** or **getSharedCart()**.  However, if you have a series UID list from some other source, you can set **input_type = "list"** to pass a Python list of one or more series UIDs instead of JSON. You can also set **input_type = "manifest"** and provide a path to a **.TCIA** manifest file.

Data are saved to a **tciaDownload** folder in your current working directory by default, but you can use the **path** parameter to change this to a different directory.

There is an optional **format** parameter that can be used to return metadata about what was downloaded.  It can be set to **df** to return a dataframe or **csv** to save a spreadsheet. There's also a **csv_filename** parameter if you want to set a specific file name.

You can specify **number = n** to tell the function to only download the first **n** scans of your seriesUids.  Remove this parameter in the examples below to download the full dataset. 

The **api_url** parameter can be omitted in most cases.  However, it must be set to **api_url = "nlst"** to access the [National Lung Screening Trial (NLST)](https://doi.org/10.7937/TCIA.HMQ8-J677) collection and you must use **api_url = "restricted"** for datasets that require logging in.  

Last but not least, there is some logic built in to detect whether you've already downloaded a series.  If a directory named after the seriesUid already exists the function will assume it's already been downloaded and skip that series.
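
The skip-if-already-downloaded check described above can be sketched with a directory test. `series_to_download` is a hypothetical helper written for illustration, not the logic inside downloadSeries(); the demo runs in a temporary directory with placeholder UIDs so it does not touch your real tciaDownload folder.

```python
import os
import tempfile


def series_to_download(uids, path="tciaDownload"):
    """Return the UIDs that have no directory named after them under path."""
    return [uid for uid in uids if not os.path.isdir(os.path.join(path, uid))]


# Demo in a throwaway directory: pretend series "1.2.3" was already downloaded.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "1.2.3"))
    pending = series_to_download(["1.2.3", "4.5.6"], path=root)

print(pending)  # ['4.5.6']
```

Note that a check like this only looks for the directory, so a partially downloaded series would also be skipped; deleting the folder forces a fresh download.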

In this section we'll cover downloading data via the REST API for the following use cases:

1.   Download a full TCIA collection
2.   Download custom results of an API query
3.   Download a "[Shared Cart](https://wiki.cancerimagingarchive.net/display/NBIA/TCIA+Radiology+Portal+User+Guide#TCIARadiologyPortalUserGuide-SharingDatainYourCart)" that was created via https://nbia.cancerimagingarchive.net/
4.   Download data from a TCIA manifest file
5.   Download data from a **restricted** collection that requires creating an API token
6.   Download images

### 7.3.1 Download a full collection

You can [Browse Collections](https://www.cancerimagingarchive.net/collections) on our website to figure out what you might want to download, but you can also get a list of available collections via the API as shown below.

In [None]:
# get list of available collections as JSON
nbia.getCollections()

Let's say that we're interested in downloading the entire **Soft-tissue-Sarcoma** collection.  First we need to get a list of all Series Instance UIDs in that collection.  We can use **nbia.getSeries()** to return JSON metadata about all series (scans) in this collection.

In [None]:
data = nbia.getSeries(collection = "Soft-tissue-Sarcoma")
print(data)

We can then pass that to our download function and view/save the metadata for what was downloaded.  We'll leverage the **number** parameter here to just grab the first 3 scans as a test.  You can remove this parameter if you want to download the full collection.

In [None]:
nbia.downloadSeries(data, number = 3)

You can learn more about various ways to visualize and analyze your data in the other notebooks at https://github.com/kirbyju/TCIA_Notebooks.  However, let's use **nbia.viewSeries()** to get a quick look at one of the series we've downloaded.  You can change the Series UID in the code below to view other scans you've downloaded.

In [None]:
seriesUid = "1.3.6.1.4.1.14519.5.2.1.5168.1900.104193299251798317056218297018"
nbia.viewSeries(seriesUid)

### 7.3.2 Part A Download custom API query
The REST API allows for a variety of different query options as demonstrated in [this notebook](https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_REST_API_Queries.ipynb).  For this use case, let's assume that you are only interested in the MR scans from the [TCGA-BRCA](https://doi.org/10.7937/K9/TCIA.2016.AB2NAZRP) collection that were acquired on Siemens scanners.

In [None]:
# getSeries with query parameters
data = nbia.getSeries(collection = "TCGA-BRCA", 
               modality = "MR", 
               manufacturer = "SIEMENS")

print(len(data), 'Series returned')

Once again, let's pass those Series Instance UIDs to our download function.  This time we'll also set **format = "df"** to return a dataframe about what we downloaded.

In [None]:
# pass the getSeries() results to downloadSeries()
df = nbia.downloadSeries(data, number = 3, format = "df")
display(df)

Another common use case may be that you'd like to review the results from **getSeries()** and do some further filtering to your dataframe.  Then, after removing unwanted scans, pass the remaining Series Instance UIDs to **downloadSeries()**.  For example, let's say that you only wanted to download **T2 MR** series instead of any MR.

In [None]:
# getSeries with query parameters
df = nbia.getSeries(collection = "TCGA-BRCA", 
               modality = "MR", 
               manufacturer = "SIEMENS", format = "df")

Let's drop scans that don't contain **t2** in either the **ProtocolName** or **SeriesDescription** fields.  Note that we're down to 21 scans now instead of 353.

In [None]:
# case-insensitive match on 't2'; na=False treats missing values as non-matches
filtered_df = df[(df['ProtocolName'].str.contains('t2', case=False, na=False)) | 
                 (df['SeriesDescription'].str.contains('t2', case=False, na=False))]

display(filtered_df)

Now we can feed the Series Instance UIDs from this dataframe back to **downloadSeries()**.  Note that since we're not working with the API's default JSON output anymore we need to extract the Series UIDs from the dataframe into a **list** object and specify **input_type = "list"** when we call **downloadSeries()**.

In [None]:
# extract the SeriesInstanceUID column as a list variable
series_uids = filtered_df['SeriesInstanceUID'].tolist()

# download the selected series_uids
nbia.downloadSeries(series_uids, input_type = "list", number = 3)
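
If you'd rather not go through pandas, a similar filter can be applied to the JSON output directly. The sketch below assumes the JSON is a list of dictionaries carrying the **ProtocolName**, **SeriesDescription** and **SeriesInstanceUID** keys used above, and that some of those keys may be missing or empty. The helper name `filter_series_uids` and the sample records are made up for illustration.

```python
def filter_series_uids(series, keyword, fields=("ProtocolName", "SeriesDescription")):
    """Return the SeriesInstanceUID of every record whose listed fields
    contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    uids = []
    for record in series:
        # Treat missing or null fields as empty strings
        text = " ".join(str(record.get(f) or "") for f in fields).lower()
        if keyword in text:
            uids.append(record["SeriesInstanceUID"])
    return uids

# Made-up records that mimic the shape of the getSeries() output
sample_series = [
    {"SeriesInstanceUID": "1.1", "ProtocolName": "AX T2 FSE", "SeriesDescription": None},
    {"SeriesInstanceUID": "1.2", "ProtocolName": "T1 post", "SeriesDescription": "Sag T1"},
    {"SeriesInstanceUID": "1.3", "SeriesDescription": "t2_tse_tra"},
]
t2_uids = filter_series_uids(sample_series, "t2")
print(t2_uids)
```

The resulting list can be handed to **downloadSeries()** with **input_type = "list"**, exactly like the list extracted from the dataframe.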

### 7.3.2 Part B Download custom NLST API query
Let's show a similar example where we look for a specific modality and manufacturer within the [National Lung Screening Trial (NLST) Collection](https://doi.org/10.7937/TCIA.HMQ8-J677).  We have to set **api_url = "nlst"** in our functions for this to work, but otherwise the steps are the same.

In [None]:
# getSeries with query parameters
data = nbia.getSeries(collection = "NLST", 
               modality = "CT", 
               manufacturer = "Philips",
               api_url = "nlst")

print(len(data), 'Series returned')

In [None]:
# pass the getSeries() results to downloadSeries()
df = nbia.downloadSeries(data, number = 3, api_url = "nlst", format = "df")
display(df)

### 7.3.3 Download a "shared cart"
It's possible to use https://nbia.cancerimagingarchive.net to create a "[Shared Cart](https://wiki.cancerimagingarchive.net/display/NBIA/TCIA+Radiology+Portal+User+Guide#TCIARadiologyPortalUserGuide-SharingDatainYourCart)" which includes a specific set of scans you'd like to share with others. After creating a Shared Cart you receive a URL like https://nbia.cancerimagingarchive.net/nbia-search/?saved-cart=nbia-49121659384603347 which can be shared with others.  Try clicking the link to see what this looks like on the TCIA website.  Then use the code below to see how you can use the cart name at the end of the URL to download the related scans via the API.

In [None]:
# getSharedCart metadata
data = nbia.getSharedCart(name = "nbia-49121659384603347")
print(len(data), 'Series returned')
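
If you receive the cart as a full URL, you don't need to copy the name out by hand. The sketch below pulls it from the `saved-cart` query parameter using only the Python standard library. The helper name `cart_name_from_url` is hypothetical, and the sketch assumes shared cart links keep the format shown above.

```python
from urllib.parse import urlparse, parse_qs

def cart_name_from_url(url):
    """Return the value of the `saved-cart` query parameter, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get("saved-cart")
    return values[0] if values else None

cart_url = "https://nbia.cancerimagingarchive.net/nbia-search/?saved-cart=nbia-49121659384603347"
cart_name = cart_name_from_url(cart_url)
print(cart_name)
```

The returned string is what you would pass as the **name** argument of **getSharedCart()**.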

We'll skip the use of the **number** parameter this time since the full cart is only 4 series.  Let's also try **format = "csv"** to save a spreadsheet of the metadata in addition to returning a dataframe.

In [None]:
# pass the getSharedCart() results to downloadSeries()
df = nbia.downloadSeries(data, format = "csv")
display(df)

### 7.3.4 Download data from a TCIA manifest file

When working with manifest files you can install the NBIA Data Retriever to open the manifest and download the data as shown in [this notebook](https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Linux_Data_Retriever_App.ipynb).  However, there may be cases where you don't have administrative rights to install software or prefer using the REST API to download a manifest.  

In order to demonstrate this use case, let's assume that after you [Browse Collections](https://www.cancerimagingarchive.net/collections) you are interested in the [RIDER Breast MRI](https://doi.org/10.7937/K9/TCIA.2015.H1SXNUXL) collection.  We can find the URL of the manifest to download the full collection by looking at the blue "Download" button on that page.  Then we can download the manifest with the following commands.  

In [None]:
import requests

# download manifest file from RIDER Breast MRI page
manifest = requests.get("https://wiki.cancerimagingarchive.net/download/attachments/22512757/doiJNLP-Fo0H1NtD.tcia?version=1&modificationDate=1534787017928&api=v2")
with open('RIDER_Breast_MRI.tcia', 'wb') as f:
    f.write(manifest.content)

If you open this manifest file in a text editor you'll notice that it contains several lines of download parameters that precede a list of Series Instance UIDs to download.  If we set **input_type = "manifest"** we can provide the path/filename to **downloadSeries()** and it will extract the UIDs from the file and download them.

In [None]:
df = nbia.downloadSeries("RIDER_Breast_MRI.tcia", input_type = "manifest", number = 3, format = "df")
display(df)
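
To make the manifest layout more concrete, here is a rough sketch of how the UIDs could be pulled out by hand. It is not the parser used by **downloadSeries()**. It assumes the UID list starts on the line after one beginning with `ListOfSeriesToDownload=`, with one UID per line; the helper name `read_manifest_uids` and the sample content are made up for illustration.

```python
def read_manifest_uids(lines):
    """Collect the series UIDs that follow the ListOfSeriesToDownload= line."""
    uids = []
    in_list = False
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if in_list:
            uids.append(line)
        elif line.startswith("ListOfSeriesToDownload="):
            in_list = True  # everything after this marker is a UID
    return uids

# Made-up manifest content; the parameter lines are placeholders
sample_manifest = """\
downloadServerUrl=https://example.org/download
manifestVersion=3.0
ListOfSeriesToDownload=
1.2.3.4
1.2.3.5
"""
manifest_uids = read_manifest_uids(sample_manifest.splitlines())
print(manifest_uids)
```

With a real file you could pass the open file object instead, for example `read_manifest_uids(open("RIDER_Breast_MRI.tcia"))`, and then use the list with **input_type = "list"**.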

### 7.3.5 Download data from a restricted collection
In some cases, you must specifically request access to collections before you can download them.  These are listed as **limited access** on the [Browse Collections](https://www.cancerimagingarchive.net/collections/) page. The steps to request access may vary depending on the collection, but will always require that you first [create a TCIA user account](https://wiki.cancerimagingarchive.net/x/xgHDAg). Once you've created an account, you can use your login/password to create an API token with the **getToken()** function from **tcia_utils** to verify your permissions. **<font color='red'>Tokens are valid for 2 hours and must be refreshed after that point.</font>**

In [None]:
nbia.getToken()
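
Because tokens expire after 2 hours, long-running jobs may want to track when the token was created. The following sketch is one way to do that with the standard library. The helper name `token_needs_refresh` and the 5-minute safety margin are assumptions for illustration; **tcia_utils** may already handle this for you.

```python
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=2)  # lifetime stated in the note above

def token_needs_refresh(issued_at, now=None, margin=timedelta(minutes=5)):
    """Hypothetical helper: True once the token is within `margin` of expiring."""
    now = now or datetime.now()
    return now >= issued_at + TOKEN_LIFETIME - margin
```

A typical pattern would be to record `issued_at = datetime.now()` right after calling **getToken()** and to call **getToken()** again whenever `token_needs_refresh(issued_at)` returns `True`.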

Let's say that we're interested in the [QIN-Breast-02](https://doi.org/10.7937/TCIA.2019.4cfm06rr) collection. As you can see on the collection page, you must email help@cancerimagingarchive.net to request access to the data. Once you've received approval, you can use **nbia.getSeries()** to get a full list of series UIDs in this restricted collection by including **api_url = "restricted"** as a parameter.

In [None]:
# getSeries with query parameters
data = nbia.getSeries(collection = "QIN-Breast-02", 
                      api_url = "restricted")

print(len(data), 'Series returned')

Now we can download those scans.  Don't forget to include **api_url = "restricted"** in the download functions as well!

In [None]:
# pass the getSeries() results to downloadSeries()
df = nbia.downloadSeries(data, number = 3, api_url = "restricted", format = "df")
display(df)

### 7.3.6 Download Images

You can [Browse Collections](https://www.cancerimagingarchive.net/collections) on our website to figure out what you might want to download, but you can also get a list of available collections via the API as shown below.

In [None]:
# get list of available collections as JSON
nbia.getCollections()

Let's say that we're interested in downloading the images associated with the **CMB-GEC** collection.  First we need to get a list of all Series Instance UIDs in that collection.  We can use **nbia.getSeries()** to return JSON metadata about all series (scans) in this collection.

In [None]:
nbia.getSeries(collection = "CMB-GEC")

We can then pass one of those Series Instance UIDs to the **nbia.getSopInstanceUids()** function to list the SOP Instance UIDs in that series and find the instance we want to download.

In [None]:
nbia.getSopInstanceUids(seriesUid = "1.3.6.1.4.1.14519.5.2.1.1600.1204.919741553251398079475267746505")

After we have both the series UID and the SOP instance UID, we can call the **nbia.downloadImage()** function to download all images associated with the instance. Note that when the **path** parameter is not provided, a **tciaDownload** folder will be created automatically.

In [None]:
nbia.downloadImage(seriesUID = "1.3.6.1.4.1.14519.5.2.1.1600.1204.919741553251398079475267746505", 
              sopUID = "1.3.6.1.4.1.14519.5.2.1.1600.1204.211684247543622814130853101548")

<a id = "7.4"></a>
## 7.4 Analyze Data

### 7.4.1 makeSeriesReport()

This function ingests the JSON output from **getSeries()** or **getSharedCart()** and creates a summary report.  Let's try it using the Shared Cart results that we looked at earlier.

In [None]:
data = nbia.getSharedCart(name = "nbia-49121659384603347")

nbia.makeSeriesReport(data)
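
To give a feel for the kind of tally such a report is built from, the sketch below counts series per field value with `collections.Counter`. It is not the **makeSeriesReport()** implementation. It assumes the JSON is a list of dictionaries with keys such as **Modality**; the helper name `count_by_field` and the sample records are made up for illustration.

```python
from collections import Counter

def count_by_field(series, field):
    """Count how many series share each value of `field`."""
    return Counter(record.get(field, "unknown") for record in series)

# Made-up records that mimic the shape of the series JSON
example_cart = [
    {"SeriesInstanceUID": "1.1", "Modality": "MR"},
    {"SeriesInstanceUID": "1.2", "Modality": "CT"},
    {"SeriesInstanceUID": "1.3", "Modality": "MR"},
    {"SeriesInstanceUID": "1.4"},
]
modality_counts = count_by_field(example_cart, "Modality")
print(modality_counts.most_common())
```

The same helper can be pointed at any other field present in your results to get a quick breakdown before downloading.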

### 7.4.2 makeVizLinks()
This function ingests JSON output from **getSeries()** or **getSharedCart()**  and creates URLs to visualize them in a browser.  The links appear in the last 2 columns of the dataframe.  

The TCIA column displays the individual series described in each row.  The [Imaging Data Commons (IDC)](https://portal.imaging.datacommons.cancer.gov/) column displays the entire study (all series/scans from that time point).  The function accepts a **csv_filename** parameter if you'd like to save a CSV file of the output.  It just returns the dataframe if this is omitted.

There are a few caveats worth noting about this function:
* Modalities such as SEG/RTSTRUCT will not load using the TCIA series viewer, but opening the entire study with the IDC viewer generally enables you to see RTSTRUCT/SEG annotations overlaid on top of the images they were derived from.
* IDC links may not work if they haven't mirrored the series from TCIA yet. Here is the [list of the collections](https://portal.imaging.datacommons.cancer.gov/collections/) they currently host.
* The visualization URLs only work if the series/study you selected is from a fully public dataset. Visualization of limited-access collections is not currently supported.

In [None]:
# use getSeries() to identify some scans of interest
data = nbia.getSeries(collection = "CPTAC-LUAD", modality = "CT")

# create a dataframe and CSV file of visualization links
nbia.makeVizLinks(data, csv_filename="viz_links")

### 7.4.3 Querying "Limited Access" Collections (optional)
In some cases, you must specifically request access to collections before you can download them.  These are listed as **limited access** on the [Browse Collections](https://www.cancerimagingarchive.net/collections/) page. 

The steps to request access may vary depending on the collection, but will always require that you first [create a TCIA user account](https://wiki.cancerimagingarchive.net/x/xgHDAg). Once you've created an account and have access to restricted collections you can use your login/password to create an API token with the **getToken()** function from **tcia_utils** to verify your permissions. **<font color='red'>Tokens are valid for 2 hours and must be refreshed after that point.</font>**

In [None]:
nbia.getToken()

Let's say that we're interested in the [QIN-Breast-02](https://doi.org/10.7937/TCIA.2019.4cfm06rr) collection. As you can see on the collection page, you must email help@cancerimagingarchive.net to request access to the data. Once you've received approval, you can use **nbia.getSeries()** to get a full list of series UIDs in this restricted collection by including **api_url = "restricted"** as a parameter.

In [None]:
# getSeries with query parameters
df = nbia.getSeries(collection = "QIN-Breast-02", 
                      format = "df",
                      api_url = "restricted")
display(df)

**Note:** If you'd like to do further exploration of restricted datasets, you can modify any of the previously discussed queries in the notebook by adding the **api_url = "restricted"** parameter as shown above.

# Acknowledgements
TCIA is funded by the [Cancer Imaging Program (CIP)](https://imaging.cancer.gov/), a part of the United States [National Cancer Institute (NCI)](https://www.cancer.gov/).  It is managed by the [Frederick National Laboratory for Cancer Research (FNLCR)](https://frederick.cancer.gov/) and hosted by the [University of Arkansas for Medical Sciences (UAMS)](https://www.uams.edu/).

This notebook was created by [Justin Kirby](https://www.linkedin.com/in/justinkirby82/) and [Adam Li](https://www.linkedin.com/in/adam-l-713885121). If you leverage this notebook or any TCIA datasets in your work, please be sure to comply with the [TCIA Data Usage Policy](https://wiki.cancerimagingarchive.net/x/c4hF). In particular, make sure to cite the DOI(s) for the specific TCIA datasets you used in addition to the following paper!

# TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7