<a href="https://colab.research.google.com/github/kirbyju/TCIA_Notebooks/blob/main/TCIA_Series_UID_Report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summary

This notebook generates reports about TCIA data from a list of Series Instance UIDs. The UIDs can come from a TCIA manifest file or a plain text file (one Series UID per row). Two report options are available:

1. A report containing the Collection Name, Subject ID, Study UID, Study Description, Study Date, Series UID, Series Description, Number of Images, File Size (Bytes), Modality, and Manufacturer for each series.
2. A report that contains those columns plus Data Description URI (DOI), SOP Class UID, License Name, and License URL. This report takes significantly longer to run.

# 1 Import tcia_utils

The following cells download and import [**tcia_utils**](https://github.com/kirbyju/TCIA_Notebooks/raw/main/tcia_utils.py), which contains a variety of useful functions for accessing TCIA via Jupyter/Python.

In [None]:
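# imports
import requests
import pandas as pd

# download tcia_utils and save it next to this notebook so it can be imported
response = requests.get("https://github.com/kirbyju/TCIA_Notebooks/raw/main/tcia_utils.py")
response.raise_for_status()  # stop here if the download failed instead of saving an error page
with open('tcia_utils.py', 'wb') as f:
    f.write(response.content)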

In [None]:
import tcia_utils as tcia

# 2 Create a Token
To ensure you can obtain information about every Series UID in your list, including series that are not publicly accessible, enter your TCIA username and password to create a token.

In [None]:
tcia.getToken()

# 3 Import Series UIDs 

If the file containing your Series UIDs is already saved on the machine where this notebook is running, you can skip this step. Otherwise:

1. To import a file to Colab from your hard drive, upload it using the Files panel in the left sidebar.
2. To import a file from the web (e.g., from TCIA), replace the placeholder URL in the next cell with the URL of the file you want to analyze, then run the cell.

In [None]:
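# OPTIONAL: import your UID file from the web
url = "https://URL_on_TCIA/manifest.tcia"  # placeholder: replace with your file's URL
local_filename = "manifest.tcia"

response = requests.get(url)
response.raise_for_status()  # fail loudly on a bad URL rather than saving an error page
with open(local_filename, 'wb') as f:
    f.write(response.content)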

Next, read the UIDs from your file into a Python list. If you're using a manifest file, the code below puts the Series UIDs into the list and ignores the download parameters at the top of the file.

If you're using a text or CSV file, every row is inserted into the list, so verify the file is formatted correctly **(one UID per row with no column header or commas)** or you may encounter errors.

In [None]:
# path/filename of your manifest or UID text file
manifest = "manifest.tcia"

# convert the file to a Python list of Series UIDs
uids = tcia.manifestToList(manifest)
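
`tcia.manifestToList` handles the parsing described above. As a rough illustration only, and not the tcia_utils implementation, the hypothetical helpers below show one way to pre-check a UID file: skip blank lines and `key=value` parameter lines (assuming manifest parameters take that form), then test each remaining row against the DICOM UID syntax (digits and periods, no leading zeros in a component, at most 64 characters). A column header or a comma-separated row then fails with a clear message before any API calls are made.

```python
import re

# DICOM UID syntax: numeric components separated by periods,
# no leading zeros in a component (except "0" itself), at most 64 characters.
UID_PATTERN = re.compile(r"^(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))+$")


def is_valid_uid(text):
    """Return True if text looks like a syntactically valid DICOM UID."""
    return len(text) <= 64 and UID_PATTERN.match(text) is not None


def read_uid_lines(lines):
    """Hypothetical helper: keep UID rows, skip blanks and 'key=value' parameter lines.

    Raises ValueError for any other row, e.g. a column header or a comma-separated line.
    """
    uids = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or "=" in line:
            # blank line or assumed manifest parameter such as 'manifestVersion=3.0'
            continue
        if not is_valid_uid(line):
            raise ValueError(f"Row {number} is not a Series UID: {line!r}")
        uids.append(line)
    return uids


# made-up sample lines; a real file could be passed as an open file object instead
sample = ["manifestVersion=3.0", "ListOfSeriesToDownload=", "1.2.3.4.5.6789", ""]
print(read_uid_lines(sample))  # ['1.2.3.4.5.6789']
```

With a file on disk, `with open("manifest.tcia") as f: read_uid_lines(f)` would apply the same checks line by line.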

# 4 Create the Report

## 4.1 Create a Report of Series Metadata (Option 1)

This option creates **scan_metadata.csv** containing the Collection Name, Subject ID, Study UID, Study Description, Study Date, Series UID, Series Description, Number of Images, File Size (Bytes), Modality, and Manufacturer for each series.

_**Note: This report runs faster than Option 2 and is sufficient if you don't need the Data Description URI (DOI), SOP Class UID, License Name, and License URL.**_

In [None]:
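df = tcia.getSeriesList(uids)
# save the report as described above (harmless if tcia_utils has already written this file)
df.to_csv('scan_metadata.csv')
display(df)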
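
Section 2 notes that a token is needed to obtain information about every Series UID in your list. One way to confirm nothing was skipped is to compare the input list with the UIDs that appear in the report. The sketch below uses a hypothetical helper, `find_missing_uids`, and assumes the report stores UIDs in a column named `Series UID` (matching the column list above); check `df.columns` and pass a different `column` if yours differs.

```python
import pandas as pd


def find_missing_uids(requested_uids, report, column="Series UID"):
    """Return requested UIDs that have no row in the report, in input order.

    'column' is assumed to hold the Series Instance UID; adjust it if your
    report labels that column differently.
    """
    if report is None or report.empty:
        # nothing came back, so every requested UID is missing
        return list(requested_uids)
    returned = set(report[column].astype(str))
    return [uid for uid in requested_uids if uid not in returned]


# made-up example: the report came back with only one of two requested series
report = pd.DataFrame({"Series UID": ["1.2.3.4.5"], "Modality": ["CT"]})
print(find_missing_uids(["1.2.3.4.5", "1.2.3.4.6"], report))  # ['1.2.3.4.6']
```

In this notebook the call would be `find_missing_uids(uids, df)` after either report; an empty list means every requested series appears in the report.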

## 4.2 Create a Report of Series Metadata with DOIs and Licenses (Option 2)

This option creates **scan_metadata_with_DOIs_Licenses.csv**. It makes one API request per series, so it takes more time to complete, especially with large manifests, but it includes columns that are not available in Option 1: Data Description URI (DOI), SOP Class UID, License Name, and License URL.

In [None]:
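# collect one DataFrame per series and concatenate once at the end;
# calling pd.concat inside the loop would re-copy the growing table on every pass
frames = []
total = len(uids)

for count, seriesUid in enumerate(uids, start=1):
    # "restricted" uses the token-based API set up by tcia.getToken()
    metadata = tcia.getSeriesMetadata(seriesUid, api_url = "restricted")
    frames.append(pd.DataFrame(metadata))
    print('Completed', count, 'out of', total, 'series.')

df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
df.to_csv('scan_metadata_with_DOIs_Licenses.csv')
display(df)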
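
Option 2 is slow mainly because the loop above waits for one API response per series before sending the next. A common remedy is to overlap those requests with a small thread pool. The sketch below is not part of tcia_utils: `fetch_all_metadata` is a hypothetical helper that accepts any per-UID fetch function returning a list of row dictionaries (or `None`), which is the shape the loop above passes to `pd.DataFrame`. It assumes the underlying calls are safe to run from several threads; treat that as an assumption, and keep `max_workers` small to avoid overloading the TCIA servers.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def fetch_all_metadata(uids, fetch, max_workers=4):
    """Call fetch(uid) for every UID using a small thread pool; return one DataFrame.

    'fetch' is any function returning a list of row dicts (or None) for one UID.
    Executor.map yields results in input order, so rows keep the order of 'uids'.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, uids))
    # skip UIDs that returned nothing, then build the table in a single concat
    frames = [pd.DataFrame(rows) for rows in results if rows]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# made-up stand-in for a per-series API call, so the sketch runs offline
def fake_fetch(uid):
    return [{"Series UID": uid, "License Name": "example"}]


print(fetch_all_metadata(["1.2.3.4.5", "1.2.3.4.6"], fake_fetch))
```

In the notebook, the call would look like `df = fetch_all_metadata(uids, lambda uid: tcia.getSeriesMetadata(uid, api_url = "restricted"))`, followed by the same `to_csv` call as above.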

# Acknowledgements
TCIA is funded by the [Cancer Imaging Program (CIP)](https://imaging.cancer.gov/), a part of the United States [National Cancer Institute (NCI)](https://www.cancer.gov/), and is managed by the [Frederick National Laboratory for Cancer Research (FNLCR)](https://frederick.cancer.gov/).

This notebook was created by [Justin Kirby](https://www.linkedin.com/in/justinkirby82/).  If you leverage this notebook or any TCIA datasets in your work, please be sure to comply with the [TCIA Data Usage Policy](https://wiki.cancerimagingarchive.net/x/c4hF). In particular, make sure to cite the DOI(s) for the specific TCIA datasets you used in addition to the following paper!

## TCIA Citation

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., & Prior, F. (2013). The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7