<a href="https://colab.research.google.com/github/ImagingDataCommons/IDC-Examples/blob/master/API/notebooks/How_to_use_IDC_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Use the IDC APIs

## Overview of this notebook
This notebook is designed as a quick introduction to the IDC APIs and how to access them with Python.

Topics covered:
* Overviews of APIs, Swagger, JSON, endpoints
* Use cases for IDC APIs
* Examples of IDC API endpoints

### Overview of APIs
An API or application-programming interface is a software intermediary that allows two applications to talk to each other. In other words, an API is the messenger that delivers your request to the provider that you’re requesting it from and then delivers the response back to you [(Wikipedia)](https://en.wikipedia.org/wiki/Application_programming_interface). Each action that an API can take is called an "endpoint".

Some useful tutorials and quick start guides on APIs are:
* [GDC's Getting Started guide for APIs](https://docs.gdc.cancer.gov/API/Users_Guide/Getting_Started/)
* [API Integration in Python](https://realpython.com/api-integration-in-python/)
* [Python API Tutorial: Getting Started with APIs](https://www.dataquest.io/blog/python-api-tutorial/)

### What is an HTTP Message?

Clients and the IDC API server communicate via HTTP messages. An overview of HTTP messaging can be found [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages).

### What is JSON?

JSON, or JavaScript Object Notation, is a lightweight data-interchange format that is easy for humans and machines to work with. More information can be found at [json.org](https://www.json.org/).
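
As a small illustration, Python's built-in `json` module converts between JSON text and Python dictionaries and lists. The sample text below is made up for this sketch and is not an actual IDC API response.

```python
import json

# A made-up JSON document (an assumption for illustration, not IDC data)
sample_text = '{"name": "example", "tags": ["a", "b"], "active": true, "count": null}'

# Parse JSON text into Python objects: objects become dicts, arrays become
# lists, true/false become True/False, and null becomes None
sample = json.loads(sample_text)
print(sample['tags'][0])
print(sample['active'], sample['count'])

# Convert the Python objects back into indented JSON text
print(json.dumps(sample, sort_keys=True, indent=4))
```

The same `json.dumps` call is used later in this notebook to pretty-print API responses.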

### What is SwaggerUI?

[SwaggerUI](https://swagger.io/tools/swagger-ui/) is a user interface that allows users to try out the APIs and view their documentation easily. You can access the IDC API SwaggerUI [here](https://api-dot-idc-dev.appspot.com/v1/swagger).

### What is an endpoint?

An endpoint is the *call* for a specific functionality of an API. For example, `/collections` at the end of the API request URL `https://api-dot-idc.appspot.com/v1/collections` is an endpoint that returns (or GETs) information about the available collections.

### IDC API Documentation

Detailed documentation on the IDC API can be found in the [API](https://learn.canceridc.dev/api/getting-started) section of the [IDC User Guide](https://learn.canceridc.dev/).

### IDC API URL Preamble
We define the IDC API URL preamble as a variable so that we can easily change it.

In [1]:
idc_api_preamble = 'https://api-dot-idc-dev.appspot.com/v1'

### Python library `requests`

In this notebook, we use the Python Requests HTTP library to access IDC API endpoints.

In [2]:
# Install requests if needed
# pip install requests

# Import the requests library
import requests

## Use cases for IDC APIs

The IDC APIs can be used for a number of different tasks that involve interacting with the Google Cloud Platform and BigQuery. They can be used to subset data into cohorts or to access cohorts that have been created using the IDC WebApp. They can also be used to obtain the location of the DICOM objects associated with a cohort.

## Example: GET `/about` Endpoint



We are first going to explore the `/about` endpoint by sending a GET request to the API. This endpoint returns general information about the IDC API, such as links to the SwaggerUI interface and to the IDC User Guide.

In [3]:
# First submit the 'get' request to the API
about_req = requests.get('{}/about'.format(idc_api_preamble))

Now that we have the response, we are going to check whether the request was successful. If it was, the status code will be 200; if something went wrong, the status code may be something like 404 or 503. If you have received an error code, you can check out Google's Troubleshooting response errors guide.
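
To make a numeric status code easier to read, the standard library's `http.HTTPStatus` enumeration can supply its conventional phrase. The following is a minimal sketch; `describe_status` is a hypothetical helper name introduced here and is not part of the IDC API or of `requests`.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return the status code together with its standard reason phrase."""
    try:
        status = HTTPStatus(status_code)
    except ValueError:
        # The number is not a status code known to the standard library
        return '{} (unrecognized status code)'.format(status_code)
    return '{} {}'.format(status.value, status.phrase)


# The three codes mentioned in the text above
for code in (200, 404, 503):
    print(describe_status(code))
```

With a live response, the helper could be called as `describe_status(about_req.status_code)`.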



In [4]:
# Check that there wasn't an error with the request
if about_req.status_code != 200:
  # Print the error code if something went wrong
  print(about_req.status_code)

Finally, we will print out the information that we have received from the API. This response is returned as a dictionary, though responses can also be a combination of dictionaries and lists depending on which endpoint is called. This means that you can access data in the response the same way that you would access dictionaries and lists, as demonstrated below.

In [5]:
# Print the full response
print("Full response:\n")
print(about_req.json(), end='\n\n')

# Print the message portion of the response
print("Message:\n")
print(about_req.json()['message'], end='\n\n')

# Print the documentation portion of the response
print("Documentation:\n")
print(about_req.json()['documentation'])

Full response:

{'code': 200, 'documentation': 'SwaggerUI interface available at <https://dev-api.canceridc.dev/v1/swagger/>. Documentation is available at <https://learn.canceridc.dev/>', 'message': 'Welcome to the NCI IDC API, Version 1'}

Message:

Welcome to the NCI IDC API, Version 1

Documentation:

SwaggerUI interface available at <https://dev-api.canceridc.dev/v1/swagger/>. Documentation is available at <https://learn.canceridc.dev/>
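
When a field may be absent from a response, `dict.get()` avoids a `KeyError`. The sketch below works on a copy of the `/about` response printed above, so it runs without calling the API; the `nested` dictionary is a made-up structure used only to show chained indexing.

```python
# A copy of the /about response printed above
about_response = {
    'code': 200,
    'documentation': 'SwaggerUI interface available at '
                     '<https://dev-api.canceridc.dev/v1/swagger/>. '
                     'Documentation is available at <https://learn.canceridc.dev/>',
    'message': 'Welcome to the NCI IDC API, Version 1'
}

# Square brackets raise a KeyError when the key is missing
print(about_response['message'])

# .get() returns a default value instead, which is handy for optional fields
print(about_response.get('next_page', 'no next_page in this response'))

# Nested lists and dictionaries are accessed by chaining indexes and keys
# (made-up structure for illustration)
nested = {'versions': [{'idc_data_version': '1.0'}, {'idc_data_version': '2.0'}]}
print(nested['versions'][1]['idc_data_version'])
```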


The requests library makes it easy to use the APIs! Next we will cover a few of the other informational APIs.

## Example: GET `/versions` endpoint

Over time, the set of data hosted by the IDC will change. For the most part, such changes will be due to new data having been added. The totality of IDC hosted data resulting from any such change is represented by a unique IDC data version ID. That is, each time that the set of publicly available data changes, a new IDC version is created that exactly defines the revised data set. 

The IDC data version is intended to enable the reproducibility of research results. For example, consider a patient in the DICOM data model. Over time, new studies might be performed on a patient and become associated with that patient, and the corresponding DICOM instances added to the IDC hosted data. Moreover, additional patients might well be added to the IDC data set over time. This means that the set of subjects defined by some filter set will change over time. Thus, for purposes of reproducibility, we define a cohort in terms of a filter set and an IDC data version.

The `/versions` endpoint returns information about the defined IDC versions. This information includes the data sources (BQ tables) containing the data of each version, as well as the set of programs (sets of collections) belonging to a version. This endpoint returns a more complicated JSON object which has a combination of lists and dictionaries. We will first send the request and then check whether the response contains an error code.

In [6]:
# Retrieve the response from the API endpoint
versions_req = requests.get('{}/versions'.format(idc_api_preamble))

# Check that there wasn't an error with the request
if versions_req.status_code != 200:
  # Print the error code and message if something went wrong
  print(versions_req.json())

We are going to use the `json` library in order to view the response more easily.

In [7]:
# Import the json library (part of the Python standard library)
import json

In [8]:
# Create a variable with the JSON output
versions_json = json.dumps(versions_req.json(), sort_keys=True, indent=4)

# Print the versions JSON text
print(versions_json)

{
    "code": 200,
    "versions": [
        {
            "active": false,
            "data_sources": [
                {
                    "data_type": "Clinical, Biospecimen, and Mutation Data",
                    "name": "isb-cgc.TCGA_bioclin_v0.Biospecimen"
                },
                {
                    "data_type": "Clinical, Biospecimen, and Mutation Data",
                    "name": "isb-cgc.TCGA_bioclin_v0.clinical_v1"
                },
                {
                    "data_type": "Image Data",
                    "name": "idc-dev-etl.idc_v1.dicom_pivot_v1"
                }
            ],
            "date_active": "2020-10-06",
            "idc_data_version": "1.0"
        },
        {
            "active": true,
            "data_sources": [
                {
                    "data_type": "Clinical, Biospecimen, and Mutation Data",
                    "name": "isb-cgc.TCGA_bioclin_v0.Biospecimen"
                },
                {
                

The returned data is a combination of dictionaries and lists. The output above is truncated, but we can see that IDC version "1.0" was activated on 2020-10-06 and is no longer active, and that a second, active version follows it. Next we will iterate over the JSON object to neatly list the data sources in each version.

In [12]:
# Print out the number of IDC data versions
print('Number of IDC data versions: {}'.format(len(versions_req.json()['versions'])))

#...and for each version, print out the version's status, and the data sources in the version.
for version in versions_req.json()['versions']:
  print('version {} is {}'.format(version['idc_data_version'], 'active' if version['active'] else 'inactive'))
  for data_source in version['data_sources']:
    print('\tData source: {}, Data type: {}'.format(data_source['name'],data_source['data_type']))


Number of IDC data versions: 2
version 1.0 is inactive
	Data source: isb-cgc.TCGA_bioclin_v0.Biospecimen, Data type: Clinical, Biospecimen, and Mutation Data
	Data source: isb-cgc.TCGA_bioclin_v0.clinical_v1, Data type: Clinical, Biospecimen, and Mutation Data
	Data source: idc-dev-etl.idc_v1.dicom_pivot_v1, Data type: Image Data
version 2.0 is active
	Data source: isb-cgc.TCGA_bioclin_v0.Biospecimen, Data type: Clinical, Biospecimen, and Mutation Data
	Data source: isb-cgc.TCGA_bioclin_v0.clinical_v1, Data type: Clinical, Biospecimen, and Mutation Data
	Data source: idc-dev-etl.idc_v2.dicom_pivot_v2, Data type: Image Data
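
Since cohorts are defined against an IDC data version, it can be useful to pick out the active version programmatically. The sketch below uses a trimmed copy of the `versions` list shown in the output above; `active_versions` is a hypothetical helper name, not an API function.

```python
def active_versions(versions):
    """Return the idc_data_version of every version flagged as active."""
    return [v['idc_data_version'] for v in versions if v.get('active')]


# A trimmed copy of the 'versions' list from the output above
versions = [
    {'idc_data_version': '1.0', 'active': False, 'date_active': '2020-10-06'},
    {'idc_data_version': '2.0', 'active': True},
]

print(active_versions(versions))
```

With a live response, the same helper could be applied to `versions_req.json()['versions']`.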


## Example: GET `/collections` endpoint

A *collection* is a set of DICOM data provided by a single source. Collections are further categorized as Original collections or Analysis collections. 

Original collections are comprised primarily of DICOM image data that was obtained from some set of patients. Typically, the patients in an Original collection are related by a common disease.

Analysis collections are comprised of derived DICOM data that was generated by analyzing other (typically Original) collections. Typically such analysis is performed by a different entity than that which provided the original collection(s) on which the analysis is based. Examples of data in analysis collections include segmentations, annotations and further processing of original images. Note that some Original collections include such data, though most of the data in Original collections are original images.

### Programs

The programs that we listed above are sets of original collections. The collections in a program are produced by a single source. Some programs provide additional non-imaging data. For example, the TCGA program provides extensive ancillary clinical and genomics data about each of the patients in the program. 

The `/collections` endpoint returns data about collections in a specified program for some IDC data version. If no program is specified, it returns data about the collections in all programs for that IDC data version. If a version is not specified, then `/collections` defaults to the current IDC data version.


We will request the collection data for IDC data version 1.0. Because the call below does not specify a program, data about the collections in all programs is requested.

The version is passed as a *query parameter* in a *query string*. The requests library accepts a dictionary of query parameters.
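
To see what a query string looks like, the sketch below builds one by hand with the standard library's `urllib.parse.urlencode`. This is only an approximation of what `requests` does with the `params` dictionary, shown here so the resulting URL can be inspected without calling the API.

```python
from urllib.parse import urlencode

# Same preamble as defined earlier in this notebook
idc_api_preamble = 'https://api-dot-idc-dev.appspot.com/v1'

# Query parameters as a dictionary
query_string = dict(idc_data_version="1.0")

# The query string follows a '?' at the end of the endpoint URL
url = '{}/collections?{}'.format(idc_api_preamble, urlencode(query_string))
print(url)
# -> https://api-dot-idc-dev.appspot.com/v1/collections?idc_data_version=1.0
```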

In [15]:
query_string = dict(
    idc_data_version = "1.0"
    )

collections_req = requests.get('{}/collections'.format(idc_api_preamble),
                    params=query_string)
# Check that there wasn't an error with the request
if collections_req.status_code != 200:
  # Print the error code and message if something went wrong
  print(collections_req.json())

In [16]:
# Create a variable with the JSON output
collections_json = json.dumps(collections_req.json(), sort_keys=True, indent=4)

# Print the collections JSON text
print(collections_json)

{
    "code": 200,
    "collections": [
        {
            "active": true,
            "cancer_type": "Prostate Cancer",
            "collection_id": "tcga_prad",
            "date_updated": "2021-03-30",
            "description": "<div>\n\t<strong>Note:&nbsp;This collection has special restrictions on its usage. See <a href=\"https://wiki.cancerimagingarchive.net/x/c4hF\" target=\"_blank\">Data Usage Policies and Restrictions</a>.</strong></p>\n<div>\n\t&nbsp;</p>\n<div>\n\t<span>The <a href=\"http://imaging.cancer.gov/\" target=\"_blank\"><u>Cancer Imaging Program (CIP)</u></a></span><span>&thinsp;</span><span> is working directly with primary investigators from institutes participating in TCGA to obtain and load images relating to the genomic, clinical, and pathological data being stored within the <a href=\"http://tcga-data.nci.nih.gov/\" target=\"_blank\">TCGA Data Portal</a>.&nbsp;Currently this image collection of prostate adenocarcinoma (PRAD) patients can be matched by eac

Metadata for original collections has been obtained from the [TCIA Data Collections page](https://www.cancerimagingarchive.net/collections/), and for analysis collections from the [TCIA Analysis Results page](https://www.cancerimagingarchive.net/tcia-analysis-results/). Note that the `idc_data_versions` component of each collection is a list because any particular collection will almost certainly be available in multiple IDC data versions.
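
Because `idc_data_versions` is a list, a membership test is enough to find the collections available in a given version. The sketch below is a guess at how that might look: it assumes the list holds version strings such as "1.0", and the sample records are hypothetical (only the `tcga_prad` ID comes from the output above).

```python
def collections_in_version(collections, version):
    """Return the IDs of collections whose idc_data_versions list contains version."""
    return [c['collection_id'] for c in collections
            if version in c.get('idc_data_versions', [])]


# Hypothetical records; the version lists are assumptions for illustration
sample_collections = [
    {'collection_id': 'tcga_prad', 'idc_data_versions': ['1.0', '2.0']},
    {'collection_id': 'example_collection', 'idc_data_versions': ['2.0']},
]

print(collections_in_version(sample_collections, '1.0'))
print(collections_in_version(sample_collections, '2.0'))
```

With a live response, the helper could be applied to `collections_req.json()['collections']`.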

## Example: POST `/cohorts/preview/manifest` endpoint

The POST `/cohorts/preview/manifest` endpoint returns a manifest of *access_methods* for the objects in a preview cohort. Please refer to the [API](https://learn.canceridc.dev/api/getting-started) section of the [IDC User Guide](https://learn.canceridc.dev/) for further information on manifests.

The `/cohorts/preview/manifest` API does not actually create a cohort, but acts as if a cohort were created. Creating a cohort requires authenticating to the API; that process is addressed in a subsequent example describing the `/cohorts/{cohort_id}/manifest` API.

A manifest is a list of *access methods*. Each access method describes how to access the study, series and/or instance DICOM objects in the cohort. A manifest can optionally include additional metadata per DICOM object. 

The objects in the preview cohort are defined by a *filters* object that is implicitly applied against the current (active) IDC data version.

A *filters* object is a list of *attribute*, *values* pairs, where *values* is a list of one or more values, at least one of which must be satisfied by the associated attribute.
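
The sketch below illustrates how we read these semantics: a record matches when, for every attribute in the filter, its value is one of the listed values (OR within an attribute, AND across attributes). This is a local illustration only, with hypothetical sample records; it is not how the IDC server implements filtering, and the server may normalize values (for example, collection IDs) before matching.

```python
def matches_filters(record, filters):
    """Return True if every filtered attribute of record has an allowed value."""
    return all(record.get(attribute) in values
               for attribute, values in filters.items())


filters = {
    "collection_id": ["TCGA-LUAD", "TCGA-KIRC"],
    "Modality": ["CT", "MR"],
    "race": ["ASIAN"]
}

# Hypothetical records for illustration
record_a = {'collection_id': 'TCGA-KIRC', 'Modality': 'CT', 'race': 'ASIAN'}
record_b = {'collection_id': 'TCGA-KIRC', 'Modality': 'PT', 'race': 'ASIAN'}

print(matches_filters(record_a, filters))  # True: every attribute matches
print(matches_filters(record_b, filters))  # False: Modality PT is not listed
```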

In the following, we construct a dict, `cohortSpec`, containing the name and a description for the preview cohort, as well as a *filters* object that selects subjects who are in either the TCGA-LUAD or TCGA-KIRC collection, have DICOM data with a CT or MR modality, and whose race is recorded as Asian.



In [31]:
filters = {
  "collection_id": [ "TCGA-LUAD", "TCGA-KIRC" ],
  "Modality": ["CT", "MR"],
  "race": ["ASIAN"]
}

cohortSpec = {"name": "testcohort",
              "description": "Test description",
              "filters": filters}



The query string selects additional metadata to be returned. In addition, the amount of data returned by each call can be limited; when this is done, the API can be called iteratively until all data has been received. The `params` object below selects a limited set of data to be returned; refer to the API documentation for details. In this example, we will limit the returned data to 2 rows.

In [50]:
params = dict(
    sql = False,
    Collection_ID = True,
    Patient_ID = True,
    CRDC_Instance_GUID = True,
    GCS_URL = True,
    page_size = 2
)

We are now ready to call the endpoint. Note that `/cohorts/preview/manifest` is a POST method, so we call `requests.post()`.

In [51]:
response = requests.post('{}/cohorts/preview/manifest'.format(idc_api_preamble),
                    params=params, json=cohortSpec)

# Check that there wasn't an error with the request
if response.status_code != 200:
  # Print the error code and message if something went wrong
  print(response.json())

We will prettyprint the results for easier comprehension:

In [52]:
print(json.dumps(response.json(), sort_keys=True, indent=4))

{
    "code": 200,
    "cohort": {
        "description": "Test description",
        "filterSet": {
            "filters": {
                "Modality": [
                    "CT",
                    "MR"
                ],
                "collection_id": [
                    "tcga_luad",
                    "tcga_kirc"
                ],
                "race": [
                    "ASIAN"
                ]
            },
            "idc_data_version": "2.0"
        },
        "name": "testcohort",
        "sql": ""
    },
    "manifest": {
        "json_manifest": [
            {
                "CRDC_Instance_GUID": "dg.4DFC/000c8565-76f2-4bc8-9a34-33dd3d3924b3",
                "Collection_ID": "tcga_kirc",
                "GCS_URL": "gs://idc_dev/000c8565-76f2-4bc8-9a34-33dd3d3924b3.dcm",
                "Patient_ID": "TCGA-B0-4821"
            },
            {
                "CRDC_Instance_GUID": "dg.4DFC/002b13fe-8f12-415a-a5e8-401eedba2909",
                "Collection_I

The returned data includes the cohort name, description and filterset which we passed as parameters.  

We can see that there are 1581 total rows, but only 2 were returned.

We could also have requested that the BigQuery SQL that produced these results be returned.


## Paged results

It can be seen that the above result includes a non-null `next_page` token. This token can be passed as a parameter in a subsequent invocation of /cohorts/preview/manifest to obtain additional data.

When we pass a next_page value, all other parameters except page_size are ignored.
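
The paging pattern can be wrapped in a loop that keeps requesting pages until no `next_page` token is returned. The following is a minimal sketch under some assumptions: `collect_manifest_rows` and `fetch_page` are hypothetical names, the final page is assumed to carry a null or missing `next_page`, and a stand-in function replaces the real API call so the example runs offline.

```python
def collect_manifest_rows(fetch_page, first_params, max_pages=100):
    """Call fetch_page(params) repeatedly and gather all manifest rows.

    fetch_page must return the parsed JSON body (a dict) of one response.
    max_pages is a safety limit so the loop cannot run forever.
    """
    rows = []
    params = dict(first_params)
    for _ in range(max_pages):
        body = fetch_page(params)
        rows.extend(body.get('manifest', {}).get('json_manifest', []))
        next_page = body.get('next_page')
        if not next_page:
            break
        # Later calls only need the token and, optionally, the page size
        params = {'next_page': next_page}
        if 'page_size' in first_params:
            params['page_size'] = first_params['page_size']
    return rows


def fake_fetch_page(params):
    """Stand-in for the API: serves two made-up pages keyed by token."""
    pages = {
        None: {'manifest': {'json_manifest': [{'row': 1}, {'row': 2}]},
               'next_page': 'token-a'},
        'token-a': {'manifest': {'json_manifest': [{'row': 3}]},
                    'next_page': None},
    }
    return pages[params.get('next_page')]


rows = collect_manifest_rows(fake_fetch_page, {'page_size': 2})
print(len(rows))  # 3
```

Against the live API, `fetch_page` could be a small function that calls `requests.post('{}/cohorts/preview/manifest'.format(idc_api_preamble), params=params, json=cohortSpec)` and returns `.json()` from the result.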


In [53]:
query_string = dict(
    page_size = 10,
    next_page = response.json()['next_page']
)
response = requests.post('{}/cohorts/preview/manifest'.format(idc_api_preamble),
                    params=query_string, json=cohortSpec)

# Check that there wasn't an error with the request
if response.status_code != 200:
  # Print the error code and message if something went wrong
  print(response.json())

print(json.dumps(response.json(), sort_keys=True, indent=4))



{
    "code": 200,
    "cohort": {},
    "manifest": {
        "json_manifest": [
            {
                "CRDC_Instance_GUID": "dg.4DFC/00606631-64e6-4d1c-95dd-2373648aafb6",
                "Collection_ID": "tcga_kirc",
                "GCS_URL": "gs://idc_dev/00606631-64e6-4d1c-95dd-2373648aafb6.dcm",
                "Patient_ID": "TCGA-B0-4821"
            },
            {
                "CRDC_Instance_GUID": "dg.4DFC/0077877c-6172-488b-8754-bfa7df7589fd",
                "Collection_ID": "tcga_kirc",
                "GCS_URL": "gs://idc_dev/0077877c-6172-488b-8754-bfa7df7589fd.dcm",
                "Patient_ID": "TCGA-B0-4821"
            },
            {
                "CRDC_Instance_GUID": "dg.4DFC/00be9515-8b58-45c0-91ee-4c5022be2c00",
                "Collection_ID": "tcga_kirc",
                "GCS_URL": "gs://idc_dev/00be9515-8b58-45c0-91ee-4c5022be2c00.dcm",
                "Patient_ID": "TCGA-CJ-4899"
            },
            {
                "CRDC_Instance_GUI

## Notes on Authorization and Credentials
In the next section we will focus on the GET `/cohorts/{cohort_id}/manifest` endpoint. Unlike the `/cohorts/preview/manifest` endpoint, the `/cohorts/{cohort_id}/manifest` endpoint returns a manifest for a cohort that was previously defined via the IDC POST `/cohorts` API or through the IDC web app.

In order to create a cohort or access a previously defined cohort from the API, the user must authenticate their identity with the web app. This section will step through the authentication/authorization process. Perform the following steps on your local machine:

1. Create a Credential File on your local machine by using the [idc_auth.py](https://github.com/ImagingDataCommons/IDC-API/tree/master/scripts/idc_auth.py) script from the [IDC-API Repository](https://github.com/ImagingDataCommons/IDC-API.git)
  * This script can be run from the command line or from within Python but should be run on your local machine.
2. Find the location of the Credential File on your local machine
  * By default, the script will save the file in the user's home folder with the file name: ".idc_credentials"

We can now proceed to load the credential file into the cloud environment you are using:

In [None]:
# If you skipped earlier sections, you will need these two packages to run the
# code below
# Install requests if needed
#pip install requests

# Import json
#import json

# Import the requests library
#import requests

In [54]:
# import os
import os
# Import files helper for Colab
from google.colab import files

In [55]:
# First delete any existing .idc_credentials. If we do not do this, files.upload() will name the new file ".idc_credentials (1)"
try:
  os.remove('./.idc_credentials')
except FileNotFoundError:
  print('.idc_credentials not found')

# Upload your credentials to the cloud environment
uploaded = files.upload()

.idc_credentials not found


Saving .idc_credentials to .idc_credentials


Now that we have the Credentials file created and uploaded to the cloud environment, we can extract the ID token that identifies you for the purpose of authorization.

In [56]:
# Open the credentials file and parse its JSON content
with open(".idc_credentials", "r") as credentials_file:
  token = json.load(credentials_file)
# Get the ID token from the credentials
creds = token['token_response']['id_token']
# Create the Authorization header for requests
head = {'Authorization': 'Bearer ' + creds}

**Note:** the credentials file will expire after 1 hour and a new one will need to be generated. If a new file is not generated with the idc_auth script, you can delete the original file and try running the script again.

## Example: GET `/cohorts/{cohort_id}/manifest` Endpoint
We can now proceed to use the POST `/cohorts` endpoint to create a cohort. We will use the same cohort definition as previously.

In [61]:
filters = {
  "collection_id": [ "TCGA-LUAD", "TCGA-KIRC" ],
  "Modality": ["CT", "MR"],
  "race": ["ASIAN"]
}
  
cohortSpec = {"name": "testcohort",
              "description": "Test description",
              "filters": filters}

response = requests.post(f'{idc_api_preamble}/cohorts/',
                      json=cohortSpec, headers = head)

# Check that there wasn't an error with the request
if response.status_code != 200:
    # Print the error code and message if something went wrong
    print(response.json())

print(json.dumps(response.json(), sort_keys=True, indent=1))

cohort_id = response.json()['cohort_properties']['cohort_id']

{
 "code": 200,
 "cohort_properties": {
  "cohort_id": 146,
  "description": "Test description",
  "filterSet": {
   "filters": {
    "Modality": [
     "CT",
     "MR"
    ],
    "collection_id": [
     "TCGA-LUAD",
     "TCGA-KIRC"
    ],
    "race": [
     "ASIAN"
    ]
   },
   "idc_data_version": "2.0"
  },
  "name": "testcohort"
 }
}


Note that the response includes the cohort_id of the newly created cohort. We will use this ID when querying for the cohort's manifest. Note also that the response repeats the filter and other cohort metadata.

This time we will return only series GUIDs.

In [65]:
query_string = dict(
    CRDC_Series_GUID = True,
    page_size = 10
)

response = requests.get('{}/cohorts/{}/manifest'.format(idc_api_preamble, cohort_id),
                      params=query_string, headers = head)

# Check that there wasn't an error with the request
if response.status_code != 200:
    # Print the error code and message if something went wrong
    print(response.json())

print(json.dumps(response.json(), sort_keys=True, indent=4))


{
    "code": 200,
    "cohort": {
        "cohort_id": 146,
        "description": "Test description",
        "filterSet": {
            "filters": {
                "Modality": [
                    "CT",
                    "MR"
                ],
                "collection_id": [
                    "TCGA-LUAD",
                    "TCGA-KIRC"
                ],
                "race": [
                    "ASIAN"
                ]
            },
            "idc_data_version": "2.0"
        },
        "name": "testcohort",
        "sql": ""
    },
    "manifest": {
        "json_manifest": [
            {
                "CRDC_Series_GUID": "dg.4DFC/099a1844-dc1c-44bd-91f8-e6f1860cbfce"
            },
            {
                "CRDC_Series_GUID": "dg.4DFC/0cda1d25-1c66-4e45-bf4a-e77718debed8"
            },
            {
                "CRDC_Series_GUID": "dg.4DFC/32305644-df68-4667-a22d-867725b63670"
            },
            {
                "CRDC_Series_GUID": "dg.4DF

As with POST `/cohorts/preview/manifest`, the returned next_page token can be used in subsequent calls to obtain additional data.

A `guid` can be resolved to a GA4GH DrsObject. Please refer to the [API](https://learn.canceridc.dev/api/getting-started) section of the [IDC User Guide](https://learn.canceridc.dev/) for further information on DrsObjects and access methods.

## Example: POST `/cohorts/preview/query` endpoint
The POST /cohorts/preview/query API returns selected metadata for the elements in a specified cohort. Like the POST /cohorts/preview/manifest API, the `/cohorts/preview/query` API does not actually create a cohort, but acts as if a cohort were created.

The values that can be queried are the Original and Derived values that define a filter. Currently these are:
Modality, BodyPartExamined, StudyDescription, StudyInstanceUID, PatientID, SeriesInstanceUID, SOPInstanceUID, SeriesDescription, SliceThickness, SeriesNumber, StudyDate, SOPClassUID, collection_id, AnatomicRegionSequence, SegmentedPropertyCategoryCodeSequence, SegmentedPropertyTypeCodeSequence, FrameOfReferenceUID, SegmentNumber, SegmentAlgorithmType, SUVbw, Volume, Diameter, Surface_area_of, Total_Lesion, Standardized_Added_Metabolic, Percent_Within_First_Quarter_of_Intensity_Range, Percent_Within_Third_Quarter_of_Intensity_Range, Percent_Within_Fourth_Quarter_of_Intensity_Range, Percent_Within_Second_Quarter_of_Intensity_Range, Standardized_Added_Metabolic_Activity, Glycolysis_Within_First_Quarter_of_Intensity_Range, Glycolysis_Within_Third_Quarter_of_Intensity_Range, Glycolysis_Within_Fourth_Quarter_of_Intensity_Range, Glycolysis_Within_Second_Quarter_of_Intensity_Range, Internal, Sphericity, Calcification, Lobular, Spiculation, Margin, Texture, Subtlety, Malignancy 
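
Field names must match this list exactly, so it can help to check a request for misspelled names before sending it. The sketch below uses only a subset of the names listed above; `unknown_fields` is a hypothetical helper and not part of the IDC API.

```python
# A subset of the queryable field names listed above
QUERYABLE_FIELDS = {
    'Modality', 'BodyPartExamined', 'PatientID', 'StudyInstanceUID',
    'SeriesInstanceUID', 'SOPInstanceUID', 'collection_id', 'Spiculation',
}


def unknown_fields(requested, allowed=QUERYABLE_FIELDS):
    """Return the requested field names that are not in the allowed set."""
    return [field for field in requested if field not in allowed]


# 'SOPInstanceID' is a deliberate misspelling of 'SOPInstanceUID'
print(unknown_fields(['PatientID', 'SOPInstanceID', 'Modality', 'Spiculation']))
# -> ['SOPInstanceID']
```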



In the following, the filter defines a cohort of LIDC_IDRI cases that have spiculation values of 4 or 5 out of 5. The queried values include the PatientID, SOPInstanceUID, Modality and Spiculation.

In [69]:
filters = {
  "collection_id": [
    "LIDC_IDRI"
   ],
   "Spiculation": [
     "4 Out of 5",
     "5 out of 5 (Marked spiculation)"
    ]
}

cohort_def = {"name": "testcohort",
              "description": "Test description",
              "filters": filters}

queryFields = {
    "fields": [
      "PatientID","SOPInstanceUID","Modality","Spiculation"
    ]
  }
queryPreviewBody = {"cohort_def": cohort_def,
                    "queryFields": queryFields}

query_string = {
    'sql': False,
    'page_size': 10
}

response = requests.post(f'{idc_api_preamble}/cohorts/preview/query',
                      json=queryPreviewBody, 
                      params=query_string)
# Check that there wasn't an error with the request
if response.status_code != 200:
    # Print the error code and message if something went wrong
    print(response.json())

print(json.dumps(response.json(), sort_keys=True, indent=4))

{'code': 500, 'message': 'Encountered an error while attempting to get metadata.'}
{
    "code": 500,
    "message": "Encountered an error while attempting to get metadata."
}


Like the manifest APIs, there is also a GET `/cohorts/{cohort_id}/query` form of this API that returns additional data about an existing cohort. Also like the manifest APIs, the query APIs are paged.

## Example: GET `/dicomMetadata` endpoint

For completeness, this last section reviews the GET /dicomMetadata endpoint. This endpoint returns a fixed selection of DICOM metadata for all instances in IDC DICOM data. It is intended for use by other CRDC resources that might need such information for the purpose of aggregating imaging data with other data types.

Because it returns metadata on all DICOM instances, paging must be used to obtain the complete set of results.

In [71]:
query_string = dict(
    page_size = 10
)

response = requests.get('{}/dicomMetadata'.format(idc_api_preamble),
                      params=query_string, headers = head)

# Check that there wasn't an error with the request
if response.status_code != 200:
    # Print the error code and message if something went wrong
    print(response.json())

print(json.dumps(response.json(), sort_keys=True, indent=4))

{
    "code": 200,
    "next_page": "gAAAAABgvSBCkGcWUId1Qu6zsTe08jeIPzUZDYd46Yuw9e5m4LYtHEtRc7b5-4RJO6CKm1W9W_AuXDIJeE1EtEFQj0REHi6SNUq-6g9GZrCJk4U496cQgk7dAH5omjhREALz5mjuUWfV6VgfgQS5ku5T3liQEb-pfxg5yg6oNhvY8OoS2H7OOH9ObwMK1daA0vrXkB81QgYnCkQlwgTDMHd2XlQ7ygGO_3ip22teN9xpLTGYTUiY-RjfF2BqKJqfIjOmi6yG844abb34Dii6otjESdUKJvHfigbzhkyZf2a0BFBhTP3DGEWkZ_8YD6tT9t4yIS7wc74G",
    "query_results": {
        "json": [
            {
                "AdditionalPatientHistory": null,
                "Allergies": [],
                "BodyPartExamined": "BREAST",
                "EthnicGroup": null,
                "ImageType": [
                    "DERIVED",
                    "PRIMARY",
                    "DCE",
                    "SER"
                ],
                "LastMenstrualDate": null,
                "MedicalAlerts": [],
                "Modality": "MR",
                "Occupation": null,
                "PatientAge": "039Y",
                "PatientComments": null,
             