# TCGA API - Download Files-Cases Tables

---

*rbarreiro, May 2021*

Script to download a table of TCGA files, together with their associated case data and metadata, from the GDC API.

## Loading Libraries

In [1]:
import requests
import json

## Fields for the output table

In [2]:
fields = [
    "file_id",
    "file_name",
    "file_size",
    "md5sum",
    "state",
    "data_format",
    "cases.project.project_id",
    "cases.project.program.name",
    "cases.samples.sample_type",
    "experimental_strategy",
    "cases.submitter_id"
    ]

fields = ",".join(fields)

## Filtering samples 

(e.g. only BAM files, only solid tissue normal samples, only RNA-Seq data, only the TCGA program)

In [3]:
filters = {
    "op": "and",
    "content":[
        {"op": "in",
        "content":{
            "field": "data_format",
            "value": ["BAM"],
            }},
        {"op": "in",
        "content":{
            "field": "cases.samples.sample_type",
            "value": ["solid tissue normal"],
            }},
        {"op": "in",
        "content":{
            "field": "experimental_strategy",
            "value": ["RNA-Seq"],
            }},
        {"op": "in",
        "content":{
            "field": "cases.project.program.name",
            "value": ["TCGA"],
            }}
        
        ]
    }
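
The nested dictionary above repeats the same `"in"` clause four times. As a minimal sketch, the same structure could be generated from a plain field-to-values mapping. The helper names `in_filter` and `and_filters` are hypothetical and not part of the GDC API or of the original notebook.

```python
def in_filter(field, values):
    """Return one "in" clause for a field and its accepted values."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def and_filters(criteria):
    """Combine a {field: values} mapping into a single "and" filter."""
    return {
        "op": "and",
        "content": [in_filter(field, values) for field, values in criteria.items()],
    }


# Same criteria as the hand-written filter in this notebook
example_filters = and_filters({
    "data_format": ["BAM"],
    "cases.samples.sample_type": ["solid tissue normal"],
    "experimental_strategy": ["RNA-Seq"],
    "cases.project.program.name": ["TCGA"],
})
```

Changing the selection (for example another data format) then only requires editing the mapping.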

## Create request and download

*This may take a while*

In [4]:
# With a GET request, the filters parameter needs to be converted
# from a dictionary to a JSON-formatted string

params = {
    "filters": json.dumps(filters),
    "fields": fields,
    "format": "TSV",
    "size": "3000"  # maximum number of rows returned; raise it if more files match
    }

files_endpt = "https://api.gdc.cancer.gov/files"
response = requests.get(files_endpt, params = params)
response.raise_for_status()  # stop here if the API returned an HTTP error
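
Because `size` caps the number of rows returned, a query matching more files than that would be silently truncated. One possible alternative, sketched below, is to page through the results with the `from` and `size` parameters and read the total from the `pagination` section of the JSON response. This sketch requests JSON rather than TSV, and the helper names `iter_hits` and `make_gdc_fetcher` are hypothetical, so treat it as an illustration under those assumptions rather than a replacement for the cell above.

```python
import json


def iter_hits(fetch_page, page_size=1000):
    """Yield every hit, calling fetch_page(offset, page_size) until none remain."""
    offset = 0
    while True:
        payload = fetch_page(offset, page_size)
        hits = payload["data"]["hits"]
        if not hits:
            break
        yield from hits
        offset += len(hits)
        # Stop once the offset reaches the total reported by the API
        if offset >= payload["data"]["pagination"]["total"]:
            break


def make_gdc_fetcher(filters, fields, endpoint="https://api.gdc.cancer.gov/files"):
    """Hypothetical helper: build a fetch_page function for the files endpoint."""
    import requests  # imported here so iter_hits can be exercised offline

    def fetch_page(offset, page_size):
        params = {
            "filters": json.dumps(filters),
            "fields": fields,
            "format": "JSON",
            "from": offset,
            "size": page_size,
        }
        response = requests.get(endpoint, params=params)
        response.raise_for_status()
        return response.json()

    return fetch_page
```

With the `filters` and `fields` variables defined earlier, `all_hits = list(iter_hits(make_gdc_fetcher(filters, fields)))` would collect every matching record as a list of dictionaries.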

## Creating Output

In [5]:
# Write the raw bytes so the result does not depend on the platform's default encoding
with open("output.tsv", "wb") as my_file:
    my_file.write(response.content)
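
A quick sanity check on the written file can catch an empty or truncated download. The sketch below uses only the standard library; `summarize_tsv` is a hypothetical helper name, and it makes no assumption about the column names the API returns.

```python
import csv


def summarize_tsv(path):
    """Return (column names, number of data rows) for a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # empty list if the file has no lines
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```

For example, `header, n_rows = summarize_tsv("output.tsv")` returns the columns and the row count. If `n_rows` equals the requested `size`, more files may match the filter than were returned.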