# GDC June 2021 Webinar: GDC Data Submission Overview

### Monday, June 28, 2021<br>2:00 PM - 3:00 PM (EDT)<br>Bill Wysocki, Lead for GDC User Services<br>University of Chicago


# <a id='overview'>Notebook Overview</a>


### <a id='about_notebook'>About this notebook</a>

- This notebook is a step-by-step guide to submitting a BAM file to the GDC using Python. It may be useful for submitters whose project is still empty or who have just started submitting with Python.

- Commands and functions in this notebook rely on the following Python packages:
    - `requests`: if it is not already installed, install it with `pip install requests` from the command line or from a new code cell in this notebook
    - `json`: part of the Python standard library, so no installation is needed
- To run a code cell, press 'Cmd + Enter' or 'Ctrl + Enter', depending on your operating system and keyboard layout
- A token file must be downloaded from the [GDC Submission Portal](https://docs.gdc.cancer.gov/Data_Submission_Portal/Users_Guide/Data_Submission_Process/#authentication)

### Overview

- For projects approved for inclusion in the GDC, submitters can use the GDC API `submission` endpoint to submit node entities (such as cases, samples, and aliquots) to their project
- Submission requires an authentication token downloaded from the [GDC Submission Portal](https://docs.gdc.cancer.gov/Data_Submission_Portal/Users_Guide/Data_Submission_Process/#authentication)
- Data can be submitted in `JSON` or `TSV` format; the `"Content-Type"` request header must match the format (`application/json` or `text/tsv`, as used in the requests below)
- `JSON` and `TSV` templates for each node can be downloaded from the GDC Data Dictionary Viewer: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?_top=1
- Data files (such as FASTQ or BAM files) are uploaded separately with the [GDC Data Transfer Tool](https://gdc.cancer.gov/access-data/gdc-data-transfer-tool), after their metadata entity has been registered
- More information on submission with the GDC API, including additional features, is available at: https://docs.gdc.cancer.gov/API/Users_Guide/Submission/
- For large submissions, see [Strategies for Submitting in Bulk](https://docs.gdc.cancer.gov/Data_Submission_Portal/Users_Guide/Data_Submission_Walkthrough/#strategies-for-submitting-in-bulk)

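The `"Content-Type"` header is the only part of the request headers that changes between JSON and TSV submissions. A minimal sketch, assuming the submission file's extension (`.json` or `.tsv`) identifies its format; the helper name `submission_headers` is hypothetical:

```python
from pathlib import Path


def submission_headers(token: str, file_path: str) -> dict:
    """Build GDC submission request headers for a JSON or TSV file.

    Hypothetical helper: assumes the file extension identifies the format.
    """
    content_types = {
        ".json": "application/json",
        ".tsv": "text/tsv",
    }
    suffix = Path(file_path).suffix.lower()
    if suffix not in content_types:
        raise ValueError(f"Unsupported submission format: {suffix!r}")
    return {"X-Auth-Token": token, "Content-Type": content_types[suffix]}


print(submission_headers("<token>", "sample.tsv"))
```

Raising an error for other extensions keeps a mismatched header from reaching the API silently.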
### Endpoint

- The submission endpoint URL is built from the project's program name and project code: `https://api.gdc.cancer.gov/submission/<program_name>/<project_code>`
- For example: https://api.gdc.cancer.gov/submission/TCGA/LUAD or https://api.gdc.cancer.gov/submission/CPTAC/3 

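Because the URL depends only on the program name and project code, it can be assembled with a small function. A minimal sketch; `submission_endpoint` is a hypothetical helper, and the optional `/_dry_run` suffix is the same one used in the case submission cell below:

```python
GDC_SUBMISSION_BASE = "https://api.gdc.cancer.gov/submission"


def submission_endpoint(program: str, project: str, dry_run: bool = False) -> str:
    """Return the submission URL for <program_name>/<project_code>.

    With dry_run=True, /_dry_run is appended so the request is validated
    without being committed.
    """
    url = f"{GDC_SUBMISSION_BASE}/{program}/{project}"
    return f"{url}/_dry_run" if dry_run else url


print(submission_endpoint("TCGA", "LUAD"))
print(submission_endpoint("GDC", "INTERNAL", dry_run=True))
```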
### Steps

1. Read in the token file
2. Read in the submission file
3. Set the endpoint to the project's URL and submit the data with a `PUT` request, setting `"Content-Type"` to match the file format (the cells below use `PUT` for both JSON and TSV files)

### 1. Submitting a Case (JSON)

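```python
#1. Import Python packages and read in token file

import json
import requests

# Read the token downloaded from the GDC Submission Portal (adjust the path as needed)
with open("../gdc-user-token.txt") as f:
    token = f.read().strip()
```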

```python
#2. Read in submission file

with open("case.json") as f:
    case_json = json.load(f)

print(json.dumps(case_json, indent=4))
```

```python
#3. Edit endpoint and submit data using PUT request

# Appending /_dry_run validates the submission without committing it;
# remove it from the URL to actually create the case
ENDPT = "https://api.gdc.cancer.gov/submission/GDC/INTERNAL/_dry_run"

# Submission request for data in JSON format
response = requests.put(url=ENDPT, json=case_json, headers={"X-Auth-Token": token, "Content-Type": "application/json"})
print(json.dumps(response.json(), indent=4))
```

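Rather than reading the full printed JSON, a response can be reduced to its overall `success` flag and one line per entity. A minimal sketch, assuming the response includes `success` and an `entities` list whose items carry `type`, `id`, `action`, and `errors`; `summarize_response` is hypothetical, and the example dict is illustrative, not real API output:

```python
def summarize_response(result: dict) -> list:
    """Return a success line plus one line per entity in a submission response dict."""
    lines = [f"success: {result.get('success')}"]
    for entity in result.get("entities", []):
        line = f"{entity.get('type')} {entity.get('id')} ({entity.get('action')})"
        errors = entity.get("errors", [])
        if errors:
            # Keep only the human-readable message of each error
            line += f" errors: {[e.get('message') for e in errors]}"
        lines.append(line)
    return lines


# Illustrative response shape only; in the notebook, pass response.json()
example = {
    "success": False,
    "entities": [
        {"type": "case", "id": None, "action": "create",
         "errors": [{"message": "illustrative error text"}]},
    ],
}
for line in summarize_response(example):
    print(line)
```

Because it uses `.get` throughout, the same helper can be applied to the responses from the later submission cells.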
### 2. Submitting a Sample (TSV)

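```python
#1. Read in submission file

# Read the raw bytes once, so the request body is still available if the submission cell is re-run
with open("sample.tsv", "rb") as f:
    sample_tsv = f.read()

# Display each row as a list of tab-separated fields
with open("sample.tsv", "r") as f:
    for line in f:
        print(line.rstrip("\n").split("\t"))
```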

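To check a TSV before submitting it, the header row can be used to map each value to its column name. A minimal sketch using the standard-library `csv` module; `read_tsv_rows` and the two-column example are hypothetical, and a real `sample.tsv` follows the GDC sample template:

```python
import csv
import io


def read_tsv_rows(text: str) -> list:
    """Parse TSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical two-column example, not a complete GDC sample template
example = "type\tsubmitter_id\nsample\tEXAMPLE-SAMPLE-1\n"
for row in read_tsv_rows(example):
    print(row)
```

For the notebook's file, `read_tsv_rows(open("sample.tsv").read())` would return one dict per sample row.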
```python
#2. Edit endpoint and submit data using PUT request

ENDPT = "https://api.gdc.cancer.gov/submission/GDC/INTERNAL/"

# Submission request for data in TSV format
response = requests.put(url=ENDPT, data=sample_tsv, headers={"X-Auth-Token": token, "Content-Type": "text/tsv"})

print(json.dumps(response.json(), indent=4))
```

### 3. Submitting the Aliquot and Read Group

```python
#1. Read in submission file

with open("aliquot_readgroup.json") as f:
    aliquot_rg_json = json.load(f)

print(json.dumps(aliquot_rg_json, indent=4))
```

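A single JSON file can hold several entities as a list, which lets one request submit linked nodes such as an aliquot and its read group. A minimal sketch for listing what a payload contains; `describe_entities` and the example payload are hypothetical, and the link property names (`samples`, `aliquots`) are assumed from the GDC templates:

```python
def describe_entities(payload) -> list:
    """List (type, submitter_id) pairs for a JSON submission payload.

    Accepts either a single entity dict or a list of entity dicts.
    """
    entities = payload if isinstance(payload, list) else [payload]
    return [(e.get("type"), e.get("submitter_id")) for e in entities]


# Hypothetical payload; the real aliquot_readgroup.json follows the GDC templates
example = [
    {"type": "aliquot", "submitter_id": "EXAMPLE-ALIQUOT-1",
     "samples": {"submitter_id": "EXAMPLE-SAMPLE-1"}},
    {"type": "read_group", "submitter_id": "EXAMPLE-RG-1",
     "aliquots": {"submitter_id": "EXAMPLE-ALIQUOT-1"}},
]
print(describe_entities(example))
```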
```python
#2. Submit data using PUT request

ENDPT = "https://api.gdc.cancer.gov/submission/GDC/INTERNAL"

# Submission request for data in JSON format
response = requests.put(url=ENDPT, json=aliquot_rg_json, headers={"X-Auth-Token": token, "Content-Type": "application/json"})
print(json.dumps(response.json(), indent=4))
```

### 4. Registering the Submitted Aligned Reads File

```python
#1. Read in submission file

with open("SAR.json") as f:
    sar_json = json.load(f)

print(json.dumps(sar_json, indent=4))
```

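A `submitted_aligned_reads` entity records metadata about the BAM file itself, including its `file_name`, `file_size` in bytes, and `md5sum`, which should describe the exact file uploaded in step 5. A minimal sketch for computing these values locally with `hashlib`; the helper names are hypothetical, and it assumes `SAR.json` holds a single entity dict rather than a list:

```python
import hashlib
import os


def file_md5_and_size(path: str, chunk_size: int = 1 << 20) -> tuple:
    """Return (md5 hex digest, size in bytes) for a local file.

    Reads in 1 MiB chunks by default so a large BAM file is never held in memory at once.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest(), os.path.getsize(path)


def fill_file_fields(sar: dict, path: str) -> dict:
    """Return a copy of a submitted_aligned_reads entity with file_name, file_size, and md5sum set."""
    md5sum, file_size = file_md5_and_size(path)
    return {**sar, "file_name": os.path.basename(path), "file_size": file_size, "md5sum": md5sum}
```

Usage, with a hypothetical path: `sar_json = fill_file_fields(sar_json, "path/to/your_file.bam")`. Returning a copy leaves the JSON loaded from disk unchanged.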
```python
#2. Submit data using PUT request

ENDPT = "https://api.gdc.cancer.gov/submission/GDC/INTERNAL"

# Submission request for data in JSON format
response = requests.put(url=ENDPT, json=sar_json, headers={"X-Auth-Token": token, "Content-Type": "application/json"})
print(json.dumps(response.json(), indent=4))
```

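The UUID needed by the Data Transfer Tool is the `id` of the registered `submitted_aligned_reads` entity in the response above. A minimal sketch for pulling it out, assuming the response has an `entities` list with `type` and `id` fields; `registered_file_ids` is hypothetical, and the example uses a placeholder UUID, not real GDC output:

```python
def registered_file_ids(result: dict, entity_type: str = "submitted_aligned_reads") -> list:
    """Return the UUIDs of entities of `entity_type` in a submission response dict."""
    return [
        entity["id"]
        for entity in result.get("entities", [])
        if entity.get("type") == entity_type and entity.get("id")
    ]


# Illustrative response shape with a placeholder UUID;
# in the notebook, pass response.json() instead
example = {
    "success": True,
    "entities": [
        {"type": "submitted_aligned_reads",
         "id": "00000000-0000-0000-0000-000000000000",
         "action": "create"},
    ],
}
print(registered_file_ids(example))
```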
### 5. Uploading the Submitted Aligned Reads Data File with the Data Transfer Tool


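```shell
# Run from a terminal (not as Python code).
# Replace <UUID> with the UUID returned when the submitted_aligned_reads entity was registered in step 4,
# and token_file.txt with the path to your token file.
./gdc-client upload <UUID> -t token_file.txt
```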
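
The upload can also be started from the notebook by building the same command as an argument list. A minimal sketch; `gdc_client_upload_cmd` is a hypothetical helper, and it assumes `gdc-client` sits in the working directory, as in the command above:

```python
import shlex


def gdc_client_upload_cmd(uuid: str, token_path: str, client: str = "./gdc-client") -> list:
    """Build the `gdc-client upload <UUID> -t <token>` command as an argument list."""
    return [client, "upload", uuid, "-t", token_path]


# Placeholder values; substitute the real UUID and token path
cmd = gdc_client_upload_cmd("00000000-0000-0000-0000-000000000000", "token_file.txt")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the upload and raise `CalledProcessError` on a non-zero exit status; using a list rather than a shell string avoids quoting problems with paths.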