# TCGA - WXS Control Experiment
```
pi:ababaian
files: ~/Crown/data2/tcga_wxs/
start: 2019 12 20
complete : YYYY MM DD
```
## Introduction

For the MACP-Psi paper, a reviewer asked if there is DNA-level variation at 18S.1248.U (RNA45S.4908.T) which underlies the 'hypo-modification' phenotype.


### Reviewer Comment
```
1. The authors describe an rRNA base that varies at the RNA level in RNA-seq data, reflective of a modification that is caused by the RT enzyme generating a base change during cDNA synthesis. In order to conclude that this rRNA variant is due to loss of a modification being lost in colorectal cancer, rather than a somatic DNA variant, it is important that the variant is only detected at the RNA and not at the DNA level. The authors cite their own study to say that at the DNA level that RNA45S:1248.T is invariable. While this may be true, these are just based on normal human samples the author used in his previous study. It would be useful to confirm this in cancer samples at the DNA level with WGS data.
```

### Initial Response
```
We can include this analysis for TCGA-COAD as a single panel in S figure 1. We would like to note, the cancer-specific hypo-modification is an increase in the (invariable) reference T/U allele. If there was an rDNA variation it would be masked by the modification in the RNA-seq data and thus we did not include this experiment. What we can conclude is that the loss of modification is not caused by a DNA-level change, as any change from the reference T would appear as a modified rRNA. This statement will be added to the paper to make this explicit. Our pilot experiment of TCGA-COAD whole exome sequencing data analysis shows no DNA-level variation in cancer, we will require ~2 weeks to complete the analysis of all 947 samples.
```


## Hypothesis

1. The RNA45S.4908.T nucleotide is invariable in normal and cancer patient rDNA. That is, the level of variation will be that of sequencing error (<1%) and the same for normal and cancer.


## Materials and Methods

### Data Initialization


- The initial plan was to use the GDC `bam slicing` capability to extract the hg38 `chrUn_GL000220v1:102,084-122,119` region from TCGA WXS data, then re-align those reads to hgr1 to measure _RNA45S.4908.T_ variation levels in each of the ~940 samples.

- Slicing fails with an error: the GDC helpdesk confirmed that the `_` character in the chromosome name does not match the server-side REGEX used to validate slicing regions. I'll have to download whole WXS files and use samtools to extract the region of interest.

The REGEX: `^[a-zA-Z0-9]+(:([0-9]+)?(-[0-9]+`

should read:

`^[a-zA-Z0-9_\-]+(:([0-9]+)?(-[0-9]+`

or ideally:

`^[\S]+(:([0-9]+)?(-[0-9]+`
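The effect of the character class can be reproduced locally. The sketch below tests only the leading contig-name class, since the full server-side pattern is truncated above; `matches_class` is a hypothetical helper, and the hyphen is placed last in the bracket expression because POSIX ERE (as used by `grep -E`) treats a backslash inside brackets literally.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if the contig name consists only of
# characters from the given bracket expression. Only the leading character
# class of the GDC pattern is tested; the rest of the pattern is not
# reproduced here.
matches_class() {
  printf '%s\n' "$1" | grep -Eq "^$2+\$"
}

for class in '[a-zA-Z0-9]' '[a-zA-Z0-9_-]'; do
  for contig in 'chr13' 'chrUn_GL000220v1'; do
    if matches_class "$contig" "$class"; then
      echo "$class accepts $contig"
    else
      echo "$class rejects $contig"
    fi
  done
done
```

Only the class without `_` rejects `chrUn_GL000220v1`, which is consistent with the error returned by the slicing endpoint.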

In [None]:
# See: https://docs.gdc.cancer.gov/API/Users_Guide/BAM_Slicing/

# Set token as variable
token=$(cat gdc.token)

# Use GDC-API to download region
curl --header "X-Auth-Token: $token" \
  'https://api.gdc.cancer.gov/slicing/view/df80679e-c4d3-487b-934c-fcc782e5d46e?region=chrUn_GL000220v1' \
  --output get_RNA45S.bam

# Use GDC-API via POST
curl --header "X-Auth-Token: $token" \
  --request POST https://api.gdc.cancer.gov/slicing/view/9ca90dfa-e62f-4f9c-9946-dfcecfd3ca4d \
  --header "Content-Type: application/json" \
  -d@data.json \
  --output post_regions_slice.bam

## Where data.json is a file of format:
##
#{
#    "regions": [
#        "chrUn_GL000220v1"
#    ]
#}


## Error return:
#{
#"error": "u'chrUn_GL000220v1' does not match '^[a-zA-Z0-9]+(:([0-9]+)?(-[0-9]+
# "message": "u'chrUn_GL000220v1' does not match '^[a-zA-Z0-9]+(:([0-9]+)?(-[0-9
# }

From the GDC/TCGA website, this cohort of data was selected with the following filter command.

```
cases.project.project_id in ["TCGA-COAD"] and files.data_format in ["bam"] and files.experimental_strategy in ["WGS","WXS"]
```

This yields 973 files across 443 cases, 22.95 TB in total.

The `Sample Sheet`, `File Manifest`, and `Biospecimen` data for this selection were downloaded and are stored in `$PWD/metadata`.

In the `TCGA_WXS1_File_Selection.xlsx` spreadsheet, this set of files was filtered and parsed as follows:

1. If there are technical replicates of the same sample, they share a SampleID (`TCGA-XX-####-01A`). A replicate suffix is added to make naming unique downstream (`TCGA-XX-####-01Ax`), where x = {_,b,c...}
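
The suffix rule was applied in the spreadsheet; a command-line equivalent might look like the sketch below. `add_replicate_suffix` is a hypothetical helper, it assumes a tab-delimited table with the SampleID in column 1, and it only covers up to 26 replicates per SampleID.

```bash
#!/bin/bash
# Hypothetical helper: append b, c, d ... to the 2nd, 3rd, 4th ... row that
# shares a SampleID (column 1). The first occurrence is left unchanged.
add_replicate_suffix() {
  awk -F'\t' 'BEGIN { OFS = "\t"; letters = "bcdefghijklmnopqrstuvwxyz" }
  {
    n = seen[$1]++                       # earlier rows with this SampleID
    if (n > 0) $1 = $1 substr(letters, n, 1)
    print
  }'
}

# Two files that share one SampleID; the second row gains the suffix "b"
printf '%s\t%s\t%s\n' \
  'TCGA-A6-2672-01B' 'TCGA-COAD' 'cf44a424-5687-4164-beb2-1c68f9a78028' \
  'TCGA-A6-2672-01B' 'TCGA-COAD' '1d7bb87a-0b9e-4ed6-98e1-d0bc11d62a73' |
  add_replicate_suffix
```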


The output of this parsing is copied to the input files: `tcga_wxs_pilot.input` and `tcga_wxs.input`

### Scripts and Localization

#### 1 - Localization

In [1]:
WORKDIR='/home/artem/Crown/data2/tcga_wxs'
cd $WORKDIR
ls

# Amazon AWS S3 Home URL
S3URL='s3://crownproject/tcga_wxs'

droneB.sh             queenB.sh              tcga_wxs_pilot.input
hgr1_align_v5.wxs.sh  tcga_wxs.input         tcga_wxs_pilot.input1.log
metadata              tcga_wxs_errors.input  tcga_wxs_pilot.input2.log


In [2]:
INPUT='tcga_wxs_pilot.input'

cat $INPUT

TCGA-3L-AA1B-01A	TCGA-COAD	cbbbce29-1e5e-4dbc-b3fb-a96b69ba0bfe
TCGA-3L-AA1B-10A	TCGA-COAD	cb0a27a1-2fb4-4dd4-b036-9ead9492b404
TCGA-4N-A93T-01A	TCGA-COAD	81b63768-e633-4a6c-8ccb-2ebd174b45e7
TCGA-4N-A93T-10A	TCGA-COAD	9a8a5552-205a-4cb5-9a1c-0a3ae7c48a29
TCGA-4T-AA8H-01A	TCGA-COAD	78b6ebc3-95c4-42d3-b924-2787b06e0643
TCGA-4T-AA8H-10B	TCGA-COAD	b2e53fd1-b2f3-4b36-af43-9eb0165f519d
TCGA-5M-AAT4-01A	TCGA-COAD	75f75e67-8f41-42e2-9bb7-6fde50135aa8
TCGA-5M-AAT4-10A	TCGA-COAD	a0f5856e-d5e5-4bf7-8e8d-9af5281a7ef8
TCGA-5M-AAT5-01A	TCGA-COAD	4a9b3630-a447-435f-b633-8774748a6316
TCGA-5M-AAT5-10A	TCGA-COAD	e3557e59-b709-4b1b-8e50-3635e8683534
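
Each row of the input file carries the three positional parameters that the alignment script expects (`$1` library name, `$2` analysis set, `$3` file UUID). The loop below is a minimal sketch of how a launcher could read those columns; it is not the contents of `queenB.sh`, and `show_parameters` is a hypothetical name.

```bash
#!/bin/bash
TAB=$(printf '\t')

# Print the parameters each tab-delimited input row would supply:
# column 1 -> library name, column 2 -> analysis set, column 3 -> file UUID.
show_parameters() {
  while IFS="$TAB" read -r library population uuid; do
    [ -n "$library" ] || continue        # skip blank lines
    printf 'LIBRARY=%s SET=%s UUID=%s\n' "$library" "$population" "$uuid"
  done
}

printf 'TCGA-3L-AA1B-01A\tTCGA-COAD\tcbbbce29-1e5e-4dbc-b3fb-a96b69ba0bfe\n' |
  show_parameters
```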

#### 2 - Script Versions

In [6]:
cd $WORKDIR
# Echo scripts to be used for this analysis for version control.
# Note these need to be manually copied to the $WORKDIR

cat hgr1_align_v5.wxs.sh
echo 
echo
cat queenB.sh
echo 
echo
cat droneB.sh
echo 
echo 

#!/bin/bash
# hgr1_align_v5.tcga-wxs.sh
# rDNA alignment pipeline - WXS extraction version
PIPE_VERSION='191220 build -- TCGA-WXS'
AMI_VERSION='crown-190601 - ami-0b375c9c58cb4a7a2'
# EC2: c4.2xlarge (8cpu / 15 gb)
# EC2: c4.xlarge  (4cpu / 8  gb)
# Storage: 200 Gb
#

# Input Requirements --------------------------

# $1 : Library name and file-output name (unique)
# $2 : Library population/analysis set
# $3 : Library UUID
# -- : Seq-read type = 'wxs'

# Control Panel -------------------------------
# Amazon AWS S3 Home URL
S3URL='s3://crownproject/tcga_wxs'

# CPU
	THREADS='3'

# Sequencing Data
	LIBRARY=$1 # Library/ File name

# Terminate instances upon completion (for debugging)
  TERMINATE='FALSE'

# TCGA FILE UUID
  UUID=$3

# Region to extract
  REGION='chrUn_GL000220v1:1-161802'

 # FastQ File-names
    FQ0="$LIBRARY.tmp.sort.0.fq"
    FQ1="$LIBRARY.tmp.sort.1.fq"
    FQ2="$LIBRARY.tmp.sort.2.fq"
    FQR="$LIBRARY.region.fq"
    
# Read 

## Results - TCGA-WXS Pilot Run



#### 3 - Copy local to S3

In [10]:
# Local Folder Operations -----------------------------
# LOCAL:
cd $WORKDIR

#NOTE For pilot run, AWS s3 shutdown commented out. Re-upload hgr1 script upon full run

aws s3 cp queenB.sh $S3URL/scripts/
aws s3 cp droneB.sh $S3URL/scripts/
aws s3 cp hgr1_align_v5.wxs.sh $S3URL/scripts/
aws s3 cp $INPUT $S3URL/scripts/
aws s3 cp ../../gdc.token.txt $S3URL/scripts/gdc.token


upload: ./queenB.sh to s3://crownproject/tcga_wxs/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/tcga_wxs/scripts/droneB.sh
upload: ./hgr1_align_v5.wxs.sh to s3://crownproject/tcga_wxs/scripts/hgr1_align_v5.wxs.sh
upload: ./tcga_wxs_pilot.input to s3://crownproject/tcga_wxs/scripts/tcga_wxs_pilot.input
upload: ../../gdc.token.txt to s3://crownproject/tcga_wxs/scripts/gdc.token


#### 4 - Launch and run master EC2 node

In [12]:
# Remote EC2 Instance Operations ----------------------

# Remote:
# Manually open an Amazon Linux 2 AMI
# ami-061392db613a6357b
# t2.micro
#
# ssh login:
# ssh -i "crown.pem" ec2-user@PUBLICDNS
# ssh -i "~/.ssh/CrownKey.pem" ec2-user@ec2-34-221-178-164.us-west-2.compute.amazonaws.com 

# Commands on EC2 machine to set-up AWS
# enter personal login info:

# REMOTE:
#aws configure
  # AWS Key ID
  # AWS Secret Key ID
  # Region: us-west-2
  
# Copy local run files to S3 and download them on EC2

# REMOTE:
# aws s3 cp --recursive s3://crownproject/tcga_wxs/scripts/ ./
#
# mv <KEY>.pem ~/.ssh/
# chmod 400 ~/.ssh/<KEY>.pem

# REMOTE:
# Open logging screen and begin launching EC2 instances
# screen -L
# 
# bash queenB.sh tcga_wxs_pilot.input
#
# aws s3 cp screenlog.0 s3://crownproject/tcga_wxs/logs/tcga_wxs_pilot.input.log

aws s3 cp s3://crownproject/tcga_wxs/logs/tcga_wxs_pilot.input1.log ./
cat tcga_wxs_pilot.input1.log

aws s3 cp s3://crownproject/tcga_wxs/logs/tcga_wxs_pilot.input2.log ./
cat tcga_wxs_pilot.input2.log

# Run completed successfully

download: s3://crownproject/tcga_wxs/logs/tcga_wxs_pilot.input1.log to ./tcga_wxs_pilot.input1.log
[ec2-user@ip-172-31-29-116 ~]$ ls
CrownKey.pem  gdc.token             queenB.sh    tcga_wxs_pilot.input
droneB.sh     hgr1_align_v5.wxs.sh  screenlog.0
[ec2-user@ip-172-31-29-116 ~]$ bash queenB.sh tcga_wxs_pilot.input
Launch instance # 1
Sat Dec 21 02:41:12 UTC 2019
Instance Type: c4.xlarge
AMI Image: ami-0b375c9c58cb4a7a2
Run Script: s3://crownproject/tcga_wxs/scripts/hgr1_align_v5.wxs.sh
Parameters: TCGA-3L-AA1B-01A TCGA-COAD cbbbce29-1e5e-4dbc-b3fb-a96b69ba0bfe
Instance ID: i-0cd929e5bd12ab5cc
Public DNS: ec2-54-187-201-238.us-west-2.compute.amazonaws.com
download: s3://crownproject/tcga_wxs/scripts/hgr1_align_v5.wxs.sh to ./hgr1_align_v5.wxs.sh


Launch instance # 2
Sat Dec 21 02:44:18 UTC 2019
Instance Type: c4.xlarge
AMI Image: ami-0

## Results - TCGA-WXS Full Run

Run the same procedure as the pilot above, but with the full TCGA-COAD input list (`tcga_wxs.input`).


In [13]:
INPUT='tcga_wxs.input'
cat $INPUT

TCGA-5M-AAT6-01A	TCGA-COAD	e2fbd373-e44e-4d18-920a-58b5b7c35e67
TCGA-5M-AAT6-10A	TCGA-COAD	f1cabc65-4b00-44b6-80f0-fbb9193d25fc
TCGA-5M-AATA-01A	TCGA-COAD	4b9e0e99-808d-4cc6-8b7b-1b7e91dd1b54
TCGA-5M-AATA-10A	TCGA-COAD	ddd32176-6911-49c0-8d6d-904c69607019
TCGA-5M-AATE-01A	TCGA-COAD	0bb88200-c1e2-4417-ba37-b5d0248dd5ce
TCGA-5M-AATE-10A	TCGA-COAD	999ab1ab-d0e7-474e-82e0-3ef088c4def5
TCGA-A6-2671-01A	TCGA-COAD	90da5386-eac1-4f9b-8715-b101521819e2
TCGA-A6-2671-10A	TCGA-COAD	40924be9-a705-462d-a9ee-6d1fc956a06b
TCGA-A6-2671-11A	TCGA-COAD	5ae75803-dfad-4a48-92a7-2ef88356aaff
TCGA-A6-2672-01B	TCGA-COAD	cf44a424-5687-4164-beb2-1c68f9a78028
TCGA-A6-2672-01Bb	TCGA-COAD	1d7bb87a-0b9e-4ed6-98e1-d0bc11d62a73
TCGA-A6-2672-11A	TCGA-COAD	8c23361b-828d-42d6-a7e7-91034b303664
TCGA-A6-2674-01A	TCGA-COAD	9d2fb270-0601-4ac9-89d6-1c5b13079fe6
TCGA-A6-2674-01Ab	TCGA-COAD	9e070142-cf47-4217-9ca3-ff0c5d0b06a3
TCGA-A6-2674-01B	TCGA-COAD	92470c68-a390-4603-a4f4-6f5210cc1fcc
TCGA-A6-2674-10A	TCGA-C

In [14]:
# Local Folder Operations -----------------------------
# LOCAL:
cd $WORKDIR

#NOTE For pilot run, AWS s3 shutdown commented out. Re-upload hgr1 script upon full run

aws s3 cp queenB.sh $S3URL/scripts/
aws s3 cp droneB.sh $S3URL/scripts/
aws s3 cp hgr1_align_v5.wxs.sh $S3URL/scripts/
aws s3 cp $INPUT $S3URL/scripts/
aws s3 cp ../../gdc.token.txt $S3URL/scripts/gdc.token


upload: ./queenB.sh to s3://crownproject/tcga_wxs/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/tcga_wxs/scripts/droneB.sh
upload: ./hgr1_align_v5.wxs.sh to s3://crownproject/tcga_wxs/scripts/hgr1_align_v5.wxs.sh
upload: ./tcga_wxs.input to s3://crownproject/tcga_wxs/scripts/tcga_wxs.input
upload: ../../gdc.token.txt to s3://crownproject/tcga_wxs/scripts/gdc.token


In [None]:
# Remote EC2 Instance Operations ----------------------
# bash queenB.sh tcga_wxs.input
#
# aws s3 cp screenlog.0 s3://crownproject/tcga_wxs/logs/tcga_wxs.input.log

aws s3 cp s3://crownproject/tcga_wxs/logs/tcga_wxs.input.log ./
cat tcga_wxs.input.log

### Error List

- TCGA-A6-2679-10A
```
Aligning
Error: Read HWI-ST807:232:D0VM5ACXX:5:1201:13180:439981/1 has more quality values than read characters.
```
- TCGA-A6-2681-10A `Read HWI-ST807:232:D0VM5ACXX:2:2210:16628:606201/1`
- TCGA-A6-2682-10A `Read HWI-ST807:232:D0VM5ACXX:2:2107:12164:573701/1`
- TCGA-A6-2684-10Ab `Read HWI-ST807:232:D0VM5ACXX:2:1309:16259:665651/1`
- TCGA-A6-4107-10A `Read HWI-ST807:232:D0VM5ACXX:4:2311:15608:577101/1`
- TCGA-A6-5665-01B `Read C0YRNACXX120905:4:1210:3297:319811/1`
- TCGA-AA-3494-11A
- TCGA-AA-3495-11A
- TCGA-AA-3506-11A
- TCGA-AA-3509-11A

```
# Temporary save command
aws s3 cp . s3://crownproject/tcga_wxs/error/ \
  --recursive --exclude "*" --include "*region.fq"
```

The error arises when a quality line begins with `@`: the sed command treats it as a read name and appends `/1` to the end, so the quality string ends up longer than the read. The regex needs to be more FASTQ-aware.
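
Affected records can be located by comparing sequence and quality lengths within each 4-line record. This is a sketch that assumes strict 4-line FASTQ (no wrapped sequence lines); `find_length_mismatch` is a hypothetical helper, not part of the pipeline.

```bash
#!/bin/bash
# Hypothetical helper: report FASTQ records whose quality string length
# differs from the sequence length. Assumes exactly 4 lines per record.
# Output columns: read name, sequence length, quality length.
find_length_mismatch() {
  awk 'NR % 4 == 1 { name = $0 }
       NR % 4 == 2 { seqlen = length($0) }
       NR % 4 == 0 && length($0) != seqlen {
         print name "\t" seqlen "\t" length($0)
       }' "$@"
}

# Two records: the first has a quality line that was wrongly extended by "/1"
printf '@r1\nACGT\n+\n@III/1\n@r2\nAC\n+\nII\n' | find_length_mismatch
```

On the saved `*region.fq` files this could be run as `find_length_mismatch SAMPLE.region.fq`, where the file name is a placeholder.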

### Restart Run

After sample `TCGA-AA-3678-10A` I will reboot the head node (an ssh error also arose from accidentally changing the permissions of `.ssh` away from 700). I will fix the regex as well to prevent the above error.

```
ssh -i "~/.ssh/CrownKey.pem" ec2-user@ec2-35-167-240-253.us-west-2.compute.amazonaws.com
```

Changed sed command to:
```
  # Append /1 and /2 to each read name in the fq files
  # Read names are on every 4th line, starting from line 1
  sed -i '1~4 s/$/\/1/g' $FQ1
  sed -i '1~4 s/$/\/2/g' $FQ2
```
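
The difference between the two addressing styles can be seen on a single record whose quality line starts with `@`. The pattern-based command below is an assumed reconstruction of the earlier approach (the original sed line is not recorded in this notebook); the position-based command is the one adopted above. `1~4` is a GNU sed `first~step` address.

```bash
#!/bin/bash
# Assumed earlier approach: tag every line that starts with '@'
tag_by_pattern()  { sed '/^@/ s/$/\/1/'; }

# Adopted approach: tag lines 1, 5, 9 ... (GNU sed first~step address)
tag_by_position() { sed '1~4 s/$/\/1/'; }

# One FASTQ record whose quality string happens to begin with '@'
record() { printf '@read1\nACGT\n+\n@III\n'; }

echo 'pattern-based:';  record | tag_by_pattern
echo 'position-based:'; record | tag_by_position
```

The pattern-based version also rewrites the quality line, giving five quality characters plus two extra for a four-base read, while the position-based version touches only the read name.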

In [2]:
# Local Folder Operations -----------------------------
# LOCAL:
cd $WORKDIR

INPUT="tcga_wxs_errors.input"
cat $INPUT
echo ''
echo ''

aws s3 cp queenB.sh $S3URL/scripts/
aws s3 cp droneB.sh $S3URL/scripts/
aws s3 cp hgr1_align_v5.wxs.sh $S3URL/scripts/
aws s3 cp $INPUT $S3URL/scripts/
aws s3 cp ../../gdc.token.txt $S3URL/scripts/gdc.token


TCGA-A6-2681-10A	TCGA-COAD	e52fa5db-671e-4f0f-8279-01b66bd71069
TCGA-A6-2682-10A	TCGA-COAD	c6980d7e-8e22-49d7-9353-f8ed639b8a6a
TCGA-A6-2684-10Ab	TCGA-COAD	f48f0411-f1dc-41c3-a366-8aa05ebacb6c
TCGA-A6-4107-10A	TCGA-COAD	8c42abcb-73b2-403a-ba83-b26b36f43696
TCGA-A6-5665-01B	TCGA-COAD	4e12defd-63cd-4025-8e42-75d7fb737432
TCGA-AA-3494-11A	TCGA-COAD	ea3ab57b-9b07-452b-9fd7-1a1c144b33b0
TCGA-AA-3495-11A	TCGA-COAD	46c73be3-b3a0-47b6-ae4b-b81616b22a07
TCGA-AA-3506-11A	TCGA-COAD	87b516b0-9225-4d18-bbf0-4fc71717d307
TCGA-AA-3509-11A	TCGA-COAD	171a9bde-c390-47d1-8151-6baed9e546c3


upload: ./queenB.sh to s3://crownproject/tcga_wxs/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/tcga_wxs/scripts/droneB.sh
upload: ./hgr1_align_v5.wxs.sh to s3://crownproject/tcga_wxs/scripts/hgr1_align_v5.wxs.sh
Completed 577 Bytes/577 

In [None]:
# REMOTE:
# aws s3 cp --recursive s3://crownproject/tcga_wxs/scripts/ ./
#
# mv <KEY>.pem ~/.ssh/
# chmod 400 ~/.ssh/<KEY>.pem

# REMOTE:
# Open logging screen and begin launching EC2 instances
# screen -L
# 
# bash queenB.sh tcga_wxs_errors.input
#
# aws s3 cp screenlog.0 s3://crownproject/tcga_wxs/logs/tcga_wxs_pilot.input.log


In [3]:
# Local Folder Operations -----------------------------
# LOCAL:
# Split up remaining files into 3 inputs, run each
# in parallel
aws s3 cp tcga_wxs_2.input $S3URL/scripts/
aws s3 cp tcga_wxs_3.input $S3URL/scripts/
aws s3 cp tcga_wxs_4.input $S3URL/scripts/

upload: ./tcga_wxs_2.input to s3://crownproject/tcga_wxs/scripts/tcga_wxs_2.input
upload: ./tcga_wxs_3.input to s3://crownproject/tcga_wxs/scripts/tcga_wxs_3.input
upload: ./tcga_wxs_4.input to s3://crownproject/tcga_wxs/scripts/tcga_wxs_4.input
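
Splitting the remaining rows could be scripted roughly as below. This is only a sketch: the uploaded files differ in size, so the actual split was evidently not an even one, and `rows_after`, `split_round_robin` and the `chunk` prefix are hypothetical names.

```bash
#!/bin/bash
# Print the rows that follow the last completed sample (column 1) in a list.
rows_after() {            # usage: rows_after <last_sample_id> < input
  awk -F'\t' -v last="$1" 'found { print } $1 == last { found = 1 }'
}

# Distribute rows round-robin into <prefix>_1.input ... <prefix>_<n>.input
split_round_robin() {     # usage: split_round_robin <n> <prefix> < input
  awk -v n="$1" -v prefix="$2" \
    '{ print > (prefix "_" ((NR - 1) % n + 1) ".input") }'
}

# Demo on a five-row list where sample A has already completed
tmp=$(mktemp -d)
printf 'A\tP\t1\nB\tP\t2\nC\tP\t3\nD\tP\t4\nE\tP\t5\n' |
  rows_after A | split_round_robin 3 "$tmp/chunk"
wc -l "$tmp"/chunk_*.input
rm -r "$tmp"
```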


In [1]:
# All runs completed successfully (non-zero BAM file present for each).
# Log files were uploaded to S3, then downloaded to local.

# One file was missed:
# `TCGA-A6-2679-10A	TCGA-COAD	4b4502d4-27c3-4bd6-9f37-229efd347756`
# It will be run separately as a 'cleanup'.
#

WORKDIR='/home/artem/Crown/data2/tcga_wxs'
cd $WORKDIR
ls logs


tcga_wxs_2.input.log  tcga_wxs_4.input.log  tcga_wxs_pilot.input1.log
tcga_wxs_3.input.log  tcga_wxs_error.log    tcga_wxs_pilot.input2.log


In [2]:
# Amazon AWS S3 file manifest
aws s3 ls s3://crownproject/tcga_wxs/TCGA-COAD/ > tcga_wxs.filelist
grep 'bam$' tcga_wxs.filelist > tcga_wxs.bamlist
wc -l tcga_wxs.bamlist


973 tcga_wxs.bamlist
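
Beyond the line count, the listing can be cross-checked against an input file to name any library without a BAM. The sketch below assumes the object key is the fourth column of `aws s3 ls` output and that each key begins with the library name followed by a dot; the BAM naming convention is not recorded here, so that part is an assumption, and `missing_bams` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: list library names (column 1 of the input file) that
# have no matching key in an `aws s3 ls` listing.
# Assumption: keys look like <library>.<suffix>, e.g. <library>.bam
missing_bams() {          # usage: missing_bams <input_file> <bamlist_file>
  have=$(mktemp)
  awk '{ sub(/\..*$/, "", $4); print $4 }' "$2" | sort -u > "$have"
  cut -f1 "$1" | sort -u | comm -23 - "$have"
  rm -f "$have"
}

# Demo with synthetic files: S2 has no BAM in the listing
tmp=$(mktemp -d)
printf 'S1\tSET\tuuid-1\nS2\tSET\tuuid-2\n' > "$tmp/demo.input"
printf '2020-01-01 00:00:00       1024 S1.bam\n' > "$tmp/demo.bamlist"
missing_bams "$tmp/demo.input" "$tmp/demo.bamlist"
rm -r "$tmp"
```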


## GVCF File for WXS

Use the `ADCalc.sh` script to generate GVCF files for rDNA based on WXS leakthrough bam files.



In [None]:
# DNS:ec2-54-149-215-86.us-west-2.compute.amazonaws.com
# AMI: ami-0b375c9c58cb4a7a2 (TCGA aligner)
# Instance: m4.4xlarge
# Storage: 400 Gb

# ON REMOTE:

## Copy WXS files (1.8 Gb) into its dir
#mkdir -p ~/wxs; cd wxs
#aws s3 cp --recursive s3://crownproject/tcga_wxs/TCGA-COAD ./

## PATHS:
## ~/wxs

## Run the ADcalc script
# screen -L
# bash ADcalc_tcga.sh

# aws s3 cp screenlog.0 s3://crownproject/ccle/logs/ccle.gvcf.log

aws s3 cp s3://crownproject/ccle/logs/ccle.gvcf.log ./
cat ccle.gvcf.log

## DONE

In [None]:
#!/bin/bash
# ADcalc_wxs.sh
# Allelic Depth Calculator
# for a position
#

# Controls -----------------
DEPTH='100000' #Max per file DP

# Regions in hgr1.fa reference genome
REGIONS=('chr13:1003660-1005529' 'chr13:1005529-1005629' \
        'chr13:10219-10340' 'chr13:1006622-1006779' 'chr13:1007948-1013018')

# Corresponding region/gene names
GENES=('18S' '18SE' '5S' '5.8S' '28S')

# 18S  1870
# 18SE 101
# 5S   122
# 5.8S 158
# 28S  5071

# Terminate instances upon completion (for debugging)
TERMINATE='FALSE'

# S3 Output directory
S3DIR='s3://crownproject/tcga_wxs/gvcf/'

BAMLIST='bam.list.tmp'
TYPE='TCGA-COAD.WXS' # hardcode run type

# Script ------------------ ------------------------------
cd ~/wxs
mkdir -p ../GVCF #Output Folder

    echo Analyzing $TYPE...
    #cd $TYPE

    ls *.bam > bam.list.tmp
    ls *.bam > ../GVCF/$TYPE.bamlist
          
    for index in ${!GENES[*]}
    do
      printf "Started processing %s\n" ${GENES[$index]}
      OUTPUT="../GVCF/$TYPE.${GENES[$index]}.gvcf"

      # Iterate through every bam file in directory
      # look-up position and return VCF
      bcftools mpileup -f ~/resources/hgr1/hgr1.fa \
        --max-depth $DEPTH -A --min-BQ 30 \
        -a FORMAT/DP,AD \
        -r ${REGIONS[$index]} \
        --ignore-RG \
        -b $BAMLIST | \
        bcftools annotate -x INFO,FORMAT/PL - | \
        bcftools view -O v - \
        >> $OUTPUT

      RESULTS+=("$OUTPUT")
      printf "Done with %s \n" ${GENES[$index]}
      printf "%s\n" ${REGIONS[$index]}

    done

    rm bam.list.tmp

# Copy GVCF output to AWS S3
cd ../GVCF
aws s3 cp --recursive ./ $S3DIR


In [3]:
mkdir -p gvcf; cd gvcf
aws s3 cp --recursive s3://crownproject/tcga_wxs/gvcf/ ./

download: s3://crownproject/tcga_wxs/gvcf/TCGA-COAD.WXS.5S.gvcf to ./TCGA-COAD.WXS.5S.gvcf
download: s3://crownproject/tcga_wxs/gvcf/TCGA-COAD.WXS.bamlist to ./TCGA-COAD.WXS.bamlist
...
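
With the GVCF files local, the per-sample level of non-reference reads at a position can be read from the `AD` field written by `bcftools mpileup -a FORMAT/DP,AD`. The awk sketch below is one way to do that; `nonref_fraction` is a hypothetical helper, the demo VCF is synthetic, and position `100` is a placeholder rather than the hgr1 coordinate of RNA45S.4908.T.

```bash
#!/bin/bash
# Hypothetical helper: for one position, print per sample the total allelic
# depth and the fraction of reads not supporting the reference allele.
# Reads VCF text on stdin; the AD index is looked up in the FORMAT column.
nonref_fraction() {       # usage: nonref_fraction <pos> < file.gvcf
  awk -F'\t' -v pos="$1" '
    /^#CHROM/ { for (i = 10; i <= NF; i++) sample[i] = $i; next }
    /^#/      { next }
    $2 == pos {
      nfmt = split($9, fmt, ":"); ad = 0
      for (k = 1; k <= nfmt; k++) if (fmt[k] == "AD") ad = k
      if (ad == 0) next
      for (i = 10; i <= NF; i++) {
        split($i, val, ":")
        n = split(val[ad], depth, ",")
        total = 0
        for (j = 1; j <= n; j++) total += depth[j]
        if (total > 0)
          printf "%s\t%d\t%.4f\n", sample[i], total, (total - depth[1]) / total
      }
    }'
}

# Synthetic two-sample VCF used only for illustration
demo_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1.bam\ts2.bam\n'
  printf 'chr13\t100\t.\tT\tC,<*>\t0\t.\t.\tDP:AD\t200:198,2,0\t50:50,0,0\n'
}

demo_vcf | nonref_fraction 100
```

Comparing the third column between tumour (`-01`) and normal (`-10`, `-11`) libraries addresses the hypothesis directly, since any DNA-level variant would show up as a non-reference fraction above the sequencing-error level.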

## Discussion

Notes about run.



### Errors / Debugging
