# CPTAC - CRC Cohort 5 2019
```
pi:ababaian
files: ~/Crown/data2/crc_cptac/
start: 2019 05 28
complete : 2019 06 02
```
## Introduction

This year a new CRC cohort of ~110 patients was released, with matched RNA-seq, Exome-seq and Proteomics. Unfortunately there is no normal-control RNA-seq, but if we take a leap of faith that the other 30+ cohorts are representative, we can assume that the normals are normal here too.

I've also received $$ from AWS again, so it's time to spool up this analysis.

## Addendum -- remainder of CPTAC

After completing this experiment (only 66 CRC libraries were available), an additional 168 tumour RNA-seq libraries remain in CPTAC from breast, lung and ovarian cancers. Run them for completeness.

In [1]:
# Initialize
WORKDIR='/home/artem/Crown/data2/crc_cptac'
mkdir -p $WORKDIR; cd $WORKDIR



## Objective

1. Pilot: Align 5x CRC5 RNA-seq libraries to hgr1. Confirm OK.
2. Full : Align remaining 105x CRC5 libraries to hgr1


## Materials and Methods

### Data Initialization


From the SRA website, the CPTAC Confirmatory Study project was found (not referenced in the paper!): [BioProject: PRJNA279695](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA279695/). It includes breast, ovarian and colorectal cancer samples. All data were loaded into the SRA table viewer and the accession files downloaded. Filtering for RNA-seq and CRC yielded 110 samples.

The output of this parsing is copied to the input file: `cptac_pilot.input`

Input columns are (see below):

1. Library Name
2. Data Type
3. Sample ID
4. SRA Accession
5. Experiment Accession
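
A quick format check on the input file can catch a misplaced column before any instances are launched. The sketch below assumes the file is tab-delimited (the `cat` output further down suggests so) and that the accession columns carry the usual `SAMN`/`SRR`/`SRX` prefixes. `check_input` is a hypothetical helper, not part of the pipeline.

```bash
#!/bin/bash
# check_input: hypothetical helper (not part of the pipeline).
# Verifies each line has 5 tab-separated fields and that the
# accession columns carry the expected prefixes.
check_input() {
  awk -F'\t' '
    NF != 5       { printf "line %d: expected 5 columns, found %d\n", NR, NF; bad = 1; next }
    $3 !~ /^SAMN/ { printf "line %d: column 3 is not a BioSample ID: %s\n", NR, $3; bad = 1 }
    $4 !~ /^SRR/  { printf "line %d: column 4 is not an SRA run: %s\n", NR, $4; bad = 1 }
    $5 !~ /^SRX/  { printf "line %d: column 5 is not an SRA experiment: %s\n", NR, $5; bad = 1 }
    END           { exit bad }
  ' "$1"
}

# Example with the two pilot libraries listed in this notebook
DEMO=$(mktemp)
printf '01CO001\tcrc5\tSAMN03453626\tSRR1999563\tSRX1011590\n' >  "$DEMO"
printf '01CO005\tcrc5\tSAMN03453627\tSRR1999549\tSRX1011576\n' >> "$DEMO"

if check_input "$DEMO"; then
  echo "input OK"
fi
rm -f "$DEMO"
```

The function returns non-zero on the first kind of problem it sees and prints one message per offending line, so it can gate the upload step with `check_input "$INPUT" && ...`.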


### Scripts and Localization

#### 1 - Localization

In [2]:
WORKDIR='/home/artem/Crown/data2/crc_cptac'
cd $WORKDIR
ls

# Amazon AWS S3 Home URL
S3URL='s3://crownproject/cptac'

CPTAC_conf_study_SraRunTable.xlsx
cptac_pilot0.input
cptac_pilot2.input
dbgap.key
droneB.sh
hgr1_align_v4.cptac.sh
Human__CPTAC_COAD__PNNL__Proteome__TMT__03_01_2017__BCM__Gene__PNNL_Tumor_TMT_UnsharedLogRatio.cct
input
meta
paper
queenB.sh
tcga_proteomics


In [3]:
INPUT='cptac_pilot0.input'
# Note the different column requirements from CCLE

cat $INPUT

01CO001	crc5	SAMN03453626	SRR1999563	SRX1011590
01CO005	crc5	SAMN03453627	SRR1999549	SRX1011576

#### 2 - Script Versions

In [9]:
cd $WORKDIR
# Echo scripts to be used for this analysis for version control.
# Note these need to be manually copied to the $WORKDIR

cat hgr1_align_v4.cptac.sh
echo 
echo
cat queenB.sh
echo 
echo
cat droneB.sh
echo 
echo 

#!/bin/bash
# hgr1_align_v4.cptac.sh
# rDNA alignment pipeline - SRA version
PIPE_VERSION='191003 build -- CPTAC'
AMI_VERSION='crown-190601 - ami-0b375c9c58cb4a7a2'
# EC2: c4.2xlarge (8cpu / 15 gb)
# EC2: c4.xlarge  (4cpu / 8  gb)
# Storage: 200 Gb
#

# Input Requirements --------------------------

# $1 : Library name + Output name(unique)
# $2 : Seq-read type (wgs|rna)
# $3 : BioSample ID
# $4 : Library SRA Accession

# Control Panel -------------------------------
# Amazon AWS S3 Home URL
  S3URL='s3://crownproject/cptac'

# CPU
	THREADS='3'

# Terminate instances upon completion (for debugging)
  TERMINATE='FALSE'
    
# Read Group Data
  LIBRARY=$1    # Library Name / File prefix / patient ID
  TYPE=$2       # wgs OR rna data-type (using crc5 here)
	RGPO='cptac-crc'  # Patient Population - CPTAC
	RGSM=$3       # Sample ID
	RGID=$4       # Read Group ID. SRA Accession Number
  RGLB=$LIBRARY # Library Name. Accession Number
  RGPL='ILLUMINA'   # Seq P
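
To make the column-to-variable mapping concrete, here is a small sketch of how one input line would populate the read-group variables in the script header above. The `@RG` line at the end only illustrates where such fields usually end up; how the real script uses them is not visible in this excerpt, so that part is an assumption. `RGPO` is left out for the same reason.

```bash
#!/bin/bash
# Illustration only: split one input line into the positional
# parameters documented in the script header ($1-$4). The fifth
# column (experiment accession) is not listed there.
line=$'01CO001\tcrc5\tSAMN03453626\tSRR1999563\tSRX1011590'
set -- $line      # unquoted on purpose: split on whitespace

LIBRARY=$1        # Library name / output prefix
TYPE=$2           # Data type (crc5 here)
RGSM=$3           # BioSample ID
RGID=$4           # SRA run accession
RGLB=$LIBRARY     # Library name
RGPL='ILLUMINA'   # Sequencing platform

# Assumed use: a SAM read-group header string with literal "\t"
# separators, the form aligners typically take on the command line.
RG_LINE=$(printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s' \
  "$RGID" "$RGSM" "$RGLB" "$RGPL")
printf '%s\n' "$RG_LINE"
```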

## Results - CPTAC Pilot Run I

#### 3 - Copy local to S3

In [7]:
# Local Folder Operations -----------------------------
# LOCAL:
cd $WORKDIR

# NOTE: For the pilot run, the instance shutdown is commented out in the hgr1 script. Re-upload the hgr1 script with shutdown restored before the full run.

aws s3 cp queenB.sh $S3URL/scripts/
aws s3 cp droneB.sh $S3URL/scripts/
aws s3 cp hgr1_align_v4.cptac.sh $S3URL/scripts/
aws s3 cp $INPUT $S3URL/scripts/
aws s3 cp dbgap.key $S3URL/scripts/


upload: ./queenB.sh to s3://crownproject/cptac/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/cptac/scripts/droneB.sh
upload: ./hgr1_align_v4.cptac.sh to s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh
upload: ./cptac_pilot0.input to s3://crownproject/cptac/scripts/cptac_pilot0.input
upload: ./dbgap.key to s3://crownproject/cptac/scripts/dbgap.key
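
Before an upload like the one above, it can help to confirm that every file in the set exists and is non-empty, since a missing `dbgap.key` or input list only shows up later as a failed instance. This is a minimal sketch; `preflight` is a hypothetical helper, and it checks file presence only, not whether dbGaP access has actually been granted.

```bash
#!/bin/bash
# preflight: hypothetical helper that reports any missing or empty
# file and returns non-zero if at least one was found.
preflight() {
  local missing=0 f
  for f in "$@"; do
    if [[ ! -s "$f" ]]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the file names from this notebook (not executed here):
# FILES=(queenB.sh droneB.sh hgr1_align_v4.cptac.sh "$INPUT" dbgap.key)
# preflight "${FILES[@]}" &&
#   for f in "${FILES[@]}"; do aws s3 cp "$f" "$S3URL/scripts/"; done
```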


In [8]:
# start
date
date -u

Thu Oct  3 09:47:47 PDT 2019
Thu Oct  3 16:47:47 UTC 2019


#### 4 - Launch and run master EC2 node

### Note: 20190919

There is an error in running this analysis: the data is restricted in dbGaP, and while we have access to one CPTAC data set (which looks to be consent forms), I didn't request access to this cohort (phs000892.v6.p1).

: / fuck

By 20191003 I have access sorted out. Resuming.

In [10]:
# Remote EC2 Instance Operations ----------------------

# Remote:
# Manually open an Amazon Linux 2 AMI
# ami-061392db613a6357b
# t2.micro
#
# ssh login:
# ssh -i "crown.pem" ec2-user@PUBLICDNS
#

# Commands on EC2 machine to set-up AWS
# enter personal login info:

# REMOTE:
#aws configure
  # AWS Key ID
  # AWS Secret Key ID
  # Region: us-west-2
  
# Copy local run files to S3 and download them on EC2

# REMOTE:
# aws s3 cp --recursive s3://crownproject/cptac/scripts/ ./
#
# mv <KEY>.pem ~/.ssh/
# chmod 400 ~/.ssh/<KEY>.pem

# REMOTE:
# Open logging screen and begin launching EC2 instances
# screen -L
# 
# bash queenB.sh cptac_pilot0.input
#
# aws s3 cp screenlog.0 s3://crownproject/cptac/logs/cptac_pilot0.log

aws s3 cp s3://crownproject/cptac/logs/cptac_pilot0.log ./
cat cptac_pilot0.log
date -u

# Run completed successfully!


download: s3://crownproject/cptac/logs/cptac_pilot0.log to ./cptac_pilot0.log
[ec2-user@ip-172-31-40-172 ~]$ ls
ADcalc_ccle2.sh     CrownKey.pem  droneB.sh               queenB.sh
cptac_pilot0.input  dbgap.key     hgr1_align_v4.cptac.sh  screenlog.0
[ec2-user@ip-172-31-40-172 ~]$ bash queenB.sh cptac_pilot0.input
Launch instance # 1
Thu Oct  3 16:50:00 UTC 2019
Instance Type: c4.xlarge
AMI Image: ami-0b375c9c58cb4a7a2
Run Script: s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh
Parameters: 01CO001 crc5 SAMN03453626 SRR1999563 SRX1011590
Instance ID: i-0e576ba89c5776da1
Public DNS: ec2-54-185-143-115.us-west-2.compute.amazonaws.com
download: s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh to ./hgr1_align_v4.cptac.sh


Launch instance # 2
Thu Oct  3 16:53:07 UTC 2019
Instance Type: c4.xlarge
AMI Image: ami-0b375c9c58cb4a

## CRC - CPTAC Full Run
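
The full run repeats the pilot procedure over the rest of the cohort with at most 45 nodes at a time. As a back-of-the-envelope check of the "~3x" figure, ceiling division gives the number of launch waves. The 105 is the planned count from the Objective and the 66 is the number of CRC libraries that turned out to be available; `waves` is just a throwaway helper for this arithmetic.

```bash
#!/bin/bash
# waves LIBS NODES: number of launch waves when at most NODES
# instances run at once (ceiling of LIBS / NODES).
waves() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

waves 105 45   # planned:   prints 3
waves 66 45    # available: prints 2
```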



In [11]:
# Repeat above with entire cohort, 45 nodes ~3x run
cd $WORKDIR
INPUT="cptac_crc.input"

cat $INPUT

01CO006	crc5	SAMN04111321	SRR2518440	SRX1288089
01CO008	crc5	SAMN04111439	SRR2518441	SRX1288090
01CO013	crc5	SAMN05127283	SRR9861902	SRX6616551
01CO014	crc5	SAMN06208758	SRR9861950	SRX6616680
01CO019	crc5	SAMN05127298	SRR9862138	SRX6615968
01CO022	crc5	SAMN06208530	SRR9861951	SRX6616681
05CO002	crc5	SAMN03453636	SRR1999486	SRX1011513
05CO003	crc5	SAMN03453647	SRR1999570	SRX1011597
05CO005	crc5	SAMN06208707	SRR9861979	SRX6615528
05CO006	crc5	SAMN03453668	SRR1999616	SRX1011643
05CO007	crc5	SAMN03453645	SRR1999590	SRX1011617
05CO011	crc5	SAMN03453615	SRR1999580	SRX1011607
05CO014	crc5	SAMN03453622	SRR1999556	SRX1011583
05CO020	crc5	SAMN04111397	SRR2518460	SRX1288109
05CO026	crc5	SAMN05127186	SRR9862145	SRX6615975
05CO028	crc5	SAMN04111354	SRR2518461	SRX1288110
05CO029	crc5	SAMN04111362	SRR2518462	SRX1288111
05CO032	crc5	SAMN04111410	SRR2518463	SRX1288112
05CO033	crc5	SAMN04111339	SRR2518464	SRX1288113
05CO034	crc5	SAMN05127120	SRR9862146	SRX6615976
05CO035	crc5	SAMN051

In [12]:
aws s3 cp queenB.sh $S3URL/scripts/
aws s3 cp droneB.sh $S3URL/scripts/
aws s3 cp hgr1_align_v4.cptac.sh $S3URL/scripts/
aws s3 cp $INPUT $S3URL/scripts/
aws s3 cp dbgap.key $S3URL/scripts/


upload: ./queenB.sh to s3://crownproject/cptac/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/cptac/scripts/droneB.sh
upload: ./hgr1_align_v4.cptac.sh to s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh
upload: ./cptac_crc.input to s3://crownproject/cptac/scripts/cptac_crc.input
upload: ./dbgap.key to s3://crownproject/cptac/scripts/dbgap.key


In [13]:
# Remote EC2 Instance Operations ----------------------

# Remote:
# Manually open an Amazon Linux 2 AMI
# ami-061392db613a6357b
# t2.micro
#
# ssh login:
# ssh -i "crown.pem" ec2-user@PUBLICDNS
#

# Commands on EC2 machine to set-up AWS
# enter personal login info:

# REMOTE:
#aws configure
  # AWS Key ID
  # AWS Secret Key ID
  # Region: us-west-2
  
# Copy local run files to S3 and download them on EC2

# REMOTE:
# aws s3 cp --recursive s3://crownproject/cptac/scripts/ ./
#
# mv <KEY>.pem ~/.ssh/
# chmod 400 ~/.ssh/<KEY>.pem

# REMOTE:
# Open logging screen and begin launching EC2 instances
# screen -L
# 
# bash queenB.sh cptac_crc.input
#
# aws s3 cp screenlog.0 s3://crownproject/cptac/logs/cptac_crc.log

aws s3 cp s3://crownproject/cptac/logs/cptac_crc.log ./
cat cptac_crc.log
date -u

# Run completed successfully!


download: s3://crownproject/cptac/logs/cptac_crc.log to ./cptac_crc.log
[ec2-user@ip-172-31-40-172 ~]$ cat queenB.sh
#!/bin/bash
# queenB.sh
# 20180814 build
# EC2 Launch / Control Script
#

# 1. queenB script is initialized locally and input files
#    are parsed ready for cluster analysis
# 2. queenB launches instances, logs in to it and runs the
#    droneB.sh script remotely.
# 3. The droneB script is executed on the instance and it
#    launches a `screen` on the instance and loads and 
#    starts to perform the $TASK (gather.sh) script.
# 4. TASK script should include an instance shut-down
#    command to close instance upon completion.
#

# Amazon AWS S3 Home URL
S3URL='s3://crownproject/cptac'

# EC2 TASK Script - script for droneB to execute
TASK="$S3URL/scripts/hgr1_align_v4.cptac.sh"

# Parameter file:
# Each line of PARAMETERS will

## Materials and Methods

`ADcalc_cptac.sh`

In [None]:
#!/bin/bash
# ADcalc_cptac.sh
# Allelic Depth Calculator
# for a position
#
# s3://crownproject/hCAGE/ADcalc_hcage.sh

# Controls -----------------
DEPTH='100000' #Max per file DP

# Regions in hgr1.fa reference genome
REGIONS=('chr13:1003660-1005529' 'chr13:1005529-1005629' \
        'chr13:10219-10340' 'chr13:1006622-1006779' 'chr13:1007948-1013018')

# Corresponding region/gene names
GENES=('18S' '18SE' '5S' '5.8S' '28S')

# 18S  1870
# 18SE 101
# 5S   122
# 5.8S 158
# 28S  5071

# Terminate instances upon completion (for debugging)
TERMINATE='FALSE'

# S3 Output directory
S3DIR='s3://crownproject/cptac/gvcf/'
BAMLIST='bam.list.tmp'

# Script ------------------ ------------------------------
cd ~/cptac/
mkdir -p GVCF #Output Folder
TYPE='cptac_crc' # hardcoded for this single CPTAC CRC run
cd BAM

#for TYPE in $(echo "hgr1")
#do
    echo Analyzing $TYPE...
    #cd $TYPE

    ls *.bam > bam.list.tmp
    ls *.bam > ../GVCF/$TYPE.bamlist
          
    for index in ${!GENES[*]}
    do
      printf "Started processing %s\n" ${GENES[$index]}
      OUTPUT="../GVCF/$TYPE.${GENES[$index]}.gvcf"

      # Iterate through every bam file in directory
      # look-up position and return VCF
      bcftools mpileup -f ~/resources/hgr1/hgr1.fa \
        --max-depth $DEPTH -A --min-BQ 30 \
        -a FORMAT/DP,AD \
        -r ${REGIONS[$index]} \
        --ignore-RG \
        -b $BAMLIST | \
        bcftools annotate -x INFO,FORMAT/PL - | \
        bcftools view -O v - \
        >> $OUTPUT

      RESULTS+=("$OUTPUT")
      printf "Done with %s \n" ${GENES[$index]}
      printf "%s\n" ${REGIONS[$index]}

    done

    rm bam.list.tmp

#    cd .. # move to tcga folder to reset
#done

# Copy GVCF output to AWS S3
cd ../GVCF
aws s3 cp --recursive ./ $S3DIR


In [None]:
#QED
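
The lengths noted in the script comments (1870, 101, 122, 158, 5071) can be recomputed from the `REGIONS` coordinates, which is a cheap sanity check whenever a region is edited. The sketch assumes 1-based inclusive coordinates, so length = end - start + 1; `region_length` is a helper name introduced here.

```bash
#!/bin/bash
# Recompute the length of each region from its coordinates
# (1-based, inclusive: length = end - start + 1).
REGIONS=('chr13:1003660-1005529' 'chr13:1005529-1005629'
         'chr13:10219-10340' 'chr13:1006622-1006779' 'chr13:1007948-1013018')
GENES=('18S' '18SE' '5S' '5.8S' '28S')

region_length() {
  local coords=${1#*:}        # drop the "chr13:" prefix
  local start=${coords%-*}    # text before the dash
  local end=${coords#*-}      # text after the dash
  echo $(( end - start + 1 ))
}

for i in "${!GENES[@]}"; do
  printf '%-5s %s\n' "${GENES[$i]}" "$(region_length "${REGIONS[$i]}")"
done
```

Note that the 18S region ends at position 1005529 and the 18SE region starts at the same position, so that one base is reported in both outputs.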

## Addendum - CPTAC_BRCA, CPTAC_OV, CPTAC_LUNG



### Initialize

In [1]:
# Set WD
WORKDIR='/home/artem/Crown/data2/crc_cptac/scripts'
cd $WORKDIR

# Amazon AWS S3 Home URL
S3URL='s3://crownproject/cptac'



In [2]:
# Input list
INPUT='cptac_2.input'
# Note the different column requirements from CCLE

cat $INPUT

11BR020	cptac_brca	SAMN05127287	SRR9861882	SRX6615511
11BR022	cptac_brca	SAMN05127344	SRR9861883	SRX6615512
11BR023	cptac_brca	SAMN05127208	SRR9861884	SRX6615513
14BR005	cptac_brca	SAMN05127452	SRR9861893	SRX6615522
14OV011	cptac_ov	SAMN05127182	SRR9861896	SRX6615525
17OV010	cptac_ov	SAMN05127403	SRR9861901	SRX6616550
05BR031	cptac_brca	SAMN05127237	SRR9861905	SRX6616554
11BR055	cptac_brca	SAMN05127177	SRR9861923	SRX6616600
11BR058	cptac_brca	SAMN05127293	SRR9861925	SRX6616602
16BR012	cptac_brca	SAMN05127303	SRR9861940	SRX6616645
18BR004	cptac_brca	SAMN05127178	SRR9861945	SRX6616650
18BR007	cptac_brca	SAMN05127455	SRR9861947	SRX6616677
09BR005	cptac_brca	SAMN06208486	SRR9861957	SRX6616687
11BR044	cptac_brca	SAMN06208809	SRR9861964	SRX6616723
18BR009	cptac_brca	SAMN06208601	SRR9861991	SRX6615540
18BR010	cptac_brca	SAMN06208579	SRR9861992	SRX6615541
01BR033	cptac_brca	SAMN06208795	SRR9861997	SRX6615577
01OV045	cptac_ov	SAMN06208449	SRR9861998	SRX6615578
02OV029	cptac_ov

In [3]:
# Echo scripts to be used for this analysis for version control.

cat hgr1_align_v4.cptac.sh
echo 
echo
cat queenB.sh
echo 
echo
cat droneB.sh
echo 
echo 

#!/bin/bash
# hgr1_align_v4.cptac.sh
# rDNA alignment pipeline - SRA version
PIPE_VERSION='191003 build -- CPTAC'
AMI_VERSION='crown-190601 - ami-0b375c9c58cb4a7a2'
# EC2: c4.2xlarge (8cpu / 15 gb)
# EC2: c4.xlarge  (4cpu / 8  gb)
# Storage: 200 Gb
#

# Input Requirements --------------------------

# $1 : Library name + Output name(unique)
# $2 : Seq-read type (wgs|rna)
# $3 : BioSample ID
# $4 : Library SRA Accession

# Control Panel -------------------------------
# Amazon AWS S3 Home URL
  S3URL='s3://crownproject/cptac'

# CPU
	THREADS='3'

# Terminate instances upon completion (for debugging)
  TERMINATE='TRUE'
    
# Read Group Data
  LIBRARY=$1    # Library Name / File prefix / patient ID
  TYPE=$2       # wgs OR rna data-type (using crc5 here)
	RGPO='cptac-crc'  # Patient Population - CPTAC
	RGSM=$3       # Sample ID
	RGID=$4       # Read Group ID. SRA Accession Number
  RGLB=$LIBRARY # Library Name. Accession Number
  RGPL='ILLUMINA'   # Seq Pl

In [4]:
aws s3 cp queenB.sh $S3URL/scripts/
aws s3 cp droneB.sh $S3URL/scripts/
aws s3 cp hgr1_align_v4.cptac.sh $S3URL/scripts/
aws s3 cp $INPUT $S3URL/scripts/
aws s3 cp dbgap.key $S3URL/scripts/

upload: ./queenB.sh to s3://crownproject/cptac/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/cptac/scripts/droneB.sh
upload: ./hgr1_align_v4.cptac.sh to s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh
upload: ./cptac_2.input to s3://crownproject/cptac/scripts/cptac_2.input
upload: ./dbgap.key to s3://crownproject/cptac/scripts/dbgap.key


In [5]:
# start
date
date -u

Fri Oct  4 06:25:51 PDT 2019
Fri Oct  4 13:25:51 UTC 2019


### Run


In [6]:
# Remote EC2 Instance Operations ----------------------

# Remote:
# Manually open an Amazon Linux 2 AMI
# ami-061392db613a6357b
# t2.micro
#
# ssh login:
# ssh -i "crown.pem" ec2-user@PUBLICDNS
#

# Commands on EC2 machine to set-up AWS
# enter personal login info:

# REMOTE:
#aws configure
  # AWS Key ID
  # AWS Secret Key ID
  # Region: us-west-2
  
# Copy local run files to S3 and download them on EC2

# REMOTE:
# aws s3 cp --recursive s3://crownproject/cptac/scripts/ ./
#
# mv <KEY>.pem ~/.ssh/
# chmod 400 ~/.ssh/<KEY>.pem

# REMOTE:
# Open logging screen and begin launching EC2 instances
# screen -L
# 
# bash queenB.sh cptac_2.input
#
# aws s3 cp screenlog.0 s3://crownproject/cptac/logs/cptac_2.log

aws s3 cp s3://crownproject/cptac/logs/cptac_2.log ./
cat cptac_2.log
date -u

download: s3://crownproject/cptac/logs/cptac_2.log to ./cptac_2.log
[ec2-user@ip-172-31-43-66 ~]$ bash queenB.sh cptac_2.input
Launch instance # 1
Fri Oct  4 13:27:04 UTC 2019
Instance Type: c4.xlarge
AMI Image: ami-0b375c9c58cb4a7a2
Run Script: s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh
Parameters: 11BR020 cptac_brca SAMN05127287 SRR9861882 SRX6615511
Instance ID: i-0d0bf2078a685f49a
Public DNS: ec2-34-220-73-123.us-west-2.compute.amazonaws.com
download: s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh to ./hgr1_align_v4.cptac.sh


Launch instance # 2
Fri Oct  4 13:30:11 UTC 2019
Instance Type: c4.xlarge
AMI Image: ami-0b375c9c58cb4a7a2
Run Script: s3://crownproject/cptac/scripts/hgr1_align_v4.cptac.sh
Parameters: 11BR022 cptac_brca SAMN05127344 SRR9861883 SRX6615512
Instance ID: i-01927f3dd5f211f2d
Public DNS: ec2-34-219-204-154.us-west
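
The launch timestamps in these logs give a rough pacing figure: consecutive launches are 187 s apart in both the pilot log (16:50:00 to 16:53:07) and this one (13:27:04 to 13:30:11). A small sketch for pulling that number out of a saved log follows. It assumes every line ending in `UTC <year>` is a launch timestamp and that all launches fall on the same UTC day; `launch_gaps` is a hypothetical helper.

```bash
#!/bin/bash
# launch_gaps LOGFILE: print the gap in seconds between consecutive
# timestamp lines of the form "Fri Oct  4 13:27:04 UTC 2019".
# Assumes all timestamps are on the same UTC day.
launch_gaps() {
  awk '
    / UTC [0-9]+$/ {
      split($4, t, ":")                       # HH:MM:SS
      now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
      if (seen) print now - prev
      prev = now; seen = 1
    }
  ' "$1"
}

# Two timestamps copied from the log above
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Launch instance # 1
Fri Oct  4 13:27:04 UTC 2019
Launch instance # 2
Fri Oct  4 13:30:11 UTC 2019
EOF
launch_gaps "$LOG"    # prints 187
rm -f "$LOG"
```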

#### ADcalc CPTAC II

`ADcalc_cptac2.sh`

In [None]:
#!/bin/bash
# ADcalc_cptac2.sh
# Allelic Depth Calculator
# for a position
#
# s3://crownproject/hCAGE/ADcalc_hcage.sh

# Controls -----------------
DEPTH='100000' #Max per file DP

# Regions in hgr1.fa reference genome
REGIONS=('chr13:1003660-1005529' 'chr13:1005529-1005629' \
        'chr13:10219-10340' 'chr13:1006622-1006779' 'chr13:1007948-1013018')

# Corresponding region/gene names
GENES=('18S' '18SE' '5S' '5.8S' '28S')

# 18S  1870
# 18SE 101
# 5S   122
# 5.8S 158
# 28S  5071

# Terminate instances upon completion (for debugging)
TERMINATE='FALSE'

# S3 Output directory
S3DIR='s3://crownproject/cptac/gvcf/'
BAMLIST='bam.list.tmp'

# Script ------------------ ------------------------------
cd ~/cptac/
mkdir -p GVCF #Output Folder
TYPE='cptac_brca' # hardcoded cohort label for this block
cd bam

    echo Analyzing $TYPE...
    #cd $TYPE

    ls *brca*.bam > bam.list.tmp
    # Keep a record of exactly the BAM files piled up for this cohort
    cp bam.list.tmp ../GVCF/$TYPE.bamlist
          
    for index in ${!GENES[*]}
    do
      printf "Started processing %s\n" ${GENES[$index]}
      OUTPUT="../GVCF/$TYPE.${GENES[$index]}.gvcf"

      # Iterate through every bam file in directory
      # look-up position and return VCF
      bcftools mpileup -f ~/resources/hgr1/hgr1.fa \
        --max-depth $DEPTH -A --min-BQ 30 \
        -a FORMAT/DP,AD \
        -r ${REGIONS[$index]} \
        --ignore-RG \
        -b $BAMLIST | \
        bcftools annotate -x INFO,FORMAT/PL - | \
        bcftools view -O v - \
        >> $OUTPUT

      RESULTS+=("$OUTPUT")
      printf "Done with %s \n" ${GENES[$index]}
      printf "%s\n" ${REGIONS[$index]}

    done

    rm bam.list.tmp


# Script ------------------ ------------------------------
cd ~/cptac/
mkdir -p GVCF #Output Folder
TYPE='cptac_ov' # hardcoded cohort label for this block
cd bam

    echo Analyzing $TYPE...
    #cd $TYPE

    ls *ov*.bam > bam.list.tmp
    # Keep a record of exactly the BAM files piled up for this cohort
    cp bam.list.tmp ../GVCF/$TYPE.bamlist
          
    for index in ${!GENES[*]}
    do
      printf "Started processing %s\n" ${GENES[$index]}
      OUTPUT="../GVCF/$TYPE.${GENES[$index]}.gvcf"

      # Iterate through every bam file in directory
      # look-up position and return VCF
      bcftools mpileup -f ~/resources/hgr1/hgr1.fa \
        --max-depth $DEPTH -A --min-BQ 30 \
        -a FORMAT/DP,AD \
        -r ${REGIONS[$index]} \
        --ignore-RG \
        -b $BAMLIST | \
        bcftools annotate -x INFO,FORMAT/PL - | \
        bcftools view -O v - \
        >> $OUTPUT

      RESULTS+=("$OUTPUT")
      printf "Done with %s \n" ${GENES[$index]}
      printf "%s\n" ${REGIONS[$index]}

    done

    rm bam.list.tmp
    
# Script ------------------ ------------------------------
cd ~/cptac/
mkdir -p GVCF #Output Folder
TYPE='cptac_lung' # hardcoded cohort label for this block
cd bam

    echo Analyzing $TYPE...
    #cd $TYPE

    ls *lung*.bam > bam.list.tmp
    # Keep a record of exactly the BAM files piled up for this cohort
    cp bam.list.tmp ../GVCF/$TYPE.bamlist
          
    for index in ${!GENES[*]}
    do
      printf "Started processing %s\n" ${GENES[$index]}
      OUTPUT="../GVCF/$TYPE.${GENES[$index]}.gvcf"

      # Iterate through every bam file in directory
      # look-up position and return VCF
      bcftools mpileup -f ~/resources/hgr1/hgr1.fa \
        --max-depth $DEPTH -A --min-BQ 30 \
        -a FORMAT/DP,AD \
        -r ${REGIONS[$index]} \
        --ignore-RG \
        -b $BAMLIST | \
        bcftools annotate -x INFO,FORMAT/PL - | \
        bcftools view -O v - \
        >> $OUTPUT

      RESULTS+=("$OUTPUT")
      printf "Done with %s \n" ${GENES[$index]}
      printf "%s\n" ${REGIONS[$index]}

    done

    rm bam.list.tmp

# Copy GVCF output to AWS S3
cd ../GVCF
aws s3 cp --recursive ./ $S3DIR
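
The three per-cohort blocks above differ only in `TYPE` and the file glob, so they could be folded into one function. The sketch below is one possible shape, not the script that was actually run. It differs in two ways that are my choices: it writes each output with `>` rather than appending, and it saves the same BAM list that is passed to `mpileup`. `DRY_RUN`, `REF`, `run_cohort` and the demo file names are all hypothetical; with `DRY_RUN=1` the bcftools command is only printed, so the list handling can be tried without bcftools installed.

```bash
#!/bin/bash
# Sketch: one function per cohort instead of three copied blocks.
DRY_RUN=${DRY_RUN:-1}       # 1 = print the command instead of running it
DEPTH='100000'
REF="$HOME/resources/hgr1/hgr1.fa"
REGIONS=('chr13:1003660-1005529' 'chr13:1005529-1005629'
         'chr13:10219-10340' 'chr13:1006622-1006779' 'chr13:1007948-1013018')
GENES=('18S' '18SE' '5S' '5.8S' '28S')

# run_cohort TYPE GLOB BAMDIR OUTDIR
run_cohort() {
  local type=$1 pattern=$2 bamdir=$3 outdir list i out
  mkdir -p "$4" || return 1
  outdir=$(cd "$4" && pwd) || return 1    # absolute path, valid after cd
  list="$outdir/$type.bamlist"

  # The saved list is exactly the set of BAMs passed to mpileup
  ( cd "$bamdir" && ls $pattern ) > "$list" || return 1

  for i in "${!GENES[@]}"; do
    out="$outdir/$type.${GENES[$i]}.gvcf"
    if [[ $DRY_RUN == 1 ]]; then
      echo "mpileup -r ${REGIONS[$i]} -b $list > $out"
    else
      ( cd "$bamdir" &&
        bcftools mpileup -f "$REF" --max-depth "$DEPTH" -A --min-BQ 30 \
          -a FORMAT/DP,AD -r "${REGIONS[$i]}" --ignore-RG -b "$list" |
        bcftools annotate -x INFO,FORMAT/PL - |
        bcftools view -O v - ) > "$out"
    fi
  done
}

# Demo with empty placeholder files (file names are hypothetical)
DEMO=$(mktemp -d)
mkdir -p "$DEMO/bam"
touch "$DEMO/bam/11BR020.cptac_brca.bam" "$DEMO/bam/14OV011.cptac_ov.bam"
run_cohort cptac_brca '*brca*.bam' "$DEMO/bam" "$DEMO/GVCF"
cat "$DEMO/GVCF/cptac_brca.bamlist"
rm -rf "$DEMO"
```

On the real data the calls would be along the lines of `run_cohort cptac_ov '*ov*.bam' ~/cptac/bam ~/cptac/GVCF` with `DRY_RUN=0`, followed by the same `aws s3 cp --recursive` step.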