# TCGA RNA-seq hgr1 alignments -- All types
```
pi:ababaian
files: ~/Crown/data2/tcga_3_general
start: 2018 08 30
complete : 2018 09 11
```
## Introduction

hgr1 alignment of the TCGA RNA-seq libraries. Only libraries with a matched-normal library from the same patient are included. This is the collection of all remaining data (LUSC and COAD are excluded because they are already done).


In [1]:
WORKDIR='/home/artem/Crown/data2/tcga_3_general'

cd $WORKDIR



## Materials and Methods

#### TCGA Data Input

Search Term for limiting files
```
cases.project.project_id IN ["TCGA-*"] and cases.project.project_id EXCLUDE ["TCGA-LUSC"] and cases.project.project_id EXCLUDE ["TCGA-COAD"] and cases.project.project_id EXCLUDE ["TCGA-UCEC"] and files.data_category in ["Raw Sequencing Data","Transcriptome Profiling"] and files.data_format in ["BAM"] and files.experimental_strategy in ["RNA-Seq"]
```

9,437 files are selected with these parameters.


In [3]:
INPUT1='tcga_run_pilot.txt'

cat $INPUT1

TCGA-BL-A13J-01B TCGA-BLCA 459ce800-7bab-428d-aeff-323609e11707
TCGA-BL-A13J-11A TCGA-BLCA ad9d77b2-ddcb-4f03-9dd0-cd562fb59495
TCGA-A7-A0CH-01A TCGA-BRCA 2e8875aa-4a40-489f-b59f-3c6e9ee3df5f
TCGA-A7-A0CH-11A TCGA-BRCA 82e56c60-a919-4909-bd40-72abfdcacdd2
TCGA-A7-A0D9-01A TCGA-BRCA c0ecd314-9d99-48ec-83f1-5a0c1ed656aa
TCGA-A7-A0D9-11A TCGA-BRCA 17cf6364-e228-4ee9-bffa-d1ad75f4152b
TCGA-FU-A3EO-01A TCGA-CESC 2c0ce235-6aeb-4bae-9dce-877a9ff9bd11
TCGA-FU-A3EO-11A TCGA-CESC 266cc885-563f-4fd7-87e4-88e00359313a
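
The parameter file holds one library per line: sample barcode, project ID, and GDC file UUID. As a sanity check before launching, a minimal sketch along these lines could confirm that every line has three fields and that each patient has both a tumor (`01`) and a matched-normal (`11`) library. The function name `check_pairs` is hypothetical and not part of the pipeline; the check assumes the sample-type code is the first two characters of the fourth barcode field.

```shell
#!/bin/bash
# check_pairs FILE
# Hypothetical helper (not part of the pipeline): reports lines that do not
# have exactly three fields, and patients lacking a tumor or normal library.
check_pairs() {
  awk '
    NF != 3 { print "BAD_LINE\t" NR; next }
    {
      split($1, b, "-")                 # TCGA-BL-A13J-01B -> b[1]..b[4]
      patient = b[1] "-" b[2] "-" b[3]  # e.g. TCGA-BL-A13J
      type = substr(b[4], 1, 2)         # 01 = primary tumor, 11 = solid tissue normal
      seen[patient] = 1
      if (type == "01") tumor[patient] = 1
      if (type == "11") normal[patient] = 1
    }
    END {
      for (p in seen)
        if (!(p in tumor) || !(p in normal)) print "UNPAIRED\t" p
    }
  ' "$1"
}
```

Running `check_pairs tcga_run_pilot.txt` on the pilot list shown above prints nothing, since all four patients have both an `01` and an `11` library.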

#### Scripts

In [8]:
WORKDIR='/home/artem/Crown/data2/tcga_3_general'

cd $WORKDIR

cat hgr1_align_v2.tcga.sh
echo 
echo
cat queenB.sh
echo 
echo
cat droneB.sh
echo 
echo 

#!/bin/bash
# 1kg_align_v2.tcga.sh
# rDNA alignment pipeline
# 180831 build -- TCGA
# AMI: crown-180813 - ami-0031fd61f932bdef9
# EC2: c4.2xlarge (8cpu / 15 gb)
# EC2: c4.xlarge  (4cpu / 8  gb)
# Storage: 200 Gb
#

# Input Requirements --------------------------

# $1 : Library name and file-output name
# $2 : Library population/analysis set
# $3 : Library UUID

# Control Panel -------------------------------
# CPU
	THREADS='3'

# Sequencing Data
	LIBRARY=$1 # Library/ File name

# TCGA FILE UUID
  UUID=$3

 # FastQ File-names
    FQ0="$LIBRARY.tmp.sort.0.fq"
    FQ1="$LIBRARY.tmp.sort.1.fq"
    FQ2="$LIBRARY.tmp.sort.2.fq"
    
# Read Group Data
# Extract from downloaded BAM file / input
	RGPO=$2 # Patient Population

	#RGSM= # Sample. Patient Identifer
	#RGID= # Read Group ID. Accession Number
    
	RGLB=$LIBRARY # Library Name. Accession Number
	RGPL='ILLUMINA'  # Sequencing Platform.
    
	# Extract Sequencing Run Info
	#  RGPU=$(gzip -dc $

## Pilot Run

In [9]:
# Instead of running locally, use an EC2 machine as the launcher for runs of
# 1000+ samples so it can stay online for a long time (a free-tier instance is enough)

# LOCAL: 
aws s3 cp hgr1_align_v2.tcga.sh s3://crownproject/tcga/scripts/
aws s3 cp queenB.sh             s3://crownproject/tcga/scripts/
aws s3 cp droneB.sh             s3://crownproject/tcga/scripts/
aws s3 cp tcga_run_pilot.txt    s3://crownproject/tcga/scripts/

# LOCAL:
# Copy over access Key for EC2 instances
# aws s3 cp ~/.ssh/<KEY>.pem s3://crownproject/<KEY>.pem

upload: ./hgr1_align_v2.tcga.sh to s3://crownproject/tcga/scripts/hgr1_align_v2.tcga.sh
upload: ./queenB.sh to s3://crownproject/tcga/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/tcga/scripts/droneB.sh
upload: ./tcga_run_pilot.txt to s3://crownproject/tcga/scripts/tcga_run_pilot.txt


In [10]:
# Manually open an Amazon Linux 2 AMI
# ami-6cd6f714
# t2.micro
#
# ssh login:
# ssh -i "crown.pem" ec2-user@PUBLICDNS
#

# Commands on EC2 machine to set-up AWS
# enter personal login info:

# REMOTE:
#aws configure
  # AWS Key ID
  # AWS Secret Key ID
  # Region: us-west-2
  
# Copy local run files to S3 and download them on EC2

# REMOTE:
# aws s3 cp --recursive s3://crownproject/tcga/scripts/ ./
#
# mv <KEY>.pem ~/.ssh/
# chmod 400 ~/.ssh/<KEY>.pem

# REMOTE:
# Open logging screen and begin launching EC2 instances
# screen -L
# 
# bash queenB.sh tcga_run_pilot.txt

cat logs/tcga_3_general_pilot.log

## canceled early to run 100 samples (everything seems in order)

[ec2-user@ip-172-31-26-100 ~]$ bash queenB.sh tcga_run_pilot.txt
Launch instance # 1
Sat Sep  1 00:44:46 UTC 2018
Instance Type: c4.xlarge
AMI Image: ami-0031fd61f932bdef9
Run Script: s3://crownproject/tcga/scripts/hgr1_align_v2.tcga.sh
Parameters: TCGA-BL-A13J-01B TCGA-BLCA 459ce800-7bab-428d-aeff-323609e11707
Instance ID: i-07f4d038139aee980
Public DNS: ec2-35-165-237-201.us-west-2.compute.amazonaws.com
download: s3://crownproject/tcga/scripts/hgr1_align_v2.tcga.sh to ./hgr1_align_v2.tcga.sh


Launch instance # 2
Sat Sep  1 00:47:54 UTC 2018
Instance Type: c4.xlarge
AMI Image: ami-0031fd61f932bdef9
Run Script: s3://crownproject/tcga/scripts/hgr1_align_v2.tcga.sh
Parameters: TCGA-BL-A13J-11A TCGA-BLCA ad9d77b2-ddcb-4f03-9dd0-cd562fb59495
Instance ID: i-0bba0b3fde3c3c47c
Public DNS: ec2-34-213-251-53.us-west-2.compute.amazonaws.com
download: s3://cr

## Samples 1-100

In [13]:
cat queenB.sh # modified for 25 instance max

#!/bin/bash
# queenB.sh
# 20180814 build
# EC2 Launch / Control Script
#

# 1. queenB script is initialized locally and input files
#    are parsed ready for cluster analysis
# 2. queenB launches instances, logs in to it and runs the
#    droneB.sh script remotely.
# 3. The droneB script is executed on the instance and it
#    launches a `screen` on the instance and loads and 
#    starts to perform the $TASK (gather.sh) script.
# 4. TASK script should include an instance shut-down
#    command to close instance upon completion.
#

# EC2 TASK Script - script for droneB to execute
TASK="s3://crownproject/tcga/scripts/hgr1_align_v2.tcga.sh"

# Parameter file:
# Each line of PARAMETERS will be input to STDIN of
# the droneB script which can then be used to run the
# TASK script.
# i.e. bash droneB.sh <line_N_of_PARAMETERS>
# PARAMETERS="tcga0_input.txt"
PARAMETERS=$1

# EC2 Set-up
instanceTYPE='c4.xlarge'
imageID='ami-0031fd61f932bdef9' #AMI TCGA

devNAME='

In [14]:
# Instead of running locally, use an EC2 machine as the launcher for runs of
# 1000+ samples so it can stay online for a long time (a free-tier instance is enough)

# LOCAL: 
aws s3 cp hgr1_align_v2.tcga.sh s3://crownproject/tcga/scripts/
aws s3 cp queenB.sh             s3://crownproject/tcga/scripts/
aws s3 cp droneB.sh             s3://crownproject/tcga/scripts/
aws s3 cp tcga_run_1_100.txt    s3://crownproject/tcga/scripts/

# REMOTE:
# Open logging screen and begin launching EC2 instances
# screen -L
# 
# bash queenB.sh tcga_run_1_100.txt

upload: ./hgr1_align_v2.tcga.sh to s3://crownproject/tcga/scripts/hgr1_align_v2.tcga.sh
upload: ./queenB.sh to s3://crownproject/tcga/scripts/queenB.sh
upload: ./droneB.sh to s3://crownproject/tcga/scripts/droneB.sh
upload: ./tcga_run_1_100.txt to s3://crownproject/tcga/scripts/tcga_run_1_100.txt


In [2]:
# Input of TCGA samples
cat tcga_run_1_100.txt
echo ' '
echo ' '
# Logs of instance run
cat logs/tcga_3_0-100.log

TCGA-BL-A13J-01B	TCGA-BLCA	459ce800-7bab-428d-aeff-323609e11707
TCGA-BL-A13J-11A	TCGA-BLCA	ad9d77b2-ddcb-4f03-9dd0-cd562fb59495
TCGA-BT-A20N-01A	TCGA-BLCA	f676134d-afff-4853-8e08-0751acd3b66d
TCGA-BT-A20N-11A	TCGA-BLCA	11e59164-6a72-4817-b621-cf5c54f13ded
TCGA-BT-A20Q-01A	TCGA-BLCA	2a492ebb-bb1c-469b-ae6d-b41bfd2b2919
TCGA-BT-A20Q-11A	TCGA-BLCA	996873db-fe18-4eb2-9f5d-a01a52f861b0
TCGA-BT-A20R-01A	TCGA-BLCA	295826fd-0ff5-4982-a80f-0e49c2478acc
TCGA-BT-A20R-11A	TCGA-BLCA	b6a854d4-50e0-4756-b2da-ee373bf73493
TCGA-BT-A20U-01A	TCGA-BLCA	b2b3a6d3-5209-45fb-90c4-813f8a629147
TCGA-BT-A20U-11A	TCGA-BLCA	c72ce85f-05f0-45ff-b4fc-da2b28831c29
TCGA-BT-A20W-01A	TCGA-BLCA	6dd9baf3-767c-48e1-8aa6-f8e98a1896f5
TCGA-BT-A20W-11A	TCGA-BLCA	89109446-653d-4e5e-9835-5afaa3b668f9
TCGA-BT-A2LA-01A	TCGA-BLCA	6ae3f7cd-0def-4a43-9a9a-cc04670e7507
TCGA-BT-A2LA-11A	TCGA-BLCA	4d62db96-ce8e-45a1-9854-df5efa4e40a6
TCGA-BT-A2LB-01A	TCGA-BLCA	15f4628b-5def-4ee0-9c1e-2ec21f7c7645
TCGA-BT-A2LB-11A	TCGA-BLC

## Samples 101-500

In [3]:
#LOCAL:
aws s3 cp tcga_run_101-500.txt    s3://crownproject/tcga/scripts/

#REMOTE: (same machine as before)
# aws s3 cp s3://crownproject/tcga/scripts/tcga_run_101-500.txt ./
# screen -L
# 
# bash queenB.sh tcga_run_101-500.txt

upload: ./tcga_run_101-500.txt to s3://crownproject/tcga/scripts/tcga_run_101-500.txt


In [None]:
# Note: TCGA-BH-A0DH-11A run failed. EC2 instance shut down prematurely? No messages in log files.
# Sorting the log files by size shows this log is only 67 B, while the next largest is 10 kB.
# Sorting logs by size can be used to catch this type of error.

# Note: TCGA-CV-7183-01A run failed.
# Error in GDC download: 'Max tries exceeded'.
# The instance stalled and did not shut down (ran for ~20 hours).

# Restarted run manually to save partial download
# 
# screen -Ldmt sh ~/hgr1_align_v2.tcga.sh TCGA-CV-7183-01A TCGA-HNSC a529cd5f-234e-4dce-93c6-71ff279a0193
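
The size-based check described in the notes above can be scripted. A minimal sketch, assuming the per-instance logs sit in one directory and that anything under 1024 bytes marks a failed run. The threshold and the function name `small_logs` are assumptions; the notes only establish 67 B for the failed run versus roughly 10 kB for the next log.

```shell
#!/bin/bash
# small_logs DIR [MAX_BYTES]
# Hypothetical helper: list regular files in DIR smaller than MAX_BYTES
# (default 1024), smallest first, as "<bytes><TAB><path>".
small_logs() {
  dir=$1
  max=${2:-1024}
  # The "c" suffix makes find compare exact byte counts
  find "$dir" -maxdepth 1 -type f -size "-${max}c" |
    while IFS= read -r f; do
      printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n
}
```

For example, `small_logs logs/ 2000` would list every log under 2000 bytes in `logs/`; the directory passed in should be wherever the per-library logs were downloaded.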

In [3]:
# Input of TCGA Sample for run 101-500
cat tcga_run_101-500.txt
echo ' '
echo ' '
# Logs of instance run
cat logs/tcga_3_101-500.log

TCGA-BH-A0DG-01A	TCGA-BRCA	865afd6b-84a7-4dde-aa23-0b925c0b9d50
TCGA-BH-A0DG-11A	TCGA-BRCA	bfdaf242-1e97-450d-9983-2cbb4e99305d
TCGA-BH-A0DH-01A	TCGA-BRCA	71a3c27c-0982-4da6-b260-cf16a4868a19
TCGA-BH-A0DH-11A	TCGA-BRCA	5a0374e5-cee9-4952-9df0-4ff125196478
TCGA-BH-A0DK-01A	TCGA-BRCA	a3df35ec-a8d2-44ad-8ba6-eaba504261e0
TCGA-BH-A0DK-11A	TCGA-BRCA	ae67044f-62c9-405f-bfc1-f0b8f1bc66d3
TCGA-BH-A0DL-01A	TCGA-BRCA	11d77ef2-b3f9-4af9-8490-71f9a8c599e0
TCGA-BH-A0DL-11A	TCGA-BRCA	bd8b100a-8391-4046-847f-c3fdd3830eeb
TCGA-BH-A0DO-01B	TCGA-BRCA	81ab65cc-34f6-4d3d-8e2e-0aae34e6de1c
TCGA-BH-A0DO-11A	TCGA-BRCA	0f821dff-79f1-4082-9a7e-31fa763f143c
TCGA-BH-A0DP-01A	TCGA-BRCA	7ff8a7a0-5235-4de0-bb9f-b811230b5bda
TCGA-BH-A0DP-11A	TCGA-BRCA	30f4e5d8-a13d-4ef2-88e0-a01e07c2e142
TCGA-BH-A0DQ-01A	TCGA-BRCA	a3198716-ccef-4746-a5ee-f928479ec5d8
TCGA-BH-A0DQ-11A	TCGA-BRCA	20d7550a-2cea-4544-a038-82ce552f49a0
TCGA-BH-A0DT-01A	TCGA-BRCA	61ad7408-dacd-4913-a479-c456e8b03191
TCGA-BH-A0DT-11A	TCGA-BRC

## Samples 501 - 1170

In [4]:
#LOCAL:
aws s3 cp tcga_run_501-1170.txt    s3://crownproject/tcga/scripts/

#REMOTE: (same machine as before)
# aws s3 cp s3://crownproject/tcga/scripts/tcga_run_501-1170.txt ./
# screen -L
# 
# bash queenB.sh tcga_run_501-1170.txt

Completed 41.9 KiB/41.9 KiB with 1 file(s) remainingupload: ./tcga_run_501-1170.txt to s3://crownproject/tcga/scripts/tcga_run_501-1170.txt


In [None]:
# Notes:
#
# TCGA-BH-A1FU-11A and TCGA-BH-A1FU-01A samples
# had small log file consistent with failed runs.
#

In [3]:
#LOCAL:
aws s3 cp s3://crownproject/tcga/logs/tcga_3_501-1170.log ./logs/

cat tcga_run_501-1170.txt
echo ''
echo ''
cat logs/tcga_3_501-1170.log

# then moved all input files to logs/ folder.

download: s3://crownproject/tcga/logs/tcga_3_501-1170.log to logs/tcga_3_501-1170.log
TCGA-CJ-6030-01A	TCGA-KIRC	60fc1f14-5d4e-4be5-ba22-40023c40213b
TCGA-CJ-6030-11A	TCGA-KIRC	16337210-9556-4c0a-8ea7-d6e1c8be4ee3
TCGA-CJ-6033-01A	TCGA-KIRC	694dadde-1b8a-4e3c-a617-9c17d8672d50
TCGA-CJ-6033-11A	TCGA-KIRC	a103859b-83b7-49a4-b310-879010a05c91
TCGA-CW-5580-01A	TCGA-KIRC	71ad6138-fbed-4587-b3a9-88d7708bfbe6
TCGA-CW-5580-11A	TCGA-KIRC	51b8f53b-4659-4a90-b094-f57ec3cf12ea
TCGA-CW-5581-01A	TCGA-KIRC	1a35fd2c-5469-42bd-b841-ded6e29934e9
TCGA-CW-5581-11A	TCGA-KIRC	1582a522-2b69-4d40-b58c-b750f91990bd
TCGA-CW-5584-01A	TCGA-KIRC	6524632b-b5ed-475d-b49c-d93a7133f779
TCGA-CW-5584-11A	TCGA-KIRC	70dc4d36-870d-46fe-83e0-42969f46e18c
TCGA-CW-5585-01A	TCGA-KIRC	4aa43f1f-86b8-4fa5-b691-474fa7851815
TCGA-CW-5585-11A	TCGA-KIRC	ebf2b464-b752-48eb-9e80-9d64d0328b05
TCGA-CW-5587-01A	TCGA-K

## Failed Runs

Loaded all BAM files onto an EC2 instance.

Ran `ls -alh tcga/*/*.bam > bamlist.alh.txt`

This yields the following list of output BAM files that are under a megabyte (i.e. failed runs which need to be re-run).

```
-rw-rw-r--	1	ubuntu	ubuntu	8.8	Aug	23	21:34	TCGA-COAD/TCGA-AA-3697-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	2	10:11	TCGA-ESCA/TCGA-L5-A43C-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	2	9:46	TCGA-ESCA/TCGA-L5-A4OO-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	5	7:21	TCGA-STAD/TCGA-BR-8060-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	5	6:36	TCGA-STAD/TCGA-CG-5720-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	5	7:46	TCGA-STAD/TCGA-CG-5722-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	5	7:09	TCGA-STAD/TCGA-CG-5722-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	5	10:18	TCGA-STAD/TCGA-HU-A4GP-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	351	Sep	5	11:02	TCGA-STAD/TCGA-IN-AB1V-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	2	9:20	TCGA-ESCA/TCGA-L5-A43C-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	2	10:09	TCGA-ESCA/TCGA-L5-A4OG-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	2	10:48	TCGA-ESCA/TCGA-L5-A4OJ-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	5:13	TCGA-STAD/TCGA-BR-6453-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	5:49	TCGA-STAD/TCGA-BR-6453-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	5:20	TCGA-STAD/TCGA-BR-6454-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	5:54	TCGA-STAD/TCGA-BR-6454-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	6:42	TCGA-STAD/TCGA-BR-6457-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	5:43	TCGA-STAD/TCGA-BR-6802-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	7:03	TCGA-STAD/TCGA-CG-5721-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	9:41	TCGA-STAD/TCGA-HU-8238-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	9:35	TCGA-STAD/TCGA-HU-A4GC-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	10:47	TCGA-STAD/TCGA-HU-A4GP-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	10:46	TCGA-STAD/TCGA-HU-A4GY-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	10:39	TCGA-STAD/TCGA-HU-A4HB-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	352	Sep	5	10:38	TCGA-STAD/TCGA-HU-A4HB-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	6:31	TCGA-STAD/TCGA-BR-6457-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	6:00	TCGA-STAD/TCGA-BR-7704-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	4:39	TCGA-STAD/TCGA-BR-7715-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	5:32	TCGA-STAD/TCGA-BR-7715-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	7:09	TCGA-STAD/TCGA-BR-7716-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	5:33	TCGA-STAD/TCGA-BR-7716-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	5:19	TCGA-STAD/TCGA-BR-7717-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	6:29	TCGA-STAD/TCGA-BR-7851-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	6:53	TCGA-STAD/TCGA-BR-8060-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	6:47	TCGA-STAD/TCGA-CG-5720-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	5:56	TCGA-STAD/TCGA-CG-5721-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	7:26	TCGA-STAD/TCGA-CG-5734-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	8:41	TCGA-STAD/TCGA-CG-5734-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	8:31	TCGA-STAD/TCGA-FP-7735-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	8:32	TCGA-STAD/TCGA-FP-7735-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	9:14	TCGA-STAD/TCGA-FP-7829-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	9:15	TCGA-STAD/TCGA-HU-8238-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	10:26	TCGA-STAD/TCGA-HU-A4GH-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	353	Sep	5	11:23	TCGA-STAD/TCGA-IN-AB1X-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	354	Sep	5	4:44	TCGA-STAD/TCGA-BR-6802-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	354	Sep	5	6:43	TCGA-STAD/TCGA-BR-7717-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	354	Sep	5	12:20	TCGA-STAD/TCGA-IP-7968-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	354	Sep	5	11:22	TCGA-STAD/TCGA-IP-7968-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	362	Sep	4	9:49	TCGA-LIHC/TCGA-DD-A1EC-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	363	Sep	4	6:38	TCGA-LIHC/TCGA-BC-A10Q-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	363	Sep	4	8:45	TCGA-LIHC/TCGA-DD-A113-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	364	Sep	2	5:23	TCGA-BRCA/TCGA-E9-A1RI-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	364	Sep	3	23:58	TCGA-KIRC/TCGA-CZ-5456-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	364	Aug	31	16:14	TCGA-LUSC/TCGA-22-5471-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	364	Aug	31	16:14	TCGA-LUSC/TCGA-22-5482-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	364	Aug	31	16:15	TCGA-LUSC/TCGA-22-5491-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	364	Aug	31	16:15	TCGA-LUSC/TCGA-33-6737-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	368	Sep	4	10:41	TCGA-LIHC/TCGA-DD-A3A2-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	368	Aug	31	16:15	TCGA-LUSC/TCGA-33-4587-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	368	Aug	31	16:15	TCGA-LUSC/TCGA-56-7730-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	369	Sep	1	6:53	TCGA-BRCA/TCGA-A7-A13E-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	369	Sep	4	8:06	TCGA-LIHC/TCGA-BC-A216-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	369	Sep	4	8:02	TCGA-LIHC/TCGA-DD-A11C-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	369	Sep	4	9:36	TCGA-LIHC/TCGA-DD-A1EG-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	369	Sep	4	10:19	TCGA-LIHC/TCGA-DD-A3A1-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	369	Aug	31	16:15	TCGA-LUSC/TCGA-43-7657-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	369	Sep	5	2:29	TCGA-PRAD/TCGA-HC-8260-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	1	3:27	TCGA-BLCA/TCGA-BT-A20R-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	4	10:56	TCGA-LIHC/TCGA-DD-A3A6-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	4	13:03	TCGA-LUAD/TCGA-38-4626-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	4	13:00	TCGA-LUAD/TCGA-44-2657-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	4	13:10	TCGA-LUAD/TCGA-44-2661-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Aug	31	16:15	TCGA-LUSC/TCGA-56-8083-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Aug	31	16:15	TCGA-LUSC/TCGA-58-8386-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	4	19:23	TCGA-PAAD/TCGA-H6-A45N-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	5	12:12	TCGA-THCA/TCGA-BJ-A3PR-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	370	Sep	5	18:14	TCGA-THCA/TCGA-KS-A41I-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	1	3:59	TCGA-BLCA/TCGA-BL-A13J-01B.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Aug	23	21:34	TCGA-COAD/TCGA-A6-2684-01C.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	4	15:06	TCGA-LUAD/TCGA-44-2665-01B.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	4	14:38	TCGA-LUAD/TCGA-44-2668-01B.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	4	15:16	TCGA-LUAD/TCGA-44-5645-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	4	16:58	TCGA-LUAD/TCGA-55-6979-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	4	20:57	TCGA-PRAD/TCGA-CH-5767-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	4	22:39	TCGA-PRAD/TCGA-EJ-7786-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	5	15:17	TCGA-THCA/TCGA-EL-A3ZL-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	371	Sep	5	18:28	TCGA-THCA/TCGA-KS-A41L-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	372	Sep	4	15:27	TCGA-LUAD/TCGA-44-6147-01B.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	372	Sep	5	14:51	TCGA-THCA/TCGA-EL-A3ZG-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	372	Sep	5	16:05	TCGA-THCA/TCGA-EL-A3ZQ-01A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	373	Aug	31	16:15	TCGA-LUSC/TCGA-51-4079-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	373	Aug	31	16:15	TCGA-LUSC/TCGA-51-4081-11A.hgr1.bam
-rw-rw-r--	1	ubuntu	ubuntu	374	Sep	1	7:41	TCGA-BRCA/TCGA-A7-A0DC-01A.hgr1.bam
```
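
To turn a list like the one above into a re-run batch, the failed library names can be matched back to the original parameter files to recover each project and UUID. A minimal sketch, assuming the `tcga/<project>/<library>.hgr1.bam` layout used in the `ls` command and a 1 MiB cut-off; `failed_libs` and `rerun_params` are hypothetical names, not part of the pipeline.

```shell
#!/bin/bash
# failed_libs DIR
# Print library names whose *.hgr1.bam under DIR is smaller than 1 MiB.
failed_libs() {
  # -size with a "c" suffix compares exact bytes (no rounding up to blocks)
  find "$1" -type f -name '*.hgr1.bam' -size -1048576c |
    sed -e 's|.*/||' -e 's|\.hgr1\.bam$||' | sort
}

# rerun_params LIBS_FILE PARAM_FILE...
# Print the parameter lines (library, project, UUID) whose first column
# appears in LIBS_FILE (one library name per line).
rerun_params() {
  libs=$1
  shift
  awk -v libs="$libs" '
    BEGIN { while ((getline l < libs) > 0) want[l] = 1 }
    ($1 in want)
  ' "$@"
}
```

A possible use would be `failed_libs tcga > failed.txt` followed by `rerun_params failed.txt logs/tcga_run_*.txt > tcga_rerun.txt`, where `failed.txt` and `tcga_rerun.txt` are placeholder file names.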

Log output for one example failed run, `TCGA-STAD/TCGA-HU-A4GP-01A.hgr1.bam`:

```
Successfully downloaded: 1
[bam_sort_core] merging from 129 files and 3 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 261017628 reads
130508814 reads; of these:
  130508814 (100.00%) were paired; of these:
    120390283 (92.25%) aligned concordantly 0 times
    10040766 (7.69%) aligned concordantly exactly 1 time
    77765 (0.06%) aligned concordantly >1 times
    ----
    120390283 pairs aligned concordantly 0 times; of these:
      3135524 (2.60%) aligned discordantly 1 time
    ----
    117254759 pairs aligned 0 times concordantly or discordantly; of these:
      234509518 mates make up the pairs; of these:
        233483455 (99.56%) aligned 0 times
        834118 (0.36%) aligned exactly 1 time
        191945 (0.08%) aligned >1 times
10.55% overall alignment rate

/home/ubuntu/hgr1_align_v2.tcga.sh: line 220:  2109 Broken pipe             ~/bin/samtools view align.F4.bam
      2110 Killed                  | grep -Ff read.names.tmp - > align.F4.tmp.sam
      
/home/ubuntu/hgr1_align_v2.tcga.sh: line 220:  2111 Broken pipe             ~/bin/samtools view align.F4.bam
      2112 Killed                  | grep -Ff read.names.tmp - > align.F4.tmp.sam
```
Example File: `TCGA-PRAD/TCGA-EJ-7786-11A.hgr1.bam`

```
Successfully downloaded: 1
[bam_sort_core] merging from 69 files and 3 in-memory blocks...
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 162052732 reads
81026366 reads; of these:
  81026366 (100.00%) were paired; of these:
    78798782 (97.25%) aligned concordantly 0 times
    2226089 (2.75%) aligned concordantly exactly 1 time
    1495 (0.00%) aligned concordantly >1 times
    ----
    78798782 pairs aligned concordantly 0 times; of these:
      1071687 (1.36%) aligned discordantly 1 time
    ----
    77727095 pairs aligned 0 times concordantly or discordantly; of these:
      155454190 mates make up the pairs; of these:
        155130972 (99.79%) aligned 0 times
        317186 (0.20%) aligned exactly 1 time
        6032 (0.00%) aligned >1 times
4.27% overall alignment rate

/home/ubuntu/hgr1_align_v2.tcga.sh: line 220:  1900 Broken pipe             ~/bin/samtools view align.F4.bam
      1901 Killed                  | grep -Ff read.names.tmp - > align.F4.tmp.sam
      
/home/ubuntu/hgr1_align_v2.tcga.sh: line 220:  1902 Broken pipe             ~/bin/samtools view align.F4.bam
      1903 Killed                  | grep -Ff read.names.tmp - > align.F4.tmp.sam
```
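
Both example logs end with the same `Broken pipe` / `Killed` pair at line 220 of the script, so the remaining logs can be screened for that signature. A minimal sketch, assuming one log file per library in a single directory (`killed_logs` is a hypothetical name):

```shell
#!/bin/bash
# killed_logs DIR
# Print the files in DIR that contain both "Broken pipe" and "Killed",
# the two messages seen in the example logs above.
killed_logs() {
  grep -l 'Broken pipe' "$1"/* 2>/dev/null |
    while IFS= read -r f; do
      if grep -q 'Killed' "$f"; then
        printf '%s\n' "$f"
      fi
    done
}
```

Comparing this output against the list of sub-megabyte BAM files would show whether all failed runs share this error or only some of them.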

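My intuition is that the `Broken pipe` error is caused by running out of memory. The `Killed` message on the `grep -Ff` step is consistent with that process being killed for exhausting memory, after which `samtools view` loses its reader and reports the broken pipe. These libraries appear to be larger than most; processing so many reads is likely to be the source of the error. Bumping up from c4.xlarge to a 2xlarge instance for these ~100 samples may be a good solution (per the script header, c4.2xlarge has 15 GB versus 8 GB). Will run this as a pilot.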
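
If memory is indeed the bottleneck, another option to try alongside the larger instance is replacing the `grep -Ff` step with a hash lookup on the read-name column. The sketch below is an untested alternative, not the pipeline's method. `filter_by_qname` is a hypothetical name, it matches the first SAM column exactly (whereas `grep -Ff` matches the names as substrings anywhere in the line), and whether it needs less memory than `grep -Ff` on these libraries has not been measured.

```shell
#!/bin/bash
# filter_by_qname NAMES_FILE
# Read SAM records on stdin and print those whose QNAME (column 1)
# is listed in NAMES_FILE (one read name per line).
filter_by_qname() {
  awk -F '\t' -v names="$1" '
    BEGIN { while ((getline n < names) > 0) keep[n] = 1 }
    ($1 in keep)
  '
}

# Intended use, mirroring line 220 of the script (requires samtools):
# ~/bin/samtools view align.F4.bam | filter_by_qname read.names.tmp > align.F4.tmp.sam
```

Because the match is exact on the read name, a name such as `read2` would no longer pull in `read22`, so the output may differ slightly from the `grep -Ff` version.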