# 100 Genomes Alignment to hgr1 -- version 1
```
pi:ababaian
files: ~/Crown/data/1kg_hgr1/
start: 2017 03 06
complete : 2017 03 10
```
## Introduction

The CEPH-1436 trio provides deep, high-quality sequencing for measuring intra-individual variation. To measure global inter-individual variation in humans, 104 genomes from the 1000 Genomes Project (4 people x 26 populations) will be processed through the hgr1 pipeline.

Of highest interest are the so-called 'conserved variants' in RNA45S, such as 28S.r.59A>G, which seem to be maintained at ~50% allele frequency.

### Hypothesis
- rDNA variants will be distributed into two distinct populations: conserved variants, which are maintained at a certain frequency in all humans, and variable variants, which change frequency, reaching 0.0 or 1.0 in different individuals.
- The conserved variants make up distinct 'ribotypes' of rRNA


## Objective

- Align the PCR-Free, deep WGS data from CEPH-1436 trio to `hgr1`
- Align 100 low-coverage genomes from all 1000 genomes populations to `hgr1`
- Measure the variation in rDNA allele frequency and how it changes among populations.

## Materials and Methods

~ Data-sets

~ Scripts

~ CEPH-1436 Alignments

~ 100 Genomes Alignments

### Data-sets

Data-sets are cataloged in `~/Crown/data/1kg_hgr1/1kg_hgr1_datasets.xlsx`. A freeze of the data-sets, `170306_dataFreezetsv.tar.gz`, was made at Mon Mar  6 11:51:16 PST 2017.

#### 100 Genomes data

- 104 genomes were chosen from the 1000 Genomes Project data (4 individuals x 26 populations).
- From the 1kg [sequence.index](https://s3.amazonaws.com/1000genomes/sequence.index), samples were filtered for:
```
INSTRUMENT_PLATFORM: ILLUMINA
INSTRUMENT_MODEL: Illumina HiSeq 2000
LIBRARY_LAYOUT: PAIRED
WITHDRAWN: 0
READ_COUNT: >20,000,000
ANALYSIS_GROUP: low coverage
```
- From this filtered list, four individuals from each population were randomly chosen for inclusion.
- The fastq files have since been moved from the paths recorded in sequence.index. New ftp paths for each file were generated using the [current.tree](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree) file on the ftp.1000genomes.ebi.ac.uk server; all data were manually parsed into the main sheet of `1kg_hgr1_datasets.xlsx`.
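The filter and the random draw described above could be sketched roughly as follows. This is not the procedure actually used (the selection was recorded by hand in the spreadsheet). The function names `filter_index` and `pick_four` are hypothetical, the `POPULATION` and `SAMPLE_NAME` column names are assumptions, and columns are looked up by header name, so the header layout of the real `sequence.index` should be checked before use.

```shell
#!/bin/bash
# Sketch only: filter a tab-separated index by header name, then
# draw up to four samples per population.
# Assumed columns: POPULATION, SAMPLE_NAME (not confirmed by this notebook).

filter_index() {
	# $1: tab-separated index file whose first line is the header
	awk -F'\t' '
	NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
	$col["INSTRUMENT_PLATFORM"] == "ILLUMINA" &&
	$col["INSTRUMENT_MODEL"] == "Illumina HiSeq 2000" &&
	$col["LIBRARY_LAYOUT"] == "PAIRED" &&
	$col["WITHDRAWN"] == 0 &&
	$col["READ_COUNT"] + 0 > 20000000 &&
	$col["ANALYSIS_GROUP"] == "low coverage" {
		print $col["POPULATION"] "\t" $col["SAMPLE_NAME"]
	}' "$1" | sort -u
}

pick_four() {
	# stdin: population<TAB>sample
	# Shuffle, then keep the first four lines seen for each population
	shuf | awk -F'\t' '++seen[$1] <= 4' | sort
}
```

Usage would look like `filter_index sequence.index | pick_four > selected.tsv`, where `selected.tsv` is a placeholder name. `shuf` is part of GNU coreutils.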

convert.sh
```
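#!/bin/bash
# Look up the current ftp path of each fastq file, using filenames.
# 1kg_files:    one (old) path per line, taken from sequence.index
# current.tree: current file listing of the 1000 genomes ftp site

LINES=$(wc -l < 1kg_files)

echo "$LINES"

while read -r SEARCH
do
	# Match on the filename only; the directory part is out of date
	SEARCH2=$(basename "$SEARCH")

	# -F: treat the filename as a fixed string, not a regex
	RETURN=$(grep -F "$SEARCH2" current.tree)

	# Unquoted on purpose: one output line per input line
	# (empty if the file was not found)
	echo $RETURN >> 1kg_files.newpath
done < 1kg_files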
```

### Scripts

`1kg_align_v1.sh` - Core alignment script for hgr1 alignment and VCF

In [None]:
#!/bin/bash
# 1kg_align_v1.sh
# rDNA alignment pipeline
# 170305 build
# AMI: crown-170220 - ami-66129306
# EC2: c4.2xlarge (8cpu / 15 gb)
# EC2: c4.xlarge  (4cpu / 8  gb)
# Storage: 300 Gb
#

# Control Panel -------------------------------
# CPU
	THREADS='3'

# Sequencing Data
	LIBRARY=$1 # Library/ File name
	FASTQ1=$5
	FASTQ2=$6

    # File-names
    FQ1=$(basename $FASTQ1)
    FQ2=$(basename $FASTQ2)

# Read Group Data
	RGSM=$2   # Sample. Patient Identifier
	RGID=$3 # Read Group ID. Accession Number
	RGLB=$LIBRARY # Library Name. Accession Number
	RGPL='ILLUMINA'  # Sequencing Platform.
	RGPO=$4 # Patient Population
	# Extract Sequencing Run Info
	#  RGPU=$(gzip -dc $FQ1 | head -n1 - | cut -f1 -d':' | cut -f2 -d' ')

# Initialize workdir --------------------------

# Make working directory
  mkdir -p align; cd align

# Copy hgrX genome and create bowtie2 index
  aws s3 cp s3://crownproject/resources/hgr1.fa ./
  samtools faidx hgr1.fa
  
  bowtie2-build hgr1.fa hgr1
  
# Download Genome Sequencing Data
  wget $FASTQ1
  wget $FASTQ2

    # Extract Sequencing Run Info
    RGPU=$(gzip -dc $FQ1| head -n1 - | cut -f1 -d':' | cut -f2 -d' ')

# Primary Alignment -------------------------

# Bowtie2: align to genome

bowtie2 --very-sensitive-local -p $THREADS --rg-id $RGID --rg LB:$RGLB --rg SM:$RGSM \
--rg PL:$RGPL --rg PU:$RGPU -x hgr1 -1 $FQ1 -2 $FQ2 | samtools view -bS - > aligned_unsorted.bam

rm $FQ1 $FQ2 # Remove fastq files to save space

# Calculate library flagstats
  samtools flagstat aligned_unsorted.bam > aligned_unsorted.flagstat
  
# Read Subset ------------------------------
# Extract mapped reads, and their unmapped pairs

  # Extract Header
  samtools view -H aligned_unsorted.bam > align.header.tmp

  # Unmapped reads with mapped pairs
  # Extract Mapped Reads
  # and their unmapped pairs
  samtools view -b -F 4 aligned_unsorted.bam > align.F4.bam #mapped
  samtools view -b -f 4 -F 8 aligned_unsorted.bam > align.f4F8.bam #unmapped pairs
  
  # Extract just the 45S unit
  #aws s3 cp s3://crownproject/resources/rDNA_45s.bed ./
  #samtools view -b -L rDNA_45s.bed align.F4.bam > align.F4.45s.bam
  
  # What are the mapped readnames
  samtools view align.F4.bam | cut -f1 - > read.names.tmp
  
  # Extract mapped reads by read name. With the 45S subset above
  # commented out this keeps every mapped read; when a region of
  # interest is used it also recovers read pairs mapped on its edge
  # -------|======= R O I ======| ----------
  # read:                  ====---====
  samtools view align.F4.bam | grep -Ff read.names.tmp - > align.F4.tmp.sam

  # Complete mapped reads list
  #cut -f1 align.F4.tmp.sam > read.names.45s.long.tmp

  # Extract unmapped reads with a mapped pair
  samtools view align.f4F8.bam | grep -Ff read.names.tmp - > align.f4F8.tmp.sam

  # Re-compile bam file
  cat align.header.tmp align.F4.tmp.sam align.f4F8.tmp.sam | samtools view -bS - > align.hgr1.tmp.bam
    samtools sort align.hgr1.tmp.bam align.hgr1
    samtools index align.hgr1.bam
    samtools flagstat align.hgr1.bam > align.hgr1.flagstat
    
  # Clean up 
  rm *tmp* align.F4.bam align.f4F8.bam

# Rename the total Bam Files
# (the unsorted bam is never indexed, so there is no .bai to rename)
  mv aligned_unsorted.bam $LIBRARY.bam
  mv aligned_unsorted.flagstat $LIBRARY.flagstat

# Rename the hgr-aligned Bam files
  mv align.hgr1.bam $LIBRARY.hgr1.bam
  mv align.hgr1.bam.bai $LIBRARY.hgr1.bam.bai
  mv align.hgr1.flagstat $LIBRARY.hgr1.flagstat
  
# Primary VCF ----------------------------

# GATK variant calling
  aws s3 cp s3://crownproject/resources/hgr1.gatk.fa ./
  aws s3 cp s3://crownproject/resources/hgr1.gatk.fa.fai ./
  aws s3 cp s3://crownproject/resources/hgr1.gatk.dict ./
  
  java -Xmx12G -jar /home/ubuntu/software/GenomeAnalysisTK.jar \
  -R hgr1.gatk.fa -T HaplotypeCaller \
  -ploidy 2 --max_alternate_alleles 6 \
  -I $LIBRARY.hgr1.bam -o $LIBRARY.hgr1.vcf

   # Memory issues, restrict to 45S region only hgr2
     # -ploidy 100, 50, 20 failed... do 2 and analyze 45S further
     
# Upload final output files to S3
 
# Alignments (Full)
 #aws s3 cp $LIBRARY.bam s3://crownproject/1kg_hgr1/
 #aws s3 cp $LIBRARY.bam.bai s3://crownproject/1kg_hgr1/
 aws s3 cp $LIBRARY.flagstat s3://crownproject/1kg_hgr1/

# Alignments (Aligned)
  aws s3 cp $LIBRARY.hgr1.bam s3://crownproject/1kg_hgr1/
  aws s3 cp $LIBRARY.hgr1.bam.bai s3://crownproject/1kg_hgr1/
  aws s3 cp $LIBRARY.hgr1.flagstat s3://crownproject/1kg_hgr1/

# VCF
 aws s3 cp $LIBRARY.hgr1.vcf s3://crownproject/1kg_hgr1/
 aws s3 cp $LIBRARY.hgr1.vcf.idx s3://crownproject/1kg_hgr1/
 
# Shutdown and Terminate instance
EC2ID=$(ec2metadata --instance-id)
aws ec2 terminate-instances --instance-ids $EC2ID

# Script complete
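
The read-subset step in the script above relies on two SAM flag bits: `-F 4` keeps reads whose "read unmapped" bit (0x4) is clear, and `-f 4 -F 8` keeps unmapped reads whose "mate unmapped" bit (0x8) is clear. A minimal sketch of the same test in shell arithmetic, for reference only; `flag_class` is a hypothetical helper and not part of the pipeline.

```shell
#!/bin/bash
# flag_class FLAG: classify a SAM FLAG value the way the two
# samtools filters in the read-subset step do.
flag_class() {
	if [ $(( $1 & 4 )) -eq 0 ]; then
		echo "mapped"                  # kept by: samtools view -F 4
	elif [ $(( $1 & 8 )) -eq 0 ]; then
		echo "unmapped_mate_mapped"    # kept by: samtools view -f 4 -F 8
	else
		echo "unmapped_mate_unmapped"  # kept by neither filter
	fi
}

flag_class 99    # 1+2+32+64: paired, proper pair, mate reverse, first in pair
flag_class 69    # 1+4+64:    paired, read unmapped, first in pair
flag_class 77    # 1+4+8+64:  paired, read and mate unmapped, first in pair
```

Reads in the third class never reach `align.hgr1.bam`, which is why the hgr1 flagstat counts differ from the whole-library flagstat.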

`queenB.sh` - EC2 auto-launch script for high-throughput runs

In [None]:
#!/bin/bash
# queenB.sh
# 20170306a build
# EC2 Launch / Control Script
#

# Control Panel =========================
# EC2 Run Script - script for droneB to execute
TASK="s3://crownproject/scripts/1kg_align_v1.sh"

# Parameter file: each line holds the arguments that one droneB
# passes to the TASK script
PARAMETERS="hgr1_test.txt"
# PARAMETERS=$1 # changed in build 20170306b

# EC2 Set-up
instanceTYPE='c4.xlarge'
imageID='ami-66129306' #AMI

devNAME='/dev/sda1' # /dev/sda1 for Crown-AMI
volSIZE='300' # in Gb

# Number of instances to launch
#COUNT=2 # predetermined number
COUNT=$(wc -l $PARAMETERS | cut -f 1 -d' ' ) # for each input argument

# Security
keyNAME='CrownKey'
keyPATH="/home/artem/.ssh/CrownKey.pem"
secGROUP='crown-group'

# =======================================

for ITER in $(seq 1 $COUNT)
do

  # Extract Parameters/Arguments ----------

  ARGS=$(sed -n "$ITER"p $PARAMETERS | sed 's/\t/ /g' - )

  echo "Launch instance # $ITER"
  echo "Instance Type: $instanceTYPE"
  echo "AMI Image: $imageID"
  echo "Run Script: $TASK"
  echo "Parameters: $ARGS"

  # Launch an instance --------------------
  # NOTE: each iteration of the for loop launches exactly one
  # instance (--count 1), one per line of the parameter file
  aws ec2 run-instances --image-id $imageID --count 1 \
   --instance-type $instanceTYPE --key-name $keyNAME \
   --block-device-mappings DeviceName=$devNAME,Ebs={VolumeSize=$volSIZE} \
   --security-groups $secGROUP > launch.tmp

  # Another alternative is to use --user-data droneB.sh 
  # which will run at instance boot-up
  # passing arguments to it may be challenging

  # Retrieve instance ID
  instanceID=$(cat launch.tmp | \
    egrep -o -e 'InstanceId[":/A-Za-z0-9_ \\-]*' - |\
    cut -f2 -d' ' - | xargs)

  echo "Instance ID: $instanceID"


  # Add a few minute wait here to allow for Public DNS to be assigned
  # otherwise ssh doesn't work
  sleep 180s

  # Retrieve public DNS
  aws ec2 describe-instances --instance-ids $instanceID > launch2.tmp

  pubDNS=$(cat launch2.tmp | \
    egrep -o -m 1 -e 'PublicDnsName[.":/A-Za-z0-9_ \\-]*' - |\
    cut -f2 -d' ' - | xargs)

  echo "Public DNS: $pubDNS"

  # Access the instance -------------------

  LOGIN="ubuntu@$pubDNS" 

  ssh -i $keyPATH \
    -o StrictHostKeyChecking=no \
    $LOGIN 'bash -s' < droneB.sh $TASK $(echo $ARGS)

  # Cleanup
  rm *.tmp

  echo ''
  echo ''

done

# end of script
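
Each line of the parameter file ends up as the six positional arguments of `1kg_align_v1.sh`. The sketch below shows that mapping for one line, using the same unquoted word splitting that `queenB.sh` applies with `$(echo $ARGS)`. `show_params` is a hypothetical helper for illustration only; the labels follow the variable names in the alignment script.

```shell
#!/bin/bash
# show_params LINE: print how one parameter line maps onto the
# positional arguments read by 1kg_align_v1.sh.
show_params() {
	# Unquoted on purpose: split the line on spaces and tabs
	set -- $1
	[ "$#" -eq 6 ] || { echo "expected 6 fields, got $#" >&2; return 1; }
	echo "LIBRARY=$1"
	echo "RGSM=$2"
	echo "RGID=$3"
	echo "RGPO=$4"
	echo "FQ1=$(basename "$5")"   # local filename after wget
	echo "FQ2=$(basename "$6")"
}

show_params "NA12892_pp NA12892 ERR194161 CEU ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194161/ERR194161_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194161/ERR194161_2.fastq.gz"
```

Because the split is on whitespace, a field containing a space would shift every later argument, so parameter files should keep spaces out of library and sample names.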

### CEPH-1436 Trio Alignment

NA12878 and NA12891

```
## ec2-52-34-12-139.us-west-2.compute.amazonaws.com

aws s3 cp s3://crownproject/scripts/1kg_align_v1.sh ./
screen
# modify to not-redownload data

sh 1kg_align_v1.sh NA12878_pp NA12878 ERR194147 CEU \
 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194147/ERR194147_1.fastq.gz \
 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194147/ERR194147_2.fastq.gz


## ec2-52-27-70-31.us-west-2.compute.amazonaws.com

sh 1kg_align_v1.sh NA12891_pp NA12891 ERR194160 CEU \
 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194160/ERR194160_1.fastq.gz \
 ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194160/ERR194160_2.fastq.gz
 

# There was a bug in the script:
# the first line of the gatk command was commented out, so the vcf wasn't generated.
# Manually ran the command below. Fixed in the version above.

  java -Xmx12G -jar /home/ubuntu/software/GenomeAnalysisTK.jar \
  -R hgr1.gatk.fa -T HaplotypeCaller \
  -ploidy 2 --max_alternate_alleles 6 \
  -I $LIBRARY.hgr1.bam -o $LIBRARY.hgr1.vcf
 
```

NA12892 was run via the queenB.sh launch script
```
artem@glitch[tmp] sh queenB.sh                                               [12:03PM]
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA12892_pp NA12892 ERR194161 CEU ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194161/ERR194161_1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR194/ERR194161/ERR194161_2.fastq.gz
Instance ID: i-05369ae4edd7bde5f
Public DNS: ec2-35-165-87-54.us-west-2.compute.amazonaws.com
Warning: Permanently added 'ec2-35-165-87-54.us-west-2.compute.amazonaws.com,35.165.87.54' (ECDSA) to the list of known hosts.
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh
```

### 100 Genomes Runs

queenB.sh run for low coverage genomes

hgr1_batch1.txt

```
HG02283	HG02283	SRR401071	ACB	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02283/sequence_read/SRR401071_1.filt.fastq.gz	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02283/sequence_read/SRR401071_2.filt.fastq.gz
HG02343	HG02343	SRR424266	ACB	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02343/sequence_read/SRR424266_1.filt.fastq.gz	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02343/sequence_read/SRR424266_2.filt.fastq.gz
HG02508	HG02508	SRR741371	ACB	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02508/sequence_read/SRR741371_1.filt.fastq.gz	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02508/sequence_read/SRR741371_2.filt.fastq.gz
HG02479	HG02479	SRR741422	ACB	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02479/sequence_read/SRR741422_1.filt.fastq.gz	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02479/sequence_read/SRR741422_2.filt.fastq.gz
NA20357	NA20357	ERR229818	ASW	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA20357/sequence_read/ERR229818_1.filt.fastq.gz	ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA20357/sequence_read/ERR229818_2.filt.fastq.gz
```
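
A quick sanity check on a batch file before launch can catch malformed lines, for example a mate-2 URL that repeats the mate-1 file. A minimal sketch, assuming six tab-separated fields per line and the `_1`/`_2` fastq naming seen above; `check_batch` is a hypothetical helper and was not part of these runs.

```shell
#!/bin/bash
# check_batch FILE: report lines that do not have six tab-separated
# fields, or whose FASTQ2 is not FASTQ1 with the trailing _1 changed
# to _2. Exit status is 0 only if every line passes.
check_batch() {
	awk -F'\t' '
	NF != 6 {
		printf "line %d: expected 6 fields, found %d\n", NR, NF
		bad = 1
		next
	}
	!match($5, /_1\.[A-Za-z.]*$/) {
		printf "line %d: FASTQ1 does not look like a _1 file\n", NR
		bad = 1
		next
	}
	{
		# Expected mate: same URL with the trailing _1 changed to _2
		expect = substr($5, 1, RSTART - 1) "_2" substr($5, RSTART + 2)
		if ($6 != expect) {
			printf "line %d: FASTQ2 is not the mate of FASTQ1\n", NR
			bad = 1
		}
	}
	END { exit bad + 0 }' "$1"
}
```

Usage: `check_batch hgr1_batch1.txt`, launching only if it exits with status 0.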

In [3]:
# queenB run for hgr1_batch1.txt
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh

1kg_align_v0.sh  1kg_runs_1.txt  hgr1_batch1.txt  queenB.sh
1kg_align_v1.sh  droneB.sh	 hgr1_test.txt

Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG02283 HG02283 SRR401071 ACB ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02283/sequence_read/SRR401071_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02283/sequence_read/SRR401071_2.filt.fastq.gz
Instance ID: i-0f480947940e5e834
Public DNS: ec2-52-41-31-1.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG02343 HG02343 SRR424266 ACB ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02343/sequence_read/SRR424266_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02343/sequence_read/SRR42

In [4]:
# queenB run for hgr1_batch2.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch2.txt

1kg_align_v0.sh  1kg_runs_1.txt  hgr1_batch1.txt  hgr1_test.txt
1kg_align_v1.sh  droneB.sh	 hgr1_batch2.txt  queenB.sh

Mon Mar  6 17:32:01 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA20359 NA20359 ERR229819 ASW ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA20359/sequence_read/ERR229819_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA20359/sequence_read/ERR229819_2.filt.fastq.gz  
Instance ID: i-0bf4eb78d93a5e9b7
Public DNS: ec2-52-43-95-48.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA20362 NA20362 ERR257982 ASW ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA20362/sequence_read/ERR257982_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.u

Sent a request to Amazon for up to 25 instances at once. Launches 5-7 will be re-run as batch2.5 once these complete, then launches 8-10 as batch2.9.
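
Splitting a full parameter list into launch batches that stay under the instance limit could be scripted rather than done by hand. A minimal sketch using `split`; the helper name `make_batches`, the input name `hgr1_all.txt`, the `hgr1_auto_` prefix and the batch size are all assumptions, not files or settings from this run.

```shell
#!/bin/bash
# make_batches FILE SIZE PREFIX: split a parameter file into chunks of
# at most SIZE lines, written as PREFIXaa, PREFIXab, ...
make_batches() {
	split -l "$2" "$1" "$3"
}

# Example (hypothetical names):
#   make_batches hgr1_all.txt 20 hgr1_auto_
#   sh queenB.sh hgr1_auto_aa
```

Each batch still has to be launched only after the instances of the previous one have terminated, as was done manually here.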

In [5]:
# queenB run for hgr1_batch2.5.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch2.5.txt

1kg_align_v0.sh  droneB.sh	    hgr1_batch2.9.txt  queenB.sh
1kg_align_v1.sh  hgr1_batch1.txt    hgr1_batch2.txt
1kg_runs_1.txt	 hgr1_batch2.5.txt  hgr1_test.txt

Mon Mar  6 19:34:40 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG03604 HG03604 ERR251410 BEB ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03604/sequence_read/ERR251410_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03604/sequence_read/ERR251410_2.filt.fastq.gz  
Instance ID: i-0b964d0215971f537
Public DNS: ec2-35-163-93-161.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG04162 HG04162 ERR251485 BEB ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG04162/sequence_read/ERR251485_1

In [6]:
# queenB run for hgr1_batch2.9.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch2.9.txt

1kg_align_v0.sh  droneB.sh	    hgr1_batch2.9.txt  queenB.sh
1kg_align_v1.sh  hgr1_batch1.txt    hgr1_batch2.txt
1kg_runs_1.txt	 hgr1_batch2.5.txt  hgr1_test.txt

Mon Mar  6 21:41:50 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG02190 HG02190 ERR050151 CDX ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02190/sequence_read/ERR050151_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02190/sequence_read/ERR050151_2.filt.fastq.gz  
Instance ID: i-0d8f9267f2b4d9f96
Public DNS: ec2-52-40-43-94.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG00851 HG00851 ERR251128 CDX ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00851/sequence_read/ERR251128_1.f

In [7]:
# queenB run for hgr1_batch3.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch3.txt

1kg_align_v0.sh  droneB.sh	    hgr1_batch2.9.txt  hgr1_test.txt
1kg_align_v1.sh  hgr1_batch1.txt    hgr1_batch2.txt    queenB.sh
1kg_runs_1.txt	 hgr1_batch2.5.txt  hgr1_batch3.txt

Tue Mar  7 10:04:28 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG00982 HG00982 ERR251138 CDX ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00982/sequence_read/ERR251138_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00982/sequence_read/ERR251138_2.filt.fastq.gz  
Instance ID: i-0ad3c29a055275d49
Public DNS: ec2-52-43-178-115.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA06985 NA06985 ERR050082 CEU ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA06985/sequen

In [8]:
# queenB run for hgr1_batch4a.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch4a.txt

1kg_align_v0.sh  hgr1_batch1.txt    hgr1_batch3.txt   queenB.sh
1kg_align_v1.sh  hgr1_batch2.5.txt  hgr1_batch4a.txt
1kg_runs_1.txt	 hgr1_batch2.9.txt  hgr1_batch4b.txt
droneB.sh	 hgr1_batch2.txt    hgr1_test.txt

Tue Mar  7 11:51:10 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA18622 NA18622 ERR251602 CHB ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18622/sequence_read/ERR251602_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18622/sequence_read/ERR251602_2.filt.fastq.gz  
Instance ID: i-09bca830fb3b6bb12
Public DNS: ec2-35-165-95-150.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA18632 NA18632 ERR251607 CHB ftp://ftp.1000genomes.ebi.ac.uk/v

In [9]:
# queenB run for hgr1_batch4b.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch4b.txt

1kg_align_v0.sh  hgr1_batch1.txt    hgr1_batch3.txt   queenB.sh
1kg_align_v1.sh  hgr1_batch2.5.txt  hgr1_batch4a.txt
1kg_runs_1.txt	 hgr1_batch2.9.txt  hgr1_batch4b.txt
droneB.sh	 hgr1_batch2.txt    hgr1_test.txt

Tue Mar  7 13:11:18 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA18647 NA18647 SRR741398 CHB ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18647/sequence_read/SRR741398_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18647/sequence_read/SRR741398_2.filt.fastq.gz  
Instance ID: i-04b7767e17a86449d
Public DNS: ec2-52-41-46-32.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG00478 HG00478 ERR251108 CHS ftp://ftp.1000genomes.ebi.ac.uk/vol

In [10]:
# queenB run for hgr1_batch5a.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch5a.txt

1kg_align_v0.sh  hgr1_batch1.txt    hgr1_batch3.txt   hgr1_batch5b.txt
1kg_align_v1.sh  hgr1_batch2.5.txt  hgr1_batch4a.txt  hgr1_test.txt
1kg_runs_1.txt	 hgr1_batch2.9.txt  hgr1_batch4b.txt  queenB.sh
droneB.sh	 hgr1_batch2.txt    hgr1_batch5a.txt

Tue Mar  7 14:51:01 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG00536 HG00536 ERR251115 CHS ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00536/sequence_read/ERR251115_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00536/sequence_read/ERR251115_2.filt.fastq.gz
Instance ID: i-03c0095dfef7a415a
Public DNS: ec2-52-26-246-152.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG00557 HG00557 ERR251120 CHS

In [11]:
# queenB run for hgr1_batch5b.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch5b.txt

1kg_align_v0.sh  hgr1_batch1.txt    hgr1_batch3.txt   hgr1_batch5b.txt
1kg_align_v1.sh  hgr1_batch2.5.txt  hgr1_batch4a.txt  hgr1_test.txt
1kg_runs_1.txt	 hgr1_batch2.9.txt  hgr1_batch4b.txt  queenB.sh
droneB.sh	 hgr1_batch2.txt    hgr1_batch5a.txt

Tue Mar  7 16:43:19 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG01125 HG01125 ERR022454 CLM ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01125/sequence_read/ERR022454_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01125/sequence_read/ERR022454_2.filt.fastq.gz
Instance ID: i-037c54be6c31e4537
Public DNS: ec2-52-34-189-20.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG01464 HG01464 SRR407617 CLM 

In [12]:
# queenB run for hgr1_batch6.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch6.txt

1kg_align_v0.sh  hgr1_batch2.5.txt  hgr1_batch4b.txt  hgr1_test.txt
1kg_align_v1.sh  hgr1_batch2.9.txt  hgr1_batch5a.txt  queenB.sh
1kg_runs_1.txt	 hgr1_batch2.txt    hgr1_batch5b.txt
droneB.sh	 hgr1_batch3.txt    hgr1_batch6a.txt
hgr1_batch1.txt  hgr1_batch4a.txt   hgr1_batch6.txt

Tue Mar  7 20:25:23 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG01348 HG01348 SRR702042 CLM ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01348/sequence_read/SRR702042_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01348/sequence_read/SRR702042_2.filt.fastq.gz 
Instance ID: i-02daaaff8b3ac278e
Public DNS: ec2-52-40-14-97.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Paramet

In [1]:
# queenB run for hgr1_batch7.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch7.txt

1kg_align_v0.sh  hgr1_batch2.5.txt  hgr1_batch4b.txt  hgr1_batch7.txt
1kg_align_v1.sh  hgr1_batch2.9.txt  hgr1_batch5a.txt  hgr1_test.txt
1kg_runs_1.txt	 hgr1_batch2.txt    hgr1_batch5b.txt  queenB.sh
droneB.sh	 hgr1_batch3.txt    hgr1_batch6a.txt
hgr1_batch1.txt  hgr1_batch4a.txt   hgr1_batch6.txt

Tue Mar  7 22:38:49 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG03268 HG03268 ERR181326 ESN ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03268/sequence_read/ERR181326_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03268/sequence_read/ERR181326_2.filt.fastq.gz
Instance ID: i-0fee557c25eda6a1c
Public DNS: ec2-52-34-47-240.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_ali

In [1]:
# queenB run for hgr1_batch8.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch8.txt

1kg_align_v0.sh  hgr1_batch2.5.txt  hgr1_batch4b.txt  hgr1_batch7.txt
1kg_align_v1.sh  hgr1_batch2.9.txt  hgr1_batch5a.txt  hgr1_batch8.txt
1kg_runs_1.txt	 hgr1_batch2.txt    hgr1_batch5b.txt  hgr1_test.txt
droneB.sh	 hgr1_batch3.txt    hgr1_batch6a.txt  queenB.sh
hgr1_batch1.txt  hgr1_batch4a.txt   hgr1_batch6.txt

Wed Mar  8 10:44:48 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG00234 HG00234 SRR768271 GBR ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00234/sequence_read/SRR768271_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00234/sequence_read/SRR768271_2.filt.fastq.gz  
Instance ID: i-063177e07182ec9b5
Public DNS: ec2-52-43-76-57.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproje

In [2]:
# queenB run for hgr1_batch9.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch9.txt

1kg_align_v0.sh  hgr1_batch2.5.txt  hgr1_batch4b.txt  hgr1_batch7.txt
1kg_align_v1.sh  hgr1_batch2.9.txt  hgr1_batch5a.txt  hgr1_batch8.txt
1kg_runs_1.txt	 hgr1_batch2.txt    hgr1_batch5b.txt  hgr1_batch9.txt
droneB.sh	 hgr1_batch3.txt    hgr1_batch6a.txt  hgr1_test.txt
hgr1_batch1.txt  hgr1_batch4a.txt   hgr1_batch6.txt   queenB.sh

Wed Mar  8 14:25:02 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA21113 NA21113 SRR768151 GIH ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA21113/sequence_read/SRR768151_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA21113/sequence_read/SRR768151_2.filt.fastq.gz  
Instance ID: i-0ff38b9f754001e4c
Public DNS: ec2-52-34-82-50.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Scrip

In [3]:
# queenB run for hgr1_batch10.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch10.txt

1kg_align_v0.sh   hgr1_batch2.5.txt  hgr1_batch5a.txt  hgr1_batch9.txt
1kg_align_v1.sh   hgr1_batch2.9.txt  hgr1_batch5b.txt  hgr1_test.txt
1kg_runs_1.txt	  hgr1_batch2.txt    hgr1_batch6a.txt  queenB.sh
droneB.sh	  hgr1_batch3.txt    hgr1_batch6.txt
hgr1_batch10.txt  hgr1_batch4a.txt   hgr1_batch7.txt
hgr1_batch1.txt   hgr1_batch4b.txt   hgr1_batch8.txt

Wed Mar  8 16:47:11 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG02888 HG02888 ERR183477 GWD ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02888/sequence_read/ERR183477_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02888/sequence_read/ERR183477_2.filt.fastq.gz
Instance ID: i-0f71f30e771b4ea04
Public DNS: ec2-52-88-238-154.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xlarge
AMI Image: 

In [4]:
# queenB run for hgr1_batch11.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch11.txt

1kg_align_v0.sh   hgr1_batch1.txt    hgr1_batch4b.txt  hgr1_batch8.txt
1kg_align_v1.sh   hgr1_batch2.5.txt  hgr1_batch5a.txt  hgr1_batch9.txt
1kg_runs_1.txt	  hgr1_batch2.9.txt  hgr1_batch5b.txt  hgr1_test.txt
droneB.sh	  hgr1_batch2.txt    hgr1_batch6a.txt  queenB.sh
hgr1_batch10.txt  hgr1_batch3.txt    hgr1_batch6.txt
hgr1_batch11.txt  hgr1_batch4a.txt   hgr1_batch7.txt

Wed Mar  8 19:45:39 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG01630 HG01630 SRR708349 IBS ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01630/sequence_read/SRR708349_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG01630/sequence_read/SRR708349_2.filt.fastq.gz
Instance ID: i-09ff953493da93b4f
Public DNS: ec2-35-165-86-79.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
Instance Type: c4.xl

In [5]:
# queenB run for hgr1_batch12.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch12.txt

1kg_align_v0.sh   hgr1_batch12.txt   hgr1_batch4a.txt  hgr1_batch7.txt
1kg_align_v1.sh   hgr1_batch1.txt    hgr1_batch4b.txt  hgr1_batch8.txt
1kg_runs_1.txt	  hgr1_batch2.5.txt  hgr1_batch5a.txt  hgr1_batch9.txt
droneB.sh	  hgr1_batch2.9.txt  hgr1_batch5b.txt  hgr1_test.txt
hgr1_batch10.txt  hgr1_batch2.txt    hgr1_batch6a.txt  queenB.sh
hgr1_batch11.txt  hgr1_batch3.txt    hgr1_batch6.txt

Wed Mar  8 21:25:18 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG03790 HG03790 ERR181395 ITU ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03790/sequence_read/ERR181395_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03790/sequence_read/ERR181395_2.filt.fastq.gz
Instance ID: i-05c0bb17c1bf93a5e
Public DNS: ec2-52-40-80-116.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Launch instance # 2
In

In [6]:
# queenB run for hgr1_batch13.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch13.txt

1kg_align_v0.sh   hgr1_batch12.txt   hgr1_batch3.txt   hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch13.txt   hgr1_batch4a.txt  hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch1.txt    hgr1_batch4b.txt  hgr1_batch8.txt
droneB.sh	  hgr1_batch2.5.txt  hgr1_batch5a.txt  hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch2.9.txt  hgr1_batch5b.txt  hgr1_test.txt
hgr1_batch11.txt  hgr1_batch2.txt    hgr1_batch6a.txt  queenB.sh

Wed Mar  8 22:54:49 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG02046 HG02046 ERR070052 KHV ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02046/sequence_read/ERR070052_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02046/sequence_read/ERR070052_2.filt.fastq.gz
Instance ID: i-0549ecd57dbd9987c
Public DNS: ec2-52-37-224-191.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_align_v1.sh


Laun

In [7]:
# queenB run for hgr1_batch14.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch14.txt

1kg_align_v0.sh   hgr1_batch13.txt   hgr1_batch4a.txt  hgr1_batch8.txt
1kg_align_v1.sh   hgr1_batch14.txt   hgr1_batch4b.txt  hgr1_batch9.txt
1kg_runs_1.txt	  hgr1_batch1.txt    hgr1_batch5a.txt  hgr1_test.txt
droneB.sh	  hgr1_batch2.5.txt  hgr1_batch5b.txt  queenB.sh
hgr1_batch10.txt  hgr1_batch2.9.txt  hgr1_batch6a.txt
hgr1_batch11.txt  hgr1_batch2.txt    hgr1_batch6.txt
hgr1_batch12.txt  hgr1_batch3.txt    hgr1_batch7.txt

Thu Mar  9 00:24:40 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG02078 HG02078 ERR055341 KHV ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02078/sequence_read/ERR055341_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02078/sequence_read/ERR055341_2.filt.fastq.gz 
Instance ID: i-0ed0254037e36821e
Public DNS: ec2-52-89-36-89.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align_v1.sh to ./1kg_ali

In [1]:
# queenB run for hgr1_batch15.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch15.txt

1kg_align_v0.sh   hgr1_batch13.txt   hgr1_batch3.txt   hgr1_batch7.txt
1kg_align_v1.sh   hgr1_batch14.txt   hgr1_batch4a.txt  hgr1_batch8.txt
1kg_runs_1.txt	  hgr1_batch15.txt   hgr1_batch4b.txt  hgr1_batch9.txt
droneB.sh	  hgr1_batch1.txt    hgr1_batch5a.txt  hgr1_test.txt
hgr1_batch10.txt  hgr1_batch2.5.txt  hgr1_batch5b.txt  queenB.sh
hgr1_batch11.txt  hgr1_batch2.9.txt  hgr1_batch6a.txt
hgr1_batch12.txt  hgr1_batch2.txt    hgr1_batch6.txt

Thu Mar  9 10:00:46 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG03432 HG03432 SRR793290 MSL ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03432/sequence_read/SRR793290_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03432/sequence_read/SRR793290_2.filt.fastq.gz
Instance ID: i-003308b7973de4135
Public DNS: ec2-54-68-193-174.us-west-2.compute.amazonaws.com
download: s3://crownproject/scripts/1kg_align

In [2]:
# queenB run for hgr1_batch16.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch16.txt

1kg_align_v0.sh   hgr1_batch13.txt   hgr1_batch2.txt   hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch14.txt   hgr1_batch3.txt   hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch15.txt   hgr1_batch4a.txt  hgr1_batch8.txt
droneB.sh	  hgr1_batch16.txt   hgr1_batch4b.txt  hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch1.txt    hgr1_batch5a.txt  hgr1_test.txt
hgr1_batch11.txt  hgr1_batch2.5.txt  hgr1_batch5b.txt  queenB.sh
hgr1_batch12.txt  hgr1_batch2.9.txt  hgr1_batch6a.txt

Thu Mar  9 11:12:11 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA19795 NA19795 SRR768206 MXL ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA19795/sequence_read/SRR768206_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA19795/sequence_read/SRR768206_2.filt.fastq.gz
Instance ID: i-03e289b9ef5efa0e9
Public DNS: ec2-54-68-153-86.us-west-2.compute.amazonaws.com
download: s3://crownproject/

In [3]:
# queenB run for hgr1_batch17.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch17.txt

1kg_align_v0.sh   hgr1_batch15.txt  hgr1_batch2.5.txt  hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch16.txt  hgr1_batch2.9.txt  hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch17.txt  hgr1_batch2.txt    hgr1_batch8.txt
droneB.sh	  hgr1_batch18.txt  hgr1_batch3.txt    hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch19.txt  hgr1_batch4a.txt   hgr1_test.txt
hgr1_batch11.txt  hgr1_batch1.txt   hgr1_batch4b.txt   queenB.sh
hgr1_batch12.txt  hgr1_batch20.txt  hgr1_batch5a.txt
hgr1_batch13.txt  hgr1_batch21.txt  hgr1_batch5b.txt
hgr1_batch14.txt  hgr1_batch22.txt  hgr1_batch6a.txt

Thu Mar  9 14:19:08 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG02286 HG02286 SRR741396 PEL ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02286/sequence_read/SRR741396_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG02286/sequence_read/SRR741396_2.filt.fastq.gz
Instance ID: i-05

In [4]:
# queenB run for hgr1_batch18.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch18.txt

1kg_align_v0.sh   hgr1_batch15.txt  hgr1_batch2.5.txt  hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch16.txt  hgr1_batch2.9.txt  hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch17.txt  hgr1_batch2.txt    hgr1_batch8.txt
droneB.sh	  hgr1_batch18.txt  hgr1_batch3.txt    hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch19.txt  hgr1_batch4a.txt   hgr1_test.txt
hgr1_batch11.txt  hgr1_batch1.txt   hgr1_batch4b.txt   queenB.sh
hgr1_batch12.txt  hgr1_batch20.txt  hgr1_batch5a.txt
hgr1_batch13.txt  hgr1_batch21.txt  hgr1_batch5b.txt
hgr1_batch14.txt  hgr1_batch22.txt  hgr1_batch6a.txt

Thu Mar  9 16:41:02 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG03624 HG03624 SRR794019 PJL ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03624/sequence_read/SRR794019_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03624/sequence_read/SRR794019_2.filt.fastq.gz
Instance ID: i-03

In [5]:
# queenB run for hgr1_batch19.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch19.txt

1kg_align_v0.sh   hgr1_batch15.txt  hgr1_batch2.5.txt  hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch16.txt  hgr1_batch2.9.txt  hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch17.txt  hgr1_batch2.txt    hgr1_batch8.txt
droneB.sh	  hgr1_batch18.txt  hgr1_batch3.txt    hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch19.txt  hgr1_batch4a.txt   hgr1_test.txt
hgr1_batch11.txt  hgr1_batch1.txt   hgr1_batch4b.txt   queenB.sh
hgr1_batch12.txt  hgr1_batch20.txt  hgr1_batch5a.txt
hgr1_batch13.txt  hgr1_batch21.txt  hgr1_batch5b.txt
hgr1_batch14.txt  hgr1_batch22.txt  hgr1_batch6a.txt

Thu Mar  9 17:41:35 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG00740 HG00740 SRR069532 PUR ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00740/sequence_read/SRR069532_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00740/sequence_read/SRR069532_2.filt.fastq.gz
Instance ID: i-07

In [6]:
# queenB run for hgr1_batch20.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch20.txt

1kg_align_v0.sh   hgr1_batch15.txt  hgr1_batch2.5.txt  hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch16.txt  hgr1_batch2.9.txt  hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch17.txt  hgr1_batch2.txt    hgr1_batch8.txt
droneB.sh	  hgr1_batch18.txt  hgr1_batch3.txt    hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch19.txt  hgr1_batch4a.txt   hgr1_test.txt
hgr1_batch11.txt  hgr1_batch1.txt   hgr1_batch4b.txt   queenB.sh
hgr1_batch12.txt  hgr1_batch20.txt  hgr1_batch5a.txt
hgr1_batch13.txt  hgr1_batch21.txt  hgr1_batch5b.txt
hgr1_batch14.txt  hgr1_batch22.txt  hgr1_batch6a.txt

Thu Mar  9 19:52:27 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: HG03836 HG03836 ERR181380 STU ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03836/sequence_read/ERR181380_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG03836/sequence_read/ERR181380_2.filt.fastq.gz
Instance ID: i-0d

In [7]:
# queenB run for hgr1_batch21.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch21.txt

1kg_align_v0.sh   hgr1_batch15.txt  hgr1_batch2.5.txt  hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch16.txt  hgr1_batch2.9.txt  hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch17.txt  hgr1_batch2.txt    hgr1_batch8.txt
droneB.sh	  hgr1_batch18.txt  hgr1_batch3.txt    hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch19.txt  hgr1_batch4a.txt   hgr1_test.txt
hgr1_batch11.txt  hgr1_batch1.txt   hgr1_batch4b.txt   queenB.sh
hgr1_batch12.txt  hgr1_batch20.txt  hgr1_batch5a.txt
hgr1_batch13.txt  hgr1_batch21.txt  hgr1_batch5b.txt
hgr1_batch14.txt  hgr1_batch22.txt  hgr1_batch6a.txt

Thu Mar  9 21:38:09 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA20514 NA20514 ERR229823 TSI ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA20514/sequence_read/ERR229823_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA20514/sequence_read/ERR229823_2.filt.fastq.gz
Instance ID: i-0f

In [8]:
# queenB run for hgr1_batch22.txt
# made PARAMETERS=$1
cd ~/Crown/data/tmp/
ls; echo ''
date

sh queenB.sh hgr1_batch22.txt

1kg_align_v0.sh   hgr1_batch15.txt  hgr1_batch2.5.txt  hgr1_batch6.txt
1kg_align_v1.sh   hgr1_batch16.txt  hgr1_batch2.9.txt  hgr1_batch7.txt
1kg_runs_1.txt	  hgr1_batch17.txt  hgr1_batch2.txt    hgr1_batch8.txt
droneB.sh	  hgr1_batch18.txt  hgr1_batch3.txt    hgr1_batch9.txt
hgr1_batch10.txt  hgr1_batch19.txt  hgr1_batch4a.txt   hgr1_test.txt
hgr1_batch11.txt  hgr1_batch1.txt   hgr1_batch4b.txt   queenB.sh
hgr1_batch12.txt  hgr1_batch20.txt  hgr1_batch5a.txt
hgr1_batch13.txt  hgr1_batch21.txt  hgr1_batch5b.txt
hgr1_batch14.txt  hgr1_batch22.txt  hgr1_batch6a.txt

Fri Mar 10 00:27:10 PST 2017
Launch instance # 1
Instance Type: c4.xlarge
AMI Image: ami-66129306
Run Script: s3://crownproject/scripts/1kg_align_v1.sh
Parameters: NA19153 NA19153 SRR794367 YRI ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA19153/sequence_read/SRR794367_1.filt.fastq.gz ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA19153/sequence_read/SRR794367_2.filt.fastq.gz
Instance ID: i-0f
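
Each launch above passes six whitespace-separated parameters to `1kg_align_v1.sh`. A minimal sketch of a pre-launch sanity check follows. It assumes each batch file holds one such parameter line per instance (the batch file contents are not shown in this notebook), and the function name `check_params` is hypothetical. It only checks the field count and that both fastq URLs contain the sample ID and run accession from the same line.

```sh
# check_params: validate queenB-style parameter lines read from stdin.
# Assumed layout (copied from the "Parameters:" log lines above):
#   sample  sample  run_accession  population  fastq_1_url  fastq_2_url
# Prints one message per problem and returns 1 if any line fails.
check_params() {
    awk '
    NF == 0 { next }    # skip blank lines
    NF != 6 {
        printf "line %d: expected 6 fields, got %d\n", NR, NF
        bad = 1
        next
    }
    {
        # Path fragments that each URL should contain for this sample and run
        r1 = "/" $1 "/sequence_read/" $3 "_1.filt.fastq.gz"
        r2 = "/" $1 "/sequence_read/" $3 "_2.filt.fastq.gz"
        if (index($5, r1) == 0) {
            printf "line %d: read 1 URL does not match %s %s\n", NR, $1, $3
            bad = 1
        }
        if (index($6, r2) == 0) {
            printf "line %d: read 2 URL does not match %s %s\n", NR, $1, $3
            bad = 1
        }
    }
    END { exit bad + 0 }
    '
}
```

Possible usage before a launch, if the batch files do follow the assumed layout: `check_params < hgr1_batch22.txt && sh queenB.sh hgr1_batch22.txt`.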

## File stats

Output files on AWS S3:

In [11]:
aws s3 ls s3://crownproject/1kg_hgr1/

2017-03-04 19:05:22          0 
2017-03-08 12:31:13        386 HG00128.flagstat
2017-03-08 12:31:13    2634032 HG00128.hgr1.bam
2017-03-08 12:31:14        656 HG00128.hgr1.bam.bai
2017-03-08 12:31:14        375 HG00128.hgr1.flagstat
2017-03-08 12:31:14      39989 HG00128.hgr1.vcf
2017-03-08 12:31:15        781 HG00128.hgr1.vcf.idx
2017-03-08 12:41:07        387 HG00139.flagstat
2017-03-08 12:41:07    7258373 HG00139.hgr1.bam
2017-03-08 12:41:08        688 HG00139.hgr1.bam.bai
2017-03-08 12:41:08        376 HG00139.hgr1.flagstat
2017-03-08 12:41:08      26601 HG00139.hgr1.vcf
2017-03-08 12:41:09        282 HG00139.hgr1.vcf.idx
2017-03-08 13:53:27        388 HG00234.flagstat
2017-03-08 13:53:27    3879715 HG00234.hgr1.bam
2017-03-08 13:53:28        632 HG00234.hgr1.bam.bai
2017-03-08 13:53:28        374 HG00234.hgr1.flagstat
2017-03-08 13:53:28      60138 HG00234.hgr1.vcf
2017-03-08 13:53:29       2300 HG00234.hgr1.vcf.idx
2017-03-08 12:17:22        386 HG00253.flagsta
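
The listing above shows six output files per sample. A minimal sketch for confirming that every sample is complete follows. It assumes all samples should have the same six suffixes seen for HG00128, HG00139 and HG00234; the function names `missing_outputs` and `count_samples` are hypothetical.

```sh
# missing_outputs: read `aws s3 ls` output on stdin and print
# "<sample> <suffix>" for every expected file that is absent.
# The six suffixes come from the listing above; requiring all six
# for every sample is an assumption.
missing_outputs() {
    awk '
    BEGIN {
        n = split(".flagstat .hgr1.bam .hgr1.bam.bai .hgr1.flagstat .hgr1.vcf .hgr1.vcf.idx", suffix, " ")
    }
    NF >= 4 {
        key = $4                    # object name is the 4th column
        sample = key
        sub(/\..*$/, "", sample)    # sample ID = text before the first dot
        seen[key] = 1
        samples[sample] = 1
    }
    END {
        for (s in samples)
            for (i = 1; i <= n; i++) {
                k = s suffix[i]
                if (!(k in seen)) print s, suffix[i]
            }
    }
    ' | sort
}

# count_samples: number of distinct sample IDs in the same listing.
count_samples() {
    awk 'NF >= 4 { s = $4; sub(/\..*$/, "", s); print s }' | sort -u | wc -l | tr -d ' '
}
```

Possible usage: `aws s3 ls s3://crownproject/1kg_hgr1/ | missing_outputs` should print nothing once all runs have finished, and piping the same listing into `count_samples` should report 104.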

In [10]:
# omg, 104 genomes are done!
# let's rock and roll!

# qed

