# 1000 Genomes: Deep Coverage PacBio Alignments 
```
pi:ababaian
start: 2016 11 04
complete : 2016 11 10
```

ALIAS NOTE: the `hgr_main.fa` and `rDNA_main.bed` files have been renamed `hgr_45s.fa` and `rDNA_45s.bed` respectively, which are more accurate names.

## Introduction

The more I look at Illumina alignments, the more I think they are limited in what structural variants they can detect. A 'hybrid' sequencing approach is likely the right one here: long, error-prone reads (PacBio) form the scaffolds/consensus structures of rDNA with different expansion loops, and Illumina reads are then aligned to a genome containing that variety of consensus sequences.

This was done over the last couple of days with LOTS of troubleshooting; the following is what came out of that process.

There was a very interesting paper that solved exactly this problem, but for the [HIV genome](http://nar.oxfordjournals.org/content/43/20/e129.full). I should review this carefully and see if it's applicable here.


## Objectives

- Align high-depth PacBio reads to the hgr.fa genome
- Define a consensus rDNA sequence at the rRNA locus
- Identify INDELS/structural variants within the rRNA locus


## Materials and Methods

### Setting up an AWS EC2 system for rapid alignment

This was unnecessarily difficult and tiresome, but it's up and running (with room for improvement). Some lessons for anyone interested in using Amazon Web Services for biological work:
- Don't use Amazon Linux; use Ubuntu
- Things will take longer than anticipated

The script below is what was used to set up the Crown AMI (ami-59b71739). Things commented out were not run.

This is a good starting point. Learning to 'launch' an AMI with its own set of instructions and close it, in parallel, will greatly increase the speed with which I can process data. For now, it's one longer process on a single instance.

NOTE: The hgr.fa and hgr_main.fa files are incorrect. Don't use them.
I'll fix this when I update the AMI but for now re-download them from S3.



In [None]:
#!/bin/bash
# ami-59b71739 Make Script
# Initialization script for an Ubuntu 16.04 LTS instance
# Run with at least 4 GB of memory to compile successfully

# Update
sudo apt-get update

# Bioinformatics Software
sudo apt-get install samtools # v. 0.1.19
sudo apt-get install bowtie # v. 1.1.2
sudo apt-get install bowtie2 # v. 2.2.6
sudo apt-get install tophat
#sudo apt-get install blasr # v.

sudo apt-get install docker.io
sudo service docker start
sudo usermod -a -G docker ubuntu
# Need to re-login here. Possible split into multiple tasks

# AWS Command Line
sudo apt-get install awscli
aws configure # ENTER CREDENTIALS MANUALLY

# Small Binary Utilities Download (UCSC)
mkdir ~/bin; cd ~/bin

	wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/faToTwoBit
	wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
	wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig
	wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/fastqToFa
	wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/faSplit

	chmod 755 *
cd ~

# Compiler Software
	sudo apt-get install build-essential
	sudo apt-get install gfortran
	sudo apt-get install graphviz
	sudo apt-get install libjpeg-dev
	sudo apt-get install libfreetype6-dev
	sudo apt-get install python


# Make blasr from source
mkdir software; cd software
git clone git://github.com/PacificBiosciences/pitchfork

cd pitchfork
make init
cd deployment
sh setup-env.sh
cd ..
make blasr

cd workspace
ln -s hdf5-1.8.16 hdf5
ln -s blasr blasr_install
# deployment/ sits next to workspace/ inside the pitchfork directory
cat ../deployment/setup-env.sh >> /home/ubuntu/.bashrc

# Download hg38 (from UCSC)
	mkdir ~/resources; cd ~/resources
	wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
	twoBitToFa hg38.2bit hg38.fa
	samtools faidx hg38.fa
	rm hg38.2bit

# Download hgr and rDNA (from S3)
	aws s3 cp s3://crownproject/resources/hgr.fa ./
	aws s3 cp s3://crownproject/resources/hgr_main.fa ./
	aws s3 cp s3://crownproject/resources/rDNA.fa ./


## Download Biocontainers
# git clone https://github.com/BioContainers/containers

# Install Git LFS
#	wget https://github.com/github/git-lfs/releases/download/v1.4.4/git-lfs-linux-amd64-1.4.4.tar.gz
#
#	tar -xvf git-lfs-linux-amd64-1.4.4.tar.gz
#	sudo sh git-lfs-1.4.4/install.sh	
#	rm git*



# Build Docker Container
	# docker run [OPTIONS] <IMAGE> 	<command> <arguments>
	# -v : bind-mount <host dir>:<container dir> into the container
	# becomes
	# <Command> <Argument>
	
	# Build bowtie container
	# cd containers/bowtie/1.1.2/
	# docker build -t bowtie . #builds bowtie image
	# alias bowtie='docker run -v /home/ec2-user:/home/ bowtie bowtie'

	# Build Samtools 1.3 Container
	# cd containers/samtools/1.3.1/
	# docker build -t samtools . #builds samtools image


# Download Crown Project Files
#	git clone https://github.com/ababaian/Crown.git

##
##
## CROWN_INIT INSTANCE SCREENSHOT HERE
##
##

## Setting up an AWS EC2 system for rapid alignment

### 1000 Genomes project data

Browsing online I found that 1kG data is available both via FTP at

[http://ftp.1000genomes.ebi.ac.uk/vol1/ftp](http://ftp.1000genomes.ebi.ac.uk/vol1/ftp)

and as an S3 bucket (accessible via aws s3 ...)

[s3://1000genomes/](http://www.internationalgenome.org/using-1000-genomes-data-amazon-web-service-cloud/)


In [None]:
# On the AWS AMI-machine
mkdir na12878; cd na12878

# Download hgr.fa genome from S3
aws s3 cp s3://crownproject/resources/hgr.fa ./
aws s3 cp s3://crownproject/resources/rDNA_main.bed ./

# Download s3 blasr align script
aws s3 cp s3://crownproject/scripts/s3_blasrAlign.sh ./
chmod 755 s3_blasrAlign.sh

# From 1kgenomes bucket; 
# list pacbio fastq files of NA12878 chemistry 1
aws s3 ls s3://1000genomes/phase3/integrated_sv_map/supporting/NA12878/pacbio/fastq/chemistry_1/ > aws_dir.list
sed 's/  */ /g' aws_dir.list | cut -f4 -d' ' - > fastq.list

## NOTE: There are 623 files here!
## I'll likely run ~100, stop, and then run the rest later
## if there's useful information in the first 100 files
##

# Pilot Run
# sh s3_blasrAlign.sh s3://1000genomes/phase3/integrated_sv_map/supporting/NA12878/pacbio/fastq/chemistry_1 fastq.list hgr.fa
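
The filename extraction above (squeeze spaces with `sed`, then `cut` field 4) can also be sketched with `awk`, which splits on runs of whitespace by itself. This is only a sketch: the listing below is an illustrative stand-in for `aws s3 ls` output, with made-up dates and sizes and a hypothetical non-fastq entry (`index.txt`); only the two fastq names come from fastq.list.

```shell
# Illustrative `aws s3 ls` output: date, time, size (bytes), key name.
# Dates, sizes and index.txt are made up for this example.
listing='2015-01-01 00:00:00  123456789 m121214_011943_42156_c100436682550000001523065002011362_s1_p0.fastq
2015-01-01 00:00:00   98765432 m121214_033934_42156_c100436682550000001523065002011363_s1_p0.fastq
2015-01-01 00:00:00        512 index.txt'

# Keep the 4th whitespace-delimited field, only for names ending in .fastq
fastq_names=$(printf '%s\n' "$listing" | awk '$4 ~ /\.fastq$/ {print $4}')

# Count the retained names (tr strips the padding some wc versions add)
n_files=$(printf '%s\n' "$fastq_names" | wc -l | tr -d ' ')

printf '%s\n' "$fastq_names"
echo "n_files=$n_files"
```

Filtering on the `.fastq` suffix means anything else that happens to sit in the directory would not end up in the list handed to the alignment script.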


In [None]:
#!/bin/sh
# s3_blasrAlign.sh
#
# Usage: s3_blasrAlign.sh <s3_dir> <file list in s3> <genome.fa>
# For each file in file_list in the s3_dir
# download the fastq file;
# align it to the genome.fa
# compress output sam into a bam file
# remove fastq file and begin with next one
#

# INPUT =====================

# S3 directory containing fastq files
S3_DIR=$1

# List of FASTQ filenames in S3_DIR to download
# iteratively
FILE_LIST=$2

# Genome to align to
GENOME=$3

# Count Variable
COUNT='1'

# SCRIPT ====================
for FILE in $(cat $FILE_LIST)
do
	FILEPATH="$S3_DIR/$FILE"

	echo "Starting download of $FILEPATH"
	echo ""
	
	# aws configure must be run beforehand
	aws s3 cp $FILEPATH ./

	# Use blasr to align $FILE to $GENOME
	# Can set different number of processors
	blasr $FILE $GENOME --nproc 2 --sam --out alignedTMP.sam

	# Remove fastq file
	rm $FILE

	# Sort, index and bam-file the output
	samtools view -bS alignedTMP.sam | samtools sort - aligned_$COUNT

	samtools index aligned_$COUNT.bam

	# Remove sam file
	rm alignedTMP.sam
    
	# Make a subset bam file of the
	# transcribed rDNA locus (18-28S)
	samtools view -b -L rDNA_main.bed aligned_$COUNT.bam | samtools sort - hgMain_$COUNT

	# Count
	COUNT=$((COUNT+1))
    
done

# Concatenate all the individual bam files for hgrMain
samtools cat $(ls hgMain_*.bam) | samtools sort - na12878.pb.hgr_N

#~~~ End of Script ~~~ 
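
A dry-run sketch of the loop above can be useful for checking which S3 paths and output names a run would produce before anything is downloaded. It assumes a hypothetical helper `plan_jobs` and a hypothetical third argument for the first count value; neither exists in `s3_blasrAlign.sh`, and the bucket and file names are placeholders.

```shell
# Print "<s3 path><TAB><output prefix>" for each entry in a list file,
# without downloading or aligning anything (dry run).
plan_jobs() {
	# $1 = S3 directory, $2 = list file, $3 = first count value (hypothetical)
	count=$3
	while IFS= read -r file; do
		[ -n "$file" ] || continue   # skip blank lines
		printf '%s/%s\taligned_%s\n' "${1%/}" "$file" "$count"
		count=$((count+1))
	done < "$2"
}

# Placeholder list with two made-up filenames
list=$(mktemp)
printf 'a.fastq\nb.fastq\n' > "$list"

plan=$(plan_jobs s3://example-bucket/dir/ "$list" 1)
rm "$list"

printf '%s\n' "$plan"
```

Reading the list with `while read` rather than `for FILE in $(cat ...)` keeps each line intact, and `${1%/}` drops a trailing slash from the directory so the joined path has exactly one separator.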

## Results

### Pilot Run (~100 files)

`sh s3_blasrAlign.sh s3://1000genomes/phase3/integrated_sv_map/supporting/NA12878/pacbio/fastq/chemistry_1 fastq.list hgr.fa`


blasr alignment was run on the first 89/623 files in fastq.list (see below), that is, all files up to m121229_*.fastq

A quick check of the output showed that it looks great.

### Second Run: The rest of the files
I'll set up 5 nodes, each running a subset of fastq.list.

- i-001fbe033f3503017 node 1: 90,200
- i-02134a94f53fea067 node 2: 201,310
- i-07df0af23fe170eb7 node 3: 311,420
- i-088ab916c7cacf11c node 4: 421,530
- i-08c948773fd5f290b node 5: 531,623


`sh s3_blasrAlign.sh s3://1000genomes/phase3/integrated_sv_map/supporting/NA12878/pacbio/fastq/chemistry_1 input.list hgr.fa`

where `input.list` is the subset of `fastq.list` for that node (e.g. `sed -n 90,200p fastq.list > input.list`).

Note: I forgot to change the count offsets on each node, so they all count upwards from 1. I'll have to separate the alignments into distinct folders or simply merge all bam files into one master output (probably the best option).
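
As a sketch of how the per-node subsetting could carry the offset along, the snippet below takes node 1's range (90,200) and starts the count at the first line number, so output names would stay unique across nodes. It is not what was run: `subset_list` is a hypothetical helper, the list is filled with placeholder names rather than the real fastq names, and `s3_blasrAlign.sh` as written always starts `COUNT` at 1.

```shell
# Write lines START..END of a list file to stdout (same as `sed -n START,ENDp`)
subset_list() {
	# $1 = list file, $2 = first line, $3 = last line
	sed -n "${2},${3}p" "$1"
}

# Stand-in for fastq.list: 623 placeholder names (file_1.fastq ... file_623.fastq)
tmpdir=$(mktemp -d)
i=1
while [ "$i" -le 623 ]; do
	echo "file_${i}.fastq"
	i=$((i+1))
done > "$tmpdir/fastq.list"

# Node 1: lines 90..200 of the list
START=90; END=200
subset_list "$tmpdir/fastq.list" "$START" "$END" > "$tmpdir/input.list"

# Starting COUNT at START would keep aligned_$COUNT.bam names unique across nodes
COUNT=$START
n_node1=$(wc -l < "$tmpdir/input.list" | tr -d ' ')
first_name=$(head -n 1 "$tmpdir/input.list")
rm -r "$tmpdir"

echo "node 1: $n_node1 files, starting at $first_name -> aligned_${COUNT}.bam"
```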

Additionally: a few commands were added to the run script above.

```
# Make a subset bam file of the
# transcribed rDNA locus (18-28S)
samtools view -b -L rDNA_main.bed aligned_$COUNT.bam | samtools sort - hgMain_$COUNT

```
and at the end
```
samtools cat $(ls hgMain_*.bam) | samtools sort - na12878.pb.hgr_N
```

##### rDNA_main.bed
chr13   999590  1013590 rMain   0       .
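
As a quick sanity check on the interval, BED coordinates are 0-based and half-open, so the region length is simply end minus start, and the equivalent 1-based region string shifts only the start. A minimal sketch, assuming the record is tab-delimited as BED expects:

```shell
# The rDNA_main.bed record: chrom, start, end, name, score, strand
bed_line=$(printf 'chr13\t999590\t1013590\trMain\t0\t.')

# BED intervals are 0-based, half-open, so length = end - start
region_len=$(printf '%s\n' "$bed_line" | awk -F'\t' '{print $3 - $2}')

# Equivalent 1-based, inclusive region string (chrom:start-end)
region=$(printf '%s\n' "$bed_line" | awk -F'\t' '{printf "%s:%d-%d", $1, $2 + 1, $3}')

echo "length=$region_len region=$region"
```

The region string is the form accepted as a positional argument by `samtools view` on an indexed bam, as an alternative to `-L rDNA_main.bed`.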


### Merging Outputs

On each of the 6 nodes (pilot = 0, plus 1-5), create a header list retaining read-group information

`for BAM in $(ls *.bam); do echo $BAM >> header.list.N.txt; samtools view -H $BAM  | tail -n 2 >> header.list.N.txt; done`
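
The `tail -n 2` in the command above relies on the read-group and program lines being the last two lines of each header. A sketch of a position-independent alternative is to select header lines by their `@RG`/`@PG` tag instead. The header text here is an illustrative placeholder, not real blasr output.

```shell
# Illustrative SAM header; all field values are placeholders
header=$(printf '@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:chr13\tLN:100\n@RG\tID:example\tSM:NA12878\n@PG\tID:BLASR\tCL:blasr example\n')

# Select read-group and program lines by tag, wherever they sit in the header
rg_pg=$(printf '%s\n' "$header" | grep -E '^@(RG|PG)')

printf '%s\n' "$rg_pg"
```

On a real file the same filter would sit after `samtools view -H $BAM` in place of `tail -n 2`.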


Then merge bam files into a single file for each N node

`samtools merge na12878_hgr.N.bam *.bam`


`samtools index na12878_hgr.N.bam`

Extract hgr_main from na12878_hgr.N.bam

`samtools view -b -L rDNA_main.bed na12878_hgr.N.bam | samtools sort - na12878_main.N`

### Final Outputs
For each "N" node there is a set of output files

- `na12878_hgr.N.bam`: All fastq files from node N are aligned to hgr.fa
- `na12878_hgr.N.bam.bai`: Index for bam file
- `headers.0.txt`: filename / RG+CMD from headers for each alignment subfile
- `na12878_main.N.bam`: rDNA_main subset of the whole file. Transcribed Region

Also note: I will retain ~10 alignment files from the pilot run, named `aligned_[1-10].bam`, in a subfolder 'subalignments0'.


All files stored in s3://crownproject/na12878/


In [None]:
### fastq.list
sh s3_blasrAlign.sh s3://1000genomes/phase3/integrated_sv_map/supporting/NA12878/pacbio/fastq/chemistry_1 fastq.list hgr.fa
m121214_011943_42156_c100436682550000001523065002011362_s1_p0.fastq
m121214_033934_42156_c100436682550000001523065002011363_s1_p0.fastq
m121214_055855_42156_c100436682550000001523065002011364_s1_p0.fastq
m121214_081725_42156_c100436682550000001523065002011365_s1_p0.fastq
m121214_103703_42156_c100436682550000001523065002011366_s1_p0.fastq
m121214_125514_42156_c100436682550000001523065002011367_s1_p0.fastq
m121214_151324_42156_c100448882550000001523062202151390_s1_p0.fastq
m121214_173305_42156_c100448882550000001523062202151391_s1_p0.fastq
m121219_195104_42147_c100448392550000001523062202151330_s1_p0.fastq
m121219_221142_42147_c100448392550000001523062202151331_s1_p0.fastq
m121220_003033_42147_c100448392550000001523062202151332_s1_p0.fastq
m121221_195740_42147_c100437222550000001523065002011330_s1_p0.fastq
m121221_221831_42147_c100437222550000001523065002011331_s1_p0.fastq
m121221_222101_42156_c100333902550000001523020412311240_s1_p0.fastq
m121222_003641_42147_c100437222550000001523065002011332_s1_p0.fastq
m121222_003948_42156_c100333902550000001523020412311241_s1_p0.fastq
m121222_025524_42147_c100437222550000001523065002011333_s1_p0.fastq
m121222_025843_42156_c100333902550000001523020412311242_s1_p0.fastq
m121222_051324_42147_c100437222550000001523065002011334_s1_p0.fastq
m121222_051754_42156_c100333902550000001523020412311243_s1_p0.fastq
m121222_073222_42147_c100437222550000001523065002011335_s1_p0.fastq
m121222_073632_42156_c100333902550000001523020412311244_s1_p0.fastq
m121222_095218_42147_c100437222550000001523065002011336_s1_p0.fastq
m121222_095609_42156_c100333902550000001523020412311245_s1_p0.fastq
m121222_121019_42147_c100437222550000001523065002011337_s1_p0.fastq
m121222_121449_42156_c100333902550000001523020412311246_s1_p0.fastq
m121222_143342_42156_c100333902550000001523020412311247_s1_p0.fastq
m121224_195214_42147_c100448022550000001523062202151390_s1_p0.fastq
m121224_221222_42147_c100448022550000001523062202151391_s1_p0.fastq
m121224_231907_42156_c100456712550000001523064703201320_s1_p0.fastq
m121225_003042_42147_c100448022550000001523062202151392_s1_p0.fastq
m121225_013655_42156_c100456712550000001523064703201321_s1_p0.fastq
m121225_024856_42147_c100448022550000001523062202151393_s1_p0.fastq
m121225_035751_42156_c100456712550000001523064703201322_s1_p0.fastq
m121225_050718_42147_c100448022550000001523062202151394_s1_p0.fastq
m121225_061549_42156_c100456712550000001523064703201323_s1_p0.fastq
m121225_072640_42147_c100448022550000001523062202151395_s1_p0.fastq
m121225_083333_42156_c100456712550000001523064703201324_s1_p0.fastq
m121225_094600_42147_c100448022550000001523062202151396_s1_p0.fastq
m121225_105253_42156_c100456712550000001523064703201325_s1_p0.fastq
m121225_120419_42147_c100448022550000001523062202151397_s1_p0.fastq
m121225_131122_42156_c100456712550000001523064703201326_s1_p0.fastq
m121225_153017_42156_c100456712550000001523064703201327_s1_p0.fastq
m121225_162941_42147_c100433742550000001523063302011332_s1_p0.fastq
m121225_184907_42147_c100433742550000001523063302011333_s1_p0.fastq
m121225_195722_42156_c100333942550000001523020412311200_s1_p0.fastq
m121225_210825_42147_c100433742550000001523063302011334_s1_p0.fastq
m121225_221641_42156_c100333942550000001523020412311201_s1_p0.fastq
m121225_232607_42147_c100433742550000001523063302011335_s1_p0.fastq
m121226_003519_42156_c100333942550000001523020412311202_s1_p0.fastq
m121226_014447_42147_c100433742550000001523063302011336_s1_p0.fastq
m121226_025417_42156_c100333942550000001523020412311203_s1_p0.fastq
m121226_040459_42147_c100433742550000001523063302011337_s1_p0.fastq
m121226_051246_42156_c100333942550000001523020412311204_s1_p0.fastq
m121226_073216_42156_c100333942550000001523020412311205_s1_p0.fastq
m121227_192126_42156_c100333942550000001523020412311206_s1_p0.fastq
m121227_203824_42147_c100334002550000001523020412311200_s1_p0.fastq
m121227_214342_42156_c100333942550000001523020412311207_s1_p0.fastq
m121227_225758_42147_c100334002550000001523020412311201_s1_p0.fastq
m121228_000330_42156_c100448882550000001523062202151396_s1_p0.fastq
m121228_022258_42156_c100448882550000001523062202151397_s1_p0.fastq
m121228_033442_42147_c100334002550000001523020412311203_s1_p0.fastq
m121228_044124_42156_c100457472550000001523065103201310_s1_p0.fastq
m121228_055444_42147_c100334002550000001523020412311204_s1_p0.fastq
m121228_065920_42156_c100457472550000001523065103201311_s1_p0.fastq
m121228_081253_42147_c100334002550000001523020412311205_s1_p0.fastq
m121228_091814_42156_c100457472550000001523065103201312_s1_p0.fastq
m121228_103122_42147_c100334002550000001523020412311206_s1_p0.fastq
m121228_113854_42156_c100457472550000001523065103201313_s1_p0.fastq
m121228_125057_42147_c100334002550000001523020412311207_s1_p0.fastq
m121228_211454_42147_c100433652550000001523063302011350_s1_p0.fastq
m121228_233405_42147_c100433652550000001523063302011351_s1_p0.fastq
m121229_011410_42156_c100433372550000001523063302011320_s1_p0.fastq
m121229_015427_42147_c100433652550000001523063302011352_s1_p0.fastq
m121229_033325_42156_c100433372550000001523063302011321_s1_p0.fastq
m121229_041135_42147_c100433652550000001523063302011353_s1_p0.fastq
m121229_055149_42156_c100433372550000001523063302011322_s1_p0.fastq
m121229_063014_42147_c100433652550000001523063302011354_s1_p0.fastq
m121229_081044_42156_c100433372550000001523063302011323_s1_p0.fastq
m121229_085124_42147_c100433652550000001523063302011355_s1_p0.fastq
m121229_103007_42156_c100433372550000001523063302011324_s1_p0.fastq
m121229_110851_42147_c100433652550000001523063302011356_s1_p0.fastq
m121229_124748_42156_c100433372550000001523063302011325_s1_p0.fastq
m121229_132735_42147_c100433652550000001523063302011357_s1_p0.fastq
m121229_150719_42156_c100433372550000001523063302011326_s1_p0.fastq
m121229_172717_42156_c100433372550000001523063302011327_s1_p0.fastq
m121229_201356_42147_c100437152550000001523065002011331_s1_p0.fastq
m121229_215330_42156_c100458152550000001523043703201310_s1_p0.fastq
m121229_223348_42147_c100437152550000001523065002011332_s1_p0.fastq
m121230_001242_42156_c100458152550000001523043703201311_s1_p0.fastq
m121230_005228_42147_c100437152550000001523065002011333_s1_p0.fastq
m121230_023139_42156_c100458152550000001523043703201312_s1_p0.fastq
m121230_031027_42147_c100437152550000001523065002011334_s1_p0.fastq
m121230_045027_42156_c100458152550000001523043703201313_s1_p0.fastq
m121230_053022_42147_c100437152550000001523065002011335_s1_p0.fastq
m121230_070900_42156_c100458152550000001523043703201314_s1_p0.fastq
m121230_092732_42156_c100458152550000001523043703201315_s1_p0.fastq
m130103_025322_42156_c100333372550000001523020412311351_s1_p0.fastq
m130103_051151_42156_c100333372550000001523020412311352_s1_p0.fastq
m130103_073124_42156_c100333372550000001523020412311353_s1_p0.fastq
m130103_095006_42156_c100333372550000001523020412311354_s1_p0.fastq
m130103_121011_42156_c100333372550000001523020412311355_s1_p0.fastq
m130103_142838_42156_c100333372550000001523020412311356_s1_p0.fastq
m130103_164730_42156_c100333372550000001523020412311357_s1_p0.fastq
m130103_211303_42156_c100333372550000001523020412311351_s1_p0.fastq
m130103_233240_42156_c100333372550000001523020412311352_s1_p0.fastq
m130104_004536_42147_c100419222550000001523045401151310_s1_p0.fastq
m130104_015030_42156_c100333372550000001523020412311353_s1_p0.fastq
m130104_030510_42147_c100419222550000001523045401151311_s1_p0.fastq
m130104_041023_42156_c100333372550000001523020412311354_s1_p0.fastq
m130104_052352_42147_c100419222550000001523045401151312_s1_p0.fastq
m130104_062941_42156_c100333372550000001523020412311355_s1_p0.fastq
m130104_074208_42147_c100419222550000001523045401151313_s1_p0.fastq
m130104_084821_42156_c100333372550000001523020412311356_s1_p0.fastq
m130104_100135_42147_c100419222550000001523045401151314_s1_p0.fastq
m130104_122040_42147_c100419222550000001523045401151315_s1_p0.fastq
m130104_143941_42147_c100419222550000001523045401151316_s1_p0.fastq
m130104_165829_42147_c100419222550000001523045401151317_s1_p0.fastq
m130104_200720_42156_c100457242550000001523064703201333_s1_p0.fastq
m130104_212416_42147_c100437152550000001523065002011336_s1_p0.fastq
m130104_222751_42156_c100457242550000001523064703201334_s1_p0.fastq
m130104_234333_42147_c100437152550000001523065002011337_s1_p0.fastq
m130105_004618_42156_c100457242550000001523064703201335_s1_p0.fastq
m130105_020118_42147_c100458152550000001523043703201314_s1_p0.fastq
m130105_030431_42156_c100457242550000001523064703201336_s1_p0.fastq
m130105_042058_42147_c100458152550000001523043703201315_s1_p0.fastq
m130105_052305_42156_c100457242550000001523064703201337_s1_p0.fastq
m130105_074135_42156_c100398922550000001523035411101355_s1_p0.fastq
m130105_100104_42156_c100398922550000001523035411101356_s1_p0.fastq
m130105_121914_42156_c100398922550000001523035411101357_s1_p0.fastq
m130108_012734_42147_c100457912550000001523068703201310_s1_p0.fastq
m130108_034657_42147_c100457912550000001523068703201311_s1_p0.fastq
m130108_060536_42147_c100457912550000001523068703201312_s1_p0.fastq
m130108_082428_42147_c100457912550000001523068703201313_s1_p0.fastq
m130108_104409_42147_c100457912550000001523068703201314_s1_p0.fastq
m130108_130411_42147_c100457912550000001523068703201315_s1_p0.fastq
m130108_152225_42147_c100457912550000001523068703201316_s1_p0.fastq
m130108_174039_42147_c100457912550000001523068703201317_s1_p0.fastq
m130108_194319_42156_c100457222550000001523068703201310_s1_p0.fastq
m130108_220125_42156_c100457222550000001523068703201311_s1_p0.fastq
m130109_002005_42156_c100457222550000001523068703201312_s1_p0.fastq
m130109_024004_42156_c100457222550000001523068703201313_s1_p0.fastq
m130109_045825_42156_c100457222550000001523068703201314_s1_p0.fastq
m130109_071656_42156_c100457222550000001523068703201315_s1_p0.fastq
m130109_235759_42147_c100475172550000001823071206131380_s1_p0.fastq
m130110_012933_42156_c100457362550000001523068703201340_s1_p0.fastq
m130110_021726_42147_c100475172550000001823071206131381_s1_p0.fastq
m130110_034928_42156_c100457362550000001523068703201341_s1_p0.fastq
m130110_043559_42147_c100475172550000001823071206131382_s1_p0.fastq
m130110_060801_42156_c100457362550000001523068703201342_s1_p0.fastq
m130110_065455_42147_c100475172550000001823071206131383_s1_p0.fastq
m130110_082722_42156_c100457362550000001523068703201343_s1_p0.fastq
m130110_091430_42147_c100475172550000001823071206131384_s1_p0.fastq
m130110_104632_42156_c100457362550000001523068703201344_s1_p0.fastq
m130110_113430_42147_c100475172550000001823071206131385_s1_p0.fastq
m130110_130455_42156_c100457362550000001523068703201345_s1_p0.fastq
m130110_135254_42147_c100475172550000001823071206131386_s1_p0.fastq
m130110_152304_42156_c100457362550000001523068703201346_s1_p0.fastq
m130110_161153_42147_c100475172550000001823071206131387_s1_p0.fastq
m130110_174133_42156_c100457362550000001523068703201347_s1_p0.fastq
m130110_220814_42147_c100457892550000001523068703201360_s1_p0.fastq
m130110_232636_42156_c100458152550000001523043703201316_s1_p0.fastq
m130111_002803_42147_c100457892550000001523068703201361_s1_p0.fastq
m130111_011008_42137_c100474962550000001823071206131360_s1_p0.fastq
m130111_014638_42156_c100458152550000001523043703201317_s1_p0.fastq
m130111_024656_42147_c100457892550000001523068703201362_s1_p0.fastq
m130111_033238_42137_c100474962550000001823071206131361_s1_p0.fastq
m130111_040439_42156_c100457472550000001523065103201316_s1_p0.fastq
m130111_050542_42147_c100457892550000001523068703201363_s1_p0.fastq
m130111_055600_42137_c100474962550000001823071206131362_s1_p0.fastq
m130111_062254_42156_c100457472550000001523065103201317_s1_p0.fastq
m130111_072345_42147_c100457892550000001523068703201364_s1_p0.fastq
m130111_081948_42137_c100474962550000001823071206131363_s1_p0.fastq
m130111_084213_42156_c100475012550000001823071206131370_s1_p0.fastq
m130111_094220_42147_c100457892550000001523068703201365_s1_p0.fastq
m130111_104307_42137_c100474962550000001823071206131364_s1_p0.fastq
m130111_110129_42156_c100475012550000001823071206131371_s1_p0.fastq
m130111_120035_42147_c100457892550000001523068703201366_s1_p0.fastq
m130111_130623_42137_c100474962550000001823071206131365_s1_p0.fastq
m130111_131924_42156_c100475012550000001823071206131372_s1_p0.fastq
m130111_141952_42147_c100457892550000001523068703201367_s1_p0.fastq
m130111_153212_42137_c100474962550000001823071206131366_s1_p0.fastq
m130111_153802_42156_c100475012550000001823071206131373_s1_p0.fastq
m130111_175604_42137_c100474962550000001823071206131367_s1_p0.fastq
m130111_212136_42156_c100475002550000001823071206131380_s1_p0.fastq
m130111_234053_42156_c100475002550000001823071206131381_s1_p0.fastq
m130112_001858_42147_c100457552550000001523068703201390_s1_p0.fastq
m130112_002953_42137_c100456952550000001523068703201380_s1_p0.fastq
m130112_015912_42156_c100475002550000001823071206131382_s1_p0.fastq
m130112_023801_42147_c100457552550000001523068703201391_s1_p0.fastq
m130112_025319_42137_c100456952550000001523068703201381_s1_p0.fastq
m130112_041903_42156_c100475002550000001823071206131383_s1_p0.fastq
m130112_045554_42147_c100457552550000001523068703201392_s1_p0.fastq
m130112_051717_42137_c100456952550000001523068703201382_s1_p0.fastq
m130112_063728_42156_c100475002550000001823071206131384_s1_p0.fastq
m130112_071504_42147_c100457552550000001523068703201393_s1_p0.fastq
m130112_074041_42137_c100456952550000001523068703201383_s1_p0.fastq
m130112_085606_42156_c100475002550000001823071206131385_s1_p0.fastq
m130112_093333_42147_c100457552550000001523068703201394_s1_p0.fastq
m130112_100341_42137_c100456952550000001523068703201384_s1_p0.fastq
m130112_111412_42156_c100475002550000001823071206131386_s1_p0.fastq
m130112_115149_42147_c100457552550000001523068703201395_s1_p0.fastq
m130112_122713_42137_c100456952550000001523068703201385_s1_p0.fastq
m130112_133323_42156_c100475002550000001823071206131387_s1_p0.fastq
m130112_141113_42147_c100457552550000001523068703201396_s1_p0.fastq
m130112_145313_42137_c100456952550000001523068703201386_s1_p0.fastq
m130112_163027_42147_c100457552550000001523068703201397_s1_p0.fastq
m130112_171545_42137_c100456952550000001523068703201387_s1_p0.fastq
m130112_194406_42156_c100457592550000001523068703201350_s1_p0.fastq
m130112_205735_42147_c100457442550000001523068703201330_s1_p0.fastq
m130112_214821_42137_c100466522550000001523052205101390_s1_p0.fastq
m130112_220409_42156_c100457592550000001523068703201351_s1_p0.fastq
m130112_231604_42147_c100457442550000001523068703201331_s1_p0.fastq
m130113_001120_42137_c100466522550000001523052205101391_s1_p0.fastq
m130113_002201_42156_c100457592550000001523068703201352_s1_p0.fastq
m130113_013607_42147_c100457442550000001523068703201332_s1_p0.fastq
m130113_023433_42137_c100466522550000001523052205101392_s1_p0.fastq
m130113_024126_42156_c100457592550000001523068703201353_s1_p0.fastq
m130113_035305_42147_c100457442550000001523068703201333_s1_p0.fastq
m130113_045738_42137_c100466522550000001523052205101393_s1_p0.fastq
m130113_045945_42156_c100457592550000001523068703201354_s1_p0.fastq
m130113_061308_42147_c100457442550000001523068703201334_s1_p0.fastq
m130113_071841_42156_c100457592550000001523068703201355_s1_p0.fastq
m130113_083116_42147_c100457442550000001523068703201335_s1_p0.fastq
m130113_093741_42156_c100457592550000001523068703201356_s1_p0.fastq
m130113_115621_42156_c100457592550000001523068703201357_s1_p0.fastq
m130113_162135_42156_c100457002550000001523068703201390_s1_p0.fastq
m130113_184108_42156_c100457002550000001523068703201391_s1_p0.fastq
m130113_210036_42156_c100457002550000001523068703201392_s1_p0.fastq
m130113_231911_42156_c100457002550000001523068703201393_s1_p0.fastq
m130114_013913_42156_c100457002550000001523068703201394_s1_p0.fastq
m130114_035750_42156_c100457002550000001523068703201395_s1_p0.fastq
m130115_010056_42156_c100458012550000001523068703201370_s1_p0.fastq
m130115_032027_42156_c100458012550000001523068703201371_s1_p0.fastq
m130115_053841_42156_c100458012550000001523068703201372_s1_p0.fastq
m130115_075833_42156_c100458012550000001523068703201373_s1_p0.fastq
m130115_101815_42156_c100458012550000001523068703201374_s1_p0.fastq
m130115_123615_42156_c100458012550000001523068703201375_s1_p0.fastq
m130115_145353_42156_c100458012550000001523068703201376_s1_p0.fastq
m130115_171239_42156_c100458012550000001523068703201377_s1_p0.fastq
m130116_001729_42156_c100475342550000001823071206131350_s1_p0.fastq
m130116_013005_42147_c100475692550000001823071206131310_s1_p0.fastq
m130116_023641_42156_c100475342550000001823071206131351_s1_p0.fastq
m130116_034853_42147_c100475692550000001823071206131311_s1_p0.fastq
m130116_045501_42156_c100475342550000001823071206131352_s1_p0.fastq
m130116_060812_42147_c100475692550000001823071206131312_s1_p0.fastq
m130116_071443_42156_c100475342550000001823071206131353_s1_p0.fastq
m130116_082633_42147_c100475692550000001823071206131313_s1_p0.fastq
m130116_093321_42156_c100475342550000001823071206131354_s1_p0.fastq
m130116_104633_42147_c100475692550000001823071206131314_s1_p0.fastq
m130116_115144_42156_c100475342550000001823071206131355_s1_p0.fastq
m130116_130428_42147_c100475692550000001823071206131315_s1_p0.fastq
m130116_141005_42156_c100475342550000001823071206131356_s1_p0.fastq
m130116_152245_42147_c100475692550000001823071206131316_s1_p0.fastq
m130116_162930_42156_c100475342550000001823071206131357_s1_p0.fastq
m130116_174212_42147_c100475692550000001823071206131317_s1_p0.fastq
m130116_205554_42156_c100475662550000001823071206131340_s1_p0.fastq
m130116_220941_42147_c100475812550000001823071206131330_s1_p0.fastq
m130116_231402_42156_c100475662550000001823071206131341_s1_p0.fastq
m130117_002735_42147_c100475812550000001823071206131331_s1_p0.fastq
m130117_013318_42156_c100475662550000001823071206131342_s1_p0.fastq
m130117_024714_42147_c100475812550000001823071206131332_s1_p0.fastq
m130117_035224_42156_c100475662550000001823071206131343_s1_p0.fastq
m130117_050547_42147_c100475812550000001823071206131333_s1_p0.fastq
m130117_061042_42156_c100475662550000001823071206131344_s1_p0.fastq
m130117_072531_42147_c100475812550000001823071206131334_s1_p0.fastq
m130117_082848_42156_c100475662550000001823071206131345_s1_p0.fastq
m130117_094357_42147_c100475812550000001823071206131335_s1_p0.fastq
m130118_014224_42147_c100475022550000001823071206131360_s1_p0.fastq
m130118_015023_42156_c100474882550000001823071206131370_s1_p0.fastq
m130118_040108_42147_c100475022550000001823071206131361_s1_p0.fastq
m130118_040910_42156_c100474882550000001823071206131371_s1_p0.fastq
m130118_062033_42147_c100475022550000001823071206131362_s1_p0.fastq
m130118_062810_42156_c100474882550000001823071206131372_s1_p0.fastq
m130118_084011_42147_c100475022550000001823071206131363_s1_p0.fastq
m130118_084748_42156_c100474882550000001823071206131373_s1_p0.fastq
m130118_105857_42147_c100475022550000001823071206131364_s1_p0.fastq
m130118_110614_42156_c100474882550000001823071206131374_s1_p0.fastq
m130118_131834_42147_c100475022550000001823071206131365_s1_p0.fastq
m130118_132509_42156_c100474882550000001823071206131375_s1_p0.fastq
m130118_153701_42147_c100475022550000001823071206131366_s1_p0.fastq
m130118_154307_42156_c100474882550000001823071206131376_s1_p0.fastq
m130118_175542_42147_c100475022550000001823071206131367_s1_p0.fastq
m130118_180120_42156_c100474882550000001823071206131377_s1_p0.fastq
m130118_233136_42156_c100475032550000001823071206131350_s1_p0.fastq
m130119_015127_42156_c100475032550000001823071206131351_s1_p0.fastq
m130119_041008_42156_c100475032550000001823071206131352_s1_p0.fastq
m130119_062949_42156_c100475032550000001823071206131353_s1_p0.fastq
m130119_084724_42156_c100475032550000001823071206131354_s1_p0.fastq
m130119_110607_42156_c100475032550000001823071206131355_s1_p0.fastq
m130119_132527_42156_c100475032550000001823071206131356_s1_p0.fastq
m130119_154427_42156_c100475032550000001823071206131357_s1_p0.fastq
m130119_210207_42156_c100474922550000001823071206131300_s1_p0.fastq
m130119_232057_42156_c100474922550000001823071206131301_s1_p0.fastq
m130120_014041_42156_c100474922550000001823071206131302_s1_p0.fastq
m130120_035839_42156_c100474922550000001823071206131303_s1_p0.fastq
m130120_061714_42156_c100474922550000001823071206131304_s1_p0.fastq
m130120_083627_42156_c100474922550000001823071206131305_s1_p0.fastq
m130120_105426_42156_c100474922550000001823071206131306_s1_p0.fastq
m130120_131348_42156_c100474922550000001823071206131307_s1_p0.fastq
m130120_173828_42156_c100475012550000001823071206131374_s1_p0.fastq
m130120_195857_42156_c100475012550000001823071206131375_s1_p0.fastq
m130120_202930_42147_c100474912550000001823071206131310_s1_p0.fastq
m130120_221637_42156_c100475012550000001823071206131376_s1_p0.fastq
m130120_225024_42147_c100474912550000001823071206131311_s1_p0.fastq
m130121_003536_42156_c100475012550000001823071206131377_s1_p0.fastq
m130121_025501_42156_c100475662550000001823071206131346_s1_p0.fastq
m130121_032458_42147_c100474912550000001823071206131313_s1_p0.fastq
m130121_051255_42156_c100475662550000001823071206131347_s1_p0.fastq
m130121_054354_42147_c100474912550000001823071206131314_s1_p0.fastq
m130121_080531_42147_c100474912550000001823071206131315_s1_p0.fastq
m130121_102822_42147_c100474912550000001823071206131316_s1_p0.fastq
m130121_125013_42147_c100474912550000001823071206131317_s1_p0.fastq
m130121_172100_42147_c100475042550000001823071206131340_s1_p0.fastq
m130121_194556_42147_c100475042550000001823071206131341_s1_p0.fastq
m130121_220825_42147_c100475042550000001823071206131342_s1_p0.fastq
m130122_003009_42147_c100475042550000001823071206131343_s1_p0.fastq
m130122_025136_42147_c100475042550000001823071206131344_s1_p0.fastq
m130122_051257_42147_c100475042550000001823071206131345_s1_p0.fastq
m130123_010458_42147_c100475712550000001823071206131360_s1_p0.fastq
m130123_032402_42147_c100475712550000001823071206131361_s1_p0.fastq
m130123_054603_42147_c100475712550000001823071206131362_s1_p0.fastq
m130123_080814_42147_c100475712550000001823071206131363_s1_p0.fastq
m130123_102700_42147_c100475712550000001823071206131364_s1_p0.fastq
m130123_124834_42147_c100475712550000001823071206131365_s1_p0.fastq
m130123_151155_42147_c100475712550000001823071206131366_s1_p0.fastq
m130123_173355_42147_c100475712550000001823071206131367_s1_p0.fastq
m130124_004631_42147_c100475542550000001823069506131350_s1_p0.fastq
m130124_030813_42147_c100475542550000001823069506131351_s1_p0.fastq
m130124_052947_42147_c100475542550000001823069506131352_s1_p0.fastq
m130124_075326_42147_c100475542550000001823069506131353_s1_p0.fastq
m130124_101515_42147_c100475542550000001823069506131354_s1_p0.fastq
m130124_123809_42147_c100475542550000001823069506131355_s1_p0.fastq
m130124_150057_42147_c100475542550000001823069506131356_s1_p0.fastq
m130124_172325_42147_c100475542550000001823069506131357_s1_p0.fastq
m130124_190905_42156_c100476192550000001823069506131312_s1_p0.fastq
m130124_212819_42156_c100476192550000001823069506131313_s1_p0.fastq
m130124_215645_42147_c100476262550000001823069506131310_s1_p0.fastq
m130124_234753_42156_c100476192550000001823069506131314_s1_p0.fastq
m130125_001849_42147_c100476262550000001823069506131311_s1_p0.fastq
m130125_020647_42156_c100476192550000001823069506131315_s1_p0.fastq
m130125_024235_42147_c100476262550000001823069506131312_s1_p0.fastq
m130125_042528_42156_c100476192550000001823069506131316_s1_p0.fastq
m130125_050728_42147_c100476262550000001823069506131313_s1_p0.fastq
m130125_064410_42156_c100476192550000001823069506131317_s1_p0.fastq
m130125_073019_42147_c100476262550000001823069506131314_s1_p0.fastq
m130125_090229_42156_c100475812550000001823071206131336_s1_p0.fastq
m130125_095308_42147_c100476262550000001823069506131315_s1_p0.fastq
m130125_112212_42156_c100475812550000001823071206131337_s1_p0.fastq
m130125_210309_42156_c100475972550000001823069506131300_s1_p0.fastq
m130125_223202_42147_c100475552550000001823069506131340_s1_p0.fastq
m130125_232248_42156_c100475972550000001823069506131301_s1_p0.fastq
m130126_005332_42147_c100475552550000001823069506131341_s1_p0.fastq
m130126_011034_42137_c100466662550000001523052205101320_s1_p0.fastq
m130126_014058_42156_c100475972550000001823069506131302_s1_p0.fastq
m130126_031723_42147_c100475552550000001823069506131342_s1_p0.fastq
m130126_033039_42137_c100466662550000001523052205101321_s1_p0.fastq
m130126_040058_42156_c100475972550000001823069506131303_s1_p0.fastq
m130126_054912_42137_c100466662550000001523052205101322_s1_p0.fastq
m130126_061949_42156_c100475972550000001823069506131304_s1_p0.fastq
m130126_080611_42147_c100475552550000001823069506131344_s1_p0.fastq
m130126_080815_42137_c100466662550000001523052205101323_s1_p0.fastq
m130126_083748_42156_c100475972550000001823069506131305_s1_p0.fastq
m130126_102637_42137_c100466662550000001523052205101324_s1_p0.fastq
m130126_103017_42147_c100475552550000001823069506131345_s1_p0.fastq
m130126_105733_42156_c100475972550000001823069506131306_s1_p0.fastq
m130126_124512_42137_c100466662550000001523052205101325_s1_p0.fastq
m130126_125738_42147_c100475552550000001823069506131346_s1_p0.fastq
m130126_150332_42137_c100466662550000001523052205101326_s1_p0.fastq
m130126_152218_42147_c100475552550000001823069506131347_s1_p0.fastq
m130126_153859_42156_c100475932550000001823069506131340_s1_p0.fastq
m130126_172220_42137_c100466662550000001523052205101327_s1_p0.fastq
m130126_175810_42156_c100475932550000001823069506131341_s1_p0.fastq
m130126_195729_42147_c100476162550000001823069506131340_s1_p0.fastq
m130126_201724_42156_c100475932550000001823069506131342_s1_p0.fastq
m130126_222006_42147_c100476162550000001823069506131341_s1_p0.fastq
m130126_223539_42156_c100475932550000001823069506131343_s1_p0.fastq
m130127_004330_42147_c100476162550000001823069506131342_s1_p0.fastq
m130127_005518_42156_c100475932550000001823069506131344_s1_p0.fastq
m130127_030736_42147_c100476162550000001823069506131343_s1_p0.fastq
m130127_031405_42156_c100475932550000001823069506131345_s1_p0.fastq
m130127_053041_42147_c100476162550000001823069506131344_s1_p0.fastq
m130127_074516_42137_c100473322550000001823070606131380_s1_p0.fastq
m130127_075418_42147_c100476162550000001823069506131345_s1_p0.fastq
m130127_100442_42137_c100473322550000001823070606131381_s1_p0.fastq
m130127_211500_42156_c100476102550000001823069506131300_s1_p0.fastq
m130127_233407_42156_c100476102550000001823069506131301_s1_p0.fastq
m130128_015355_42156_c100476102550000001823069506131302_s1_p0.fastq
m130128_041231_42156_c100476102550000001823069506131303_s1_p0.fastq
m130128_063218_42156_c100476102550000001823069506131304_s1_p0.fastq
m130128_085101_42156_c100476102550000001823069506131305_s1_p0.fastq
m130128_110901_42156_c100476102550000001823069506131306_s1_p0.fastq
m130128_132739_42156_c100476102550000001823069506131307_s1_p0.fastq
m130128_175437_42156_c100475572550000001823069506131320_s1_p0.fastq
m130128_201415_42156_c100475572550000001823069506131321_s1_p0.fastq
m130128_223251_42156_c100475572550000001823069506131322_s1_p0.fastq
m130129_005059_42156_c100475572550000001823069506131323_s1_p0.fastq
m130129_013443_42147_c100477952550000001823070006131310_s1_p0.fastq
m130129_030945_42156_c100475572550000001823069506131324_s1_p0.fastq
m130129_035701_42147_c100477952550000001823070006131311_s1_p0.fastq
m130129_052836_42156_c100475572550000001823069506131325_s1_p0.fastq
m130129_062011_42147_c100477952550000001823070006131312_s1_p0.fastq
m130129_084401_42147_c100477952550000001823070006131313_s1_p0.fastq
m130129_110723_42147_c100477952550000001823070006131314_s1_p0.fastq
m130129_133053_42147_c100477952550000001823070006131315_s1_p0.fastq
m130129_155718_42147_c100477952550000001823070006131316_s1_p0.fastq
m130129_182052_42147_c100477952550000001823070006131317_s1_p0.fastq
m130129_210852_42156_c100478082550000001823070006131340_s1_p0.fastq
m130129_231456_42147_c100477892550000001823070006131300_s1_p0.fastq
m130129_232737_42156_c100478082550000001823070006131341_s1_p0.fastq
m130130_000913_42137_c100474042550000001823070606131340_s1_p0.fastq
m130130_013740_42147_c100477892550000001823070006131301_s1_p0.fastq
m130130_014709_42156_c100478082550000001823070006131342_s1_p0.fastq
m130130_022842_42137_c100474042550000001823070606131341_s1_p0.fastq
m130130_040100_42147_c100477892550000001823070006131302_s1_p0.fastq
m130130_040655_42156_c100478082550000001823070006131343_s1_p0.fastq
m130130_044656_42137_c100474042550000001823070606131342_s1_p0.fastq
m130130_062426_42147_c100477892550000001823070006131303_s1_p0.fastq
m130130_062527_42156_c100478082550000001823070006131344_s1_p0.fastq
m130130_070725_42137_c100474042550000001823070606131343_s1_p0.fastq
m130130_084538_42156_c100478082550000001823070006131345_s1_p0.fastq
m130130_084803_42147_c100477892550000001823070006131304_s1_p0.fastq
m130130_092438_42137_c100474042550000001823070606131344_s1_p0.fastq
m130130_110348_42156_c100478082550000001823070006131346_s1_p0.fastq
m130130_111227_42147_c100477892550000001823070006131305_s1_p0.fastq
m130130_114324_42137_c100474042550000001823070606131345_s1_p0.fastq
m130130_132227_42156_c100478082550000001823070006131347_s1_p0.fastq
m130130_133902_42147_c100477892550000001823070006131306_s1_p0.fastq
m130130_140315_42137_c100474042550000001823070606131346_s1_p0.fastq
m130130_160328_42147_c100477892550000001823070006131307_s1_p0.fastq
m130130_162147_42137_c100474042550000001823070606131347_s1_p0.fastq
m130130_201801_42156_c100476162550000001823069506131346_s1_p0.fastq
m130130_223800_42156_c100476162550000001823069506131347_s1_p0.fastq
m130130_225950_42137_c100473832550000001823070606131320_s1_p0.fastq
m130131_005634_42156_c100475572550000001823069506131326_s1_p0.fastq
m130131_005701_42147_c100477862550000001823070006131330_s1_p0.fastq
m130131_012041_42137_c100473832550000001823070606131321_s1_p0.fastq
m130131_031640_42156_c100475572550000001823069506131327_s1_p0.fastq
m130131_033808_42137_c100473832550000001823070606131322_s1_p0.fastq
m130131_053606_42156_c100475932550000001823069506131346_s1_p0.fastq
m130131_055648_42137_c100473832550000001823070606131323_s1_p0.fastq
m130131_075503_42156_c100475932550000001823069506131347_s1_p0.fastq
m130131_081538_42137_c100473832550000001823070606131324_s1_p0.fastq
m130131_101352_42156_c100478232550000001823070006131330_s1_p0.fastq
m130131_103426_42137_c100473832550000001823070606131325_s1_p0.fastq
m130131_123157_42156_c100478232550000001823070006131331_s1_p0.fastq
m130131_125239_42137_c100473832550000001823070606131326_s1_p0.fastq
m130131_151205_42137_c100473832550000001823070606131327_s1_p0.fastq
m130131_205550_42156_c100478232550000001823070006131332_s1_p0.fastq
m130131_231610_42156_c100478232550000001823070006131333_s1_p0.fastq
m130201_011853_42137_c100474152550000001823070606131300_s1_p0.fastq
m130201_013435_42156_c100478232550000001823070006131334_s1_p0.fastq
m130201_033916_42137_c100474152550000001823070606131301_s1_p0.fastq
m130201_035212_42156_c100478232550000001823070006131335_s1_p0.fastq
m130201_055748_42137_c100474152550000001823070606131302_s1_p0.fastq
m130201_061147_42156_c100478232550000001823070006131336_s1_p0.fastq
m130201_081555_42137_c100474152550000001823070606131303_s1_p0.fastq
m130201_083014_42156_c100478232550000001823070006131337_s1_p0.fastq
m130201_103502_42137_c100474152550000001823070606131304_s1_p0.fastq
m130201_104829_42156_c100477972550000001823070006131390_s1_p0.fastq
m130201_125337_42137_c100474152550000001823070606131305_s1_p0.fastq
m130201_130749_42156_c100477972550000001823070006131391_s1_p0.fastq
m130201_151203_42137_c100474152550000001823070606131306_s1_p0.fastq
m130201_173035_42137_c100474152550000001823070606131307_s1_p0.fastq
m130202_002231_42147_c100478012550000001823070006131310_s1_p0.fastq
m130202_005553_42137_c100474122550000001823070606131330_s1_p0.fastq
m130202_024526_42147_c100478012550000001823070006131311_s1_p0.fastq
m130202_031602_42137_c100474122550000001823070606131331_s1_p0.fastq
m130202_050909_42147_c100478012550000001823070006131312_s1_p0.fastq
m130202_053423_42137_c100474122550000001823070606131332_s1_p0.fastq
m130202_073253_42147_c100478012550000001823070006131313_s1_p0.fastq
m130202_075326_42137_c100474122550000001823070606131333_s1_p0.fastq
m130202_095549_42147_c100478012550000001823070006131314_s1_p0.fastq
m130202_101156_42137_c100474122550000001823070606131334_s1_p0.fastq
m130202_121918_42147_c100478012550000001823070006131315_s1_p0.fastq
m130202_123027_42137_c100474122550000001823070606131335_s1_p0.fastq
m130202_144539_42147_c100478012550000001823070006131316_s1_p0.fastq
m130202_145004_42137_c100474122550000001823070606131336_s1_p0.fastq
m130202_170919_42137_c100474122550000001823070606131337_s1_p0.fastq
m130202_170948_42147_c100478012550000001823070006131317_s1_p0.fastq
m130202_214446_42147_c100478052550000001823070006131370_s1_p0.fastq
m130203_000820_42147_c100478052550000001823070006131371_s1_p0.fastq
m130203_023311_42147_c100478052550000001823070006131372_s1_p0.fastq
m130203_045652_42147_c100478052550000001823070006131373_s1_p0.fastq
m130203_072023_42147_c100478052550000001823070006131374_s1_p0.fastq
m130203_094432_42147_c100478052550000001823070006131375_s1_p0.fastq
m130204_235736_42137_c100472822550000001823070606131340_s1_p0.fastq
m130205_010706_42147_c100477942550000001823070006131320_s1_p0.fastq
m130205_021844_42137_c100472822550000001823070606131341_s1_p0.fastq
m130205_033011_42147_c100477942550000001823070006131321_s1_p0.fastq
m130205_043756_42137_c100472822550000001823070606131342_s1_p0.fastq
m130205_065558_42137_c100472822550000001823070606131343_s1_p0.fastq
m130205_080918_42147_c100477942550000001823070006131323_s1_p0.fastq
m130205_091353_42137_c100472822550000001823070606131344_s1_p0.fastq
m130205_103218_42147_c100477942550000001823070006131324_s1_p0.fastq
m130205_113203_42137_c100472822550000001823070606131345_s1_p0.fastq
m130205_125534_42147_c100477942550000001823070006131325_s1_p0.fastq
m130205_135053_42137_c100472822550000001823070606131346_s1_p0.fastq
m130205_152157_42147_c100477942550000001823070006131326_s1_p0.fastq
m130205_161055_42137_c100472822550000001823070606131347_s1_p0.fastq
m130205_174616_42147_c100477942550000001823070006131327_s1_p0.fastq
m130205_222009_42147_c100478002550000001823070006131320_s1_p0.fastq
m130206_001116_42137_c100473772550000001823070606131310_s1_p0.fastq
m130206_004245_42147_c100478002550000001823070006131321_s1_p0.fastq
m130206_022948_42137_c100473772550000001823070606131311_s1_p0.fastq
m130206_030658_42147_c100478002550000001823070006131322_s1_p0.fastq
m130206_044856_42137_c100473772550000001823070606131312_s1_p0.fastq
m130206_053108_42147_c100478002550000001823070006131323_s1_p0.fastq
m130206_070707_42137_c100473772550000001823070606131313_s1_p0.fastq
m130206_075454_42147_c100478002550000001823070006131324_s1_p0.fastq
m130206_092558_42137_c100473772550000001823070606131314_s1_p0.fastq
m130206_101810_42147_c100478002550000001823070006131325_s1_p0.fastq
m130206_114553_42137_c100473772550000001823070606131315_s1_p0.fastq
m130206_140525_42137_c100473772550000001823070606131316_s1_p0.fastq
m130206_162342_42137_c100473772550000001823070606131317_s1_p0.fastq
m130207_000530_42137_c100473682550000001823070606131330_s1_p0.fastq
m130207_005154_42147_c100470162550000001823071006131300_s1_p0.fastq
m130207_022431_42137_c100473682550000001823070606131331_s1_p0.fastq
m130207_031514_42147_c100470162550000001823071006131301_s1_p0.fastq
m130207_044244_42137_c100473682550000001823070606131332_s1_p0.fastq
m130207_053947_42147_c100470162550000001823071006131302_s1_p0.fastq
m130207_070225_42137_c100473682550000001823070606131333_s1_p0.fastq
m130207_080408_42147_c100470162550000001823071006131303_s1_p0.fastq
m130207_092014_42137_c100473682550000001823070606131334_s1_p0.fastq
m130207_102825_42147_c100470162550000001823071006131304_s1_p0.fastq
m130207_113827_42137_c100473682550000001823070606131335_s1_p0.fastq
m130207_125158_42147_c100470162550000001823071006131305_s1_p0.fastq
m130207_135800_42137_c100473682550000001823070606131336_s1_p0.fastq
m130207_151842_42147_c100470162550000001823071006131306_s1_p0.fastq
m130207_161654_42137_c100473682550000001823070606131337_s1_p0.fastq
m130207_174300_42147_c100470162550000001823071006131307_s1_p0.fastq
m130207_215928_42156_c100470562550000001823071006131380_s1_p0.fastq
m130207_222154_42147_c100477972550000001823070006131394_s1_p0.fastq
m130207_235328_42137_c100474062550000001823070606131320_s1_p0.fastq
m130208_001738_42156_c100470562550000001823071006131381_s1_p0.fastq
m130208_004525_42147_c100477972550000001823070006131395_s1_p0.fastq
m130208_021313_42137_c100474062550000001823070606131321_s1_p0.fastq
m130208_023618_42156_c100470562550000001823071006131382_s1_p0.fastq
m130208_030940_42147_c100477972550000001823070006131396_s1_p0.fastq
m130208_043227_42137_c100474062550000001823070606131322_s1_p0.fastq
m130208_045457_42156_c100470562550000001823071006131383_s1_p0.fastq
m130208_053352_42147_c100477972550000001823070006131397_s1_p0.fastq
m130208_065039_42137_c100474062550000001823070606131323_s1_p0.fastq
m130208_071322_42156_c100470562550000001823071006131384_s1_p0.fastq
m130208_075818_42147_c100478052550000001823070006131376_s1_p0.fastq
m130208_090925_42137_c100474062550000001823070606131324_s1_p0.fastq
m130208_093223_42156_c100470562550000001823071006131385_s1_p0.fastq
m130208_102218_42147_c100478052550000001823070006131377_s1_p0.fastq
m130208_112802_42137_c100474062550000001823070606131325_s1_p0.fastq
m130208_115336_42156_c100470562550000001823071006131386_s1_p0.fastq
m130208_134627_42137_c100474062550000001823070606131326_s1_p0.fastq
m130208_141244_42156_c100470562550000001823071006131387_s1_p0.fastq
m130208_160558_42137_c100474062550000001823070606131327_s1_p0.fastq
m130208_221027_42156_c100470302550000001823071006131300_s1_p0.fastq
m130208_234840_42147_c100470542550000001823071006131300_s1_p0.fastq
m130209_002946_42156_c100470302550000001823071006131301_s1_p0.fastq
m130209_021152_42147_c100470542550000001823071006131301_s1_p0.fastq
m130209_024824_42156_c100470302550000001823071006131302_s1_p0.fastq
m130209_043542_42147_c100470542550000001823071006131302_s1_p0.fastq
m130209_050627_42156_c100470302550000001823071006131303_s1_p0.fastq
m130209_065959_42147_c100470542550000001823071006131303_s1_p0.fastq
m130209_072546_42156_c100470302550000001823071006131304_s1_p0.fastq
m130209_092420_42147_c100470542550000001823071006131304_s1_p0.fastq
m130209_094421_42156_c100470302550000001823071006131305_s1_p0.fastq
m130209_114926_42147_c100470542550000001823071006131305_s1_p0.fastq
m130209_120304_42156_c100470302550000001823071006131306_s1_p0.fastq
m130209_141605_42147_c100470542550000001823071006131306_s1_p0.fastq
m130209_142147_42156_c100470302550000001823071006131307_s1_p0.fastq
m130209_164101_42147_c100470542550000001823071006131307_s1_p0.fastq
m130209_184854_42156_c100470502550000001823071006131340_s1_p0.fastq
m130209_210740_42156_c100470502550000001823071006131341_s1_p0.fastq
m130209_211716_42147_c100470492550000001823071006131380_s1_p0.fastq
m130209_232615_42156_c100470502550000001823071006131342_s1_p0.fastq
m130209_234047_42147_c100470492550000001823071006131381_s1_p0.fastq
m130210_014622_42156_c100470502550000001823071006131343_s1_p0.fastq
m130210_020459_42147_c100470492550000001823071006131382_s1_p0.fastq
m130210_040510_42156_c100470502550000001823071006131344_s1_p0.fastq
m130210_042853_42147_c100470492550000001823071006131383_s1_p0.fastq
m130210_062210_42156_c100470502550000001823071006131345_s1_p0.fastq
m130210_065159_42147_c100470492550000001823071006131384_s1_p0.fastq
m130210_091600_42147_c100470492550000001823071006131385_s1_p0.fastq
m130212_011918_42147_c100476362550000001823072406131350_s1_p0.fastq
m130212_034319_42147_c100476362550000001823072406131351_s1_p0.fastq
m130212_060712_42147_c100476362550000001823072406131352_s1_p0.fastq
m130212_083123_42147_c100476362550000001823072406131353_s1_p0.fastq
m130212_105530_42147_c100476362550000001823072406131354_s1_p0.fastq
m130212_132006_42147_c100476362550000001823072406131355_s1_p0.fastq
m130212_154759_42147_c100476362550000001823072406131356_s1_p0.fastq
m130212_181249_42147_c100476362550000001823072406131357_s1_p0.fastq
m130213_011503_42147_c100469932550000001823072506131370_s1_p0.fastq
m130213_033833_42147_c100469932550000001823072506131371_s1_p0.fastq
m130213_060150_42147_c100469932550000001823072506131372_s1_p0.fastq
m130213_082542_42147_c100469932550000001823072506131373_s1_p0.fastq
m130213_104751_42147_c100469932550000001823072506131374_s1_p0.fastq
m130213_131013_42147_c100469932550000001823072506131375_s1_p0.fastq
m130213_153435_42147_c100469932550000001823072506131376_s1_p0.fastq
m130213_175623_42147_c100469932550000001823072506131377_s1_p0.fastq
m130213_191206_42156_c100470522550000001823071006131320_s1_p0.fastq
m130213_213125_42156_c100470522550000001823071006131321_s1_p0.fastq
m130213_222843_42147_c100470222550000001823072506131350_s1_p0.fastq
m130213_234951_42156_c100470522550000001823071006131322_s1_p0.fastq
m130214_005012_42147_c100470222550000001823072506131351_s1_p0.fastq
m130214_020903_42156_c100470522550000001823071006131323_s1_p0.fastq
m130214_031212_42147_c100470222550000001823072506131352_s1_p0.fastq
m130214_042949_42156_c100470522550000001823071006131324_s1_p0.fastq
m130214_053410_42147_c100470222550000001823072506131353_s1_p0.fastq
m130214_064818_42156_c100470522550000001823071006131325_s1_p0.fastq
m130214_075706_42147_c100470222550000001823072506131354_s1_p0.fastq
m130214_090606_42156_c100470522550000001823071006131326_s1_p0.fastq
m130214_101954_42147_c100470222550000001823072506131355_s1_p0.fastq
m130214_112629_42156_c100470522550000001823071006131327_s1_p0.fastq
m130214_211959_42156_c100476842550000001823070806131320_s1_p0.fastq
m130214_233823_42156_c100476842550000001823070806131321_s1_p0.fastq
m130215_015850_42156_c100476842550000001823070806131322_s1_p0.fastq
m130215_020515_42147_c100476732550000001823072406131360_s1_p0.fastq
m130215_041820_42156_c100476842550000001823070806131323_s1_p0.fastq
m130215_042755_42147_c100476732550000001823072406131361_s1_p0.fastq
m130215_063830_42156_c100476842550000001823070806131324_s1_p0.fastq
m130215_065124_42147_c100476732550000001823072406131362_s1_p0.fastq
m130215_085750_42156_c100476842550000001823070806131325_s1_p0.fastq
m130215_091503_42147_c100476732550000001823072406131363_s1_p0.fastq
m130215_111637_42156_c100476842550000001823070806131326_s1_p0.fastq
m130215_133523_42156_c100476842550000001823070806131327_s1_p0.fastq

### Output List
Final output files were moved to the S3 bucket. The main transcribed-locus alignments (na12878_main.bam) will additionally be downloaded and added to the git repo via LFS.

NOTE: The na12878_main.N files contained triplicates of the exact same read. There is a bug somewhere in the code above, but everything had already been generated by the time I noticed. The ugly-but-simple fix is to run `samtools rmdup -s` on the files, which also makes them smaller.
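A minimal sketch of how the triplicates could be checked before and after the `rmdup` fix. The helper name `count_repeated_records` and the `.dedup.bam` output name are hypothetical and not part of the original pipeline. It keys each record on SAM columns 1-4 and 10, which is an assumption about what "the exact same read" means here.

```shell
# Count alignment records that appear more than once in SAM text on stdin.
# A record is keyed on QNAME, FLAG, RNAME, POS and SEQ (SAM columns 1-4 and 10).
# Header lines (starting with "@") are skipped.
count_repeated_records() {
  awk -F'\t' '
    /^@/ { next }
    { seen[$1 FS $2 FS $3 FS $4 FS $10]++ }
    END {
      extra = 0
      for (k in seen) if (seen[k] > 1) extra += seen[k] - 1
      print extra
    }
  '
}

# Hypothetical usage (input name follows the S3 listing in this section):
#   samtools view na12878_main.1.bam | count_repeated_records
#   samtools rmdup -s na12878_main.1.bam na12878_main.1.dedup.bam
#   samtools view na12878_main.1.dedup.bam | count_repeated_records
```

One caveat: `samtools rmdup -s` decides duplicates by alignment position and strand rather than by read identity, so it may also collapse distinct reads that happen to start at the same coordinate. Comparing read counts before and after is a cheap way to see whether more than the expected two thirds were removed.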

`aws s3 ls s3://crownproject/na12878`

```
                           PRE subalignments0/
2016-11-07 23:15:08      42364 fastq.list
2016-11-09 20:53:49      32921 header.list.0.txt
2016-11-09 20:45:59      82784 header.list.1.txt
2016-11-09 20:48:34      81304 header.list.2.txt
2016-11-09 20:53:36      82043 header.list.3.txt
2016-11-09 20:55:21      80931 header.list.4.txt
2016-11-09 20:56:03      69085 header.list.5.txt
2016-11-07 23:15:08    1083166 hgr.fa
2016-11-09 20:45:44   13125239 na12878.pb.hgr_1.bam
2016-11-09 20:50:00   14354713 na12878.pb.hgr_2.bam
2016-11-09 20:58:57   14669745 na12878.pb.hgr_3.bam
2016-11-09 20:57:30   11398700 na12878.pb.hgr_4.bam
2016-11-09 20:57:41    9934603 na12878.pb.hgr_5.bam
2016-11-08 02:42:56 5280170572 na12878_hgr.0.bam
2016-11-08 02:46:26      11808 na12878_hgr.0.bam.bai
2016-11-09 20:31:32 6192591922 na12878_hgr.1.bam
2016-11-09 20:44:51      14304 na12878_hgr.1.bam.bai
2016-11-09 20:32:07 5282131953 na12878_hgr.2.bam
2016-11-09 20:47:27      13776 na12878_hgr.2.bam.bai
2016-11-09 20:51:04 6258194977 na12878_hgr.3.bam
2016-11-09 20:53:18      15152 na12878_hgr.3.bam.bai
2016-11-09 20:55:31 5257478504 na12878_hgr.4.bam
2016-11-09 20:55:15      12960 na12878_hgr.4.bam.bai
2016-11-09 20:53:03 5168341615 na12878_hgr.5.bam
2016-11-09 20:55:56      12608 na12878_hgr.5.bam.bai
2016-11-08 02:46:34   10905171 na12878_main.0.bam
2016-11-08 02:46:41       5296 na12878_main.0.bam.bai
2016-11-09 20:45:32   21956559 na12878_main.1.bam
2016-11-09 20:46:59   23720888 na12878_main.2.bam
2016-11-09 20:53:27   24404125 na12878_main.3.bam
2016-11-09 20:59:26   18557275 na12878_main.4.bam
2016-11-09 20:56:15   16850258 na12878_main.5.bam
2016-11-09 20:54:15         31 rDNA_main.bed
2016-11-09 20:54:22       1399 s3_blasrAlign.sh
```
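In the listing above, `na12878_main.0.bam` has a `.bai` index next to it while `na12878_main.1.bam` through `na12878_main.5.bam` do not. A small sketch for flagging this directly from the `aws s3 ls` output follows. The function name `bams_missing_index` is hypothetical, and it assumes the object key is the fourth whitespace-separated column, as in the listing shown.

```shell
# Print BAM files from an `aws s3 ls` listing (stdin) that have no matching
# .bai index in the same listing. Column 4 is assumed to be the object key.
bams_missing_index() {
  awk '
    $4 ~ /\.bam$/      { bam[$4] = 1 }
    $4 ~ /\.bam\.bai$/ { idx[substr($4, 1, length($4) - 4)] = 1 }
    END { for (f in bam) if (!(f in idx)) print f }
  ' | sort
}

# Hypothetical usage:
#   aws s3 ls s3://crownproject/na12878/ | bams_missing_index
```

Any file reported here would need `samtools index` run on it before it can be opened in a genome browser or queried by region.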

### Visualizing the Alignment
**NA12878 Pacbio 1kGenomes Pilot Alignment over 18S. File: na12878_main.0.bam**
![NA12878 Pacbio 1kGenomes Pilot Alignment over 18S](../figure/20161107_NA12878_pacbio_pilot_align.png)
This is with soft-clipping (which removes large indels).

## Discussion

**NA12878 Pacbio Alignments and NA19240 Illumina Alignment over 18S**
![PB Alignments](../figure/20161110_NA12878_pacbio_all_align.png)

**NA12878 Pacbio Alignments and NA19240 Illumina Alignment over 28S**
![PB Alignments](../figure/20161110_NA12878_pacbio_all_28Salign.png)


As seen above, the variants are fairly consistent across the fastq files (technical replicates), and some, but not all, are seen in both the NA19240 Illumina library and the NA12878 PacBio library. So some of these variants appear to be bona fide in the DNA, present at different levels.

![ g.1012388 Variant ](../figure/20161110_g13012388.png)

| Variant              | NA12878 (%) | NA19240 (%) |
| -------------------- | -----------:| -----------:|
| chr13:g.1,012,388G   | 10 | 0 |
| chr13:g.1,012,388G>A | 80 | 99 |
| chr13:g.1,012,388G>C | 9 | 0 |
| chr13:g.1,012,388G>T | 0 | 0 |
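
Per-base percentages like those in the table could be tallied from `samtools mpileup` text output. The sketch below is one way to do that and is not necessarily how the numbers above were obtained. The function name `pileup_base_percent` is hypothetical, and the reference and BAM file names in the usage comment are assumptions taken from the S3 listing.

```shell
# Print the percentage of A, C, G and T calls at each position of
# `samtools mpileup` text read from stdin (tab-separated columns:
# chrom, pos, ref, depth, read bases, base qualities).
pileup_base_percent() {
  awk -F'\t' '
    {
      ref = toupper($3); s = $5; n = length(s); i = 1; total = 0
      split("", count)                      # reset counts for this position
      while (i <= n) {
        c = substr(s, i, 1)
        if (c == "^") { i += 2; continue }  # read-start marker plus its MAPQ character
        if (c == "+" || c == "-") {         # indel: sign, length digits, then that many bases
          j = i + 1; len = ""
          while (substr(s, j, 1) ~ /^[0-9]$/) { len = len substr(s, j, 1); j++ }
          i = j + len; continue
        }
        b = (c == "." || c == ",") ? ref : toupper(c)   # "." and "," mean a reference match
        if (b ~ /^[ACGT]$/) { count[b]++; total++ }     # read-end markers, "*" and N are ignored
        i++
      }
      if (total == 0) next
      printf "%s:%s", $1, $2
      for (k = 1; k <= 4; k++) {
        b = substr("ACGT", k, 1)
        printf "\t%s=%.1f", b, 100 * count[b] / total
      }
      printf "\n"
    }
  '
}

# Hypothetical usage (file names are assumptions):
#   samtools mpileup -r chr13:1012388-1012388 -f hgr.fa na12878_main.1.bam | pileup_base_percent
```

By default `mpileup` filters low-quality bases and applies BAQ, which may drop many PacBio calls; adding `-B -Q 0` is one option to consider if the reported depth looks too low.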


### Conclusions

- Long PacBio reads rule out the hypothesis that the observed Illumina variants are simply pseudogenes/genomic variants, since they phase the variants as occurring within an rDNA array context. The 45S rRNA variants are within a larger rDNA array, not elsewhere in the genome.
- Within 45S rDNA, intra-individual variation is concordant between PacBio and Illumina reads (NA12878).
- There are differences between individuals NA12878 and NA19240, suggesting two levels of rDNA variation in humans: intra- and inter-individual variation.

### Future Experiments
- Use the PacBio alignments to make structural variant calls
- Derive a consensus sequence and variants from NA12878 for future alignments (graph-based aligners?)
- Develop a method for rapidly aligning 1kg data to hgr, followed by automated variant calling and analysis

