# Primary rDNA Alignment
```
pi:ababaian
start: 2016 05 09
complete : 2016 07 06
```
## Introduction

In the RNA-seq data (Variant Analysis 1) a few notable and recurrent variants were identified, most importantly 18S[U1248C/-] at chr13:1,004,904.

This hyper-modified base likely appears 'variable' because of errors in the RT reaction. To test this hypothesis I will align gDNA to hgr and check whether a T/C/- polymorphism is present at the rDNA level. If it is completely absent, then the variation must arise at the RNA level: either the base is modified to C or deleted, or, more likely, it's an RT error.

Note: If this variant is completely absent at the DNA level (i.e. at background), that will be really informative, since it means that the RNA-seq data from poly-A selected sequencing also contains true rRNA and not contaminating pseudo-rRNA.

## Objective

* Align human genome sequencing data to hgr and test whether a polymorphic U1248C exists or whether this is an RNA-specific phenomenon.

* Measure the level of ribosomal RNA sequence variation at the level of rDNA.

## Materials and Methods


### Data Acquisition

The easiest way to acquire DNA sequence is from the [1000 genomes project](http://www.1000genomes.org). The first pass will use a normal whole-genome sequencing run.


#### [NA19240](http://www.1000genomes.org/data-portal/sample/NA19240)

Yoruba (YOR) Female. Mother: NA19238 and Father: NA19239

[Data Description Index](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/20130502.phase3.sequence.index). [FTP Master Directory](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/)

[ INSERT DATA DESCRIPTION HERE ]

```
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA19240/sequence_read/SRR794330_1.filt.fastq.gz
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA19240/sequence_read/SRR794330_2.filt.fastq.gz
```
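Before aligning, it may be worth confirming that both mate files downloaded intact and contain the same number of reads. The following is a minimal sketch, assuming `gzip` and `awk` are available; the helper name `fastq_read_count` is hypothetical and was not part of the original workflow.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file,
# assuming the standard 4 lines per record.
fastq_read_count() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Compare the two mate files only if both are present in the current directory.
r1=SRR794330_1.filt.fastq.gz
r2=SRR794330_2.filt.fastq.gz
if [ -f "$r1" ] && [ -f "$r2" ]; then
    # gzip -t tests archive integrity without writing any output
    gzip -t "$r1" "$r2" && echo "gzip integrity OK"
    n1=$(fastq_read_count "$r1")
    n2=$(fastq_read_count "$r2")
    if [ "$n1" = "$n2" ]; then
        echo "mates match: $n1 read pairs"
    else
        echo "mate count mismatch: $n1 vs $n2" >&2
    fi
fi
```

A mismatch between the two counts would suggest a truncated download, which is cheaper to catch here than after a full alignment run.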
Data moved to `~/Crown/data/1kgenomes/`

Added to .gitignore
```
data/1kgenomes/*.fastq.gz
data/1kgenomes/*.bam
```

### Alignment

Bowtie2 alignment command against the hgr.fa genome:
```
# Move to the 1kgenomes directory
cd ~/Crown/data/1kgenomes/

# Bowtie2 Alignment to hgr genome
bowtie2 -x ~/Crown/resources/hgr/hgr -1 SRR794330_1.filt.fastq.gz -2 SRR794330_2.filt.fastq.gz --very-sensitive | samtools view -bS - > NA19240_hgr.bam

# Sort and index the alignment
samtools sort NA19240_hgr.bam -o NA19240_hgr.sort.bam
mv NA19240_hgr.sort.bam NA19240.bam
samtools index NA19240.bam

```

```
samtools flagstat NA19240_hgr.bam

71087208 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplimentary
0 + 0 duplicates
596582 + 0 mapped (0.84%:-nan%)
71087208 + 0 paired in sequencing
35543604 + 0 read1
35543604 + 0 read2
322054 + 0 properly paired (0.45%:-nan%)
347818 + 0 with itself and mate mapped
248764 + 0 singletons (0.35%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
```
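
The flagstat percentages can be re-derived from the raw counts. Doing so also confirms that the mapped total is the sum of reads mapped together with their mate and singletons (347818 + 248764 = 596582). The following is a small sketch with `awk`, assuming the flagstat text layout shown above; the function name `flagstat_summary` is hypothetical.

```shell
# Hypothetical helper: summarize samtools flagstat text read from stdin.
# Assumes each line has the layout "<passed> + <failed> <description>".
flagstat_summary() {
    awk '
        $4 == "in" && $5 == "total"    { total  = $1 }
        $4 == "mapped"                 { mapped = $1 }
        $4 == "properly"               { proper = $1 }
        $4 == "with" && $5 == "itself" { both   = $1 }
        $4 == "singletons"             { single = $1 }
        END {
            if (total == 0) { print "no reads"; exit 1 }
            printf "mapped_pct=%.2f\n", 100 * mapped / total
            printf "proper_pct=%.2f\n", 100 * proper / total
            # Should be 0 when mapped = (both mates mapped) + singletons
            printf "mapped_minus_parts=%d\n", mapped - both - single
        }'
}
```

It would be used as `samtools flagstat NA19240_hgr.bam | flagstat_summary`. Matching on whole fields rather than on substrings keeps the "with itself and mate mapped" line from being mistaken for the overall mapped count.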


## Results

#### 18S U1248
```
chr13:1,004,904
Total count: 626
A      : 1  (0%,     1+,   0- )
C      : 2  (0%,     0+,   2- )
G      : 0
T      : 623  (100%,     343+,   280- )
N      : 0
---------------
```
![Variation at genomic 18S U1248](../figure/20160509_NA19240_18S_U1248.png)
```
chr13:1,004,903
Total count: 631
A      : 0
C      : 630  (100%,     347+,   283- )
G      : 0
T      : 1  (0%,     1+,   0- )
N      : 0
---------------
DEL: 0
INS: 1
```
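
To put a number on "absent at the DNA level", the base counts at chr13:1,004,904 can be converted into allele fractions. The following is a minimal sketch with `awk`; the function name `allele_fractions` and its argument order (A, C, G, T counts) are hypothetical conveniences, and the counts are the ones reported above.

```shell
# Hypothetical helper: print per-base fractions from A, C, G, T read counts.
allele_fractions() {
    awk -v a="$1" -v c="$2" -v g="$3" -v t="$4" 'BEGIN {
        total = a + c + g + t
        if (total == 0) { print "no coverage"; exit 1 }
        printf "total=%d\n", total
        printf "A=%.2f%% C=%.2f%% G=%.2f%% T=%.2f%%\n",
               100 * a / total, 100 * c / total, 100 * g / total, 100 * t / total
    }'
}

# Counts at chr13:1,004,904 (18S U1248) in NA19240 gDNA
allele_fractions 1 2 0 623
```

For these counts T accounts for 99.52% of reads, and the three non-T reads together make up roughly 0.5%, which is what background sequencing error would be expected to look like.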


#### Other Variants

## Discussion

At least in this person's rDNA, the variant at U1248 isn't present: 623 of 626 reads at chr13:1,004,904 are T (1 A, 2 C). Altogether, it seems that RNA samples consistently show the U1248 variation, yet it is absent in DNA.

This is actually quite interesting. Since U1248 is a 'hypermodified' uracil that causes problems for reverse transcription, the variant is likely an artifact. While this may not be biologically 'pertinent' with respect to functional variation, it means that the RNA-seq experiments in which this variation is present (even if poly-A selection occurs) contain true, modified rRNA, and thus the rest of the rRNA sequencing is indicative of what mature rRNA looks like in the cell.

Acquire RNA-seq from NA19240 to confirm that her RNA se