/
IntronicPASenrichment.Rmd
71 lines (46 loc) · 2.65 KB
/
IntronicPASenrichment.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
---
title: "IntronicPASenrichment"
author: "Briana Mittleman"
date: "9/16/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(tidyverse)
```
In this analysis I will look at the enrichment of intron pas over all PAS. I need to get the size of all intronic regions and the size of all regions. I can do this by merging the annotations.
I can use bedtools merge on
ncbiRefSeq_FormatedallAnnotation.sort.bed and the intron regions of this.
```{r}
annotation=read.table("/project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefSeq_FormatedallAnnotation.sort.bed", col.names = c("chr", "start", "end", "id", "score", "strand")) %>% separate(id,into=c("loc", "gene"),sep=":")
intron=annotation %>% filter(loc=="intron")
write.table(intron, "/project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefSeq_intron.bed", col.names = F, row.names = F,quote = F, sep="\t")
```
sort:
```{bash,eval=F}
sort -k1,1 -k2,2n /project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefseq_intron.dms > /project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefseq_intron.sort.dms
sed 's/^chr//' /project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefseq_intron.sort.dms > /project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefseq_intron.sort.noCHR.bed
bedtools merge -i /project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefseq_intron.sort.noCHR.bed -s > /project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefSeq_Merged_intron.sort.bed
```
```{r}
chroms=as.character(seq(1,22))
intronMerged=read.table("/project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefSeq_Merged_intron.sort.bed",col.names = c("chr", "start", "end")) %>% mutate(length=end-start) %>% filter(chr %in% chroms)
allMerged=read.table("/project2/gilad/briana/genome_anotation_data/RefSeq_annotations/ncbiRefSeq_Merged.FormatedallAnnotation.sort.bed" , col.names = c("chr", "start", "end")) %>% mutate(length=end-start) %>% filter(chr %in% chroms)
intronLength=sum(intronMerged$length)
totalLength=sum(allMerged$length)
```
N pas in intron
```{r}
intronpas=read.table("../data/PAS/APAPAS_GeneLocAnno.5perc.bed", col.names = c("chr", "start", "end", "id","score", "strand")) %>% separate(id, into = c("pasnum", "geneID"), sep=":") %>% separate(geneID, into=c("gene", "loc"),sep="_") %>% filter(loc=="intron")
allPAS=read.table("../data/PAS/APAPAS_GeneLocAnno.5perc.bed")
```
Values:
```{r}
pas_length=nrow(allPAS)/totalLength
intron_length=nrow(intronpas)/intronLength
enrichment=intron_length/pas_length
enrichment
```
This is a 0.35 enrichment.