-
Notifications
You must be signed in to change notification settings - Fork 2
/
newQTLheatmap.Rmd
140 lines (95 loc) · 3.94 KB
/
newQTLheatmap.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: "New QTL heatmap"
author: "Briana Mittleman"
date: "4/29/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(gdata)
library(workflowr)
library(gplots)
library(tidyverse)
```
##Compare QTLs to those found with previous batch data
I have about double the QTLs hear compared to before resequencing batch 4. I will look at the new QTL to see if there is evidence for them being false positives. I am going to see if there is structure in the genotypes for these QTLs.
The old QTLs are from the threeprimeseq repository.
###Total
Import old QTLs
```{r}
oldtot=read.table("../../threeprimeseq/data/perm_APAqtl_GeneLocAnno_noMP_5percUs/filtered_APApeaks_merged_allchrom_refseqGenes.GeneLocAnno.NoMP_sm_quant.Total.fixed.pheno_5perc_permResBH.txt", header=T,stringsAsFactors = F) %>% separate(pid, into=c("Chr", "Start", "End", "PeakID"), sep=":") %>% separate(PeakID, into=c("Gene", "Strand","Peak"), sep="_")
OldTotQTLs= oldtot %>% filter(-log10(bh)>=1)
nrow(OldTotQTLs)
```
Import new QTLs:
```{r}
newTotQTLs=read.table("../data/apaQTLs/Total_apaQTLs_5fdr.txt", stringsAsFactors = F, header = T)
nrow(newTotQTLs)
```
Filter out those matching from the old:
```{r}
UniqueNewTot=newTotQTLs %>% semi_join(OldTotQTLs, by="sid")
```
There are only 105 new snps This makes sense because 1 sno associates with multiple peaks.
Write these out to fetch the genotypes:
```{r}
write.table(UniqueNewTot, file="../data/apaQTLs/Total_apaQTLs_5fdr_NewUnique.txt", quote = F, col.names = F, row.names = F)
```
###Nuclear
```{r}
oldnuc=read.table("../../threeprimeseq/data/perm_APAqtl_GeneLocAnno_noMP_5percUs/filtered_APApeaks_merged_allchrom_refseqGenes.GeneLocAnno.NoMP_sm_quant.Nuclear.fixed.pheno_5perc_permResBH.txt", header=T,stringsAsFactors = F) %>% separate(pid, into=c("Chr", "Start", "End", "PeakID"), sep=":") %>% separate(PeakID, into=c("Gene", "Strand","Peak"), sep="_")
OldNucQTLs= oldnuc %>% filter(-log10(bh)>=1)
nrow(OldNucQTLs)
```
Import new QTLs:
```{r}
newNucQTLs=read.table("../data/apaQTLs/Nuclear_apaQTLs_5fdr.txt", stringsAsFactors = F, header = T)
nrow(newNucQTLs)
```
Filter out those matching from the old:
```{r}
UniqueNewNuc=newNucQTLs %>% semi_join(OldNucQTLs, by="sid")
```
There are 200 new snps in this set.
```{r}
write.table(UniqueNewNuc, file="../data/apaQTLs/Nuclear_apaQTLs_5fdr_NewUnique.txt", quote = F, col.names = F, row.names = F)
```
##Extract genotypes:
I wrote a script to pull the doses from the vcf file. Run it with:
```{bash,eval=F}
python extractGenotypes.py ../data/apaQTLs/Nuclear_apaQTLs_5fdr_NewUnique.txt ../data/QTLGenotypes/Genotypes_NuclearapaQTLS_newunique.txt
python extractGenotypes.py ../data/apaQTLs/Total_apaQTLs_5fdr_NewUnique.txt ../data/QTLGenotypes/Genotypes_TotalapaQTLS_newunique.txt
```
I also need the header from the VCF to have the individuals:
```{bash,eval=F}
head -n14 /project2/gilad/briana/YRI_geno_hg19/allChrom.dose.filt.vcf | tail -n1 > ../data/QTLGenotypes/vcfheader.txt
#manually remove # and unneaded columns, keep snp and ind.
```
```{r}
vcfhead=read.table("../data/QTLGenotypes/vcfheader.txt", header = T)
```
input sample list:
```{r}
samples=read.table("../data/phenotype/SAMPLE.txt")
samplist=as.vector(samples$V1)
```
###Total:
```{r}
totgeno=read.table("../data/QTLGenotypes/Genotypes_TotalapaQTLS_newunique.txt", col.names = colnames(vcfhead)) %>% select(samplist) %>% t()
```
Correlation:
```{r}
totgeneCorr=round(cor(totgeno),2)
heatmap.2(as.matrix(totgeneCorr),trace="none", dendrogram =c("none"), main="Genotype correlation\n for new Total QTL snps")
```
###Nuclear
```{r}
nucgeno=read.table("../data/QTLGenotypes/Genotypes_NuclearapaQTLS_newunique.txt", col.names = colnames(vcfhead)) %>% select(samplist) %>% t()
```
Correlation:
```{r}
nucgeneCorr=round(cor(nucgeno),2)
heatmap.2(as.matrix(nucgeneCorr),trace="none", dendrogram =c("none"),main="Genotype correlation \n for new Nuclear QTL snps")
```