# Comparison between Amish and GTEx cis-eQTL statistics

This notebook summarizes the comparison between Amish and GTEx cis-eQTL stats.

All the relevant code for this pipeline can be viewed at:

<code>https://github.com/bee-hive/RNAseq_pipeline/tree/master/Scripts/eqtls/amish_pipeline/</code>

## 1. Pre-processing steps

For each gene, the eQTL stats for all candidate cis-SNPs (all SNPs on the same chromosome as the gene) are written to:

<code>/tigress/BEE/amish/analyses/ciseqtl/gemma_output_per_gene/</code>

The following script applies a cis-window (150kb for now) and writes all associations within that window, one file per chromosome, to:

<code>/tigress/BEE/amish/analyses/ciseqtl/genomewide/cis_eqtls_150kb_chrN.txt</code>

<code>https://github.com/bee-hive/RNAseq_pipeline/tree/master/Scripts/eqtls/amish_pipeline/write_amish_ciseqtls.py</code>

In [None]:
%%bash

python /tigress/BEE/RNAseq/Scripts/eqtls/amish_pipeline/write_amish_ciseqtls.py

head -10 /tigress/BEE/amish/analyses/ciseqtl/genomewide/cis_eqtls_150kb_chr1.txt

<code>
gene_id	chr	rs	ps	n_miss	allele1	allele0	af	beta	se	l_remle	l_mle	p_wald	p_lrt	p_score
ENSG00000143751.9	1	1_226020555_T_C_b37	226020555	0	T	C	0.051	2.335962e-01	3.284770e-01	2.354206e-01	1.362031e-01	4.787525e-01	4.434343e-01	4.617409e-01
ENSG00000143751.9	1	1_226020754_T_C_b37	226020754	0	T	C	0.020	-1.698723e-01	5.515366e-01	2.803462e-01	1.633995e-01	7.587651e-01	7.036687e-01	7.196894e-01
ENSG00000143751.9	1	1_226020963_G_T_b37	226020963	0	G	T	0.015	3.097863e-01	6.238420e-01	3.241143e-01	2.081204e-01	6.206458e-01	6.030988e-01	6.048826e-01
ENSG00000143751.9	1	1_226020988_A_G_b37	226020988	0	A	G	0.308	1.705230e-01	1.612769e-01	2.697861e-01	1.799973e-01	2.930703e-01	2.728733e-01	2.788139e-01
ENSG00000143751.9	1	1_226021159_A_G_b37	226021159	0	A	G	0.076	4.753731e-01	2.954591e-01	1.955902e-01	1.033283e-01	1.109851e-01	9.796505e-02	1.084263e-01
ENSG00000143751.9	1	1_226021338_C_T_b37	226021338	0	C	T	0.051	-2.696761e-01	4.358970e-01	3.421205e-01	2.403019e-01	5.376305e-01	5.395539e-01	5.429766e-01
ENSG00000143751.9	1	1_226021581_G_A_b37	226021581	0	G	A	0.066	3.173607e-01	3.192957e-01	3.663678e-01	2.607994e-01	3.228038e-01	3.216562e-01	3.278849e-01
ENSG00000143751.9	1	1_226021627_A_G_b37	226021627	0	A	G	0.015	3.097863e-01	6.238420e-01	3.241143e-01	2.081204e-01	6.206458e-01	6.030988e-01	6.048826e-01
ENSG00000143751.9	1	1_226021835_T_C_b37	226021835	0	T	C	0.141	-1.397015e-03	2.203955e-01	3.221105e-01	2.159763e-01	9.949559e-01	9.824597e-01	9.826598e-01
</code>
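
The cis-window step described above can be illustrated with a minimal Python sketch. It is not the code in `write_amish_ciseqtls.py`: it assumes the window is measured from a single gene position (a TSS), and the `gene_tss` lookup, the `filter_cis` helper, and the toy rows are all hypothetical. Only the column names (`gene_id`, `rs`, `ps`, `p_wald`) come from the output shown above.

```python
import csv
import io

CIS_WINDOW = 150_000  # 150 kb, the window size quoted above


def filter_cis(rows, gene_tss, window=CIS_WINDOW):
    """Keep rows whose SNP position `ps` is within `window` bp of the gene's TSS.

    `gene_tss` (gene_id -> TSS position) is a hypothetical lookup for this sketch.
    """
    kept = []
    for row in rows:
        tss = gene_tss.get(row["gene_id"])
        if tss is not None and abs(int(row["ps"]) - tss) <= window:
            kept.append(row)
    return kept


# Toy per-gene table with the same column names as the output above
# (all values are made up for illustration).
toy_table = (
    "gene_id\tchr\trs\tps\tp_wald\n"
    "GENE_A\t1\tsnp_near\t1100000\t0.04\n"
    "GENE_A\t1\tsnp_edge\t1150000\t0.50\n"
    "GENE_A\t1\tsnp_far\t1150001\t0.90\n"
)
toy_tss = {"GENE_A": 1_000_000}  # hypothetical TSS position

rows = list(csv.DictReader(io.StringIO(toy_table), delimiter="\t"))
cis_rows = filter_cis(rows, toy_tss)
print([r["rs"] for r in cis_rows])
```

If the real script measures the window from the gene start and end rather than from one position, the distance test would change accordingly.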

For this pipeline, we also write out the corresponding genotype files. These combine all variants in the Amish comparison set that are available in the 1% and 5% GTEx v6p genotype VCF files, and are written to:

<code>/tigress/BEE/RNAseq/Data/Genotype/gtex/amish_comparison/chrN.txt</code>

<code>https://github.com/bee-hive/RNAseq_pipeline/tree/master/Scripts/eqtls/amish_pipeline/write_genotypes_amish_comparison_wrapper.py</code>

<code>https://github.com/bee-hive/RNAseq_pipeline/tree/master/Scripts/eqtls/amish_pipeline/write_genotypes_amish_comparison.py</code>
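
As a rough illustration of the combining step, the sketch below merges two genotype tables and keeps only variants that appear in the Amish comparison set. It is a guess at the logic, not a copy of `write_genotypes_amish_comparison.py`: the `combine_genotypes` helper, the dictionary layout, the toy variant IDs, and the rule of preferring the 5% file when a variant is in both are all assumptions.

```python
def combine_genotypes(amish_variants, geno_5pct, geno_1pct):
    """Merge two {variant_id: dosages} tables, restricted to `amish_variants`.

    When a variant is present in both tables, the entry from `geno_5pct`
    is kept (an arbitrary choice made for this sketch).
    """
    combined = {}
    for source in (geno_5pct, geno_1pct):
        for variant_id, dosages in source.items():
            if variant_id in amish_variants and variant_id not in combined:
                combined[variant_id] = dosages
    return combined


# Toy inputs; the IDs only mimic the chr_pos_ref_alt_build style of the `rs` column.
amish_variants = {"1_100_A_G_b37", "1_200_C_T_b37", "1_300_G_A_b37"}
geno_5pct = {"1_100_A_G_b37": [0, 1, 2], "1_400_T_C_b37": [1, 1, 0]}
geno_1pct = {"1_100_A_G_b37": [0, 1, 2], "1_200_C_T_b37": [0, 0, 1]}

combined = combine_genotypes(amish_variants, geno_5pct, geno_1pct)
print(sorted(combined))
```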

In [None]:
%%bash

python /tigress/BEE/RNAseq/Scripts/eqtls/amish_pipeline/write_genotypes_amish_comparison_wrapper.py

## 2. cis-eQTL mapping

Now that the eQTL comparison set and genotype files are ready, we can map the GTEx cis-eQTLs for v6p to compare them to the Amish stats. Specifically, we use the LCL samples from GTEx:

<code>https://github.com/bee-hive/RNAseq_pipeline/tree/master/Scripts/eqtls/amish_pipeline/cis_eqtls_amish_comparison_wrapper.py</code>

<code>https://github.com/bee-hive/RNAseq_pipeline/tree/master/Scripts/eqtls/amish_pipeline/cis_eqtls_amish_comparison.R</code>

The output, which consists of combined data frames for GTEx and Amish cis-eQTLs, is saved in the directory:

<code>/tigress/BEE/RNAseq/Output/cis-mapping/amish/cellsebvtransformedlymphocytes/</code>

In [None]:
%%bash

python /tigress/BEE/RNAseq/Scripts/eqtls/amish_pipeline/cis_eqtls_amish_comparison_wrapper.py cellsebvtransformedlymphocytes

wc -l /tigress/BEE/RNAseq/Output/cis-mapping/amish/cellsebvtransformedlymphocytes/*

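The line counts below give the number of SNP-gene pairs available in both the GTEx v6p LCL and Amish data for each chromosome (each count includes the file's header line):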

<code>
    74250 gtex_amish_comparison_ciseqtls_chr_10.txt
    97983 gtex_amish_comparison_ciseqtls_chr_11.txt
    98324 gtex_amish_comparison_ciseqtls_chr_12.txt
    40555 gtex_amish_comparison_ciseqtls_chr_13.txt
    68069 gtex_amish_comparison_ciseqtls_chr_14.txt
    63426 gtex_amish_comparison_ciseqtls_chr_15.txt
    80651 gtex_amish_comparison_ciseqtls_chr_16.txt
   112472 gtex_amish_comparison_ciseqtls_chr_17.txt
    33509 gtex_amish_comparison_ciseqtls_chr_18.txt
   131912 gtex_amish_comparison_ciseqtls_chr_19.txt
   172097 gtex_amish_comparison_ciseqtls_chr_1.txt
    50282 gtex_amish_comparison_ciseqtls_chr_20.txt
    24100 gtex_amish_comparison_ciseqtls_chr_21.txt
    51655 gtex_amish_comparison_ciseqtls_chr_22.txt
   124820 gtex_amish_comparison_ciseqtls_chr_2.txt
   112543 gtex_amish_comparison_ciseqtls_chr_3.txt
    72401 gtex_amish_comparison_ciseqtls_chr_4.txt
    88928 gtex_amish_comparison_ciseqtls_chr_5.txt
   116660 gtex_amish_comparison_ciseqtls_chr_6.txt
    79034 gtex_amish_comparison_ciseqtls_chr_7.txt
    69959 gtex_amish_comparison_ciseqtls_chr_8.txt
    71893 gtex_amish_comparison_ciseqtls_chr_9.txt
  1835523 total
</code>
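
The counts above come from pairs present in both datasets. A minimal sketch of such a join, keyed on the (SNP, gene) pair, is shown below. The actual join is done in `cis_eqtls_amish_comparison.R` and may differ. The `shared_pairs` helper and the toy rows are hypothetical, and the sketch assumes the SNP and gene identifiers are formatted identically in both tables. The column names follow the two file formats shown in this notebook.

```python
def shared_pairs(gtex_rows, amish_rows):
    """Return GTEx rows annotated with Amish stats for pairs found in both sets."""
    amish_by_pair = {(r["rs"], r["gene_id"]): r for r in amish_rows}
    merged = []
    for g in gtex_rows:
        a = amish_by_pair.get((g["snps"], g["gene"]))
        if a is None:
            continue  # pair was not tested in the Amish data
        merged.append({
            **g,
            "amish_af": a["af"],
            "amish_beta": a["beta"],
            "amish_se": a["se"],
            "p_wald": a["p_wald"],
        })
    return merged


# Toy rows (values are made up for illustration).
gtex_rows = [
    {"snps": "snp1", "gene": "GENE_A", "beta": 0.20, "pvalue": 0.010},
    {"snps": "snp2", "gene": "GENE_A", "beta": -0.05, "pvalue": 0.600},
]
amish_rows = [
    {"rs": "snp1", "gene_id": "GENE_A", "af": 0.20, "beta": 0.30, "se": 0.10, "p_wald": 0.004},
    {"rs": "snp3", "gene_id": "GENE_A", "af": 0.05, "beta": 0.10, "se": 0.40, "p_wald": 0.800},
]

merged = shared_pairs(gtex_rows, amish_rows)
print(len(merged), merged[0]["snps"], merged[0]["amish_beta"])
```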

## 3. cis-eQTL stats comparison

The files from step 2 have the following columns:

<code>
 [1] "snps"               "gene"               "statistic"         
 [4] "pvalue"             "FDR"                "beta"              
 [7] "snp_chr"            "snp_pos"            "gene_chr"          
[10] "gene_start"         "gene_end"           "MAF"               
[13] "statistic_permuted" "pvalue_permuted"    "beta_permuted"     
[16] "amish_af"           "amish_allele1"      "amish_beta"        
[19] "amish_se"           "l_remle"            "l_mle"             
[22] "p_wald"             "p_lrt"
</code>

We're going to generate a few plots comparing the results from the two different datasets (and two different analysis methods). This is the (growing) list of comparisons/figures made:

1. QQ plots: Amish vs. GTEx v6p, Amish vs. GTEx v6p permuted, GTEx v6p vs. GTEx v6p permuted
2. Extreme value QQ plots: Amish vs. GTEx v6p, Amish vs. GTEx v6p permuted, GTEx v6p vs. GTEx v6p permuted
3. Effect size comparisons: Amish vs. GTEx v6p for all pairs, and for significant pairs only
4. Enrichment analysis using Fisher's exact or hypergeometric test
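
Two of the items in this list can be sketched in a few lines of Python. The sketch uses only the standard library and is not taken from the pipeline. `qq_points` pairs the sorted -log10 p-values of two equally sized sets, which is one common way to draw a two-sample QQ plot. `fisher_exact_greater` computes a one-sided Fisher's exact p-value as a hypergeometric tail. Both function names and the choice of a one-sided test are assumptions.

```python
import math


def qq_points(pvals_x, pvals_y):
    """Pair sorted -log10 p-values from two equally sized sets (quantile vs. quantile)."""
    if len(pvals_x) != len(pvals_y):
        raise ValueError("both p-value sets must have the same length")
    # p-values must be > 0, otherwise log10 is undefined
    qx = sorted(-math.log10(p) for p in pvals_x)
    qy = sorted(-math.log10(p) for p in pvals_y)
    return list(zip(qx, qy))


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table's margins fixed.
    For example: a = pairs significant in both datasets, b = significant in
    GTEx only, c = significant in Amish only, d = significant in neither.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)
    tail = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total


# Toy inputs (made up for illustration).
points = qq_points([0.1, 0.001], [0.01, 1.0])
p_enrich = fisher_exact_greater(3, 1, 1, 3)
print(points, p_enrich)
```

A two-sided test, or a hypergeometric test with a different background set, would need a different tail sum.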

In [None]:
## Rscript:

library(ggplot2)
save_dir = '/tigress/BEE/RNAseq/Output/cis-mapping/amish/cellsebvtransformedlymphocytes/'

# Read the per-chromosome comparison tables (chr 1-22) and stack them into one data frame
comp_table_all = do.call(rbind, lapply(1:22, function(i) {
	print(i)
	read.table(paste0(save_dir, 'gtex_amish_comparison_ciseqtls_chr_', i, '.txt'), stringsAsFactors=FALSE, sep='\t', header=TRUE)
}))

# For Amish data, the categories are rare (af <= 5%) and common (the rest)
comp_table_all$amish_af_cat = 'amish_common'
comp_table_all$amish_af_cat[comp_table_all$amish_af <= 0.05] = 'amish_rare'

# For GTEx data, the categories are very rare (MAF <= 1%), rare (1% < MAF <= 5%), and common (the rest)
comp_table_all$gtex_af_cat = 'gtex_common'
comp_table_all$gtex_af_cat[comp_table_all$MAF <= 0.05] = 'gtex_rare'
comp_table_all$gtex_af_cat[comp_table_all$MAF <= 0.01] = 'gtex_very_rare'

# -log10 p-values, GTEx vs. Amish (Wald test), faceted by allele frequency category
g = ggplot(comp_table_all, aes(-log10(pvalue), -log10(p_wald))) + geom_point() + facet_grid(gtex_af_cat ~ amish_af_cat) + xlab('GTEx') + ylab('amish')
ggsave(g, filename = '/tigress/BEE/RNAseq/Analysis/Figures/amish_comparison/log_pval_comparison.png', width=8, height=8)
# Absolute test statistics, GTEx vs. Amish (beta/se); the file name says 'beta' but the plot shows test statistics
g = ggplot(comp_table_all, aes(abs(statistic), abs(amish_beta/amish_se))) + geom_point() + facet_grid(gtex_af_cat ~ amish_af_cat) + xlab('GTEx') + ylab('amish')
ggsave(g, filename = '/tigress/BEE/RNAseq/Analysis/Figures/amish_comparison/abs_beta_comparison.png', width=8, height=8)

# Keep only pairs that are not common in at least one dataset
# (renamed from 'subset' to avoid masking base R's subset())
keep = comp_table_all$amish_af_cat != 'amish_common' | comp_table_all$gtex_af_cat != 'gtex_common'
comp_table_rare = comp_table_all[keep,]

g = ggplot(comp_table_rare, aes(-log10(pvalue), -log10(p_wald))) + geom_point() + facet_grid(gtex_af_cat ~ amish_af_cat) + xlab('GTEx') + ylab('amish')
ggsave(g, filename = '/tigress/BEE/RNAseq/Analysis/Figures/amish_comparison/log_pval_comparison_alt.png', width=8, height=8)
g = ggplot(comp_table_rare, aes(abs(statistic), abs(amish_beta/amish_se))) + geom_point() + facet_grid(gtex_af_cat ~ amish_af_cat) + xlab('GTEx') + ylab('amish')
ggsave(g, filename = '/tigress/BEE/RNAseq/Analysis/Figures/amish_comparison/abs_beta_comparison_alt.png', width=8, height=8)
