## Eigengene SVA QTL Info

**Created**: 25 May 2022

## Environment

In [39]:
library(tidyverse)
library(data.table)

setwd("~/eQTL_pQTL_Characterization/")

source("04_Expression/scripts/utils/ggplot_theme.R")

## Load Data

In [17]:
mqtl <- read.csv("~/gains_team282/nikhil/expression/eigengene_sva/all_mqtl.csv")
mqtl.pcs <- read.csv("~/gains_team282/nikhil/expression/eigengene_sva/all_mqtl_all_pcs.csv")

mqtl.sum <- read.table("~/gains_team282/nikhil/expression/eigengene_sva/mqtl_full_summary_statistics_snps.txt", header=TRUE)
mqtl.pcs.sum <- read.table("~/gains_team282/nikhil/expression/eigengene_sva/mqtl_all_pcs_full_summary_statistics_snps.txt", header=TRUE)

map.snps <- read.csv("~/gains_team282/nikhil/expression/eigengene_sva/mqtl_snp_table.csv")

In [7]:
nrow(mqtl)
nrow(mqtl.pcs)

In [9]:
head(map.snps)

| | snps | source | egene | conditional_number | accession |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` |
| 1 | rs3131972 | Lead cis-eQTL SNP | ENSG00000237491 | | |
| 2 | rs3131972 | Lead cis-eQTL SNP | ENSG00000230092 | | |
| 3 | rs3131972 | Lead cis-eQTL SNP | ENSG00000225880 | | |
| 4 | rs2272757 | Lead cis-eQTL SNP | ENSG00000188976 | | |
| 5 | rs13303327 | Lead cis-eQTL SNP | ENSG00000187961 | | |
| 6 | rs13303056 | Lead cis-eQTL SNP | ENSG00000187583 | | |


In [36]:
ebi.studies <- fread("04_Expression/data/gwas_catalog_v1.0.2-studies_r2022-02-21.tsv", header=TRUE, quote="") %>%
    as.data.frame()

In [37]:
head(ebi.studies, n=1)

Unnamed: 0_level_0,DATE ADDED TO CATALOG,PUBMEDID,FIRST AUTHOR,DATE,JOURNAL,LINK,STUDY,DISEASE/TRAIT,INITIAL SAMPLE SIZE,REPLICATION SAMPLE SIZE,PLATFORM [SNPS PASSING QC],ASSOCIATION COUNT,MAPPED_TRAIT,MAPPED_TRAIT_URI,STUDY ACCESSION,GENOTYPING TECHNOLOGY
Unnamed: 0_level_1,<date>,<int>,<chr>,<date>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,<chr>,<chr>,<chr>
1,2021-08-24,34124712,Sherva R,2021-02-28,Explor Med,www.ncbi.nlm.nih.gov/pubmed/34124712,Genome-wide association study of phenotypes measuring progression from first cocaine or opioid use to dependence reveals novel risk genes.,Cocaine dependence (time to event),"3,554 African American cases, 478 African American controls, 2,712 European ancestry cases, 915 European ancestry controls","572 African American cases, 416 African American controls, 759 European ancestry cases, 1,620 European ancestry controls",Illumina [NR] (imputed),5,cocaine dependence,http://www.ebi.ac.uk/efo/EFO_0002610,GCST012225,Genome-wide genotyping array


In [43]:
modules <- read.csv("~/gains_team282/nikhil/expression/gene_expression/modules.csv") %>%
    dplyr::mutate(Eigengene=gsub("Module_", "ME_", Module))

In [44]:
head(modules)

| | Gene | Module | Eigengene |
|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` |
| 1 | ENSG00000001167 | Module_1 | ME_1 |
| 2 | ENSG00000002330 | Module_1 | ME_1 |
| 3 | ENSG00000002822 | Module_1 | ME_1 |
| 4 | ENSG00000005175 | Module_1 | ME_1 |
| 5 | ENSG00000005194 | Module_1 | ME_1 |
| 6 | ENSG00000005893 | Module_1 | ME_1 |


# Module QTL

First, I ask how many SNPs were significantly associated with at least one module eigengene. The answer is 876 SNPs.

In [45]:
length(unique(mqtl$snp))

Next, how many modules had an association with at least one SNP? 31 modules had an association with at least one SNP.

In [46]:
length(unique(mqtl$me))

A module might have more than one association, but that is not the case in this analysis: there are 31 loci, one for each of the 31 modules.

In [47]:
length(unique(mqtl.sum$QTL.ID))

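What proportion of the module QTL SNPs are lead cis-eQTL SNPs, conditional cis-eQTL SNPs, or trait-associated variants from the EBI GWAS Catalog? A SNP can come from more than one source, so the proportions can sum to more than 1.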

In [48]:
merge(mqtl, map.snps, by.x="snp", by.y="snps") %>%
    dplyr::select(snp, source) %>%
    unique() %>%
    dplyr::group_by(source) %>%
    dplyr::summarize(N=n()) %>%
    dplyr::mutate(Prop=N / length(unique(mqtl$snp)))

| source | N | Prop |
|---|---|---|
| `<chr>` | `<int>` | `<dbl>` |
| Conditional cis-eQTL SNP | 236 | 0.2694064 |
| EBI GWAS Catalog | 657 | 0.75 |
| Lead cis-eQTL SNP | 139 | 0.1586758 |
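
The three counts above add up to 1032, which is more than the 876 unique SNPs, so some SNPs must be annotated with more than one source. A minimal sketch of how that overlap could be checked is below. It uses a toy table, not the real `map.snps`; the rsIDs and the name `toy.map` are made up for illustration.

```r
# Toy stand-in for the SNP annotation table (hypothetical rsIDs).
toy.map <- data.frame(
    snps   = c("rs1", "rs1", "rs2", "rs3", "rs3", "rs4"),
    source = c("Lead cis-eQTL SNP", "EBI GWAS Catalog", "EBI GWAS Catalog",
               "Conditional cis-eQTL SNP", "EBI GWAS Catalog", "Lead cis-eQTL SNP")
)

# One row per (SNP, source) pair, mirroring select() + unique() above.
toy.pairs <- unique(toy.map)
n.snps <- length(unique(toy.pairs$snps))

# Proportion of unique SNPs found in each source.
prop <- table(toy.pairs$source) / n.snps
print(prop)

# Number of sources per SNP; values above 1 are the overlapping SNPs.
sources.per.snp <- table(toy.pairs$snps)
print(sources.per.snp)

# The proportions sum to more than 1 exactly when some SNP has several sources.
print(sum(prop))
```

On the real data, the same idea would apply to the merged `mqtl` and `map.snps` table after selecting `snp` and `source`.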


How many of the modules are associated with a cis-eQTL and also contain the corresponding eGene?

In [55]:
merge(mqtl, map.snps, by.x="snp", by.y="snps") %>%
    merge(., modules, by.x="me", by.y="Eigengene") %>%
    dplyr::filter(egene == Gene) %>%
    dplyr::select(me) %>%
    unique() %>%
    nrow()
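
To make the logic of this double merge easier to follow, here is a small base R sketch on toy data. All table names, SNP IDs, and gene IDs are hypothetical. The first merge attaches each SNP's eGene, the second attaches every gene in the associated module, and the comparison keeps rows where the eGene is itself a member of that module. I use `which()` so that SNPs without an eGene (`NA`, as I would expect for GWAS-only SNPs) are dropped rather than kept as `NA` rows.

```r
# Toy module QTL: SNP-to-eigengene associations (hypothetical IDs).
toy.mqtl <- data.frame(
    snp = c("rs1", "rs2", "rs3", "rs4"),
    me  = c("ME_1", "ME_1", "ME_2", "ME_2")
)

# Toy SNP annotation: rs4 has no eGene.
toy.map <- data.frame(
    snps  = c("rs1", "rs2", "rs3", "rs4"),
    egene = c("G1", "G9", "G7", NA)
)

# Toy module membership.
toy.modules <- data.frame(
    Gene      = c("G1", "G2", "G3", "G4"),
    Eigengene = c("ME_1", "ME_1", "ME_2", "ME_2")
)

# Attach eGenes to associations, then expand to all genes in each module.
joined <- merge(toy.mqtl, toy.map, by.x = "snp", by.y = "snps")
joined <- merge(joined, toy.modules, by.x = "me", by.y = "Eigengene")

# Keep rows where the SNP's eGene is a member of the associated module.
hits <- joined[which(joined$egene == joined$Gene), ]

# Number of modules containing the eGene of at least one associated SNP.
n.modules <- length(unique(hits$me))
print(n.modules)
```

In this toy example only `ME_1` qualifies, because `G1` is both the eGene of `rs1` and a member of `ME_1`.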

# Module QTL from All PCs

First, I ask how many SNPs were significantly associated with at least one module eigengene. The answer is 1935 SNPs.

In [21]:
length(unique(mqtl.pcs$snp))

Next, how many modules had an association with at least one SNP? 48 modules had an association with at least one SNP.

In [22]:
length(unique(mqtl.pcs$me))

A module might have more than one association. That is the case in this analysis: there are 76 loci across the 48 modules, so at least some modules have more than one locus.

In [23]:
length(unique(mqtl.pcs.sum$QTL.ID))
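
With more loci than modules, a natural follow-up is which modules carry several loci. The sketch below shows one way to tabulate that on toy data. It assumes the summary table has a module column, which I call `me` here; only `QTL.ID` is confirmed by the code above, and the locus IDs are invented.

```r
# Toy stand-in for a summary-statistics table: one row per tested SNP.
# The `me` column and the QTL.ID values are assumptions for illustration.
toy.sum <- data.frame(
    me     = c("ME_1", "ME_1", "ME_1", "ME_2", "ME_3", "ME_3"),
    QTL.ID = c("ME_1-1", "ME_1-1", "ME_1-2", "ME_2-1", "ME_3-1", "ME_3-2")
)

# Keep one row per (module, locus) pair, then count loci per module.
toy.pairs <- unique(toy.sum[, c("me", "QTL.ID")])
loci.per.module <- table(toy.pairs$me)
print(loci.per.module)

# Modules with more than one locus.
multi.locus <- names(loci.per.module)[as.vector(loci.per.module > 1)]
print(multi.locus)
```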

What proportion of the module QTL SNPs are lead cis-eQTL SNPs, conditional cis-eQTL SNPs, or trait-associated variants from the EBI GWAS Catalog?

In [33]:
merge(mqtl.pcs, map.snps, by.x="snp", by.y="snps") %>%
    dplyr::select(snp, source) %>%
    unique() %>%
    dplyr::group_by(source) %>%
    dplyr::summarize(N=n()) %>%
    dplyr::mutate(Prop=N / length(unique(mqtl.pcs$snp)))

| source | N | Prop |
|---|---|---|
| `<chr>` | `<int>` | `<dbl>` |
| Conditional cis-eQTL SNP | 486 | 0.2511628 |
| EBI GWAS Catalog | 1479 | 0.7643411 |
| Lead cis-eQTL SNP | 292 | 0.1509044 |


How many of the modules are associated with a cis-eQTL and also contain the corresponding eGene?

In [56]:
merge(mqtl.pcs, map.snps, by.x="snp", by.y="snps") %>%
    merge(., modules, by.x="me", by.y="Eigengene") %>%
    dplyr::filter(egene == Gene) %>%
    dplyr::select(me) %>%
    unique() %>%
    nrow()

In [116]:
merge(mqtl.pcs, map.snps, by.x="snp", by.y="snps") %>%
    dplyr::filter(source == "EBI GWAS Catalog") %>%
    dplyr::select(snp, me, accession) %>%
    merge(., ebi.studies, by.x="accession", by.y="STUDY ACCESSION") %>%
    dplyr::group_by(MAPPED_TRAIT, me) %>%
    dplyr::summarize(N=n(), .groups="drop") %>%
    tidyr::spread(key="me", value="N", fill=0) -> trait.mtx

colSums(trait.mtx[,-1])
ncol(trait.mtx[,-1])
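
The cell above builds a trait-by-module count matrix: each cell is the number of merged rows linking a mapped trait to a module eigengene. As far as I can tell from the merge, a row is one SNP, module, and study accession combination, so a SNP reported by several studies of the same trait is counted more than once. The toy sketch below reproduces the same shape with base R `xtabs()` instead of the `group_by()`, `summarize()`, and `spread()` chain. The data are made up.

```r
# Toy stand-in for the merged table: one row per SNP-module-study hit.
toy.hits <- data.frame(
    MAPPED_TRAIT = c("body height", "body height",
                     "body mass index", "body mass index", "body mass index"),
    me           = c("ME_1", "ME_2", "ME_1", "ME_1", "ME_1")
)

# Cross-tabulate traits (rows) against modules (columns).
# Trait-module pairs with no hits are filled with 0, like fill=0 in spread().
trait.counts <- xtabs(~ MAPPED_TRAIT + me, data = toy.hits)
print(trait.counts)

# Total hits per module, analogous to colSums(trait.mtx[,-1]).
hits.per.module <- colSums(trait.counts)
print(hits.per.module)
```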

In [114]:
trait.mtx %>%
    dplyr::filter(ME_63 > 0)

MAPPED_TRAIT,ME_100,ME_101,ME_102,ME_103,ME_104,ME_105,ME_106,ME_21,ME_46,⋯,ME_86,ME_87,ME_88,ME_89,ME_91,ME_92,ME_94,ME_97,ME_98,ME_99
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
aspartate aminotransferase measurement,0,2,1,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
"aspartate aminotransferase measurement, serum alanine aminotransferase measurement, low density lipoprotein triglyceride measurement, body fat percentage, high density lipoprotein cholesterol measurement, sex hormone-binding globulin measurement",0,1,0,0,0,0,0,0,0,⋯,0,0,0,0,1,0,0,0,0,0
body fat percentage,0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,2,0,1,0,0,0
body height,0,0,1,0,1,0,0,0,0,⋯,2,0,0,0,1,0,0,1,0,0
body mass index,0,4,0,1,0,0,0,0,0,⋯,2,0,0,0,30,0,2,0,0,0
chronotype measurement,0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
"comparative body size at age 10, self-reported",0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,1,1,0,0,0,0
high density lipoprotein cholesterol measurement,0,0,0,0,0,0,0,0,0,⋯,1,0,1,0,0,0,0,0,0,0
intelligence,0,2,0,0,0,0,0,0,0,⋯,0,0,0,0,13,5,13,1,0,0
metabolic syndrome,0,0,0,0,0,0,0,0,0,⋯,0,0,0,0,0,0,0,0,0,0
