# GoShifter

**Created**: 1 July 2022

## Environment

In [1]:
library(tidyverse)
library(data.table)
library(ComplexHeatmap)
library(circlize)

setwd("~/eQTL_pQTL_Characterization/")

source("03_Functional_Interpretation/scripts/utils/ggplot_theme.R")

── Attaching packages ──────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.8
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.1     ✔ forcats 0.5.1

── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


Attaching package: ‘data.table’


The following objects are masked from ‘package:dplyr’

## Load Data

In [141]:
meta <- read.csv("03_Functional_Interpretation/metadata/reads_atac_seq.txt")

In [2]:
files.dir <- "/nfs/users/nfs_n/nm18/gains_team282/epigenetics/enrichment/go_shifter/conditional_snps_ld/"
files <- list.files(files.dir)
files <- files[grepl("overlap_scores", files)]

overlap.scores <- lapply(files, function(file) {
    fread(paste0(files.dir, "/", file)) %>%
        as.data.frame() %>%
        dplyr::mutate(Group=gsub("conditional_snps_ld_", "", gsub("_overlap_scores.tsv", "", file)))
}) %>%
    do.call(rbind, .) %>%
    dplyr::mutate(Overlap_Score=ifelse(Overlap == 1, Overlap_Score, 1))

In [3]:
loci <- read.table("/nfs/users/nfs_n/nm18/gains_team282/epigenetics/enrichment/go_shifter/snp_lists/conditional_snps_ld.txt", header=T)

In [4]:
c.cis.eqtl <- read.table("/nfs/users/nfs_n/nm18/gains_team282/eqtl/cisresults/conditionalanalysis/conditional_eQTL_results_final.txt")

In [22]:
gene.info <- read.table("/nfs/team282/data/gains_team282/gene_info_864_20412_hla.txt") %>%
    dplyr::select(gene_id, gene_name)

## Identify Specificity of Peaks

The "overlap score" is the probability that an observed overlap for a locus would occur by chance. GoShifter calculates it empirically, based on how many permutations also generate the overlap. Thus, a lower overlap score suggests that the overlap observed with the annotation is less likely to be a chance occurrence.
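To make the idea of an empirical score concrete, here is a minimal sketch with made-up data. It illustrates the general notion of an empirical proportion only and is not GoShifter's own code; the vector `perm.overlap` and its values are hypothetical.

```r
# Hypothetical toy data: for one locus, whether each of 10 permuted
# annotation sets still overlaps the locus (TRUE) or not (FALSE).
perm.overlap <- c(TRUE, FALSE, FALSE, TRUE, FALSE,
                  FALSE, FALSE, FALSE, FALSE, FALSE)

# Empirical score: the fraction of permutations that reproduce the overlap.
# A small value means the overlap is rarely produced by chance.
overlap.score <- mean(perm.overlap)
overlap.score
```

With only a handful of permutations the score is coarse; its resolution is limited by the number of permutations.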

I take the complementary score (one minus the overlap score), so that more interesting loci have higher values. I then use the specificity method implemented in CHEERS (Euclidean normalisation) to identify peaks that are uniquely important to one annotation.
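The two steps described above (complement, then Euclidean normalisation of each row) can be sketched on a toy matrix. This is an illustration under assumptions, not the CHEERS implementation: the matrix `toy.scores`, its locus and annotation names, and its values are all hypothetical.

```r
# Hypothetical overlap scores: rows are loci, columns are annotations.
toy.scores <- matrix(
    c(0.1, 1.0, 1.0,
      0.4, 0.4, 0.4,
      1.0, 1.0, 1.0),
    nrow=3, byrow=TRUE,
    dimnames=list(
        c("locus_A", "locus_B", "locus_C"),
        c("annot_1", "annot_2", "annot_3")
    )
)

# Complement, so that higher values mean a less chance-like overlap.
toy.comp <- 1 - toy.scores

# Drop loci with no signal in any annotation (their norm would be zero).
toy.comp <- toy.comp[rowSums(toy.comp) != 0, , drop=FALSE]

# Divide each row by its Euclidean norm. The norm vector has one entry per
# row, so R's column-wise recycling divides row i by row.norms[i].
row.norms <- sqrt(rowSums(toy.comp^2))
toy.spec <- toy.comp / row.norms
toy.spec
```

In this toy case `locus_A` has all of its signal in one annotation and gets a value of 1 there, while `locus_B` is spread evenly across annotations and gets the same lower value in each column.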

In [5]:
score.mtx <- overlap.scores %>%
    dplyr::select(Locus, Group, Overlap_Score) %>%
    tidyr::spread(Group, Overlap_Score)

rownames(score.mtx) <- score.mtx$Locus
score.mtx$Locus <- NULL
score.mtx <- 1 - as.matrix(score.mtx)

score.mtx <- score.mtx[rowSums(score.mtx) != 0, ]

In [192]:
options(repr.plot.width=18, repr.plot.height=24)

col_fun = colorRamp2(c(0, 1), c("white", "royalblue4"))

ht = Heatmap(
    score.mtx, name="Score", 
    use_raster=TRUE, col=col_fun,
    show_row_dend=FALSE, show_column_dend=FALSE, show_row_names=FALSE,
    # Size the column-name area from the matrix being plotted
    column_names_max_height = max_text_width(
        colnames(score.mtx), 
        gp = gpar(fontsize = 12)
    ),
    column_names_rot = 45, column_title=NULL
)

pdf("04_Expression/results/goshifter_score_matrix.pdf", width=18, height=24)
draw(ht, padding = unit(c(4, 30, 2, 2), "mm"))
dev.off()

Take the Euclidean norm of each row.