# TWAS for xQTL

This module implements transcriptome-wide association analysis (TWAS). The methods are designed to support rigorous causal inference connecting genes to complex traits.

## Overview

The goal of this module is to perform PTWAS analysis from SuSiE objects, including:
* **Step 1: PTWAS (twas_z_format)**
1. load the GWAS z-scores
2. load the corresponding LD matrices within the TAD region of each gene
3. use the allele_qc() function to harmonize the GWAS summary statistics with the QTL variants, then match them to the LD matrix
4. extract the susie, lasso, enet and mr_ash weights
5. use twas_z() to compute TWAS results from the multiple sets of weights
* **Step 2: MR (twas_candidate_gene)**
1. use a p-value cutoff to loosely pick TWAS genes of interest; for each gene that passes the cutoff, save the QC-ed GWAS data in a format compatible with [mr.R](https://github.com/cumc/pecotmr/blob/main/R/mr.R)
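
The gene-level statistic computed in Step 1 can be sketched as follows. This is a minimal pure-Python illustration of the formula used by `twas_z()` in this notebook, $Z = w^\top z / \sqrt{w^\top R w}$, with a two-sided p-value from a chi-square distribution with 1 degree of freedom. The weights, z-scores and LD matrix below are made-up toy values, and the Python function is a stand-in for the R function, not a replacement for it.

```python
import math

def twas_z(weights, z, R):
    """Combine per-variant GWAS z-scores into one gene-level TWAS z-score.

    weights: prediction weights, one per variant
    z:       GWAS z-scores for the same variants, in the same order
    R:       LD (correlation) matrix as a list of rows
    """
    n = len(weights)
    # Numerator: weighted sum of GWAS z-scores (w' z)
    stat = sum(w * zi for w, zi in zip(weights, z))
    # Denominator: variance of the weighted sum under the null (w' R w)
    denom = sum(weights[i] * R[i][j] * weights[j]
                for i in range(n) for j in range(n))
    zscore = stat / math.sqrt(denom)
    # P(chi2_1 > z^2) equals the two-sided normal tail, erfc(|z| / sqrt(2))
    pval = math.erfc(abs(zscore) / math.sqrt(2))
    return zscore, pval

# Toy inputs (hypothetical values, for illustration only)
weights = [0.5, -0.2, 0.0]
z = [2.0, -1.0, 0.3]
R = [[1.0, 0.3, 0.0],
     [0.3, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

zscore, pval = twas_z(weights, z, R)
print(f"TWAS z = {zscore:.3f}, p = {pval:.4f}")
```

A variant with zero weight contributes nothing to either the numerator or the denominator, which is why sparse weights (for example from lasso) depend only on the LD among the selected variants.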


### Input
* **Step 1: PTWAS**
1. QTL SuSiE table
2. GWAS summary statistics (tsv format)
3. LD reference
4. TAD region
* **Step 2: MR**
1. output of Step 1

### Output
* **Step 1: PTWAS**
1. TWAS results from multiple weights
2. GWAS summary statistics after QC
3. the input data formatted for the twas_z() function
* **Step 2: MR**
1. the candidate genes of TWAS
2. the input format of [mr.R](https://github.com/cumc/pecotmr/blob/main/R/mr.R)
3. the MR results


In [None]:
[global]
# Workdir
parameter: susie_path = paths
parameter: cwd = path("output")
parameter: container = ''
import re
parameter: entrypoint= ('micromamba run -a "" -n' + ' ' + re.sub(r'(_apptainer:latest|_docker:latest|\.sif)$', '', container.split('/')[-1])) if container else ""
parameter: job_size = 100
parameter: walltime = "1h"
parameter: mem = "16G"
parameter: numThreads = 20
parameter: allele_qc_R = path("~/pecotmr/R/allele_qc.R")
allele_qc_R = f"{allele_qc_R:a}"

## PTWAS 



### Input

- susie_path: a file listing the paths of the SuSiE result files, one per line
- GWAS_path: directory containing the per-chromosome GWAS summary statistics (used to load the GWAS summary statistics data frame)
- LD_path: a file listing the paths of the LD block matrices
- TAD_path: path to the TAD region table (a data frame with the TAD region of each gene, used to subset the LD matrix for each gene)
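
When the TAD region of a gene overlaps several LD blocks, the workflow restricts each block to the variants shared with the QC-ed GWAS data and stacks the blocks into one block-diagonal matrix, so LD between different blocks is treated as zero. The sketch below illustrates that assembly in plain Python. The helper names `subset` and `block_diag` and the variant IDs are hypothetical and exist only for this example.

```python
def subset(matrix, names, keep):
    """Restrict a square LD matrix to the variants listed in `keep`."""
    idx = [i for i, name in enumerate(names) if name in keep]
    sub = [[matrix[i][j] for j in idx] for i in idx]
    return sub, [names[i] for i in idx]

def block_diag(blocks):
    """Stack square matrices along the diagonal; off-block entries are 0."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out

# Two toy LD blocks (hypothetical variants and correlations)
names_a = ["chr1:100:A:G", "chr1:200:T:C"]
block_a = [[1.0, 0.4], [0.4, 1.0]]
names_b = ["chr1:900:G:A", "chr1:950:C:T"]
block_b = [[1.0, 0.2], [0.2, 1.0]]

# Variants that survived GWAS QC
keep = {"chr1:100:A:G", "chr1:200:T:C", "chr1:950:C:T"}

sub_a, kept_a = subset(block_a, names_a, keep)
sub_b, kept_b = subset(block_b, names_b, keep)
ld = block_diag([sub_a, sub_b])
ld_names = kept_a + kept_b
print(ld_names)
print(ld)
```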


### Output

- twas_z_format: a data frame holding the input for the twas_z() function
- gene_weights_pq: a data frame holding the output of twas_z(). The function is applied to the weights of four methods (susie, lasso, enet and mr_ash) to compute p-values, from which the corresponding q-values are then calculated
- AD_allele_flip: the AD GWAS summary statistics after QC
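
The `AD_allele_flip` output holds GWAS statistics aligned to the alleles of the QTL variants. The sketch below illustrates the general idea of allele harmonization: keep a variant when its alleles match the reference, flip the sign of its z-score when the two alleles are swapped, and drop it otherwise. This is an assumption-laden simplification and not the pecotmr `allele_qc()` implementation. The workflow's call passes options such as `remove_dups` and `flip` whose exact behavior is not reproduced here, and strand flips are ignored. The function name `harmonize` and all positions and values are hypothetical.

```python
def harmonize(gwas, reference):
    """Align GWAS z-scores to the reference allele orientation.

    gwas:      dict mapping (chrom, pos) -> (a1, a2, z)
    reference: dict mapping (chrom, pos) -> (a1, a2)
    Returns a dict mapping "chrom:pos:a1:a2" (reference orientation) -> z.
    """
    aligned = {}
    for key, (ref_a1, ref_a2) in reference.items():
        if key not in gwas:
            continue  # variant missing from the GWAS
        a1, a2, z = gwas[key]
        if (a1, a2) == (ref_a1, ref_a2):
            z_aligned = z       # same orientation: keep as is
        elif (a1, a2) == (ref_a2, ref_a1):
            z_aligned = -z      # alleles swapped: flip the effect direction
        else:
            continue            # alleles do not match: drop the variant
        chrom, pos = key
        aligned[f"{chrom}:{pos}:{ref_a1}:{ref_a2}"] = z_aligned
    return aligned

# Toy data (hypothetical variants)
reference = {("chr1", 100): ("A", "G"),
             ("chr1", 200): ("T", "C"),
             ("chr1", 300): ("A", "C")}
gwas = {("chr1", 100): ("A", "G", 1.5),    # matches
        ("chr1", 200): ("C", "T", 2.0),    # swapped
        ("chr1", 300): ("A", "T", -0.7)}   # mismatched

result = harmonize(gwas, reference)
print(result)
```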

In [None]:
[twas_z_format_1]

import pandas as pd
s_path = pd.read_csv(susie_path, header=None)
# split_path = s_path[0].str.split(".", expand=True)
# ID = pd.DataFrame({'ID': split_path[1]})
# path = pd.DataFrame({'path': s_path[0]})
# input_df = pd.concat([ID, path], axis=1)
input_df = s_path.values.tolist()

parameter: GWAS_path = path
parameter: LD_path = path
parameter: TAD_path = path
parameter: allele_qc_R = path("~/pecotmr/R/allele_qc.R")
allele_qc_R = f"{allele_qc_R:a}"
#input: susie_path
input: [x for x in input_df], group_by = 1
output: f'{cwd}/{step_name[:-2]}/{_input:bnn}.twas_z.rds'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
R: expand= "${ }", stderr = f'{_output:nn}.stderr', stdout = f'{_output:nn}.stdout',input=allele_qc_R
#twas_z_format = ptwas(susie_path = "${_input:ar}",TAD_path = "${TAD_path:ar}", GWAS_path = "${GWAS_path:ar}", LD_path = "${LD_path:ar}")
#fwrite(twas_z_format,${_output:ar},row.names=F,sep="\t",quote=F)

library(data.table)
library(plink2R)
library(dplyr)
library(pillar)
library(tibble)
library(stringr)
library(ggplot2)
library(reticulate)
library(Matrix)
library(matrixcalc)
library(genio)
library(gdata)
library(bigsnpr)
library(Rlab)
library(qvalue)
np = import("numpy")
options(bedtools.path = "/home/aw3600/bedtools2/bin/")
library(bedtoolsr)



twas_z <- function(weights, z, R=NULL, X=NULL) {
    if (is.null(R)) {
        # mean impute X
        genetype_data_imputed <- apply(X, 2, function(x){
            pos <- which(is.na(x))
            if (length(pos) != 0){
                x[pos] <- mean(x,na.rm = TRUE)
                }
                return(x)
            })
        # need to add `large = T` in Rfast::cora, or could get wrong results 
        # R <- cora(genetype_data_imputed, large = T)
        # FIXME: test and enable using cora if Rfast is installed
        # See how susieR did it
        R <- cor(genetype_data_imputed) 
        colnames(R) <- rownames(R) <- colnames(genetype_data_imputed)
    }
    stat <- t(weights) %*% z
    denom <- t(weights) %*% R %*% weights
    zscore <- stat/sqrt(denom)
    pval <- pchisq( zscore * zscore, 1, lower.tail = FALSE)
    return(list(z=zscore, pval=pval))
}



ptwas = function(susie_path,TAD_path,GWAS_path,LD_path){
#####load susie_res_path
susie = str_split(susie_path,"\\.",simplify=T)[,2]%>%cbind(.,susie_path)%>%data.frame()%>%setNames(c("ID","path"))
gene_name = susie$ID
susie_res = readRDS(susie$path)
if(!is.null(names(susie_res[[1]]$sets))){
#####load LD path list
LD_list = read.table(LD_path,header=F,sep="\t")
LD_list_pos.bed = str_split(LD_list$V1,"_",simplify=T)%>%cbind(.,LD_list)
#####load TAD_region
TAD_region = fread(TAD_path)%>%rename("ID"="gene_id")
qtl_select = TAD_region%>%filter(ID==gene_name)
chr = str_sub(qtl_select$`#chr`,4)

qtl_reference = str_split(susie_res[[1]]$variant_names,":",simplify = T)%>%data.frame()%>%setNames(c("chr","pos","A1","A2"))
######load GWAS summary statistics
AD_dataset = fread(paste0(GWAS_path,"/ADGWAS_Bellenguez_2022.",chr,"/ADGWAS2022.chr",chr,".sumstat.tsv"))
##transform the AD_dataset to the form of allele_qc format
AD_data = AD_dataset%>%mutate(chr = paste0("chr",chromosome))%>%mutate(pos = as.character(position))%>%select(-chromosome)%>%rename("A1"="ref","A2"="alt")%>%mutate(z=beta/se)%>%select(-position)
AD_allele_flip = allele_qc(AD_data,qtl_reference,match.min.prop=0.2,remove_dups=TRUE,flip=TRUE,remove=TRUE)%>%
              mutate(variant_allele_flip = paste(chr,pos,A1.sumstats,A2.sumstats,sep=":"))
####load LD matrix
LD.files <- bedtoolsr::bt.intersect(a = LD_list_pos.bed, b = qtl_select)
LD.files.name = unique(LD.files$V5)
LD.list = list()
LD.matrix.names=NULL
for (k in seq_along(LD.files.name)){
 npz = np$load(LD.files.name[k])
 LD.matrix = npz$f[["arr_0"]]
 LD.snps = str_split(LD.files.name[k],"[.]",simplify = T)%>%.[,-c(length(.),(length(.)-1))]%>%paste(.,collapse=".")%>%paste0(.,".bim",sep="")%>%read.table(.)
 #head(LD.snps)
 LD_names = colnames(LD.matrix) = rownames(LD.matrix) = gsub("_",":",LD.snps$V2)
 snp_merge = intersect(LD_names,AD_allele_flip$variant_allele_flip)
 LD.select = as.matrix(LD.matrix[snp_merge,snp_merge])
 LD.list[[k]] = LD.select
 LD.matrix.names = append(LD.matrix.names,snp_merge)
}
   LD.block = as.matrix(bdiag(LD.list))
   upperTriangle(LD.block,byrow=TRUE) = lowerTriangle(LD.block)
   colnames(LD.block) = rownames(LD.block) = LD.matrix.names
####generate the twas_z format input
twas_z_format = data.frame(LD.matrix.names)%>%mutate(gene_name = gene_name)%>%
    mutate(chr = chr)%>%
    mutate(AD_allele_flip[match(LD.matrix.names,AD_allele_flip$variant_allele_flip),]%>%select(beta,se,z))%>%
    mutate(susie_weights = susie_res[[1]]$susie_weights[match(LD.matrix.names,susie_res[[1]]$variant_names)])%>%
    mutate(enet_weights =  susie_res[[1]]$enet_weights[match(LD.matrix.names,susie_res[[1]]$variant_names)])%>%
    mutate(lasso_weights = susie_res[[1]]$lasso_weights[match(LD.matrix.names,susie_res[[1]]$variant_names)])%>%
    mutate(mr_ash_weights = susie_res[[1]]$mr_ash_weights[match(LD.matrix.names,susie_res[[1]]$variant_names)])%>%
    rename("variants_name"="LD.matrix.names")
weights = apply(twas_z_format[,c("susie_weights","enet_weights","lasso_weights","mr_ash_weights")],2,function(x) twas_z(x,twas_z_format$z,R = LD.block))
twas_weights = data.frame(gene_name=gene_name,chr = chr,weights$susie_weights$pval,weights$susie_weights$z,
                          weights$lasso_weights$pval,weights$lasso_weights$z,weights$enet_weights$pval,weights$enet_weights$z,
                          weights$mr_ash_weights$pval,weights$mr_ash_weights$z)
  names(twas_weights) = c("gene_name","chr","susie_pval","susie_z","lasso_pval","lasso_z","enet_pval","enet_z","mr_ash_pval","mr_ash_z")
  p_values = twas_weights[,c("susie_pval","lasso_pval","enet_pval","mr_ash_pval")]
  p_values[is.na(p_values)] = 1
  q_values = qvalue(p_values,lambda=0)$qvalues
  gene_weights_pq = data.frame(twas_weights,qvalue = q_values)
  names(gene_weights_pq)[11:14]=c("susie_qval","lasso_qval","enet_qval","mr_ash_qval")
####return the twas_z input, the TWAS p-values and q-values, and the QC-ed GWAS summary statistics
return(list(twas_z_format = twas_z_format,
            gene_weights_pq = gene_weights_pq,
            AD_allele_flip = AD_allele_flip))
 }
else {
    cat("No credible sets were found in the SuSiE result, so no output is generated.\n")
}
}

# store the result under a new name so that the twas_z() function is not overwritten
twas_res = ptwas(susie_path = ${_input:ar},TAD_path = ${TAD_path:ar},GWAS_path = ${GWAS_path:ar},LD_path = ${LD_path:ar})
saveRDS(twas_res, ${_output:ar}, compress='xz')

In [None]:
[twas_z_format_2]
# Collect the paths of all per-gene TWAS output files into a single list file.
input: group_by = "all"
output: f'{cwd}/{step_name[:-2]}/twas_z_files.txt'
python: expand= "$[ ]", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', container = container, entrypoint = entrypoint
import pandas as pd
pd.DataFrame({"output" : [$[_input:ar,]]}).to_csv("$[_output]",index = False ,header = False, sep = "\t")

## MR

### Input

- weights_path: a file listing the paths of the PTWAS output files
- pval_threshold: p-value threshold for selecting candidate genes (default: 0.05)
- cpip_cutoff: cpip cutoff for the MR method (default: 0.5)

### Output

- twas_candidate_gene.rds: the candidate genes selected with each set of weights, together with the MR-formatted input
- mr_output.rds: the MR results for the candidate genes (from twas_candidate_gene.rds)

In [None]:
[twas_candidate_gene]
parameter: weights_path = path
parameter: pval_threshold = 0.05
parameter: cpip_cutoff = 0.5
parameter: mr_R = path("~/pecotmr/R/mr.R")
mr_R = f"{mr_R:a}"

input: weights_path
output: cand_genes = f'{cwd:a}/{step_name}.rds',
        mr_output = f'{cwd:a}/mr_output.rds'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
R: expand= "${ }", stderr = f'{_output[0]:n}.stderr', stdout = f'{_output[0]:n}.stdout',input=mr_R


library(data.table)
library(dplyr)
library(stringr)
library(reticulate)
library(qvalue)
library(doMC)

twas_mr_format_input = function(cand_genes,susie_path,weights_file_path){
mr_format_input = NULL
susie_path = fread(susie_path,header=F)
susie = str_split(susie_path$V1,"\\.",simplify=T)[,2]%>%cbind(.,susie_path)%>%setNames(c("ID","path"))
for (k in seq_along(cand_genes)){
gene_name = cand_genes[k]
qtl_susie_res = readRDS(susie$path[susie$ID==gene_name])
if(!is.null(names(qtl_susie_res[[1]]$sets$cs))){
qtl_finemap = qtl_susie_res[[1]]$top_loci%>%filter(cs_index_primary>=1)%>%select(-cs_index_secondary)%>%mutate(X_ID=gene_name)%>%
           rename("snp"="variant_id","bhat_x" = "bhat","sbhat_x"="sbhat","cs" = "cs_index_primary")
AD_allele_flip = readRDS(weights_file_path$path[weights_file_path$ID==gene_name])$AD_allele_flip
merge_snp = intersect(qtl_finemap$snp,AD_allele_flip$variant_allele_flip)
format_input = qtl_finemap[match(merge_snp,qtl_finemap$snp),]%>%cbind(.,AD_allele_flip[match(merge_snp,AD_allele_flip$variant_allele_flip),]%>%select(beta,se)%>%rename("bhat_y"="beta","sbhat_y"="se"))
mr_format_input = rbind(mr_format_input, format_input)
    }
}
return(mr_format_input)
}

#detectCores()
registerDoMC(${numThreads})
weights_file = fread(${_input:ar},header=F)
weights_file_path = str_split(weights_file$V1,"\\.",simplify=T)[,2]%>%cbind(.,weights_file)%>%setNames(c("ID","path"))
#ptm = proc.time()
gene_pq = foreach(k=1:dim(weights_file)[1], .combine=rbind) %dopar% {
   weights_pq = readRDS(weights_file_path$path[k])$gene_weights_pq
}
#proc.time() - ptm
padj = apply(gene_pq[,c("susie_pval","lasso_pval","enet_pval","mr_ash_pval")],2,function(x) p.adjust(x, method = "bonferroni"))%>%data.frame()
gene_pq_adj = gene_pq%>%cbind(.,padj)
names(gene_pq_adj)[15:18] = c("susie_pval_adj","lasso_pval_adj","enet_pval_adj","mr_ash_pval_adj")

cand_genes = NULL
cand_genes$gene_pq_adj = gene_pq_adj
cand_genes$susie= cand_genes$gene_pq_adj%>%filter(susie_pval<${pval_threshold})%>%select(gene_name)
cand_genes$lasso= cand_genes$gene_pq_adj%>%filter(lasso_pval<${pval_threshold})%>%select(gene_name)
cand_genes$enet= cand_genes$gene_pq_adj%>%filter(enet_pval<${pval_threshold})%>%select(gene_name)
cand_genes$mr_ash= cand_genes$gene_pq_adj%>%filter(mr_ash_pval<${pval_threshold})%>%select(gene_name)
cand_genes$susie_adj= cand_genes$gene_pq_adj%>%filter(susie_pval_adj<${pval_threshold})%>%select(gene_name)
cand_genes$lasso_adj= cand_genes$gene_pq_adj%>%filter(lasso_pval_adj<${pval_threshold})%>%select(gene_name)
cand_genes$enet_adj= cand_genes$gene_pq_adj%>%filter(enet_pval_adj<${pval_threshold})%>%select(gene_name)
cand_genes$mr_ash_adj= cand_genes$gene_pq_adj%>%filter(mr_ash_pval_adj<${pval_threshold})%>%select(gene_name)

cand_genes$susie_format_input = twas_mr_format_input(cand_genes$susie$gene_name,${susie_path:ar},weights_file_path)
cand_genes$lasso_format_input = twas_mr_format_input(cand_genes$lasso$gene_name,${susie_path:ar},weights_file_path)
cand_genes$enet_format_input = twas_mr_format_input(cand_genes$enet$gene_name,${susie_path:ar},weights_file_path)
cand_genes$mr_ash_format_input = twas_mr_format_input(cand_genes$mr_ash$gene_name,${susie_path:ar},weights_file_path)
cand_genes$susie_adj_format_input = twas_mr_format_input(cand_genes$susie_adj$gene_name,${susie_path:ar},weights_file_path)
cand_genes$lasso_adj_format_input = twas_mr_format_input(cand_genes$lasso_adj$gene_name,${susie_path:ar},weights_file_path)
cand_genes$enet_adj_format_input = twas_mr_format_input(cand_genes$enet_adj$gene_name,${susie_path:ar},weights_file_path)
cand_genes$mr_ash_adj_format_input = twas_mr_format_input(cand_genes$mr_ash_adj$gene_name,${susie_path:ar},weights_file_path)

saveRDS(cand_genes, ${_output[0]:ar}, compress='xz')

mr_output = NULL
mr_output$susie = fine_mr(cand_genes$susie_format_input,${cpip_cutoff})
mr_output$lasso = fine_mr(cand_genes$lasso_format_input,${cpip_cutoff})
mr_output$enet = fine_mr(cand_genes$enet_format_input,${cpip_cutoff})
mr_output$mr_ash = fine_mr(cand_genes$mr_ash_format_input,${cpip_cutoff})
mr_output$susie_adj = fine_mr(cand_genes$susie_adj_format_input,${cpip_cutoff})
mr_output$lasso_adj = fine_mr(cand_genes$lasso_adj_format_input,${cpip_cutoff})
mr_output$enet_adj = fine_mr(cand_genes$enet_adj_format_input,${cpip_cutoff})
mr_output$mr_ash_adj = fine_mr(cand_genes$mr_ash_adj_format_input,${cpip_cutoff})

saveRDS(mr_output, ${_output[1]:ar}, compress='xz')

### Example
#### Bulk data set

* Fine-mapping is run only on the 612 genes.

In [None]:
sos run ~/xqtl-pipeline/pipeline/SuSiE.ipynb susie \
     --name AD_DLPFC_bulk \
     --genoFile  /mnt/vast/hpc/csg/FunGen_xQTL/ROSMAP/Genotype/geno_by_region/TADB_enhanced_cis_genotype_by_region/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.genotype_by_region_files.txt \
     --phenoFile   ~/MR_KMT_analysis/pheno/process/DLPFC_bulk_geneid_eqtl_data.region_list.txt \
     --covFile  ~/MR_KMT_analysis/covariate/DLPFC_covs_genoPCs_marchPCs.txt \
     --phenotype_names AD \
     --utils-R  ~/xqtl-pipeline/pipeline/xqtl_utils.R \
     --maf 0.01 \
     --pip_cutoff 0.1 \
     --coverage 0.95 \
     --region-list ~/MR_KMT_analysis/PTWAS/AD_phenotype_select.txt \
     --no-indel \
     --cwd ~/MR_KMT_analysis/PTWAS/DLPFC_bulk_AD_SuSiE_results \
     --mem 200G -J 50 -c /mnt/vast/hpc/csg/molecular_phenotype_calling/csg.yml -q csg 

In [None]:
sos run ~/xqtl-pipeline/pipeline/ptwas.ipynb twas_z_format \
        --susie_path  ~/MR_KMT_analysis/PTWAS/DLPFC_bulk_AD_SuSiE_results/AD_DLPFC_bulk.susie_output.txt \
        --TAD_path ~/fungen-xqtl-analysis/resource/TADB_enhanced_cis.bed \
        --GWAS_path /mnt/vast/hpc/csg/xqtl_workflow_testing/ADGWAS/data_intergration/ADGWAS2022 \
        --LD_path /mnt/vast/hpc/csg/molecular_phenotype_calling/LD/output/1300_hg38_EUR_LD_blocks_LD/ROSMAP_NIA_WGS.leftnorm.filtered.filtered.ld.list \
        --cwd ~/MR_KMT_analysis/PTWAS \
        --mem 80G

In [None]:
sos run ~/xqtl-pipeline/pipeline/ptwas.ipynb twas_candidate_gene \
        --susie_path  ~/MR_KMT_analysis/PTWAS/DLPFC_bulk_AD_SuSiE_results/AD_DLPFC_bulk.susie_output.txt \
        --weights_path ~/MR_KMT_analysis/PTWAS/twas_z_format/AD_DLPFC_bulk.twas_z_files.txt \
        --pval_threshold 0.05 \
        --cpip_cutoff 0.5 \
        --mr_R ~/pecotmr/R/mr.R \
        --cwd ~/MR_KMT_analysis/PTWAS/DLPFC_bulk_data \
        --mem 80G

In [7]:
library(data.table)
DLPFC_genes = readRDS("~/MR_KMT_analysis/PTWAS/DLPFC_bulk_data/twas_candidate_gene.rds")
dim(DLPFC_genes$gene_pq_adj)
head(DLPFC_genes$gene_pq_adj)

| | gene_name | chr | susie_pval | susie_z | lasso_pval | lasso_z | enet_pval | enet_z | mr_ash_pval | mr_ash_z | susie_qval | lasso_qval | enet_qval | mr_ash_qval | susie_pval_adj | lasso_pval_adj | enet_pval_adj | mr_ash_pval_adj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | ENSG00000188157 | 1 | 0.94223145 | 0.07246551 | NA | NA | NA | NA | 0.896210155 | -0.1304503 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | NA | NA | 1.0 |
| 2 | ENSG00000074800 | 1 | 0.45220217 | -0.75174874 | 0.89523247 | 0.13168624 | 0.6625478 | -0.43639828 | 0.49960535 | -0.6751108 | 0.88339705 | 0.89523247 | 0.88339705 | 0.883397048 | 1 | 1.0 | 1.0 | 1.0 |
| 3 | ENSG00000177000 | 1 | 0.12225353 | -1.54538351 | 0.03928815 | -2.06115615 | 0.0426312 | -2.02730527 | 0.085125875 | -1.721689 | 0.12225353 | 0.08526241 | 0.08526241 | 0.113501166 | 1 | 1.0 | 1.0 | 1.0 |
| 4 | ENSG00000117118 | 1 | 0.73754554 | -0.33510546 | 0.95718135 | -0.05369101 | 0.9518644 | 0.06036566 | 0.761034824 | -0.3041222 | 0.95718135 | 0.95718135 | 0.95718135 | 0.957181348 | 1 | 1.0 | 1.0 | 1.0 |
| 5 | ENSG00000117115 | 1 | 0.26937576 | -1.10450124 | 0.07594689 | -1.77470331 | 0.1178218 | -1.56398218 | 0.52204252 | 0.6402001 | 0.35916769 | 0.23564351 | 0.23564351 | 0.52204252 | 1 | 1.0 | 1.0 | 1.0 |
| 6 | ENSG00000158748 | 1 | 0.08305581 | 1.73322428 | 0.06390926 | 1.85281234 | 0.063412 | 1.85629165 | 0.001551992 | 3.164782 | 0.08305581 | 0.08305581 | 0.08305581 | 0.006207967 | 1 | 1.0 | 1.0 | 0.74806 |


After fine-mapping, only 482 genes have significant results, so we list the p-values and q-values of these 482 genes only.

We set the p-value threshold to 0.05 and compare the Bonferroni-adjusted p-values against it:
* For the susie method, we obtained 28 candidate significant genes;
* For the lasso method, we obtained 12 candidate significant genes;
* For the enet method, we obtained 12 candidate significant genes;
* For the mr_ash method, we obtained 23 candidate significant genes.
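
The Bonferroni adjustment behind these counts can be illustrated with a small sketch. It multiplies each non-missing p-value by the number of non-missing tests and caps the result at 1, which is my understanding of the default behavior of R's `p.adjust(..., method = "bonferroni")`. Treat that equivalence as an assumption. The first p-value is the mr_ash p-value of ENSG00000158748 from the table above, and the number of tests is taken to be the 482 genes mentioned in the text. The remaining p-values are placeholders made up for illustration.

```python
def bonferroni(pvals):
    """Bonferroni-adjust p-values, leaving missing values (None) untouched."""
    n_tests = sum(p is not None for p in pvals)
    return [None if p is None else min(1.0, p * n_tests) for p in pvals]

# One real p-value from the table plus 481 placeholder values (hypothetical)
pvals = [0.001551992] + [0.5] * 481
adjusted = bonferroni(pvals)

# Genes whose adjusted p-value passes the 0.05 threshold
pval_threshold = 0.05
selected = [i for i, p in enumerate(adjusted)
            if p is not None and p < pval_threshold]

print(round(adjusted[0], 5))
print(selected)
```

With 482 tests, a raw p-value must be below roughly 0.05 / 482, about 1.04e-4, to pass the adjusted threshold, so the gene above (adjusted p-value of about 0.748) is not selected.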

In [27]:
length(DLPFC_genes$susie_adj$gene_name)
length(DLPFC_genes$lasso_adj$gene_name)
length(DLPFC_genes$enet_adj$gene_name)
length(DLPFC_genes$mr_ash_adj$gene_name)

* After merging the candidate genes of the four methods, we obtain the final set of unique candidate genes; its size is computed below.

In [23]:
candidate_genes_merge = unique(c(DLPFC_genes$susie_adj$gene_name, DLPFC_genes$lasso_adj$gene_name, DLPFC_genes$enet_adj$gene_name, DLPFC_genes$mr_ash_adj$gene_name))
length(candidate_genes_merge)

In [24]:
candidate_genes_merge

In [37]:
mr_output = readRDS("~/MR_KMT_analysis/PTWAS/DLPFC_bulk_data/mr_output.rds")

In [38]:
head(DLPFC_genes$susie_adj_format_input)

| | snp | maf | bhat_x | sbhat_x | pip | cs | X_ID | bhat_y | sbhat_y |
|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | chr1:207505962:A:G | 0.2008929 | -0.183342 | 0.03515377 | 0.03532058 | 1 | ENSG00000117322 | -0.1093 | 0.0102 |
| 2 | chr1:207510847:T:G | 0.1932398 | -0.1878704 | 0.03512318 | 0.05683321 | 1 | ENSG00000117322 | -0.1234 | 0.0104 |
| 3 | chr1:207512441:T:C | 0.2028061 | -0.1862302 | 0.03513435 | 0.05114225 | 1 | ENSG00000117322 | -0.1115 | 0.0102 |
| 4 | chr1:207512620:A:C | 0.1934866 | -0.1923521 | 0.03509215 | 0.1123645 | 1 | ENSG00000117322 | -0.124 | 0.0104 |
| 5 | chr1:207518704:A:G | 0.192602 | -0.1950117 | 0.03507337 | 0.16032051 | 1 | ENSG00000117322 | -0.1253 | 0.0104 |
| 6 | chr1:207524699:T:C | 0.2021684 | -0.187723 | 0.03512419 | 0.06340718 | 1 | ENSG00000117322 | -0.1126 | 0.0102 |


* Apply the MR method to susie_adj_format_input:

In [39]:
mr_output$susie_adj

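| X_ID | num_CS | num_IV | meta_eff | se_meta_eff | meta_pval | meta_qval | Q | Q_pval | I2 |
|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| ENSG00000117322 | 1 | 19 | 0.022 | 0.005 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000203710 | 1 | 12 | 0.01 | 0.001 | 0 | 0 | 0 | 1 | 0 |
| ENSG00000110079 | 1 | 124 | 0.013 | 0.004 | 0 | 0 | 0 | 1 | 0 |
| ENSG00000103550 | 1 | 36 | -0.005 | 0.001 | 0 | 0 | 0 | 1 | 0 |
| ENSG00000108556 | 1 | 11 | 0.005 | 0.001 | 0 | 0 | 0 | 1 | 0 |
| ENSG00000161640 | 1 | 5 | 0.005 | 0.001 | 0 | 0 | 0 | 1 | 0 |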


We obtain six causal genes using the MR method. The meta_pval values of ENSG00000108556 and ENSG00000161640 are in fact 0.0002 and 0.0003; because round(x,3) is used to keep meta_pval to three decimal places, every meta_pval appears as zero in the table. The Q value of the first gene is $5.538\times10^{-31}$, so its Q_pval is equal to 0.
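
The effect of rounding to three decimals can be reproduced with a short sketch. It uses the two meta_pval values quoted above and shows that both round to zero, while scientific notation keeps them distinguishable. Displaying the values in scientific notation is only a suggestion and not something the workflow currently does.

```python
# meta_pval values quoted in the text above
meta_pvals = {"ENSG00000108556": 0.0002, "ENSG00000161640": 0.0003}

# Rounding to three decimals collapses both values to zero
rounded = {gene: round(p, 3) for gene, p in meta_pvals.items()}

# Scientific notation preserves the order of magnitude
scientific = {gene: f"{p:.1e}" for gene, p in meta_pvals.items()}

for gene in meta_pvals:
    print(gene, rounded[gene], scientific[gene])
```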