coloc

The coloc package can be used to perform genetic colocalisation analysis of two potentially related phenotypes, to ask whether they share common genetic causal variant(s) in a given region.

version 4

This is an updated version of coloc. I have tested it, but there may be bugs. Please test it and let me know whether or not it works (both kinds of feedback are useful!).

It is not yet on CRAN. To install the new version, do

# install the remotes helper package only if it is missing
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
# install coloc from GitHub
remotes::install_github("chr1swallace/coloc")

Background

The new ideas are described in

Wallace C (2020) Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLOS Genetics 16(4): e1008720

For usage, please see the vignette at https://chr1swallace.github.io/coloc

Key previous references are:


Frequently Asked Questions

If I understand correctly, coloc.abf() can be run with correlated variants, that is, no prior LD pruning or clumping is required. Is that correct?

Yes, coloc.abf() and coloc.signals() assume they are given a dense map of all SNPs in a region that could be causal. Do not prune or clump.

Suppose I identify a sentinel variant for a block of the genome. Can I run a comparison with just that one variant using coloc.abf()?

No, coloc.abf() and coloc.signals() assume they are given a dense map of all SNPs in a region that could be causal. This means you need to supply all SNPs in the region. You can think of these functions as asking whether the patterns of association "match" across the SNPs in the region, and a single variant does not represent a pattern.
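As a rough illustration of what "all SNPs in a region" looks like in practice, the sketch below builds two toy datasets covering the same 50 SNPs and passes them to coloc.abf(). All values are simulated, the helper `make_dataset` is hypothetical, and the noise is independent across SNPs, so the toy region has no realistic LD structure. The call to coloc.abf() is guarded so that the block still runs where coloc is not installed.

```r
set.seed(42)

n_snps <- 50
snp_ids <- paste0("snp", seq_len(n_snps))

# Build one coloc-style dataset: a list of per-SNP summary statistics.
# All values are simulated; `causal` is the index of the SNP given a real effect.
make_dataset <- function(causal, n_samples) {
  se <- 0.02
  beta <- rnorm(n_snps, mean = 0, sd = se)  # noise for non-causal SNPs
  beta[causal] <- 0.2                       # strong effect at the causal SNP
  list(
    snp = snp_ids,
    position = seq_len(n_snps),
    beta = beta,
    varbeta = rep(se^2, n_snps),
    type = "quant",   # quantitative trait
    N = n_samples,
    sdY = 1           # assumed trait standard deviation
  )
}

# Both traits cover the same SNPs and share the causal SNP at index 25.
d1 <- make_dataset(causal = 25, n_samples = 5000)
d2 <- make_dataset(causal = 25, n_samples = 8000)

# Run the analysis only if coloc is available.
if (requireNamespace("coloc", quietly = TRUE)) {
  res <- coloc::coloc.abf(dataset1 = d1, dataset2 = d2)
  print(res$summary)
}
```

Every SNP in the region appears in both datasets, whether or not it looks associated; that is the input shape the functions expect.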

Can the process of identifying colocalized variants be carried out genome wide or is it meant to be done in defined small regions?

You need to break the genome into smaller regions, within each of which it is reasonable to assume there is at most one (coloc.abf) or a small number (coloc.signals) of causal variants per trait. One way to do this is to use the boundaries defined by recombination hotspots, proxied by the map created by ldetect.
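A minimal sketch of the splitting step, in base R only. The block boundaries and SNP positions are invented for illustration; real boundaries would come from an LD-block map for the relevant population and genome build.

```r
# Hypothetical block start positions (base pairs) on one chromosome.
block_starts <- c(1, 1000001, 2500001, 4000001)

# Hypothetical SNP positions from one trait's summary statistics.
snps <- data.frame(
  snp = paste0("snp", 1:6),
  position = c(500, 999999, 1000001, 2400000, 2600000, 4500000),
  stringsAsFactors = FALSE
)

# findInterval() gives, for each position, the index of the block it falls in.
snps$block <- findInterval(snps$position, block_starts)

# One data.frame per region; each region would be analysed separately.
regions <- split(snps, snps$block)
sapply(regions, nrow)
```

Each element of `regions` can then be converted into the dataset format coloc expects and analysed on its own.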

How is coloc.abf accounting for the correlated variants?

This is fundamental to how coloc works: it exploits a dense SNP map. Please see the original paper.
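To make the role of the dense map concrete, here is a base-R sketch of how per-SNP evidence can be combined into posterior probabilities for the five hypotheses (H0: no association, H1 and H2: association with one trait only, H3: distinct causal variants, H4: a shared causal variant), following the published single-causal-variant formulation. It is not the package's own code: the function names `logsumexp` and `posterior_from_labf` are hypothetical, the inputs are made-up per-SNP log approximate Bayes factors, and the default priors are assumed to match those of coloc.abf().

```r
# Numerically stable log(sum(exp(x))).
logsumexp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Combine per-SNP log Bayes factors for two traits into posteriors for H0..H4.
posterior_from_labf <- function(labf1, labf2,
                                p1 = 1e-4, p2 = 1e-4, p12 = 1e-5) {
  stopifnot(length(labf1) == length(labf2))
  s1 <- logsumexp(labf1)           # evidence summed over SNPs, trait 1
  s2 <- logsumexp(labf2)           # evidence summed over SNPs, trait 2
  s12 <- logsumexp(labf1 + labf2)  # same SNP causal for both traits
  # H3 sums over pairs of different SNPs: exp(s1 + s2) - exp(s12), in logs.
  h3 <- log(p1) + log(p2) + s1 + s2 + log1p(-exp(min(s12 - s1 - s2, 0)))
  lh <- c(H0 = 0,
          H1 = log(p1) + s1,
          H2 = log(p2) + s2,
          H3 = h3,
          H4 = log(p12) + s12)
  exp(lh - logsumexp(lh))
}

# Toy region of three SNPs: peaks at the same SNP versus at different SNPs.
same_peak <- posterior_from_labf(c(0, 10, 0), c(0, 10, 0))
diff_peak <- posterior_from_labf(c(0, 10, 0), c(10, 0, 0))
round(rbind(same_peak, diff_peak), 3)
```

Because every hypothesis is scored by summing over all SNPs in the region, the comparison between H3 and H4 depends on whether the two traits' evidence peaks at the same SNPs, which is why the full set of correlated variants is needed.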

How should priors be defined? Do they depend on sample size or any other parameters?

This is described in detail in the latest paper (Wallace 2020, cited under Background).

What does high PP4 mean?

The summary printed on the screen by coloc.abf() and coloc.signals() includes PP4, the posterior probability that a shared causal variant exists in the region. A high PP4 does not mean that all variants are causal and shared. To check which variants are most likely to be causal, look at the SNP.PP column in the returned detailed results data.frame.
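One common way to read the per-SNP posteriors is to rank SNPs and keep the smallest set whose posterior probabilities sum to a chosen coverage. The sketch below does this on a mock data.frame standing in for the detailed results; the posterior values are invented, the helper `credible_set` is hypothetical, and the column name `SNP.PP.H4` is an assumption about the output (the helper simply uses the first column whose name starts with `SNP.PP`).

```r
# Mock stand-in for the detailed results data.frame; values are invented.
mock_results <- data.frame(
  snp = c("snpA", "snpB", "snpC", "snpD"),
  SNP.PP.H4 = c(0.06, 0.60, 0.04, 0.30),
  stringsAsFactors = FALSE
)

# Smallest set of SNPs whose per-SNP posteriors sum to at least `coverage`.
credible_set <- function(results, coverage = 0.95) {
  pp_col <- grep("^SNP\\.PP", names(results), value = TRUE)[1]
  ord <- order(results[[pp_col]], decreasing = TRUE)
  cum <- cumsum(results[[pp_col]][ord])
  n_keep <- which(cum >= coverage)[1]
  if (is.na(n_keep)) n_keep <- length(ord)  # coverage never reached: keep all
  results[ord[seq_len(n_keep)], c("snp", pp_col)]
}

cs <- credible_set(mock_results)
cs
```

With real output, the same helper would be applied to the `results` element of the returned object, and only when PP4 is high enough for a shared variant to be plausible.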

Notes to self

Note to self: to generate vignettes:

cp vignettes/colocqq-tests.R.tospin vignettes/colocqq-tests.R && Rscript -e 'knitr::spin("vignettes/colocqq-tests.R",knit=FALSE); devtools::build_vignettes()'

Note to self: to generate website:

https://chr1swallace.github.io/coloc/

Rscript -e "pkgdown::build_site()"