# Colocalization with fastENLOC

## Aim

This notebook demonstrates a colocalization analysis workflow using fastENLOC.

## Input

1. GWAS summary statistics file with at least the following columns:

- snp_id: variant ID, format must match in all files
- chr: chromosome number
- pos: base pair position
- z_score: z-score values

2. eQTL Annotation file produced by DAP-G including the following columns:

- chr: chromosome number
- pos: base pair position
- snp_id: variant ID
- a1: effect allele
- a2: second allele
- pip: posterior inclusion probability, in the following format (remove the spaces around symbols):

  GENE : CLUSTER @ TISSUE = INDIVID_PIP[ CLUSTER_PIP : NUM ] | GENE : CLUSTER @ TISSUE2 = INDIVID_PIP[ CLUSTER_PIP : NUM ]...
  
  where NUM = total number of variants within that cluster
  

3. LD correlation matrix file for all SNPs included in the eQTL data file.
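
The `pip` annotation format from item 2 can be unpacked into one row per gene, cluster and tissue. The following is a minimal sketch, not part of the fastENLOC tooling: the tissue names and probability values are made up for illustration, and the output file name `pip_parsed.toy.txt` is hypothetical.

```shell
# Toy annotation string in the GENE:CLUSTER@TISSUE=INDIVID_PIP[CLUSTER_PIP:NUM] format.
# Tissue names and numeric values are illustrative assumptions, not real results.
pip='ENSG00000238009:1@Whole_Blood=9.999e-01[9.999e-01:1]|ENSG00000227232:1@Liver=3.025e-01[1.000e+00:3]'

# Split entries on "|", then turn the remaining delimiters into spaces so that
# awk re-splits each entry into six fields:
# gene, cluster, tissue, SNP-level PIP, cluster-level PIP, number of SNPs in the cluster.
printf '%s\n' "$pip" | tr '|' '\n' | awk '{
    gsub(/[@=:]/, " ")   # separators between gene, cluster, tissue and PIP
    gsub(/\[/, " ")      # opening bracket before the cluster PIP
    gsub(/\]/, "")       # closing bracket after NUM
    print $1, $2, $3, $4, $5, $6
}' > pip_parsed.toy.txt

cat pip_parsed.toy.txt
```

Each output row can then be filtered or joined by gene or tissue with standard tools.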


## Output 

1) Enrichment analysis result `prefix.enloc.enrich.out`: estimated enrichment parameters and standard errors.

2) Signal-level colocalization result `prefix.enloc.sig.out`: the main output from the colocalization analysis, with the following format:
- column 1: signal cluster name (from eQTL analysis)
- column 2: number of member SNPs
- column 3: cluster PIP of eQTLs
- column 4: cluster PIP of GWAS hits (without eQTL prior)
- column 5: cluster PIP of GWAS hits (with eQTL prior)
- column 6: regional colocalization probability (RCP)

3) SNP-level colocalization result `prefix.enloc.snp.out`: SNP-level colocalization output, with the following format:
- column 1: signal cluster name
- column 2: SNP name
- column 3: SNP-level PIP of eQTLs
- column 4: SNP-level PIP of GWAS (without eQTL prior)
- column 5: SNP-level PIP of GWAS (with eQTL prior)
- column 6: SNP-level colocalization probability

4) Sorted list of colocalization signals, ordered by decreasing RCP (column 6), obtained with `sort -grk6 prefix.enloc.sig.out`
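
When the signal-level file starts with a header line, as in the example output later in this notebook, a plain numeric sort moves that header away from the top. The sketch below keeps the header in place and sorts only the data rows. It uses three rows copied from this notebook's example output; the `toy.*` file names are hypothetical.

```shell
# Toy signal-level file: header plus three rows taken from the example output in this notebook.
cat > toy.enloc.sig.out <<'EOF'
Signal Num_SNP CPIP_qtl CPIP_gwas_marginal CPIP_gwas_qtl_prior RCP
ENSG00000279457:1      1  2.633e-01 2.027e-05    2.345e-04      2.209e-04
ENSG00000238009:1      1  9.999e-01 2.290e-05    9.479e-04      9.479e-04
ENSG00000227232:1      3  1.000e+00 6.493e-05    9.496e-04      9.107e-04
EOF

# Keep the header first, then sort the remaining rows by column 6 (RCP), largest first.
# LC_ALL=C avoids locale-dependent parsing of the decimal point.
{
    head -n 1 toy.enloc.sig.out
    tail -n +2 toy.enloc.sig.out | LC_ALL=C sort -grk6
} > toy.enloc.sig.sorted

cat toy.enloc.sig.sorted
```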

## Workflow

### Step 1: Prepare GWAS PIP 

The z-score file passed to `torus` (`gwas.zval.gz` below) should have at least the following columns:

- snp_id 
- LD_block 
- z_score

In [None]:
[gwas_pip]
bash:
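    # Convert GWAS summary statistics into the z-score format read by torus
    perl format2torus.pl gwas_z.txt > gwas.zval
    gzip gwas.zval
    # Compute GWAS PIPs, then compress them for fastENLOC
    torus -d gwas.zval.gz --load_zval -dump_pip gwas.pip
    gzip gwas.pip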
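
To make the target layout (snp_id, LD_block, z_score) concrete, the sketch below assigns toy GWAS variants to LD blocks with `awk`. It is an illustration only and not a replacement for `format2torus.pl`, whose internals are not shown in this notebook. The block boundaries, the z-scores, the `Loc<k>` block labels and all `*.toy.*` file names are assumptions.

```shell
# Hypothetical LD block boundaries: chromosome, start, end (values made up).
cat > ld_blocks.toy.txt <<'EOF'
1 1 100000
1 100001 200000
EOF

# Toy GWAS summary statistics with the minimum columns listed in the Input section.
# Variant IDs come from this notebook's example output; z-scores are made up.
cat > gwas.toy.txt <<'EOF'
snp_id chr pos z_score
chr1_63671_G_A_b38 1 63671 0.42
chr1_108826_G_C_b38 1 108826 -1.30
chr1_135203_G_A_b38 1 135203 2.05
EOF

# First file: remember each block. Second file: skip the header, then print
# snp_id, the label of the first block containing the position, and the z-score.
awk 'NR == FNR { chrom[NR] = $1; lo[NR] = $2 + 0; hi[NR] = $3 + 0; n = NR; next }
     FNR == 1  { next }
     {
         for (i = 1; i <= n; i++)
             if ($2 == chrom[i] && $3 + 0 >= lo[i] && $3 + 0 <= hi[i]) {
                 print $1, "Loc" i, $4
                 break
             }
     }' ld_blocks.toy.txt gwas.toy.txt > gwas.toy.zval

cat gwas.toy.zval
```

Variants that fall outside every block are silently dropped in this sketch; a real conversion would need to decide how to handle them.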

### Step 2: Prepare eQTL Annotation File

#### Method 1: Use pre-computed GTEx multi-tissue eQTL annotation files

Download:

- hg38 Position ID: https://drive.google.com/open?id=1kfH_CffxyCtZcx3z7k63rIARNidLv1_P
- rsID: https://drive.google.com/open?id=1rSaHenk8xOFtQo7VuDZevRkjUz6iwuj0

#### Method 2: Derive annotations based on own eQTL data, using DAP-G

DAP-G annotations are produced in two parts:

__Part 1__: Estimate priors with `torus`

Input file formats:

1) SNP data file with the following columns:

- gene: gene name
- snp: variant ID
- tss: distance to TSS (transcription start site)
- pval: p-value
- beta: beta-hat
- se: standard error of beta-hat

2) SNP map file with the following columns:

- snp: variant ID
- chr: chromosome number
- pos: base pair position

3) Gene map file with the following columns:

- gene: gene name
- chr: chromosome number
- tss: distance to TSS
- tss: repeat of the above column
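
When variant IDs follow the `chr1_63671_G_A_b38` pattern seen in this notebook's example output, the SNP map columns (snp, chr, pos) can be derived from the ID itself. This is a minimal sketch under that assumption; the `*.toy.txt` file names are hypothetical, and IDs in another style (for example rsIDs) would need a separate position lookup.

```shell
# Toy list of variant IDs, taken from the example output in this notebook.
printf '%s\n' chr1_63671_G_A_b38 chr1_108826_G_C_b38 > snp_ids.toy.txt

# Split each ID on "_": field 1 is the chromosome (with a "chr" prefix), field 2 the position.
# The prefix is stripped so the chr column holds the chromosome number.
awk -F'_' '{
    id = $0
    chrom = $1
    sub(/^chr/, "", chrom)
    print id, chrom, $2
}' snp_ids.toy.txt > snp_map.toy.txt

cat snp_map.toy.txt
```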



In [None]:
[estimate_prior]
bash: 
    torus -d prior.input.gz -smap snp_map.gz -gmap gene_map.gz -dump_prior data

__Part 2__: Annotate with `DAP-G`

In [None]:
[dap_annot]
bash: 
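    # Fine-map one gene with DAP-G, writing the result into the directory read by the summary script
    dap-g -d_z z_file.txt -d_ld chr1_ld.ld.bin -p ./data -ld_control 0.5 --all -t 4 > dap_rst_dir/$gene.dap
    # Convert the DAP-G results into a fastENLOC annotation file
    perl summarize_dap2enloc.pl -dir dap_rst_dir -vcf vcfs/$gene.vcf.gz | gzip - > annot/$gene.annot.vcf.gz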
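
The commands above refer to a `$gene` variable without defining it. One way to run DAP-G over several genes is to generate one command per gene and inspect the list before running it. The sketch below only writes the commands to a file and does not call `dap-g`. It assumes the per-gene file layout of the minimum working example in this notebook (`z_files/`, `lds/`, `dap_rst_dir/`), assumes the priors live in `data/` as `<gene>.prior`, and uses a made-up gene list and a hypothetical output name `dap_commands.sh`.

```shell
# Hypothetical gene list; in practice it could be derived from the files in z_files/.
genes="ENSG00000110079 ENSG00000238009"

# Write one dap-g command per gene, reusing only the flags shown in this notebook.
for gene in $genes; do
    echo "dap-g -d_z z_files/$gene.z.txt -d_ld lds/$gene.ld -p data/$gene.prior -ld_control 0.5 --all -t 4 > dap_rst_dir/$gene.dap"
done > dap_commands.sh

cat dap_commands.sh
```

After checking the generated paths, the file can be run with `bash dap_commands.sh`, provided `dap_rst_dir` already exists.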

### Step 3: Colocalization with fastENLOC

In [None]:
[fastenloc]
bash: 
    fastenloc -eqtl annot/$gene.annot.vcf.gz -gwas gwas.pip.gz

## Minimum Working Example: 

In [None]:
[example]
bash: 
    # Step 1: GWAS PIPs from z-scores
    torus -d zval.gz --load_zval -dump_pip gwas.pip
    gzip gwas.pip
    # Step 2, part 1: estimate priors for one gene
    torus -d for_prior/ENSG00000110079.prior.txt.gz --fastqtl -dump_prior dumpENSG00000110079
    # Step 2, part 2: fine-map with DAP-G and build the eQTL annotation
    dap-g -d_z z_files/ENSG00000110079.z.txt -d_ld lds/ENSG00000110079.ld -p dumpENSG00000110079/ENSG00000110079.prior -ld_control 0.5 --all -t 4 > dap_rst_dir/ENSG00000110079.dap
    perl summarize_dap2enloc.pl -dir dap_rst_dir -vcf vcfs/ENSG00000110079.vcf.gz | gzip - > annot/ENSG00000110079.annot.vcf.gz
    # Step 3: colocalization
    fastenloc -eqtl annot/ENSG00000110079.annot.vcf.gz -gwas gwas.pip.gz

#### Summary: 

In [4]:
head gwas.enloc.enrich.out

                Intercept    -9.075           -
Enrichment (no shrinkage)     3.892       0.136
Enrichment (w/ shrinkage)     3.822       0.135


In [6]:
head gwas.enloc.sig.out

Signal	Num_SNP	CPIP_qtl	CPIP_gwas_marginal	CPIP_gwas_qtl_prior	RCP
ENSG00000238009:1      1  9.999e-01 2.290e-05    9.479e-04      9.479e-04
ENSG00000227232:1      3  1.000e+00 6.493e-05    9.496e-04      9.107e-04
ENSG00000238009:2      1  9.999e-01 2.167e-05    8.967e-04      8.967e-04
ENSG00000269981:2      1  9.999e-01 2.085e-05    8.629e-04      8.629e-04
ENSG00000238009:3      1  9.999e-01 2.108e-05    8.724e-04      8.724e-04
ENSG00000269981:1      1  9.999e-01 2.108e-05    8.724e-04      8.724e-04
ENSG00000268903:1      1  9.999e-01 2.108e-05    8.723e-04      8.723e-04
ENSG00000239906:1      2  9.998e-01 4.184e-05    8.909e-04      8.721e-04
ENSG00000279457:1      1  2.633e-01 2.027e-05    2.345e-04      2.209e-04


In [7]:
head gwas.enloc.snp.out

Signal	SNP	PIP_qtl	PIP_gwas_marginal	PIP_gwas_qtl_prior	SCP
ENSG00000238009:1   chr1_63671_G_A_b38   9.999e-01 2.290e-05    9.479e-04      9.479e-04
ENSG00000227232:1   chr1_665098_G_A_b38   3.025e-01 2.223e-05    2.923e-04      2.783e-04
ENSG00000227232:1   chr1_666028_G_A_b38   6.410e-01 2.202e-05    5.913e-04      5.841e-04
ENSG00000238009:2   chr1_108826_G_C_b38   9.999e-01 2.167e-05    8.967e-04      8.967e-04
ENSG00000269981:2   chr1_133160_G_A_b38   9.999e-01 2.085e-05    8.629e-04      8.629e-04
ENSG00000238009:3   chr1_135203_G_A_b38   9.999e-01 2.108e-05    8.724e-04      8.724e-04
ENSG00000269981:1   chr1_135203_G_A_b38   9.999e-01 2.108e-05    8.724e-04      8.724e-04
ENSG00000268903:1   chr1_135203_G_A_b38   9.999e-01 2.108e-05    8.723e-04      8.723e-04
ENSG00000239906:1   chr1_135203_G_A_b38   9.885e-01 2.108e-05    8.626e-04      8.624e-04
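
The SNP-level file can be aggregated back to the signal level. The sketch below counts SNPs and sums the SCP column per signal, using three rows copied from the output above. It assumes that the RCP of a signal is the sum of the SCPs of its member SNPs; the single-SNP signals above are consistent with this, but the notebook does not state it. The `toy.*` file names are hypothetical.

```shell
# Toy SNP-level file: header plus three rows taken from the output above.
cat > toy.enloc.snp.out <<'EOF'
Signal SNP PIP_qtl PIP_gwas_marginal PIP_gwas_qtl_prior SCP
ENSG00000238009:1   chr1_63671_G_A_b38   9.999e-01 2.290e-05    9.479e-04      9.479e-04
ENSG00000227232:1   chr1_665098_G_A_b38   3.025e-01 2.223e-05    2.923e-04      2.783e-04
ENSG00000227232:1   chr1_666028_G_A_b38   6.410e-01 2.202e-05    5.913e-04      5.841e-04
EOF

# Skip the header, then accumulate the SNP count and the summed SCP (column 6) per signal.
# The final sort makes the row order deterministic.
awk 'NR > 1 { n[$1]++; scp[$1] += $6 }
     END    { for (s in n) printf "%s %d %.3e\n", s, n[s], scp[s] }' toy.enloc.snp.out \
    | LC_ALL=C sort > toy.scp_by_signal.txt

cat toy.scp_by_signal.txt
```

Because `head` shows only the first rows, just two of the three member SNPs of ENSG00000227232:1 appear here, so its partial sum (8.624e-04) is below the RCP reported in `gwas.enloc.sig.out` (9.107e-04).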
