# GeneticsMakie.jl

In [1]:
versioninfo()

Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)


# Why Makie.jl?

### *1. Plotting millions of data points is easy*

   <p align="center">
   <img src="./MHC-LD.png" width="400">
   </p>
   
*LD structure for ~66,000 SNPs in the MHC region → ≈2 billion unique data points*
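
As a quick sanity check on the caption, the number of unique SNP pairs can be worked out directly. This is only a sketch that takes the rounded figure of 66,000 SNPs at face value, so the exact count for the real data will differ slightly.

```julia
# Number of unique pairwise LD values among n SNPs, excluding the diagonal.
# n = 66_000 is the rounded figure from the caption, not the exact SNP count.
n = 66_000
npairs = binomial(n, 2)   # equals n * (n - 1) ÷ 2
println(npairs)           # 2177967000, i.e. roughly 2 billion
```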

### *2. Plotting figures with complex layouts is easy*
   <p align="center">
   <img src="./complex-layout.png" width="600">
   </p>
   
*Raw publication-quality figure using Makie.jl's default layout tools without further modifications*

### Looks interesting? Check out Makie.jl [documentation](https://makie.juliaplots.org/stable/)!

# Why GeneticsMakie.jl?

+ The purpose of GeneticsMakie.jl is to facilitate visualization and interpretation of genetic association results
+ This is achieved by visualizing hundreds (or more) of genetic and genomic datasets simultaneously
+ GeneticsMakie.jl is designed to work with other OpenMendel and Julia Data Science packages

### Example phenome-scale LocusZoom plots
   <p align="center">
   <img src="./GRIN2A-locuszoom.png" width="800">
   </p>
   
*GRIN2A is a high-confidence schizophrenia risk gene*

   <p align="center">
   <img src="./MHC-locuszoom.png" width="800">
   </p>

*The MHC region is one of the most pleiotropic regions in the human genome*

### Looks intriguing? Check out GeneticsMakie.jl [documentation](https://minsookim.info/GeneticsMakie.jl/dev/)!

# Example code for ADAMTSL3 locus in inguinal hernia

   <p align="center">
   <img src="./hernia.png" width="600">
   </p>

Visualizing the backbone of a LocusZoom plot requires genetic association data, gene annotation data, and an LD reference panel.

In [2]:
using GeneticsMakie, CairoMakie, CSV, DataFrames, SnpArrays

┌ Info: Precompiling GeneticsMakie [8ca62643-82d8-47b5-a233-a06d1654fb35]
└ @ Base loading.jl:1423
└ @ MathTeXEngine ~/.julia/packages/MathTeXEngine/ZP0gS/src/parser/commands_registration.jl:48
[ Info: Compiling VCF parser...


In [3]:
set_theme!(font = "Arial")
@info "Loading GENCODE v39 annotation for chromosome 15"
@time gencode = CSV.read("./gencode.v39lift37.annotation.chr15.gtf.gz", DataFrame,
    header = ["seqnames", "source", "feature", "start", "end", "score", "strand", "phase", "info"],
    delim = "\t", skipto = 6)
gencode[!, :info]

  6.749336 seconds (3.57 M allocations: 269.635 MiB, 1.25% gc time, 94.20% compilation time)


┌ Info: Loading GENCODE v39 annotation for chromosome 15
└ @ Main In[3]:2


117418-element Vector{String}:
 "gene_id \"ENSG00000215567.5_1\"; " ⋯ 454 bytes ⋯ "\"; remap_status \"full_contig\";"
 "gene_id \"ENSG00000215567.5_1\"; " ⋯ 454 bytes ⋯ "\"; remap_status \"full_contig\";"
 "gene_id \"ENSG00000215567.5_1\"; " ⋯ 454 bytes ⋯ "\"; remap_status \"full_contig\";"
 "gene_id \"ENSG00000201241.1\"; ge" ⋯ 122 bytes ⋯ "stituted_missing_target \"V38\";"
 "gene_id \"ENSG00000201241.1\"; tr" ⋯ 255 bytes ⋯ "stituted_missing_target \"V38\";"
 "gene_id \"ENSG00000201241.1\"; tr" ⋯ 299 bytes ⋯ "stituted_missing_target \"V38\";"
 "gene_id \"ENSG00000258463.1_1\"; " ⋯ 160 bytes ⋯ "remap_target_status \"overlap\";"
 "gene_id \"ENSG00000258463.1_1\"; " ⋯ 404 bytes ⋯ "remap_target_status \"overlap\";"
 "gene_id \"ENSG00000258463.1_1\"; " ⋯ 450 bytes ⋯ "\"; remap_status \"full_contig\";"
 "gene_id \"ENSG00000274347.1_1\"; " ⋯ 158 bytes ⋯ " 1; remap_target_status \"new\";"
 "gene_id \"ENSG00000274347.1_1\"; " ⋯ 404 bytes ⋯ "g\"; remap_target_status \"new\";"
 "gene_id \"ENSG00

In [4]:
GeneticsMakie.parsegtf!(gencode)
names(gencode)

14-element Vector{String}:
 "seqnames"
 "source"
 "feature"
 "start"
 "end"
 "score"
 "strand"
 "phase"
 "info"
 "gene_id"
 "gene_name"
 "gene_type"
 "transcript_id"
 "transcript_support_level"

In [5]:
select!(gencode, :seqnames, :feature, :start, :end, :strand, :gene_id, :gene_name, :gene_type, :transcript_id)
@assert size(gencode) == (117_418, 9)

In [6]:
@info "Loading 1000 Genomes European reference panel for chromosome 15"
kgp = SnpData("./kgp.chr15")
@assert (503, 200_311) == size(kgp)

┌ Info: Loading 1000 Genomes European reference panel for chromosome 15
└ @ Main In[6]:1


In [7]:
@info "Loading GWAS results for chromosome 15"
gwas = CSV.read("./hernia.chr15.gz", DataFrame, comment = "##", missingstring = "NA")

┌ Info: Loading GWAS results for chromosome 15
└ @ Main In[7]:1


| Row | CHR | POS | SNPID | Allele1 | Allele2 | AFAllele2 | BETA | p |
|---|---|---|---|---|---|---|---|---|
| | Int64 | Int64 | String | String | String | Float64 | Float64 | Float64 |
| 1 | 15 | 20001226 | rs28896870 | C | T | 0.113114 | 0.0110027 | 0.487073 |
| 2 | 15 | 20001774 | rs28812614 | T | C | 0.257554 | 0.0216504 | 0.0621705 |
| 3 | 15 | 20004721 | rs145629091 | A | G | 0.137294 | 0.0259993 | 0.0793635 |
| 4 | 15 | 20014120 | rs12594432 | G | A | 0.138131 | 0.0267138 | 0.0689262 |
| 5 | 15 | 20017513 | rs12900040 | T | C | 0.118101 | 0.0100042 | 0.522537 |
| 6 | 15 | 20021591 | rs11535026 | T | A | 0.116481 | 0.00981448 | 0.531855 |
| 7 | 15 | 20021749 | rs12595413 | C | T | 0.137905 | 0.0264183 | 0.0724226 |
| 8 | 15 | 20026191 | rs533345786 | A | AAAG | 0.111671 | 0.0109291 | 0.504701 |
| 9 | 15 | 20026200 | rs543944619 | A | C | 0.112151 | 0.010696 | 0.511987 |
| 10 | 15 | 20026202 | rs565344963 | G | A | 0.112151 | 0.010696 | 0.511987 |


In [8]:
GeneticsMakie.mungesumstats!(gwas)

In [9]:
loci = GeneticsMakie.findgwasloci(gwas)

| Row | CHR | BP | P |
|---|---|---|---|
| | String | Int64 | Float64 |
| 1 | 15 | 84419314 | 4.24936e-08 |


In [10]:
GeneticsMakie.findgwasloci(gwas; p = 1e-6)

| Row | CHR | BP | P |
|---|---|---|---|
| | String | Int64 | Float64 |
| 1 | 15 | 84419314 | 4.24936e-08 |
| 2 | 15 | 67467297 | 2.48762e-07 |


In [11]:
GeneticsMakie.findclosestgene(loci, gencode)

| Row | CHR | BP | gene | distance |
|---|---|---|---|---|
| | String | Int64 | String | Int64 |
| 1 | 15 | 84419314 | TUBAP4 | 10363 |


In [12]:
GeneticsMakie.findclosestgene(loci, gencode; start = true)

| Row | CHR | BP | gene | distance |
|---|---|---|---|---|
| | String | Int64 | String | Int64 |
| 1 | 15 | 84419314 | TUBAP4 | 10778 |


In [13]:
GeneticsMakie.findclosestgene(loci, gencode; proteincoding = true)

| Row | CHR | BP | gene | distance |
|---|---|---|---|---|
| | String | Int64 | String | Int64 |
| 1 | 15 | 84419314 | ADAMTSL3 | 96474 |


In [14]:
loci = GeneticsMakie.findclosestgene(loci, gencode; start = true, proteincoding = true)

| Row | CHR | BP | gene | distance |
|---|---|---|---|---|
| | String | Int64 | String | Int64 |
| 1 | 15 | 84419314 | ADAMTSL3 | 96474 |


In [15]:
function locuszoom(genes)
    for gene in genes
        @info "Working on $gene gene"
        window = 1e6
        chr, start, stop = GeneticsMakie.findgene(gene, gencode)
        range1, range2 = start - window, stop + window
        @info "Subsetting GWAS results"
        gwas_subset = gwas[findall((gwas.CHR .== chr) .& (gwas.BP .>= range1) .& (gwas.BP .<= range2)), :]
        @info "Plotting LocusZoom"
        f = Figure(resolution = (306, 1500))
        axs = [Axis(f[i, 1]) for i in 1:3]
        GeneticsMakie.plotlocus!(axs[1], chr, range1, range2, gwas_subset)
        GeneticsMakie.plotlocus!(axs[2], chr, range1, range2, gwas_subset; ld = kgp)
        for i in 1:2
            Label(f[i, 1, Top()], "Inguinal hernia (2022)", textsize = 6, halign = :left, padding = (7.5, 0, -5, 0))
            rowsize!(f.layout, i, 30)
        end
        rs = GeneticsMakie.plotgenes!(axs[3], chr, range1, range2, gencode; height = 0.1)
        rowsize!(f.layout, 3, rs)
        GeneticsMakie.labelgenome(f[3, 1, Bottom()], chr, range1, range2)
        Colorbar(f[1:2, 2], limits = (0, 1), ticks = 0:1:1, height = 20,
            colormap = (:gray60, :red2), label = "LD", ticksize = 0, tickwidth = 0,
            tickalign = 0, ticklabelsize = 6, flip_vertical_label = true,
            labelsize = 6, width = 5, spinewidth = 0.5)
        Label(f[1:2, 0], text = "-log[p]", textsize = 6, rotation = pi / 2)
        for i in 1:3
            vlines!(axs[i], start, color = (:gold, 0.5), linewidth = 0.5)
            vlines!(axs[i], stop, color = (:gold, 0.5), linewidth = 0.5)
        end
        for i in 1:2
            lines!(axs[i], [range1, range2], fill(-log(10, 5e-8), 2), color = (:purple, 0.5), linewidth = 0.5)
        end
        rowgap!(f.layout, 5)
        colgap!(f.layout, 5)
        resize_to_layout!(f)
        save("./$(gene)-locuszoom.png", f, px_per_unit = 4)
    end
end

@time locuszoom(loci.gene)

 73.789035 seconds (245.55 M allocations: 12.107 GiB, 3.41% gc time, 84.40% compilation time)


┌ Info: Working on ADAMTSL3 gene
└ @ Main In[15]:3
┌ Info: Subsetting GWAS results
└ @ Main In[15]:7
┌ Info: Plotting LocusZoom
└ @ Main In[15]:9


In [16]:
@time locuszoom(loci.gene)

 10.403400 seconds (57.06 M allocations: 2.102 GiB, 3.55% gc time)


┌ Info: Working on ADAMTSL3 gene
└ @ Main In[15]:3
┌ Info: Subsetting GWAS results
└ @ Main In[15]:7
┌ Info: Plotting LocusZoom
└ @ Main In[15]:9


   <p align="center">
   <img src="./ADAMTSL3-locuszoom.png" width="600">
   </p>
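
The purple horizontal line drawn by `locuszoom` sits at p = 5e-8, the conventional genome-wide significance threshold, expressed on the -log10 scale. A minimal sketch of that conversion, using only values that appear in this notebook (the threshold from the `locuszoom` function and the lead p-value reported by `findgwasloci`); `neglog10` is a hypothetical helper name, not part of GeneticsMakie.jl:

```julia
# Hypothetical helper: convert a p-value to the -log10 scale used on the y-axis.
neglog10(p) = -log10(p)

threshold = neglog10(5e-8)        # same value as -log(10, 5e-8) in locuszoom
lead      = neglog10(4.24936e-8)  # lead variant at 15:84419314 from findgwasloci

println(round(threshold, digits = 2))  # 7.3
println(lead > threshold)              # true
```

The lead variant therefore plots just above the threshold line, consistent with it being the only locus returned at the default cutoff.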

# An example workflow for phenome-scale LocusZoom
Hypothetical scenario: you have run a GWAS and would like to automatically visualize genome-wide significant loci alongside other GWAS results and functional genomic annotations.

1. Munge GWAS summary statistics (using the `mungesumstats!` function)
2. Save each GWAS result as an Arrow or Parquet file (using the Arrow.jl or Parquet.jl package)
3. Find GWAS loci for your phenotypes of interest (using the `findgwasloci` function)
4. Iterate through GWAS loci, subsetting genomic regions from Arrow or Parquet files
5. Add other functional genomic data as separate layers as needed

Check out the example code at https://github.com/mmkim1210/GeneticsMakieExamples.
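
Steps 2 and 4 of the workflow can be sketched as follows. This is a minimal illustration rather than the code from the linked repository: it assumes Arrow.jl and DataFrames.jl are installed, uses a made-up four-row table in place of real munged summary statistics, and `subsetregion` is a hypothetical helper name.

```julia
using Arrow, DataFrames

# Toy stand-in for munged summary statistics. The CHR/BP/P column names follow
# the output shown earlier; the values are invented for illustration only.
gwas = DataFrame(
    CHR = ["15", "15", "15", "15"],
    BP  = [20_001_226, 84_419_314, 84_500_000, 90_000_000],
    P   = [0.49, 4.2e-8, 1.0e-3, 0.2],
)

# Step 2: save the munged results as an Arrow file.
path = joinpath(mktempdir(), "hernia.arrow")
Arrow.write(path, gwas)

# Step 4: read the file back and keep only variants inside a genomic window.
# `subsetregion` is a hypothetical helper, not a GeneticsMakie.jl function.
function subsetregion(path, chr, range1, range2)
    df = DataFrame(Arrow.Table(path))
    return df[(df.CHR .== chr) .& (df.BP .>= range1) .& (df.BP .<= range2), :]
end

# 1 Mb on either side of the lead variant position.
window = 1_000_000
region = subsetregion(path, "15", 84_419_314 - window, 84_419_314 + window)
```

Keeping one file per phenotype means each locus only needs the relevant rows pulled into a `DataFrame`, which can then be passed to `GeneticsMakie.plotlocus!` in the same way as `gwas_subset` above.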

# Other functionalities
 
   <p align="center">
   <img src="./manhattan.png" width="800">
   </p>