# Manhattan Plot

Manhattan plots have become an essential tool for visualizing GWAS summary statistics, so a highly customizable way to draw them is valuable. Even though many resources for drawing Manhattan plots are already available, it is still a useful functionality to have in our toolkit.

In this notebook we show how the `IDEAL-GENOM` library can be used to generate a Manhattan plot.

In [None]:
import sys
import os

import pandas as pd

# add parent directory to path
library_path = os.path.abspath('..')
if library_path not in sys.path:
    sys.path.append(library_path)

from ideal_genom.visualization.manhattan_type import manhattan_draw
from ideal_genom.get_examples import get_trumpet_quantitative_example, get_top_loci_trumpet_quantitative, get_top_cond_trumpet_quantitative

The first example uses summary statistics for a quantitative trait, height, the same data used in the library's trumpet plot examples. We are going to use the summary statistics provided in the study:

Akiyama, M., et al. (2019): Characterizing rare and low-frequency height-associated variants in the Japanese population. *Nature Communications*, 10(1), 4393.

In [None]:
example_path = get_trumpet_quantitative_example()

In [None]:
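# read the summary statistics in chunks to keep memory usage low
chunk_size = 2000

filtered_chunks = []

for chunk in pd.read_csv(example_path, sep=r'\s+', engine='python', chunksize=chunk_size):

    # keep only the columns needed for the plot
    filtered_chunk = chunk[['CHR', 'POS', 'P_BOLT', 'Variants']].copy()
    # keep only nominally significant variants (p < 0.05) to reduce the number of points
    filtered_chunk = filtered_chunk[filtered_chunk['P_BOLT'] < 0.05].reset_index(drop=True)
    filtered_chunks.append(filtered_chunk)

df_gwas = pd.concat(filtered_chunks, ignore_index=True)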
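Dropping variants with p ≥ 0.05 costs nothing visually: they would sit at -log10(p) ≤ 1.30, the bottom strip of the plot, far below the genome-wide line. The chunked read-and-filter pattern can be tried on a small in-memory file. In the sketch below, the toy data and the helper `load_filtered` are hypothetical, not part of `IDEAL-GENOM`; only the column names follow the example file.

```python
import io

import pandas as pd

# Hypothetical toy summary statistics, whitespace-delimited like the example file.
toy_txt = """CHR POS Variants P_BOLT BETA
1 1000 rs1 0.20 0.01
1 2000 rs2 0.01 0.05
2 1500 rs3 3e-9 0.12
2 3000 rs4 0.75 -0.02
"""

def load_filtered(source, p_col="P_BOLT", threshold=0.05, chunk_size=2):
    """Read `source` in chunks, keeping the plotting columns and rows with p < threshold."""
    keep = ["CHR", "POS", p_col, "Variants"]
    parts = []
    for chunk in pd.read_csv(source, sep=r"\s+", chunksize=chunk_size):
        # select filtered rows and the needed columns in one step
        parts.append(chunk.loc[chunk[p_col] < threshold, keep])
    return pd.concat(parts, ignore_index=True)

df_small = load_filtered(io.StringIO(toy_txt))
print(df_small["Variants"].tolist())  # ['rs2', 'rs3']
```

Passing `usecols=[...]` to `pd.read_csv` would additionally skip parsing the unused columns, which can matter for files with many extra fields.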

Only the columns needed for the plot are loaded from the **GWAS** summary statistics. The essential columns are those containing the chromosome, base-pair position, variant identifier (rsID) and p-value. Column names must coincide between the GWAS file and the files of variants to highlight, at least for the four columns used to generate the Manhattan plot.
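Because mismatched names only surface later as a `KeyError` or an empty merge, it can help to check them up front. The helper `missing_columns` below is a hypothetical sketch, not an `IDEAL-GENOM` function, and the alternative names `BP`/`SNP` are only an illustration of a file that needs renaming.

```python
import pandas as pd

# Columns read by the manhattan_draw calls in this notebook (assumption of this sketch).
REQUIRED = ("CHR", "POS", "Variants", "P_BOLT")

def missing_columns(df, required=REQUIRED):
    """Return the required column names that are absent from `df`."""
    return [col for col in required if col not in df.columns]

gwas = pd.DataFrame({"CHR": [1], "POS": [1000], "Variants": ["rs1"], "P_BOLT": [0.01]})
hits = pd.DataFrame({"CHR": [1], "BP": [1000], "SNP": ["rs1"]})

print(missing_columns(gwas))                                       # []
print(missing_columns(hits, required=("CHR", "POS", "Variants")))  # ['POS', 'Variants']

# Renaming the hit file's columns to the GWAS names fixes the mismatch.
fixed = hits.rename(columns={"BP": "POS", "SNP": "Variants"})
print(missing_columns(fixed, required=("CHR", "POS", "Variants")))  # []
```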

In [None]:
print("Number of SNPs in GWAS data: ", df_gwas.shape[0])
print("Columns in GWAS data: ", df_gwas.columns.to_list())

In [None]:
manhattan_draw(
    data_df=df_gwas,
    snp_col='Variants',
    chr_col='CHR',
    pos_col='POS',
    p_col='P_BOLT',
    plot_dir=example_path.parent.as_posix(),
    to_highlight=None,
    save_name='manhattan_plot_simple.pdf',
    genome_line=5e-8,
    yaxis_margin=10
)
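Conceptually, a Manhattan plot places each variant at a cumulative genomic position on the x-axis (chromosomes laid end to end) and at -log10(p) on the y-axis, so the `genome_line=5e-8` threshold corresponds to a height of about 7.30. The sketch below shows that usual construction; it is not taken from `manhattan_draw`'s internals. The function name `manhattan_coords` is hypothetical, it assumes numeric chromosome codes, and it approximates each chromosome's length by its largest observed position.

```python
import numpy as np
import pandas as pd

def manhattan_coords(df, chr_col="CHR", pos_col="POS", p_col="P_BOLT"):
    """Return a sorted copy of `df` with cumulative x position and y = -log10(p)."""
    out = df.sort_values([chr_col, pos_col]).reset_index(drop=True)
    # Offset of each chromosome = summed (approximate) lengths of the preceding ones.
    chr_len = out.groupby(chr_col)[pos_col].max()
    offsets = chr_len.cumsum().shift(fill_value=0)
    out["x"] = out[pos_col] + out[chr_col].map(offsets)
    out["y"] = -np.log10(out[p_col])
    return out

toy = pd.DataFrame({
    "CHR": [1, 1, 2, 2],
    "POS": [100, 500, 50, 300],
    "P_BOLT": [1e-2, 1e-9, 1e-3, 4e-8],
})
coords = manhattan_coords(toy)
genome_y = -np.log10(5e-8)  # ≈ 7.30, height of the genome-wide significance line
print(coords["x"].tolist())                              # [100, 500, 550, 800]
print(coords.loc[coords["y"] > genome_y, "x"].tolist())  # [500, 800]
```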

In [None]:
loci_path = get_top_loci_trumpet_quantitative()
cond_path = get_top_cond_trumpet_quantitative()

In [None]:
loci_hits = pd.read_csv(loci_path, sep=r'\s+', engine='python')
cond_hits = pd.read_csv(cond_path, sep=r'\s+', engine='python')

In [None]:
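# match top loci by variant ID and cond hits by chromosome and position;
# only the key columns of the hit files are merged, so no suffixed duplicates appear
loci_hits = pd.merge(df_gwas, loci_hits[['Variants']].drop_duplicates(), on='Variants', how='inner')
cond_hits = pd.merge(df_gwas, cond_hits[['CHR', 'POS']].drop_duplicates(), on=['CHR', 'POS'], how='inner')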
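An inner merge keeps only the GWAS rows that also appear in the hit file. If a hit file shares non-key column names with `df_gwas` (for instance its own `P_BOLT`), pandas renames both copies with `_x`/`_y` suffixes, and a later selection such as `[['CHR', 'POS', 'P_BOLT', 'Variants']]` fails. Merging against the key columns alone avoids this. The toy tables below are hypothetical.

```python
import pandas as pd

gwas = pd.DataFrame({
    "CHR": [1, 1, 2],
    "POS": [100, 500, 300],
    "P_BOLT": [1e-2, 1e-9, 4e-8],
    "Variants": ["rs1", "rs2", "rs3"],
})
# Hypothetical hit lists: one keyed by variant ID, one by coordinates,
# each also carrying a column whose name clashes with the GWAS table.
loci = pd.DataFrame({"Variants": ["rs2"], "P_BOLT": [2e-9]})
cond = pd.DataFrame({"CHR": [2], "POS": [300], "Variants": ["2:300"]})

naive = pd.merge(gwas, loci, on="Variants", how="inner")
print("P_BOLT" in naive.columns, "P_BOLT_x" in naive.columns)  # False True

# Merging on the key columns only keeps the GWAS column names intact.
loci_matched = pd.merge(gwas, loci[["Variants"]], on="Variants", how="inner")
cond_matched = pd.merge(gwas, cond[["CHR", "POS"]], on=["CHR", "POS"], how="inner")
print(loci_matched["Variants"].tolist(), cond_matched["Variants"].tolist())  # ['rs2'] ['rs3']
```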

In [None]:
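# keep the plotting columns and label each group in a 'hue' column
loci_hits = loci_hits[['CHR', 'POS', 'P_BOLT', 'Variants']].copy()
loci_hits['hue'] = 'loci'
cond_hits = cond_hits[['CHR', 'POS', 'P_BOLT', 'Variants']].copy()
cond_hits['hue'] = 'cond'

# stack both groups into a single table of variants to highlight
highlight = pd.concat([loci_hits, cond_hits], ignore_index=True, axis=0)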
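A variant can be both a top locus and a cond hit, in which case it appears twice in the stacked table, once per `hue` label. How `manhattan_draw` resolves such overlaps is not described here, so a quick check before plotting may be useful. The sketch below uses hypothetical toy frames and `DataFrame.assign` as an equivalent way of adding the `hue` column.

```python
import pandas as pd

cols = ["CHR", "POS", "P_BOLT", "Variants"]
loci_toy = pd.DataFrame([[1, 500, 1e-9, "rs2"], [2, 300, 4e-8, "rs3"]], columns=cols)
cond_toy = pd.DataFrame([[2, 300, 4e-8, "rs3"]], columns=cols)

# Tag each group with its label, then stack them into one table.
highlight_toy = pd.concat(
    [loci_toy.assign(hue="loci"), cond_toy.assign(hue="cond")],
    ignore_index=True,
)
print(highlight_toy["hue"].value_counts().to_dict())  # {'loci': 2, 'cond': 1}

# Variants listed in more than one group (one row per group).
shared = highlight_toy.loc[highlight_toy.duplicated("Variants", keep=False), "Variants"].unique()
print(list(shared))  # ['rs3']
```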

In [None]:
manhattan_draw(
    data_df=df_gwas,
    snp_col='Variants',
    chr_col='CHR',
    pos_col='POS',
    p_col='P_BOLT',
    plot_dir=example_path.parent.as_posix(),
    save_name='manhattan_plot_high.pdf',
    genome_line=5e-8,
    yaxis_margin=10,
    to_highlight=highlight
)