# GWAS
 - **Project:** GP2 Parkinson's disease meta-GWAS on European, Ashkenazi Jewish, Finnish, and Icelandic individuals 
 - **Versions:** Python/3.10
 - **Last Updated:** March 2025

### Notebook Overview
Use PoPS (polygenic priority score) to prioritize genes at GWAS loci.

### CHANGELOG
* 06-MAR-2025: Notebook cleaned 
* 31-OCT-2024: Notebook started

# Set it up.

In [None]:
import pandas as pd
import numpy as np
from scipy.stats import norm
import os
from google.colab import drive
from scipy.stats import zscore

! ls -lsth

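Uncompress the uploaded `.tar.gz` archives. These are large files, so extraction takes about 3 hours. Each archive is extracted in its own `tar` call, because `tar -xzvf *.tar.gz` with several archives would treat every name after the first as a member to extract.

In [None]:
%%bash
for f in *.tar.gz; do tar -xzvf "$f"; done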

# Run PoPS

In [None]:
%%bash
python ./pops-master/pops.py \
--gene_annot_path ./pops_features_pathway_naive_FUMA_compatible/gene_annot.txt \
--feature_mat_prefix ./pops_features_pathway_naive_FUMA_compatible/munged_features/pops_features \
--num_feature_chunks 99 --magma_prefix magma \
--control_features ./pops_features_pathway_naive_FUMA_compatible/control.features --out_prefix pops_run
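
Before launching the run above, it may be worth confirming that the input files named in the command exist, since a missing path only surfaces once PoPS has started. The following is a minimal sketch, not part of the original notebook: `missing_inputs` is a hypothetical helper, and `magma.genes.out` is assumed from `--magma_prefix magma`.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of paths that do not exist on disk (hypothetical helper)."""
    return [Path(p) for p in paths if not Path(p).exists()]


# Paths taken from the PoPS command; the MAGMA file name is an assumption.
feature_dir = Path("./pops_features_pathway_naive_FUMA_compatible")
required = [
    Path("./pops-master/pops.py"),
    feature_dir / "gene_annot.txt",
    feature_dir / "control.features",
    Path("magma.genes.out"),
]

missing = missing_inputs(required)
if missing:
    print("Missing inputs:", ", ".join(str(p) for p in missing))
else:
    print("All listed inputs found.")
```

The check only covers files named explicitly in the command; the chunked feature matrices under `munged_features/` are not verified here.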




# Now merge with MAGMA outputs.

In [None]:
pops_df = pd.read_csv("pops_run.preds", sep="\t")
pops_positives_df = pops_df[pops_df['PoPS_Score'] > 0]
pops_positives_df.describe()

| Statistic | PoPS_Score | Y | Y_proj |
|---|---|---|---|
| count | 9218.0 | 9123.0 | 9123.0 |
| mean | 0.169208 | 1.252397 | 0.538638 |
| std | 0.138365 | 1.512785 | 1.510464 |
| min | 5.8e-05 | -3.4669 | -4.346082 |
| 25% | 0.067506 | 0.25892 | -0.444119 |
| 50% | 0.141847 | 1.0719 | 0.367599 |
| 75% | 0.237745 | 1.99455 | 1.281452 |
| max | 2.680602 | 14.933 | 14.37441 |


In [None]:
magma_df = pd.read_csv("magma.genes.out", sep="\t")
# Bonferroni cutoff: 0.05 across 18,650 tests (presumably the number of genes tested)
magma_significant_df = magma_df[magma_df['P'] <= (0.05/18650)]
magma_significant_df.describe()

| Statistic | CHR | START | STOP | NSNPS | NPARAM | N | ZSTAT | P |
|---|---|---|---|---|---|---|---|---|
| count | 380.0 | 380.0 | 380.0 | 380.0 | 380.0 | 380.0 | 380.0 | 380.0 |
| mean | 10.494737 | 61183450.0 | 61265500.0 | 204.410526 | 14.952632 | 1835938.0 | 5.803971 | 3.611899e-07 |
| std | 6.340178 | 52040970.0 | 52065850.0 | 504.450865 | 22.985894 | 0.0 | 1.120736 | 6.346264e-07 |
| min | 1.0 | 699537.0 | 764428.0 | 1.0 | 1.0 | 1835938.0 | 4.554 | 9.999999999999999e-51 |
| 25% | 4.0 | 28439190.0 | 28456410.0 | 20.0 | 5.0 | 1835938.0 | 4.908875 | 8.675375e-11 |
| 50% | 11.0 | 42461100.0 | 42523840.0 | 64.5 | 9.0 | 1835938.0 | 5.5301 | 1.6006e-08 |
| 75% | 17.0 | 95817600.0 | 95835880.0 | 188.25 | 17.0 | 1835938.0 | 6.3832 | 4.581075e-07 |
| max | 22.0 | 243651500.0 | 244014400.0 | 5829.0 | 224.0 | 1835938.0 | 14.933 | 2.6322e-06 |
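
The `0.05/18650` cutoff used for `magma_significant_df` has the form of a Bonferroni correction; the notebook does not say where 18650 comes from, so treating it as the number of genes tested is an assumption. A minimal sketch on invented data of deriving the threshold from the table itself rather than hard-coding the count (`bonferroni_threshold` is a hypothetical helper):

```python
import pandas as pd


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Family-wise alpha divided by the number of tests (hypothetical helper)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Toy stand-in for magma.genes.out; gene IDs and p-values are invented.
toy_magma_df = pd.DataFrame({
    "GENE": ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"],
    "P": [1e-9, 0.02, 0.01, 0.5],
})

# One test per gene row, so the threshold here is 0.05 / 4.
threshold = bonferroni_threshold(len(toy_magma_df))
toy_significant_df = toy_magma_df[toy_magma_df["P"] <= threshold]
print(toy_significant_df["GENE"].tolist())
```

Deriving the count from `len(magma_df)` keeps the threshold in step with the input if the gene set changes, at the cost of no longer matching a fixed, pre-specified number.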


In [None]:
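# Inner join: keep genes that have a positive PoPS score and pass the MAGMA threshold.
table_proto_df = pops_positives_df.merge(magma_significant_df, left_on='ENSGID', right_on='GENE')
print(table_proto_df.describe())
# Keep the top 10% of merged genes by PoPS score for the display table.
percentile_90 = table_proto_df['PoPS_Score'].quantile(0.90)
table_reduced_df = table_proto_df[table_proto_df['PoPS_Score'] >= percentile_90]
print(table_reduced_df.describe())
table_reduced_df.to_csv("table_reduced_df.csv")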

The cell above builds the short table for display.
A larger table of all positive PoPS scores and all MAGMA results is built below for the supplement.
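
The top-decile filter behind the display table can be illustrated on toy data. This is a sketch with invented gene IDs and scores, not the notebook's real inputs; it shows that `quantile(0.90)` interpolates between values by default, so the cutoff need not equal any observed score.

```python
import pandas as pd

# Ten invented genes with PoPS scores 0.1, 0.2, ..., 1.0.
toy_ids = [f"ENSG{i:03d}" for i in range(10)]
toy_pops_df = pd.DataFrame({
    "ENSGID": toy_ids,
    "PoPS_Score": [0.1 * (i + 1) for i in range(10)],
})
toy_magma_df = pd.DataFrame({"GENE": toy_ids, "P": [1e-8] * 10})

# Same join keys as the notebook; the default merge is an inner join.
merged_df = toy_pops_df.merge(toy_magma_df, left_on="ENSGID", right_on="GENE")

# 90th percentile with linear interpolation, then keep rows at or above it.
cutoff = merged_df["PoPS_Score"].quantile(0.90)
top_df = merged_df[merged_df["PoPS_Score"] >= cutoff]
print(round(cutoff, 2), top_df["ENSGID"].tolist())
```

With ten rows the cutoff falls between the two highest scores, so only the single top gene survives; on small merged tables the filter can therefore keep fewer rows than a literal "top 10%" suggests.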

In [None]:
sup_table_df = pops_positives_df.merge(magma_df, left_on='ENSGID', right_on='GENE')
sup_table_df.to_csv("sup_table_df.csv")
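
Both merges in this notebook use the pandas default inner join, so genes present in only one of the two tables are dropped without notice. A minimal sketch on invented data of how an outer join with `indicator=True` could be used to count such genes before writing the tables; the frame names here are hypothetical.

```python
import pandas as pd

# Invented IDs: ENSG1 is only in the PoPS table, ENSG4 only in the MAGMA table.
toy_pops_df = pd.DataFrame({
    "ENSGID": ["ENSG1", "ENSG2", "ENSG3"],
    "PoPS_Score": [0.4, 0.2, 0.1],
})
toy_magma_df = pd.DataFrame({
    "GENE": ["ENSG2", "ENSG3", "ENSG4"],
    "P": [1e-9, 1e-7, 1e-6],
})

# The outer join keeps unmatched rows; `_merge` records where each row came from.
# validate="one_to_one" raises if a gene ID is duplicated in either table.
audit_df = toy_pops_df.merge(
    toy_magma_df,
    left_on="ENSGID",
    right_on="GENE",
    how="outer",
    indicator=True,
    validate="one_to_one",
)
match_counts = audit_df["_merge"].value_counts().to_dict()
print(match_counts)
```

Rows flagged `left_only` or `right_only` are the ones the inner join would have discarded.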