# Analysis of the Mapping Data

Input

-> neurommsig_pd_full_genes.csv

- Contains a list of biological mechanisms (NeuroMMSig) together with the PD-associated genes in each mechanism.

-> RiskSNPs.rda

- RData object containing a mapping from the above-mentioned genes to the SNPs in the respective mechanisms.

# Goals / Statistically relevant questions
Map mechanisms to genes to SNPs

 - How many mechanisms can actually be mapped to genes that yield PRS values from the PheWAS catalog? How many 
     mechanisms are useful then?
 - For how many genes can we calculate a PRS from the mapped SNPs?
 - Side note: What's the difference between comp and compDF?

In [102]:
#imports

In [1]:
import pyreadr
import math
import pandas as pd

In [19]:
# First calculations, ignoring RiskSNPs.rda: only the NeuroMMSig mapping and the 3700 SNPs from Ashar, without the actual mapping.
# In this case the gene-to-SNP mapping is taken from PheWAS.
catalog=pd.read_csv("phewascatalog_full.csv",delimiter = ',')
catalog

Unnamed: 0,chromosome,snp,phewas phenotype,cases,p-value,odds-ratio,gene_name,phewas code,gwas-associations
0,19 45395619,rs2075650,Alzheimer's disease,737,5.237000e-28,2.4100,TOMM40,290.11,"Alzheimer's disease, Alzheimer's disease bioma..."
1,19 45395619,rs2075650,Dementias,1170,2.409000e-26,2.1140,TOMM40,290.10,"Alzheimer's disease, Alzheimer's disease bioma..."
2,6 396321,rs12203592,Actinic keratosis,2505,4.141000e-26,1.6910,IRF4,702.10,"Eye color, Hair color, Freckling, Progressive ..."
3,6 26093141,rs1800562,Iron metabolism disorder,40,3.409000e-25,12.2700,HFE,275.10,"Mean corpuscular hemoglobin, Glycated hemoglob..."
4,19 45395619,rs2075650,Delirium dementia and amnestic disorders,1566,8.027000e-24,1.8410,TOMM40,290.00,"Alzheimer's disease, Alzheimer's disease bioma..."
...,...,...,...,...,...,...,...,...,...
215102,17 7417663,rs6761,"Infertility, male",38,5.000000e-02,0.5945,POLR2A,609.10,Sex hormone binding globulin
215103,3 156626091,rs12638253,Fibroadenosis of breast,26,5.000000e-02,0.5560,LEKR1,610.20,Multiple sclerosis
215104,18,rs7504990,"Inflammatory disease of cervix, vagina, and vulva",660,5.000000e-02,0.8708,DCC,614.50,Gallbladder cancer
215105,19,rs732505,Chronic liver disease and cirrhosis,377,5.000000e-02,1.2920,SAFB2,571.00,Willebrand factor levels


In [3]:
def prs(genes,catalog,filtering,log=False,p_threshold=5.000000e-02):
    
    """ Takes a list of genes and the PheWAS catalog (DataFrame) as input. Keeps the catalog rows whose gene_name matches
        one of the genes and whose `filtering` column (e.g. "phewas phenotype" or "gwas-associations") contains
        "Parkinson's disease".
        Rows with a p-value above p_threshold (default=0.05) are dismissed.
        Returns the naive polygenic risk score (prs), i.e. the sum of the odds ratios (OR) or, if log=True,
        the sum of the log10(OR), together with the list of SNPs used.
    """
    
    assert isinstance(genes, list)
    
    # Search pattern for the input genes of interest.
    # NOTE: str.contains does regex/substring matching, so e.g. 'INS' also matches 'INSR'.
    pat = '|'.join(str(x) for x in genes)
    catalog_filtered = catalog[catalog["gene_name"].str.contains(pat,na=False)] #filtered catalog containing genes of interest only
    #print ("Catalog containing the input genes only",catalog_filtered)
    #print ()
    catalog_filtered_pd=catalog_filtered[catalog_filtered[filtering].str.contains("Parkinson's disease",na=False)] #keep PD rows only
    catalog_filtered_pd_pvalue_series=catalog_filtered_pd["p-value"]<=p_threshold
    catalog_filtered_pd_pvalue=catalog_filtered_pd[catalog_filtered_pd_pvalue_series]
    
    snps_used=catalog_filtered_pd_pvalue['snp'].tolist()
    
    print (catalog_filtered_pd_pvalue)
    
    odds_ratios=catalog_filtered_pd_pvalue["odds-ratio"]
    if log:
        # sum of the per-SNP log odds ratios (not the log of the summed odds ratios)
        return sum(math.log10(x) for x in odds_ratios),snps_used
    else:
        return odds_ratios.sum(),snps_used
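
Because `str.contains` matches substrings, a short gene symbol can pull in rows of a longer one (for example `INS` and `INSR`, which both occur in the mechanism table). The following is a minimal sketch of exact matching with `Series.isin`, under stated assumptions: the toy catalog values and the helper name `exact_gene_prs` are hypothetical, and the natural logarithm is used here, which differs from log10 only by a constant factor.

```python
import math
import pandas as pd

# Toy catalog with hypothetical values; column names follow the PheWAS export.
toy_catalog = pd.DataFrame({
    "snp": ["rs1", "rs2", "rs3"],
    "gene_name": ["INS", "INSR", "TOMM40"],
    "phewas phenotype": ["Parkinson's disease"] * 3,
    "p-value": [0.01, 0.02, 0.20],
    "odds-ratio": [1.2, 0.8, 1.5],
})

def exact_gene_prs(genes, catalog, filtering, p_threshold=0.05):
    """Hypothetical variant: exact gene match, sum of natural-log odds ratios."""
    rows = catalog[catalog["gene_name"].isin(genes)]
    rows = rows[rows[filtering].str.contains("Parkinson's disease", na=False, regex=False)]
    rows = rows[rows["p-value"] <= p_threshold]
    log_or_sum = sum(math.log(x) for x in rows["odds-ratio"])
    return log_or_sum, rows["snp"].tolist()

# Substring matching returns both INS and INSR for the pattern "INS".
substring_hits = toy_catalog[
    toy_catalog["gene_name"].str.contains("INS", na=False)
]["gene_name"].tolist()

# Exact matching keeps only the INS row.
score, snps = exact_gene_prs(["INS"], toy_catalog, "phewas phenotype")
```

Switching the notebook to exact matching would probably change the counts reported further down, so the recorded outputs would need to be regenerated.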

In [3]:
# Mapping
rda_dic = pyreadr.read_r('LordickData/RiskSNPs.rda')
comb_df=rda_dic["comb"]
comb_df_PD=rda_dic["combPD"]

In [4]:
comb_dic=comb_df.to_dict("list")
comb_df

Unnamed: 0,rsID,GENCODE_name
0,rs10000511,DCHS2
1,rs10005890,RP11-389E17.1
2,rs10006397,SLC2A9
3,rs1001168,TNRC6C
4,rs10013533,DCHS2
...,...,...
7864,rs10093964,CHRNA2
7865,rs7007145,CHRNA2
7866,rs752994,CHRNA2
7867,rs919494,CHRNA2
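
`comb_df.to_dict("list")` yields one list per column rather than a gene-to-SNP lookup. A grouped lookup could be sketched as follows; the rows are copied from the preview above (not the full table), and `gene_to_snps` is a hypothetical name.

```python
import pandas as pd

# A few rows copied from the comb_df preview.
comb_preview = pd.DataFrame({
    "rsID": ["rs10000511", "rs10013533", "rs10006397", "rs10093964", "rs7007145"],
    "GENCODE_name": ["DCHS2", "DCHS2", "SLC2A9", "CHRNA2", "CHRNA2"],
})

# Collect the rsIDs of each gene symbol into a list: {gene: [rsID, ...]}.
gene_to_snps = comb_preview.groupby("GENCODE_name")["rsID"].apply(list).to_dict()
```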


In [4]:
# first approach: Using the "neurommsig_pd_full_genes.csv" MechanismToGenes mapping to map the genes to SNPs in PheWAS
# PD associated MechanismToGenes dictionary 
genes=pd.read_csv('LordickData/neurommsig_pd_full_genes.csv')
MechanismToGene=genes.to_dict("list")

In [5]:
genes.head(5)

Unnamed: 0,MAPK subgraph,Calcium-dependent signal transduction,GRB10 subgraph,Synuclein subgraph,Notch signaling subgraph,Calsyntenin subgraph,Ubiquitin subgraph,Chaperone subgraph,PINK1 subgraph,Caspase subgraph,...,Matrix metalloproteinase subgraph,RhoA subgraph,Bcl-2 subgraph,Mitochondria fusion subgraph,Unfolded protein response subgraph,Vascular endothelial growth factor subgraph,Amyloidogenic subgraph,Toll like receptor subgraph,Estrogen subgraph,Cyclic AMP subgraph
0,FRA6E,ESR2,NEDD4,MAPT,MAPT,CLSTN2,MAPK8,CRYAB,OTC,PINK1,...,PINK1,MMP9,BAX,PINK1,MAPT,HGS,HSD17B10,TLR2,ESR2,PDE4D
1,HIST3H3,ESR1,PARK11,HIST3H3,OTC,,VDAC2P1,ROS1,HNRNPF,VDAC1,...,VDAC1,RHOT2,PINK1,HUWE1,CCL2,VEGFA,APP,TLR1,ESR1,MTOR
2,MAPK8,CACNA1A,DDC,GAK,NFE2L2,,TRIP12,CHCHD4,MCL1,GMFA,...,MMP3,EFHD2,BID,MFN1,OTC,KDR,,TLR4,SCN10A,RHEB
3,MAP2K3,CXCL8,INSR,MT-ND1,HGS,,VDAC1P10,GFER,CHCHD4,BECN1,...,USP30,RAC1,MCL1,OPA1,EIF2S1,,,,,
4,LRRK2,S100B,PIK3CG,BCL2L1,LRRK2,,DNM1,PINK1,HTR2A,MAVS,...,TOMM40,NOX1,GDNF,RHOT2,DDIT3,,,,,


In [6]:
print("Mechanisms Count:",len(MechanismToGene.keys()))

Mechanisms Count: 64


In [7]:
# remove the NaN padding (floats) from the gene lists so that only gene symbols remain
def remove_float_from_list(the_list):
    return [value for value in the_list if not isinstance(value, float)]

MechanismToGene_noNans={x:remove_float_from_list(y) for x,y in MechanismToGene.items() }

In [8]:
count=0
genes_total=set()
for key in MechanismToGene_noNans.keys():
    values=[x for x in MechanismToGene_noNans[key] if type(x)!=float]
    genes_total.update(values)
    count+=len(values)
print ("Associations:",count)
print("total genes:",len(genes_total))

Associations: 913
total genes: 394
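
The same two counts can also be obtained without an explicit loop by reshaping the mechanism table to long format, which is a convenient starting point for joining genes to SNPs. This is only a sketch on a small excerpt of the table: `long_df`, `snp_map` and `mech_snps` are hypothetical names, and the rsIDs are placeholders rather than real mappings.

```python
import numpy as np
import pandas as pd

# Excerpt of the mechanism table (first three rows of three columns).
genes_excerpt = pd.DataFrame({
    "MAPK subgraph": ["FRA6E", "HIST3H3", "MAPK8"],
    "Synuclein subgraph": ["MAPT", "HIST3H3", "GAK"],
    "Calsyntenin subgraph": ["CLSTN2", np.nan, np.nan],
})

# One row per (mechanism, gene) pair; the NaN padding is dropped.
long_df = genes_excerpt.melt(var_name="mechanism", value_name="gene").dropna(subset=["gene"])
n_associations = len(long_df)
n_unique_genes = long_df["gene"].nunique()

# Placeholder gene-to-SNP rows (hypothetical rsIDs, for illustration only).
snp_map = pd.DataFrame({
    "GENCODE_name": ["MAPK8", "GAK"],
    "rsID": ["rs_placeholder_1", "rs_placeholder_2"],
})

# Join genes to SNPs and count distinct SNPs per mechanism.
mech_snps = long_df.merge(snp_map, left_on="gene", right_on="GENCODE_name", how="inner")
snps_per_mechanism = mech_snps.groupby("mechanism")["rsID"].nunique().to_dict()
```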


In [9]:
MechanismToGeneToSNPcount={x:[len(y),0] for x,y in MechanismToGene_noNans.items()}
print (MechanismToGeneToSNPcount)

{'MAPK subgraph': [32, 0], 'Calcium-dependent signal transduction': [9, 0], 'GRB10 subgraph': [16, 0], 'Synuclein subgraph': [30, 0], 'Notch signaling subgraph': [23, 0], 'Calsyntenin subgraph': [1, 0], 'Ubiquitin subgraph': [32, 0], 'Chaperone subgraph': [31, 0], 'PINK1 subgraph': [32, 0], 'Caspase subgraph': [17, 0], 'Reactive oxygen species subgraph': [16, 0], 'miRNA subgraph': [19, 0], 'Mitochondrial subgraph': [32, 0], 'Mitophagy subgraph': [15, 0], 'LRRK2 subgraph': [20, 0], 'CRH subgraph': [5, 0], 'Pre-translational events': [18, 0], 'Regulation of cytoskeleton subgraph': [32, 0], 'Akt/mTOR subgraph': [8, 0], 'Phosphatidylinositol 3 subgraph': [7, 0], 'Apoptosis signaling subgraph': [32, 0], 'Autophagy signaling subgraph': [16, 0], 'Wnt signaling subgraph': [4, 0], 'GSK3 subgraph': [4, 0], 'Tau protein subgraph': [15, 0], 'ATP13A2 subgraph': [2, 0], 'Endoplasmic reticulum-Golgi protein export': [8, 0], 'Tumor necrosis factor subgraph': [10, 0], 'Response to oxidative stress': [8

In [11]:
MechanismToGene_noNans["MAPK subgraph"]

['FRA6E',
 'HIST3H3',
 'MAPK8',
 'MAP2K3',
 'LRRK2',
 'DNAJC27',
 'MAP3K4',
 'MARK2',
 'MAP2K4',
 'MAP2K7',
 'PINK1',
 'JUN',
 'ECSIT',
 'MAP2K6',
 'NLRX1',
 'IL1B',
 'MAPK3',
 'RAD18',
 'IL18',
 'DUSP1',
 'PRKN',
 'HTRA2',
 'MAP1LC3B',
 'NLRP3',
 'RPS6KA5',
 'ATF4',
 'XBP1',
 'MAPK10',
 'MAPK9',
 'PARK7',
 'CASP1',
 'GIGYF2']

In [21]:
# calculating the prs for each mechanism using the mapped genes only and the PheWAS mapping, filtering: phewas phenotype
catalog=pd.read_csv("phewascatalog_full.csv",delimiter = ',')
count_noprs=0
count_prs=0


snp_set=set()

for mechanism in MechanismToGene_noNans.keys():
    print ("PD-associated mechanism: ", mechanism)
    print ("associated genes: ",MechanismToGene_noNans[mechanism])
    prs_= prs(MechanismToGene_noNans[mechanism],catalog,"phewas phenotype",log=False)
    MechanismToGeneToSNPcount[mechanism][1]+=len(set(prs_[1]))
    print("PRS: ",prs_[0])
    if prs_[0]>0.0:
        snp_set|= set(prs_[1])
        print("Set: ",snp_set)
        count_prs+=1
    else:
        count_noprs+=1

PD-associated mechanism:  MAPK subgraph
associated genes:  ['FRA6E', 'HIST3H3', 'MAPK8', 'MAP2K3', 'LRRK2', 'DNAJC27', 'MAP3K4', 'MARK2', 'MAP2K4', 'MAP2K7', 'PINK1', 'JUN', 'ECSIT', 'MAP2K6', 'NLRX1', 'IL1B', 'MAPK3', 'RAD18', 'IL18', 'DUSP1', 'PRKN', 'HTRA2', 'MAP1LC3B', 'NLRP3', 'RPS6KA5', 'ATF4', 'XBP1', 'MAPK10', 'MAPK9', 'PARK7', 'CASP1', 'GIGYF2']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Calcium-dependent signal transduction
associated genes:  ['ESR2', 'ESR1', 'CACNA1A', 'CXCL8', 'S100B', 'NOX1', 'LRRK2', 'CACNB2', 'SCN10A']
        chromosome        snp     phewas phenotype  cases   p-value  \
18075  10 18759629  rs7076247  Parkinson's disease    252  0.004015   
60517  10 18730368  rs7069923  Parkinson's disease    252  0.013970   

       odds-ratio gene_name  phewas code        gwas-associations  
18075       1.307    CACNB2        332.0   

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Apoptosis signaling subgraph
associated genes:  ['MAPT', 'OTC', 'ULK1', 'MCL1', 'CHCHD4', 'HTR2A', 'MT-ND1', 'DDIT3', 'GFER', 'FOXO3', 'PINK1', 'VDAC1', 'SNCA', 'OPTN', 'SIRT3', 'PMAIP1', 'DUSP1', 'CDKN1A', 'HTRA2', 'NDUFA8', 'AIFM3', 'AIF1', 'COX17', 'TOMM22', 'DAPK1', 'AIFM2', 'CSNK2A1', 'SEPT4', 'FAF1', 'AIFM1', 'XBP1', 'SOD2']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Autophagy signaling subgraph
associated genes:  ['ULK1', 'PINK1', 'MAP1LC3A', 'CASP1', 'BECN1', 'WIPI1', 'ATG7', 'AMBRA1', 'NLRP3', 'LRRK2', 'IL1B', 'IL18', 'OPTN', 'ATG5', 'PRKN', 'CALCOCO2']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Cholesterol metabolism subgraph
associated genes:  ['PINK1', 'FDFT1', 'APOE', 'CYP2E1', 'UBIAD1']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Disaccharide metabolism subgraph
associated genes:  ['GFPT2', 'NFE2L2', 'ATF1', 'GPD2', 'RETN', 'GBA', 'H6PD', 'NDST4', 'KEAP1']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Interferon signaling subgraph
associated genes:  ['IRF6', 'IFIH1', 'MAVS', 'NLRP3', 'HSP90AA1', 'NFKB1', 'IL1B', 'IFNB1', 'TBK1', 'IFNG', 'DDX58', 'TOMM70', 'IRF3']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-valu

In [14]:
print ("Mechanisms with genes that can be associated to PD SNPs in PheWAS:", count_prs)
print ("Mechanisms with genes that can't be associated to PD SNPS:", count_noprs)

Mechanisms with genes that can be associated to PD SNPs in PheWAS: 8
Mechanisms with genes that can't be associated to PD SNPS: 56


In [15]:
print("SNPs actually used:",snp_set)
#only 4 for all mechanisms!

SNPs actually used: {'rs7069923', 'rs157580', 'rs9472138', 'rs7076247'}


In [22]:
print (MechanismToGeneToSNPcount) ###PLOT THIS

{'MAPK subgraph': [32, 0], 'Calcium-dependent signal transduction': [9, 2], 'GRB10 subgraph': [16, 0], 'Synuclein subgraph': [30, 0], 'Notch signaling subgraph': [23, 0], 'Calsyntenin subgraph': [1, 0], 'Ubiquitin subgraph': [32, 0], 'Chaperone subgraph': [31, 0], 'PINK1 subgraph': [32, 0], 'Caspase subgraph': [17, 0], 'Reactive oxygen species subgraph': [16, 2], 'miRNA subgraph': [19, 0], 'Mitochondrial subgraph': [32, 1], 'Mitophagy subgraph': [15, 0], 'LRRK2 subgraph': [20, 0], 'CRH subgraph': [5, 0], 'Pre-translational events': [18, 0], 'Regulation of cytoskeleton subgraph': [32, 1], 'Akt/mTOR subgraph': [8, 1], 'Phosphatidylinositol 3 subgraph': [7, 0], 'Apoptosis signaling subgraph': [32, 0], 'Autophagy signaling subgraph': [16, 0], 'Wnt signaling subgraph': [4, 0], 'GSK3 subgraph': [4, 0], 'Tau protein subgraph': [15, 0], 'ATP13A2 subgraph': [2, 0], 'Endoplasmic reticulum-Golgi protein export': [8, 0], 'Tumor necrosis factor subgraph': [10, 0], 'Response to oxidative stress': [8

In [None]:
pd.DataFrame.from_dict(MechanismToGeneToSNPcount, orient="index", columns=["genes", "SNPs"])

In [None]:
##################### filtering: gwas association #####################
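
Filtering on `gwas-associations` selects rows by the GWAS annotation of the SNP, not by the PheWAS phenotype of the row, so a single SNP can contribute many rows to the sum of odds ratios. The catalog preview shows this for rs2075650, which appears with three different phenotypes. The sketch below reproduces those preview rows to illustrate the effect; Alzheimer's disease stands in for Parkinson's disease, and the truncated `gwas-associations` strings are shortened to their visible first entry (an assumption).

```python
import pandas as pd

# Four rows from the catalog preview; gwas-associations shortened to the first entry.
preview = pd.DataFrame({
    "snp": ["rs2075650", "rs2075650", "rs1800562", "rs2075650"],
    "phewas phenotype": [
        "Alzheimer's disease",
        "Dementias",
        "Iron metabolism disorder",
        "Delirium dementia and amnestic disorders",
    ],
    "odds-ratio": [2.410, 2.114, 12.270, 1.841],
    "gwas-associations": [
        "Alzheimer's disease",
        "Alzheimer's disease",
        "Mean corpuscular hemoglobin",
        "Alzheimer's disease",
    ],
})

# Rows selected via the GWAS annotation: every phenotype row of the SNP is kept.
hits = preview[preview["gwas-associations"].str.contains("Alzheimer's disease", na=False, regex=False)]
rows_per_snp = hits["snp"].value_counts().to_dict()
naive_sum = hits["odds-ratio"].sum()

# Restricting to the matching phenotype leaves one row for this SNP.
phenotype_hits = hits[hits["phewas phenotype"] == "Alzheimer's disease"]
```

Checking `rows_per_snp` before summing makes it visible when a score is dominated by repeated rows of the same SNP.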

In [30]:
# calculating the prs for each mechanism using the mapped genes only and the PheWAS mapping, filtering: gwas association
catalog=pd.read_csv("phewascatalog_full.csv",delimiter = ',')
count_noprs=0
count_prs=0

snp_set=set()

for mechanism in MechanismToGene_noNans.keys():
    print ("PD-associated mechanism: ", mechanism)
    print ("associated genes: ",MechanismToGene_noNans[mechanism])
    prs_= prs(MechanismToGene_noNans[mechanism],catalog,"gwas-associations",log=False) 
    print("PRS: ",prs_[0])
    if prs_[0]>0.0:
        snp_set|= set(prs_[1])
        print("Set: ",snp_set)
        count_prs+=1
    else:
        count_noprs+=1

PD-associated mechanism:  MAPK subgraph
associated genes:  ['FRA6E', 'HIST3H3', 'MAPK8', 'MAP2K3', 'LRRK2', 'DNAJC27', 'MAP3K4', 'MARK2', 'MAP2K4', 'MAP2K7', 'PINK1', 'JUN', 'ECSIT', 'MAP2K6', 'NLRX1', 'IL1B', 'MAPK3', 'RAD18', 'IL18', 'DUSP1', 'PRKN', 'HTRA2', 'MAP1LC3B', 'NLRP3', 'RPS6KA5', 'ATF4', 'XBP1', 'MAPK10', 'MAPK9', 'PARK7', 'CASP1', 'GIGYF2']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Calcium-dependent signal transduction
associated genes:  ['ESR2', 'ESR1', 'CACNA1A', 'CXCL8', 'S100B', 'NOX1', 'LRRK2', 'CACNB2', 'SCN10A']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  GRB10 subgraph
associated genes:  ['NEDD4', 'PARK11', 'DDC', 'INSR', 'PIK3CG', 'GRB10', 'INS', 'HGS', 'AKT1', 'LMX1A', 'MAPK3', 'IGF1R', 'IGF1', 'G

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Caspase subgraph
associated genes:  ['PINK1', 'VDAC1', 'GMFA', 'BECN1', 'MAVS', 'NLRP3', 'CYCS', 'SNCA', 'NFKB1', 'LRRK2', 'IL1B', 'CASP8', 'CASP1', 'CASP3', 'PRKN', 'PYCARD', 'MAP1LC3B']
        chromosome         snp  \
333     4 90678541   rs2736990   
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
6056    4 90678541   rs2736990   
7510    4 90639515  rs11931074   
...            ...         ...   
211289  4 90678541   rs2736990   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214144  4 90678541   rs2736990   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
333                              Acute reaction to stress    339  0.000018   
2604    Vascular complications of surgery and medical ...     76  0.0

        chromosome         snp  \
333     4 90678541   rs2736990   
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
6056    4 90678541   rs2736990   
7510    4 90639515  rs11931074   
...            ...         ...   
211289  4 90678541   rs2736990   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214144  4 90678541   rs2736990   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
333                              Acute reaction to stress    339  0.000018   
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
6056              Other disorders of stomach and duodenum    270  0.001218   
7510                                    Vascular dementia    162  0.001548   
...                                                   ...    ...       ...   
211289                 Other cardiac conducti

         chromosome        snp  \
257     17 41436901  rs8070723   
2288    17 41436901  rs8070723   
3726    17 41436901  rs8070723   
4524    17 41436901  rs8070723   
7048    17 41436901  rs8070723   
...             ...        ...   
207569  17 41436901  rs8070723   
207691  17 41436901  rs8070723   
209824  17 41436901  rs8070723   
211032  17 41436901  rs8070723   
215079  17 41436901  rs8070723   

                                         phewas phenotype  cases   p-value  \
257                                  Infection of the eye   1229  0.000009   
2288                                  Sprains and strains   2841  0.000394   
3726                                     Ptosis of eyelid    597  0.000697   
4524                           Conjunctivitis, infectious    939  0.000870   
7048                                Keratitis, infectious    146  0.001441   
...                                                   ...    ...       ...   
207569  Phlebitis and thrombophlebitis of low

         chromosome         snp  \
257     17 41436901   rs8070723   
333      4 90678541   rs2736990   
2288    17 41436901   rs8070723   
2604     4 90639515  rs11931074   
2642     4 90641340    rs356220   
...             ...         ...   
213704   4 90641340    rs356220   
213942   4 90639515  rs11931074   
214144   4 90678541   rs2736990   
214801   4 90641340    rs356220   
215079  17 41436901   rs8070723   

                                         phewas phenotype  cases   p-value  \
257                                  Infection of the eye   1229  0.000009   
333                              Acute reaction to stress    339  0.000018   
2288                                  Sprains and strains   2841  0.000394   
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
...                                                   ...    ...       ...   
213704                          B

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Nitric oxide subgraph
associated genes:  ['DMD', 'NOS2', 'NOS1']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Interleukin signaling subgraph
associated genes:  ['IL6', 'IL17A', 'IL33', 'VDAC1', 'GMFA', 'BECN1', 'MAVS', 'NLRP3', 'MAP1LC3B', 'VDAC2', 'CXCL8', 'IBSP', 'IL1B', 'IL18', 'CACNB2', 'IL2', 'IRF3']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Dopaminergic subgraph
associated genes:  ['BDNF', 'CRYAB', 'BCL2L1', 'LRRK2', 'LMX1A', 'SLC18A2', 'MAOB', 'PITX3', 'GABRB1', 'DDC', 'GDNF', 'PPARGC1A', 'TOMM7', 'SNCA', 'DRD2', 'MIR133B', 'FOXA2', 'UCHL1'

        chromosome         snp  \
333     4 90678541   rs2736990   
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
6056    4 90678541   rs2736990   
7510    4 90639515  rs11931074   
...            ...         ...   
211289  4 90678541   rs2736990   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214144  4 90678541   rs2736990   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
333                              Acute reaction to stress    339  0.000018   
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
6056              Other disorders of stomach and duodenum    270  0.001218   
7510                                    Vascular dementia    162  0.001548   
...                                                   ...    ...       ...   
211289                 Other cardiac conducti

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Lipid metabolism subgraph
associated genes:  ['IL33', 'IL17A', 'CCL2', 'MTR', 'GMFA', 'PLA2G6', 'SNCA', 'NFKB1', 'PPA1', 'LRRK2', 'IL1B', 'TNF']
        chromosome         snp  \
333     4 90678541   rs2736990   
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
6056    4 90678541   rs2736990   
7510    4 90639515  rs11931074   
...            ...         ...   
211289  4 90678541   rs2736990   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214144  4 90678541   rs2736990   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
333                              Acute reaction to stress    339  0.000018   
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             A

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Cytokine signaling subgraph
associated genes:  ['NEDD4', 'INSR', 'HGS', 'LRRK2', 'CHUK', 'KDR']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Matrix metalloproteinase subgraph
associated genes:  ['PINK1', 'VDAC1', 'MMP3', 'USP30', 'TOMM40', 'TOMM7', 'MMP9', 'CHCHD4', 'VDAC2', 'VPS35', 'TOMM20', 'RHO', 'TBK1', 'TOMM22', 'TH', 'PRKN', 'CSNK2A1', 'TOMM70', 'IRF3']
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  RhoA subgraph
associated genes:  ['MMP9', 'RHOT2', 'EFHD2', 'RAC1', 'NOX1', 'RHO', 'CDC42', 'USP30', 'PRKN', 'OPHN1', 'RHOA', 'STARD13']
Empty DataF

In [31]:
print ("Mechanisms with genes that can be associated to PD SNPs in PheWAS:", count_prs)
print ("Mechanisms with genes that can't be associated to PD SNPS:", count_noprs)
print("SNPs actually used:",snp_set)

Mechanisms with genes that can be associated to PD SNPs in PheWAS: 27
Mechanisms with genes that can't be associated to PD SNPS: 37
SNPs actually used: {'rs1564282', 'rs393152', 'rs356220', 'rs8070723', 'rs2736990', 'rs823156', 'rs11931074'}
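
The two filters can be compared directly from the numbers printed so far. A small sketch, using only the SNP sets and mechanism counts reported above (the variable names are hypothetical):

```python
# SNP sets as printed for the two filters.
phenotype_snps = {"rs7069923", "rs157580", "rs9472138", "rs7076247"}
gwas_snps = {"rs1564282", "rs393152", "rs356220", "rs8070723",
             "rs2736990", "rs823156", "rs11931074"}
n_mechanisms = 64

# Overlap and union of the SNPs used by the two filters.
shared = phenotype_snps & gwas_snps
union = phenotype_snps | gwas_snps

# Share of mechanisms with a non-zero PRS under each filter (8 and 27 of 64).
share_phenotype = 8 / n_mechanisms
share_gwas = 27 / n_mechanisms
```

The two filters share no SNP, so the mechanisms scored under one filter are scored from entirely different variants than under the other.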


In [32]:
#### 2nd approach: filtering the PheWAS mapping using only SNPs that are also available in the 3700 SNPs database ####

In [32]:
#redefining the function
def prs_3700(genes,catalog,snpslist,filtering,log=False,p_threshold=5.000000e-02):
    
    """ Takes a list of genes, the PheWAS catalog (DataFrame) and a list of SNP IDs as input. Keeps the catalog rows whose
        gene_name matches one of the genes, whose SNP is contained in snpslist (the 3700 SNPs database) and whose
        `filtering` column (e.g. "phewas phenotype" or "gwas-associations") contains "Parkinson's disease".
        Rows with a p-value above p_threshold (default=0.05) are dismissed.
        Returns the naive polygenic risk score (prs), i.e. the sum of the odds ratios (OR) or, if log=True,
        the sum of the log10(OR) of the SNPs, together with the list of SNPs used.
    """
    
    assert isinstance(genes, list)
    
    # Search patterns for the genes and SNPs of interest.
    # NOTE: str.contains does regex/substring matching, so e.g. 'INS' also matches 'INSR'.
    pat = '|'.join(str(x) for x in genes)
    pat_2 = '|'.join(str(x) for x in snpslist)
    catalog_filtered = catalog[catalog["gene_name"].str.contains(pat,na=False)] #filtered catalog containing genes of interest only
    catalog_filtered_3700 = catalog_filtered[catalog_filtered["snp"].str.contains(pat_2,na=False)] #keep SNPs from the 3700 SNPs list only
    #print ("Catalog containing the input genes only",catalog_filtered)
    #print ()
    catalog_filtered_pd=catalog_filtered_3700[catalog_filtered_3700[filtering].str.contains("Parkinson's disease",na=False)] #keep PD rows only
    catalog_filtered_pd_pvalue_series=catalog_filtered_pd["p-value"]<=p_threshold
    catalog_filtered_pd_pvalue=catalog_filtered_pd[catalog_filtered_pd_pvalue_series]
    
    snps_used=catalog_filtered_pd_pvalue['snp'].tolist()
        
    print (catalog_filtered_pd_pvalue)
    
    odds_ratios=catalog_filtered_pd_pvalue["odds-ratio"]
    if log:
        # sum of the per-SNP log odds ratios (not the log of the summed odds ratios)
        return sum(math.log10(x) for x in odds_ratios),snps_used
    else:
        return odds_ratios.sum(),snps_used
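
Joining several thousand rsIDs into one regular expression is slow, and as a substring match it can accept an ID that merely starts with a listed one. A minimal sketch of the exact alternative with `Series.isin`, assuming a toy frame: `rs3562201` is a made-up ID for illustration, and `panel_ids` is a hypothetical one-element excerpt of the 3700 SNPs list.

```python
import pandas as pd

# Toy rows; rs3562201 is a made-up ID that merely starts with rs356220.
toy_rows = pd.DataFrame({"snp": ["rs356220", "rs3562201", "rs8070723"]})
panel_ids = ["rs356220"]  # hypothetical excerpt of the 3700 SNPs list

# Substring/regex matching also accepts the longer ID.
substring_match = toy_rows[
    toy_rows["snp"].str.contains("|".join(panel_ids), na=False)
]["snp"].tolist()

# Exact membership test keeps only IDs that are really in the list.
exact_match = toy_rows[toy_rows["snp"].isin(panel_ids)]["snp"].tolist()
```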

In [None]:
######## phenotype filtering #########

In [33]:
catalog=pd.read_csv("phewascatalog_full.csv",delimiter = ',')

#3700 snps
snps_3700=pd.read_csv('LordickData/PPMI_combPD_pass_CADD.csv')
snpsIDs_3700=snps_3700["ID"].tolist()

snp_set=set()

count_noprs=0
count_prs=0
for mechanism in MechanismToGene_noNans.keys():
    print ("PD-associated mechanism: ", mechanism)
    prs_3700_= prs_3700(MechanismToGene_noNans[mechanism],catalog,snpsIDs_3700,"phewas phenotype",log=False)
    print("PRS: ",prs_3700_[0])
    if prs_3700_[0]>0.0:
        count_prs+=1
        snp_set|= set(prs_3700_[1])
        print("Set: ",snp_set)
        
    else:
        count_noprs+=1

PD-associated mechanism:  MAPK subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Calcium-dependent signal transduction
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  GRB10 subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Synuclein subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Notch signaling subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  G-protein-mediated signaling
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Energy metabolic subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Cholesterol metabolism subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Disaccharide metabolism subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated me

In [34]:
print ("Mechanisms with genes that can be associated to PD SNPs in PheWAS:", count_prs)
print ("Mechanisms with genes that can't be associated to PD SNPS:", count_noprs)
print("SNPs actually used:",snp_set)

Mechanisms with genes that can be associated to PD SNPs in PheWAS: 0
Mechanisms with genes that can't be associated to PD SNPS: 64
SNPs actually used: set()


Zero!

In [None]:
##### gwas - association filtering ##### 

In [35]:
#3700 snps
snps_3700=pd.read_csv('LordickData/PPMI_combPD_pass_CADD.csv')
snpsIDs_3700=snps_3700["ID"].tolist()

snp_set=set()

count_noprs=0
count_prs=0
for mechanism in MechanismToGene_noNans.keys():
    print ("PD-associated mechanism: ", mechanism)
    prs_3700_= prs_3700(MechanismToGene_noNans[mechanism],catalog,snpsIDs_3700,"gwas-associations",log=False)
    print("PRS: ",prs_3700_[0])
    if prs_3700_[0]>0.0:
        count_prs+=1
        snp_set|= set(prs_3700_[1])
        print("Set: ",snp_set)
        
    else:
        count_noprs+=1

PD-associated mechanism:  MAPK subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Calcium-dependent signal transduction
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  GRB10 subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Synuclein subgraph
         chromosome         snp  \
257     17 41436901   rs8070723   
333      4 90678541   rs2736990   
2288    17 41436901   rs8070723   
2604     4 90639515  rs11931074   
2642     4 90641340    rs356220   
...             ...         ...   
213704   4 90641340    rs356220   
213942   4 90639515  rs11931074   
214144   4 90678541   rs2736990   
2148

        chromosome         snp  \
333     4 90678541   rs2736990   
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
6056    4 90678541   rs2736990   
7510    4 90639515  rs11931074   
...            ...         ...   
211289  4 90678541   rs2736990   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214144  4 90678541   rs2736990   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
333                              Acute reaction to stress    339  0.000018   
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
6056              Other disorders of stomach and duodenum    270  0.001218   
7510                                    Vascular dementia    162  0.001548   
...                                                   ...    ...       ...   
211289                 Other cardiac conducti

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Regulation of cytoskeleton subgraph
         chromosome         snp  \
257     17 41436901   rs8070723   
333      4 90678541   rs2736990   
2288    17 41436901   rs8070723   
2604     4 90639515  rs11931074   
2642     4 90641340    rs356220   
...             ...         ...   
213704   4 90641340    rs356220   
213942   4 90639515  rs11931074   
214144   4 90678541   rs2736990   
214801   4 90641340    rs356220   
215079  17 41436901   rs8070723   

                                         phewas phenotype  cases   p-value  \
257                                  Infection of the eye   1229  0.000009   
333                              Acute reaction to stress    339  0.000018   
2288                                  Sprains and strains   2841  0.000394   
2604    Vascular complications of surgery and medical

         chromosome        snp  \
257     17 41436901  rs8070723   
2288    17 41436901  rs8070723   
3726    17 41436901  rs8070723   
4524    17 41436901  rs8070723   
7048    17 41436901  rs8070723   
...             ...        ...   
207569  17 41436901  rs8070723   
207691  17 41436901  rs8070723   
209824  17 41436901  rs8070723   
211032  17 41436901  rs8070723   
215079  17 41436901  rs8070723   

                                         phewas phenotype  cases   p-value  \
257                                  Infection of the eye   1229  0.000009   
2288                                  Sprains and strains   2841  0.000394   
3726                                     Ptosis of eyelid    597  0.000697   
4524                           Conjunctivitis, infectious    939  0.000870   
7048                                Keratitis, infectious    146  0.001441   
...                                                   ...    ...       ...   
207569  Phlebitis and thrombophlebitis of low

         chromosome         snp  \
257     17 41436901   rs8070723   
333      4 90678541   rs2736990   
2288    17 41436901   rs8070723   
2604     4 90639515  rs11931074   
2642     4 90641340    rs356220   
...             ...         ...   
213704   4 90641340    rs356220   
213942   4 90639515  rs11931074   
214144   4 90678541   rs2736990   
214801   4 90641340    rs356220   
215079  17 41436901   rs8070723   

                                         phewas phenotype  cases   p-value  \
257                                  Infection of the eye   1229  0.000009   
333                              Acute reaction to stress    339  0.000018   
2288                                  Sprains and strains   2841  0.000394   
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
...                                                   ...    ...       ...   
213704                          B

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Vitamin subgraph
       chromosome       snp  \
5076            1  rs823156   
9194            1  rs823156   
10446           1  rs823156   
10717           1  rs823156   
15773           1  rs823156   
...           ...       ...   
208688          1  rs823156   
208749          1  rs823156   
208953          1  rs823156   
210086          1  rs823156   
211855          1  rs823156   

                                         phewas phenotype  cases   p-value  \
5076                                   Atrophic gastritis    103  0.000996   
9194                  Nonrheumatic aortic valve disorders    814  0.001952   
10446            Anomalies of tooth position/malocclusion     61  0.002235   
10717                                    Swelling of limb    706  0.002303   
15773                                   Co

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Lipid metabolism subgraph
        chromosome         snp  \
333     4 90678541   rs2736990   
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
6056    4 90678541   rs2736990   
7510    4 90639515  rs11931074   
...            ...         ...   
211289  4 90678541   rs2736990   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214144  4 90678541   rs2736990   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
333                              Acute reaction to stress    339  0.000018   
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
6056              Other disorders of stomach and duodenum    270  0.001218 

Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  FMR1 subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Cytokine signaling subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Matrix metalloproteinase subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  RhoA subgraph
Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
PRS:  0.0
PD-associated mechanism:  Bcl-2 subgraph
     

In [36]:
print ("Mechanisms with genes that can be associated to PD SNPs in PheWAS:", count_prs)
print ("Mechanisms with genes that can't be associated to PD SNPS:", count_noprs)
print("SNPs actually used:",snp_set)

Mechanisms with genes that can be associated to PD SNPs in PheWAS: 27
Mechanisms with genes that can't be associated to PD SNPS: 37
SNPs actually used: {'rs1564282', 'rs393152', 'rs356220', 'rs8070723', 'rs2736990', 'rs823156', 'rs11931074'}
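
The tally printed above could be produced along the following lines. This is a minimal sketch, not the notebook's original loop: `tally_mechanisms` and the toy `results` dictionary are hypothetical, and the sketch assumes each mechanism has already been reduced to a `(prs, snps_used)` pair.

```python
def tally_mechanisms(results):
    """Count mechanisms with and without a positive score and pool the SNPs used.

    `results` maps a mechanism name to a (prs, snps_used) pair (assumed layout).
    """
    count_prs = 0
    count_noprs = 0
    snp_set = set()
    for mechanism, (prs, snps_used) in results.items():
        if prs > 0:
            count_prs += 1
            snp_set.update(snps_used)
        else:
            count_noprs += 1
    return count_prs, count_noprs, snp_set


# Toy input, invented for illustration only
results = {
    "Mechanism A": (2.5, ["rs1", "rs2"]),
    "Mechanism B": (0.0, []),
    "Mechanism C": (1.1, ["rs2"]),
}
count_prs, count_noprs, snp_set = tally_mechanisms(results)
print(count_prs, count_noprs, sorted(snp_set))
```

Pooling the SNPs in a set means a SNP shared by several mechanisms is reported only once.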


# This is regardless of the eQTL/LD-extended SNP mapping


# Ashar's mapping:


In [1]:
# Including the RiskSNP mapping from RiskSNPs.rda
import pyreadr #re-imported here so the cell also runs on a fresh kernel
rda_dic = pyreadr.read_r('LordickData/RiskSNPs.rda')
comb_df=rda_dic["comb"] # What's the difference between combPD and comb ? 
comb_PD_df=rda_dic["combPD"] 
print (comb_PD_df)
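
The open question about `comb` versus `combPD` could be approached by comparing the two data frames directly. The sketch below is only an illustration: `compare_frames` is a hypothetical helper, the toy frames are invented, and it assumes the two objects share at least one column.

```python
import pandas as pd


def compare_frames(left, right):
    """Summarise how two data frames differ in shape, columns and rows."""
    shared_cols = [c for c in left.columns if c in right.columns]
    # An outer merge on the shared columns marks where each distinct row occurs
    merged = left[shared_cols].drop_duplicates().merge(
        right[shared_cols].drop_duplicates(),
        how="outer", on=shared_cols, indicator=True,
    )
    return {
        "shape_left": left.shape,
        "shape_right": right.shape,
        "columns_only_left": sorted(set(left.columns) - set(right.columns)),
        "columns_only_right": sorted(set(right.columns) - set(left.columns)),
        "rows_only_left": int((merged["_merge"] == "left_only").sum()),
        "rows_only_right": int((merged["_merge"] == "right_only").sum()),
        "rows_in_both": int((merged["_merge"] == "both").sum()),
    }


# Toy frames, invented for illustration only
comb_toy = pd.DataFrame({"rsID": ["rs1", "rs2", "rs3"],
                         "GENCODE_name": ["G1", "G1", "G2"]})
combPD_toy = pd.DataFrame({"rsID": ["rs1", "rs3"],
                           "GENCODE_name": ["G1", "G2"],
                           "trait": ["PD", "PD"]})
summary = compare_frames(comb_toy, combPD_toy)
print(summary)
```

With the real data the call would be `compare_frames(comb_df, comb_PD_df)`.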

# combPD data

In [5]:
liste=comb_PD_df["rsID"].tolist()
liste_2=comb_PD_df["GENCODE_name"].tolist()
print ("Mappings in dataframe:",len(liste))
print ("used SNPs in total:",len(set(liste)))
print ("used Genes in total:",len(set(liste_2)))

Mappings in dataframe: 42567
used SNPs in total: 4242
used Genes in total: 213
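
The data frame has 42567 rows but only 4242 distinct SNPs, so most rsIDs occur in more than one row. Whether these are repeated (SNP, gene) pairs or SNPs mapped to several genes can be checked as sketched below. The toy frame is invented; only the column names `rsID` and `GENCODE_name` are taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the combPD frame, invented for illustration only
toy = pd.DataFrame({
    "rsID": ["rs1", "rs1", "rs2", "rs2", "rs3"],
    "GENCODE_name": ["G1", "G1", "G1", "G2", "G2"],
})

n_rows = len(toy)
# Distinct (SNP, gene) pairs: exact repeats of a pair collapse to one row
unique_pairs = toy[["rsID", "GENCODE_name"]].drop_duplicates()
# Number of distinct SNPs mapped to each gene
snps_per_gene = unique_pairs.groupby("GENCODE_name")["rsID"].nunique()
# Number of distinct genes each SNP is mapped to
genes_per_snp = unique_pairs.groupby("rsID")["GENCODE_name"].nunique()

print(n_rows, len(unique_pairs))
print(snps_per_gene.to_dict())
print(genes_per_snp.to_dict())
```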


In [10]:
#ids from old mapping written in txt file to test for vcf
with open("../OldMapping_rsIDs.txt","w") as oldsnps:
    for rs in liste:
        oldsnps.write(rs+"\n")
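
Because `liste` still contains every row of the mapping, the file written above repeats each rsID as often as it occurs in the data frame. If one line per SNP is preferred, the IDs can be deduplicated first. The helper `write_unique_ids` and the temporary output path are hypothetical and not part of the original notebook.

```python
import os
import tempfile


def write_unique_ids(ids, path):
    """Write each ID once, in first-seen order, one per line; return the count."""
    unique_ids = list(dict.fromkeys(ids))  # dict keys keep insertion order
    with open(path, "w") as handle:
        for rs in unique_ids:
            handle.write(rs + "\n")
    return len(unique_ids)


# Toy IDs and a throwaway path, invented for illustration only
toy_ids = ["rs1", "rs2", "rs1", "rs3", "rs2"]
out_path = os.path.join(tempfile.mkdtemp(), "unique_rsIDs.txt")
n_written = write_unique_ids(toy_ids, out_path)
print(n_written)
```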

In [15]:
#convert the mapping from the dataframe to a dictionary
GenetoSNP_PD={}
for x, y in zip(comb_PD_df['rsID'],comb_PD_df['GENCODE_name']):
    if y in GenetoSNP_PD:
        GenetoSNP_PD[y].append(x)
    else:
        GenetoSNP_PD[y]=[x]
print (GenetoSNP_PD)

{'RP11-115D19.1': ['rs10003708', 'rs1045722', 'rs10516844', 'rs11315884', 'rs11931074', 'rs11945223', 'rs17016071', 'rs356165', 'rs356182', 'rs356209', 'rs356211', 'rs356219', 'rs356220', 'rs356221', 'rs356223', 'rs356225', 'rs3857051', 'rs3857052', 'rs3857053', 'rs61032876', 'rs6818319', 'rs73831461', 'rs7436973', 'rs7655792', 'rs7675290', 'rs7681312', 'rs7681815', 'rs8180209', 'rs8180214'], 'STAP1': ['rs10017445', 'rs13151390', 'rs200858521', 'rs2242330', 'rs3775870', 'rs7682206'], 'SNCA': ['rs10025915', 'rs10033209', 'rs104893875', 'rs104893877', 'rs104893878', 'rs10516845', 'rs116599416', 'rs11931062', 'rs11935469', 'rs11944331', 'rs146789505', 'rs189596', 'rs200151263', 'rs201106962', 'rs2736990', 'rs28393675', 'rs28613708', 'rs356168', 'rs356200', 'rs356203', 'rs356204', 'rs3756054', 'rs3775422', 'rs3775423', 'rs3775433', 'rs3822086', 'rs3857057', 'rs3857058', 'rs3857059', 'rs3857061', 'rs3899608', 'rs4031753', 'rs4088093', 'rs4088094', 'rs58054215', 'rs60031383', 'rs6826785', 'r
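
The same gene-to-SNP grouping can be written more compactly with `collections.defaultdict`. This is an equivalent sketch rather than a replacement for the cell above; `group_snps_by_gene` and the toy inputs are hypothetical.

```python
from collections import defaultdict


def group_snps_by_gene(rs_ids, gene_names):
    """Group rsIDs by gene name, keeping the order in which they appear."""
    gene_to_snps = defaultdict(list)
    for rs, gene in zip(rs_ids, gene_names):
        gene_to_snps[gene].append(rs)
    return dict(gene_to_snps)  # plain dict, so missing genes raise KeyError again


# Toy columns, invented for illustration only
toy_rs = ["rs1", "rs2", "rs3", "rs4"]
toy_genes = ["G1", "G2", "G1", "G1"]
grouped = group_snps_by_gene(toy_rs, toy_genes)
print(grouped)
```

With the real data the call would be `group_snps_by_gene(comb_PD_df['rsID'], comb_PD_df['GENCODE_name'])`.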

In [16]:
#pd.options.display.max_rows
#pd.set_option('display.max_rows', None)
print(MechanismToGene_noNans)

{'MAPK subgraph': ['FRA6E', 'HIST3H3', 'MAPK8', 'MAP2K3', 'LRRK2', 'DNAJC27', 'MAP3K4', 'MARK2', 'MAP2K4', 'MAP2K7', 'PINK1', 'JUN', 'ECSIT', 'MAP2K6', 'NLRX1', 'IL1B', 'MAPK3', 'RAD18', 'IL18', 'DUSP1', 'PRKN', 'HTRA2', 'MAP1LC3B', 'NLRP3', 'RPS6KA5', 'ATF4', 'XBP1', 'MAPK10', 'MAPK9', 'PARK7', 'CASP1', 'GIGYF2'], 'Calcium-dependent signal transduction': ['ESR2', 'ESR1', 'CACNA1A', 'CXCL8', 'S100B', 'NOX1', 'LRRK2', 'CACNB2', 'SCN10A'], 'GRB10 subgraph': ['NEDD4', 'PARK11', 'DDC', 'INSR', 'PIK3CG', 'GRB10', 'INS', 'HGS', 'AKT1', 'LMX1A', 'MAPK3', 'IGF1R', 'IGF1', 'GIGYF1', 'GIGYF2', 'KDR'], 'Synuclein subgraph': ['MAPT', 'HIST3H3', 'GAK', 'MT-ND1', 'BCL2L1', 'LRRK2', 'SNCB', 'CASP8', 'MAOB', 'TP53', 'TPPP', 'DNMT1', 'PLA2G6', 'SNCA', 'CSTB', 'PPP2R5B', 'PLK2', 'SELENOW', 'FGF1', 'UCHL1', 'SNCG', 'CHM', 'SNCAIP', 'PPP2R1B', 'FGF20', 'FAF1', 'ATP13A2', 'PARK7', 'CASP1', 'CASP3'], 'Notch signaling subgraph': ['MAPT', 'OTC', 'NFE2L2', 'HGS', 'LRRK2', 'LMX1A', 'GPR33', 'NOS1', 'IL6', 'FOXO

In [40]:
#testing for correctness:
the_list=['rs10003708', 'rs1045722', 'rs10516844', 'rs11315884', 'rs11931074', 'rs11945223', 'rs17016071', 'rs356165', 'rs356182', 'rs356209', 'rs356211', 'rs356219', 'rs356220', 'rs356221', 'rs356223', 'rs356225', 'rs3857051', 'rs3857052', 'rs3857053', 'rs61032876', 'rs6818319', 'rs73831461', 'rs7436973', 'rs7655792', 'rs7675290', 'rs7681312', 'rs7681815', 'rs8180209', 'rs8180214']
the_gene=comb_PD_df.loc[comb_PD_df['GENCODE_name'] == "RP11-115D19.1"]
snps=the_gene["rsID"].tolist()
snps==the_list

True
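
The check above covers a single gene. A sketch of a check over the whole mapping is given below: it flattens the dictionary back into (rsID, gene) pairs and compares them, with multiplicity, against the source pairs. `mapping_matches_pairs` is a hypothetical helper and the toy data are invented.

```python
from collections import Counter


def mapping_matches_pairs(gene_to_snps, pairs):
    """True if the dictionary holds exactly the given (rsID, gene) pairs."""
    from_dict = Counter(
        (rs, gene) for gene, snps in gene_to_snps.items() for rs in snps
    )
    return from_dict == Counter(pairs)


# Toy mapping and source pairs, invented for illustration only
toy_mapping = {"G1": ["rs1", "rs3"], "G2": ["rs2"]}
toy_pairs = [("rs1", "G1"), ("rs2", "G2"), ("rs3", "G1")]
print(mapping_matches_pairs(toy_mapping, toy_pairs))
print(mapping_matches_pairs(toy_mapping, toy_pairs + [("rs9", "G2")]))
```

For the notebook's objects the pairs would come from `zip(comb_PD_df['rsID'], comb_PD_df['GENCODE_name'])`.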

In [22]:
### Mechanism to Gene to SNP with Ashar's mapping
MechanismToGeneToSNPcount_asharmap={x:[len(y),0] for x,y in MechanismToGene_noNans.items()}
print (MechanismToGeneToSNPcount_asharmap)

{'MAPK subgraph': [32, 0], 'Calcium-dependent signal transduction': [9, 0], 'GRB10 subgraph': [16, 0], 'Synuclein subgraph': [30, 0], 'Notch signaling subgraph': [23, 0], 'Calsyntenin subgraph': [1, 0], 'Ubiquitin subgraph': [32, 0], 'Chaperone subgraph': [31, 0], 'PINK1 subgraph': [32, 0], 'Caspase subgraph': [17, 0], 'Reactive oxygen species subgraph': [16, 0], 'miRNA subgraph': [19, 0], 'Mitochondrial subgraph': [32, 0], 'Mitophagy subgraph': [15, 0], 'LRRK2 subgraph': [20, 0], 'CRH subgraph': [5, 0], 'Pre-translational events': [18, 0], 'Regulation of cytoskeleton subgraph': [32, 0], 'Akt/mTOR subgraph': [8, 0], 'Phosphatidylinositol 3 subgraph': [7, 0], 'Apoptosis signaling subgraph': [32, 0], 'Autophagy signaling subgraph': [16, 0], 'Wnt signaling subgraph': [4, 0], 'GSK3 subgraph': [4, 0], 'Tau protein subgraph': [15, 0], 'ATP13A2 subgraph': [2, 0], 'Endoplasmic reticulum-Golgi protein export': [8, 0], 'Tumor necrosis factor subgraph': [10, 0], 'Response to oxidative stress': [8
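
Each entry above starts as `[number of genes, 0]`. One way to fill the second slot is sketched below, under the assumption (not stated in the notebook) that it is meant to hold the number of genes in the mechanism that have at least one SNP in the mapping. `count_mapped_genes` and the toy inputs are hypothetical.

```python
def count_mapped_genes(mechanism_to_genes, gene_to_snps):
    """Return {mechanism: [number of genes, number of genes with a mapped SNP]}."""
    counts = {}
    for mechanism, genes in mechanism_to_genes.items():
        # A gene counts as mapped if it has a non-empty SNP list
        mapped = sum(1 for gene in genes if gene_to_snps.get(gene))
        counts[mechanism] = [len(genes), mapped]
    return counts


# Toy inputs, invented for illustration only
toy_mechanisms = {"M1": ["G1", "G2", "G3"], "M2": ["G4"]}
toy_gene_to_snps = {"G1": ["rs1"], "G3": ["rs2", "rs3"]}
toy_counts = count_mapped_genes(toy_mechanisms, toy_gene_to_snps)
print(toy_counts)
```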

In [None]:
#mapping was done correctly

In [41]:
catalog=pd.read_csv("phewascatalog_full.csv",delimiter = ',') #reading in the whole csv as a dataframe once again

In [18]:
#updating prs function to screen for snps in PheWAS
def prs_SNP_PD(gene,catalog,filtering,log=False,p_threshold=0.05):
    """Naive PRS for one gene: sum of the odds ratios of its mapped SNPs that are
    annotated with Parkinson's disease in column `filtering` and pass p_threshold.
    Returns (score, snps_used), or (False, "gene not in mapping") for unmapped genes."""
    if gene not in GenetoSNP_PD:
        return False,"gene not in mapping"

    SNP_pat = '|'.join(str(x) for x in GenetoSNP_PD[gene]) #regex alternation over the SNPs mapped to this gene
    catalog_filtered = catalog[catalog["snp"].str.contains(SNP_pat,na=False)] #catalog rows for the SNPs of interest
    #keep only rows annotated with Parkinson's disease; na=False skips missing annotations
    catalog_filtered_pd=catalog_filtered[catalog_filtered[filtering].str.contains("Parkinson's disease",na=False)]
    print ("Catalog containing the input snps only",catalog_filtered)
    print ("########################")

    #keep only associations at or below the p-value threshold
    catalog_filtered_pvalue=catalog_filtered_pd[catalog_filtered_pd["p-value"]<=p_threshold]

    snps_used=catalog_filtered_pvalue['snp'].tolist()

    naive_prs=catalog_filtered_pvalue["odds-ratio"].sum() #naive prs: plain sum of the odds ratios

    if log and naive_prs>0: #log10 is undefined for a sum of 0, so 0 is returned unchanged
        return math.log10(naive_prs),snps_used
    return naive_prs,snps_used
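
The function selects catalog rows with a regular-expression alternation, which matches substrings: a pattern such as `rs356220` also hits any longer rsID that contains it. The sketch below contrasts this with exact matching through `Series.isin`. It illustrates an alternative and is not the notebook's method; the toy catalog is invented and `rs3562201` is a made-up ID chosen only to trigger the substring case.

```python
import pandas as pd

# Toy catalog, invented for illustration only
catalog_toy = pd.DataFrame({
    "snp": ["rs356220", "rs3562201", "rs11931074", None],
    "odds-ratio": [1.2, 0.9, 1.5, 1.0],
})
gene_snps = ["rs356220", "rs11931074"]

# Regex alternation matches any rsID that merely contains one of the patterns
by_pattern = catalog_toy[
    catalog_toy["snp"].str.contains("|".join(gene_snps), na=False)
]

# isin compares whole values, so rs3562201 is not mistaken for rs356220
by_exact = catalog_toy[catalog_toy["snp"].isin(gene_snps)]

print(by_pattern["snp"].tolist())
print(by_exact["snp"].tolist())
```

Exact matching also avoids building one very long pattern for genes with many mapped SNPs.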

In [43]:
######## phewas phenotype filtering ########

In [19]:
count=0
for gene in GenetoSNP_PD.keys():
    score=prs_SNP_PD(gene,catalog,"phewas phenotype",log=False)[0]
    if score>0.0:
        print (gene,":",score)
        count+=1

Catalog containing the input snps only         chromosome         snp  \
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
7510    4 90639515  rs11931074   
11757   4 90641340    rs356220   
12510   4 90639515  rs11931074   
...            ...         ...   
208018  4 90639515  rs11931074   
210027  4 90639515  rs11931074   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
7510                                    Vascular dementia    162  0.001548   
11757                                   Vascular dementia    162  0.002538   
12510                  Elevated prostate specific antigen    840  0.002712   
...                                                   ...    ...       ...   
208018

Catalog containing the input snps only        chromosome         snp  \
1862            9  rs10121009   
2029            9  rs10121009   
3907            9  rs10121009   
7458            9  rs10121009   
8452            9  rs10121009   
...           ...         ...   
208487          9  rs10121009   
208537          9  rs10121009   
209254          9  rs10121009   
209651          9  rs10121009   
211444          9  rs10121009   

                                       phewas phenotype  cases   p-value  \
1862                           Gram negative septicemia    135  0.000303   
2029                                    Aplastic anemia    172  0.000340   
3907                                              Shock    141  0.000735   
7458                                              ASCVD    166  0.001534   
8452                             Disorders of the globe    111  0.001770   
...                                                 ...    ...       ...   
208487                         I

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome         snp  \
2464    11 84417846  rs10501570   
3990    11 84417846  rs10501570   
9022    11 84417846  rs10501570   
10499   11 84417846  rs10501570   
13519   11 84417846  rs10501570   
...             ...         ...   
203021  11 84417846  rs10501570   
206468  11 84417846  rs10501570   
209245  11 84417846  rs10501570   
210254  11 84417846  rs10501570   
215055  11 84417846  rs10501570   

                                        phewas phenotype  cases   p-value  \
2464    Type 1 diabetic peripheral circulatory disorders     35  0.000428   
3990                                           Flat foot    286  0.000753   
9022                         Type 2 diabetic retinopathy    498  0.001911   
10499                     

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only        chromosome        snp                      phewas phenotype  cases  \
3816           21  rs2823357                      Essential tremor    221   
11818          21  rs2823357                   Respiratory failure    279   
15226          21  rs2823357              Asthma with exacerbation    155   
16757          21  rs2823357                    Chronic bronchitis    596   
16848          21  rs2823357                       Viral hepatitis    255   
...           ...        ...                                   ...    ...   
195549    

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome        snp  \
186     17 43719143   rs393152   
257     17 41436901  rs8070723   
283              17   rs415430   
299      6 30056728  rs9468692   
347     17 44828931   rs199533   
...             ...        ...   
213807  10 72483010  rs1816002   
214095           17   rs183211   
214289           17   rs415430   
214914           17   rs415430   
215079  17 41436901  rs8070723   

                                         phewas phenotype  cases   p-value  \
186                                  Infection of the eye   12

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only        chromosome        snp  \
6081     4 852313  rs1564282   
15499    4 852313  rs1564282   
17232    4 852313  rs1564282   
31223    4 852313  rs1564282   
33750    4 852313  rs1564282   
38953    4 852313  rs1564282   
47547    4 852313  rs1564282   
47973    4 852313  rs1564282   
48831    4 852313  rs1564282   
49634    4 852313  rs1564282   
50348    4 852313  rs1564282   
53749    4 852313  rs1564282   
62100    4 852313  rs1564282   
62304    4 852313  rs1564282   
69960    4 852313  rs1564282   
70670    4 852313  rs1564282   
71210    4 852313  rs1564282   
75276    4 852313  rs1564282   
77642    4 852313  rs1564282   
79862    4 852313  rs1564282   
83284    4 852313  rs1564282   
88794    4 852313  rs1564282   
91474    4 852313  

Catalog containing the input snps only        chromosome         snp  \
6440     4 964359  rs11248060   
10587    4 964359  rs11248060   
20523    4 964359  rs11248060   
27908    4 964359  rs11248060   
29880    4 964359  rs11248060   
34608    4 964359  rs11248060   
39188    4 964359  rs11248060   
41379    4 964359  rs11248060   
46964    4 964359  rs11248060   
50843    4 964359  rs11248060   
51517    4 964359  rs11248060   
56658    4 964359  rs11248060   
58093    4 964359  rs11248060   
61708    4 964359  rs11248060   
66036    4 964359  rs11248060   
68789    4 964359  rs11248060   
69962    4 964359  rs11248060   
71972    4 964359  rs11248060   
76514    4 964359  rs11248060   
77983    4 964359  rs11248060   
78754    4 964359  rs11248060   
79469    4 964359  rs11248060   
83172    4 964359  rs11248060   
87070    4 964359  rs11248060   
99043    4 964359  rs11248060   
100056   4 964359  rs11248060   
101867   4 964359  rs11248060   
108317   4 964359  rs11248060   
1116

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp                     phewas phenotype  cases  \
610     4 15737937  rs4538475           Secondary thrombocytopenia     65   
1441    4 15737937  rs4538475                     Essential tremor    221   
1680    4 15346446  rs4698412                        Sialoadenitis    103   
2293    4 15737937  rs4538475        Cervical cancer and dysplasia    263   
5485    4 15346446  rs4698412               Diseases of the tongue    135   
...            ...        ...                                  ...    ...   
210172  4 15346446  rs4698412                 Hypercholesterolemia   4518   
211573  4 15737937  rs4538475                     Other dyschromia    310   
211759  4 15346446  rs4698412               Impaction of intes

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome         snp  \
2934    14 54290830  rs12431733   
8844    14 54290830  rs12431733   
10130   14 54290830  rs12431733   
11665   14 54290830  rs12431733   
12235   14 54290830  rs12431733   
13856   14 54290830  rs12431733   
18399   14 54290830  rs12431733   
18635   14 54290830  rs12431733   
21370   14 54290830  rs12431733   
22267   14 54290830  rs12431733   
22589   14 54290830  rs12431733   
34747   14 54290830  rs12431733   
38249   14 54290830  rs12431733   
49976   14 54290830  rs12431733   
50523   14 54290830  rs12431733   
52303   14 54290830  rs12431733   
55614   14 54290830  rs12431733   
58311   14 54290830  rs12431733   
85204   14 54290830  rs12431733   
90334   14 54290830  rs12431733   
90726   14 54290830 

Catalog containing the input snps only        chromosome        snp                                 phewas phenotype  \
4166           18  rs4130047                                   Herpes simplex   
9428           18  rs4130047              Derangement of joint, non-traumatic   
13269          18  rs4130047                       Other derangement of joint   
19226          18  rs4130047                                          Obesity   
27443          18  rs4130047  Poisoning by hormones and synthetic substitutes   
30895          18  rs4130047                              Large cell lymphoma   
35084          18  rs4130047                                       Overweight   
36829          18  rs4130047                         Other hemoglobinopathies   
41660          18  rs4130047                  Calculus of lower urinary tract   
43078          18  rs4130047             Gout and other crystal arthropathies   
52519          18  rs4130047                                  Costocho

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only        chromosome       snp  \
283            17  rs415430   
500            17  rs415430   
4263           17  rs415430   
5965           17  rs415430   
7780           17  rs415430   
...           ...       ...   
208732         17  rs415430   
210763         17  rs415430   
213513         17  rs415430   
214289         17  rs415430   
214914         17  rs415430   

                                         phewas phenotype  cases   p-value  \
283                                  Infection of the eye   1229  0.000012   
500                                   Sprains and strains   2841  0.000042   
4263                           Conjunctivitis, infectious    939  0.000815   
5965                                     Ptosis of eyelid    597  0.00

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only          chromosome        snp  \
186     17 43719143   rs393152   
257     17 41436901  rs8070723   
283              17   rs415430   
299      6 30056728  rs9468692   
347     17 44828931   rs199533   
...             ...        ...   
213807  10 72483010  rs1816002   
214095           17   rs183211   
214289           17   rs415430   
214914           17   rs415430   
215079  17 41436901  rs8070723   

                                         phewas phenotype  cases   p-value  \
186                                  Infection of the eye   1229  0.000004   
257                                  Infection of the eye   1229  0.000009   
283                                  Infection of the eye   1229  0.000012   
299                                           Insect bite    348  0.000014   
347                                  Infection of the eye   1229  0.000021   
...                                                   ...    ...       ...   
213807

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome        snp                               phewas phenotype  \
299      6 30056728  rs9468692                                    Insect bite   
688      6 30056728  rs9468692                         Other disorders of ear   
1198     6 30056728  rs9468692              Chronic ulcer of unspecified site   
2707     6 30056728  rs9468692     Peritonitis and retroperitoneal infections   
3821    10 72483010  rs1816002  Cancer of the digestive organs and peritoneum   
...             ...        ...                                            ...   
205650   6 30056728  rs9468692                                   Enthesopathy   
208306   6 30056728  rs9468692                   Cardiac conduction disorders   
209374   6 30056728  rs946

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp  \
128     6 32409530  rs3129882   
153     6 32409530  rs3129882   
174     6 32409530  rs3129882   
191     6 32409530  rs3129882   
396     6 32409530  rs3129882   
...            ...        ...   
202217  6 32409530  rs3129882   
204844  6 32409530  rs3129882   
206679  6 32409530  rs3129882   
210593  6 32409530  rs3129882   
214958  6 32409530  rs3129882   

                                         phewas phenotype  cases  \
128     Rheumatoid arthritis & related inflammatory po...    511   
153             

In [33]:
total_genes = 213  # number of genes in the gene-to-SNP mapping
print("Total of genes with prs larger than 0 regardless of mechanism mapping:", count)
print("Total of genes:", total_genes)
print(str((count / total_genes) * 100) + "% of the genes will be included in PRS calculations")

Total of genes with prs larger than 0 regardless of mechanism mapping: 0
Total of genes: 213
0.0% of the genes will be included in PRS calculations
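
The count and percentage reported above can also be derived from the scores themselves rather than from a hard-coded total. The following is a minimal sketch, assuming the per-gene scores have been collected into a dictionary first; the helper name `prs_coverage` and the toy values are hypothetical and not part of the notebook.

```python
def prs_coverage(gene_scores):
    """Return (n_positive, n_total, percent) for a gene -> PRS score mapping.

    Scores of 0.0 or False count as "no usable PRS".
    """
    n_total = len(gene_scores)
    n_positive = sum(1 for s in gene_scores.values() if s and s > 0.0)
    percent = 100.0 * n_positive / n_total if n_total else 0.0
    return n_positive, n_total, percent


# Toy scores for illustration only; these are not results from the catalog.
toy_scores = {"GENE_A": 0.0, "GENE_B": 1.5, "GENE_C": False, "GENE_D": 0.2}
n_pos, n_tot, pct = prs_coverage(toy_scores)
print(f"{n_pos} of {n_tot} genes ({pct:.1f}%) have a PRS larger than 0")
```

Deriving the total from the mapping avoids the percentage drifting out of sync if the gene list changes.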


In [142]:
######## GWAS association filtering #########

In [46]:
count = 0  # number of genes with a positive PRS under the GWAS-association filter
for gene in GenetoSNP_PD.keys():
    score = prs_SNP_PD(gene, catalog, "gwas-associations", log=False)[0]
    if score > 0.0:
        print(gene, ":", score)
        count += 1

Catalog containing the input snps only         chromosome         snp  \
2604    4 90639515  rs11931074   
2642    4 90641340    rs356220   
7510    4 90639515  rs11931074   
11757   4 90641340    rs356220   
12510   4 90639515  rs11931074   
...            ...         ...   
208018  4 90639515  rs11931074   
210027  4 90639515  rs11931074   
213704  4 90641340    rs356220   
213942  4 90639515  rs11931074   
214801  4 90641340    rs356220   

                                         phewas phenotype  cases   p-value  \
2604    Vascular complications of surgery and medical ...     76  0.000461   
2642                             Acute reaction to stress    339  0.000471   
7510                                    Vascular dementia    162  0.001548   
11757                                   Vascular dementia    162  0.002538   
12510                  Elevated prostate specific antigen    840  0.002712   
...                                                   ...    ...       ...   
208018

Catalog containing the input snps only        chromosome         snp  \
1862            9  rs10121009   
2029            9  rs10121009   
3907            9  rs10121009   
7458            9  rs10121009   
8452            9  rs10121009   
...           ...         ...   
208487          9  rs10121009   
208537          9  rs10121009   
209254          9  rs10121009   
209651          9  rs10121009   
211444          9  rs10121009   

                                       phewas phenotype  cases   p-value  \
1862                           Gram negative septicemia    135  0.000303   
2029                                    Aplastic anemia    172  0.000340   
3907                                              Shock    141  0.000735   
7458                                              ASCVD    166  0.001534   
8452                             Disorders of the globe    111  0.001770   
...                                                 ...    ...       ...   
208487                         I

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome         snp  \
2464    11 84417846  rs10501570   
3990    11 84417846  rs10501570   
9022    11 84417846  rs10501570   
10499   11 84417846  rs10501570   
13519   11 84417846  rs10501570   
...             ...         ...   
203021  11 84417846  rs10501570   
206468  11 84417846  rs10501570   
209245  11 84417846  rs10501570   
210254  11 84417846  rs10501570   
215055  11 84417846  rs10501570   

                                        phewas phenotype  cases   p-value  \
2464    Type 1 diabetic peripheral circulatory disorders     35  0.000428   
3990                                           Flat foot    286  0.000753   
9022                         Type 2 diabetic retinopathy    498  0.001911   
10499                     

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only        chromosome        snp                      phewas phenotype  cases  \
3816           21  rs2823357                      Essential tremor    221   
11818          21  rs2823357                   Respiratory failure    279   
15226          21  rs2823357              Asthma with exacerbation    155   
16757          21  rs2823357                    Chronic bronchitis    596   
16848          21  rs2823357                       Viral hepatitis    255   
...           ...        ...                                   ...    ...   
195549    

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome        snp  \
186     17 43719143   rs393152   
257     17 41436901  rs8070723   
283              17   rs415430   
299      6 30056728  rs9468692   
347     17 44828931   rs199533   
...             ...        ...   
213807  10 72483010  rs1816002   
214095           17   rs183211   
214289           17   rs415430   
214914           17   rs415430   
215079  17 41436901  rs8070723   

                                         phewas phenotype  cases   p-value  \
186                                  Infection of the eye   12

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only        chromosome        snp  \
6081     4 852313  rs1564282   
15499    4 852313  rs1564282   
17232    4 852313  rs1564282   
31223    4 852313  rs1564282   
33750    4 852313  rs1564282   
38953    4 852313  rs1564282   
47547    4 852313  rs1564282   
47973    4 852313  rs1564282   
48831    4 852313  rs1564282   
49634    4 852313  rs1564282   
50348    4 852313  rs1564282   
53749    4 852313  rs1564282   
62100    4 852313  rs1564282   
62304    4 852313  rs1564282   
69960    4 852313  rs1564282   
70670    4 852313  rs1564282   
71210    4 852313  rs1564282   
75276    4 852313  rs1564282   
77642    4 852313  rs1564282   
79862    4 852313  rs1564282   
83284    4 852313  rs1564282   
88794    4 852313  rs1564282   
91474    4 852313  

Catalog containing the input snps only        chromosome         snp  \
6440     4 964359  rs11248060   
10587    4 964359  rs11248060   
20523    4 964359  rs11248060   
27908    4 964359  rs11248060   
29880    4 964359  rs11248060   
34608    4 964359  rs11248060   
39188    4 964359  rs11248060   
41379    4 964359  rs11248060   
46964    4 964359  rs11248060   
50843    4 964359  rs11248060   
51517    4 964359  rs11248060   
56658    4 964359  rs11248060   
58093    4 964359  rs11248060   
61708    4 964359  rs11248060   
66036    4 964359  rs11248060   
68789    4 964359  rs11248060   
69962    4 964359  rs11248060   
71972    4 964359  rs11248060   
76514    4 964359  rs11248060   
77983    4 964359  rs11248060   
78754    4 964359  rs11248060   
79469    4 964359  rs11248060   
83172    4 964359  rs11248060   
87070    4 964359  rs11248060   
99043    4 964359  rs11248060   
100056   4 964359  rs11248060   
101867   4 964359  rs11248060   
108317   4 964359  rs11248060   
1116

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp                     phewas phenotype  cases  \
610     4 15737937  rs4538475           Secondary thrombocytopenia     65   
1441    4 15737937  rs4538475                     Essential tremor    221   
1680    4 15346446  rs4698412                        Sialoadenitis    103   
2293    4 15737937  rs4538475        Cervical cancer and dysplasia    263   
5485    4 15346446  rs4698412               Diseases of the tongue    135   
...            ...        ...                                  ...    ...   
210172  4 15346446  rs4698412                 Hypercholesterolemia   4518   
211573  4 15737937  rs4538475                     Other dyschromia    310   
211759  4 15346446  rs4698412               Impaction of intes

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome         snp  \
2934    14 54290830  rs12431733   
8844    14 54290830  rs12431733   
10130   14 54290830  rs12431733   
11665   14 54290830  rs12431733   
12235   14 54290830  rs12431733   
13856   14 54290830  rs12431733   
18399   14 54290830  rs12431733   
18635   14 54290830  rs12431733   
21370   14 54290830  rs12431733   
22267   14 54290830  rs12431733   
22589   14 54290830  rs12431733   
34747   14 54290830  rs12431733   
38249   14 54290830  rs12431733   
49976   14 54290830  rs12431733   
50523   14 54290830  rs12431733   
52303   14 54290830  rs12431733   
55614   14 54290830  rs12431733   
58311   14 54290830  rs12431733   
85204   14 54290830  rs12431733   
90334   14 54290830  rs12431733   
90726   14 54290830 

Catalog containing the input snps only        chromosome        snp                                 phewas phenotype  \
4166           18  rs4130047                                   Herpes simplex   
9428           18  rs4130047              Derangement of joint, non-traumatic   
13269          18  rs4130047                       Other derangement of joint   
19226          18  rs4130047                                          Obesity   
27443          18  rs4130047  Poisoning by hormones and synthetic substitutes   
30895          18  rs4130047                              Large cell lymphoma   
35084          18  rs4130047                                       Overweight   
36829          18  rs4130047                         Other hemoglobinopathies   
41660          18  rs4130047                  Calculus of lower urinary tract   
43078          18  rs4130047             Gout and other crystal arthropathies   
52519          18  rs4130047                                  Costocho

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only        chromosome       snp  \
283            17  rs415430   
500            17  rs415430   
4263           17  rs415430   
5965           17  rs415430   
7780           17  rs415430   
...           ...       ...   
208732         17  rs415430   
210763         17  rs415430   
213513         17  rs415430   
214289         17  rs415430   
214914         17  rs415430   

                                         phewas phenotype  cases   p-value  \
283                                  Infection of the eye   1229  0.000012   
500                                   Sprains and strains   2841  0.000042   
4263                           Conjunctivitis, infectious    939  0.000815   
5965                                     Ptosis of eyelid    597  0.001197   
7780                                Keratitis, infectious    146  0.001612   
...                                                   ...    ...       ...   
208732                                    

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

Catalog containing the input snps only        chromosome         snp                         phewas phenotype  cases  \
9736           17  rs11868035                  Ulceration of intestine     65   
10893          17  rs11868035                           Duodenal ulcer    209   
11457          17  rs11868035               Secondary thrombocytopenia     65   
12818          17  rs11868035           Obstructive chronic bronchitis    471   
12967          17  rs11868035               Pleurisy; pleural effusion    977   
...           ...         ...                                      ...    ...   
189937         17  rs11868035  Cerebral edema and compression of brain     39   
197504         17  rs11868035                            Cramp of limb    216   
204281         17  rs11868035               Non-healing surgical wound     68   
209365         17  rs11868035            Thoracic neuritis/radiculitis   1659   
212311         17  rs11868035                                Fasciitis

Catalog containing the input snps only          chromosome        snp                              phewas phenotype  \
186     17 43719143   rs393152                          Infection of the eye   
257     17 41436901  rs8070723                          Infection of the eye   
299      6 30056728  rs9468692                                   Insect bite   
688      6 30056728  rs9468692                        Other disorders of ear   
1198     6 30056728  rs9468692             Chronic ulcer of unspecified site   
...             ...        ...                                           ...   
211084   12 9910164  rs4763879  Cardiac and circulatory congenital anomalies   
211121   6 31248921  rs9468925                         Symptomatic menopause   
211735  17 43513441    rs11012                                 Pain in joint   
213807  10 72483010  rs1816002                                    Septicemia   
215079  17 41436901  rs8070723                       Nontoxic nodular goiter   



Catalog containing the input snps only          chromosome        snp  \
186     17 43719143   rs393152   
257     17 41436901  rs8070723   
283              17   rs415430   
299      6 30056728  rs9468692   
347     17 44828931   rs199533   
...             ...        ...   
213807  10 72483010  rs1816002   
214095           17   rs183211   
214289           17   rs415430   
214914           17   rs415430   
215079  17 41436901  rs8070723   

                                         phewas phenotype  cases   p-value  \
186                                  Infection of the eye   1229  0.000004   
257                                  Infection of the eye   1229  0.000009   
283                                  Infection of the eye   1229  0.000012   
299                                           Insect bite    348  0.000014   
347                                  Infection of the eye   1229  0.000021   
...                                                   ...    ...       ...   
213807

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome        snp                               phewas phenotype  \
299      6 30056728  rs9468692                                    Insect bite   
688      6 30056728  rs9468692                         Other disorders of ear   
1198     6 30056728  rs9468692              Chronic ulcer of unspecified site   
2707     6 30056728  rs9468692     Peritonitis and retroperitoneal infections   
3821    10 72483010  rs1816002  Cancer of the digestive organs and peritoneum   
...             ...        ...                                            ...   
205650   6 30056728  rs9468692                                   Enthesopathy   
208306   6 30056728  rs9468692                   Cardiac conduction disorders   
209374   6 30056728  rs946

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp  \
128     6 32409530  rs3129882   
153     6 32409530  rs3129882   
174     6 32409530  rs3129882   
191     6 32409530  rs3129882   
396     6 32409530  rs3129882   
...            ...        ...   
202217  6 32409530  rs3129882   
204844  6 32409530  rs3129882   
206679  6 32409530  rs3129882   
210593  6 32409530  rs3129882   
214958  6 32409530  rs3129882   

                                         phewas phenotype  cases  \
128     Rheumatoid arthritis & related inflammatory po...    511   
153             

In [48]:
total_genes = 213  # number of genes in the gene-to-SNP mapping
print("Total of genes with prs larger than 0 regardless of mechanism mapping:", count)
print("Total of genes:", total_genes)
print(str((count / total_genes) * 100) + "% of the genes will be included in PRS calculations")

Total of genes with prs larger than 0 regardless of mechanism mapping: 57
Total of genes: 213
26.76056338028169% of the genes will be included in PRS calculations
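
The per-gene loops above print a filtered catalog for every gene, which buries the few lines that matter. One way to keep the summary readable is to capture that output instead of displaying it. This is a sketch only, assuming the scoring function writes its diagnostics with `print`; `noisy_score` is a hypothetical stand-in, not the notebook's `prs_SNP_PD`.

```python
import contextlib
import io


def noisy_score(gene):
    """Hypothetical stand-in for a scoring function that prints debug output."""
    print("Catalog containing the input snps only", gene)
    print("########################")
    return (len(gene) * 0.1, [])


def quiet_call(func, *args, **kwargs):
    """Call func while capturing everything it prints.

    Returns (result, captured_text) so the diagnostics can still be inspected.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


result, captured = quiet_call(noisy_score, "GAK")
print("score:", result[0])
print("captured lines:", len(captured.splitlines()))
```

The captured text is returned rather than discarded, so a single gene's filtered catalog can still be printed on demand when something looks wrong.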


Now:
# Mechanism -> Gene(s) -> SNPs
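
The Mechanism -> Gene(s) -> SNPs step amounts to summing per-gene scores within each mechanism. A small sketch of that aggregation on toy data is shown here, assuming gene scores are already available as a dictionary; the function name `aggregate_mechanism_scores`, the result layout, and all inputs are hypothetical and differ from the structures used in the notebook.

```python
def aggregate_mechanism_scores(mechanism_to_genes, gene_to_score):
    """Sum per-gene scores for each mechanism.

    Genes without an entry in gene_to_score contribute 0.0.
    """
    summary = {}
    for mechanism, genes in mechanism_to_genes.items():
        per_gene = [(gene, gene_to_score.get(gene, 0.0)) for gene in genes]
        summary[mechanism] = {
            "genes": per_gene,
            "total": sum(score for _, score in per_gene),
            "n_scored": sum(1 for _, score in per_gene if score > 0.0),
        }
    return summary


# Toy input for illustration only; not taken from NeuroMMSig or the catalog.
toy_mechanisms = {"mechanism_1": ["GENE_A", "GENE_B"], "mechanism_2": ["GENE_C"]}
toy_gene_scores = {"GENE_A": 0.5, "GENE_B": 1.0}

for name, info in aggregate_mechanism_scores(toy_mechanisms, toy_gene_scores).items():
    print(name, info["total"], info["n_scored"])
```

Keeping `n_scored` next to the total makes it easy to see which mechanisms have no scored gene at all, which is the question posed in the goals.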

In [52]:
############ PheWAS Phenotype ##########

# Mechanism PRS calculation
# Per mechanism: [list of (gene, score) tuples, [summed score over all its genes]]
MechanismToScore = dict()
snp_set = set()  # SNPs that contributed to any non-zero gene score

for mechanism in MechanismToGene_noNans.keys():
    MechanismToScore[mechanism] = [[], [0]]
    for gene in MechanismToGene_noNans[mechanism]:
        # As used below: score[0] is the gene's PRS (or False), score[1] the SNPs behind it
        score = prs_SNP_PD(gene, catalog, "phewas phenotype", log=False)
        MechanismToScore[mechanism][0].append((gene, score[0]))
        MechanismToGeneToSNPcount_asharmap[mechanism][1] = set(score[1])
        if score[0] != False:  # also skips 0.0, because 0.0 == False in Python
            # Add to the running sum; `list += number` would raise a TypeError
            MechanismToScore[mechanism][1][0] += score[0]
            snp_set |= set(score[1])
            if score[0] > 0.0:
                MechanismToGeneToSNPcount_asharmap[mechanism][1] = len(set(score[1]))

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

212059      0.7048       GAK       457.30  Parkinson's disease  
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney 

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome        snp                      phewas phenotype  cases  \
186     17 43719143   rs393152                  Infection of the eye   1229   
257     17 41436901  rs8070723                  Infection of the eye   1229   
299      6 30056728  rs9468692                           Insect bite    348   
347     17 44828931   rs199533                  Infection of the eye   1229   
688      6 30056728  rs9468692                Other disorders of ear    314   
...             ...        ...                                   ...    ...   
211121   6 31248921  rs9468925                 Symptomatic menopause   1847   
211735  17 43513441    rs11012                         Pain in joint   7301   
212698  17 44828931   rs199533  Hyperosmolal

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only          chromosome        snp                      phewas phenotype  cases  \
186     17 43719143   rs393152                  Infection of the eye   1229   
257     17 41436901  rs8070723                  Infection of the eye   1229   
299      6 30056728  rs9468692                           Insect bite    348   
347     17 44828931   rs199533                  Infection of the eye   1229   
688      6 30056728  rs9468692                Other disorders of ear    314   
...             ...        ...                                   ...    ...

[74 rows x 9 columns]
########################
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990               

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs27369

Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                              Thrombocytopenia   

        case

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index

[74 rows x 9 columns]
########################


In [55]:
MechanismToScore

{'MAPK subgraph': [[('FRA6E', False),
   ('HIST3H3', False),
   ('MAPK8', False),
   ('MAP2K3', False),
   ('LRRK2', 0.0),
   ('DNAJC27', False),
   ('MAP3K4', False),
   ('MARK2', False),
   ('MAP2K4', False),
   ('MAP2K7', False),
   ('PINK1', 0.0),
   ('JUN', False),
   ('ECSIT', False),
   ('MAP2K6', False),
   ('NLRX1', False),
   ('IL1B', False),
   ('MAPK3', False),
   ('RAD18', False),
   ('IL18', False),
   ('DUSP1', False),
   ('PRKN', False),
   ('HTRA2', 0.0),
   ('MAP1LC3B', False),
   ('NLRP3', False),
   ('RPS6KA5', False),
   ('ATF4', False),
   ('XBP1', False),
   ('MAPK10', False),
   ('MAPK9', False),
   ('PARK7', 0.0),
   ('CASP1', False),
   ('GIGYF2', 0.0)],
  [0]],
 'Calcium-dependent signal transduction': [[('ESR2', False),
   ('ESR1', False),
   ('CACNA1A', False),
   ('CXCL8', False),
   ('S100B', False),
   ('NOX1', False),
   ('LRRK2', 0.0),
   ('CACNB2', False),
   ('SCN10A', False)],
  [0]],
 'GRB10 subgraph': [[('NEDD4', False),
   ('PARK11', False),
   (

In [54]:
print (str(len(snp_set))+" SNPs were actually used: ",snp_set)

0 SNPs were actually used:  set()


In [55]:
# fraction of PD-related genes from the mechanism-gene mapping that also occur in the gene-SNP mapping
gene_list_ofallmechanisms=[y for x in list(MechanismToGene_noNans.values()) for y in x]
gene_set_ofallmechanisms=list(set(gene_list_ofallmechanisms))
count=0
for i in gene_set_ofallmechanisms:
    if i in GenetoSNP_PD:
        count+=1
print("Number of PD related genes:",len(gene_set_ofallmechanisms))
print("Number of PD related genes occurring in the GeneToSNP mapping:", count)
print ("Fraction:", count/len(gene_set_ofallmechanisms))

Number of PD related genes: 394
Number of PD related genes occurring in the GeneToSNP mapping: 20
Fraction: 0.050761421319796954


Problem: Only about 5% (20 of 394) of the PD-related genes from the NeuroMMSig mechanism mapping occur in the GeneToSNP mapping.
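
The same coverage check can be written with set operations, which also makes it easy to see which mechanisms have no mapped gene at all. The following is a minimal sketch on made-up toy data: only the names `MechanismToGene_noNans` and `GenetoSNP_PD` mirror the notebook's variables, while the dictionary contents and the SNP ids are hypothetical.

```python
# Toy stand-ins for the notebook's mappings (contents are hypothetical).
MechanismToGene_noNans = {
    "MAPK subgraph": ["MAPK8", "LRRK2", "PINK1", "JUN"],
    "Calcium-dependent signal transduction": ["ESR1", "LRRK2", "CACNA1A"],
    "Toy mechanism without mapped genes": ["GENE_A", "GENE_B"],
}
GenetoSNP_PD = {"LRRK2": ["rs0000001"], "PINK1": ["rs0000002"]}

# All distinct genes across mechanisms, and those that have a SNP mapping.
all_genes = set().union(*MechanismToGene_noNans.values())
mapped_genes = all_genes & set(GenetoSNP_PD)
coverage = len(mapped_genes) / len(all_genes)

# Per mechanism: which of its genes are present in the gene-SNP mapping.
per_mechanism = {
    mechanism: sorted(set(genes) & mapped_genes)
    for mechanism, genes in MechanismToGene_noNans.items()
}
mechanisms_without_snps = [m for m, genes in per_mechanism.items() if not genes]

print("Coverage:", coverage)
print("Mechanisms without any mapped gene:", mechanisms_without_snps)
```

A mechanism that ends up in `mechanisms_without_snps` cannot receive a score, whatever catalog column is used for filtering later on.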

In [156]:
#more summary statistics: How many Mechanisms actually have useful scores?
count=0
for mechanism in MechanismToScore:
    if MechanismToScore[mechanism][1][0]>0:
        count+=1
print("Fraction of PD Mechanism with a PRS > 0:",count/len(MechanismToScore))

Fraction of PD Mechanism with a PRS > 0: 0.0


In [24]:
###### gwas association ####### 
# Mechanism PRS calculation

MechanismToScore=dict() #per mechanism: list of (gene, score) tuples and the summed score of the mechanism
snp_set=set()

for mechanism in MechanismToGene_noNans.keys():
    MechanismToScore[mechanism]=[[],[0]] 
    for gene in MechanismToGene_noNans[mechanism]:
        score=prs_SNP_PD(gene,catalog,"gwas-associations",log=False)
        print(score)
        MechanismToScore[mechanism][0].append((gene,score[0]))
        if score[0]!=False: #also skips scores of 0.0, since 0.0 != False evaluates to False
            MechanismToScore[mechanism][1]+=score[0]
            snp_set|= set(score[1])
            MechanismToGeneToSNPcount_asharmap[mechanism][1]+=len(set(score[1]))

(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'g

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs27369

(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90

(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_

Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                              Thrombocytopenia   

        case

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010

Catalog containing the input snps only          chromosome        snp                      phewas phenotype  cases  \
186     17 43719143   rs393152                  Infection of the eye   1229   
257     17 41436901  rs8070723                  Infection of the eye   1229   
299      6 30056728  rs9468692                           Insect bite    348   
347     17 44828931   rs199533                  Infection of the eye   1229   
688      6 30056728  rs9468692                Other disorders of ear    314   
...             ...        ...                                   ...    ...   
211121   6 31248921  rs9468925                 Symptomatic menopause   1847   
211735  17 43513441    rs11012                         Pain in joint   7301   
212698  17 44828931   rs199533  Hyperosmolality and/or hypernatremia    107   
213807  10 72483010  rs1816002                            Septicemia    605   
215079  17 41436901  rs8070723               Nontoxic nodular goiter    589   

         p-v

Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                              Thrombocytopenia   

        case

Catalog containing the input snps only          chromosome        snp                      phewas phenotype  cases  \
186     17 43719143   rs393152                  Infection of the eye   1229   
257     17 41436901  rs8070723                  Infection of the eye   1229   
299      6 30056728  rs9468692                           Insect bite    348   
347     17 44828931   rs199533                  Infection of the eye   1229   
688      6 30056728  rs9468692                Other disorders of ear    314   
...             ...        ...                                   ...    ...   
211121   6 31248921  rs9468925                 Symptomatic menopause   1847   
211735  17 43513441    rs11012                         Pain in joint   7301   
212698  17 44828931   rs199533  Hyperosmolality and/or hypernatremia    107   
213807  10 72483010  rs1816002                            Septicemia    605   
215079  17 41436901  rs8070723               Nontoxic nodular goiter    589   

         p-v

Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                              Thrombocytopenia   

        case

Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                              Thrombocytopenia   

        case

[74 rows x 9 columns]
########################
(76.87150000000001, ['rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs2736990', 'rs273699

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990  

Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                              Thrombocytopenia   

        case

Catalog containing the input snps only          chromosome        snp                      phewas phenotype  cases  \
186     17 43719143   rs393152                  Infection of the eye   1229   
257     17 41436901  rs8070723                  Infection of the eye   1229   
299      6 30056728  rs9468692                           Insect bite    348   
347     17 44828931   rs199533                  Infection of the eye   1229   
688      6 30056728  rs9468692                Other disorders of ear    314   
...             ...        ...                                   ...    ...   
211121   6 31248921  rs9468925                 Symptomatic menopause   1847   
211735  17 43513441    rs11012                         Pain in joint   7301   
212698  17 44828931   rs199533  Hyperosmolality and/or hypernatremia    107   
213807  10 72483010  rs1816002                            Septicemia    605   
215079  17 41436901  rs8070723               Nontoxic nodular goiter    589   

         p-v

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomac

Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_name, phewas code, gwas-associations]
Index: []
########################
(0.0, [])
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
Catalog containing the input snps only Empty DataFrame
Columns: [chromosome, snp, phewas phenotype, cases, p-value, odds-ratio, gene_na

Catalog containing the input snps only         chromosome        snp                              phewas phenotype  \
333     4 90678541  rs2736990                      Acute reaction to stress   
6056    4 90678541  rs2736990       Other disorders of stomach and duodenum   
9252    4 90678541  rs2736990                             Vascular dementia   
11139   4 90678541  rs2736990  Cancer of other lymphoid, histiocytic tissue   
12479   4 90678541  rs2736990     Other disorders of the kidney and ureters   
...            ...        ...                                           ...   
208010  4 90678541  rs2736990         Other disorders of the nervous system   
209263  4 90678541  rs2736990                          Syncope and collapse   
210757  4 90678541  rs2736990                      Cyst of kidney, acquired   
211289  4 90678541  rs2736990            Other cardiac conduction disorders   
214144  4 90678541  rs2736990                              Thrombocytopenia   

        case

(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')
(False, 'gene not in mapping')


In [57]:
print (str(len(snp_set))+" SNPs were actually used: ",snp_set)

8 SNPs were actually used:  {'rs1564282', 'rs393152', 'rs11012', 'rs8070723', 'rs947211', 'rs199533', 'rs2736990', 'rs823156'}


In [77]:
MechanismToScore['Notch signaling subgraph'][1]

array([549.656])

In [71]:
MechanismToScore

{'MAPK subgraph': [[('FRA6E', False),
   ('HIST3H3', False),
   ('MAPK8', False),
   ('MAP2K3', False),
   ('LRRK2', 0.0),
   ('DNAJC27', False),
   ('MAP3K4', False),
   ('MARK2', False),
   ('MAP2K4', False),
   ('MAP2K7', False),
   ('PINK1', 0.0),
   ('JUN', False),
   ('ECSIT', False),
   ('MAP2K6', False),
   ('NLRX1', False),
   ('IL1B', False),
   ('MAPK3', False),
   ('RAD18', False),
   ('IL18', False),
   ('DUSP1', False),
   ('PRKN', False),
   ('HTRA2', 0.0),
   ('MAP1LC3B', False),
   ('NLRP3', False),
   ('RPS6KA5', False),
   ('ATF4', False),
   ('XBP1', False),
   ('MAPK10', False),
   ('MAPK9', False),
   ('PARK7', 0.0),
   ('CASP1', False),
   ('GIGYF2', 0.0)],
  [0]],
 'Calcium-dependent signal transduction': [[('ESR2', False),
   ('ESR1', False),
   ('CACNA1A', False),
   ('CXCL8', False),
   ('S100B', False),
   ('NOX1', False),
   ('LRRK2', 0.0),
   ('CACNB2', False),
   ('SCN10A', False)],
  [0]],
 'GRB10 subgraph': [[('NEDD4', False),
   ('PARK11', False),
   (

In [58]:
count=0
for mechanism in MechanismToScore:
    if MechanismToScore[mechanism][1][0]>0:
        count+=1
print("Fraction of PD Mechanism with a PRS > 0:",count/len(MechanismToScore))

Fraction of PD Mechanism with a PRS > 0: 0.421875
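
To see which mechanisms contribute to this fraction, the nested `MechanismToScore` dictionary can be flattened into a table. This is a minimal sketch on hypothetical scores; it only assumes the layout shown above, `{mechanism: [[(gene, score_or_False), ...], [summed_score]]}`, and the mechanism and gene names are placeholders.

```python
import pandas as pd

# Hypothetical scores in the same layout as MechanismToScore.
toy_scores = {
    "Mechanism A": [[("GENE1", False), ("GENE2", 1.5), ("GENE3", 2.0)], [3.5]],
    "Mechanism B": [[("GENE4", False), ("GENE5", 0.0)], [0]],
}

rows = []
for mechanism, (gene_scores, total) in toy_scores.items():
    # Genes that are in the mapping AND have a score above zero.
    scoring_genes = [g for g, s in gene_scores if s is not False and s > 0]
    rows.append({
        "mechanism": mechanism,
        "prs": float(total[0]),
        "n_genes": len(gene_scores),
        "n_scoring_genes": len(scoring_genes),
    })

summary = (
    pd.DataFrame(rows)
    .sort_values("prs", ascending=False)
    .reset_index(drop=True)
)
fraction_positive = (summary["prs"] > 0).mean()

print(summary)
print("Fraction of mechanisms with a PRS > 0:", fraction_positive)
```

The `n_scoring_genes` column shows how many genes actually drive each mechanism score, which is useful when a high score may come from a single gene.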


# comb data

In [82]:
# comb data: fewer rows in the dataframe, but more SNPs and genes in total
liste=comb_df["rsID"].tolist()
liste_2=comb_df["GENCODE_name"].tolist()
print ("Mappings in dataframe:",len(liste))
print ("SNPs in total:",len(set(liste)))
print ("Genes in total:",len(set(liste_2)))

Mappings in dataframe: 7869
SNPs in total: 6230
Genes in total: 616


In [83]:
#convert the gene-SNP mapping from the dataframe to a dictionary {gene: [rsIDs]}
GenetoSNP={}
for x, y in zip(comb_df['rsID'],comb_df['GENCODE_name']):
    GenetoSNP.setdefault(y,[]).append(x)
#print (GenetoSNP)
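
The same dictionary can be built with a pandas `groupby`. This is a minimal sketch on a hypothetical toy frame; only the column names `rsID` and `GENCODE_name` are taken from `comb_df`, the values are made up.

```python
import pandas as pd

# Toy stand-in for comb_df (values are hypothetical).
comb_df = pd.DataFrame({
    "rsID": ["rs1", "rs2", "rs3", "rs2"],
    "GENCODE_name": ["GENE_A", "GENE_A", "GENE_B", "GENE_B"],
})

# Collect all rsIDs per gene; row order within each gene is preserved.
GenetoSNP = comb_df.groupby("GENCODE_name")["rsID"].apply(list).to_dict()
print(GenetoSNP)
```

One difference to keep in mind: `groupby` drops rows whose key is missing by default, whereas the explicit loop keeps a missing gene name as a dictionary key. In the toy data `rs2` appears under two genes, which is the situation that makes the number of mappings larger than the number of distinct SNPs.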

In [84]:
#updating prs_SNO function for the comb genes-snps
def prs_SNP(gene,csv,filtering,log=False,p_threshold=0.05):
    
    #print ("SNPs:",GenetoSNP_PD[gene])
    #print("########################")
    if gene in GenetoSNP:
        # `catalog` is the global PheWAS dataframe loaded above; the `csv` argument is not used
        SNP_pat = '|'.join(r"{}".format(x) for x in GenetoSNP[gene]) #regex alternation over the SNPs mapped to this gene
        catalog_filtered = catalog[catalog["snp"].str.contains(SNP_pat,na=False)] #catalog rows whose SNP ID matches one of these SNPs
        catalog_filtered_pd=catalog_filtered[catalog_filtered[filtering].str.contains("Parkinson's disease")] #keep PD rows only
        catalog_filtered_pvalue_series=catalog_filtered_pd["p-value"]<=p_threshold
        catalog_filtered_pvalue=catalog_filtered_pd[catalog_filtered_pvalue_series] #keep rows at or below the p-value threshold
        
        snps_used=catalog_filtered_pvalue['snp'].tolist()

        #print (catalog_filtered_pvalue)

        naive_prs=catalog_filtered_pvalue["odds-ratio"].sum() #naive PRS: sum of the odds ratios of the remaining rows
        if log:
            return math.log10(naive_prs),snps_used
        else:
            return naive_prs,snps_used
    else:
        snps_used="gene not in mapping"
        return False,snps_used
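One thing to keep in mind with `str.contains` and a joined pattern: it matches substrings, so a pattern such as `rs123` also matches `rs1234`. Whether this affects the results here has not been checked. The sketch below, on an invented toy catalog (all rows and the `snps_of_gene` list are illustrative assumptions), contrasts the substring search with an exact match via `isin`.

```python
import pandas as pd

# Toy catalog; the rows are invented for illustration only
toy_catalog = pd.DataFrame({
    "snp": ["rs123", "rs1234", "rs123"],
    "phewas phenotype": ["Parkinson's disease", "Parkinson's disease", "Asthma"],
    "p-value": [0.01, 0.02, 0.03],
    "odds-ratio": [1.5, 2.0, 0.9],
})
snps_of_gene = ["rs123"]  # hypothetical SNP list for one gene

# Substring search: the pattern "rs123" also hits "rs1234"
pattern = "|".join(snps_of_gene)
by_substring = toy_catalog[toy_catalog["snp"].str.contains(pattern, na=False)]

# Exact match on the whole SNP ID
by_exact = toy_catalog[toy_catalog["snp"].isin(snps_of_gene)]

print(len(by_substring), len(by_exact))
```

With exact matching, only rows whose `snp` value equals one of the listed IDs contribute to the odds-ratio sum.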

In [85]:
######phewas phenotype#####
count=0
for gene in GenetoSNP.keys():
    score=prs_SNP(gene,catalog,"phewas phenotype",log=False)
    if score[0]>0.0:
        print (gene,":",score[0])
        count+=1
print("Total of genes with prs larger than 0 regardless of mechanism mapping:", count)

UTS2D : 0.7911
RP11-219B17.1 : 1.272
TOMM40 : 0.8234
UTS2B : 0.7911
Total of genes with prs larger than 0 regardless of mechanism mapping: 4


In [86]:
print ("Fraction of genes with prs larger than 0 regardless of mechanism mapping:",str((count/616)*100)+"%")

Fraction of genes with prs larger than 0 regardless of mechanism mapping: 0.6493506493506493%


In [87]:
######gwas association#####
count=0
for gene in GenetoSNP.keys():
    score=prs_SNP(gene,catalog,"gwas-associations",log=False)
    if score[0]>0.0:
        print (gene,":",score[0])
        count+=1
print("Total of genes with prs larger than 0 regardless of mechanism mapping:", count)

Total of genes with prs larger than 0 regardless of mechanism mapping: 0


In [88]:
print ("Fraction of genes with prs larger than 0 regardless of mechanism mapping:",str((count/616)*100)+"%")

Fraction of genes with prs larger than 0 regardless of mechanism mapping: 0.0%


### Now: Mechanism -> Gene(s) -> SNPs 
### to do

In [94]:
# Mechanism PRS Calculation
snp_set=set()
MechanismToScore=dict()

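for mechanism in MechanismToGene_noNans.keys():
    MechanismToScore[mechanism]=[[],[0]] #[list of (gene, score) pairs, summed mechanism score]
    for gene in MechanismToGene_noNans[mechanism]:
        score=prs_SNP(gene,catalog,"phewas phenotype",log=False) #(score, SNPs used)
        #print(score)
        MechanismToScore[mechanism][0].append((gene,score[0]))
        if score[0]!=False: #skips unmapped genes (False) and zero scores (0.0 == False)
            MechanismToScore[mechanism][1]+=score[0] #score[0] is a numpy float here, so the [0] list becomes a one-element array
            snp_set|= set(score[1])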
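The aggregation uses `False` for genes that are not in the mapping and `0.0` for genes that are mapped but have no PD hit. Since `0.0 != False` evaluates to `False` in Python, both cases are skipped by the same test. A minimal sketch of a variant that keeps the two cases apart by using `None` as the marker; the names `gene_scores` and `mechanism_to_gene` are hypothetical, and the three example values are taken from the 'Mitochondrial subgraph' output.

```python
# Hypothetical per-gene scores: None = gene not in mapping, 0.0 = mapped but no PD hit
gene_scores = {"TOMM40": 0.8234, "CYCS": 0.0, "BAX": None}
mechanism_to_gene = {"Mitochondrial subgraph": ["BAX", "TOMM40", "CYCS"]}

mechanism_total = {}   # summed score per mechanism
mechanism_mapped = {}  # number of genes per mechanism that are in the mapping

for mechanism, genes in mechanism_to_gene.items():
    scores = [gene_scores.get(gene) for gene in genes]
    mapped = [s for s in scores if s is not None]  # keeps 0.0, drops unmapped genes
    mechanism_total[mechanism] = sum(mapped)
    mechanism_mapped[mechanism] = len(mapped)

print(mechanism_total, mechanism_mapped)
```

Keeping a separate count of mapped genes makes it visible when a mechanism scores zero only because none of its genes are in the mapping.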

In [95]:
MechanismToScore

{'MAPK subgraph': [[('FRA6E', False),
   ('HIST3H3', False),
   ('MAPK8', False),
   ('MAP2K3', False),
   ('LRRK2', False),
   ('DNAJC27', False),
   ('MAP3K4', False),
   ('MARK2', False),
   ('MAP2K4', False),
   ('MAP2K7', False),
   ('PINK1', False),
   ('JUN', False),
   ('ECSIT', False),
   ('MAP2K6', False),
   ('NLRX1', False),
   ('IL1B', False),
   ('MAPK3', False),
   ('RAD18', False),
   ('IL18', False),
   ('DUSP1', False),
   ('PRKN', False),
   ('HTRA2', False),
   ('MAP1LC3B', False),
   ('NLRP3', False),
   ('RPS6KA5', False),
   ('ATF4', False),
   ('XBP1', False),
   ('MAPK10', False),
   ('MAPK9', False),
   ('PARK7', False),
   ('CASP1', False),
   ('GIGYF2', False)],
  [0]],
 'Calcium-dependent signal transduction': [[('ESR2', False),
   ('ESR1', False),
   ('CACNA1A', False),
   ('CXCL8', False),
   ('S100B', False),
   ('NOX1', False),
   ('LRRK2', False),
   ('CACNB2', False),
   ('SCN10A', False)],
  [0]],
 'GRB10 subgraph': [[('NEDD4', False),
   ('PARK11', 

In [96]:
print (str(len(snp_set))+" SNPs were/was actually used: ",snp_set)

1 SNPs were/was actually used:  {'rs157580'}


In [97]:
count=0
for i in gene_set_ofallmechanisms:
    if i in GenetoSNP:
        count+=1
print("Number of PD related genes:",len(gene_set_ofallmechanisms))
print("Number of PD related genes occurring in the GeneToSNP mapping:", count)
print ("Fraction:", count/len(gene_set_ofallmechanisms))

Number of PD related genes: 394
Number of PD related genes occurring in the GeneToSNP mapping: 22
Fraction: 0.05583756345177665


In [98]:
MechanismToScore['Mitochondrial subgraph']

[[('BAX', False),
  ('OTC', False),
  ('ACO2', False),
  ('MAPK8', False),
  ('CHCHD4', False),
  ('GABPA', False),
  ('LONP1', False),
  ('LRRK2', False),
  ('NFKBIA', False),
  ('MAOB', False),
  ('FOXO3', False),
  ('PINK1', False),
  ('TOMM40', 0.8234),
  ('PPARGC1A', False),
  ('COX4I2', False),
  ('CYC1', False),
  ('CYCS', 0.0),
  ('SNCA', False),
  ('NLRX1', False),
  ('CLPP', False),
  ('OPA1', False),
  ('UTRN', False),
  ('UCP2', False),
  ('TIMM8A', False),
  ('PARL', False),
  ('NDUFA8', False),
  ('POLG', False),
  ('TIMM17A', False),
  ('AIF1', False),
  ('COX17', False),
  ('CLPX', False),
  ('OMA1', False)],
 array([0.8234])]

In [101]:
#more summary statistics: How many Mechanisms actually have useful scores?
count=0
for mechanism in MechanismToScore:
    if MechanismToScore[mechanism][1][0]>0:
        count+=1
print("Fraction of PD Mechanism with a PRS > 0:",(count/len(MechanismToScore))*100, "%")

Fraction of PD Mechanism with a PRS > 0: 3.125 %


In [2]:
####### more statistics ###########

In [3]:
####### To do: compare original SNPs (~ 3700) to mapping SNPS maybe #######

In [4]:
snps_3700=pd.read_csv('LordickData/PPMI_combPD_pass_CADD.csv')
snpsIDs_3700=snps_3700["ID"].tolist()
print ("count:", len(snps_3700["ID"]))

count: 3742


In [5]:
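# GenetoSNP_PD (presumably the gene -> SNP mapping built earlier in the notebook) must be defined before this cell runs
mapping_snps=list(GenetoSNP_PD.values())
mapping_snps = [item for items in mapping_snps for item in items] #flatten the per-gene SNP lists
mapping_snps_set=set(mapping_snps)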

NameError: name 'GenetoSNP_PD' is not defined

In [66]:
print("count SNPS from Mapping file:",len(mapping_snps_set))

count SNPS from Mapping file: 4242


In [58]:
# -> the mapping contains more SNPs (4242) than the original SNP file (3742)?!

In [68]:
# overlap 
len(mapping_snps_set.intersection(snpsIDs_3700))

3742

All 3742 original SNPs are contained in the mapping SNPs (RiskSNPs). Hence the RiskSNPs dataset for combPD contains 500 additional SNPs (4242 - 3742).
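The overlap check can be extended with a set difference to list the extra SNPs directly. A small sketch on made-up IDs; `original_snps` and `mapping_snps` are stand-ins for `snpsIDs_3700` and `mapping_snps_set`.

```python
# Made-up IDs standing in for snpsIDs_3700 and mapping_snps_set
original_snps = ["rs1", "rs2", "rs3"]
mapping_snps = {"rs1", "rs2", "rs3", "rs4", "rs5"}

shared = mapping_snps.intersection(original_snps)   # SNPs present in both
extra = mapping_snps.difference(original_snps)      # SNPs only in the mapping
all_original_covered = set(original_snps) <= mapping_snps  # subset test

print(len(shared), sorted(extra), all_original_covered)
```

If the subset test is true, the size of `extra` equals the difference between the two counts.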

In [6]:
#quick check: how many of our SNPs are actually in the PheWAS catalog
def check4SNPs(snpslist,catalog):
    
    #catalog=pd.read_csv(csv, delimiter = ',') #reading in the whole csv as a dataframe
    SNP_pat = '|'.join(r"{}".format(x) for x in snpslist) #regex alternation over the input SNPs
    catalog_filtered = catalog[catalog["snp"].str.contains(SNP_pat,na=False)] #catalog rows whose SNP ID matches one of the input SNPs
    print ("count of input SNPs:",len(snpslist))
    print ("count of input SNPs found in PheWAS:",len(catalog_filtered))
    print (catalog_filtered)
    
    return ("SNPs actually missing in PheWAS: "+str(len(snpslist)-len(catalog_filtered)))
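Note that `len(catalog_filtered)` in `check4SNPs` counts catalog rows, not distinct SNPs. A SNP can appear on several rows (one per phenotype), so the row count may exceed the number of distinct input SNPs that were found, and subtracting it from the input count does not necessarily give the number of missing SNPs. A minimal sketch of a distinct-SNP count on a toy catalog (the data and variable names are illustrative assumptions):

```python
import pandas as pd

# Toy catalog: rs1 appears on three rows, rs3 is absent
toy_catalog = pd.DataFrame({"snp": ["rs1", "rs1", "rs1", "rs2"]})
input_snps = ["rs1", "rs2", "rs3"]

matching_rows = toy_catalog[toy_catalog["snp"].isin(input_snps)]  # exact ID match
found_snps = set(matching_rows["snp"])         # distinct input SNPs present in the catalog
missing_snps = set(input_snps) - found_snps    # input SNPs with no catalog row

print("matching catalog rows:", len(matching_rows))
print("distinct input SNPs found:", len(found_snps))
print("input SNPs missing:", len(missing_snps))
```

In this toy case the row count (4) is larger than the number of input SNPs (3), although one input SNP is missing from the catalog.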

In [81]:
#check for origin SNPs 3700
check4SNPs(snpsIDs_3700,catalog)

count of input SNPs: 3742
count of input SNPs found in PheWAS: 3324
         chromosome         snp  \
128      6 32409530   rs3129882   
153      6 32409530   rs3129882   
174      6 32409530   rs3129882   
186     17 43719143    rs393152   
191      6 32409530   rs3129882   
...             ...         ...   
214942            3  rs10513789   
214958   6 32409530   rs3129882   
215055  11 84417846  rs10501570   
215056   4 15737937   rs4538475   
215079  17 41436901   rs8070723   

                                         phewas phenotype  cases  \
128     Rheumatoid arthritis & related inflammatory po...    511   
153                          Type 1 diabetic ketoacidosis    127   
174                                  Rheumatoid arthritis    398   
186                                  Infection of the eye   1229   
191                                       Type 1 diabetes    615   
...                                                   ...    ...   
214942                             

'SNPs actually missing in PheWAS: 418'

In [76]:
#check for mapping snps
check4SNPs(list(mapping_snps_set),catalog)

count of input SNPs: 4242
count of input SNPs found in PheWAS: 3378


'SNPs actually missing in PheWAS: 864'

In [None]:
# -> the 500 additional SNPs in the mapping yield only 54 more catalog matches (3378 vs. 3324). No big difference.

In [34]:
#####################################################################################################