# Immune disease associations of Neanderthal-introgressed SNPs

This notebook checks whether Neanderthal-introgressed SNPs (those present in the introgressed sequences of Chen *et al.*) have been associated with immune-related diseases, including infectious, allergic, autoimmune and autoinflammatory diseases, using data from the NHGRI-EBI GWAS Catalog.

Neanderthal-introgressed SNPs from:
1. Dannemann M, Prüfer K & Kelso J. Functional implications of Neandertal introgression in modern humans. *Genome Biol* 2017 **18**:61.
2. Simonti CN *et al.* The phenotypic legacy of admixture between modern humans and Neandertals. *Science* 2016 **351**:737-41.  

Neanderthal-introgressed sequences by Chen *et al.* from:
* Chen L *et al.* Identifying and interpreting apparent Neanderthal ancestry in African individuals. *Cell* 2020 **180**:677-687.  

GWAS summary statistics from:
* [GWAS Catalog](https://www.ebi.ac.uk/gwas/docs/file-downloads)

In [1]:
# Import modules
import pandas as pd

## Get Neanderthal SNPs present in GWAS Catalog

In [2]:
# Load Chen Neanderthal-introgressed SNPs
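chen = pd.read_excel('../chen/Additional File 1.xlsx', sheet_name='Sheet1',
                     usecols=['Chromosome', 'Position', 'Source', 'ID', 'Chen'])
# Keep only SNPs that fall within Chen introgressed sequences
neanderthal = chen.loc[chen.Chen == 'Yes'].copy()
# Display without the 'Chen' column; drop() returns a copy, so `neanderthal` itself keeps the column
neanderthal.drop('Chen', axis=1)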

Unnamed: 0,ID,Chromosome,Position,Source
0,rs118163204,1,834360,simonti_only
1,rs79501908,1,838695,simonti_only
2,rs151325546,1,850373,simonti_only
3,rs80174979,1,854793,simonti_only
13,rs117504198,1,1964852,simonti_only
...,...,...,...,...
353190,rs373168311,22,51020584,dannemann_only
353191,rs62239545,22,51026458,dannemann_only
353192,rs58521696,22,51026842,dannemann_only
353193,rs138179215,22,51028027,dannemann_only


In [3]:
# Load GWAS catalog
catalog = pd.read_csv('GWAS_Catalog.tsv', sep="\t", header=0,
                      usecols=['DISEASE/TRAIT', 'CHR_ID', 'CHR_POS', 'REPORTED GENE(S)', 'MAPPED_GENE',
                               'STRONGEST SNP-RISK ALLELE', 'SNPS', 'RISK ALLELE FREQUENCY', 'P-VALUE', 'OR or BETA',
                               '95% CI (TEXT)', 'MAPPED_TRAIT', 'STUDY ACCESSION'], low_memory=False)
# Exclude associations on the sex chromosomes
catalog = catalog.loc[~catalog.CHR_ID.isin(['X', 'Y'])].copy()
catalog.rename(columns={'CHR_ID': 'Chromosome', 'CHR_POS': 'Position', 'SNPS': 'ID'}, inplace=True)

In [4]:
# Neanderthal SNPs present in GWAS catalog
# Match on rsID only; chromosome and position are taken from the Neanderthal SNP table
nean_catalog = neanderthal.merge(catalog.drop(columns=['Chromosome', 'Position']), how='inner', on='ID')
nean_catalog

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
0,rs2742690,1,2987268,dannemann_only,Yes,Waist-hip ratio,,PRDM16,rs2742690-A,0.2001,9E-12,0.0168,[0.012-0.022] unit increase,waist-hip ratio,GCST008996
1,rs947350,1,3431555,both,Yes,Lung function (FEV1/FVC),,MEGF6,rs947350-?,NR,1E-34,,,FEV/FEC ratio,GCST007080
2,rs12562437,1,3651031,dannemann_only,Yes,Visceral adipose tissue/subcutaneous adipose t...,"TP73, CCDC27, KIAA0495, LRRC47",TP73,rs12562437-T,0.04,2E-6,,,visceral:subcutaneous adipose tissue ratio,GCST001524
3,rs10910018,1,3651409,both,Yes,Visceral fat,"TP73, CCDC27, KIAA0495, LRRC47",TP73,rs10910018-A,0.04,2E-6,,,visceral adipose tissue measurement,GCST001525
4,rs10910018,1,3651409,both,Yes,Visceral fat,"TP73, CCDC27, KIAA0495, LRRC47",TP73,rs10910018-A,0.04,2E-6,,,visceral adipose tissue measurement,GCST001525
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1133,rs12530,22,32783904,dannemann_only,Yes,IgG glycosylation,NR,RTCB,rs12530-C,0.18432396929239,3E-6,0.1828,[0.11-0.26] unit increase,serum IgG glycosylation measurement,GCST001848
1134,rs117325373,22,50357625,dannemann_only,Yes,Response to paliperidone in schizophrenia (pos...,,PIM3,rs117325373-G,NR,2E-6,6.3469,[NR] unit decrease,"schizophrenia, response to paliperidone, schiz...",GCST004040
1135,rs79966207,22,50722408,dannemann_only,Yes,Blond vs. brown/black hair color,PLXNB2,PLXNB2,rs79966207-C,0.1776,5E-10,1.0630,NR,hair color,GCST006988
1136,rs79966207,22,50722408,dannemann_only,Yes,Body mass index,NR,PLXNB2,rs79966207-?,NR,6E-9,,,body mass index,GCST009871
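
The merged table above contains repeated rows (for example, rs10910018 appears twice with identical values), and one SNP can map to several traits. A minimal sketch of how exact duplicates could be removed and unique SNPs counted is shown here on a toy table whose values are copied from the output above. The names `toy`, `deduplicated` and `n_unique_snps` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for nean_catalog, with values copied from the merged output
toy = pd.DataFrame({
    'ID': ['rs10910018', 'rs10910018', 'rs79966207', 'rs79966207'],
    'DISEASE/TRAIT': ['Visceral fat', 'Visceral fat',
                      'Blond vs. brown/black hair color', 'Body mass index'],
    'STUDY ACCESSION': ['GCST001525', 'GCST001525', 'GCST006988', 'GCST009871'],
})

# Remove rows that are identical in every column
deduplicated = toy.drop_duplicates()

# Count distinct SNPs, regardless of how many traits each one maps to
n_unique_snps = deduplicated['ID'].nunique()

print(len(deduplicated), n_unique_snps)
```

Whether duplicates should be dropped depends on the question being asked: they do not affect the keyword searches, but they would inflate any count of associations.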


## Immune-related diseases associated with Neanderthal SNPs

### Infections

In [5]:
nean_catalog.loc[nean_catalog['DISEASE/TRAIT'].str.contains('influenza')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
315,rs4261353,3,191695113,dannemann_only,Yes,Severe influenza A (H1N1) infection,FGF12,AC026320.2 - AC026320.1,rs4261353-?,0.001866,1e-08,27.44,[4.948-152.1],influenza A (H1N1),GCST003123


In [6]:
nean_catalog.loc[nean_catalog['DISEASE/TRAIT'].str.contains('wart')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
960,rs150469170,14,77009234,dannemann_only,Yes,Plantar warts,NR,AC008050.1,rs150469170-?,NR,1e-06,0.3413,[0.2-0.48] unit increase,susceptibility to plantar warts measurement,GCST005005


In [7]:
nean_catalog.loc[nean_catalog['DISEASE/TRAIT'].str.contains('HIV')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
387,rs17291045,4,161506897,both,Yes,HIV-1 control,intergenic,LINC02477,rs17291045-?,NR,5e-08,,,HIV-1 infection,GCST000549
487,rs36111427,6,117753834,dannemann_only,Yes,Time-dependent creatinine clearance change res...,NR,AL132671.2,rs36111427-?,,9e-06,,,"response to tenofovir, HIV infection, creatini...",GCST006069
636,rs72749479,9,83282271,simonti_only,Yes,Time-dependent creatinine clearance change res...,NR,MTND2P9 - RPS19P6,rs72749479-?,,7e-07,,,"response to tenofovir, HIV infection, creatini...",GCST006069
637,rs72749483,9,83285174,simonti_only,Yes,Time-dependent creatinine clearance change res...,NR,MTND2P9 - RPS19P6,rs72749483-?,,7e-07,,,"response to tenofovir, HIV infection, creatini...",GCST006069
688,rs17154929,10,44524675,dannemann_only,Yes,HIV-associated dementia,intergenic,LINC00841 - AL137026.2,rs17154929-T,NR,1e-07,,,AIDS dementia,GCST001542


In [8]:
nean_catalog.loc[nean_catalog['DISEASE/TRAIT'].str.contains('Malaria')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
717,rs12788102,11,4790575,simonti_only,Yes,Malaria,HBB,"OR51F1, MMP26",rs12788102-A,NR,2e-16,2.15,[1.79-2.59],malaria,GCST002033


### Allergic diseases

In [9]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('allerg')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
317,rs113048054,3,196343817,dannemann_only,Yes,"Allergic disease (asthma, hay fever or eczema)",NR,AC092933.2 - LINC01063,rs113048054-A,0.93413,4e-12,1.080497,[1.06-1.1],"Eczema, allergic rhinitis, asthma",GCST009716
322,rs4916533,3,196373582,simonti_only,Yes,Asthma or allergic disease (pleiotropy),NRROS,"NRROS, PIGX",rs4916533-?,NR,2e-08,1.075269,,"allergy, asthma",GCST007564
323,rs4916533,3,196373582,simonti_only,Yes,Hay fever and/or eczema,NR,"NRROS, PIGX",rs4916533-C,0.91931,6e-10,1.070778,[1.05-1.09],"Eczema, allergic rhinitis",GCST009717
334,rs11727978,4,38811051,simonti_only,Yes,Allergy,"TLR1, TLR6",TLR1,rs11727978-?,NR,1e-15,1.083,[1.062-1.104],allergy,GCST003990
335,rs66819621,4,38819403,simonti_only,Yes,Allergic rhinitis,TLR1,TLR1,rs66819621-A,0.8428,1.9999999999999998e-25,1.179941,[1.15-1.22],allergic rhinitis,GCST009719
371,rs45613035,4,123141070,dannemann_only,Yes,Hay fever and/or eczema,NR,KIAA1109,rs45613035-C,0.09838,6e-20,1.094,[1.074-1.115],"Eczema, allergic rhinitis",GCST009717


In [10]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('asthma')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
317,rs113048054,3,196343817,dannemann_only,Yes,"Allergic disease (asthma, hay fever or eczema)",NR,AC092933.2 - LINC01063,rs113048054-A,0.93413,4e-12,1.080497,[1.06-1.1],"Eczema, allergic rhinitis, asthma",GCST009716
318,rs113048054,3,196343817,dannemann_only,Yes,Asthma,"NNROS, LRRC33",AC092933.2 - LINC01063,rs113048054-G,0.06587,3e-11,,,asthma,GCST009720
319,rs112336433,3,196349004,dannemann_only,Yes,Asthma,NRROS,AC092933.2 - LINC01063,rs112336433-C,0.93272,3e-08,1.069407,[1.05-1.09],asthma,GCST008916
322,rs4916533,3,196373582,simonti_only,Yes,Asthma or allergic disease (pleiotropy),NRROS,"NRROS, PIGX",rs4916533-?,NR,2e-08,1.075269,,"allergy, asthma",GCST007564
370,rs45613035,4,123141070,dannemann_only,Yes,Asthma (childhood onset),NR,KIAA1109,rs45613035-?,NR,9e-12,,,childhood onset asthma,GCST009841
372,rs45613035,4,123141070,dannemann_only,Yes,Asthma,"ADAD1, IL2, IL21, IL21-AS1, KIAA1109",KIAA1109,rs45613035-C,0.098644,1e-12,1.077323,[1.06-1.1],asthma,GCST008916
373,rs45613035,4,123141070,dannemann_only,Yes,Atopic asthma,NR,KIAA1109,rs45613035-?,NR,5e-16,,,atopic asthma,GCST009850
374,rs45613035,4,123141070,dannemann_only,Yes,Asthma,KIAA1109,KIAA1109,rs45613035-C,NR,6e-19,1.107,[1.09-1.13],asthma,GCST007798
375,rs45613035,4,123141070,dannemann_only,Yes,Asthma,"KIAA1109, ADAD1, IL2, TRPC3, BBS7, IL21",KIAA1109,rs45613035-C,0.09872,5e-14,1.08175,[1.06-1.10],asthma,GCST009798
378,rs62322662,4,123359569,both,Yes,Asthma,"ADAD1, IL2",ADAD1 - IL2,rs62322662-G,NR,2e-18,1.119,[1.09-1.14],asthma,GCST007798


In [11]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('Eczema')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
317,rs113048054,3,196343817,dannemann_only,Yes,"Allergic disease (asthma, hay fever or eczema)",NR,AC092933.2 - LINC01063,rs113048054-A,0.93413,4e-12,1.080497,[1.06-1.1],"Eczema, allergic rhinitis, asthma",GCST009716
321,rs12152276,3,196368501,dannemann_only,Yes,Eczema,,"PIGX, NRROS",rs12152276-?,NR,7e-10,,,Eczema,GCST007075
323,rs4916533,3,196373582,simonti_only,Yes,Hay fever and/or eczema,NR,"NRROS, PIGX",rs4916533-C,0.91931,6e-10,1.070778,[1.05-1.09],"Eczema, allergic rhinitis",GCST009717
371,rs45613035,4,123141070,dannemann_only,Yes,Hay fever and/or eczema,NR,KIAA1109,rs45613035-C,0.09838,6e-20,1.094,[1.074-1.115],"Eczema, allergic rhinitis",GCST009717
376,rs45613035,4,123141070,dannemann_only,Yes,Eczema,,KIAA1109,rs45613035-?,NR,1.9999999999999998e-26,,,Eczema,GCST007075
1116,rs73203093,21,36457506,dannemann_only,Yes,Eczema,,RUNX1,rs73203093-?,NR,7e-09,,,Eczema,GCST007075


### Autoimmune/autoinflammatory diseases

In [12]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('lupus')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
96,rs34889541,1,198594769,dannemann_only,Yes,Systemic lupus erythematosus,"PTPRC, CD45",AL157402.1 - PTPRC,rs34889541-G,0.86,3e-10,1.282051,[1.19-1.39],systemic lupus erythematosus,GCST003622
97,rs34889541,1,198594769,dannemann_only,Yes,Systemic lupus erythematosus,"PTPRC, CD45",AL157402.1 - PTPRC,rs34889541-G,0.93,2e-12,1.234568,[1.16-1.32],systemic lupus erythematosus,GCST003622
536,rs35000415,7,128585616,dannemann_only,Yes,Systemic lupus erythematosus,IRF5,IRF5,rs35000415-?,0.108,2.0000000000000003e-45,1.8,[NR],systemic lupus erythematosus,GCST003622
537,rs35000415,7,128585616,dannemann_only,Yes,Systemic lupus erythematosus,IRF5,IRF5,rs35000415-T,,1.0000000000000001e-60,1.83,[NR],systemic lupus erythematosus,GCST003156
538,rs35000415,7,128585616,dannemann_only,Yes,Systemic lupus erythematosus,"IRF5, TNPO3",IRF5,rs35000415-T,NR,1e-99,1.82,[1.69-1.96],systemic lupus erythematosus,GCST005752
540,rs10488631,7,128594183,dannemann_only,Yes,Systemic lupus erythematosus,IRF5,AC025594.2 - TNPO3,rs10488631-C,0.11,7e-18,1.92,[1.66-2.22],systemic lupus erythematosus,GCST000996
541,rs10488631,7,128594183,dannemann_only,Yes,Systemic lupus erythematosus,"TNPO3, IRF5",AC025594.2 - TNPO3,rs10488631-C,0.12,2e-11,,,systemic lupus erythematosus,GCST000144
545,rs10488631,7,128594183,dannemann_only,Yes,Systemic lupus erythematosus,TNPO3,AC025594.2 - TNPO3,rs10488631-C,0.11,2e-13,1.829,[1.684-1.99],systemic lupus erythematosus,GCST002463
552,rs10488631,7,128594183,dannemann_only,Yes,Systemic lupus erythematosus,"IRF5, TNPO3",AC025594.2 - TNPO3,rs10488631-C,,9e-110,1.92,[1.81–2.03],systemic lupus erythematosus,GCST003155
561,rs12706861,7,128616582,dannemann_only,Yes,Systemic lupus erythematosus,"IRF5, TNPO3",TNPO3,rs12706861-T,0.118,4e-71,1.76,[1.65-1.87],systemic lupus erythematosus,GCST007400


In [13]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('rheumatoid')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
550,rs10488631,7,128594183,dannemann_only,Yes,Rheumatoid arthritis,IRF5,AC025594.2 - TNPO3,rs10488631-C,0.11,4e-11,1.19,[NR],rheumatoid arthritis,GCST000679
574,rs2306848,7,129962414,simonti_only,Yes,Rheumatoid arthritis,CPA4,CPA4,rs2306848-G,0.948,6e-12,,,rheumatoid arthritis,GCST007843
950,rs3783782,14,61940675,dannemann_only,Yes,Rheumatoid arthritis,PRKCH,PRKCH,rs3783782-A,0.09,2e-09,1.14,[1.09-1.18],rheumatoid arthritis,GCST002318
951,rs3783782,14,61940675,dannemann_only,Yes,Rheumatoid arthritis,PRKCH,PRKCH,rs3783782-A,0.22,4e-09,1.14,[1.09-1.19],rheumatoid arthritis,GCST002318
952,rs3783782,14,61940675,dannemann_only,Yes,Rheumatoid arthritis,PRKCH,PRKCH,rs3783782-A,NR,1e-07,0.151127,[0.095-0.207] unit increase,rheumatoid arthritis,GCST006959
953,rs3783782,14,61940675,dannemann_only,Yes,Rheumatoid arthritis,PRKCH,PRKCH,rs3783782-A,0.23,1e-07,0.16062,[0.1-0.22] unit increase,rheumatoid arthritis,GCST006959


In [14]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('scleroderma')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
542,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis,"TNPO, IRF5",AC025594.2 - TNPO3,rs10488631-C,NR,2e-13,1.5,[1.35-1.67],systemic scleroderma,GCST000650
544,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis,"TNPO3, IRF5",AC025594.2 - TNPO3,rs10488631-C,0.09,4e-07,1.35,[1.20-1.51],systemic scleroderma,GCST001146
546,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis,IRF5,AC025594.2 - TNPO3,rs10488631-?,NR,2e-10,1.5,[1.32-1.69],systemic scleroderma,GCST001160
547,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis,IRF5,AC025594.2 - TNPO3,rs10488631-?,NR,8e-07,1.63,[1.34-1.98],systemic scleroderma,GCST001156
548,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis,IRF5,AC025594.2 - TNPO3,rs10488631-?,NR,1e-09,1.61,[1.38-1.88],systemic scleroderma,GCST001156
549,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis,IRF5,AC025594.2 - TNPO3,rs10488631-?,NR,2e-07,1.52,[1.30-1.79],systemic scleroderma,GCST001156
553,rs10488631,7,128594183,dannemann_only,Yes,Diffuse cutaneous systemic sclerosis,TNPO3,AC025594.2 - TNPO3,rs10488631-?,,1e-09,,,diffuse scleroderma,GCST005335
554,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis (anti-centromere-positive),"IRF5, TNPO3",AC025594.2 - TNPO3,rs10488631-?,,9e-07,,,anti-centromere-antibody-positive systemic scl...,GCST005333
555,rs10488631,7,128594183,dannemann_only,Yes,Systemic sclerosis (anti-topoisomerase-positive),TNPO3,AC025594.2 - TNPO3,rs10488631-?,,4e-07,,,anti-topoisomerase-I-antibody-positive systemi...,GCST005332
556,rs10488631,7,128594183,dannemann_only,Yes,Limited cutaneous systemic scleroderma,"TNPO3, IRF5",AC025594.2 - TNPO3,rs10488631-?,NR,3e-10,,,limited scleroderma,GCST005334


In [15]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('Sjogren')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
565,rs17339836,7,128681062,both,Yes,Sjögren's syndrome,"IRF5, TNPO3",TNPO3,rs17339836-T,0.12,2e-16,1.58,[1.36–1.84],Sjogren syndrome,GCST004878


In [16]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('Grave')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
65,rs1265883,1,160464911,dannemann_only,Yes,Graves' disease,SLAMF6,SLAMF6,rs1265883-C,0.1,2e-18,1.34,[1.25-1.43],Graves disease,GCST001984


In [17]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('glomerulonephritis')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
196,rs17830558,2,160878364,simonti_only,Yes,Membranous nephropathy,PLA2R1,PLA2R1,rs17830558-T,0.52,4e-10,1.87,[1.54–2.28],membranous glomerulonephritis,GCST003402
199,rs17831251,2,160914156,simonti_only,Yes,Membranous nephropathy,PLA2R1,PLA2R1,rs17831251-C,NR,5e-103,2.25,[2.09-2.42],membranous glomerulonephritis,GCST010004
200,rs17831251,2,160914156,simonti_only,Yes,Membranous nephropathy,PLA2R1,PLA2R1,rs17831251-C,0.61,5e-48,1.98,[1.81-2.17],membranous glomerulonephritis,GCST010005
201,rs17831251,2,160914156,simonti_only,Yes,Membranous nephropathy,PLA2R1,PLA2R1,rs17831251-C,0.7,4e-61,2.81,[2.48-3.17],membranous glomerulonephritis,GCST010006
202,rs4664308,2,160917497,simonti_only,Yes,Idiopathic membranous nephropathy,"LY75, ITGB6, PLA2R1, RBMS1",PLA2R1,rs4664308-A,0.43,9.000000000000001e-29,2.28,[1.96-2.64],membranous glomerulonephritis,GCST000984


In [18]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('colitis')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
874,rs12422544,12,40528432,both,Yes,Ulcerative colitis,NR,SLC2A13 - LINC02555,rs12422544-G,0.01915,3e-06,1.215345,[1.13-1.3],ulcerative colitis,GCST003045
1046,rs7236492,18,77220616,dannemann_only,Yes,Chronic inflammatory diseases (ankylosing spon...,NFATC1,"AC018445.5, NFATC1",rs7236492-?,NR,1e-07,,,"ankylosing spondylitis, psoriasis, ulcerative ...",GCST005537


In [19]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('Crohn')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
692,rs7076156,10,64415184,dannemann_only,Yes,Crohn's disease,"ADO, ZNF365, ERG2","AC067752.1, AC024598.1",rs7076156-G,0.751,7e-09,1.19,[1.10-1.30],Crohn's disease,GCST001438
875,rs12422544,12,40528432,both,Yes,Crohn's disease,NR,SLC2A13 - LINC02555,rs12422544-G,0.01915,3.9999999999999997e-25,1.455191,[1.38-1.53],Crohn's disease,GCST003044
876,rs11175593,12,40601940,dannemann_only,Yes,Crohn's disease,"MUC19, LRRK2","LRRK2-DT, LRRK2, LINC02471",rs11175593-T,0.02,3e-10,1.54,[NR],Crohn's disease,GCST000207
923,rs17061048,13,40833012,dannemann_only,Yes,Crohn's disease,NR,LINC00598,rs17061048-A,0.95,1e-07,1.161814,[1.11-1.22],Crohn's disease,GCST003044
1044,rs7236492,18,77220616,dannemann_only,Yes,Crohn's disease,"NFATC1, TST","AC018445.5, NFATC1",rs7236492-G,0.85,9e-09,1.104796,[1.07-1.14],Crohn's disease,GCST003044
1046,rs7236492,18,77220616,dannemann_only,Yes,Chronic inflammatory diseases (ankylosing spon...,NFATC1,"AC018445.5, NFATC1",rs7236492-?,NR,1e-07,,,"ankylosing spondylitis, psoriasis, ulcerative ...",GCST005537


In [20]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('bowel')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
924,rs17061048,13,40833012,dannemann_only,Yes,Inflammatory bowel disease,NR,LINC00598,rs17061048-A,0.95,5e-09,1.145133,,inflammatory bowel disease,GCST003043
1045,rs7236492,18,77220616,dannemann_only,Yes,Inflammatory bowel disease,"NFATC1, TST","AC018445.5, NFATC1",rs7236492-G,0.85,1e-08,1.083568,,inflammatory bowel disease,GCST003043


In [21]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('psoriasis')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
414,rs12188351,5,168386089,both,Yes,Inflammatory skin disease,SLIT3,SLIT3,rs12188351-?,NR,1e-08,,,psoriasis,GCST002740
693,rs7922314,10,64538279,dannemann_only,Yes,Cutaneous psoriasis,ADO,AC067751.1,rs7922314-C,0.9206,2e-06,1.33,NR,"cutaneous psoriasis measurement, psoriasis",GCST003269
886,rs2066807,12,56740682,dannemann_only,Yes,Psoriasis,"IL23A, STAT2",STAT2,rs2066807-G,0.932351,1e-10,1.55,[1.35-1.77],psoriasis,GCST002874
887,rs2066807,12,56740682,dannemann_only,Yes,Psoriasis,"IL23A, STAT2",STAT2,rs2066807-G,0.932351,5e-12,1.4,[1.27-1.54],psoriasis,GCST002874
1046,rs7236492,18,77220616,dannemann_only,Yes,Chronic inflammatory diseases (ankylosing spon...,NFATC1,"AC018445.5, NFATC1",rs7236492-?,NR,1e-07,,,"ankylosing spondylitis, psoriasis, ulcerative ...",GCST005537


In [22]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('celiac')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
272,rs13098911,3,46235201,both,Yes,Celiac disease,"CCRL2, CCR5, CCR9, CCR1, CCR2, CCR3",CCR3,rs13098911-A,0.10,3e-17,1.3,[1.23-1.39],celiac disease,GCST000612
377,rs62321692,4,123261530,dannemann_only,Yes,Celiac disease,KIAA1109,KIAA1109,rs62321692-?,NR,1e-08,0.1924,unit decrease,celiac disease,GCST008489


In [23]:
nean_catalog.loc[nean_catalog['MAPPED_TRAIT'].str.contains('multiple sclerosis')]

Unnamed: 0,ID,Chromosome,Position,Source,Chen,DISEASE/TRAIT,REPORTED GENE(S),MAPPED_GENE,STRONGEST SNP-RISK ALLELE,RISK ALLELE FREQUENCY,P-VALUE,OR or BETA,95% CI (TEXT),MAPPED_TRAIT,STUDY ACCESSION
492,rs17780048,6,138179146,dannemann_only,Yes,Multiple sclerosis,LOC100130476,WAKMAR2,rs17780048-C,NR,5e-12,1.1008,NR,multiple sclerosis,GCST009597
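
The keyword searches in this section all follow the same pattern, so they could be wrapped in one helper. The sketch below is an assumption about how that might look, not the notebook's own method: `search_traits` is a hypothetical function, and unlike the cells above it matches case-insensitively and treats missing trait values as non-matches. The toy rows reuse SNP IDs and traits from the outputs above.

```python
import re

import pandas as pd


def search_traits(df, keywords, column='MAPPED_TRAIT'):
    """Return rows whose `column` contains any of the keywords (case-insensitive)."""
    # Escape each keyword so it is matched literally, then join as alternatives
    pattern = '|'.join(re.escape(k) for k in keywords)
    # na=False makes missing trait values count as non-matches
    mask = df[column].str.contains(pattern, case=False, na=False, regex=True)
    return df.loc[mask]


# Toy stand-in for nean_catalog
toy = pd.DataFrame({
    'ID': ['rs4916533', 'rs12152276', 'rs17780048', 'rs2742690'],
    'MAPPED_TRAIT': ['allergy, asthma', 'Eczema', 'multiple sclerosis', None],
})

hits = search_traits(toy, ['asthma', 'eczema'])
print(hits['ID'].tolist())
```

Case-insensitive matching avoids having to remember that the catalog writes `Eczema` and `Crohn's disease` with capitals but `asthma` without, at the cost of possibly returning a few extra rows.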


## Do immune disease-associated Neanderthal SNPs act as eQTLs?

In [24]:
# Load eQTL data
fairfax_ori = pd.read_csv("../fairfax/tab2_a_cis_eSNPs.txt", sep="\t", usecols=["SNP", "Gene", "Min.dataset", "LPS2.FDR", "LPS24.FDR", "IFN.FDR", "Naive.FDR"])

# Recomputed eQTLs: sort by p-value and keep the most significant hit per SNP, gene and condition
fairfax_re = pd.read_csv('overlap_filtered_fairfax.csv', usecols=['rsid', 'pvalue', 'gene_id', 'Condition', 'beta'])
fairfax_re.sort_values('pvalue', inplace=True)
fairfax_re.drop_duplicates(subset=['rsid', 'gene_id', 'Condition'], keep='first', inplace=True)

nedelec_re = pd.read_csv('overlap_filtered_nedelec.csv', usecols=['rsid', 'pvalue', 'gene_id', 'Condition', 'beta'])
nedelec_re.sort_values('pvalue', inplace=True)
nedelec_re.drop_duplicates(subset=['rsid', 'gene_id', 'Condition'], keep='first', inplace=True)

quach = pd.read_csv('overlap_filtered_quach.csv', usecols=['rsid', 'pvalue', 'gene_id', 'Condition', 'beta'])
quach.sort_values('pvalue', inplace=True)
quach.drop_duplicates(subset=['rsid', 'gene_id', 'Condition'], keep='first', inplace=True)

alasoo = pd.read_csv('overlap_filtered_alasoo.csv', usecols=['rsid', 'pvalue', 'gene_id', 'Condition', 'beta'])
alasoo.sort_values('pvalue', inplace=True)
alasoo.drop_duplicates(subset=['rsid', 'gene_id', 'Condition'], keep='first', inplace=True)

In [25]:
# Selected Neanderthal SNPs with immune disease associations
with open('overlapped_SNPs.txt') as f:
    gwas = f.read().splitlines()
gwas

['rs4261353',
 'rs150469170',
 'rs2660',
 'rs2384071',
 'rs2384072',
 'rs11466617',
 'rs11466640',
 'rs4274855',
 'rs11725309',
 'rs11722813',
 'rs5743557',
 'rs5743562',
 'rs5743563',
 'rs5743565',
 'rs5743592',
 'rs5743571',
 'rs17582893',
 'rs17582921',
 'rs12788102',
 'rs17291045',
 'rs4916533',
 'rs113048054',
 'rs66819621',
 'rs11727978',
 'rs45613035',
 'rs12152276',
 'rs73203093',
 'rs13239597',
 'rs35000415',
 'rs10488631',
 'rs12706861',
 'rs11059927',
 'rs1385374',
 'rs4252665',
 'rs34889541',
 'rs3783782',
 'rs2306848',
 'rs62478615',
 'rs13239597',
 'rs36073657',
 'rs12534421',
 'rs17340351',
 'rs17339836',
 'rs1265883',
 'rs17830558',
 'rs17831251',
 'rs4664308',
 'rs17061048',
 'rs7236492',
 'rs7076156',
 'rs12422544',
 'rs11175593',
 'rs2066807',
 'rs12188351',
 'rs13098911',
 'rs62321692',
 'rs17780048']
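
The list above contains `rs13239597` twice. This does not change the overlap results, since set intersection and `isin` ignore repeats, but it would matter if the list were counted. A minimal sketch of order-preserving de-duplication, using a short excerpt of the list (the variable names `snps` and `unique_snps` are illustrative):

```python
# Toy excerpt of the SNP list; 'rs13239597' appears twice, as in the output above
snps = ['rs13239597', 'rs35000415', 'rs10488631', 'rs62478615', 'rs13239597']

# dict.fromkeys keeps the first occurrence of each ID and preserves order
unique_snps = list(dict.fromkeys(snps))

print(len(snps), len(unique_snps))
```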

In [26]:
# Overlap with original Fairfax eQTLs
ls = set(fairfax_ori.SNP).intersection(gwas)
fairfax_ori.loc[fairfax_ori.SNP.isin(ls)]

Unnamed: 0,SNP,Gene,LPS2.FDR,LPS24.FDR,IFN.FDR,Naive.FDR,Min.dataset
3532,rs11725309,TLR1,9.9e-05,0.210042,4.68e-08,,IFN
5823,rs7236492,NFATC1,0.254156,1.2e-05,0.000212976,0.002774,LPS24
6359,rs2066807,CNPY2,0.074123,0.00078,0.000655555,0.034881,IFN
9990,rs7236492,NFATC1,,0.00046,0.02801852,0.164114,LPS24
12213,rs7236492,NFATC1,,0.032637,0.1839184,0.022086,Naive
12676,rs2066807,TMEM4,0.295496,0.000184,0.2863624,0.046866,LPS24
16655,rs11725309,TLR6,,0.000439,,0.234215,LPS24


In [27]:
# Overlap with recomputed Fairfax eQTLs
ls = set(fairfax_re.rsid).intersection(gwas)
fairfax_re.loc[fairfax_re.rsid.isin(ls)]

Unnamed: 0,rsid,pvalue,gene_id,beta,Condition
30222,rs2660,1.977790e-137,ENSG00000089127,1.179160,Naive
7083,rs2660,7.975460e-130,ENSG00000089127,1.502870,IFN
7496,rs2384072,2.502920e-111,ENSG00000089127,1.431310,IFN
7489,rs2384071,2.567830e-111,ENSG00000089127,1.431740,IFN
30454,rs2384071,4.288830e-110,ENSG00000089127,1.103850,Naive
...,...,...,...,...,...
11693,rs17582893,5.169010e-10,ENSG00000174125,0.217032,LPS2
17042,rs11466640,8.309440e-10,ENSG00000174125,0.205908,LPS24
18659,rs4274855,1.668790e-09,ENSG00000174125,0.200024,LPS24
32063,rs7236492,1.804790e-09,ENSG00000131196,0.105722,Naive


In [28]:
# Overlap with recomputed Nedelec eQTLs
ls = set(nedelec_re.rsid).intersection(gwas)
nedelec_re.loc[nedelec_re.rsid.isin(ls)]

Unnamed: 0,rsid,pvalue,gene_id,beta,Condition
1674,rs2384071,1.76555e-13,ENSG00000111331,0.162116,Salmonella
1675,rs2384072,1.76762e-13,ENSG00000111331,0.162108,Salmonella
1616,rs2660,1.07788e-11,ENSG00000111331,0.157389,Salmonella


In [29]:
# Overlap with recomputed Quach eQTLs
ls = set(quach.rsid).intersection(gwas)
quach.loc[quach.rsid.isin(ls)]

Unnamed: 0,rsid,pvalue,gene_id,beta,Condition
916,rs2660,3.8841e-15,ENSG00000111331,0.197481,IAV
975,rs2384072,6.71257e-15,ENSG00000111331,0.194627,IAV
974,rs2384071,6.71733e-15,ENSG00000111331,0.194641,IAV
1017,rs11059927,4.87631e-12,ENSG00000139370,-0.26489,IAV
1024,rs1385374,4.8801e-12,ENSG00000139370,-0.264857,IAV
7240,rs2384071,4.87105e-10,ENSG00000111331,0.175183,R848
7241,rs2384072,4.87502e-10,ENSG00000111331,0.175165,R848
7182,rs2660,1.27944e-09,ENSG00000111331,0.172391,R848
8158,rs13098911,2.20519e-08,ENSG00000283646,0.684537,Naive


In [30]:
# Overlap with recomputed Alasoo eQTLs
ls = set(alasoo.rsid).intersection(gwas)
alasoo.loc[alasoo.rsid.isin(ls)]

Unnamed: 0,rsid,pvalue,gene_id,beta,Condition
1124,rs13098911,5.54313e-11,ENSG00000223552,-0.816275,IFNg
2543,rs66819621,5.6025e-09,ENSG00000174125,-0.463299,Salmonella
636,rs2660,7.55666e-09,ENSG00000089127,-0.128276,IFNg+Salmonella
694,rs2384071,3.36478e-08,ENSG00000089127,-0.121321,IFNg+Salmonella
695,rs2384072,3.37173e-08,ENSG00000089127,-0.121283,IFNg+Salmonella
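
The four recomputed eQTL tables share the same columns, so their overlaps could also be collected into one table with a label for the source dataset. The sketch below is one possible way to do this under that assumption. `collect_overlaps` and the `Dataset` column are hypothetical names, and the toy frames contain only a few rows copied from the outputs above.

```python
import pandas as pd


def collect_overlaps(datasets, snp_ids):
    """Stack rows from each eQTL table whose rsid is in snp_ids, labelled by dataset."""
    frames = []
    for name, df in datasets.items():
        hits = df.loc[df['rsid'].isin(snp_ids)].copy()
        hits['Dataset'] = name  # record which eQTL study each row came from
        frames.append(hits)
    return pd.concat(frames, ignore_index=True)


# Toy stand-ins for the quach and alasoo tables
toy_quach = pd.DataFrame({
    'rsid': ['rs2660', 'rs13098911'],
    'gene_id': ['ENSG00000111331', 'ENSG00000283646'],
    'Condition': ['IAV', 'Naive'],
})
toy_alasoo = pd.DataFrame({
    'rsid': ['rs13098911', 'rs66819621'],
    'gene_id': ['ENSG00000223552', 'ENSG00000174125'],
    'Condition': ['IFNg', 'Salmonella'],
})

combined = collect_overlaps({'Quach': toy_quach, 'Alasoo': toy_alasoo},
                            ['rs2660', 'rs13098911'])
print(combined)
```

A single combined table makes it easier to see which SNPs, such as rs13098911 here, are reported as eQTLs in more than one dataset.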
