# ieugwaspy: Brief overview

This notebook provides some examples of how to use ```ieugwaspy``` to access data from the IEU GWAS database API.

First, import the ```ieugwaspy``` package, along with ```pandas``` for data handling.

In [2]:
import ieugwaspy as igd
import pandas as pd

## Exploring available studies

We retrieve the metadata for all studies by calling ```igd.gwasinfo()``` with no parameters (if you pass a list of study IDs, the function returns data for that subset only). We then use the pandas ```DataFrame.from_dict``` function to convert the result to a DataFrame, passing ```orient='index'``` so that each study becomes one row. Finally, we look at the first few rows.

In [3]:
data = igd.gwasinfo()
df = pd.DataFrame.from_dict(data, orient='index')
df.head()

(index),note,access,year,mr,author,consortium,sex,priority,population,unit,sample_size,nsnp,trait,id,subcategory,category,ncase,pmid,ncontrol,sd
eqtl-a-ENSG00000261662,,public,2018.0,1,Vosa U,,Males and Females,0,European,,5502.0,14407,ENSG00000261662,eqtl-a-ENSG00000261662,,,,,,
eqtl-a-ENSG00000160460,,public,2018.0,1,Vosa U,,Males and Females,0,European,,29896.0,18645,ENSG00000160460,eqtl-a-ENSG00000160460,,,,,,
eqtl-a-ENSG00000205426,,public,2018.0,1,Vosa U,,Males and Females,0,European,,31684.0,19133,ENSG00000205426,eqtl-a-ENSG00000205426,,,,,,
eqtl-a-ENSG00000003436,,public,2018.0,1,Vosa U,,Males and Females,0,European,,31684.0,18041,ENSG00000003436,eqtl-a-ENSG00000003436,,,,,,
prot-a-7,,public,2018.0,1,Sun BB,,Males and Females,0,European,,3301.0,10534735,Tyrosine-protein kinase ABL1,prot-a-7,Protein,Immune system,,29875488.0,,
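
To make the reorientation step concrete, here is a minimal sketch on a hand-built dictionary. It assumes the metadata arrives as a dict keyed by study ID, with one inner dict of fields per study (which is what the use of ```orient='index'``` suggests). The name `sample_info` is hypothetical, and the values are copied from the output above.

```python
import pandas as pd

# Hypothetical miniature of the metadata: outer keys are study IDs,
# inner dicts hold the fields for that study (values taken from the output above).
sample_info = {
    "eqtl-a-ENSG00000261662": {"author": "Vosa U", "year": 2018.0, "sample_size": 5502.0},
    "prot-a-7": {"author": "Sun BB", "year": 2018.0, "sample_size": 3301.0},
}

# orient='index' makes each outer key a row label and each inner key a column.
sample_df = pd.DataFrame.from_dict(sample_info, orient="index")

print(sample_df.shape)
print(sample_df.loc["prot-a-7", "author"])
```

Without ```orient='index'```, the study IDs would become columns instead of rows, which is harder to filter.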


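We can query the DataFrame for studies with body mass data using standard pandas syntax. The filter below keeps every study whose trait name contains both "body" and "mass", ignoring case.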

In [31]:
bmistudies = df[df['trait'].str.contains("body", case=False) & df['trait'].str.contains("mass", case=False)]
bmistudies.head()

(index),note,access,year,mr,author,consortium,sex,priority,population,unit,sample_size,nsnp,trait,id,subcategory,category,ncase,pmid,ncontrol,sd
ebi-a-GCST006368,,public,2018.0,1,Hoffmann TJ,,,0,European,,315347.0,27854527,Body mass index,ebi-a-GCST006368,,,,30108127.0,,
ukb-a-267,,public,2017.0,1,Neale,Neale Lab,Males and Females,1,European,SD,331315.0,10894596,Whole body water mass,ukb-a-267,,,,,,
ukb-b-19953,21001: Output from GWAS pipeline using Phesant...,public,2018.0,1,Ben Elsworth,MRC-IEU,Males and Females,1,European,SD,461460.0,9851867,Body mass index (BMI),ukb-b-19953,,Continuous,,,,
ukb-a-265,,public,2017.0,1,Neale,Neale Lab,Males and Females,1,European,SD,330762.0,10894596,Whole body fat mass,ukb-a-265,,,,,,
ukb-a-266,,public,2017.0,1,Neale,Neale Lab,Males and Females,1,European,SD,331291.0,10894596,Whole body fat-free mass,ukb-a-266,,,,,,
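
As the output shows, the two-term filter also matches traits such as "Whole body water mass". If only body mass index studies are wanted, a stricter filter can match the full phrase. The sketch below uses a small hand-built DataFrame (`sample` and the `hypothetical-id` row are illustrative, not real API output) and passes `na=False` so that a missing trait name counts as a non-match.

```python
import pandas as pd

# Trait names copied from the output above; the last row is a made-up
# entry with a missing trait, to show how na=False behaves.
sample = pd.DataFrame(
    {
        "trait": [
            "Body mass index",
            "Whole body water mass",
            "Body mass index (BMI)",
            "Whole body fat mass",
            None,
        ]
    },
    index=["ebi-a-GCST006368", "ukb-a-267", "ukb-b-19953", "ukb-a-265", "hypothetical-id"],
)

# Match the whole phrase, ignoring case; missing traits are treated as False.
mask = sample["trait"].str.contains("body mass index", case=False, na=False)
bmi_only = sample[mask]

print(bmi_only.index.tolist())
```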


## Extract association data

### Get tophits for a single study

Here we use one of the studies returned by the "body mass" query above, ```ukb-b-19953```, and sort its top hits by p-value.

In [23]:
tophits = pd.DataFrame.from_dict(igd.tophits(["ukb-b-19953"])).sort_values(by='p', ascending=True)
tophits.head()

(index),p,se,n,beta,position,chr,id,rsid,ea,nea,eaf,trait
443,1.59956e-291,0.002014,461460,0.073497,53806453,16,ukb-b-19953,rs56094641,G,A,0.404564,Body mass index (BMI)
271,2.30144e-118,0.002342,461460,0.054172,57829135,18,ukb-b-19953,rs6567160,C,T,0.232714,Body mass index (BMI)
284,4.4998699999999994e-100,0.002612,461460,0.055468,628504,2,ukb-b-19953,rs6744646,G,A,0.828299,Body mass index (BMI)
100,1.99986e-91,0.002443,461460,0.049529,177889025,1,ukb-b-19953,rs539515,C,A,0.204942,Body mass index (BMI)
183,4.600449999999999e-87,0.004609,461460,-0.091156,422144,2,ukb-b-19953,rs62107261,C,T,0.048327,Body mass index (BMI)
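
The rsids of the strongest hits are a natural input for follow-up queries. As a small sketch, the snippet below picks the three smallest p-values from a hand-built table (`sample_hits` is a hypothetical stand-in for `tophits`, with values copied from the output above) and returns their rsids as a plain Python list.

```python
import pandas as pd

# A few rows copied from the tophits output above, deliberately unsorted.
sample_hits = pd.DataFrame(
    {
        "rsid": ["rs539515", "rs56094641", "rs6744646", "rs6567160"],
        "p": [1.99986e-91, 1.59956e-291, 4.49987e-100, 2.30144e-118],
    }
)

# Sort ascending by p-value and keep the rsids of the three strongest hits.
top_rsids = sample_hits.sort_values(by="p", ascending=True)["rsid"].head(3).tolist()

print(top_rsids)
```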


### Extract association data for a combination of studies and SNPs

This allows us to extract a specific set of SNPs from a specific set of studies, both passed as Python lists.

In [27]:
assocs = pd.DataFrame.from_dict(igd.associations(["rs56094641","rs6567160","rs6744646"],
                                                 ["ieu-a-974","ieu-a-785","ukb-a-248"], 
                                                 proxies=1, 
                                                 r2=0.8, 
                                                 align_alleles=1, 
                                                 palindromes=1, 
                                                 maf_threshold=0.3)
                                                ).sort_values(by='p', ascending=True)
assocs.head()

(index),se,position,p,chr,beta,n,id,rsid,ea,nea,eaf,target_snp,proxy_snp,proxy,target_a1,target_a2,proxy_a1,proxy_a2,trait
6,0.002447,53806453,1.25893e-191,16,0.072289,336107,ukb-a-248,rs56094641,G,A,0.403318,rs56094641,rs56094641,False,,,,,Body mass index (BMI)
7,0.002834,57829135,2.5415600000000003e-75,18,0.052066,336107,ukb-a-248,rs6567160,C,T,0.234055,rs6567160,rs6567160,False,,,,,Body mass index (BMI)
0,0.0041,53821862,5.66761e-74,16,0.074,171366,ieu-a-974,rs56094641,G,A,0.4667,rs56094641,rs7201850,True,G,A,T,C,Body mass index
8,0.003187,628504,5.76766e-58,2,0.051158,336107,ukb-a-248,rs6744646,G,A,0.829019,rs6744646,rs6744646,False,,,,,Body mass index (BMI)
1,0.0046,57829135,5.08276e-34,18,0.0563,171875,ieu-a-974,rs6567160,C,T,0.2833,rs6567160,rs6567160,False,,,,,Body mass index
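
The ```palindromes``` and ```maf_threshold``` arguments appear to concern palindromic variants, that is, SNPs whose two alleles are complementary (A/T or C/G) and whose strand is therefore ambiguous. How the server applies these settings is not shown in this notebook, so the sketch below only illustrates the usual definition. `is_palindromic` is a hypothetical helper, not part of ```ieugwaspy```.

```python
# Complementary base pairs on opposite DNA strands.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele1, allele2):
    """Return True if the two single-base alleles are complements (A/T or C/G)."""
    a1, a2 = allele1.upper(), allele2.upper()
    return COMPLEMENT.get(a1) == a2


# Allele pairs (ea, nea) taken from the output above, plus one palindromic pair.
for ea, nea in [("G", "A"), ("C", "T"), ("A", "T")]:
    print(ea, nea, is_palindromic(ea, nea))
```

None of the three target SNPs in the output above is palindromic under this definition.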


## Run a PheWAS

Extract the associations of one SNP from the BMI example above across all traits, sorted by p-value.

In [28]:
phewas = pd.DataFrame.from_dict(igd.phewas(["rs56094641"])).sort_values(by='p', ascending=True)
phewas.head()

(index),n,position,beta,chr,p,se,id,rsid,ea,nea,eaf,trait
259,461460,53806453,0.073497,16,1.59956e-291,0.002014,ukb-b-19953,rs56094641,G,A,0.404564,Body mass index (BMI)
237,454884,53806453,0.073344,16,4.39542e-286,0.002029,ukb-b-2303,rs56094641,G,A,0.404637,Body mass index (BMI)
295,461632,53806453,0.06055,16,3.49945e-257,0.001768,ukb-b-11842,rs56094641,G,A,0.404568,Weight
211,454893,53806453,0.060498,16,3.9994500000000005e-253,0.00178,ukb-b-12039,rs56094641,G,A,0.404637,Weight
230,454718,53806453,0.047362,16,6.99842e-242,0.001426,ukb-b-4650,rs56094641,G,A,0.404698,Comparative body size at age 10
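
The same trait name can appear under several study IDs, as with "Body mass index (BMI)" and "Weight" above. One possible way to condense the table is to keep only the strongest association per trait. The sketch below does this on a hand-built table (`sample_phewas` is a hypothetical stand-in for `phewas`, with values copied from the output above).

```python
import pandas as pd

# Rows copied from the PheWAS output above (id, trait and p only).
sample_phewas = pd.DataFrame(
    {
        "id": ["ukb-b-19953", "ukb-b-2303", "ukb-b-11842", "ukb-b-12039", "ukb-b-4650"],
        "trait": [
            "Body mass index (BMI)",
            "Body mass index (BMI)",
            "Weight",
            "Weight",
            "Comparative body size at age 10",
        ],
        "p": [1.59956e-291, 4.39542e-286, 3.49945e-257, 3.99945e-253, 6.99842e-242],
    }
)

# After sorting by p-value, the first row seen for each trait is its strongest hit.
best_per_trait = sample_phewas.sort_values(by="p", ascending=True).drop_duplicates(
    subset="trait", keep="first"
)

print(best_per_trait["id"].tolist())
```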


## Convert chr:pos variant IDs to rsid

This is a helper function that converts variants in chromosome:position format (or chromosome:start-end ranges, as in the second input below) to dbSNP rsids for queries against the database.

In [29]:
variants = igd.variants_to_rsid(['10:44865737','7:105561135-105563135'])
print(variants)

['rs7777', 'rs234']
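
Before sending variant strings to the API, it can be useful to check that they are well formed. The following sketch parses the two input shapes used above into a chromosome and a start and end position. `parse_variant` is a hypothetical helper, not part of ```ieugwaspy```, and the accepted pattern is an assumption based only on the two examples in this notebook.

```python
import re

# Assumed input shapes: 'chr:pos' or 'chr:start-end' (based on the examples above).
_VARIANT_RE = re.compile(r"^(\w+):(\d+)(?:-(\d+))?$")


def parse_variant(text):
    """Return (chromosome, start, end); end equals start for a single position."""
    match = _VARIANT_RE.match(text)
    if match is None:
        raise ValueError(f"not a chr:pos or chr:start-end string: {text!r}")
    chrom, start, end = match.groups()
    start = int(start)
    end = int(end) if end is not None else start
    if end < start:
        raise ValueError(f"range end precedes start: {text!r}")
    return chrom, start, end


for item in ["10:44865737", "7:105561135-105563135"]:
    print(parse_variant(item))
```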
