# Summary Statistics REST API workshop

The following examples show how to access and parse data from the GWAS Summary Statistics database using the REST API. They illustrate common queries but are not exhaustive; please refer to the [documentation](https://www.ebi.ac.uk/gwas/summary-statistics/docs/) for more details.

Version: `0.1`

Date: `2019 June 04`

REST is language-agnostic; Python is used here purely for demonstration.

In [None]:
# Importing required packages

import requests     # Manages data transfer from the GWAS Catalog REST API
import pandas as pd # Makes data handling easier
import json         # Handles the returned data format, JSON

## Endpoints

Run the following to see the endpoints (associations, traits, studies, chromosomes):

In [None]:
# API root address:
api_url='https://www.ebi.ac.uk/gwas/summary-statistics/api'

response = requests.get(api_url)

# The returned response is a "response" object, from which we have to extract and parse the information:
decoded = response.json()

print(json.dumps(decoded, indent = 2))
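
Before parsing, it can help to check that the request actually succeeded. The sketch below wraps the two steps in a small helper. `decode_response` is a hypothetical name, not part of the API, and `StubResponse` is a stand-in object used only so that the example runs without network access. With `requests`, `raise_for_status()` raises an error for 4xx and 5xx responses.

```python
def decode_response(response):
    """Return the parsed JSON body, raising if the request failed."""
    # For a requests.Response, this raises requests.HTTPError on 4xx/5xx codes
    response.raise_for_status()
    return response.json()


class StubResponse:
    """Hypothetical stand-in for requests.Response (offline demo only)."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        # A successful response raises nothing
        return None

    def json(self):
        return self._payload


decoded = decode_response(StubResponse({'_links': {}}))
print(decoded)
```

Against the live API, the call would be `decode_response(requests.get(api_url))`.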


## Get associations for a given variant

In [None]:
# Accessing data for a single variant. Must be an rsID:
variant = 'rs62402518'
request_url = '{api}/associations/{variant}'.format(api=api_url, variant=variant)
response = requests.get(request_url)
decoded = response.json()

print(json.dumps(decoded, indent = 2))

### Interpreting the response
From the returned JSON, you can see that it contains 20 associations, each from a different study. Each association carries values such as the p-value and beta, along with links to the associated trait, variant and study, and to itself (the study and variant combination). The response is paginated: the `"_links"` object at the bottom gives the URLs for this page (`"self"`), the first page (`"first"`) and the next page (`"next"`). By default there are 20 results per page, but this can be changed with the `size=` parameter, as you can see in the first and next links, i.e. `?size=20`.

The same 'layout' of the data applies to all the following examples.
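
The pagination described above can be followed programmatically. The sketch below is a minimal illustration and rests on a few assumptions: each entry in `"_links"` is an object with an `"href"` key, the last page has no `"next"` link, and `fetch` is any callable that turns a URL into decoded JSON. The names `next_page_url` and `collect_pages` are hypothetical helpers. The demo uses two mock pages so that it runs offline.

```python
def next_page_url(decoded):
    """Return the URL of the next page, or None if there is no 'next' link."""
    link = decoded.get('_links', {}).get('next')
    if link is None:
        return None
    # Assumption: links look like {"href": "..."}; fall back to a bare string
    return link['href'] if isinstance(link, dict) else link


def collect_pages(first_page, fetch, max_pages=5):
    """Follow 'next' links from first_page, stopping after max_pages pages."""
    pages = [first_page]
    url = next_page_url(first_page)
    while url is not None and len(pages) < max_pages:
        page = fetch(url)
        pages.append(page)
        url = next_page_url(page)
    return pages


# Two mock pages standing in for real API responses (illustrative values only)
mock_pages = {
    'page1': {'_embedded': {'associations': {'0': {'p_value': 1e-8}}},
              '_links': {'self': {'href': 'page1'}, 'next': {'href': 'page2'}}},
    'page2': {'_embedded': {'associations': {'0': {'p_value': 3e-7}}},
              '_links': {'self': {'href': 'page2'}}},
}

pages = collect_pages(mock_pages['page1'], fetch=mock_pages.get)
print(len(pages))
```

With the live API, `fetch` could be `lambda url: requests.get(url).json()`. The `max_pages` guard keeps an exploratory run from downloading an entire result set.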

## Get a list of associations for a given trait

In [None]:
# Accessing data for a specific trait (EFO term):
trait = 'EFO_0004466'
request_url = '{api}/traits/{trait}/associations'.format(api=api_url, trait=trait)
response = requests.get(request_url)
decoded = response.json()

print(json.dumps(decoded, indent = 2))

## Get a list of associations for a given study


In [None]:
# Accessing data for a specific study. Must be a GWAS Catalog study accession ID e.g. GCST000571:
study = 'GCST000571'
request_url = '{api}/studies/{study}/associations'.format(api=api_url, study=study)
response = requests.get(request_url)
decoded = response.json()

print(json.dumps(decoded, indent = 2))

## Get a list of associations within a genomic region


In [None]:
# Accessing data for a specific genomic region (e.g. chr9:132000000-133000000):
chromosome = 9
bp_lower = 132000000
bp_upper = 133000000
request_url = '{api}/chromosomes/{chrom}/associations?bp_lower={low}&bp_upper={high}'.format(api=api_url, 
                                                                                             chrom=chromosome, 
                                                                                             low=bp_lower,
                                                                                             high=bp_upper)
response = requests.get(request_url)
decoded = response.json()

print(json.dumps(decoded, indent = 2))

## Get a list of associations below a p-value threshold

You may want to keep only the associations that fall below a p-value threshold. For example, for a given trait such as type II diabetes (EFO_0001360), you can request all the associations with a p-value below 1.0e-5:

In [None]:
# Accessing data for a specific trait (EFO term) below a p-value threshold:
trait = 'EFO_0001360'
pval_upper = 1.0e-5 # can be any valid float e.g. 0.00001
request_url = '{api}/traits/{trait}/associations?p_upper={high}'.format(api=api_url, 
                                                                        trait=trait,
                                                                        high=pval_upper)
response = requests.get(request_url)
decoded = response.json()

print(json.dumps(decoded, indent = 2))
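
The request URLs above are assembled by hand with `str.format`. As the number of query parameters grows, it may be easier to build the query string from keyword arguments. The sketch below uses `urllib.parse.urlencode` from the standard library. `build_url` is a hypothetical helper, and it only uses parameters that already appear in this notebook (`bp_lower`, `bp_upper`, `p_upper`, `size`).

```python
from urllib.parse import urlencode


def build_url(api, path, **params):
    """Join the API root and a path, appending any non-None query parameters."""
    # Parameters left as None are dropped rather than sent as the text "None"
    query = urlencode({key: value for key, value in params.items()
                       if value is not None})
    url = '{api}/{path}'.format(api=api.rstrip('/'), path=path.strip('/'))
    return url + ('?' + query if query else '')


api_url = 'https://www.ebi.ac.uk/gwas/summary-statistics/api'

# Genomic region query (chr9:132000000-133000000)
region_url = build_url(api_url, 'chromosomes/9/associations',
                       bp_lower=132000000, bp_upper=133000000)

# Trait query below a p-value threshold, 10 results per page
trait_url = build_url(api_url, 'traits/EFO_0001360/associations',
                      p_upper=1.0e-5, size=10)

print(region_url)
print(trait_url)
```

`requests.get` also accepts a `params=` dictionary, which gives the same result without building the string yourself.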

Let's rewrite the above so that it returns the results as a pandas DataFrame:

In [None]:
# return a pandas dataframe of results for the above example:

extracted_data = []
size = 10

trait = 'EFO_0001360'
pval_upper = 1.0e-5 # can be any valid float e.g. 0.00001
request_url = '{api}/traits/{trait}/associations?p_upper={high}&size={size}'.format(api=api_url, 
                                                                                    trait=trait,
                                                                                    high=pval_upper,
                                                                                    size=size)
response = requests.get(request_url)
decoded = response.json()

# Loop over the associations actually returned; there may be fewer than `size`:
for association in decoded['_embedded']['associations'].values():

    pval = association['p_value']
    bp = association['base_pair_location']
    chrom = association['chromosome']
    ea = association['effect_allele']
    oa = association['other_allele']
    beta = association['beta']
    odds = association['odds_ratio']
    
    extracted_data.append({'trait': trait,
                           'pvalue': pval,
                           'position': bp,
                           'chromosome': chrom,
                           'effect_allele': ea,
                           'other_allele': oa,
                           'beta': beta,
                           'odds_ratio': odds
                          })


table = pd.DataFrame(extracted_data)
table