# GWAS Catalog REST API workshop

* This workshop shows a basic example of how to access and parse data from the GWAS Catalog through its REST API.
* The examples are written in Python, but any other programming language that can send HTTP requests and parse JSON works equally well.
* Examples in other languages will be available soon.


### Contents:

* **Exercise 1**: fetching data from the API via browser
* **Exercise 2**: fetching data programmatically for a single variant
* **Exercise 3**: fetching data for a list of variants
* **Exercise 4**: fetching and merging data from multiple endpoints

## Exercise 1

Open the GWAS Catalog REST API in the browser to fetch a single study with the accession ID [GCST001795](https://www.ebi.ac.uk/gwas/studies/GCST001795).

**How the URL is generated:**

* API URL: `https://www.ebi.ac.uk/gwas/rest/api`
* Endpoint: `studies`
* AccessionID: `GCST001795`

**URL:**

[https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795](https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795)
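
The same URL can be assembled in Python from the three parts listed above. This is only a minimal sketch: the helper name `build_url` is made up for this illustration and is not part of the GWAS Catalog API.

```python
# Assemble a GWAS Catalog REST API URL from its parts.
# `build_url` is a hypothetical helper used only for this illustration.
API_URL = 'https://www.ebi.ac.uk/gwas/rest/api'


def build_url(endpoint, identifier):
    """Join the API base URL, an endpoint name and an identifier with slashes."""
    return '/'.join([API_URL, endpoint, identifier])


study_url = build_url('studies', 'GCST001795')
print(study_url)
```

Pasting the printed URL into the browser should show the same study record as the link above.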

## Understanding the returned data:

* A number of simple key-value pairs, e.g.:

```json
    "initialSampleSize" : "1,656 Han Chinese ancestry cases, 3,394 Han Chinese ancestry controls",
    "snpCount" : 2100739,
    "imputed" : true,
    "accessionId" : "GCST001795",
```

* List allowing multiple elements for a key:

```json
    "genotypingTechnologies" : [ {
        "genotypingTechnology" : "Genome-wide genotyping array"
    } ],
```
* List where the values are themselves complex objects, e.g. ancestries.


The returned data is highly structured: hard for humans to read, but easy for a computer to parse. In the following exercises we write small Python scripts that organize this data to make it easy for humans to read.
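
To illustrate how these structures look once parsed, the sketch below loads a trimmed copy of the fields shown above with Python's standard `json` module. The outer braces were added here to make the fragment valid JSON; the real response contains many more fields.

```python
import json

# A trimmed copy of the fields shown above, wrapped in braces to form valid JSON.
raw = '''
{
    "initialSampleSize" : "1,656 Han Chinese ancestry cases, 3,394 Han Chinese ancestry controls",
    "snpCount" : 2100739,
    "imputed" : true,
    "accessionId" : "GCST001795",
    "genotypingTechnologies" : [ {
        "genotypingTechnology" : "Genome-wide genotyping array"
    } ]
}
'''

study = json.loads(raw)

# Simple key-value pairs become plain Python values:
accession = study['accessionId']   # str
snp_count = study['snpCount']      # int
imputed = study['imputed']         # bool

# A JSON list of objects becomes a Python list of dictionaries:
technologies = [item['genotypingTechnology']
                for item in study['genotypingTechnologies']]

print(accession, snp_count, imputed, technologies)
```

Lists of more complex objects, such as the ancestries, can be walked in the same way, one level of keys at a time.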

## Exercise 2

Return all associations for a single rsID ([rs7329174](https://www.ebi.ac.uk/gwas/variants/rs7329174))

In [None]:
# Importing required packages
import requests     # Manages data transfer from the GWAS Catalog REST API
import pandas as pd # Makes data handling easier
import json         # Handles the returned data format, JSON

### Return association data:

In [None]:
# API Address:
apiUrl = 'https://www.ebi.ac.uk/gwas/rest/api'

# Accessing data for a single variant:
variant = 'rs7329174'
requestUrl = '%s/singleNucleotidePolymorphisms/%s/associations?projection=associationBySnp' %(apiUrl, variant)
response = requests.get(requestUrl, headers={ "Content-Type" : "application/json"})

# The returned response is a "response" object, from which we have to extract and parse the information:
decoded = response.json()

# The returned information is parsed as a python dictionary. Take a look at the values:
print(json.dumps(decoded, indent = 2))

### Parsing returned data to get traits and p-values

In [None]:
for association in decoded['_embedded']['associations']:
    trait = ",".join([trait['trait'] for trait in association['efoTraits']])
    pvalue = association['pvalue']
    
    print("Trait: %s, p-value: %s" %(trait, pvalue))
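
The loop above assumes that the response always contains the `_embedded` and `associations` keys. As a hedged alternative, the sketch below wraps the same extraction in a function that returns an empty list when those keys are missing. The function name `extract_associations` and the sample dictionary are invented for this illustration; the sample only mimics the keys used in this exercise and is not real catalog data.

```python
def extract_associations(decoded):
    """Return a list of (trait, pvalue) tuples from a decoded association response.

    Uses .get() so that a response without '_embedded' yields an empty list
    instead of raising a KeyError.
    """
    rows = []
    for association in decoded.get('_embedded', {}).get('associations', []):
        trait = ",".join(t['trait'] for t in association.get('efoTraits', []))
        rows.append((trait, association.get('pvalue')))
    return rows


# Invented example data that only mimics the keys used in this exercise:
sample = {
    '_embedded': {
        'associations': [
            {'efoTraits': [{'trait': 'trait A'}, {'trait': 'trait B'}], 'pvalue': 2e-9},
            {'efoTraits': [{'trait': 'trait C'}], 'pvalue': 5e-8},
        ]
    }
}

for trait, pvalue in extract_associations(sample):
    print("Trait: %s, p-value: %s" % (trait, pvalue))

# A response without associations gives an empty list:
print(extract_associations({}))
```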


## Exercise 3

Look up association data for a list of variants

In [None]:

# List of variants:
variants = ['rs142968358', 'rs62402518', 'rs12199222', 'rs7329174', 'rs9879858765']

# Store extracted data in this list:
extractedData = []

# Iterating over all variants:
for variant in variants:

    # Accessing data for a single variant:
    requestUrl = '%s/singleNucleotidePolymorphisms/%s/associations?projection=associationBySnp' %(apiUrl, variant)
    response = requests.get(requestUrl, headers={ "Content-Type" : "application/json"})
    
    # Testing if rsID exists:
    if not response.ok:
        print("[Warning] %s is not in the GWAS Catalog!!" % variant)
        continue
    
    # Test if the returned data can be parsed as JSON:
    try:
        decoded = response.json()
    except ValueError:
        print("[Warning] Failed to decode data for %s" % variant)
        continue
    
    for association in decoded['_embedded']['associations']:
        trait = ",".join([trait['trait'] for trait in association['efoTraits']])
        pvalue = association['pvalue']
        
        extractedData.append({'variant' : variant,
                              'trait' : trait,
                              'pvalue' : pvalue})

# Format data into a table:
table = pd.DataFrame.from_dict(extractedData)
table
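
Before building the table, the collected rows can be filtered or sorted with plain Python. The sketch below keeps only rows at or below a p-value threshold and orders them from most to least significant. The function name `filter_and_sort`, the default threshold of `5e-8` (a commonly used genome-wide significance cutoff, chosen here as an assumption) and the example rows are all invented for this illustration.

```python
def filter_and_sort(rows, threshold=5e-8):
    """Keep rows with a p-value at or below `threshold`, smallest p-value first."""
    kept = [row for row in rows
            if row['pvalue'] is not None and row['pvalue'] <= threshold]
    return sorted(kept, key=lambda row: row['pvalue'])


# Invented rows in the same shape as the entries of extractedData:
rows = [
    {'variant': 'rs_example_1', 'trait': 'trait A', 'pvalue': 3e-6},
    {'variant': 'rs_example_2', 'trait': 'trait B', 'pvalue': 4e-12},
    {'variant': 'rs_example_3', 'trait': 'trait C', 'pvalue': 5e-8},
]

significant = filter_and_sort(rows)
for row in significant:
    print(row['variant'], row['trait'], row['pvalue'])
```

The resulting list has the same structure as `extractedData`, so it can be turned into a pandas table in the same way.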

## Exercise 4

Extend the previous table with the PubMed ID and the study accession. These pieces of information are not found in the association data; they have to be fetched from other endpoints.

Use the links to these endpoints provided in each association record:

```json

"_links": {
    "self": {
        "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/26384"
    },
    "association": {
        "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/26384{?projection}",
        "templated": true
    },
    "snps": {
        "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/26384/snps"
    },
    "efoTraits": {
        "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/26384/efoTraits"
    },
    "study": {
        "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/26384/study"
    }
}
```
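
As a small illustration of how such a link can be picked out of a parsed association, the sketch below reads the `_links` block shown above. The helper name `resolve_link` is hypothetical. For links marked as `templated`, it simply drops the `{?projection}` suffix rather than expanding the template, which is a simplification.

```python
import json

# The `_links` block shown above, trimmed and wrapped in braces to form valid JSON.
raw = '''
{
    "_links": {
        "association": {
            "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/26384{?projection}",
            "templated": true
        },
        "study": {
            "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/26384/study"
        }
    }
}
'''
association = json.loads(raw)


def resolve_link(association, name):
    """Return the URL stored under `name` in an association's `_links` block.

    Templated links carry a suffix such as `{?projection}`; this sketch
    drops the suffix instead of expanding the template.
    """
    link = association['_links'][name]
    href = link['href']
    if link.get('templated'):
        href = href.split('{', 1)[0]
    return href


print(resolve_link(association, 'study'))
print(resolve_link(association, 'association'))
```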

In [None]:
# A small function to get accession ID and pubmed ID following the study link:
def getStudy(studyLink):
    # Accessing data for a single study:
    response = requests.get(studyLink, headers={ "Content-Type" : "application/json"})
    decoded = response.json()
    
    accessionID = decoded['accessionId']
    pubmedId = decoded['publicationInfo']['pubmedId']
    
    return accessionID, pubmedId
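
The function above raises a `KeyError` if a study record lacks one of the expected keys. One possible way to make the parsing step more tolerant, and to test it without network access, is to separate it from the request. In the sketch below, `parse_study` is a hypothetical helper and the PubMed ID in the sample record is a placeholder, not a real identifier.

```python
def parse_study(decoded):
    """Pull the accession ID and PubMed ID out of a decoded study record.

    Missing keys give None instead of raising a KeyError.
    """
    accession_id = decoded.get('accessionId')
    pubmed_id = decoded.get('publicationInfo', {}).get('pubmedId')
    return accession_id, pubmed_id


# Sample record with the same keys as used in getStudy; the PubMed ID is a placeholder:
sample_study = {
    'accessionId': 'GCST001795',
    'publicationInfo': {'pubmedId': 'PUBMED_ID_PLACEHOLDER'},
}

print(parse_study(sample_study))
print(parse_study({}))  # a record without these keys gives (None, None)
```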

In [None]:
extractedData = []
for variant in variants:

    # Accessing data for a single variant:
    requestUrl = '%s/singleNucleotidePolymorphisms/%s/associations?projection=associationBySnp' %(apiUrl, variant)
    response = requests.get(requestUrl, headers={ "Content-Type" : "application/json"})
    
    # Testing if rsID exists:
    if not response.ok:
        print("[Warning] %s is not in the GWAS Catalog!!" % variant)
        continue
    
    # Test if the returned data can be parsed as JSON:
    try:
        decoded = response.json()
    except ValueError:
        print("[Warning] Failed to decode data for %s" % variant)
        continue
    
    for association in decoded['_embedded']['associations']:
        # extract study data:
        (accessionID, pubmedId) = getStudy(association['_links']['study']['href'])
        
        # extract trait and p-value:
        trait = ",".join([trait['trait'] for trait in association['efoTraits']])
        pvalue = association['pvalue']
        
        extractedData.append({'variant' : variant,
                              'trait' : trait,
                              'pvalue' : pvalue,
                              'accessionID' : accessionID,
                              'pubmedID' : pubmedId
                             })
        
table = pd.DataFrame.from_dict(extractedData)
# table.to_excel('workshop.xlsx')
print(table)
        