### Using Python to Query Open Targets Genetics via GraphQL

#### Python Connection to GraphQL API
This code snippet illustrates how to use basic Python packages (requests, json, pandas) to connect to a GraphQL API, following [Melvynn's article](https://towardsdatascience.com/connecting-to-a-graphql-api-using-python-246dda927840).

In [1]:
import requests
import json
import pandas as pd

In [2]:
query = """query {
    characters {
    results {
      name
      status
      species
      type
      gender
    }
  }
}"""

In [3]:
url = 'https://rickandmortyapi.com/graphql/'
r = requests.post(url, json={'query': query})
print(r.status_code)
print(r.text)

200
{"data":{"characters":{"results":[{"name":"Rick Sanchez","status":"Alive","species":"Human","type":"","gender":"Male"},{"name":"Morty Smith","status":"Alive","species":"Human","type":"","gender":"Male"},{"name":"Summer Smith","status":"Alive","species":"Human","type":"","gender":"Female"},{"name":"Beth Smith","status":"Alive","species":"Human","type":"","gender":"Female"},{"name":"Jerry Smith","status":"Alive","species":"Human","type":"","gender":"Male"},{"name":"Abadango Cluster Princess","status":"Alive","species":"Alien","type":"","gender":"Female"},{"name":"Abradolf Lincler","status":"unknown","species":"Human","type":"Genetic experiment","gender":"Male"},{"name":"Adjudicator Rick","status":"Dead","species":"Human","type":"","gender":"Male"},{"name":"Agency Director","status":"Dead","species":"Human","type":"","gender":"Male"},{"name":"Alan Rails","status":"Dead","species":"Human","type":"Superhuman (Ghost trains summoner)","gender":"Male"},{"name":"Albert Einstein","status":"D

In [4]:
json_data = json.loads(r.text)
df_data = json_data['data']['characters']['results']
df = pd.DataFrame(df_data)
df.head(5)

|   | name | status | species | type | gender |
|---|---|---|---|---|---|
| 0 | Rick Sanchez | Alive | Human | | Male |
| 1 | Morty Smith | Alive | Human | | Male |
| 2 | Summer Smith | Alive | Human | | Female |
| 3 | Beth Smith | Alive | Human | | Female |
| 4 | Jerry Smith | Alive | Human | | Male |
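
A GraphQL server can answer with HTTP 200 and still report a problem in an `errors` field of the JSON body, so printing `r.status_code` alone may not reveal a failed query. Below is a minimal sketch of a helper that checks for this before walking into the result. The function name `extract_data` and both sample payloads are hypothetical and only mimic the shape of the response above.

```python
def extract_data(payload, *path):
    """Return payload['data'] followed along `path`, raising if errors were reported."""
    # GraphQL responses may carry an "errors" list even when the HTTP status is 200.
    if payload.get("errors"):
        messages = "; ".join(str(e.get("message", e)) for e in payload["errors"])
        raise RuntimeError(f"GraphQL query failed: {messages}")
    node = payload["data"]
    for key in path:
        node = node[key]
    return node


# Hypothetical payloads shaped like a GraphQL response (not real API output).
ok = {"data": {"characters": {"results": [{"name": "Rick Sanchez"}]}}}
bad = {"errors": [{"message": 'Cannot query field "nam"'}]}

print(extract_data(ok, "characters", "results"))
try:
    extract_data(bad, "characters", "results")
except RuntimeError as err:
    print(err)
```

With the request from the cell above, this could be called as `extract_data(r.json(), 'characters', 'results')` and the returned list passed to `pd.DataFrame`.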


#### Python Connection to GraphQL API of Open Targets Genetics
This section uses Python to reproduce the queries from [Open Targets' tutorial](http://blog.opentargets.org/2020/08/06/accessing-the-open-targets-genetics-using-graphql/), which runs them in the GraphQL browser.

**From Ensembl ID to Gene Information**

In [5]:
query = "{ geneInfo(geneId: \"ENSG00000091831\") { symbol }}"
print(query)
url = 'https://genetics-api.opentargets.io/graphql'
r = requests.post(url, json={'query': query})
print(r.status_code)
print(r.text)

{ geneInfo(geneId: "ENSG00000091831") { symbol }}
200
{"data":{"geneInfo":{"symbol":"ESR1"}}}


In [6]:
query = """{
    geneInfo(geneId: \"ENSG00000012048\") {
        id
        symbol
        description
        chromosome
        start
        end
  }
}"""

In [7]:
url = 'https://genetics-api.opentargets.io/graphql'
r = requests.post(url, json={'query': query})
print(r.status_code)
print(r.text)
json_data = json.loads(r.text)
pd.DataFrame(json_data['data']['geneInfo'], index=[0])
# index=[0] is needed because geneInfo is a single dict of scalar values, not a list of records

200
{"data":{"geneInfo":{"id":"ENSG00000012048","symbol":"BRCA1","description":"BRCA1, DNA repair associated [Source:HGNC Symbol;Acc:HGNC:1100]","chromosome":"17","start":43044295,"end":43170245}}}


|   | id | symbol | description | chromosome | start | end |
|---|---|---|---|---|---|---|
| 0 | ENSG00000012048 | BRCA1 | BRCA1, DNA repair associated [Source:HGNC Symb... | 17 | 43044295 | 43170245 |
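
The cells above send one request per gene. GraphQL aliases allow the same field to be selected several times in one query, so a handful of genes can be fetched in a single request. The sketch below only builds such a query string. The helper name `build_multi_gene_query` and the aliases `g0`, `g1`, ... are hypothetical, and whether the server limits the number of aliased selections is not covered here.

```python
import json


def build_multi_gene_query(gene_ids, fields=("id", "symbol", "chromosome", "start", "end")):
    """Build one query with an aliased geneInfo selection per gene ID."""
    field_block = " ".join(fields)
    parts = []
    for i, gene_id in enumerate(gene_ids):
        # json.dumps adds the double quotes; aliases g0, g1, ... keep the results apart.
        parts.append(f"  g{i}: geneInfo(geneId: {json.dumps(gene_id)}) {{ {field_block} }}")
    return "{\n" + "\n".join(parts) + "\n}"


query = build_multi_gene_query(["ENSG00000091831", "ENSG00000012048"])
print(query)
```

In the response, each alias would appear as its own key under `data`, so something like `pd.DataFrame(list(json_data['data'].values()))` could then give one row per gene.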


**From Ensembl ID to Related Studies**

In [8]:
query = """{
    studiesAndLeadVariantsForGene(geneId: \"ENSG00000012048\") {
        study {
          pmid
          pubDate
          pubJournal
          pubAuthor
          hasSumsStats
          nInitial
          nReplication
          nCases
          traitCategory
          numAssocLoci
        }
    }
}"""

In [9]:
url = 'https://genetics-api.opentargets.io/graphql'
r = requests.post(url, json={'query': query})
print(r.status_code)
print(r.text[:100])
json_data = json.loads(r.text)

200
{"data":{"studiesAndLeadVariantsForGene":[{"study":{"pmid":"PMID:29059683","pubDate":"2017-10-23","p


In [10]:
print(type(json_data['data']['studiesAndLeadVariantsForGene']))
print(len(json_data['data']['studiesAndLeadVariantsForGene']))
print(json_data['data']['studiesAndLeadVariantsForGene'][0])
pd.DataFrame([x['study'] for x in json_data['data']['studiesAndLeadVariantsForGene']]).head(5)
# each list element wraps its fields under the 'study' key, so unwrap it before building the DataFrame

<class 'list'>
79
{'study': {'pmid': 'PMID:29059683', 'pubDate': '2017-10-23', 'pubJournal': 'Nature', 'pubAuthor': 'Michailidou K', 'hasSumsStats': True, 'nInitial': 139274, 'nReplication': 103745, 'nCases': 76192, 'traitCategory': 'Integumentary system', 'numAssocLoci': 199}}


|   | pmid | pubDate | pubJournal | pubAuthor | hasSumsStats | nInitial | nReplication | nCases | traitCategory | numAssocLoci |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PMID:29059683 | 2017-10-23 | Nature | Michailidou K | True | 139274 | 103745.0 | 76192.0 | Integumentary system | 199 |
| 1 | | 2018-08-01 | | UKB Neale v2 | True | 359983 | 0.0 | | Anthropometric measurement | 502 |
| 2 | | 2018-08-01 | | UKB Neale v2 | True | 354707 | 0.0 | | Anthropometric measurement | 415 |
| 3 | PMID:30239722 | 2018-09-14 | Hum Mol Genet | Pulit SL | False | 806834 | | | Anthropometric measurement | 680 |
| 4 | | 2018-08-01 | | UKB Neale v2 | True | 354831 | 0.0 | | Anthropometric measurement | 478 |


**From Ensembl ID to Related Studies and Lead Variants based on L2G Pipeline**

See https://gist.github.com/mirandaio/bc0cac808341b074ab0e2da0cfcc3e42 for an implementation based on an input JSON file and curl.

In [11]:
query = """{
    studiesAndLeadVariantsForGeneByL2G(geneId: \"ENSG00000158158\") {
        variant {
            id
            rsId
        }
        study {
            studyId
            traitReported
        }
    }
}"""

In [12]:
url = 'https://genetics-api.opentargets.io/graphql'
r = requests.post(url, json={'query': query})
print(r.status_code)
print(r.text[:100])
json_data = json.loads(r.text)

200
{"data":{"studiesAndLeadVariantsForGeneByL2G":[{"variant":{"id":"2_97131943_T_C","rsId":"rs13390019"


In [13]:
print(type(json_data['data']['studiesAndLeadVariantsForGeneByL2G']))
print(len(json_data['data']['studiesAndLeadVariantsForGeneByL2G']))
json_data['data']['studiesAndLeadVariantsForGeneByL2G'][0]

<class 'list'>
41


{'variant': {'id': '2_97131943_T_C', 'rsId': 'rs13390019'},
 'study': {'studyId': 'GCST008757', 'traitReported': 'Alcohol consumption'}}

In [14]:
pd.DataFrame([{**x['variant'], **x['study']} for x in json_data['data']['studiesAndLeadVariantsForGeneByL2G']]).head(5)

|   | id | rsId | studyId | traitReported |
|---|---|---|---|---|
| 0 | 2_97131943_T_C | rs13390019 | GCST008757 | Alcohol consumption |
| 1 | 2_96510897_C_T | rs2579503 | GCST007269 | Pulse pressure |
| 2 | 2_96377542_G_GT | rs561539268 | NEALE2_1687 | Comparative body size at age 10 |
| 3 | 2_96381261_G_A | rs1081707 | GCST007268 | Diastolic blood pressure |
| 4 | 2_96506388_G_A | rs1866444 | GCST008413 | Core binding factor acute myeloid leukemia |
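
Merging with `{**x['variant'], **x['study']}` works here because the two nested dicts share no keys. If both selections included a field with the same name, the later value would silently overwrite the earlier one. One way around this, sketched below, is to prefix each column with the name of its parent object. The helper `flatten_record` is hypothetical; pandas offers `pd.json_normalize`, which produces similar dotted column names.

```python
def flatten_record(record, sep="."):
    """Flatten one level of nesting: {'variant': {'id': 1}} -> {'variant.id': 1}."""
    flat = {}
    for outer_key, value in record.items():
        if isinstance(value, dict):
            # Prefix inner keys with the parent key so equal names cannot collide.
            for inner_key, inner_value in value.items():
                flat[f"{outer_key}{sep}{inner_key}"] = inner_value
        else:
            flat[outer_key] = value
    return flat


# First record from the output above.
record = {
    "variant": {"id": "2_97131943_T_C", "rsId": "rs13390019"},
    "study": {"studyId": "GCST008757", "traitReported": "Alcohol consumption"},
}
print(flatten_record(record))
```

Applied to the full response, a list comprehension such as `[flatten_record(x) for x in json_data['data']['studiesAndLeadVariantsForGeneByL2G']]` could be passed to `pd.DataFrame` in place of the merged dicts.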
