# Demo: Using the New GraphQL API for SNP Data

## Setting Up API Connection

The script begins by importing the libraries it needs and defining the base URL and the GraphQL endpoint path used in later requests. The base URL is read from the project's settings module.

In [1]:
import requests
import json
import pandas as pd
from config.settings import settings

BASE_URL = settings.API_URL
GRAPHQL_ENDPOINT = 'graphql'
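
The later cells build request URLs by direct concatenation, for example `f"{BASE_URL}annotations"`, which only works if `settings.API_URL` ends with a slash. The sketch below shows one way to make the join independent of that detail. The helper name `build_url` and the example base URL are hypothetical and not part of the original script.

```python
def build_url(base_url: str, path: str) -> str:
    """Join a base URL and a relative path with exactly one slash between them."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


# Placeholder base URL for illustration only; the real value comes from settings.API_URL.
example_base = "https://api.example.org/v1"

# The result is the same whether or not the base URL has a trailing slash.
annotations_url = build_url(example_base, "annotations")
graphql_url = build_url(example_base + "/", "graphql")
```

With a helper like this, the same code works whether the configured URL is written with or without the trailing slash.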

## Understanding Annotations in the API

The next cell sends a GET request to the `annotations` endpoint to retrieve the list of annotations. Each annotation describes a data field available through the API: its name, its description, and its place in a hierarchy (each row points to its parent through `parent_id`), much like a structured catalog of the fields you can query.

**api_field:** Specifies the field name as it should be used in API requests, particularly when crafting queries for a GraphQL API. This ensures you're asking for data in a format the API understands.

In [2]:
response = requests.get(f"{BASE_URL}annotations")

annotations = response.json()
annotations_df = pd.DataFrame(annotations['results'])
annotations_df

Unnamed: 0,id,leaf,name,label,sort,parent_id,detail,link,pmid,field_type,keyword_searchable,api_field,root_url,sample_url,value_type
0,0,False,root,Annotation,0.0,,,,,,,,,,
1,1,False,Basic Info,,1.0,0,"Basic information about the variant, such as c...",,,,,,,,
2,26,False,ANNOVAR,,2.0,0,Pre-computed ANNOVAR annotations for all alter...,http://annovar.openbioinformatics.org/en/lates...,20601685,,,,,,
3,208,False,SnpEff,,3.0,0,AnpEff is a program for annotating and predict...,http://pcingola.github.io/SnpEff/,22728672,,,,,,
4,132,False,VEP,,4.0,0,Variant Effect Predictor (VEP) is developed by...,https://uswest.ensembl.org/info/docs/tools/vep...,27268795,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
541,621,True,sno_miRNA_type,,,495,the type of snoRNA or miRNA (from miRBase/snoR...,,,text,True,sno_miRNA_type,,,
542,622,True,splicing_consensus_ada_score,,,495,splicing-change prediction for splicing consen...,,,float,,splicing_consensus_ada_score,,,
543,623,True,splicing_consensus_rf_score,,,495,splicing-change prediction for splicing consen...,,,float,,splicing_consensus_rf_score,,,
544,624,True,target_gene,,,495,"target gene (for promoter, enhancer, etc.) bas...",,,text,True,target_gene,,,
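
To see which fields can be requested in a query, one option is to scan the raw annotation records for leaf entries that carry an `api_field`. This is a minimal sketch that assumes, based only on the output above, that leaf rows with a non-empty `api_field` are the queryable ones. The function name `queryable_fields` is hypothetical, and the sample records are a hand-copied subset of the rows shown above.

```python
def queryable_fields(records):
    """Return {api_field: field_type} for leaf annotations that expose an api_field."""
    fields = {}
    for record in records:
        api_field = record.get("api_field")
        # Assumption: only leaf rows with a non-empty api_field can be queried.
        if record.get("leaf") and api_field:
            fields[api_field] = record.get("field_type")
    return fields


# One grouping row and two leaf rows, copied from the output above as sample input.
sample_records = [
    {"id": 26, "leaf": False, "name": "ANNOVAR", "parent_id": 0,
     "api_field": None, "field_type": None},
    {"id": 621, "leaf": True, "name": "sno_miRNA_type", "parent_id": 495,
     "api_field": "sno_miRNA_type", "field_type": "text"},
    {"id": 622, "leaf": True, "name": "splicing_consensus_ada_score", "parent_id": 495,
     "api_field": "splicing_consensus_ada_score", "field_type": "float"},
]

fields = queryable_fields(sample_records)
```

In the notebook, the same function could be applied to `annotations['results']`, the raw list of records, where missing values arrive as `None` rather than as the `NaN` values pandas uses.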


## Extracting SNP Data Through a GraphQL Query

The next cell builds a GraphQL query that fetches single nucleotide polymorphisms (SNPs) on a given chromosome within a position range. Because a GraphQL query names exactly the fields it wants, the server returns only those fields (here the alleles, position, dbSNP identifier, and ANNOVAR effect annotations), which keeps the response small. The result is a list of records, one per SNP, ready for further processing.

In [7]:
query = """
query MyQuery {
  GetSNPsByChromosome(chr: "1", end: 1000000, start: 10) {
    alt 
    chr 
    pos
    rs_dbSNP151
    ref
    ANNOVAR_ensembl_Effect 
    ANNOVAR_refseq_Effect
  }
}
"""

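# POST the query as a JSON body under the "query" key
response = requests.post(f"{BASE_URL}{GRAPHQL_ENDPOINT}", json={'query': query})
response.raise_for_status()  # fail early on HTTP-level errors

data = response.json()
snps_by_chromosome = data['data']['GetSNPsByChromosome']
snps_by_chromosome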

[{'alt': 'A',
  'chr': '1',
  'pos': 54353,
  'rs_dbSNP151': 'rs140052487',
  'ref': 'C',
  'ANNOVAR_ensembl_Effect': 'ncRNA_intronic|downstream',
  'ANNOVAR_refseq_Effect': 'intergenic'},
 {'alt': 'G',
  'chr': '1',
  'pos': 54763,
  'rs_dbSNP151': 'rs548455890',
  'ref': 'T',
  'ANNOVAR_ensembl_Effect': 'ncRNA_intronic|downstream',
  'ANNOVAR_refseq_Effect': 'intergenic'},
 {'alt': 'C',
  'chr': '1',
  'pos': 55427,
  'rs_dbSNP151': 'rs183189405',
  'ref': 'T',
  'ANNOVAR_ensembl_Effect': 'downstream',
  'ANNOVAR_refseq_Effect': 'intergenic'},
 {'alt': 'A',
  'chr': '1',
  'pos': 56586,
  'rs_dbSNP151': 'rs541979596',
  'ref': 'G',
  'ANNOVAR_ensembl_Effect': 'downstream',
  'ANNOVAR_refseq_Effect': 'intergenic'},
 {'alt': 'C',
  'chr': '1',
  'pos': 56644,
  'rs_dbSNP151': 'rs143342222',
  'ref': 'A',
  'ANNOVAR_ensembl_Effect': 'downstream',
  'ANNOVAR_refseq_Effect': 'intergenic'},
 {'alt': 'C',
  'chr': '1',
  'pos': 57033,
  'rs_dbSNP151': 'rs2691311',
  'ref': 'T',
  'ANNOVAR_e
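
The query above hard-codes the chromosome, range, and field list, and the cell reads `data['data']` without checking whether the server reported a problem. The sketch below shows one way to parameterize the query text and to surface the `errors` list that a GraphQL response may carry alongside or instead of `data`. The helper names `build_snp_query` and `extract_result` are hypothetical. Only the `GetSNPsByChromosome` operation and its `chr`, `start`, and `end` arguments come from the query above.

```python
import json


def build_snp_query(chrom, start, end, fields):
    """Build a GetSNPsByChromosome query string for the given region and fields."""
    field_block = "\n    ".join(fields)
    return (
        "query MyQuery {\n"
        # json.dumps adds the double quotes GraphQL requires around string values.
        f"  GetSNPsByChromosome(chr: {json.dumps(chrom)}, start: {start}, end: {end}) {{\n"
        f"    {field_block}\n"
        "  }\n"
        "}"
    )


def extract_result(payload, operation):
    """Return payload['data'][operation], raising if the server reported errors."""
    if payload.get("errors"):
        messages = "; ".join(e.get("message", "unknown error") for e in payload["errors"])
        raise RuntimeError(f"GraphQL errors: {messages}")
    return payload["data"][operation]


query = build_snp_query("1", 10, 1000000, ["chr", "pos", "ref", "alt"])

# A hand-written payload standing in for response.json(); no network call is made here.
rows = extract_result(
    {"data": {"GetSNPsByChromosome": [{"chr": "1", "pos": 54353}]}},
    "GetSNPsByChromosome",
)
```

In the notebook, `extract_result(response.json(), "GetSNPsByChromosome")` would take the place of the direct dictionary lookup, so a misspelled field name produces a readable message instead of a `KeyError` or `TypeError`.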

## Processing and Displaying the Data

After receiving the data from the GraphQL query, the script prepares it for analysis. Each record in the response is already a flat dictionary of field names and values, so the list loads directly into a pandas DataFrame with one row per SNP and no separate flattening step.

In [6]:
# Each record is a flat dict, so the list converts directly to a table
snp_df = pd.DataFrame(snps_by_chromosome)
snp_df

Unnamed: 0,alt,chr,pos,rs_dbSNP151,ref,ANNOVAR_ensembl_Effect,ANNOVAR_refseq_Effect
0,A,1,54353,rs140052487,C,ncRNA_intronic|downstream,intergenic
1,G,1,54763,rs548455890,T,ncRNA_intronic|downstream,intergenic
2,C,1,55427,rs183189405,T,downstream,intergenic
3,A,1,56586,rs541979596,G,downstream,intergenic
4,C,1,56644,rs143342222,A,downstream,intergenic
5,C,1,57033,rs2691311,T,downstream,intergenic
6,C,1,62055,rs559425327,G,upstream,intergenic
7,A,1,62162,rs140556834,G,upstream,intergenic
8,G,1,64670,rs545257650,A,upstream|downstream,upstream
9,G,1,64904,rs1452689085,T,upstream|downstream,upstream
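
Some effect values in the table hold several terms joined by `|`, such as `upstream|downstream`. Assuming the pipe separates multiple effects for the same variant (an inference from the output, not something the API documentation in this notebook confirms), a possible next step is to split those values into one row per effect so they can be counted or filtered. The sample frame below is built by hand from three rows of the table above.

```python
import pandas as pd

# Three rows copied from the output above as sample input.
sample = pd.DataFrame({
    "rs_dbSNP151": ["rs140052487", "rs183189405", "rs545257650"],
    "ANNOVAR_ensembl_Effect": [
        "ncRNA_intronic|downstream",
        "downstream",
        "upstream|downstream",
    ],
})

# Split each pipe-separated value into a list, then give every term its own row.
effects_long = (
    sample.assign(effect=sample["ANNOVAR_ensembl_Effect"].str.split("|"))
    .explode("effect")
    .reset_index(drop=True)
)

# Tally how often each individual effect term occurs.
effect_counts = effects_long["effect"].value_counts()
```

The same two steps could be applied to `snp_df` in place of `sample`. The original column is left untouched, so the combined value stays available next to each split term.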
