# 1. Making Data-Structure-Neutral Queries by URI


### Use Case Scenario 1: Query for the OMIM ID of a Variant in MyVariant.info

#### Approach 1: Using MyVariant.info

Users can retrieve the OMIM ID directly from MyVariant.info, but only if they know where it is embedded in MyVariant.info's data structure: here, the nested field path `clinvar.rcv.conditions.identifiers.omim`.

```python
# Import the myvariant Python package and create a MyVariant.info client
import myvariant
mv = myvariant.MyVariantInfo()
```

```python
# Fetch the OMIM ID; the caller must spell out its full field path
mv.getvariant('chr9:g.135781006_135781007del', fields='clinvar.rcv.conditions.identifiers.omim')
```

```
{'_id': 'chr9:g.135781006_135781007del',
 '_version': 1,
 'clinvar': {'rcv': {'conditions': {'identifiers': {'omim': '109800'}}}}}
```
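The response nests the OMIM ID four levels deep, so a client has to know the exact dotted path to read it. A minimal sketch of resolving such a path against a response dict, assuming the shape shown above; the helper name `get_by_path` is hypothetical, and it also walks lists because real MyVariant.info documents may hold a list of objects at some levels (for example, several `rcv` records).

```python
from typing import Any


def get_by_path(doc: Any, path: str) -> list:
    """Collect every value found at a dotted field path (hypothetical helper).

    Lists are traversed element by element, because a field may hold either
    a single object or a list of objects.
    """
    current = [doc]
    for key in path.split('.'):
        next_level = []
        for node in current:
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    # Flatten a final level of lists so callers always get a flat list of values.
    values = []
    for node in current:
        values.extend(node if isinstance(node, list) else [node])
    return values


response = {
    '_id': 'chr9:g.135781006_135781007del',
    '_version': 1,
    'clinvar': {'rcv': {'conditions': {'identifiers': {'omim': '109800'}}}},
}
print(get_by_path(response, 'clinvar.rcv.conditions.identifiers.omim'))  # ['109800']
```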

#### Approach 2: Using a JSON-LD-powered neutral query function

Using JSON-LD simplifies the process considerably. Users only need to know the URI for the OMIM ID, `http://identifiers.org/omim/`, which is the same for every API. This spares them from having to work out the data structure of each API.
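This works because each API's JSON-LD `@context` maps field paths to URIs; inverting that mapping gives a URI → field-paths index, so a URI can stand in for the data structure. A minimal sketch, using a few entries copied from the MyVariant.info context shown in Section 2; the function name `index_by_uri` is hypothetical and not part of `biothings_helper`.

```python
from collections import defaultdict

# A few entries from the MyVariant.info JSON-LD context (field path -> URI).
context = {
    'clinvar.omim': 'http://identifiers.org/omim/',
    'clinvar.rcv.conditions.identifiers.omim': 'http://identifiers.org/omim/',
    'clinvar.gene.symbol': 'http://identifiers.org/hgnc.symbol/',
    'cadd.gene.genename': 'http://identifiers.org/hgnc.symbol/',
    'clinvar.rsid': 'http://identifiers.org/dbsnp/',
}


def index_by_uri(context: dict) -> dict:
    """Invert a field -> URI context into a URI -> [fields] index (hypothetical helper)."""
    index = defaultdict(list)
    for field, uri in context.items():
        index[uri].append(field)
    return dict(index)


uri_index = index_by_uri(context)
print(uri_index['http://identifiers.org/omim/'])
# ['clinvar.omim', 'clinvar.rcv.conditions.identifiers.omim']
```

One URI can map to several field paths, which is why a single URI is enough to cover every place an identifier type appears.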

```python
# Import get_biothings, a helper built on JSON-LD. It can make
# data-structure-neutral queries against any BioThings API,
# e.g., MyGene.info, MyVariant.info, and the Drug and Compound API.
from biothings_helper import get_biothings
```

```python
# Fetch the OMIM ID by its URI, 'http://identifiers.org/omim/', rather than by its field path
get_biothings(api='myvariant.info', id='chr9:g.135781006_135781007del', fields_uri='http://identifiers.org/omim/')
```

```
{'_id': 'chr9:g.135781006_135781007del',
 '_version': 1,
 'clinvar': {'rcv': {'conditions': {'identifiers': {'omim': '109800'}}}}}
```
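Before calling the API, a function like `get_biothings` has to turn the URI back into concrete field paths. The sketch below shows one plausible way to do that for MyVariant.info's `GET /v1/variant/<id>` endpoint and its `fields` parameter. It only builds the URL (no network call); the context excerpt and the helper name `build_variant_url` are assumptions for illustration, not the actual `biothings_helper` code.

```python
from urllib.parse import quote, urlencode

# Excerpt of the MyVariant.info JSON-LD context (field path -> URI); assumed subset.
CONTEXT = {
    'clinvar.omim': 'http://identifiers.org/omim/',
    'clinvar.rcv.conditions.identifiers.omim': 'http://identifiers.org/omim/',
    'clinvar.rsid': 'http://identifiers.org/dbsnp/',
}


def build_variant_url(variant_id: str, fields_uri: str) -> str:
    """Build a MyVariant.info URL restricted to fields tagged with fields_uri (hypothetical)."""
    fields = [field for field, uri in CONTEXT.items() if uri == fields_uri]
    if not fields:
        raise ValueError(f'no field in the context maps to {fields_uri}')
    # HGVS IDs contain ':' (and often '>'), so percent-encode the path segment.
    path = quote(variant_id, safe='')
    query = urlencode({'fields': ','.join(fields)})
    return f'https://myvariant.info/v1/variant/{path}?{query}'


print(build_variant_url('chr9:g.135781006_135781007del', 'http://identifiers.org/omim/'))
```

Resolving the URI on the client side keeps the request itself an ordinary field-restricted lookup, so the API needs no JSON-LD support of its own.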

### Use Case Scenario 2: Query for All Variants in MyVariant.info Related to Gene CDK2

#### Approach 1: Using MyVariant.info

The traditional approach first requires the user to figure out which data sources in MyVariant.info include annotations about gene names, and then how the gene name is represented and embedded in each of those sources. For CDK2, that means combining twelve different field paths into one query.

```python
# Query for all variants in MyVariant.info related to gene CDK2 using the myvariant package.
# The gene symbol lives under a different field path in each data source,
# so every path has to be listed explicitly.
gene_fields = [
    'clinvar.gene.symbol', 'docm.genename', 'snpeff.nmd.genename',
    'gwassnps.genename', 'dbnsfp.genename', 'docm.default_gene_name',
    'snpeff.ann.genename', 'snpeff.lof.genename', 'emv.gene.symbol',
    'dbsnp.gene.symbol', 'evs.gene.symbol', 'cadd.gene.genename',
]
query = ' OR '.join(f'{field}:CDK2' for field in gene_fields)
mv.query(query, fetch_all=False)
```

```
{'hits': [{'_id': 'chr12:g.56361693G>T',
   '_score': 8.73773,
   'cadd': {'_license': 'http://goo.gl/bkpNhq',
    'alt': 'T',
    'anc': 'G',
    'annotype': ['CodingTranscript', 'Intergenic', 'Intergenic'],
    'bstatistic': 251,
    'chmm': {'bivflnk': 0.0,
     'enh': 0.024,
     'enhbiv': 0.0,
     'het': 0.0,
     'quies': 0.039,
     'reprpc': 0.0,
     'reprpcwk': 0.0,
     'tssa': 0.142,
     'tssaflnk': 0.291,
     'tssbiv': 0.0,
     'tx': 0.0,
     'txflnk': 0.063,
     'txwk': 0.181,
     'znfrpts': 0.0},
    'chrom': 12,
    'consdetail': ['stop_gained', 'upstream', 'upstream'],
    'consequence': ['STOP_GAINED', 'UPSTREAM', 'UPSTREAM'],
    'consscore': [8, 1, 1],
    'cpg': 0.03,
    'dna': {'helt': 0.43, 'mgw': 0.6, 'prot': -1.24, 'roll': 6.36},
    'encode': {'exp': 521.18,
     'h3k27ac': 20.96,
     'h3k4me1': 29.52,
     'h3k4me3': 55.04,
     'nucleo': 2.1},
    'exon': '2/7',
    'fitcons': 0.623825,
    'gc': 0.48,
    'gene': [{'ccds_id': 'CCDS8898.1',
      'c
```

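#### Approach 2: Using a JSON-LD-powered neutral query function

Here the user supplies only the URI for gene symbols. `query_biothings` expands it into every field path in MyVariant.info that carries that URI; the query string it prints below is the same one written by hand in Approach 1.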

```python
from biothings_helper import query_biothings
```

```python
# Query for variants related to gene symbol CDK2, using the URI for gene symbols, 'http://identifiers.org/hgnc.symbol/'
query_biothings(api='myvariant.info', fields_uri='http://identifiers.org/hgnc.symbol/', fields_value='CDK2', fetch_all=False)
```

```
clinvar.gene.symbol:CDK2 OR docm.genename:CDK2 OR snpeff.nmd.genename:CDK2 OR gwassnps.genename:CDK2 OR dbnsfp.genename:CDK2 OR docm.default_gene_name:CDK2 OR snpeff.ann.genename:CDK2 OR snpeff.lof.genename:CDK2 OR emv.gene.symbol:CDK2 OR dbsnp.gene.symbol:CDK2 OR evs.gene.symbol:CDK2 OR cadd.gene.genename:CDK2
```


```
{'hits': [{'_id': 'chr12:g.56361693G>T',
   '_score': 8.73773,
   'cadd': {'_license': 'http://goo.gl/bkpNhq',
    'alt': 'T',
    'anc': 'G',
    'annotype': ['CodingTranscript', 'Intergenic', 'Intergenic'],
    'bstatistic': 251,
    'chmm': {'bivflnk': 0.0,
     'enh': 0.024,
     'enhbiv': 0.0,
     'het': 0.0,
     'quies': 0.039,
     'reprpc': 0.0,
     'reprpcwk': 0.0,
     'tssa': 0.142,
     'tssaflnk': 0.291,
     'tssbiv': 0.0,
     'tx': 0.0,
     'txflnk': 0.063,
     'txwk': 0.181,
     'znfrpts': 0.0},
    'chrom': 12,
    'consdetail': ['stop_gained', 'upstream', 'upstream'],
    'consequence': ['STOP_GAINED', 'UPSTREAM', 'UPSTREAM'],
    'consscore': [8, 1, 1],
    'cpg': 0.03,
    'dna': {'helt': 0.43, 'mgw': 0.6, 'prot': -1.24, 'roll': 6.36},
    'encode': {'exp': 521.18,
     'h3k27ac': 20.96,
     'h3k4me1': 29.52,
     'h3k4me3': 55.04,
     'nucleo': 2.1},
    'exon': '2/7',
    'fitcons': 0.623825,
    'gc': 0.48,
    'gene': [{'ccds_id': 'CCDS8898.1',
      'c
```
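The query string printed by `query_biothings` is the OR-combination of every field path that the context tags with the gene-symbol URI. A minimal sketch of that expansion, assuming the field list below (copied from the printed query) stands in for the real MyVariant.info context; `expand_uri_query` is a hypothetical name, not the helper's actual implementation.

```python
HGNC_SYMBOL = 'http://identifiers.org/hgnc.symbol/'

# Field paths tagged with the gene-symbol URI, in the order of the printed query.
CONTEXT = {field: HGNC_SYMBOL for field in [
    'clinvar.gene.symbol', 'docm.genename', 'snpeff.nmd.genename',
    'gwassnps.genename', 'dbnsfp.genename', 'docm.default_gene_name',
    'snpeff.ann.genename', 'snpeff.lof.genename', 'emv.gene.symbol',
    'dbsnp.gene.symbol', 'evs.gene.symbol', 'cadd.gene.genename',
]}
# An unrelated field with a different URI, which the expansion must skip.
CONTEXT['clinvar.rsid'] = 'http://identifiers.org/dbsnp/'


def expand_uri_query(context: dict, fields_uri: str, value: str) -> str:
    """Turn (URI, value) into a Lucene-style OR query over every matching field path."""
    return ' OR '.join(f'{field}:{value}' for field, uri in context.items() if uri == fields_uri)


print(expand_uri_query(CONTEXT, HGNC_SYMBOL, 'CDK2'))
```

A value containing query-syntax characters (for example the `:` in an HGVS ID) would need quoting or escaping before being used this way; the sketch leaves that out.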

# 2. Facilitating API Cross-Linking


### Use Case Scenario: Upstream analysis identified a missense variant (chr8:g.99440236C>A). The analyst wants to obtain InterPro annotations for the affected gene to assess the variant's likely functional significance.

#### Step 1: Query MyVariant.info to retrieve the annotation object for variant chr8:g.99440236C>A


```python
# Create an explorer and fetch the MyVariant.info document for the variant
from biothings_helper import Biothingsexplorer
explorer = Biothingsexplorer()
json_doc = explorer.get_json_doc('myvariant.info', 'chr8:g.99440236C>A')
```

```python
# The JSON document, with its JSON-LD @context added
json_doc
```

```
{'@context': {'cadd.gene.ccds_id': 'http://identifiers.org/ccds/',
  'cadd.gene.feature_id': 'http://identifiers.org/ensembl.transcript/',
  'cadd.gene.gene_id': 'http://identifiers.org/ensembl.gene/',
  'cadd.gene.genename': 'http://identifiers.org/hgnc.symbol/',
  'clinvar.gene.id': 'http://identifiers.org/hgnc/',
  'clinvar.gene.symbol': 'http://identifiers.org/hgnc.symbol/',
  'clinvar.omim': 'http://identifiers.org/omim/',
  'clinvar.rcv.accession': 'http://identifers.org/clinvar/',
  'clinvar.rcv.conditions.identifiers.efo': 'http://identifiers.org/efo/',
  'clinvar.rcv.conditions.identifiers.omim': 'http://identifiers.org/omim/',
  'clinvar.rcv.conditions.identifiers.orphanet': 'http://identifiers.org/orphanet/',
  'clinvar.rsid': 'http://identifiers.org/dbsnp/',
  'clinvar.uniprot': 'http://identifiers.org/uniprot/',
  'dbnsfp.clinvar.rs': 'http://identifiers.org/dbsnp/',
  'dbnsfp.ensembl.geneid': 'http://identifiers.org/ensembl.gene/',
  'dbnsfp.ensembl.proteinid': 'http://id
```
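With the `@context` attached, every value whose field path appears in the context carries a global identifier type. A minimal sketch of reading the document that way: flatten it into dotted paths and pair each value with its URI. The document below is an illustrative fragment, assuming the Ensembl gene ID sits under `dbnsfp.ensembl.geneid`; it is not the full annotation for chr8:g.99440236C>A, and `flatten`/`annotate` are hypothetical helpers.

```python
def flatten(doc: dict, prefix: str = '') -> dict:
    """Flatten nested dicts into {'a.b.c': value}; lists are kept as leaf values."""
    flat = {}
    for key, value in doc.items():
        path = f'{prefix}.{key}' if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def annotate(json_doc: dict) -> list:
    """Return (field, URI, value) for each field that the @context maps to a URI."""
    context = json_doc.get('@context', {})
    body = {key: value for key, value in json_doc.items() if key != '@context'}
    return [(path, context[path], value)
            for path, value in flatten(body).items() if path in context]


# Illustrative fragment (assumed layout); 'dbnsfp.chrom' is not in the context, so it is skipped.
json_doc = {
    '@context': {'dbnsfp.ensembl.geneid': 'http://identifiers.org/ensembl.gene/'},
    'dbnsfp': {'ensembl': {'geneid': 'ENSG00000156486'}, 'chrom': '8'},
}
print(annotate(json_doc))
# [('dbnsfp.ensembl.geneid', 'http://identifiers.org/ensembl.gene/', 'ENSG00000156486')]
```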

#### Step 2: List all available APIs linked from this variant

```python
explorer.find_linked_apis()
```

```
Available APIs which could be linked out is: {'mygene.info': 'ENSG00000156486'}
```
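MyGene.info is reported because the variant document carries an Ensembl gene ID, an identifier type MyGene.info can take as input. A minimal sketch of that matching, assuming a hypothetical registry of the identifier URIs each API accepts and the same kind of illustrative document fragment; the real registry and logic inside `biothings_helper` may differ, and `match_linked_apis` is a hypothetical name.

```python
# Hypothetical registry: API name -> identifier URIs it accepts as input.
API_INPUT_URIS = {
    'mygene.info': {'http://identifiers.org/ensembl.gene/', 'http://identifiers.org/ncbigene/'},
}

# Illustrative fragment of an annotated variant document (assumed field layout).
json_doc = {
    '@context': {
        'dbnsfp.ensembl.geneid': 'http://identifiers.org/ensembl.gene/',
        'clinvar.rsid': 'http://identifiers.org/dbsnp/',
    },
    'dbnsfp': {'ensembl': {'geneid': 'ENSG00000156486'}},
}


def resolve(doc: dict, path: str):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    node = doc
    for key in path.split('.'):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def match_linked_apis(json_doc: dict, registry: dict) -> dict:
    """Map each API to the first document value whose URI that API accepts."""
    context = json_doc.get('@context', {})
    links = {}
    for api, accepted_uris in registry.items():
        for field, uri in context.items():
            if uri not in accepted_uris:
                continue
            value = resolve(json_doc, field)
            if value is not None:
                links[api] = value
                break
    return links


print(match_linked_apis(json_doc, API_INPUT_URIS))  # {'mygene.info': 'ENSG00000156486'}
```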


#### Step 3: Link to MyGene.info and get the InterPro IDs

```python
# Follow the link to MyGene.info and fetch the fields tagged with the InterPro URI
explorer.explore_api('mygene.info', fields_uri='http://identifiers.org/interpro/')
```

```
{'_id': '3788',
 '_score': 22.792553,
 'interpro': [{'id': 'IPR011333'},
  {'id': 'IPR000210'},
  {'id': 'IPR005821'},
  {'id': 'IPR027359'},
  {'id': 'IPR003968'},
  {'id': 'IPR028325'},
  {'id': 'IPR003131'},
  {'id': 'IPR003971'}]}
```
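The InterPro IDs come back as a list of `{'id': ...}` objects. A short sketch of pulling them into a plain list for downstream lookup; it also accepts a single object, on the assumption that a gene with only one InterPro entry might be returned as a dict rather than a one-element list. `interpro_ids` is a hypothetical helper.

```python
def interpro_ids(gene_doc: dict) -> list:
    """Extract InterPro IDs from a MyGene.info-style response (hypothetical helper)."""
    entries = gene_doc.get('interpro', [])
    if isinstance(entries, dict):  # assumed single-entry shape
        entries = [entries]
    return [entry['id'] for entry in entries]


response = {
    '_id': '3788',
    '_score': 22.792553,
    'interpro': [{'id': 'IPR011333'}, {'id': 'IPR000210'}, {'id': 'IPR005821'},
                 {'id': 'IPR027359'}, {'id': 'IPR003968'}, {'id': 'IPR028325'},
                 {'id': 'IPR003131'}, {'id': 'IPR003971'}],
}
print(interpro_ids(response))
```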