# Using the MyGeneset.info API in analytic workflows

The code example below shows how the [MyGeneset.info API](https://mygeneset.info) can simplify functional gene set analysis workflows. Users can search for gene sets by keyword (e.g., “impaired glucose tolerance”) and retrieve basic information about each gene set, including its name, its NCBI taxon ID, and common identifiers of its member genes. Expanded gene-level annotations can then be obtained from the companion [MyGene.info API](https://mygene.info).
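A keyword search is a single GET request to the `/v1/query/` endpoint: the search text goes in `q`, and a comma-separated list of field paths goes in `fields` (dotted paths such as `genes.uniprot` reach into each member gene). The sketch below builds that URL with the Python standard library alone, so the encoding can be inspected before anything is sent. The helper name `build_query_url` is hypothetical; the endpoint and parameter names are the ones used in the examples that follow.

```python
from urllib.parse import urlencode

# Endpoint as used in the examples in this notebook.
MYGENESET_QUERY = "https://mygeneset.info/v1/query/"


def build_query_url(keywords, fields, phrase=True):
    """Return a MyGeneset.info query URL (hypothetical helper, no network access).

    keywords: search text, e.g. "impaired glucose tolerance"
    fields:   iterable of field paths, e.g. ["taxid", "genes.uniprot"]
    phrase:   wrap the keywords in double quotes (phrase syntax, as in the
              example query) instead of sending them as separate words
    """
    q = f'"{keywords}"' if phrase else keywords
    params = {"q": q, "fields": ",".join(fields)}
    # urlencode percent-encodes quotes and commas and turns spaces into '+'.
    return f"{MYGENESET_QUERY}?{urlencode(params)}"


url = build_query_url(
    "impaired glucose tolerance",
    ["taxid", "genes.name", "genes.uniprot", "genes.mygene_id"],
)
print(url)
```

Passing the same `params` dictionary to `requests.get(..., params=params)` yields an equivalent request, since `requests` also encodes query parameters with `urlencode`.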


The example code below uses the popular [requests](https://requests.readthedocs.io) package to make API calls. `requests` can be installed with:

```bash
pip install requests
```

```python
# Import the packages used throughout this example
from pprint import pprint
import requests
```
```python
# Fetch gene sets involved in "impaired glucose tolerance", including the taxon ID
# of each gene set and the name, UniProt ID and MyGene.info ID of each member gene.
params = {"q": '"impaired glucose tolerance"',
          "fields": "taxid,genes.name,genes.uniprot,genes.mygene_id"}
r = requests.get("https://mygeneset.info/v1/query/", params=params)
r.raise_for_status()  # stop early on HTTP errors
data = r.json()
pprint(data)
```

```
{'hits': [{'_id': 'HP_IMPAIRED_GLUCOSE_TOLERANCE',
           '_score': 38.72672,
           'genes': [{'mygene_id': '10219',
                      'name': 'killer cell lectin like receptor G1',
                      'uniprot': 'Q96E93'},
                     {'mygene_id': '389692',
                      'name': 'MAF bZIP transcription factor A',
                      'uniprot': 'Q8NHW3'},
                     {'mygene_id': '7466',
                      'name': 'wolframin ER transmembrane glycoprotein',
                      'uniprot': 'O76024'},
                     {'mygene_id': '286053',
                      'name': 'NSE2 (MMS21) homolog, SMC5-SMC6 complex SUMO '
                              'ligase',
                      'uniprot': 'Q96MF7'},
                     {'mygene_id': '3643',
                      'name': 'insulin receptor',
                      'uniprot': 'P06213'},
                     {'mygene_id': '8462',
                      'name': 'KLF transcription factor 11',
```
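Each element of `data["hits"]` describes one matching gene set, ranked by `_score`, with its member genes in a `genes` list. The sketch below tabulates the hits and keeps only those from one organism (for example NCBI taxon 9606, human). The helper names are hypothetical, and because the excerpt above does not show the `taxid` value, taxon IDs are compared as strings so the filter works whether the API returns them as strings or integers.

```python
def summarize_hits(data):
    """Return (gene set id, taxid, score, number of genes) for each hit.

    Hypothetical helper; `data` is the parsed JSON of a MyGeneset.info query.
    Missing keys are tolerated because only the requested fields come back.
    """
    rows = []
    for hit in data.get("hits", []):
        rows.append((
            hit.get("_id"),
            hit.get("taxid"),
            hit.get("_score"),
            len(hit.get("genes", [])),
        ))
    return rows


def hits_for_taxon(data, taxid):
    """Keep only the hits whose taxid matches `taxid` (compared as strings)."""
    return [
        hit for hit in data.get("hits", [])
        if str(hit.get("taxid")) == str(taxid)
    ]
```

For example, `summarize_hits(data)` gives a quick overview of the response above, and `hits_for_taxon(data, 9606)` keeps only human gene sets.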

```python
# Each gene in a gene set carries basic identifiers (such as UniProt, NCBI Gene, and
# others discussed in the manuscript) that can be passed to compatible downstream
# applications or used to retrieve more information from compatible resources.
first_hit = data["hits"][0]
uniprot_ids = [x.get("uniprot", "") for x in first_hit["genes"]]
# Do something with the list of UniProt IDs; here we print the first ten
pprint(uniprot_ids[:10])
```



```
['Q96E93',
 'Q8NHW3',
 'O76024',
 'Q96MF7',
 'P06213',
 'O14901',
 'P13866',
 'Q03135',
 'Q12851',
 'P35557']
```
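Because the list comprehension above substitutes an empty string when a gene has no `uniprot` entry, the resulting list can contain blanks, and the same accession may appear more than once. Before handing the IDs to a downstream tool, a small cleanup step such as the hypothetical `unique_ids` helper below drops empty values and duplicates while preserving the original order. It also flattens list values, on the assumption (not confirmed by the output above) that some genes might carry several accessions.

```python
def unique_ids(values):
    """Drop empty values and duplicates, preserving first-seen order.

    Hypothetical helper. Accepts strings, None, or lists of strings, in case
    a gene carries more than one identifier.
    """
    seen = set()
    result = []
    for value in values:
        # Treat a single identifier like a one-element list.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item and item not in seen:
                seen.add(item)
                result.append(item)
    return result
```

For example, `unique_ids(x.get("uniprot") for x in first_hit["genes"])` produces a clean accession list in a single pass.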


```python
# To fetch gene information beyond what MyGeneset.info stores, use the MyGene.info
# identifiers to query MyGene.info. Here a single POST request batch-fetches the name
# and GO annotations of every gene. Other available gene-centric annotations, and the
# limitations of the service, are described in the MyGene.info documentation:
# https://docs.mygene.info/en/latest/doc/data.html#available-fields
mygene_ids = [x["mygene_id"] for x in first_hit["genes"]]
querylist = ",".join(mygene_ids)
params = {"q": querylist, "fields": "name,go"}
res = requests.post("https://mygene.info/v3/query", data=params)
res.raise_for_status()  # stop early on HTTP errors
res = res.json()
# Do something with the expanded annotations of the gene list
pprint(res[0])
```


```
{'_id': '10219',
 '_score': 25.9798,
 'go': {'BP': [{'evidence': 'TAS',
                'gocategory': 'BP',
                'id': 'GO:0006954',
                'pubmed': 9842918,
                'qualifier': 'involved_in',
                'term': 'inflammatory response'},
               {'evidence': 'TAS',
                'gocategory': 'BP',
                'id': 'GO:0006968',
                'pubmed': 9842918,
                'qualifier': 'involved_in',
                'term': 'cellular defense response'},
               {'evidence': 'TAS',
                'gocategory': 'BP',
                'id': 'GO:0007166',
                'pubmed': 9842918,
                'qualifier': 'involved_in',
                'term': 'cell surface receptor signaling pathway'},
               {'evidence': 'IEA',
                'gocategory': 'BP',
                'id': 'GO:0045087',
                'qualifier': 'involved_in',
                'term': 'innate immune response'}],
        'CC': [{'evidence': 'I
```
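Each record returned by the batch query holds the GO annotations of one gene, grouped under `BP`, `CC` and `MF` (biological process, cellular component, molecular function). The sketch below flattens one category into `(GO id, term)` pairs and splits a long identifier list into batches. Both helpers are hypothetical and rest on assumptions worth checking against the MyGene.info documentation: a category with a single annotation may come back as a bare dict rather than a list, identifiers that could not be matched are returned as records flagged with `notfound`, and one POST request accepts at most 1000 query terms, which is why `batch_size` defaults to 1000.

```python
def go_terms(record, category="BP"):
    """Return (GO id, term name) pairs for one MyGene.info record.

    Hypothetical helper. `category` is "BP", "CC" or "MF". Records flagged
    as not found, or lacking GO data, yield an empty list.
    """
    if record.get("notfound"):
        return []
    annotations = (record.get("go") or {}).get(category, [])
    # Assumption: a single annotation may be returned as a dict, not a list.
    if isinstance(annotations, dict):
        annotations = [annotations]
    return [(a.get("id"), a.get("term")) for a in annotations]


def batches(ids, batch_size=1000):
    """Yield comma-joined chunks of at most `batch_size` identifiers.

    The default of 1000 reflects the assumed per-request limit of the
    MyGene.info batch query.
    """
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        yield ",".join(ids[start:start + batch_size])
```

Each string yielded by `batches(mygene_ids)` can stand in for `querylist` in the POST request above, and `go_terms(res[0])` lists the biological-process terms of the first gene.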