# MyGeneset.info API versioning and local caching/snapshot

The data behind the MyGeneset.info API is continuously updated to stay current, without breaking changes to the data structure. At any time, the version of our data, including the versions of the original data sources, can be retrieved from the `/v1/metadata` API endpoint. While most of our users appreciate this rolling-update feature of the MyGeneset.info API, we understand that certain scenarios require exactly the same results when the same API calls are made at different times. This versioning requirement can be addressed by using the [biothings_client](https://pypi.org/project/biothings-client/) Python client to cache API call responses locally. `biothings_client` is a universal Python client for all [BioThings APIs](https://biothings.io) (including the MyGeneset.info API).

The example code below demonstrates the retrieval of the API metadata, including versions and more, and the use of local caching enabled by the `biothings_client` package.

## Get metadata via direct API call without `biothings_client` package

The example code below uses the popular [requests](https://requests.readthedocs.io) package to make API calls. `requests` can be installed with:

```bash
pip install requests
```

In [1]:
## import necessary packages first
from pprint import pprint
import requests

In [2]:
## Get current build information without the biothings_client Python package
r = requests.get("https://mygeneset.info/v1/metadata/")
data = r.json()
version_date = data["build_date"]
pprint(version_date)

'2023-03-17T12:18:13.773000-07:00'
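
The build date is an ISO 8601 timestamp with a UTC offset, so it can be parsed into a timezone-aware `datetime` when build dates need to be compared or logged in a single time zone. The snippet below is a minimal sketch that uses only the Python standard library and hard-codes the value printed above rather than calling the API.

```python
from datetime import datetime, timezone

# Build date string copied from the example output above.
build_date = "2023-03-17T12:18:13.773000-07:00"

# fromisoformat() keeps the UTC offset, so the result is timezone-aware.
build_dt = datetime.fromisoformat(build_date)

# Normalize to UTC so that build dates from different runs compare cleanly.
build_dt_utc = build_dt.astimezone(timezone.utc)

print(build_dt_utc.isoformat())
# → 2023-03-17T19:18:13.773000+00:00
```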


In [3]:
# get stats of the current MyGeneset.info data build
pprint(data["stats"])

{'anonymous': 2, 'curated': 297796, 'total': 297798, 'user': 2}


In [4]:
# get underlying data source versions of the current MyGeneset.info data build
pprint({k: v["version"] for k, v in data["src"].items()})

{'ctd': 'January-10-2023-16979M',
 'do': 'obo-2023-02-27_genemap2-2023-03-16',
 'go': '20230306',
 'msigdb': '2022.1',
 'reactome': '83',
 'smpdb': '05-06-2019',
 'wikipathways': '20230310'}
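
One way to record these versions is to save the metadata to a JSON file and later compare it against newly fetched metadata. The sketch below assumes only the `src` → `version` layout shown above. The helper names `source_versions` and `changed_sources`, the snapshot file name, and the "later" build (including its `reactome` version) are hypothetical and exist for illustration only.

```python
import json
import tempfile
from pathlib import Path


def source_versions(metadata):
    """Map each data source name to its version string."""
    return {name: info["version"] for name, info in metadata["src"].items()}


def changed_sources(old_metadata, new_metadata):
    """Return {source: (old_version, new_version)} for sources that differ."""
    old = source_versions(old_metadata)
    new = source_versions(new_metadata)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Trimmed sample shaped like the /v1/metadata response shown above.
recorded = {
    "build_date": "2023-03-17T12:18:13.773000-07:00",
    "src": {"go": {"version": "20230306"}, "reactome": {"version": "83"}},
}

# Persist the recorded metadata (a temporary directory keeps the sketch tidy).
snapshot_path = Path(tempfile.mkdtemp()) / "mgs_metadata_snapshot.json"
snapshot_path.write_text(json.dumps(recorded, indent=2))

# Later: reload the snapshot and compare it with newly fetched metadata.
# The values below are made up, not a real MyGeneset.info build.
reloaded = json.loads(snapshot_path.read_text())
later = {
    "build_date": "hypothetical-later-build",
    "src": {"go": {"version": "20230306"}, "reactome": {"version": "84"}},
}

print(changed_sources(reloaded, later))
# → {'reactome': ('83', '84')}
```

In real use, `recorded` and `later` would be the parsed JSON responses from the metadata endpoint at two different times.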


## Make cached API calls using `biothings_client` package

The example code below uses the [biothings_client](https://pypi.org/project/biothings-client/) package to make API calls. `biothings_client` can be installed with:

```bash
pip install "biothings_client[caching]"
```

**Note:** including `[caching]` above will install an optional `requests_cache` package to enable the local caching feature.

In [5]:
## Using the universal biothings_client Python client to interact with MyGeneset.info
import biothings_client
mgs_client = biothings_client.get_client("geneset")

In [6]:
## Enable the local cache for future API calls; the cache name can be anything
local_cache = "mgs_cache"
mgs_client.set_caching(local_cache)

INFO:biothings.client:[ Future queries will be cached in "/home/cwu/prj2/mygeneset.info/docs/ipynb/mgs_cache.sqlite" ]


In [7]:
## Once enabled, any API calls made (like the example call below) will be cached
res = mgs_client.querymany(["wnt", "jak-stat"], fields="name,count,source,taxid")

INFO:biothings.client:querying 1-2...
INFO:biothings.client:done. [ from cache ]
INFO:biothings.client:Finished.
INFO:biothings.client:Pass "returnall=True" to return complete lists of duplicate or missing query terms.


In [8]:
## Metadata about the MyGeneset.info API (such as versions, stats, build info) can also be retrieved via the client
mgs_metadata = mgs_client.metadata()

pprint(mgs_metadata["build_date"])
pprint(mgs_metadata["stats"])
pprint({k: v["version"] for k, v in mgs_metadata["src"].items()})

INFO:biothings.client:[ from cache ]


'2023-03-17T12:18:13.773000-07:00'
{'anonymous': 2, 'curated': 297796, 'total': 297798, 'user': 2}
{'ctd': 'January-10-2023-16979M',
 'do': 'obo-2023-02-27_genemap2-2023-03-16',
 'go': '20230306',
 'msigdb': '2022.1',
 'reactome': '83',
 'smpdb': '05-06-2019',
 'wikipathways': '20230310'}


In [9]:
## Make exactly the same API calls as above, but this time they will be retrieved from the local cache
res_cached = mgs_client.querymany(["wnt", "jak-stat"], fields="name,count,source,taxid")
mgs_metadata_cached = mgs_client.metadata()
print("The two API calls above are identical: {}".format(res == res_cached))
print("The two metadata calls above are identical: {}".format(mgs_metadata == mgs_metadata_cached))

INFO:biothings.client:querying 1-2...
INFO:biothings.client:done. [ from cache ]
INFO:biothings.client:Finished.
INFO:biothings.client:Pass "returnall=True" to return complete lists of duplicate or missing query terms.
INFO:biothings.client:[ from cache ]


The two API calls above are identical: True
The two metadata calls above are identical: True


In [10]:
## You can stop the caching at the end or anytime you want
mgs_client.stop_caching()

**Note:** In this case, the local cache is stored in the `mgs_cache.sqlite` file. If you copy this cache file along with the Python code here, the above code using `biothings_client` will always return exactly the same results.
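
When the cache file is shared with collaborators, a checksum recorded alongside it offers a simple way to confirm that everyone is working from the same snapshot. The sketch below is only an illustration: `file_sha256` is a hypothetical helper, and a placeholder file stands in for `mgs_cache.sqlite` so that the code runs anywhere. It also assumes the cache file is not modified after the digest is recorded, since caching new API calls would change the file contents.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder standing in for the real "mgs_cache.sqlite" file.
workdir = Path(tempfile.mkdtemp())
cache_file = workdir / "mgs_cache.sqlite"
cache_file.write_bytes(b"placeholder cache contents")

# Record the digest when the snapshot is taken.
recorded_digest = file_sha256(cache_file)

# A collaborator recomputes the digest on their copy and compares.
print(file_sha256(cache_file) == recorded_digest)
# → True
```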