# Python TERMite toolkit - DOCstore

This notebook shows how to make calls to the DOCstore API
and how to post-process the JSON output.

Import the required submodule:

In [1]:
from termite_toolkit import docstore

Point to a docstore server and then fill in the authentication details:

In [2]:
docstore_url = 'https://datascience.scibite.com:9090'
user = 'scibite_admin'
pw = 'weP3vw4ho9ihJka'
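Hardcoding a password in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, reads the credentials from environment variables and falls back to an interactive prompt. The variable names `DOCSTORE_USER` and `DOCSTORE_PASSWORD` and the helper `load_credentials` are hypothetical and not part of the toolkit.

```python
import os
from getpass import getpass


def load_credentials(env=os.environ, prompt=getpass):
    """Return (user, password), preferring environment variables over prompts."""
    # DOCSTORE_USER / DOCSTORE_PASSWORD are hypothetical names chosen for this sketch
    user = env.get("DOCSTORE_USER") or input("DOCstore username: ")
    # getpass hides the typed password instead of echoing it to the screen
    password = env.get("DOCSTORE_PASSWORD") or prompt("DOCstore password: ")
    return user, password


# Usage in the notebook (prompts only if the variables are not set):
# user, pw = load_credentials()
```

The returned values can then be passed to `set_basic_auth` exactly as in the cells that follow.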

# Document-level query
We can make a document-level query of DOCstore. In this example we print the DOCstore ID of the first hit for a query on the genes HTT and EGFR.

In [3]:
docs = docstore.DocStoreRequestBuilder()
# specify docstore API endpoint and add authentication
docs.set_url(docstore_url)
docs.set_basic_auth(username=user, password=pw)
# make call to DOCstore document-level query API
docs_json = docs.get_docs(['id:GENE$HTT', 'id:GENE$EGFR'])
# print unique id of the first hit
uid = docs_json['hits'][0]['uid']
print(uid)



medline_*_*_*_32702387_1
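The cell above only reads the first hit. To collect the IDs of all hits, a small helper along the following lines may be useful. It is a sketch that assumes only what the cell above shows, namely that the response is a dict with a `hits` list whose items carry a `uid` key. The helper name `collect_uids` and the second entry in the stand-in response are made up for illustration.

```python
def collect_uids(response):
    """Return the uid of every hit in a document-level query response."""
    # Skip hits without a uid rather than raising a KeyError
    return [hit["uid"] for hit in response.get("hits", []) if "uid" in hit]


# Stand-in response: the first uid is the one printed above,
# the second entry is a made-up hit without a uid
sample_response = {
    "hits": [
        {"uid": "medline_*_*_*_32702387_1"},
        {"title": "placeholder hit without uid"},
    ]
}
uids = collect_uids(sample_response)
print(uids)
```

In the notebook, `collect_uids(docs_json)` would be called on the real response instead of the stand-in.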


# Retrieve a specific document
We can also use the document lookup by ID to retrieve data for a specific document. For the purposes of this demo we use the ID from the previous query. The script below prints the authors of this document.

In [4]:
docs = docstore.DocStoreRequestBuilder()
# specify docstore API endpoint and add authentication if necessary
docs.set_url(docstore_url)
docs.set_basic_auth(username=user, password=pw)
# make call to document lookup by ID API (using the uid of the previous query)
docs_json = docs.get_doc_by_id(uid)
# print the authors of the document hit
# (assumes the lookup response uses the same 'hits' structure as the query response)
print(docs_json['hits'][0]['authors'])

Rolfes, S; Munro, DAD; Lyras, EM; Matute, E; Ouk, K; Harms, C; Böttcher, C; Priller, J
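The authors come back as a single semicolon-separated string. For post-processing it is often handier to have a list of names. The sketch below splits the string printed above; `split_authors` is a hypothetical helper, not a toolkit function.

```python
def split_authors(authors):
    """Split a semicolon-separated author string into a list of names."""
    # Strip surrounding whitespace and drop empty fragments
    return [name.strip() for name in authors.split(";") if name.strip()]


# Author string copied from the output above
authors = ("Rolfes, S; Munro, DAD; Lyras, EM; Matute, E; "
           "Ouk, K; Harms, C; Böttcher, C; Priller, J")
names = split_authors(authors)
print(len(names), names[0], names[-1])
```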




# Document co-occurrence dataframe
The script below looks for the co-occurrence of two entities in the same document. While you can retrieve the output in raw JSON format, the toolkit can also convert it to a dataframe.

In [5]:
docs = docstore.DocStoreRequestBuilder()
# specify docstore API endpoint and add authentication if necessary
docs.set_url(docstore_url)
docs.set_basic_auth(username=user, password=pw)
# make call to DOCstore document co-occurrence API
docs_json = docs.get_dcc_docs(['id:GENE$HTT', 'id:GENE$EGFR'])
# convert json to df
df = docstore.get_docstore_dcc_df(docs_json)
# display the dataframe of hits
df



,document_id,document_date,title,authors,citation
0,25050814,2014-01-01,A unique four-hub protein cluster associates t...,"Simeone, P; Trerotola, M; Urbanella, A; Lattan...",PloS one [9] e103030
1,22974559,2013-01-01,Mutant huntingtin regulates EGF receptor fate ...,"Melone, MA; Calarco, A; Petillo, O; Margarucci...",Biochimica et biophysica acta [1832] 105-13
2,24116161,2013-01-01,Grb2 is regulated by foxd3 and has roles in pr...,"Baksi, S; Jana, NR; Bhattacharyya, NP; Mukhopa...",PloS one [8] e76792
3,8205025,2012-01-03,Huntington Interacting Protein-1( HIP1 ) and t...,"ROSS, THEODORA S",
4,20501595,2010-06-01,Paired and LIM class homeodomain proteins coor...,"Van Buskirk, C; Sternberg, PW","Development (Cambridge, England) [137] 2065-74"
5,7756581,2009-12-14,Huntington Interacting Protein-1( HIP1 ) and t...,"ROSS, THEODORA S",
6,7915876,2009-09-03,Huntington Interacting Protein-1( HIP1 ) and t...,"ROSS, THEODORA S",
7,7556753,2009-01-08,Huntington Interacting Protein-1( HIP1 ) and t...,"ROSS, THEODORA S",
8,17937893,2007-10-01,[ Astrocytes in Huntington's chorea : accompli...,"Liévens, JC; Birman, S",Medecine sciences : M/S [23] 845-9
9,17550941,2007-08-01,Intersectin enhances huntingtin aggregation an...,"Scappini, E; Koh, TW; Martin, NP; O'Bryan, JP",Human molecular genetics [16] 1862-71
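Once the hits are in a dataframe, ordinary pandas operations apply. The sketch below counts hits per publication year and keeps only rows that have a citation. It runs on a small sample built from the first five rows of the output above, with only the columns it needs. How the real dataframe represents an empty citation (empty string or missing value) is an assumption, so the filter handles both.

```python
import pandas as pd

# Sample built from the first five rows shown above; in the notebook, use df instead
sample = pd.DataFrame({
    "document_id": [25050814, 22974559, 24116161, 8205025, 20501595],
    "document_date": ["2014-01-01", "2013-01-01", "2013-01-01",
                      "2012-01-03", "2010-06-01"],
    "citation": ["PloS one [9] e103030",
                 "Biochimica et biophysica acta [1832] 105-13",
                 "PloS one [8] e76792",
                 None,
                 "Development (Cambridge, England) [137] 2065-74"],
})

# Derive the publication year from the date string
sample["year"] = pd.to_datetime(sample["document_date"]).dt.year
hits_per_year = sample.groupby("year").size().to_dict()

# Keep rows whose citation is neither missing nor an empty string
has_citation = sample["citation"].fillna("").str.strip() != ""
with_citation = sample[has_citation]

print(hits_per_year)
print(with_citation["document_id"].tolist())
```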


# Sentence co-occurrence

In this case we're looking for documents where the two entities are mentioned in the same sentence. The output of the script below is a dataframe with one co-occurrence sentence per row.

In [6]:
docs = docstore.DocStoreRequestBuilder()
# specify docstore API endpoint and add authentication if necessary
docs.set_url(docstore_url)
docs.set_basic_auth(username=user, password=pw)
# make call to DOCstore sentence co-occurrence API
docs_json = docs.get_scc_docs(['id:GENE$HTT', 'id:GENE$EGFR'])
# convert json to df
df = docstore.get_docstore_scc_df(docs_json)
# display the dataframe of hits
df



,document_id,document_date,scc_sentence
0,22974559,2013-01-01,Mutant huntingtin regulates EGF receptor fate ...
1,22974559,2013-01-01,We found that polyQ-htt controls EGFR degradat...
2,24116161,2013-01-01,"Grb2 is also known to interact with Htt, depen..."
3,9079622,1997-03-28,SH3 domain-dependent association of huntingtin...
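Because the dataframe has one row per sentence, a document with several co-occurrence sentences appears more than once (document 22974559 above). A possible way to group the sentences per document is sketched below using plain Python on a list of row dicts, which is the shape that pandas' `df.to_dict("records")` produces. The sample rows are copied from the output above, including the truncation as displayed, and `group_sentences` is a hypothetical helper.

```python
from collections import defaultdict


def group_sentences(records):
    """Map each document_id to the list of its co-occurrence sentences."""
    grouped = defaultdict(list)
    for row in records:
        grouped[row["document_id"]].append(row["scc_sentence"])
    return dict(grouped)


# Rows copied from the output above (sentences truncated as displayed)
records = [
    {"document_id": 22974559,
     "scc_sentence": "Mutant huntingtin regulates EGF receptor fate ..."},
    {"document_id": 22974559,
     "scc_sentence": "We found that polyQ-htt controls EGFR degradat..."},
    {"document_id": 24116161,
     "scc_sentence": "Grb2 is also known to interact with Htt, depen..."},
    {"document_id": 9079622,
     "scc_sentence": "SH3 domain-dependent association of huntingtin..."},
]
sentences_by_doc = group_sentences(records)
for doc_id, sentences in sentences_by_doc.items():
    print(doc_id, len(sentences))
```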


# Entity lookup
There is also an API call to look up entities by synonym. The output gives information such as entity ID, type and name. Some synonyms may match more than one entity. In those cases the JSON output will include all the possible options.

In [7]:
docs = docstore.DocStoreRequestBuilder()
# specify docstore API endpoint and add authentication if necessary
docs.set_url(docstore_url)
docs.set_basic_auth(username=user, password=pw)
# returns json with list of synonyms and their IDs
synonym = 'hedgehog'
entity_type = 'GENE'
print(docs.entity_lookup_id(synonym, entity_type))

{'ids': [{'id': 'SHH', 'type': 'GENE', 'name': 'sonic hedgehog'}]}
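The lookup result can be turned into query terms for the earlier calls. The sketch below assumes that the `id:TYPE$ID` pattern seen in the queries above (for example `id:GENE$HTT`) generalises to any entity returned by the lookup; this is inferred from the notebook rather than confirmed by the API documentation. `to_query_terms` is a hypothetical helper.

```python
def to_query_terms(lookup_response):
    """Build 'id:TYPE$ID' query strings from an entity lookup response."""
    # One term per matching entity; an ambiguous synonym yields several terms
    return [f"id:{entity['type']}${entity['id']}"
            for entity in lookup_response.get("ids", [])]


# Response copied from the output above
response = {'ids': [{'id': 'SHH', 'type': 'GENE', 'name': 'sonic hedgehog'}]}
terms = to_query_terms(response)
print(terms)
```

If the assumption holds, the resulting list could be passed to `get_docs`, `get_dcc_docs` or `get_scc_docs` in place of the hand-written terms.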


