# Python TERMite toolkit - TExpress

We provide a Python library for making calls to our NER engine, TERMite, as well as to the TExpress module for defining more complex semantic patterns. The library also enables post-processing of the JSON returned from such requests. This notebook assumes that you have read the example TERMite notebook. It walks you through making a TExpress call and some of the post-processing of the JSON output.

## Example call to TExpress

The toolkit can also be used to make TExpress calls to identify patterns and extract biomedical relationships. Using TExpress with the toolkit is easy: simply import `texpress` from `termite_toolkit` and make a call.

A simple TExpress call is made up of:
* the TERMite API endpoint
* the pattern you wish to search for - this can be created in the TERMite UI
* a TExpress request
* request execution

The example below makes a TExpress call and prints the result to the screen.

In [1]:
from pprint import pprint
from termite_toolkit import texpress

# specify termite API endpoint
termite_home = "http://localhost:9090/termite"

# specify the pattern you wish to search for - this can be created in the TERMite UI
pattern = ":(INDICATION):{0,5}:(GENE)"

t = texpress.TexpressRequestBuilder()

# individually add items to your TERMite request
t.set_url(termite_home)
t.set_text("sildenafil citrate macrophage colony stimulating factor influenza")
t.set_subsume(True)
t.set_input_format("txt")
t.set_output_format("json")
t.set_allow_ambiguous(False)
t.set_pattern(pattern)

# execute the request
texpress_response = t.execute(display_request=False)

pprint(texpress_response, depth=2, compact=True, width=100)

{'RESP_META': {'CONID': '0:0:0:0:0:0:0:1/98',
               'HTTP_CODE': '200',
               'INPUT_SIZE': 65,
               'REQID': 'c7e140a4-c606-4cce-b117-3048565ce9e9-3120',
               'RUNTIME_OPTIONS': {...},
               'TERMITE_RUNTIME': 'default',
               'TERMITE_VERS': '6.4.9',
               'Timing_msec_TOTAL': '1',
               '_READY_FORMATTED_WITH': 'com.scibite.termitej.formatter.streamers.JsonStreamFormatter'},
 'RESP_MULTIDOC_PAYLOAD': {},
 'RESP_TEXPRESS': {'_document': {...}},
 'RESP_WORKFLOW': {}}


For more information on the TExpress JSON results [click here](https://help.scibite.com/a/solutions/articles/4000021813-anatomy-of-a-texpress-result-server-).
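
Before post-processing a response, it can be useful to confirm that the request succeeded. The following is a minimal sketch, assuming the response is a parsed dictionary laid out like the output above, with the status stored as a string under `RESP_META` and `HTTP_CODE`. The helper `texpress_succeeded` is hypothetical and not part of the toolkit, and `sample_response` is a trimmed stand-in for the output printed above.

```python
def texpress_succeeded(response):
    """Return True if the response metadata reports HTTP 200.

    Hypothetical helper (not part of termite_toolkit). Assumes the layout
    shown in the printed output: response["RESP_META"]["HTTP_CODE"].
    """
    meta = response.get("RESP_META", {})
    # The sample output stores the code as a string; str() also tolerates an int.
    return str(meta.get("HTTP_CODE")) == "200"


# Trimmed copy of the response printed above, used here as stand-in data.
sample_response = {
    "RESP_META": {"HTTP_CODE": "200", "TERMITE_VERS": "6.4.9"},
    "RESP_TEXPRESS": {"_document": {}},
}

print(texpress_succeeded(sample_response))
```

In a notebook you would pass `texpress_response` instead of the stand-in dictionary.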

Like TERMite, a TExpress call can be reduced to a dictionary of options and a single annotate call:


In [3]:
from pprint import pprint
from termite_toolkit import texpress
import os

termite_home = "http://localhost:9090/termite"
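# Notebooks do not define __file__, so the literal string "__file__" is resolved
# against the current working directory; two dirname() calls then give its parent.
parentDir = os.path.dirname(os.path.dirname(os.path.abspath("__file__")))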
input_file = os.path.join(parentDir, 'sample_scripts/medline_sample.zip')
options = {
    "format": "medline.xml",
    "output": "json",
    "pattern": ":(INDICATION):{0,5}:(GENE)",
    "opts": "reverse=false",
}

texpress_json_response = texpress.annotate_files(termite_home, input_file, options)

## TExpress toolkit library

The standard JSON output isn't very human-friendly, so we've added functionality for parsing the JSON and doc.JSONx outputs. The parsed output can be returned either as a dictionary or as a pandas DataFrame.

In [4]:
pprint(texpress.get_entity_hits_from_json(texpress_response))

{'USR_1[R]': [{'conf': 3,
               'doc_id': '_document',
               'entities': ['GENE#CSF1#colony stimulating factor 1',
                            'INDICATION#D007251#Influenza, Human'],
               'original_fragment': 'macrophage colony stimulating factor '
                                    'influenza'}]}
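
Each entry in `entities` above appears to be a `#`-delimited string of entity type, identifier, and preferred name. The sketch below splits such strings into named fields. It assumes that delimiter layout, inferred only from the output shown; `Entity` and `parse_entity` are hypothetical names, not toolkit functions, and `sample_hits` is copied from the output above.

```python
from collections import namedtuple

# Hypothetical container for one matched entity (not part of termite_toolkit).
Entity = namedtuple("Entity", ["entity_type", "entity_id", "name"])


def parse_entity(hit):
    """Split a 'TYPE#ID#name' string into an Entity.

    Splits on the first two '#' only, so a name containing '#' stays intact.
    Missing trailing fields (e.g. 'GENE#CRP') are filled with None.
    """
    parts = hit.split("#", 2)
    parts += [None] * (3 - len(parts))
    return Entity(*parts)


# Copied from the dictionary printed above.
sample_hits = {
    "USR_1[R]": [
        {
            "conf": 3,
            "doc_id": "_document",
            "entities": [
                "GENE#CSF1#colony stimulating factor 1",
                "INDICATION#D007251#Influenza, Human",
            ],
            "original_fragment": "macrophage colony stimulating factor influenza",
        }
    ]
}

for pattern_id, hits in sample_hits.items():
    for hit in hits:
        for entity in map(parse_entity, hit["entities"]):
            print(pattern_id, entity.entity_type, entity.entity_id, entity.name)
```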


In [6]:
texpress.get_texpress_dataframe(texpress_json_response).head()

| | docID | patternID | originalFragment | matchEntities | originalSentence | sentence | subsumed |
|---|---|---|---|---|---|---|---|
| 0 | 26351389 | USR_2 | inflammatory response measured by C-reactive p... | [INDICATION#D007249, GENE#CRP] | This retrospective study aims to compare the i... | 2 | False |
| 1 | 26351387 | USR_2 | factor (p = 0.029), especially in Crohn's disease | [GENE#CFP, INDICATION#D003424] | Only smoking was identified as a risk factor (... | 15 | False |
| 2 | 26351381 | USR_2 | inflammatory bowel disease, NGAL-MMP-9 | [INDICATION#D015212, GENE#MMP9] | In the search for surrogate markers to assess ... | 10 | False |
| 3 | 26351381 | USR_2 | MMP-9 supplements and outperforms CRP in both ... | [GENE#MMP9, GENE#CRP, INDICATION#D003093] | In the search for surrogate markers to assess ... | 10 | False |
| 4 | 26609182 | USR_2 | CXCL8 SPECT to monitor disease activity in inf... | [GENE#CXCL8, INDICATION#D015212] | Tc-99m-CXCL8 SPECT to monitor disease activity... | 0 | False |
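
A common next step is to count how often each gene is matched together with each indication. The sketch below does this in plain Python over row dictionaries. It assumes each row's `matchEntities` is a list of `TYPE#ID` strings, as displayed above; if the column holds strings rather than lists in your version, it would need parsing first. The function `count_gene_indication_pairs` is a hypothetical helper, and `rows` is copied from the five rows shown.

```python
from collections import Counter
from itertools import product

# Stand-in rows copied from the output above (only the columns used here).
rows = [
    {"docID": "26351389", "matchEntities": ["INDICATION#D007249", "GENE#CRP"]},
    {"docID": "26351387", "matchEntities": ["GENE#CFP", "INDICATION#D003424"]},
    {"docID": "26351381", "matchEntities": ["INDICATION#D015212", "GENE#MMP9"]},
    {"docID": "26351381", "matchEntities": ["GENE#MMP9", "GENE#CRP", "INDICATION#D003093"]},
    {"docID": "26609182", "matchEntities": ["GENE#CXCL8", "INDICATION#D015212"]},
]


def ids_of_type(entities, entity_type):
    """Return the IDs of all 'TYPE#ID' strings with the given type prefix."""
    prefix = entity_type + "#"
    return [e.split("#", 1)[1] for e in entities if e.startswith(prefix)]


def count_gene_indication_pairs(rows):
    """Count (gene, indication) pairs that co-occur within a matched fragment."""
    counts = Counter()
    for row in rows:
        genes = ids_of_type(row["matchEntities"], "GENE")
        indications = ids_of_type(row["matchEntities"], "INDICATION")
        # Every gene in a match is paired with every indication in that match.
        counts.update(product(genes, indications))
    return counts


for (gene, indication), n in count_gene_indication_pairs(rows).items():
    print(gene, indication, n)
```

With a real DataFrame, the rows could be obtained with `df.to_dict("records")`, which is a standard pandas call.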
