# This is a notebook demo for filter functionality in the Predict class

### There are four types of filters supported so far: 
1. NodeDegree
2. EdgeLabel
3. CoOccurrence
4. UniqueAPIs

**Each filter returns target objects labeled with their rank and the name of the filter used. The filters are called from FindConnection, which takes the "filters" parameter as a list of dictionaries whose length MUST equal the number of steps in the query (i.e., one filter if you go directly from input to output, two if there is one intermediate node). Use an empty dictionary as a placeholder for any step where no filter should be applied (see the examples at the end). Each non-empty dictionary must include the "name" key, which selects the filter, and may include the optional "count" key, which sets the number of target objects returned and defaults to 50.**
### Each filter is shown separately below. First, we import Hint and FindConnection, which calls the Predict class
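
The rules above can be checked before a query is sent. The following is a minimal sketch and is not part of biothings_explorer: `normalize_filters` and `DEFAULT_COUNT` are hypothetical names introduced here. The helper enforces the length rule, requires "name" on non-empty entries, and fills in the documented default of 50 for "count".

```python
DEFAULT_COUNT = 50  # documented default when "count" is not given


def normalize_filters(filters, intermediate_nodes):
    """Hypothetical helper: validate a "filters" list and fill in defaults."""
    n_steps = len(intermediate_nodes) + 1  # one step per hop in the query
    if len(filters) != n_steps:
        raise ValueError(f"expected {n_steps} filter entries, got {len(filters)}")
    normalized = []
    for step, entry in enumerate(filters, start=1):
        if not entry:  # an empty dict means "no filter at this step"
            normalized.append({})
            continue
        if "name" not in entry:
            raise ValueError(f"filter for step {step} is missing the 'name' key")
        normalized.append({**entry, "count": entry.get("count", DEFAULT_COUNT)})
    return normalized


print(normalize_filters([{}, {"name": "NodeDegree"}], ["Gene"]))
# [{}, {'name': 'NodeDegree', 'count': 50}]
```

Failing early with a `ValueError` is a design choice of this sketch; how FindConnection itself reacts to a malformed list is not shown in this notebook.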

In [1]:
from biothings_explorer.hint import Hint
from biothings_explorer.user_query_dispatcher import FindConnection

For this demo, we use the same input and output objects for every filter unless noted otherwise.

In [2]:
ht = Hint()
input_obj = ht.query('CDK8')['Gene'][0]
output_obj = 'ChemicalSubstance'

## Filter 1: NodeDegree
This filter takes the graph returned by BTE, ranks the target nodes by degree (the number of edges connected to each node), and returns the highest-ranked ones.

*(Note that this filter can produce misleading results: concepts that are simply more popular, rather than more relevant, appear in the graph with more edges and therefore rank higher.)*
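
To make the idea of degree-based ranking concrete, here is a small sketch on a toy edge list. It is an illustration under assumptions, not the Predict class's code: `rank_by_degree`, the node names `A`, `B`, `C`, and the alphabetical tie-break are all invented for this example.

```python
from collections import Counter


def rank_by_degree(edges, source, count=50):
    """Rank target nodes by how many edges touch them (illustrative only)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    targets = [n for n in degree if n != source]
    # Highest degree first; ties broken alphabetically (an assumption of this sketch).
    targets.sort(key=lambda n: (-degree[n], n))
    return {n: rank for rank, n in enumerate(targets[:count], start=1)}


# Toy multigraph: parallel edges stand for an association reported more than once.
edges = [("CDK8", "A"), ("CDK8", "A"), ("CDK8", "A"),
         ("CDK8", "B"), ("CDK8", "C"), ("CDK8", "C")]
print(rank_by_degree(edges, source="CDK8"))
# {'A': 1, 'C': 2, 'B': 3}
```

Node `A` ranks first only because it has the most edges, which is exactly the popularity effect the note above warns about.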

In [3]:
filt = [{'name': 'NodeDegree'}]
fc = FindConnection(input_obj=input_obj,
                    output_obj=output_obj,
                    intermediate_nodes=[],
                    filters=filt)
fc.connect()

The input object is not labeled with a ranking:

In [4]:
fc.fc.G.nodes(data=True)['CDK8']

{'type': 'Gene',
 'identifier': 'SYMBOL',
 'level': 1,
 'equivalent_ids': defaultdict(set,
             {'NCBIGene': ['1024'],
              'name': ['CYCLIN DEPENDENT KINASE 8'],
              'SYMBOL': ['CDK8'],
              'UMLS': ['C1413289'],
              'HGNC': ['1779'],
              'UNIPROTKB': ['P49336'],
              'ENSEMBL': ['ENSG00000132964']})}

Because "count" wasn't specified, it defaulted to returning 50 target nodes ranked from 1-50:

In [5]:
ranks = []
for i,node in fc.fc.G.nodes(data=True):
    if i != input_obj['SYMBOL']:
        ranks.append([node['rank'], i])
ranks.sort()
ranks

[[1, 'RONICICLIB'],
 [2, 'AT-7519'],
 [3, 'AP24534'],
 [4, 'SORAFENIB'],
 [5, 'PHA-793887'],
 [6, 'AZD-5438'],
 [7, 'ALVOCIDIB'],
 [8, 'PHOSPHORYL'],
 [9, 'MAGNESIUM'],
 [10, 'LINIFANIB'],
 [11, 'ISOCHAMAEJASMIN'],
 [12, 'DINOPROSTONE'],
 [13, 'CORTISTATIN A'],
 [14, 'CHEMBL3828689'],
 [15, 'CHEMBL3828637'],
 [16, 'CHEMBL3828575'],
 [17, 'CHEMBL3828572'],
 [18, 'CHEMBL3828553'],
 [19, 'CHEMBL3828523'],
 [20, 'CHEMBL3828503'],
 [21, 'CHEMBL3828458'],
 [22, 'CHEMBL3828370'],
 [23, 'CHEMBL3828286'],
 [24, 'CHEMBL3828221'],
 [25, 'CHEMBL3828209'],
 [26, 'CHEMBL3828120'],
 [27, 'CHEMBL3828116'],
 [28, 'CHEMBL3828084'],
 [29, 'CHEMBL3828071'],
 [30, 'CHEMBL3828003'],
 [31, 'CHEMBL3827983'],
 [32, 'CHEMBL3827944'],
 [33, 'CHEMBL3827904'],
 [34, 'CHEMBL3827874'],
 [35, 'CHEMBL3827799'],
 [36, 'CHEMBL3827774'],
 [37, 'CHEMBL3827758'],
 [38, 'CHEMBL3827678'],
 [39, 'CHEMBL3827664'],
 [40, 'CHEMBL3827605'],
 [41, 'CHEMBL3827586'],
 [42, 'CHEMBL3827327'],
 [43, 'CHEMBL3827118'],
 [44, 'CHEMBL38270

## Filter 2: EdgeLabel
This filter requires an extra parameter, "label", which is a label OR a list of labels for the edges between source and target that you'd like returned (e.g., related_to, negatively_regulates). It removes all edges that do not carry one of the given labels, then ranks the remaining target nodes by node degree, as in filter 1.
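
The two stages described above (drop edges by label, then rank what is left) can be sketched as follows. This is a toy illustration, not the library's implementation: `filter_by_label`, the `(source, target, label)` tuple layout, and the tie-break are assumptions made for the example.

```python
from collections import Counter


def filter_by_label(edges, labels, count=50):
    """Keep edges whose label matches, then rank targets by surviving edge count."""
    if isinstance(labels, str):  # accept a single label or a list of labels
        labels = [labels]
    wanted = set(labels)
    kept = [(src, tgt, lab) for src, tgt, lab in edges if lab in wanted]
    degree = Counter(tgt for _, tgt, _ in kept)
    ranked = sorted(degree, key=lambda n: (-degree[n], n))[:count]
    return kept, {n: rank for rank, n in enumerate(ranked, start=1)}


edges = [
    ("CDK8", "A", "related_to"),
    ("CDK8", "A", "related_to"),
    ("CDK8", "B", "negatively_regulates"),
    ("CDK8", "C", "related_to"),
]
kept, ranks = filter_by_label(edges, "related_to")
print(ranks)
# {'A': 1, 'C': 2}
```

Target `B` disappears entirely because its only edge carries a label that was not requested.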

In [6]:
filt = [{'name': 'EdgeLabel', 'label': ['related_to'], 'count': 15}]
fc = FindConnection(input_obj=input_obj,
                    output_obj=output_obj,
                    intermediate_nodes=[],
                    filters=filt)
fc.connect()

Again, the source node is NOT labeled with a ranking

In [7]:
fc.fc.G.nodes(data=True)['CDK8']

{'type': 'Gene',
 'identifier': 'SYMBOL',
 'level': 1,
 'equivalent_ids': defaultdict(set,
             {'NCBIGene': ['1024'],
              'name': ['CYCLIN DEPENDENT KINASE 8'],
              'SYMBOL': ['CDK8'],
              'UMLS': ['C1413289'],
              'HGNC': ['1779'],
              'UNIPROTKB': ['P49336'],
              'ENSEMBL': ['ENSG00000132964']})}

As we can see here, all edges returned have the label 'related_to'

In [8]:
label = set()
for i in fc.fc.G.edges(data=True):
    label.add(i[2]['label'])
label

{'related_to'}

And here we see that all non-source nodes are labeled with a rank from 1-15 since the "count" key was set to 15

In [9]:
ranks = []
for i,node in fc.fc.G.nodes(data=True):
    if i != input_obj['SYMBOL']:
        ranks.append([node['rank'], i])
ranks.sort()
ranks

[[1, 'RONICICLIB'],
 [2, 'AT-7519'],
 [3, 'PHOSPHORYL'],
 [4, 'PHA-793887'],
 [5, 'MAGNESIUM'],
 [6, 'ISOCHAMAEJASMIN'],
 [7, 'DINOPROSTONE'],
 [8, 'CORTISTATIN A'],
 [9, 'CHEMBL3828689'],
 [10, 'CHEMBL3828637'],
 [11, 'CHEMBL3828575'],
 [12, 'CHEMBL3828572'],
 [13, 'CHEMBL3828553'],
 [14, 'CHEMBL3828523'],
 [15, 'CHEMBL3828503']]

## Filter 3: CoOccurrence
This filter makes API calls to the NIH MRCOC co-occurrence database, located at https://biothings.ncats.io/mrcoc. The co-occurrence score in the API is calculated with the [Normalized Google Distance (NGD) formula](https://en.wikipedia.org/wiki/Normalized_Google_distance); a lower score means that two concepts appear together more often in the academic literature. This filter returns target nodes labeled with their rank and with their co-occurrence score relative to the source node.

All NGD scores should be between 0 and 1, but errors can occur when querying the API because of problems with UMLS and MESH IDs. An NGD score of 100 means that at least one of the nodes did not have a UMLS or MESH ID in the graph object. An NGD score of 200 means that both nodes had an ID, but the pair was not found in the co-occurrence database. For NGDs of 100 and 200, the rankings are arbitrarily assigned and should not be taken as actual rankings.
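
For reference, the Normalized Google Distance linked above can be computed from raw occurrence counts as in the sketch below. The function names, the constant names, and the counts are illustrative assumptions; the filter itself reads precomputed scores from the API, and this code does not query it.

```python
import math

MISSING_ID = 100      # sentinel described above: a node lacked a UMLS or MESH ID
PAIR_NOT_FOUND = 200  # sentinel described above: pair absent from the database


def ngd(f_x, f_y, f_xy, n_total):
    """Normalized Google Distance from occurrence counts (all counts must be > 0)."""
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_total) - min(log_x, log_y))


def is_real_score(score):
    """True for a computed NGD, False for the 100/200 sentinel values."""
    return score not in (MISSING_ID, PAIR_NOT_FOUND)


# Two concepts that always appear together have distance 0.
print(ngd(f_x=1000, f_y=1000, f_xy=1000, n_total=1_000_000))  # 0.0
```

A check such as `is_real_score` is one way to separate meaningful rankings from the arbitrarily ranked 100 and 200 entries before interpreting the results.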

Here I'm using a different input object because of the error with UMLS IDs associated with the CDK8 Gene.

In [10]:
input_obj = ht.query('D000755')['Disease'][0]
input_obj

{'MONDO': 'MONDO:0011382',
 'DOID': 'DOID:10923',
 'UMLS': 'C0002895',
 'name': 'sickle cell anemia',
 'MESH': 'D000755',
 'OMIM': '603903',
 'ORPHANET': '232',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0011382'},
 'display': 'MONDO(MONDO:0011382) DOID(DOID:10923) OMIM(603903) ORPHANET(232) UMLS(C0002895) MESH(D000755) name(sickle cell anemia)',
 'type': 'Disease'}

In [11]:
filt = [{'name': 'CoOccurrence', 'count': 30}]
fc = FindConnection(input_obj=input_obj,
                    output_obj=output_obj,
                    intermediate_nodes=[],
                    filters=filt)
fc.connect()

API 8.1 pharos failed


We see again that the source node isn't labeled

In [12]:
fc.fc.G.nodes(data=True)['sickle cell anemia']

{'type': 'Disease',
 'identifier': 'name',
 'level': 1,
 'equivalent_ids': defaultdict(set,
             {'MONDO': ['MONDO:0011382'],
              'DOID': ['DOID:10923'],
              'UMLS': ['C0002895'],
              'name': ['ANEMIA, SICKLE CELL', 'SICKLE CELL ANEMIA'],
              'MESH': ['D000755'],
              'OMIM': ['603903'],
              'ORPHANET': ['232']})}

However, the target nodes are ranked from 1-30 and labeled with their ngd_overall score relative to the source node. As you can see, the lower the NGD score, the more closely the node is related to the source node and the higher its rank.

In [13]:
ranks = []
for i,node in fc.fc.G.nodes(data=True):
    if i != 'sickle cell anemia':
        ranks.append([node['rank'], node['ngd_overall'], node['co_occur_with'], i])
ranks.sort()
ranks

[[1, 0.25220838036931303, 'sickle cell anemia', 'AGENTS, ANTISICKLING'],
 [2, 0.2755828689683291, 'sickle cell anemia', 'HYDROXYUREA'],
 [3, 0.45253362434228495, 'sickle cell anemia', 'ISOANTIBODIES'],
 [4, 0.4741015107491786, 'sickle cell anemia', 'DEFERASIROX'],
 [5, 0.4860822793121338, 'sickle cell anemia', 'DEFEROXAMINE'],
 [6, 0.4956393477798813, 'sickle cell anemia', 'PNEUMOCOCCAL VACCINES'],
 [7, 0.4958887012552885, 'sickle cell anemia', 'BILIRUBIN'],
 [8, 0.5070849553658439, 'sickle cell anemia', 'AZACITIDINE'],
 [9, 0.5193080925958051, 'sickle cell anemia', 'ANALGESICS, OPIOID'],
 [10, 0.5193080925958051, 'sickle cell anemia', 'OPIOIDS'],
 [11, 0.5215748216397025, 'sickle cell anemia', 'PHOSPHATIDYLSERINE'],
 [12, 0.5261892846095962, 'sickle cell anemia', 'HEME'],
 [13, 0.5264381290501562, 'sickle cell anemia', 'DEFERIPRONE'],
 [14, 0.5365210410927528, 'sickle cell anemia', 'MEPERIDINE'],
 [15, 0.5462445169123916, 'sickle cell anemia', 'OXYGEN'],
 [16, 0.5500483936795468, 'sic

**Additional Note about this filter:**

In some nodes, the keys 'rank', 'ngd_overall', and 'co_occur_with' may hold list objects. This only occurs when the CoOccurrence filter is used at a later step in the query (**see last example**), where a single node can be returned twice, each time with a different 'co_occur_with' value and a different 'ngd_overall' score. Because the example above is only the first query from a single source node, every returned target node has the source node as its 'co_occur_with' value.
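
Code that reads these attributes has to handle both the scalar and the list form. One way to do that is sketched below; `as_list` and `node_hits` are hypothetical helpers, the key names follow the code cells in this notebook, and the sample values are made up for illustration.

```python
def as_list(value):
    """Wrap a scalar in a list so scalar and list attributes read the same way."""
    return value if isinstance(value, list) else [value]


def node_hits(node):
    """Return one (rank, ngd_overall, co_occur_with) tuple per hit on a node."""
    return list(zip(as_list(node["rank"]),
                    as_list(node["ngd_overall"]),
                    as_list(node["co_occur_with"])))


single = {"rank": 1, "ngd_overall": 0.21, "co_occur_with": "CAT"}
double = {"rank": [2, 30], "ngd_overall": [0.24, 0.34], "co_occur_with": ["CAT", "MPO"]}
print(node_hits(single))  # [(1, 0.21, 'CAT')]
print(node_hits(double))  # [(2, 0.24, 'CAT'), (30, 0.34, 'MPO')]
```

The helper builds new lists rather than modifying the node dictionary, so the graph's data stays as the filter left it.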

## Filter 4: UniqueAPIs
This filter takes the graph and ranks target nodes based on the number of unique APIs supporting the association between it and the source node. The greater the number of APIs supporting the association between the source and target nodes, the higher the ranking.
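
The difference from NodeDegree is that repeated edges from the same API count only once. The sketch below shows that counting rule on toy data; `rank_by_unique_apis`, the `(target, api)` pair layout, and the placeholder API names are assumptions, not the library's data model.

```python
from collections import defaultdict


def rank_by_unique_apis(edges, count=50):
    """Rank targets by the number of distinct APIs reporting an edge to them."""
    apis = defaultdict(set)
    for target, api in edges:
        apis[target].add(api)  # a set ignores repeated reports from one API
    ranked = sorted(apis, key=lambda t: (-len(apis[t]), t))[:count]
    return {t: rank for rank, t in enumerate(ranked, start=1)}


# (target, api) pairs; the API names are placeholders, not real BTE API names.
edges = [("A", "api_1"), ("A", "api_1"), ("A", "api_1"),
         ("B", "api_1"), ("B", "api_2")]
print(rank_by_unique_apis(edges))
# {'B': 1, 'A': 2}
```

Target `A` has more edges, but `B` is supported by two independent APIs and therefore ranks first.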

In [14]:
input_obj = ht.query('CDK8')['Gene'][0]

filt = [{'name': 'UniqueAPIs'}]
fc = FindConnection(input_obj=input_obj,
                    output_obj=output_obj,
                    intermediate_nodes=[],
                    filters=filt)
fc.connect()

As always, the source node isn't labeled with a ranking

In [15]:
fc.fc.G.nodes(data=True)['CDK8']

{'type': 'Gene',
 'identifier': 'SYMBOL',
 'level': 1,
 'equivalent_ids': defaultdict(set,
             {'NCBIGene': ['1024'],
              'name': ['CYCLIN DEPENDENT KINASE 8'],
              'SYMBOL': ['CDK8'],
              'UMLS': ['C1413289'],
              'HGNC': ['1779'],
              'UNIPROTKB': ['P49336'],
              'ENSEMBL': ['ENSG00000132964']})}

The target nodes are labeled with their rankings. Since no "count" was specified, it defaults to 50 target nodes.

In [16]:
ranks = []
for i,node in fc.fc.G.nodes(data=True):
    if i != 'CDK8':
        ranks.append([node['rank'], i])
ranks.sort()
ranks

[[1, 'AP24534'],
 [2, 'SORAFENIB'],
 [3, 'RONICICLIB'],
 [4, 'PHA-793887'],
 [5, 'AZD-5438'],
 [6, 'AT-7519'],
 [7, 'ALVOCIDIB'],
 [8, 'PHOSPHORYL'],
 [9, 'MAGNESIUM'],
 [10, 'LINIFANIB'],
 [11, 'ISOCHAMAEJASMIN'],
 [12, 'DINOPROSTONE'],
 [13, 'CORTISTATIN A'],
 [14, 'CHEMBL3828689'],
 [15, 'CHEMBL3828637'],
 [16, 'CHEMBL3828575'],
 [17, 'CHEMBL3828572'],
 [18, 'CHEMBL3828553'],
 [19, 'CHEMBL3828523'],
 [20, 'CHEMBL3828503'],
 [21, 'CHEMBL3828458'],
 [22, 'CHEMBL3828370'],
 [23, 'CHEMBL3828286'],
 [24, 'CHEMBL3828221'],
 [25, 'CHEMBL3828209'],
 [26, 'CHEMBL3828120'],
 [27, 'CHEMBL3828116'],
 [28, 'CHEMBL3828084'],
 [29, 'CHEMBL3828071'],
 [30, 'CHEMBL3828003'],
 [31, 'CHEMBL3827983'],
 [32, 'CHEMBL3827944'],
 [33, 'CHEMBL3827904'],
 [34, 'CHEMBL3827874'],
 [35, 'CHEMBL3827799'],
 [36, 'CHEMBL3827774'],
 [37, 'CHEMBL3827758'],
 [38, 'CHEMBL3827678'],
 [39, 'CHEMBL3827664'],
 [40, 'CHEMBL3827605'],
 [41, 'CHEMBL3827586'],
 [42, 'CHEMBL3827327'],
 [43, 'CHEMBL3827118'],
 [44, 'CHEMBL38270

## Filter functionality with intermediate nodes
When using intermediate nodes, the length of the filter parameter must equal the number of steps in the query (one more than the number of intermediate nodes); each entry specifies which filter is applied at that step.

### Using an intermediate node with two filters:
Two different filters may be used, and the number of target nodes returned at each stage of the query process is independent of the previous stage (a count of 30 in the first filter will not constrain the number of results of the second filter). The returned graph contains all the target nodes returned at each stage of the query process.

This example uses two different filters. If the same filter is used twice, the results can be told apart by 'type' when the node types differ; in the future, the 'level' attribute will indicate which step of the query each node belongs to (not yet implemented).

In [17]:
filt = [{'name': 'NodeDegree', 'count': 30},
        {'name': 'UniqueAPIs', 'count': 60}]
fc = FindConnection(input_obj=input_obj,
                    output_obj=output_obj,
                    intermediate_nodes=['Gene'],
                    filters=filt)
fc.connect()

Including the source node, there should be a total of 91 nodes returned:

In [18]:
fc.fc.G.number_of_nodes()

91
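
The total of 91 is the source node plus the "count" of each step (1 + 30 + 60). A small sketch of that arithmetic follows; `max_node_count` is a hypothetical helper. It gives an upper bound rather than an exact figure, since a node returned more than once is stored only once in the graph.

```python
def max_node_count(filters, default_count=50):
    """Upper bound on graph size when every step is filtered: source + per-step counts."""
    if any(not f for f in filters):
        raise ValueError("unfiltered steps have no cut-off, so no bound applies")
    return 1 + sum(f.get("count", default_count) for f in filters)


print(max_node_count([{"name": "NodeDegree", "count": 30},
                      {"name": "UniqueAPIs", "count": 60}]))  # 91
print(max_node_count([{"name": "CoOccurrence"},
                      {"name": "CoOccurrence", "count": 30}]))  # 81
```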

And we can differentiate between the results returned by the first and second steps in the query:

In [19]:
q1,q2 = [],[]
for i,node in fc.fc.G.nodes(data=True):
    if 'filteredBy' in node.keys():
        if node['filteredBy'] == 'NodeDegree':
            q1.append([node['rank'],i,node['type']])
        elif node['filteredBy'] == 'UniqueAPIs':
            q2.append([node['rank'],i,node['type']])
q1.sort()
q2.sort()
print('----------------------QUERY 1 RESULTS----------------------')
print('FILTER: NodeDegree')
print(q1)
print('----------------------QUERY 2 RESULTS----------------------')
print('FILTER: UniqueAPIs')
print(q2)

----------------------QUERY 1 RESULTS----------------------
FILTER: NodeDegree
[[1, 'E2F1', 'Gene'], [2, 'UBE2L3', 'Gene'], [3, 'STAT5A', 'Gene'], [4, 'CCNH', 'Gene'], [5, 'CCNC', 'Gene'], [6, 'C0031686', 'Gene'], [7, 'ZZZ3', 'Gene'], [8, 'ZSCAN21', 'Gene'], [9, 'ZNRD2', 'Gene'], [10, 'ZNF830', 'Gene'], [11, 'ZNF281', 'Gene'], [12, 'ZNF131', 'Gene'], [13, 'ZMYM4', 'Gene'], [14, 'YEATS2', 'Gene'], [15, 'XPO1', 'Gene'], [16, 'XAB2', 'Gene'], [17, 'WWP1', 'Gene'], [18, 'WIZ', 'Gene'], [19, 'WDR77', 'Gene'], [20, 'WDR61', 'Gene'], [21, 'WDR5', 'Gene'], [22, 'WDHD1', 'Gene'], [23, 'WAPL', 'Gene'], [24, 'USP7', 'Gene'], [25, 'UBR2', 'Gene'], [26, 'UBL4A', 'Gene'], [27, 'TSPAN7', 'Gene'], [28, 'TRRAP', 'Gene'], [29, 'TP53BP1', 'Gene'], [30, 'TP53', 'Gene']]
----------------------QUERY 2 RESULTS----------------------
FILTER: UniqueAPIs
[[1, 'ZINC CHLORIDE', 'ChemicalSubstance'], [2, 'TEMOZOLOMIDE', 'ChemicalSubstance'], [3, 'TAMOXIFEN', 'ChemicalSubstance'], [4, 'SELINEXOR', 'ChemicalSubstance

### Using intermediate nodes with one filter
If you want to query using intermediate nodes, but only want the filter applied once (either at an intermediate node or at the end of the query process), the length of the filter parameter must still be equal to the number of steps in the query process, using empty dictionaries as placeholders to indicate no filter at a step.
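
Writing the placeholders by hand becomes error-prone as the number of steps grows. A possible convenience, sketched here under the assumption that steps are numbered from 0, is a helper that fills every unspecified step with an empty dictionary. `filters_for_steps` is hypothetical and not part of biothings_explorer.

```python
def filters_for_steps(intermediate_nodes, step_filters):
    """Build a "filters" list with {} placeholders for unfiltered steps.

    step_filters maps a 0-based step index to a filter dictionary.
    """
    n_steps = len(intermediate_nodes) + 1
    unknown = set(step_filters) - set(range(n_steps))
    if unknown:
        raise IndexError(f"no such step(s): {sorted(unknown)}")
    return [step_filters.get(step, {}) for step in range(n_steps)]


filt = filters_for_steps(["Gene"], {1: {"name": "EdgeLabel",
                                        "label": "negatively_regulates",
                                        "count": 25}})
print(filt)
# [{}, {'name': 'EdgeLabel', 'label': 'negatively_regulates', 'count': 25}]
```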

Here I demonstrate applying only the EdgeLabel filter at the end of the query. 

In [20]:
filt = [{}, {'name':'EdgeLabel', 'label':'negatively_regulates', 'count':25}]
fc = FindConnection(input_obj=input_obj,
                    output_obj=output_obj,
                    intermediate_nodes=['Gene'],
                    filters=filt)
fc.connect()

When no filter is applied, the number of items returned at that step may be large because there is no default cut-off at 50. *(These nodes are also NOT labeled with 'filteredBy' or 'rank'.)*

In [21]:
fc.fc.G.number_of_nodes()

279

In [22]:
results = []
for i,node in fc.fc.G.nodes(data=True):
    if 'filteredBy' in node.keys():
        results.append([node['rank'],i])
results.sort()
print('----------------------QUERY FINAL RESULTS----------------------')
print('FILTER: EdgeLabel, LABEL: negatively_regulates')
print(results)

----------------------QUERY FINAL RESULTS----------------------
FILTER: EdgeLabel, LABEL: negatively_regulates
[[1, 'MESSENGER RNA'], [2, 'C1328819'], [3, 'PHARMACEUTICAL PREPARATIONS'], [4, 'SIROLIMUS'], [5, 'INTERFERING RNA, SMALL'], [6, 'MICRORNA'], [7, 'C0086860'], [8, '3-METHYLADENINE DNA'], [9, '.BETA.-D-GLUCOPYRANOSE'], [10, 'ANDROGENS'], [11, 'ESTROGEN'], [12, 'CHEMOTACTIC FACTORS, MACROPHAGE'], [13, '3,4 METHYLENEDIOXYAMPHETAMINE'], [14, 'CARCINOGENS'], [15, 'OXYGEN SPECIES, REACTIVE'], [16, 'NITRIC OXIDE'], [17, 'LIPIDS'], [18, 'GLUCOCORTICOID'], [19, 'BINDING SITE'], [20, 'UCN 01'], [21, 'TRIGLYCERIDES'], [22, 'MINERALS'], [23, 'EXTRACTS, TISSUE'], [24, 'ANTIOXIDANTS'], [25, 'ZD1839']]


### Using intermediate nodes with CoOccurrence
Using CoOccurrence as the first filter in the query doesn't produce any collisions, but using it at a later step can result in a node being returned twice. Later steps may start from many source nodes, so the same node can be hit from more than one of them. When this happens, the node's 'rank', 'ngd_overall', and 'co_occur_with' attributes are stored as lists of values.
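
The collision behavior can be pictured with a toy merge step. This is a sketch of the idea only: `merge_hits`, the tuple layout, and the names `X`, `Y`, `GENE_A`, `GENE_B` are invented here, and the library's actual merging code may differ.

```python
def merge_hits(hits):
    """Collapse repeated hits on one node into list-valued attributes."""
    nodes = {}
    for name, rank, ngd, source in hits:
        if name not in nodes:
            nodes[name] = {"rank": rank, "ngd_overall": ngd, "co_occur_with": source}
            continue
        node = nodes[name]
        for key, value in (("rank", rank), ("ngd_overall", ngd), ("co_occur_with", source)):
            if not isinstance(node[key], list):
                node[key] = [node[key]]  # promote the scalar on the first collision
            node[key].append(value)
    return nodes


hits = [("X", 1, 0.21, "GENE_A"), ("Y", 2, 0.24, "GENE_A"), ("Y", 3, 0.34, "GENE_B")]
merged = merge_hits(hits)
print(len(hits), len(merged))  # 3 2
print(merged["Y"])
# {'rank': [2, 3], 'ngd_overall': [0.24, 0.34], 'co_occur_with': ['GENE_A', 'GENE_B']}
```

Three ranked hits collapse into two graph nodes, which is why the node count of a graph can be lower than the sum of the "count" values.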

In [23]:
input_obj = ht.query('D000755')['Disease'][0]

filt = [{'name': 'CoOccurrence'},
        {'name': 'CoOccurrence', 'count': 30}]
fc = FindConnection(input_obj=input_obj,
                    output_obj=output_obj,
                    intermediate_nodes=['Gene'],
                    filters=filt)
fc.connect()

We would expect there to be 81 nodes returned from this query (1 source node, 50 from step 1, 30 from step 2); however, there are only 79 nodes in the graph.

In [24]:
fc.fc.G.number_of_nodes()

79

The first query returns 50 nodes, as expected, and the 'co_occur_with' attribute of every node is the original source node:

In [25]:
q1 = []
for i,node in fc.fc.G.nodes(data=True):
    if 'filteredBy' in node.keys():
        if node['type'] == 'Gene':
            q1.append([node['rank'],i,node['ngd_overall'],node['co_occur_with']])
q1.sort()
print('----------------------QUERY 1 RESULTS----------------------')
print('FILTER: CoOccurrence')
print(q1)

----------------------QUERY 1 RESULTS----------------------
FILTER: CoOccurrence
[[1, 'HPX', 0.5112235575925955, 'sickle cell anemia'], [2, 'G6PD', 0.5325824584711736, 'sickle cell anemia'], [3, 'HBD', 0.5553791540387418, 'sickle cell anemia'], [4, 'C0538674', 0.5567833684638404, 'sickle cell anemia'], [5, 'MTHFR', 0.5721842571302228, 'sickle cell anemia'], [6, 'C0034348', 0.5810876857696439, 'sickle cell anemia'], [7, 'C0003762', 0.5814807673615766, 'sickle cell anemia'], [8, 'C0041560', 0.5817780755057473, 'sickle cell anemia'], [9, 'C0022917', 0.5978365030406605, 'sickle cell anemia'], [10, 'C0017337', 0.6118640200533215, 'sickle cell anemia'], [11, 'C0002085', 0.6249431726478171, 'sickle cell anemia'], [12, 'CD34', 0.6305284964409009, 'sickle cell anemia'], [13, 'C0004002', 0.6309184775651431, 'sickle cell anemia'], [14, 'MYB', 0.6357187273191623, 'sickle cell anemia'], [15, 'C0678933', 0.6563331310991332, 'sickle cell anemia'], [16, 'C0597336', 0.6594965221768638, 'sickle cell ane

The second query returned only 28 unique nodes rather than 30 because some nodes are returned multiple times in relation to different source nodes. In this case, "HYDROGEN PEROXIDE" has a ranking of both 2 and 30, and "REPEAT, SHORT TANDEM" has a ranking of both 11 and 14. For those two nodes, 'ngd_overall' and 'co_occur_with' are each lists of two values, and the value at each index corresponds to the ranking at the same index.

In [26]:
q2 = []
for i,node in fc.fc.G.nodes(data=True):
    if 'filteredBy' in node.keys():
        if node['type'] == 'ChemicalSubstance':
            # Wrap scalar ranks in a list locally, leaving the graph's node data unchanged
            rank = node['rank'] if isinstance(node['rank'],list) else [node['rank']]
            q2.append([rank,i,node['ngd_overall'],node['co_occur_with']])
q2.sort()
print('----------------------QUERY 2 RESULTS----------------------')
print('FILTER: CoOccurrence')
for i in q2:
    print(i[0],'\t',i[1:])

----------------------QUERY 2 RESULTS----------------------
FILTER: CoOccurrence
[1] 	 ['MALONALDEHYDE', 0.21251560775008138, 'CAT']
[2, 30] 	 ['HYDROGEN PEROXIDE', [0.24003656287789266, 0.33731308976984803], ['CAT', 'MPO']]
[3] 	 ['GLUTATHIONE', 0.24501186079620108, 'CAT']
[4] 	 ['ETHENO-NADP', 0.2510543137449706, 'G6PD']
[5] 	 ['NADIDE PHOSPHATE', 0.2510543137449706, 'G6PD']
[6] 	 ['FOLATE', 0.26233217293567596, 'MTHFR']
[7] 	 ['FOLIC ACID', 0.26233217293567596, 'MTHFR']
[8] 	 ['ANTIOXIDANTS', 0.2689715013147896, 'CAT']
[9] 	 ['THIOBARBITURIC ACID REACTIVE SUBSTANCES', 0.2758508988246635, 'CAT']
[10] 	 ['CYTARABINE', 0.2979298796499426, 'CSF3']
[11, 14] 	 ['REPEAT, SHORT TANDEM', [0.30302170157600283, 0.31252505373380207], ['C0002085', 'C0597336']]
[12] 	 ['HYPOCHLOROUS ACID', 0.3077517981418732, 'MPO']
[13] 	 ['FREE RADICALS', 0.30788463327790966, 'CAT']
[15] 	 ['OXYGEN SPECIES, REACTIVE', 0.31656277135111954, 'CAT']
[16] 	 ['CPG ISLAND', 0.32108993304705813, 'C0017429']
[17] 	 ['AM