# Results analysis of the TransE model on MIND-CtD
* The goal of this notebook is to create a sample analysis of the model
* We will also highlight how to use some of the functions in score_utils
* Finally, we will calculate the MRR and the Hits@k, as well as extract the filtered top k results

In [1]:
import os
import pandas as pd
import score_utils as scu

os.chdir("..")

In [None]:
!bash run.sh train TransE MIND_CtD 0 megha 256 96 275 48.0 1.0 0.0015 1000000 16

1.12.1+cu102
Start Training......
2023-03-29 17:09:54,376 INFO     Model: TransE
2023-03-29 17:09:54,376 INFO     Data Path: data/MIND_CtD
2023-03-29 17:09:54,376 INFO     #entity: 249605
2023-03-29 17:09:54,376 INFO     #relation: 83
2023-03-29 17:09:59,970 INFO     #train: 9651042
2023-03-29 17:09:59,973 INFO     #valid: 537
2023-03-29 17:09:59,974 INFO     #test: 537
2023-03-29 17:10:00,729 INFO     Model Parameter Configuration:
2023-03-29 17:10:00,729 INFO     Parameter gamma: torch.Size([1]), require_grad = False
2023-03-29 17:10:00,729 INFO     Parameter embedding_range: torch.Size([1]), require_grad = False
2023-03-29 17:10:00,729 INFO     Parameter entity_embedding: torch.Size([249605, 275]), require_grad = True
2023-03-29 17:10:00,729 INFO     Parameter relation_embedding: torch.Size([83, 275]), require_grad = True
2023-03-29 17:10:53,811 INFO     Ramdomly Initializing TransE Model...
2023-03-29 17:10:53,811 INFO     Start Training...
2023-03-29 17:10:53,811 INFO     init_ste

#### Results
```
2023-03-29 22:08:44 INFO     Test MRR at step 999999: 0.119655
2023-03-29 22:08:44 INFO     Test MR at step 999999: 863.434823
2023-03-29 22:08:44 INFO     Test HITS@1 at step 999999: 0.003724
2023-03-29 22:08:44 INFO     Test HITS@3 at step 999999: 0.151769
2023-03-29 22:08:44 INFO     Test HITS@10 at step 999999: 0.353818
2023-03-29 22:08:44 INFO     Test head-batch MRR at step 999999: 0.098189
2023-03-29 22:08:44 INFO     Test head-batch MR at step 999999: 1345.301676
2023-03-29 22:08:44 INFO     Test head-batch HITS@1 at step 999999: 0.005587
2023-03-29 22:08:44 INFO     Test head-batch HITS@3 at step 999999: 0.122905
2023-03-29 22:08:44 INFO     Test head-batch HITS@10 at step 999999: 0.284916
2023-03-29 22:08:44 INFO     Test tail-batch MRR at step 999999: 0.141122
2023-03-29 22:08:44 INFO     Test tail-batch MR at step 999999: 381.567970
2023-03-29 22:08:44 INFO     Test tail-batch HITS@1 at step 999999: 0.001862
2023-03-29 22:08:44 INFO     Test tail-batch HITS@3 at step 999999: 0.180633
2023-03-29 22:08:44 INFO     Test tail-batch HITS@10 at step 999999: 0.422719
```
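
As a quick sanity check on the log above, the overall figures appear to be the plain average of the head-batch and tail-batch figures, which is what one would expect when both modes score the same 537 test triples. A minimal sketch with the numbers copied from the log (the dictionary names are only for illustration):

```python
# Metrics copied from the test log above.
head = {"MRR": 0.098189, "MR": 1345.301676, "HITS@1": 0.005587,
        "HITS@3": 0.122905, "HITS@10": 0.284916}
tail = {"MRR": 0.141122, "MR": 381.567970, "HITS@1": 0.001862,
        "HITS@3": 0.180633, "HITS@10": 0.422719}
overall = {"MRR": 0.119655, "MR": 863.434823, "HITS@1": 0.003724,
           "HITS@3": 0.151769, "HITS@10": 0.353818}

for name in overall:
    averaged = (head[name] + tail[name]) / 2
    # The two columns agree up to rounding of the six-decimal log values.
    print(f"{name}: averaged={averaged:.6f} logged={overall[name]:.6f}")
```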

#### Get predictions for the test file
* Run the predictions on `test.txt`. Results are exported when the `--do_predict` flag is set.
* With both the `--do_test` and `--do_predict` flags, the output file is `test_scores.tsv`.

In [2]:
!python -u codes/run.py --do_predict --do_test -init models/TransE_MIND_CtD_megha --cuda

2023-03-30 09:54:06,384 INFO     Model: TransE
2023-03-30 09:54:06,385 INFO     Data Path: data/MIND_CtD
2023-03-30 09:54:06,385 INFO     #entity: 249605
2023-03-30 09:54:06,385 INFO     #relation: 83
2023-03-30 09:54:11,434 INFO     #train: 9651042
2023-03-30 09:54:11,436 INFO     #valid: 537
2023-03-30 09:54:11,437 INFO     #test: 537
2023-03-30 09:54:12,099 INFO     Model Parameter Configuration:
2023-03-30 09:54:12,099 INFO     Parameter gamma: torch.Size([1]), require_grad = False
2023-03-30 09:54:12,099 INFO     Parameter embedding_range: torch.Size([1]), require_grad = False
2023-03-30 09:54:12,099 INFO     Parameter entity_embedding: torch.Size([249605, 275]), require_grad = True
2023-03-30 09:54:12,099 INFO     Parameter relation_embedding: torch.Size([83, 275]), require_grad = True
2023-03-30 09:54:14,351 INFO     Loading checkpoint models/TransE_MIND_CtD_megha...
2023-03-30 09:54:22,620 INFO     Start Training...
2023-03-30 09:54:22,620 INFO     init_step = 999999
2023-03-30

## Create the score input in tail-batch mode
* I should have written the function to remove all "head-batch" rows when the mode is "tail-batch", and all "tail-batch" rows when the mode is "head-batch".
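
Until that filtering lives inside the class, it can be done on the dataframe after loading. A minimal sketch on a toy frame with the same `batch` column; `keep_mode` is a hypothetical helper and is not part of `score_utils`:

```python
import pandas as pd


def keep_mode(df: pd.DataFrame, mode: str) -> pd.DataFrame:
    """Return only the rows whose `batch` column equals `mode`."""
    if mode not in ("head-batch", "tail-batch"):
        raise ValueError(f"unknown mode: {mode!r}")
    return df[df["batch"] == mode].reset_index(drop=True)


# Toy stand-in for the scores dataframe (values are illustrative only).
toy = pd.DataFrame(
    {
        "h": [71951, 184021, 71951, 184021],
        "r": [61, 61, 61, 61],
        "t": [183664, 183664, 183664, 183664],
        "batch": ["head-batch", "head-batch", "tail-batch", "tail-batch"],
    }
)
tail_only = keep_mode(toy, "tail-batch")
print(tail_only)
```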

In [4]:
raw = scu.ProcessOutput(
    data_dir="./data/MIND_CtD/",
    scores_outfile="./models/TransE_MIND_CtD_megha/test_scores.tsv",
    mode="tail-batch",
)

In [5]:
raw.df.head()

Unnamed: 0,h,r,t,preds,batch
0,71951,61,183664,"[-20.830551147460938, -7.808071136474609, -17....",head-batch
1,184021,61,183664,"[-20.83055877685547, -7.808067321777344, -17.9...",head-batch
2,117007,61,27517,"[-18.655738830566406, -11.907047271728516, -16...",head-batch
3,234163,61,26686,"[-23.04248809814453, -10.350418090820312, -19....",head-batch
4,163877,61,46731,"[-24.690773010253906, -15.080657958984375, -19...",head-batch


## Generate the true answers for tail-batch.
* A true answer is any "t" that appears with a given "h"-"r" combination in the graph
* The inverse can also be done for head-batch
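
The idea can be sketched with a pandas `groupby`. This is an illustration on a handful of toy triples, not the `get_true_targets()` implementation; the column names `h`, `r`, `t` follow the output shown in this notebook:

```python
import pandas as pd

# A few toy triples in (h, r, t) form.
triples = pd.DataFrame(
    {
        "h": ["CHEBI:100", "CHEBI:100", "CHEBI:100", "CHEBI:10001"],
        "r": ["treats_CtD", "treats_CtD", "activates_CaG", "palliates_CplD"],
        "t": ["DOID:883", "DOID:11476", "NCBIGene:2100", "DOID:11968"],
    }
)

# Tail-batch: every t seen for a given (h, r) pair counts as a true answer.
true_tails = triples.groupby(["h", "r"])["t"].agg(list).reset_index()

# Head-batch is the inverse: every h seen for a given (r, t) pair.
true_heads = triples.groupby(["r", "t"])["h"].agg(list).reset_index()

print(true_tails)
```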

In [6]:
raw.get_true_targets()

Unnamed: 0,h,r,t
0,CHEBI:100,activates_CaG,[NCBIGene:2100]
1,CHEBI:100,affects_CafG,[NCBIGene:2100]
2,CHEBI:100,part_of_CpoBP,"[GO:0009701, GO:0046289, GO:0046290]"
3,CHEBI:100,treats_CtD,"[DOID:883, DOID:11476, DOID:8692]"
4,CHEBI:10001,palliates_CplD,[DOID:11968]
...,...,...,...
612847,WP:WP75,associated_with_PWawD,"[DOID:0070315, DOID:0080169, DOID:11950, DOID:..."
612848,WP:WP75,associated_with_PWawP,[HP:0010774]
612849,WP:WP78,associated_with_PWawD,"[DOID:0050575, DOID:0050771, DOID:0050773, DOI..."
612850,WP:WP80,associated_with_PWawD,[DOID:0110705]


## Format the raw scores to embedded values
* The initial scores dataframe holds raw values that range from negative to positive.
* Uses the torch function `argsort()` to sort from high to low: the highest value becomes rank 1, the next highest rank 2, and so on down to the nth highest.
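
A small sketch of that ranking step, using NumPy's `argsort` as a stand-in for the torch call (the scores are invented). Sorting the negated scores gives entity indices ordered from highest to lowest score, so position 0 holds the rank-1 entity:

```python
import numpy as np

# Invented raw scores for four candidate entities (index = entity id).
scores = np.array([-20.8, -7.8, -17.9, -3.2])

# Entity indices ordered from best (highest) score to worst.
order = np.argsort(-scores)

# Rank of each entity, where 1 is the highest score.
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(scores) + 1)

print(order.tolist())  # [3, 1, 2, 0]
print(ranks.tolist())  # [4, 2, 3, 1]
```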

In [7]:
# the result is applied in place
raw.format_raw_scores_to_df()

Unnamed: 0,h,r,t,preds,batch
0,71951,61,183664,"[183664, 231830, 141546, 248163, 25908, 216637...",head-batch
1,184021,61,183664,"[183664, 231830, 141546, 248163, 25908, 216637...",head-batch
2,117007,61,27517,"[27517, 36433, 238892, 59471, 105557, 16942, 1...",head-batch
3,234163,61,26686,"[26686, 170249, 110580, 149641, 234163, 205563...",head-batch
4,163877,61,46731,"[46731, 149183, 14229, 75068, 203495, 17159, 8...",head-batch
...,...,...,...,...,...
1069,196421,61,125397,"[196421, 96052, 35202, 233910, 54162, 125397, ...",tail-batch
1070,42491,61,125397,"[42491, 30319, 219764, 82601, 134556, 80291, 4...",tail-batch
1071,136067,61,219821,"[136067, 179391, 25595, 81409, 62377, 219764, ...",tail-batch
1072,26499,61,125397,"[26499, 41681, 135838, 33567, 30084, 1920, 516...",tail-batch


## Now that we have our embedded values, can we get the actual names?
* The conversion from embeddings to values is done in place.
* Note that the method has a parameter `direction`, which can be "from" or "to". The default is "to", meaning value TO embedding.
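
Conceptually, the translation is a dictionary lookup in one of two directions. A toy sketch, assuming a name-to-index mapping; the index values and the `translate` helper are hypothetical and only mirror the `direction` argument described above:

```python
# Toy name -> index mapping (indices invented for illustration).
name_to_id = {"CHEBI:135735": 0, "DOID:10763": 1, "CHEBI:6061": 2}
id_to_name = {idx: name for name, idx in name_to_id.items()}


def translate(values, direction="to"):
    """Map names TO indices ("to") or indices back to names ("from")."""
    if direction == "to":
        return [name_to_id[v] for v in values]
    if direction == "from":
        return [id_to_name[v] for v in values]
    raise ValueError("direction must be 'to' or 'from'")


print(translate([1, 2], direction="from"))  # ['DOID:10763', 'CHEBI:6061']
```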

In [8]:
# in place
raw.translate_embeddings(direction="from")

Unnamed: 0,h,r,t,preds,batch
0,CHEBI:135735,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch
1,CHEBI:135738,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch
2,CHEBI:135876,indication,DOID:11054,"[DOID:11054, CHEBI:28748, CHEBI:35456, MESH:C0...",head-batch
3,CHEBI:135923,indication,DOID:14499,"[DOID:14499, IKEY:OTUWNTGDKASFHL-PKLMIRHRSA-N,...",head-batch
4,CHEBI:135925,indication,DOID:1094,"[DOID:1094, CHEBI:31236, IKEY:OFCJKOOVFDGTLY-Q...",head-batch
...,...,...,...,...,...
1069,CHEBI:9667,indication,NCBIGene:367,"[CHEBI:9667, DOID:3042, DOID:1575, DOID:7148, ...",tail-batch
1070,CHEBI:41423,indication,NCBIGene:367,"[CHEBI:41423, DOID:813, KEGG:hsa05214, DOID:10...",tail-batch
1071,CHEBI:9168,indication,NCBIGene:7490,"[CHEBI:9168, DOID:4347, DOID:2648, DOID:6501, ...",tail-batch
1072,CHEBI:41879,indication,NCBIGene:367,"[CHEBI:41879, CHEBI:43253, DOID:11459, DOID:11...",tail-batch


In [9]:
# Note: the mode used at import matters. Re-import the dataframe with mode="tail-batch" if you want accurate tail-batch predictions.
# This calculation also covers ALL true targets of each compound, so the MRR is a bit unfair. Use calculate_individual_rr() to evaluate only
# the test triple, rather than the test triple plus every other known answer for that test case.
raw.calculate_mrr()

Unnamed: 0,h,r,target,preds,batch,true_t,mrr
0,CHEBI:135735,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...",0.167007
1,CHEBI:135738,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...",0.167007
2,CHEBI:135876,indication,DOID:11054,"[DOID:11054, CHEBI:28748, CHEBI:35456, MESH:C0...",head-batch,"[DOID:11593, DOID:11811, DOID:11812, DOID:1181...",0.111330
3,CHEBI:135923,indication,DOID:14499,"[DOID:14499, IKEY:OTUWNTGDKASFHL-PKLMIRHRSA-N,...",head-batch,[DOID:14499],1.000000
4,CHEBI:135925,indication,DOID:1094,"[DOID:1094, CHEBI:31236, IKEY:OFCJKOOVFDGTLY-Q...",head-batch,"[MESH:D056912, DOID:1094]",0.500354
...,...,...,...,...,...,...,...
1069,CHEBI:9667,indication,NCBIGene:367,"[CHEBI:9667, DOID:3042, DOID:1575, DOID:7148, ...",tail-batch,"[DOID:417, DOID:9074, HP:0010562, DOID:4481, D...",0.075758
1070,CHEBI:41423,indication,NCBIGene:367,"[CHEBI:41423, DOID:813, KEGG:hsa05214, DOID:10...",tail-batch,"[DOID:7147, DOID:7148, DOID:8398, WD:Q3281303,...",0.019037
1071,CHEBI:9168,indication,NCBIGene:7490,"[CHEBI:9168, DOID:4347, DOID:2648, DOID:6501, ...",tail-batch,"[DOID:3963, DOID:4450, DOID:4451, DOID:4454, D...",0.000676
1072,CHEBI:41879,indication,NCBIGene:367,"[CHEBI:41879, CHEBI:43253, DOID:11459, DOID:11...",tail-batch,"[DOID:0050745, DOID:0060058, DOID:0060060, DOI...",0.025852
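
The difference between the two scores can be sketched on a toy ranking. This assumes the `mrr` column averages the reciprocal rank over every entry in `true_t`, while the individual score uses only the held-out `target`. Both function names are hypothetical and are not the `score_utils` implementations:

```python
def reciprocal_rank(preds, target):
    """1 / (1-based position of target in preds), or 0.0 if absent."""
    return 1.0 / (preds.index(target) + 1) if target in preds else 0.0


def mean_rr_over_all(preds, true_targets):
    """Average reciprocal rank over every known true target."""
    return sum(reciprocal_rank(preds, t) for t in true_targets) / len(true_targets)


preds = ["DOID:1", "DOID:2", "DOID:3", "DOID:4"]  # ranked, best first
true_t = ["DOID:1", "DOID:4"]  # all known answers for this (h, r)
target = "DOID:1"  # the single held-out test answer

print(mean_rr_over_all(preds, true_t))  # 0.625
print(reciprocal_rank(preds, target))  # 1.0
```

With a second true answer ranked far down the list, the averaged score drops even though the test answer itself is ranked first.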


In [10]:
head_batch = raw.calculate_mrr()

In [11]:
head_batch = head_batch.query('batch=="head-batch"')

In [12]:
head_batch.shape

(537, 7)

In [13]:
sum(head_batch["mrr"]) / len(head_batch["mrr"])

0.3221142307954507

In [14]:
tail_batch = raw.calculate_mrr()

In [15]:
tail_batch.head()

Unnamed: 0,h,r,target,preds,batch,true_t,mrr
0,CHEBI:135735,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...",0.167007
1,CHEBI:135738,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...",0.167007
2,CHEBI:135876,indication,DOID:11054,"[DOID:11054, CHEBI:28748, CHEBI:35456, MESH:C0...",head-batch,"[DOID:11593, DOID:11811, DOID:11812, DOID:1181...",0.11133
3,CHEBI:135923,indication,DOID:14499,"[DOID:14499, IKEY:OTUWNTGDKASFHL-PKLMIRHRSA-N,...",head-batch,[DOID:14499],1.0
4,CHEBI:135925,indication,DOID:1094,"[DOID:1094, CHEBI:31236, IKEY:OFCJKOOVFDGTLY-Q...",head-batch,"[MESH:D056912, DOID:1094]",0.500354


In [16]:
tail_batch = tail_batch.query('batch=="tail-batch"')

In [17]:
tail_batch.shape

(537, 7)

In [18]:
tail_batch[["h", "r", "mrr"]].drop_duplicates(["h", "r"]).shape

(387, 3)

In [19]:
# Calculate overall MRR vs the 'wrong' head-batch setting.
sum(tail_batch[["h", "r", "mrr"]].drop_duplicates(["h", "r"])["mrr"]) / len(
    tail_batch[["h", "r", "mrr"]].drop_duplicates(["h", "r"])["mrr"]
)

0.11725969626818494

In [20]:
# Get hits at K
tb_hits = raw.calculate_individual_hits_k(hits=[1, 3, 10]).query('batch=="tail-batch"')
tb_hits.head()

Unnamed: 0,h,r,target,preds,batch,true_t,position,ind_rank,hits_1,hits_3,hits_10
537,CHEBI:135735,indication,DOID:10763,"[CHEBI:135735, DOID:9654, UMLS:C0221155, WD:Q2...",tail-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...","[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, ...",5,False,False,True
538,CHEBI:135738,indication,DOID:10763,"[CHEBI:135738, DOID:10763, DOID:9654, WD:Q2530...",tail-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...","[0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, ...",2,False,True,True
539,CHEBI:135876,indication,DOID:11054,"[CHEBI:135876, DOID:11813, DOID:5432, DOID:118...",tail-batch,"[DOID:11593, DOID:11811, DOID:11812, DOID:1181...","[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, ...",8,False,False,True
540,CHEBI:135923,indication,DOID:14499,"[CHEBI:135923, DOID:14499, DOID:0111633, DOID:...",tail-batch,[DOID:14499],"[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",2,False,True,True
541,CHEBI:135925,indication,DOID:1094,"[CHEBI:135925, DOID:8986, DOID:12139, DOID:503...",tail-batch,"[MESH:D056912, DOID:1094]","[0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, ...",7,False,False,True
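
Each `hits_k` column looks like a simple threshold on `ind_rank`: a row counts as a hit at k when its rank is at most k. A minimal sketch using the five ranks shown above (the helper name is hypothetical):

```python
def hits_at_k(ranks, k):
    """Fraction of ranks that are less than or equal to k."""
    return sum(r <= k for r in ranks) / len(ranks)


ranks = [5, 2, 8, 2, 7]  # ind_rank values from the five rows above
for k in (1, 3, 10):
    print(f"hits@{k}: {hits_at_k(ranks, k):.2f}")
```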


In [21]:
tb_hits[["h", "r", "hits_1", "hits_3", "hits_10"]].drop_duplicates(["h", "r"]).shape

(387, 5)

In [22]:
short_tb_hits = tb_hits[["h", "r", "hits_1", "hits_3", "hits_10"]].drop_duplicates(
    ["h", "r"]
)

In [23]:
# calculate overall hits at K
print(
    f"""Dropduplicated results
hits@1: {sum(short_tb_hits['hits_1'])/len(short_tb_hits['hits_1']):.4f}
hits@3: {sum(short_tb_hits['hits_3'])/len(short_tb_hits['hits_3']):.4f}
hits@10: {sum(short_tb_hits['hits_10'])/len(short_tb_hits['hits_10']):.4f}
"""
)

Dropduplicated results
hits@1: 0.0026
hits@3: 0.1990
hits@10: 0.4780



In [24]:
# These results match the tail-batch results in the test log
print(
    f"""Unduplicated results
hits@1: {sum(tb_hits['hits_1'])/len(tb_hits['hits_1']):.4f}
hits@3: {sum(tb_hits['hits_3'])/len(tb_hits['hits_3']):.4f}
hits@10: {sum(tb_hits['hits_10'])/len(tb_hits['hits_10']):.4f}
"""
)

Unduplicated results
hits@1: 0.0019
hits@3: 0.1806
hits@10: 0.4227



In [25]:
tail_batch_rr = raw.calculate_individual_rr()

In [28]:
tail_batch_rr

Unnamed: 0,h,r,target,preds,batch,true_t,position,ind_rank,rr
0,CHEBI:135735,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...","[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",1,1.000000
1,CHEBI:135738,indication,DOID:10763,"[DOID:10763, CHEBI:6061, MESH:D000806, MESH:D0...",head-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...","[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",1,1.000000
2,CHEBI:135876,indication,DOID:11054,"[DOID:11054, CHEBI:28748, CHEBI:35456, MESH:C0...",head-batch,"[DOID:11593, DOID:11811, DOID:11812, DOID:1181...","[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",1,1.000000
3,CHEBI:135923,indication,DOID:14499,"[DOID:14499, IKEY:OTUWNTGDKASFHL-PKLMIRHRSA-N,...",head-batch,[DOID:14499],"[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",1,1.000000
4,CHEBI:135925,indication,DOID:1094,"[DOID:1094, CHEBI:31236, IKEY:OFCJKOOVFDGTLY-Q...",head-batch,"[MESH:D056912, DOID:1094]","[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",1,1.000000
...,...,...,...,...,...,...,...,...,...
1069,CHEBI:9667,indication,NCBIGene:367,"[CHEBI:9667, DOID:3042, DOID:1575, DOID:7148, ...",tail-batch,"[DOID:417, DOID:9074, HP:0010562, DOID:4481, D...","[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",6,0.166667
1070,CHEBI:41423,indication,NCBIGene:367,"[CHEBI:41423, DOID:813, KEGG:hsa05214, DOID:10...",tail-batch,"[DOID:7147, DOID:7148, DOID:8398, WD:Q3281303,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",26,0.038462
1071,CHEBI:9168,indication,NCBIGene:7490,"[CHEBI:9168, DOID:4347, DOID:2648, DOID:6501, ...",tail-batch,"[DOID:3963, DOID:4450, DOID:4451, DOID:4454, D...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",1047,0.000955
1072,CHEBI:41879,indication,NCBIGene:367,"[CHEBI:41879, CHEBI:43253, DOID:11459, DOID:11...",tail-batch,"[DOID:0050745, DOID:0060058, DOID:0060060, DOI...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",16,0.062500


In [31]:
print(
    f'MRR: {sum(tail_batch_rr[tail_batch_rr["batch"]=="tail-batch"].rr)/len(tail_batch_rr[tail_batch_rr["batch"]=="tail-batch"].rr)}'
)

MRR: 0.14112169321544563


In [32]:
tb_hits

Unnamed: 0,h,r,target,preds,batch,true_t,position,ind_rank,hits_1,hits_3,hits_10
537,CHEBI:135735,indication,DOID:10763,"[CHEBI:135735, DOID:9654, UMLS:C0221155, WD:Q2...",tail-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...","[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, ...",5,False,False,True
538,CHEBI:135738,indication,DOID:10763,"[CHEBI:135738, DOID:10763, DOID:9654, WD:Q2530...",tail-batch,"[DOID:10591, DOID:10824, DOID:10825, DOID:1113...","[0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, ...",2,False,True,True
539,CHEBI:135876,indication,DOID:11054,"[CHEBI:135876, DOID:11813, DOID:5432, DOID:118...",tail-batch,"[DOID:11593, DOID:11811, DOID:11812, DOID:1181...","[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, ...",8,False,False,True
540,CHEBI:135923,indication,DOID:14499,"[CHEBI:135923, DOID:14499, DOID:0111633, DOID:...",tail-batch,[DOID:14499],"[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",2,False,True,True
541,CHEBI:135925,indication,DOID:1094,"[CHEBI:135925, DOID:8986, DOID:12139, DOID:503...",tail-batch,"[MESH:D056912, DOID:1094]","[0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, ...",7,False,False,True
...,...,...,...,...,...,...,...,...,...,...,...
1069,CHEBI:9667,indication,NCBIGene:367,"[CHEBI:9667, DOID:3042, DOID:1575, DOID:7148, ...",tail-batch,"[DOID:417, DOID:9074, HP:0010562, DOID:4481, D...","[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",6,False,False,True
1070,CHEBI:41423,indication,NCBIGene:367,"[CHEBI:41423, DOID:813, KEGG:hsa05214, DOID:10...",tail-batch,"[DOID:7147, DOID:7148, DOID:8398, WD:Q3281303,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",26,False,False,False
1071,CHEBI:9168,indication,NCBIGene:7490,"[CHEBI:9168, DOID:4347, DOID:2648, DOID:6501, ...",tail-batch,"[DOID:3963, DOID:4450, DOID:4451, DOID:4454, D...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",1047,False,False,False
1072,CHEBI:41879,indication,NCBIGene:367,"[CHEBI:41879, CHEBI:43253, DOID:11459, DOID:11...",tail-batch,"[DOID:0050745, DOID:0060058, DOID:0060060, DOI...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...",16,False,False,False


In [33]:
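# Keep only the top 1000 predictions per row before exporting.
tb_hits = tb_hits.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
tb_hits["preds"] = tb_hits["preds"].apply(lambda x: x[:1000])
tb_hits.to_csv("./Notebooks/top1000preds_transe.tsv", sep="\t", header=True, index=False)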

## Generate the top n filtered results

In [49]:
raw.filter_predictions(top=100).query('batch=="tail-batch"').head()

Unnamed: 0,h,r,preds,batch,true_t,filt_preds
527,CHEBI:34385,indication_CiD,"[DOID:2377, CHEBI:34385, DOID:8869, DOID:2378,...",tail-batch,"[DOID:2378, DOID:2377, DOID:0050784, DOID:0050...","[CHEBI:34385, DOID:8869, DOID:3393, DOID:1824,..."
528,CHEBI:34829,indication_CiD,"[DOID:2474, DOID:2457, DOID:11204, DOID:13452,...",tail-batch,"[DOID:11204, DOID:2457, DOID:2474]","[DOID:13452, CHEBI:34829, DOID:8881, DOID:9383..."
529,CHEBI:3441,indication_CiD,"[DOID:3393, DOID:10763, DOID:4248, DOID:6000, ...",tail-batch,"[DOID:10591, DOID:6432, DOID:11130, DOID:10824...","[DOID:3393, DOID:4248, DOID:6000, DOID:5844, D..."
530,CHEBI:34730,indication_CiD,"[DOID:11729, DOID:3482, DOID:0040083, DOID:004...",tail-batch,"[DOID:11104, DOID:13034, DOID:13035, DOID:1327...","[DOID:11729, DOID:3482, DOID:0040083, DOID:004..."
531,CHEBI:3437,indication_CiD,"[DOID:916, MESH:D056486, MONDO:0005043, MESH:D...",tail-batch,"[DOID:6432, DOID:13544, DOID:11130, DOID:10825...","[DOID:916, MESH:D056486, MONDO:0005043, MESH:D..."
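
Judging from the output, `filt_preds` is `preds` with the known true targets removed, cut to the requested number of entries. A rough sketch of that idea on the start of the first row above; `filter_top_n` is a hypothetical helper, not the `score_utils` method, and whether the real method truncates before or after filtering is an assumption here:

```python
def filter_top_n(preds, true_targets, top=100):
    """Drop known true answers from a ranked list, then keep the first `top`."""
    known = set(true_targets)
    return [p for p in preds if p not in known][:top]


preds = ["DOID:2377", "CHEBI:34385", "DOID:8869", "DOID:2378"]
true_t = ["DOID:2378", "DOID:2377"]
print(filter_top_n(preds, true_t, top=2))  # ['CHEBI:34385', 'DOID:8869']
```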
